Forthcoming and Online First Articles

International Journal of Business Intelligence and Data Mining (IJBIDM)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

Register for our alerting service, which notifies you by email when new issues are published online.

We also offer RSS feeds, which provide timely updates of tables of contents, newly published articles and calls for papers.

International Journal of Business Intelligence and Data Mining (48 papers in press)

Regular Issues

  • Analysis and Prediction of Heart Disease Aid of Various Data Mining Techniques: A Survey
    by V. Poornima, D. Gladis 
    Abstract: In recent times, diseases have been spreading gradually because of hereditary factors. Heart disease in particular has become more common, putting people's lives at risk. Data mining strategies, specifically decision tree, Naive Bayes, neural network, K-means clustering, association classification, support vector machine (SVM), fuzzy, rough set theory and orthogonal local preserving methodologies, are examined on heart disease databases. In this paper, we survey papers in which at least one data mining algorithm is used to predict heart disease. This survey covers the current techniques used in risk prediction of heart disease for classification in data mining. The surveyed data mining strategies for heart disease risk prediction indicate that a hybrid approach gives a better prediction model than a single-model approach.
    Keywords: Data mining; Heart Disease Prediction; performance measure; Fuzzy; clustering.
    DOI: 10.1504/IJBIDM.2018.10014620
  • A predictive model of electricity quality indicator in distribution subsidiaries
    by Ana Flávia L. Gonçalves, Rafael Frinhani, Bruno G. Batista, Rafael P. Pagan, Edvard M. De Oliveira, Bruno T. Kuehne, João Paulo R. R. Leite, João Víctor De M. S. Gomes 
    Abstract: Electricity concessionaires pay out large sums annually in compensation to consumers who experience service unavailability. Availability of the energy supply is a major challenge because the distribution infrastructure is constantly affected by climatic, environmental, and social causes. To assist decision making in mitigating grid failures, this study aims to predict the number of incidences of electricity shortage for consumers. A predictive model was developed using predictive data analysis and conforms to a knowledge discovery process. A hybrid classifier was developed from the model, using both unsupervised and supervised methods. The experiments were carried out with real incidence and climatic data from four subsidiaries of an energy concessionaire. The results show the forecasting model’s feasibility, with classification accuracy between 58.33% and 91.66%. The results also show that peculiarities in terms of geographic location, energy demand, and climatic conditions make it difficult to use a generic prediction model.
    Keywords: electric quality indicator; predictive data analysis; machine learning; unsupervised methods; supervised methods; knowledge discovery in data.
    DOI: 10.1504/IJBIDM.2022.10041550
    by Alok Khode, Sagar Jambhorkar 
    Abstract: Patents are critical intellectual assets for any business. With the rapid increase in patent filings, patent prior art retrieval has become an important task. The goal of prior art retrieval is to find documents relevant to a patent application. Due to the special nature of patent documents, relying only on keyword-based queries does not prove effective in patent retrieval. Previous work has used international patent classification (IPC) to improve the effectiveness of keyword-based search. However, these systems have used a two-stage retrieval process, using IPC mostly to filter patent documents or to re-rank the documents retrieved by a keyword-based query. In the approach proposed in this paper, weighted IPC code hierarchies have been explored to augment keyword-based search, thereby eliminating the use of an additional processing step. Experiments on the CLEF-IP 2011 benchmark dataset show that the proposed approach outperforms the baseline on MAP, Recall and PRES.
    Keywords: patent retrieval; prior art search; international patent classification; IPC; query formulation; query expansion; information retrieval; IPC hierarchy; weighted IPC.
    DOI: 10.1504/IJBIDM.2022.10041582
  • Optimizing the number of course sections given optimal course sequence to support student retention
    by Akash Gupta, Amir Gharehgozli, Seung-Kuk Paik 
    Abstract: Although higher education institutions strive to create environments that foster student retention, many students depart before graduation. Therefore, it is paramount to understand the important factors that drive student retention. We observed that student retention is tied to the student grade point average (GPA) and, in turn, that GPA is correlated with the order in which students enroll in courses. In this study, we first use statistical methods to determine the best order of taking core courses. Then, we develop a prescriptive model using mixed-integer linear programming. This model determines the optimal number of sections to be offered for each course so that the maximum number of students can follow the optimal course order in a resource-constrained environment. We also propose heuristic subroutines to solve the proposed model and determine the optimal number of sections for each course. In addition, we highlight the social and demographic factors that influence student retention. This study helps college administrations plan courses so that student retention can be improved.
    Keywords: education; student success; data analytics; retention; course sequence.
    DOI: 10.1504/IJBIDM.2022.10042240
  • Exploring Appropriate ERP Framework towards Indian Small and Medium Enterprises using Decision Tree
    by Aveek Basu, Sraboni Dutta, Sanchita Ghosh 
    Abstract: Small and medium enterprises (SMEs) enhance the outcome of their various business processes by implementing an ERP framework. However, they struggle to select the appropriate ERP, as an on-premise solution entails a large upfront capital expense, which ultimately raises a question about the sustenance of these small firms, especially in this pandemic situation. A cloud-based ERP system can reduce the risk to a certain level due to its low infrastructure cost and flexible payment options, but it has its own constraints. Thus, selection of the appropriate ERP is always a challenge, which motivates the current researchers to explore a decision tree-based technique to predict the most suitable framework to be adopted by an SME in a specific situation. The inferences drawn from the decision tree clearly show the efficacy of the implemented technique, as the right decision can be derived easily by traversing the tree.
    Keywords: enterprise resource planning; ERP; Cloud ERP; on premise ERP; hybrid ERP; small and medium enterprise; SME; decision tree.
    DOI: 10.1504/IJBIDM.2022.10042760
  • A cluster and label approach for classifying imbalanced data streams in the presence of scarcely labeled data
    by Kiran Bhowmick, Meera Narvekar 
    Abstract: Classifying imbalanced data streams is often a challenging task, primarily due to the continuous flow of infinite data and the unavailability of class labels. The problem is two-fold when the stream is imbalanced in nature. Due to the characteristics of data streams, it is impossible to store and process the data and deal with imbalance. There is a need to provide a solution that can consider the unavailability of class labels and classify the imbalanced data streams. This paper proposes a semi-supervised learning (SSL)-based model to classify scarcely labelled imbalanced data streams. A modified cluster and label SSL approach that uses expectation maximisation for clustering and similarity-based label propagation for labelling the unlabelled clusters is proposed. The model also employs a novel imbalance-sensitive cluster merge technique to deal with the imbalanced data. The results show that the model outperforms standard stream classification algorithms.
    Keywords: data streams; classification; imbalanced data; semi-supervised learning; scarcely labelled; cluster and label; micro cluster; label propagation.
    DOI: 10.1504/IJBIDM.2022.10042780
  • Application of a Record Linkage Software to Identify Mortality of Enrollees of Large Integrated Health Care Organization
    by Yichen Zhou, Zhi Liang, Sungching Glenn, Wansu Chen, Fagen Xie 
    Abstract: Information on mortality is important for the improvement of public health and the conduct of medical research. Health care organisations typically lack complete and accurate information on mortality. This paper proposes a comprehensive process to link the records of the enrollees of a health care organisation with the 2015 death records obtained from the State of California via commercial data linkage software. The developed linkage process successfully identified 23,628 and 21,009 death records of health plan enrollees from the State file after the initial and second post-linkage, respectively. Validation of the linkage process against the death records documented in the internal systems of the organisation achieved a sensitivity of 97.5% and a positive predictive value of 88.7% at the time of initial linkage but increased to 99.4% in three years using more information available later. The linkage process demonstrated high accuracy and can be utilised to support various business needs.
    Keywords: data cleaning; data standardisation; data matching; mortality linkage.
    DOI: 10.1504/IJBIDM.2022.10042864
  • Exploring Outliers in Global Economic Dataset having the Impact of Covid-19 Pandemic
    by Anindita Desarkar, Ajanta Das, Chitrita Chaudhuri 
    Abstract: An outlier is a value that lies outside most of the other values in a dataset. Outlier exploration is of great importance in almost all industry applications, such as medical diagnosis, credit card fraud and intrusion detection systems. Similarly, in the economic domain, it can be applied to analyse many unexpected events to harvest new knowledge, such as a sudden crash of the stock market, a mismatch between a country’s per capita income and overall development, an abrupt change in the unemployment rate and a steep fall in bank interest. These situations can arise due to several reasons, out of which the present covid-19 pandemic is a leading one. This motivates the present researchers to identify a few such vulnerable areas in the economic sphere and ferret out the most affected countries for each of them. Two well-known machine-learning techniques, DBSCAN and Z-score, are utilised to get these insights, which can serve as a guideline towards improving the overall scenario subsequently.
    Keywords: economic outlier; machine learning; gross domestic product; GDP; per capita; human development index; HDI; covid-19 pandemic; total death percentage.
    DOI: 10.1504/IJBIDM.2022.10043040
    by Kellen Endler, Cassius Tadeu Scarpin, Maria Teresinha Arns Steiner, Tamires Almeida Sfeir, Claudimar Pereira Da Veiga 
    Abstract: The purpose of this article is to present a methodology based on the extraction process of knowledge discovery in databases (KDD) to predict the expenditure of different customer profiles, considering their characteristics, and the type of store they would buy from, in one of the largest retail chains in the Brazilian supermarket and hypermarket segment. These stores have different characteristics, such as physical size, product assortment and customer profile. This heterogeneity in terms of commercial offers implies a desire for consumption by customers that differs from store to store, depending on how their preferences are met. The proposed methodology was applied to a real marketing case based in a business-to-consumer (B2C) environment to aid retailers during the segmentation process. The results show that it is possible to highlight relationships between the data that enabled the prediction of customers’ consumption, which can contribute towards generating useful information to retail businesses.
    Keywords: knowledge discovery in databases; KDD; data mining; market segmentation; retail; principal component analysis; PCA; cluster analysis; multiple linear regression.
    DOI: 10.1504/IJBIDM.2022.10043148
  • Performance Evaluation of Oversampling Algorithm: MAHAKIL using Ensemble Classifiers
    by C. Arun, C. Lakshmi 
    Abstract: Class imbalance is a known problem in real-world applications: a disparity in the number of samples of different classes, which results in biased performance. The class imbalance issue has been addressed by many sampling techniques, which fall into either an oversampling approach, which solves the issue to a greater extent, or an undersampling approach. MAHAKIL is a diversity-based oversampling approach influenced by the theory of inheritance, in which minority samples are synthesised to balance the classes using the Mahalanobis distance measure. In this study, the performance of the MAHAKIL algorithm has been tested using various ensemble classifiers, which have proved effective due to their multi-hypothesis learning approach and better performance. The results of the experiment conducted on 20 imbalanced software defect prediction datasets using six different ensemble approaches show that XGBoost provides better performance and a reduced false alarm rate compared to the other models.
    Keywords: class imbalance; software fault prediction; synthetic samples; over sampling techniques; MAHAKIL; false alarm rate; evolutionary algorithm; ensemble; inheritance.
    DOI: 10.1504/IJBIDM.2022.10043149
  • Machine learning based forecasting of significant daily returns in foreign exchange markets
    by Firuz Kamalov, Ikhlaas Gurrib 
    Abstract: Financial forecasting has always attracted an enormous amount of interest among researchers in quantitative analysis. The advent of modern machine learning models has introduced new tools to tackle this classical problem. In this paper, we apply machine learning algorithms to a hitherto unexplored question of forecasting instances of significant fluctuations in currency exchange rates. We carry out an extensive comparative study of ten modern machine learning methods. In our experiments, we use data on four major currency pairs over a 20-year period. A key contribution is the novel use of outlier detection methods for this purpose. Numerical experiments show that outlier detection methods substantially outperform traditional machine learning and finance techniques. In addition, we show that a recently proposed new outlier detection method PKDE produces the best overall results. Our findings hold across different currency pairs, significance levels, and time horizons indicating the robustness of the proposed method.
    Keywords: foreign exchange; forecasting; machine learning; outlier detection; kernel density estimation; KDE; neural networks; tail events.
    DOI: 10.1504/IJBIDM.2022.10043208
  • Using unstructured logs generated in complex large scale micro-service-based architecture for data analysis
    by Anukampa Behera, Sitesh Behera, Chhabi Rani Panigrahi, Tien-Hsiung Weng 
    Abstract: With deployments of complex large-scale micro-service architectures, the kind of data generated from all those systems makes a typical production infrastructure huge, complicated and difficult to manage. In this scenario, logs play a major role and can be considered an important source of information in a large-scale secured environment. To date, many researchers have contributed various methods for converting unstructured logs to structured ones. However, after conversion the dimensionality of the generated dataset increases manyfold, making it too complex for data analysis. In this paper, we discuss techniques and methods for extracting all features from a produced structured log and reducing N-dimensional features to fixed dimensions, without compromising the quality of data and in a cost-efficient manner, so that the result can be used for any further machine learning-based analysis.
    Keywords: json data; micro services; data parsing; principal component analysis; PCA; multivariate data; unstructured data; tagged data; feature reduction.
    DOI: 10.1504/IJBIDM.2022.10043252
  • Approaches to Parallelize Eclat algorithm and Analyzing its Performance for K Length Prefix based Equivalence Classes
    by C.G. Anupama, C. Lakshmi 
    Abstract: Frequent item set mining (FIM) is one of the prevalent, well-known methods of data mining and a topic of interest for researchers in the field of decision making. In the era of big data, data is continuously generated from multidimensional sources in enormous volume and variety, much of it unexplored, and transforming this data into valuable knowledge that helps organisations make efficient decisions is a challenge for current research. This leads to the problem of discovering the maximum frequent patterns in vast datasets and creating a more generalised and interpretable representation of veracity. Targeting the problems stated above, this paper suggests a parallelisation method suitable for any type of parallel environment. The implemented algorithm can be run on a single computer with a multi-core processor as well as on a cluster of such machines.
    Keywords: item set mining; frequent items; frequent patterns; Eclat; parallel Eclat; frequent item set mining; FIM.
    DOI: 10.1504/IJBIDM.2022.10043400
  • Mining Models for Predicting Product Quality Properties of Petroleum Products
    by Ng'ambilani Zulu, Douglas Kunda
    Abstract: Large amounts of raw data are generated during the production of refinery products, and in most cases this data remains under-utilised for knowledge acquisition and decision making. The purpose of this study was to demonstrate how data mining techniques can be used to develop models to predict product quality properties for petroleum products. This study used petroleum refinery production raw data to build predictive models for product quality control activities. The plant and laboratory data for a period of about 18 months was mined from the refinery repositories in order to build the datasets required for analysis using the Orange3 data mining software. Four data mining algorithms were chosen for experiments in order to determine the best predictive model, using cross-validation as the validation method. This study employed only two metrics, classification accuracy (CA) and root mean square error (RMSE), as performance indicators. Random forest came out as the best performing model, suitable for predicting both categorical (CA) and numeric data (RMSE). The study was also able to establish the relationship between the variables that could be used in critical operational decisions.
    Keywords: data mining; machine learning; industries; petroleum refinery; product quality; parameter optimisation.
    DOI: 10.1504/IJBIDM.2023.10043436
  • Fraud Detection with Machine Learning - Model Comparison
    by João Carlos Pacheco Junior, João Luiz Chela, Guilherme Ferreira Pelucio Salome 
    Abstract: This work evaluates the performance of different models for predicting three types of fraudulent behaviour in a novel dataset with imbalanced data. The logistic regression model, a staple in the credit risk industry, is compared to several machine learning models. This work shows that in the binary classification case, all compared models achieved similar results to the logistic regression. The random forest model showed superior performance when classifying credit frauds ending in lawsuits. In the multi-label classification case, the logistic regression attains high levels of precision for all types of fraud, but at lower recall rates. By contrast, the random forest model achieves higher recall rates, but with lower precision.
    Keywords: fraud detection; machine learning; imbalanced data; multi-label classification.
    DOI: 10.1504/IJBIDM.2023.10044239
    by Goutam Mylavarapu, K. Ashwin Viswanathan, Johnson P. Thomas 
    Abstract: Data analysis is a crucial process in the field of data science that extracts useful information from any form of data. With the rapid growth of technology, more and more unstructured data, such as text and images, are being produced in large amounts. Apart from the analytical techniques used, the quality of the data plays a prominent role in accurate analysis. Data quality deteriorates due to poor maintenance and mediocre data generation strategies employed by amateur users. This problem escalates with the advent of big data. In this paper, we propose a quality assessment model for the textual form of unstructured data (TDQA). The context of data plays an important role in determining the quality of the data. Therefore, we automate the process of context extraction in textual data using natural language processing to identify data errors and assess quality.
    Keywords: automated data quality assessment; textual data; context-aware; data context; sentiment analysis; lexicon; Doc2Vec; data accuracy; data consistency.
    DOI: 10.1504/IJBIDM.2023.10044353
  • A deep regression convolutional neural network using whole image-based inferencing for dynamic visual crowd estimation
    by Shen Khang Teoh, Vooi Voon Yap, Humaira Nisar 
    Abstract: As intelligent surveillance system applications become ubiquitous, automated crowd counting solutions must be made continually faster and more accurate. This paper presents an improved convolutional neural network (CNN) architecture for accurate visual crowd counting in crowd images. The multi-column convolutional neural network (MCNN) is widely used in previous works to predict the density map for visual crowd counting. However, this method has limitations in predicting a quality density map. Instead, the proposed model is architected using powerful CNN layers, dense layers, and one regressor node with whole image-based inference. Therefore, it is less computationally intensive and inference speed can be increased. Tested on the mall dataset, the proposed model achieved a mean absolute error of 2.01 and a mean square error of 8.53. Moreover, benchmarking on different CNN architectures has been conducted. The proposed model shows promising counting accuracy and reasonable inference speed against the existing state-of-the-art approaches.
    Keywords: visual crowd counting; convolutional neural network; CNN; whole image-based inference; edge embedded platform; multi-column convolutional neural network; MCNN.
    DOI: 10.1504/IJBIDM.2022.10044713
  • Factors That Drive the Selection of Business Intelligence Tools in South African Financial Services Providers
    by Bonginkosi P. Gina, Adheesh Budree 
    Abstract: Innovation and technology advancements in information systems (IS) have resulted in a multitude of product offerings and business intelligence (BI) software tools in the market to implement business intelligence systems (BIS). As a result, a high proportion of organisations fail to employ suitable software tools that meet organisational needs. The study aimed to discover critical factors influencing the selection of BI tools. This was a quantitative study, and questionnaire survey data was collected from 92 participants. The data was analysed using SPSS and SmartPLS-3 software to test the significance of influential factors. The findings showed that software tool technical factors, vendor technical factors, and opinion non-technical factors are significant. The study contributes to both academia and industry by providing influential determinants for software tool selection. It is hoped that the findings presented will contribute to a greater understanding, among researchers and practitioners alike, of the factors influencing the selection of BI tools.
    Keywords: business intelligence tools; BITs; business intelligence systems; BIS; business intelligence; BI; software factors; software selection; software tool.
    DOI: 10.1504/IJBIDM.2023.10044714
  • Effect of IT Integration on Firm performance: The Mediating Role of Supply Chain Integration and Flexibility
    by Gaurav Abhishek Tigga, Ganesan Kannabiran, P. Sridevi 
    Abstract: IT integration (ITI) complements the functional and operational processes, and helps the firm develop an inimitable competitive advantage. The study examines the effect of ITI on supply chain integration (SCI), supplier flexibility (SF) and manufacturing flexibility, and their subsequent effects on firm performance (FP). The extended resource-based view has been used as the theoretical perspective to develop the research model. A survey was carried out among the manufacturing industries in India. Structural equation modelling with the partial least squares algorithm was used to analyse the hypotheses proposed in the study. The results show that ITI has a significant effect on SCI, manufacturing flexibility and SF, and subsequently affects FP.
    Keywords: IT integration; supply chain integration; supplier flexibility; manufacturing flexibility; firm performance.
    DOI: 10.1504/IJBIDM.2022.10044810
  • Credit Card Fraud Detection: An Evaluation of SMOTE Resampling and Machine Learning Model Performance
    by Faleh Alshameri, Ran Xia 
    Abstract: Credit card fraud has been a noted security issue that requires financial organisations to continuously improve their fraud detection system. In most cases, a credit transaction dataset is expected to have a significantly larger number of normal transactions than fraud transactions. Therefore, the accuracy of a fraud detection system depends on building a model that can adequately handle such an imbalanced dataset. The purpose of this paper is to explore one of the techniques of dataset rebalancing, the synthetic minority oversampling technique (SMOTE). To evaluate the effects of this technique on model training, we selected four basic classification algorithms, complement naïve Bayes (CNB), K-nearest neighbour (KNN), random forest and support vector machine (SVM). We then compared the performances of the four models trained on the rebalanced and original dataset using the area under precision-recall curve (AUPRC) plots.
    Keywords: credit card; imbalanced dataset; resampling method; synthetic minority oversampling technique; SMOTE; AUPRC; classification algorithms.
    DOI: 10.1504/IJBIDM.2023.10044811
  • On Prevention of Attribute Disclosure and Identity Disclosure Against Insider Attack in Collaborative Social Network Data Publishing
    by Bintu Kadhiwala, Sankita Patel 
    Abstract: In a collaborative social network data publishing setup, privacy preservation of individuals is a vital issue. Existing privacy-preserving techniques assume that attackers are external data recipients and hence are vulnerable to insider attacks performed by colluding data providers. Additionally, these techniques protect data against identity disclosure but not against attribute disclosure. To overcome these limitations, in this paper, we address the problem of privacy-preserving data publishing for collaborative social networks. Our aim is to prevent both attribute and identity disclosure of collaborative social network data against insider attack. For this purpose, we propose an approach that utilises p-sensitive k-anonymity and m-privacy techniques. Experimental outcomes affirm that our approach preserves privacy with a reasonable increase in information loss and maintains an adequate utility of collaborative social network data.
    Keywords: collaborative social network data publishing; attribute disclosure; identity disclosure; insider attack; k-anonymity; m-privacy.
    DOI: 10.1504/IJBIDM.2023.10045007
  • Identification of Authorship and Prevention Fraudulent Transactions / Cybercrime using Efficient High Performance Machine Learning Techniques
    by Sowmya BJ, Hanumantharaju R, Pradeep Kumar D, Srinivasa K. G 
    Abstract: Cognitive computing refers to the usage of computer models to simulate human intelligence and thought process in a complex situation. Artificial intelligence (AI) is an augmentation to the limits of human capacity for a particular domain and works as an absolute reflection of reality; where a computer program is able to efficiently make decisions without previous explicit knowledge and instruction. The concept of cognitive intelligence was introduced. The most interesting use case for this would be an AI bot that doubles as a digital assistant. This is aimed at solving core problems in AI like open domain question answering, context understanding, aspect-based sentiment analysis, text generation, etc. The work presents a model to develop a multi-resolution RNN to identify local and global context, develop contextual embedding via transformers to pass into a seq2seq architecture and add heavy regularisation and augment data with reinforcement learning, and optimise via recursive neural networks.
    Keywords: cognitive computing; artificial intelligence; AI; data augmentation; human intelligence; recurrent neural network; transformer model.
    DOI: 10.1504/IJBIDM.2022.10045310
  • Forecasting With Information Extracted From The Residuals of ARIMA In Financial Time Series Using Continuous Wavelet Transform
    by Heng Yew Lee, Woan Lin Beh, Kong Hoong Lem 
    Abstract: Time series of financial or economic data are often considered to have certain trends and patterns. It is believed that the study of historical patterns helps in forecasting into the future. The ARIMA model is one of the popular models for the task. However, long-term forecasting with ARIMA often appears as a straight line. This is due to ARIMA’s dependency on previous values and its tendency to omit the outliers that lie outside of the captured general trend. This paper sought to capture useful outlier information from the residuals of ARIMA modelling by using the continuous wavelet transform (CWT). The CWT-captured information was then added to the ARIMA forecasted values to form non-homogeneous long-term forecasts. The final results were encouraging. It was also found that the choice of certain CWT-related parameters has a positive or negative effect on the forecasting outcomes.
    Keywords: wavelet; forecasting; autoregressive integrated moving average; ARIMA; time series; continuous wavelet transform; CWT.
    DOI: 10.1504/IJBIDM.2022.10045646
  • DAMIAN -Data Accrual Machine Intelligence with Augmented Networks for Contextually Coherent Creative Story Generation   Order a copy of this article
    by Sowmya BJ, Pradeep Kumar D, Hanumantharaju R, Srinivasa K. G 
    Abstract: Cognitive computing refers to the usage of computer models to simulate human intelligence and thought process in a complex situation. Artificial intelligence (AI) is an augmentation to the limits of human capacity for a particular domain and works as an absolute reflection of reality; where a computer program is able to efficiently make decisions without previous explicit knowledge and instruction. The concept of cognitive intelligence was introduced. The most interesting use case for this would be an AI bot that doubles as a digital assistant. This is aimed at solving core problems in AI like open domain question answering, context understanding, aspect-based sentiment analysis, text generation, etc. The work presents a model to develop a multi-resolution RNN to identify local and global context, develop contextual embedding via transformers to pass into a seq2seq architecture and add heavy regularisation and augment data with reinforcement learning, and optimise via recursive neural networks.
    Keywords: cognitive computing; artificial intelligence; AI; data augmentation; human intelligence; recurrent neural network; transformer model.
    DOI: 10.1504/IJBIDM.2022.10045744
  • EmoRile: A Personalized Emoji Prediction Scheme Based on User Profiling   Order a copy of this article
    by Vandita Grover, Hema Banati 
    Abstract: Emojis are widely used to express emotions and complement text communication. Existing approaches for emoji prediction are generic and generally utilise text or time for emoji prediction. However, research reveals that emoji usage differs among users, so individual users’ preferences for certain emojis need to be captured while predicting emojis for them. In this paper, a novel emoji-usage-based profiling scheme, EmoRile, is proposed. In EmoRile, emoji-usage-based user profiles were created, which was accomplished by compiling a new dataset that also included users’ information. Distinct models with different combinations of text, text sentiment, and users’ preferred emojis were created for emoji prediction. These models were tested on various architectures with a very large emoji label space. Rigorous experimentation showed that even with a large label space, EmoRile predicted emojis with accuracy similar to that of existing emoji prediction approaches with a smaller label space, making it a competitive emoji prediction approach.
    Keywords: emojis in sentiment analysis; emoji prediction; user profile-based emojis.
    DOI: 10.1504/IJBIDM.2023.10045810
  • Brain Hemorrhage Classification from CT Scan Images using Fine-tuned Transfer Learning Deep Features   Order a copy of this article
    by Arpita Ghosh, Badal Soni, Ujwala Baruah 
    Abstract: Classification of brain haemorrhage is a challenging task and needs to be solved to help advance medical treatment. Recently, efficient deep learning architectures have been developed to detect such bleeding accurately. The proposed system includes two different transfer learning strategies to train and fine-tune ImageNet pre-trained state-of-the-art architectures such as VGG 16, Inception V3 and DenseNet121. The evaluation metrics have been calculated based on the performance analysis of the employed networks. Experimental results show that the modified fine-tuned Inception V3 performs well and achieves the highest test accuracy.
    Keywords: transfer learning; VGG 16; Inception V3; DenseNet121; brain haemorrhage; ReLU; binary cross entropy.
    DOI: 10.1504/IJBIDM.2022.10046012
  • A Novel Classification-based Parallel Frequent Pattern Discovery Model for Decision making and Strategic planning in Retailing   Order a copy of this article
    by Rajiv Senapati 
    Abstract: The exponential growth of retail transactions, together with the differing interests of a variety of customers, makes the pattern mining problem non-trivial. Hence this paper proposes a novel model for mining frequent patterns. In the proposed model, frequent pattern discovery is carried out in three phases. In the first phase, the dataset is divided into n partitions based on the time stamp. In the second phase, clustering is performed in each of the partitions in parallel to classify the customers as HIG, MIG, and LIG. In the third phase, the proposed algorithm is applied to each of the classified groups to obtain frequent patterns. Finally, the proposed model is validated using a sample dataset, and experimental results are presented to explain the capability and usefulness of the proposed model and algorithm. Further, the proposed algorithm is compared with the existing algorithm, and it is observed that the proposed algorithm performs better in terms of time complexity.
    Keywords: data mining; frequent pattern; association rule; classification; algorithm; decision making; retailing.
    DOI: 10.1504/IJBIDM.2023.10046447
  • Distributed Computing and Shared Memory based Utility List Buffer Miner with Parallel Frameworks for High Utility Itemset Mining   Order a copy of this article
    by Eduardus Hardika Sandy Atmaja, Kavita Sonawane 
    Abstract: High Utility Itemset Mining (HUIM) is a well-known pattern mining technique. It considers the utility of the items, which leads to finding high-profit patterns that are more useful in real conditions. Handling large and complex datasets is the major challenge in HUIM. The main problem here is the exponential time complexity. The literature shows multicore approaches that solve this problem by parallelizing the tasks, but they are limited to single-machine resources and also need a novel strategy. To address this problem, we proposed new strategies, namely Distributed Computing (DC-PLB) and Shared Memory (SM-PLB) based Utility List Buffer Miner with Parallel Frameworks (PLB). It utilizes cluster nodes to parallelize and distribute the tasks efficiently. Thorough experiments showed that the proposed frameworks achieved better runtime (448s) on dense datasets compared to the existing PLB (2237s). It has effectively addressed the challenges of handling large and complex datasets.
    Keywords: HUIM; PLB; DC-PLB; SM-PLB; cluster computing; parallel and distributed computing; data mining; MPI; Apache Spark.
    DOI: 10.1504/IJBIDM.2023.10046448
  • A Survey on Adoption of Blockchain in Healthcare   Order a copy of this article
    by Shantha Shalini K., M. Nithya 
    Abstract: In this era of technology and automation, blockchain technology is being steadily studied and adopted in different sectors. Blockchain technology, with its chain of blocks, provides security and establishes a trusted environment between individuals. In the past couple of years, blockchain technology has attracted many research scholars and industrialists to study, analyse and apply the technology to their own application needs. The major advantages of blockchain technology are security, preserved user privacy and transparency. The purpose of this paper is to provide a survey of the scope of blockchain in healthcare, where it provides high security for patient health information during sharing, and of its impact in reducing operational and capital investments. This paper also briefly covers the new business opportunities in the health sector arising from integrating blockchain technology.
    Keywords: healthcare; blockchain; patient health records.
    DOI: 10.1504/IJBIDM.2023.10046449
  • An Optimized Soft Computing based Approach for Multimedia Data Mining   Order a copy of this article
    by M. Ravi, M. Ekambaram Naidu, G. Narsimha 
    Abstract: Multimedia mining is a sub-field of data mining which is used to discover interesting information from multimedia databases. Multimedia data fall into two general categories: static media and dynamic media. Static media comprises text and pictures. Dynamic media consists of audio and video. Multimedia mining refers to the investigation of a huge amount of multimedia data in order to extract patterns based on their statistical relationships. Multimedia mining frameworks can find significant data or image patterns in a huge collection of images. In this paper, a hybrid method is proposed which exploits statistical and applied soft computing-based primitives and building blocks, i.e., a novel feature engineering algorithm aided by an efficient modelling procedure based on convolutional neural networks. Optimal parameters are chosen, such as the number of filters, kernel size, strides, input shape and nonlinear activation function. Experiments are performed on standard web multimedia data (here, an image dataset is used as the multimedia data) and achieve multi-class image categorisation and analysis. The obtained results are also compared with other significant existing methods and presented in the form of an intensive comparative analysis.
    Keywords: knowledge discovery; supervised learning; multimedia databases; image data; soft computing; feature engineering.
    DOI: 10.1504/IJBIDM.2023.10046450
  • Variable Item Value based High Utility Itemset Recommendation Using Statistical Approach   Order a copy of this article
    by ABDULLAH BOKIR, V.B. Narasimha 
    Abstract: High utility mining (HUM) has become an absolute requirement for an efficient corporate management procedure. The challenge persists in identifying the top-out or bottom-out conditions in the context of the available HUM solutions, and it is critical for enterprises to manage adequate inventory to have higher yield outcomes. Taking these aspects into consideration, this paper proposes a comprehensive method named "Variable Item Value-based High Utility Itemset Recommendation (VIVHUIR)". Unlike contemporary models, which perform utility mining with a constant utility factor, the proposed model uses a variable utility factor to perform utility mining based on the profitability of an itemset. In addition, the methodology for detecting drift (variability) in the utility factor is fundamentally based on the Average True Range for an itemset and the Relative Strength Index assessment for analysis, which is a unique and novel feature of the proposal. To comprehend the elements influencing profit, the proposed four-layered filtering model depends on quantities, demand, supply, and gain/loss inventory. The experimental research of the model refers to potential solutions that are pragmatic in a real-time situation.
    Keywords: High Utility Mining; Dynamic Utility; Average True Range; Relative Strength Index; Economic Order Quantity; Inventory Storage Cost.
    DOI: 10.1504/IJBIDM.2023.10047036
  • Multi-modal feature fusion for object detection using neighbourhood component analysis and bounding box regression   Order a copy of this article
    by Anamika Dhillon, Gyanendra K. Verma 
    Abstract: Object detection has gained remarkable interest in the research area of computer vision applications. This paper presents an efficient method to detect multiple objects, which contains two parts: 1) a training phase; 2) a testing phase. During the training phase, we first exploit two convolutional neural network models, namely Inception-ResNet-V2 and MobileNet-V2, for feature extraction, and then fuse the features extracted from these two models using a concatenation operation. To acquire a more compact representation of the features, we utilise neighbourhood component analysis (NCA). After that, we classify the multiple objects using an SVM classifier. During the testing phase, to detect various objects in an image, a bounding box regression module is proposed by applying LSTM. We have performed our experiments on two datasets: wild animal camera trap and gun. In particular, our method achieves an accuracy rate of 97.80% and 97.0% on the wild animal camera trap and gun datasets respectively.
    Keywords: deep convolution networks; object detection; neighbourhood component analysis; NCA; support vector machine; SVM; long short-term memory; LSTM.
    DOI: 10.1504/IJBIDM.2022.10047465
  • A widespread Survey on Machine Learning Techniques and User Substantiation Methods for Credit Card Fraud Detection   Order a copy of this article
    Abstract: In this modern scientific digital world, credit card usage has increased enormously. At the same time, credit card misuse has also become significantly more common. It causes financial losses for both cardholders and financial institutions. To avoid this, financial institutions create and deploy credit card fraud detection techniques. In the future, most people will make most of their transactions online to save time. We divide this survey into two main parts. In the first part, we focus on classical machine learning models. The second part covers user authentication, including what the user knows (the knowledge-based method). We focus more on the development process of user authentication and on behavioural biometrics, which identify an individual's unique behaviour while using their electronic devices. The overview of current methods in this literature survey aims to develop a more accurate, reliable, scalable, fast, efficient and inexpensive model of credit card fraud detection.
    Keywords: credit card transaction; machine learning; bio-metrics; XGBoost; SVM; random forest.
    DOI: 10.1504/IJBIDM.2023.10047750
  • Identifying influential nodes in large scale social networks using Global and local structural information   Order a copy of this article
    by Noosheen Shareefi, Mehdi Bateni 
    Abstract: Given the importance of identifying influential nodes in different applications, many methods have been proposed for it. Some of them are not accurate enough or have high time complexity. In this paper, a method named new GLS (NGLS) is developed based on the global and local search (GLS) algorithm. GLS, despite its high accuracy compared to other methods, is not fast and efficient enough. NGLS is developed to improve the efficiency and scalability of GLS. To reach this goal, the number of common neighbours of each node is counted only up to a radius of two. The execution time of NGLS has been reduced on average by 85% on real-world networks and 97% on simulated networks, while the accuracy of NGLS is the same as that of GLS. Therefore, NGLS is applicable to larger real-world networks.
    Keywords: influential nodes; global and local information; large networks; centrality measure; neighbour contribution; complex network; propagation; propagation models; complexity; social network analysis.
    DOI: 10.1504/IJBIDM.2023.10047751
    by Nafees Muneera, P. Sriramya 
    Abstract: Recently, the amount of text data has increased rapidly, so information must be summarised to retrieve useful knowledge. First, the preprocessing module utilises the fixed-length stemming method, and then the segmentation module makes use of a pre-trained bidirectional encoder representations from transformers (BERT) model. The input text is segmented using feedforward and multi-head attention layers. This BERT segmentation paradigm is combined with the shark smell optimisation (SSO) methodology, and the extracted phrases are used to prepare the document stage of an Amazon product review dataset. This study aims to create a concise summary and engaging headlines that grab the attention of readers. It does so by combining both extractive and abstractive procedures in a pipelined technique to create a succinct summary that is later used for headline creation. Experimentation was carried out on the publicly accessible CNN/Daily Mail dataset.
    Keywords: abstractive; text summarisation; optimisation; transformer; clustering; similarity index.
    DOI: 10.1504/IJBIDM.2023.10047979
  • State-of-Art approaches for Event Detection over Twitter Stream: a Survey   Order a copy of this article
    by Jagrati Singh, Anil Kumar Singh 
    Abstract: At present, social network applications like Twitter, Facebook and YouTube have evolved into a popular way of sharing information for general users. On these platforms, valuable information appears as breaking news, trending topics, public opinion, and so on. Twitter is the most popular microblogging service, generating huge volumes of data with high velocity and variety (i.e., images, text and video). Due to the growth in real-world events discussed on Twitter, the event detection problem is becoming an interesting and challenging issue. Event detection is the practice of applying natural language processing and text analysis techniques to identify and extract event information from text. This survey paper explores important research works on event detection using Twitter data. We classify approaches according to feature modelling methods: vector space model, statistical model and graph model. We highlight research challenges, issues, and the limitations of existing approaches to find the research gaps for future directions.
    Keywords: Twitter stream; clustering; data sharing; supervised technique; unsupervised technique; semantic correlation; keyword co-occurrence; topic modelling.
    DOI: 10.1504/IJBIDM.2023.10048271
  • Data Quality Based View Selection in Big Data Integration System   Order a copy of this article
    by Samir Anter 
    Abstract: An integration system is an intermediate tool between a user and a set of distributed sources. It provides transparent access to information through an interface using a unique query language. This gives the end user the illusion of accessing a homogeneous central repository. In a hybrid system, one part of the data is queried on demand, whereas another part is extracted, filtered and stored in a local database. This approach is very promising for data access in a big data context. However, obtaining satisfactory results depends on the correct choice of data to materialise. Further, this task is even more difficult in a big data context. In this article, a novel approach is proposed to overcome the above problem, which uses data quality to select the views that will be materialised.
    Keywords: data integration; materialised views; big data; data quality; view selection.
    DOI: 10.1504/IJBIDM.2023.10048381
  • Evaluation of Factors Involved in Predicting Indian Stock Price Using Machine Learning Algorithms   Order a copy of this article
    by ARCHIT A. VOHRA, Paresh J. Tanna 
    Abstract: This study evaluates the effect of training dataset size, dimensionality and rolling dataset on the prediction accuracy of decision tree regression (DTR), support vector regression (SVR), long short-term memory (LSTM) and neural network multi-layer perceptron (NNMLP). Data of ten stocks from different sectors of the National Stock Exchange Fifty (NIFTY 50) were considered. The execution time for each model is calculated to find the fastest algorithm. Finally, the correlation between prediction accuracy and performance measures is established. The results clearly show that increasing the training dataset size does not always increase the prediction accuracy. The characteristics of the dataset are one major factor responsible for prediction accuracy. DTR and SVR have very low average execution time compared to LSTM and NNMLP. A very strong negative correlation was found between mean absolute percentage error (MAPE) and prediction accuracy.
    Keywords: prediction accuracy; training dataset size; rolling dataset; performance measures; regression; neural network; execution time; stock price.
    DOI: 10.1504/IJBIDM.2023.10048648
  • Text Document Learning using Distributed Incremental Clustering Algorithm: Educational Certificates   Order a copy of this article
    by Archana Chaudhari, Preeti Mulay, Ayushi Agarwal, Krithika Iyer, Saloni Sarbhai 
    Abstract: Technological advancements now allow each of us to learn new skills at home or through various workshops, and one of the ways to recognise a skill is by awarding certificates. Digital and handwritten certificate datasets are usually in the form of images. We can use this information to analyse which subjects have recently gained popularity and how to improve the field of study at different universities. Therefore, this paper proposes the distributed incremental clustering with closeness factor-based algorithm (DIC2FBA) for text clustering. The work primarily focuses on a faculty development program certificates dataset that covers both text and numeric data. The proposed system uses an AWS EC2 instance and an AWS S3 bucket, which help to cluster data from multiple sites in iterative and incremental mode. Further, we have compared the findings achieved using DIC2FBA with the K-means modified inter and intra clustering (KM-I2C) algorithm based on the silhouette score and the Davies-Bouldin index. The proposed system will help educational institutions understand the popular skill sets of faculty, which can further be used to understand the effectiveness of such programs.
    Keywords: distributed incremental clustering; text document learning; educational certificates; faculty development program; FDP; AWS.
    DOI: 10.1504/IJBIDM.2024.10049120
  • Machine Learning approach for Data Analysis and Predicting Coronavirus Using COVID -19 India Dataset   Order a copy of this article
    by Soni Singh, Dr.K.R.Ramkumar Kumar, Ashima Kukkar 
    Abstract: According to the World Health Organisation (WHO), the COVID-19 virus infected 83,558,756 persons worldwide in 2020, resulting in 646,949 deaths. In this research, we aim to find the link between the time series data and current circumstances to predict future outbreaks and to determine which modelling technique gives the most accurate predictions. The performance of different machine learning (ML) models, namely the sigmoid function, the Facebook (FB) Prophet model, the seasonal auto-regressive integrated moving average with eXogenous factors (SARIMAX) model, the support vector machine (SVM) learning model, the linear regression (LR) model, and the polynomial regression (PR) model, is analysed along with their error rates. A comparison is also made to identify the best-suited model for prediction based on different categorisation approaches on the WHO-authenticated dataset of India. The results state that the PR model shows the best performance with time-series data of COVID-19, whereas the sigmoid model has consistently the smallest prediction error rates for tracking the dynamics of incidents. In contrast, the PR model provided the most realistic prediction to identify a plateau point in the incidents’ growth curve.
    Keywords: COVID-19; pandemics; analysis on India; machine learning; prediction; comparison; support vector machine; SVM.
    DOI: 10.1504/IJBIDM.2024.10049479
  • Prediction of Stock Prices of Blue-Chip Companies using Machine Learning Algorithms   Order a copy of this article
    by Rajvir Kaur, Anurag Sharma 
    Abstract: Accurate stock market prediction is a very challenging task for experts due to the market's volatile nature. To determine the future value of stocks, several studies rely on historical data. Nowadays, however, external factors such as social media and news headlines greatly affect the stock market. This research work predicts future stock prices by using both Twitter social media and news data along with historical data to obtain high prediction accuracy. The performance of the machine learning algorithms logistic regression, SVM and random forest is analysed using metrics such as accuracy, precision, recall, and F1 score. To train and test, the final dataset is divided in an 80:20 ratio. For each blue-chip company, the testing dataset contains 248 samples; the highest prediction accuracies, ranging from 85% to 89%, were achieved using the logistic regression algorithm.
    Keywords: blue-chip companies; machine learning; news headlines; social media; stock market prediction; Twitter.
    DOI: 10.1504/IJBIDM.2023.10049725
    by Syavasya CVSR, A. Lakshmi Muddana 
    Abstract: The accuracy of data mining (DM) outcomes might be affected by mining and analysing incomplete datasets with missing values (MV). Thus, a complete dataset is created by the imputation of MV, which makes the analysis easier. To combat this issue, an effective missing values imputation (MVI) method is proposed and evaluated utilising Gaussian kernel-K harmonic means (GK-KH Means) and hyperbolic tangent radial-recurrent neural networks (HTR-RNN). At first, preprocessing is performed on the input data from the CKD dataset, wherein duplicate data are eliminated. Next, the missing data are handled by ignoring them, and the MV are imputed utilising GK-KH Means. Next, the data are rationalised into a structured format. Then, SDRM-DHO selects the most optimal features from the extracted features. Lastly, the HTR-RNN classifier accepts these chosen features as input. The proposed work achieved more accurate missing value imputation.
    Keywords: missing value imputation; K harmonic means; Gaussian kernel function; recurrent neural network; swap displacement reversion operation.
    DOI: 10.1504/IJBIDM.2023.10049909
  • Detection of spammers disseminating obscene content on Twitter   Order a copy of this article
    by Deepali Dhaka, Surbhi Kakar, Monica Mehrotra 
    Abstract: Spammers distributing adult content are becoming an apparent and intrusive problem with the increasing prevalence of online social networks among users. To improve user experience, and especially to prevent exposure of users in lower age groups, these accounts need to be detected efficiently. In this work, a model is proposed in which a lexicon-based approach is used to label users with their values. This study is based on the fact that users behave according to the values they possess. The combination of content-based features like values, the entropy of words, lexical diversity, and context-based word embeddings is found to be robust. Among several machine learning models, XGBoost performs exceedingly well, with an accuracy of 92.28 ± 1.28% for all features. Feature importance and discriminative power are also shown. A comparative study is also done with one of the latest approaches, and our approach is found to be more efficient.
    Keywords: values; emotions; Twitter; online social network; spammer; pornographic spammer.
    DOI: 10.1504/IJBIDM.2022.10040432
  • Suspicious tweet identification using machine learning approaches for improving social media marketing analysis   Order a copy of this article
    by Senthil Arasu Balasubramanian, Jonath Backia Seelan, Thamaraiselvan Natarajan 
    Abstract: Social media acts as one of the eminent platforms for communication. Twitter is one of the leading social media microblogging platforms, where users can post and interact. #Hashtags specify the Twitter trends on a certain topic. Currently, the hashtag value or trend ranking for a particular hashtag is calculated based on the cumulative number of tweets. This cumulative hashtag ranking may allow an anonymous intervention of irrelevant tweets, which affects social media marketing. The proposed approach uses the relevance of tweets and #hashtags to identify suspicious or irrelevant tweets and so improve social media marketing. The proposed research work uses the linear regression algorithm, which is one of the familiar machine learning approaches, to explain spam tweet generation and the method to identify it. The test results found that the proposed system has 84% of significance when compared to the market analysis algorithms.
    Keywords: tweets; hashtags; trend prediction; linear regression; social media marketing.
    DOI: 10.1504/IJBIDM.2022.10040478
  • Factors influencing the moving up the value chain by Indian IT service organisations   Order a copy of this article
    by B. Mahendramohan, G. Kannabiran, P. Sridevi 
    Abstract: Indian information technology (IT) service organisations that were providing low value-added services are moving up the value chain of IT services to overcome threats of competition and automation. The purpose of this study is to evaluate the impact of Indian IT service organisations' capabilities on moving up the value chain. Using a resource-based view perspective, this research examines the influence of the service provider's capabilities on moving up the value chain. The research was conducted by collecting responses from 188 employees of Indian IT service organisations. The data were analysed using structural equation modelling. The study shows that the service provider's capabilities, namely, relationship management capability, project management capability, domain understanding and IT advancement positively impact service quality and innovativeness. The service provider's service quality and innovativeness, and the absorptive capacity of the client enhance the effectiveness in moving up the value chain.
    Keywords: moving-up value chain; innovativeness; service quality; project management; relationship management; domain understanding; information technology advancement; absorptive capacity.
    DOI: 10.1504/IJBIDM.2022.10048760
  • Leveraging the fog-based machine learning model for ECG-based coronary disease prediction   Order a copy of this article
    by R. Hanumantharaju, K.N. Shreenath, B.J. Sowmya, K.G. Srinivasa 
    Abstract: Smart healthcare systems need a remote monitoring system based on the internet of things (IoT). Smart healthcare services are an innovative way of synergising the benefits of sensors for large-scale analytics to deliver better patient care. The work provides healthcare services to the sick as well as the healthy population through remote observation, using detailed calculations, tools and methods for better care. The proposed system integrates an architecture based on IoT, fog computing and machine learning (ML) algorithms. The data collected about heart diseases are loaded and filtered, and attributes are extracted at the fog layer; the classification model is built at the fog nodes. The result of the model is sent to the cloud layer to train classifiers. The cloud layer estimates the level of ML algorithms to predict disease. Results show that random forest has better feature extraction than naive Bayes with flawlessness of 3% in precision, 3% in recall, and 13% in f-measure.
    Keywords: internet of things; IoT; machine learning; random forest; naive Bayes; fog layer; remote monitoring; feature extraction.
    DOI: 10.1504/IJBIDM.2022.10041200
  • An optimal dimension reduction strategy and experimental evaluation for Parkinson's disease classification   Order a copy of this article
    by D. Saidulu, R. Sasikala 
    Abstract: The amount of data streamed and generated through various healthcare systems is increasing exponentially day by day. Applying traditional data mining algorithms to such massive data to construct automated decision support systems is a tedious and time-consuming task. In recent years, there has been increasing interest in the development of telediagnosis and telemonitoring systems for Parkinson's disease (PD). Parkinson's disease is a progressive neurodegenerative disease which affects movement characteristics. PD patients commonly face vocal impairments during the early stages of the disease. This work proposes a computationally efficient method for dimension reduction and classification of healthcare-related data. The devised framework is capable of dealing with data having discrete as well as continuous features. The experimental evaluation is performed on the Parkinson's disease classification database (Sakar et al., 2018). The statistical performance metrics used are validation and test accuracy, precision, recall, F1-score, etc. There will be computational complexity advantages when this reduced-dimension data is further processed for modelling and building a prediction system. In order to prove the optimality of the proposed framework, a comparative analysis is performed with the significant existing approaches.
    Keywords: big data; learning; dimension reduction; machine learning; knowledge discovery; information retrieval.
    DOI: 10.1504/IJBIDM.2022.10040204
  • A review of scalable time series pattern recognition   Order a copy of this article
    by Kwan-Hua Sim, Kwan-Yong Sim, Valliappan Raman 
    Abstract: Time series data mining helps derive new, meaningful and hidden knowledge from time series data. Thus, time series pattern recognition has been the core functionality in time series data mining applications. However, mining of unknown scalable time series patterns with variable lengths is by no means trivial. It could result in quadratic computational complexities to the search space, which is computationally untenable even with the state-of-the-art time series pattern mining algorithms. The mining of scalable unknown time series patterns also requires the superiority of the similarity measure, which is clearly beyond the comprehension of standard distance measure in time series. It has been a deadlock in the pursuit of a robust similarity measure, while trying to contain the complexity of the time series pattern search algorithm. This paper aims to provide a review of the existing literature in time series pattern recognition by highlighting the challenges and gaps in scalable time series pattern mining.
    Keywords: time series pattern recognition; scalable time series pattern matching; motif discovery; time series data mining; distance measure; dimension reduction; sliding window search.
    DOI: 10.1504/IJBIDM.2022.10041672