International Journal of Business Intelligence and Data Mining (48 papers in press)
Regular Issues
Analysis and Prediction of Heart Disease Aid of Various Data Mining Techniques: A Survey  by V. Poornima, D. Gladis Abstract: In recent times, diseases have been spreading gradually, partly because of inherited factors. Heart disease in particular has become more common, putting individuals' lives at risk. Data mining strategies, specifically decision tree, Naive Bayes, neural network, K-means clustering, association classification, support vector machine (SVM), fuzzy, rough set theory and orthogonal local preserving methodologies, are examined on heart disease databases. In this paper, we survey papers in which one or more data mining algorithms are used to predict heart disease. The survey covers the current procedures used in risk prediction of heart disease for classification in data mining. It finds that, among the data mining strategies used for heart disease risk prediction, a hybrid approach gives a better prediction model than a single-model approach. Keywords: Data mining; Heart Disease Prediction; performance measure; Fuzzy; and clustering. DOI: 10.1504/IJBIDM.2018.10014620
A predictive model of electricity quality indicator in distribution subsidiaries  by Ana Flávia L. Gonçalves, Rafael Frinhani, Bruno G. Batista, Rafael P. Pagan, Edvard M. De Oliveira, Bruno T. Kuehne, João Paulo R. R. Leite, João Víctor De M. S. Gomes Abstract: Electricity concessionaires pay out large sums annually in compensation to consumers who experience service unavailability. Availability of the energy supply is a major challenge because the distribution infrastructure is constantly affected by climatic, environmental, and social causes. To assist decision making in mitigating grid failures, this study aims to predict the number of incidences of electricity shortage for consumers. A predictive model was developed using predictive data analysis and conforms to a knowledge discovery process. A hybrid classifier was developed from the model, using both unsupervised and supervised methods. The experiments were carried out with real incidence and climatic data from four subsidiaries of an energy concessionaire. The results show the forecasting model's feasibility, with classification accuracy between 58.33% and 91.66%. They also show that peculiarities in terms of geographic location, energy demand, and climatic conditions make it difficult to use a generic prediction model. Keywords: electric quality indicator; predictive data analysis; machine learning; unsupervised methods; supervised methods; knowledge discovery in data. DOI: 10.1504/IJBIDM.2022.10041550
Augmenting Keyword-based Patent Prior Art Search Using Weighted Classification Code Hierarchies  by Alok Khode, Sagar Jambhorkar Abstract: Patents are critical intellectual assets for any business. With the rapid increase in patent filings, patent prior art retrieval has become an important task. The goal of prior art retrieval is to find documents relevant to a patent application. Due to the special nature of patent documents, relying only on keyword-based queries does not prove effective in patent retrieval. Previous work has used international patent classification (IPC) to improve the effectiveness of keyword-based search. However, these systems have used a two-stage retrieval process, using IPC mostly to filter patent documents or to re-rank the documents retrieved by a keyword-based query. In the approach proposed in this paper, weighted IPC code hierarchies have been explored to augment keyword-based search, thereby eliminating the use of an additional processing step. Experiments on the CLEF-IP 2011 benchmark dataset show that the proposed approach outperforms the baseline on MAP, Recall and PRES. Keywords: patent retrieval; prior art search; international patent classification; IPC; query formulation; query expansion; information retrieval; IPC hierarchy; weighted IPC. DOI: 10.1504/IJBIDM.2022.10041582
Optimizing the number of course sections given optimal course sequence to support student retention  by Akash Gupta, Amir Gharehgozli, Seung-Kuk Paik Abstract: Although higher education institutions strive to create environments that foster student retention, many students depart before graduation. Therefore, it is paramount to understand the important factors that drive student retention. We observed that student retention is tied to the student grade point average (GPA) and that, in turn, the GPA is correlated with the order in which students enroll in courses. In this study, initially using statistical methods, we determine the best order of taking core courses. Then, we develop a prescriptive model using mixed-integer linear programming. This model determines the optimal number of sections to be offered for each course so that the maximum number of students can follow the optimal course order in a resource-constrained environment. We also propose heuristic subroutines to solve the proposed model and determine the optimal number of sections for each course. In addition, we highlight the social and demographic factors that influence student retention. This study helps college administrations to plan courses so that student retention can be improved. Keywords: education; student success; data analytics; retention; course sequence. DOI: 10.1504/IJBIDM.2022.10042240
Exploring Appropriate ERP Framework towards Indian Small and Medium Enterprises using Decision Tree  by Aveek Basu, Sraboni Dutta, Sanchita Ghosh Abstract: Small and medium enterprises (SMEs) enhance the outcome of their various business processes by implementing an ERP framework. However, they are in a muddle when selecting the appropriate ERP, as an on-premise solution entails a large upfront capital expense, which ultimately calls into question the survival of these small firms, especially in this pandemic situation. Cloud-based ERP systems can reduce the risk to a certain level due to their low infrastructure cost and flexible payment options, but they have their own constraints. Thus, selection of the appropriate ERP is always a challenge, which motivates the current researchers to explore a decision tree-based technique to predict the most suitable framework to be adopted by an SME in a specific situation. The inferences drawn from the decision tree clearly show the efficacy of the implemented technique, as the right decision can be derived easily by traversing the tree. Keywords: enterprise resource planning; ERP; Cloud ERP; on premise ERP; hybrid ERP; small and medium enterprise; SME; decision tree. DOI: 10.1504/IJBIDM.2022.10042760
A cluster and label approach for classifying imbalanced data streams in the presence of scarcely labeled data  by Kiran Bhowmick, Meera Narvekar Abstract: Classifying imbalanced data streams is often a challenging task, primarily due to the continuous flow of infinite data and the unavailability of class labels. The problem is two-fold when the stream is imbalanced in nature. Due to the characteristics of data streams, it is impossible to store and process the data and deal with imbalance. There is a need to provide a solution that can consider the unavailability of class labels and classify the imbalanced data streams. This paper proposes a semi-supervised learning (SSL)-based model to classify scarcely labelled imbalanced data streams. A modified cluster and label SSL approach that uses expectation maximisation for clustering and similarity-based label propagation for labelling the unlabelled clusters is proposed. The model also employs a novel imbalance-sensitive cluster merge technique to deal with the imbalanced data. The results prove that the model outperforms standard stream classification algorithms. Keywords: data streams; classification; imbalanced data; semi-supervised learning; scarcely labelled; cluster and label; micro cluster; label propagation. DOI: 10.1504/IJBIDM.2022.10042780
Application of a Record Linkage Software to Identify Mortality of Enrollees of Large Integrated Health Care Organization  by Yichen Zhou, Zhi Liang, Sungching Glenn, Wansu Chen, Fagen Xie Abstract: Information on mortality is important for the improvement of public health and the conduct of medical research. Health care organisations typically lack complete and accurate information on mortality. This paper proposes a comprehensive process to link the records of the enrollees of a health care organisation with the 2015 death records obtained from the State of California, using commercial data linkage software. The developed linkage process successfully identified 23,628 and 21,009 death records of health plan enrollees from the State file after the initial and second post-linkage, respectively. Validation of the linkage process against the death records documented in the internal systems of the organisation achieved a sensitivity of 97.5% and a positive predictive value of 88.7% at the time of initial linkage, but increased to 99.4% in three years using more information available later. The linkage process demonstrated high accuracy and can be utilised to support various business needs. Keywords: data cleaning; data standardisation; data matching; mortality linkage. DOI: 10.1504/IJBIDM.2022.10042864
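The two validation measures quoted in this abstract follow the standard definitions: sensitivity is the share of internally documented deaths that the linkage also found, and positive predictive value (PPV) is the share of linked deaths that the internal records confirm. A minimal sketch follows; the function names and the counts are hypothetical and chosen only for illustration, since the abstract does not report the underlying counts.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of reference-positive records that the linkage identified."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Share of linkage-positive records that the reference confirms."""
    return true_pos / (true_pos + false_pos)


# Hypothetical counts, NOT taken from the paper.
tp, fn, fp = 900, 100, 300
sens = sensitivity(tp, fn)               # 900 / 1000
ppv = positive_predictive_value(tp, fp)  # 900 / 1200
print(f"sensitivity={sens:.3f}, PPV={ppv:.3f}")
```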
Exploring Outliers in Global Economic Dataset having the Impact of Covid-19 Pandemic  by Anindita Desarkar, Ajanta Das, Chitrita Chaudhuri Abstract: An outlier is a value that lies outside most of the other values in a dataset. Outlier exploration is of great importance in almost all industry applications, such as medical diagnosis, credit card fraud and intrusion detection systems. Similarly, in the economic domain, it can be applied to analyse many unexpected events to harvest new knowledge, such as a sudden crash of the stock market, a mismatch between a country's per capita income and overall development, an abrupt change in the unemployment rate and a steep fall in bank interest. These situations can arise for several reasons, of which the present covid-19 pandemic is a leading one. This motivates the present researchers to identify a few such vulnerable areas in the economic sphere and ferret out the most affected countries for each of them. Two well-known techniques, DBSCAN and Z-score, are utilised to get these insights, which can serve as a guideline towards improving the overall scenario subsequently. Keywords: economic outlier; machine learning; gross domestic product; GDP; per capita; human development index; HDI; covid-19 pandemic; total death percentage. DOI: 10.1504/IJBIDM.2022.10043040
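Of the two techniques named in this abstract, the Z-score rule is the simpler to illustrate: a value is flagged when it lies more than a chosen number of standard deviations from the mean. The sketch below is a generic version of that rule, not the authors' implementation; the function name, the threshold and the sample values are assumptions made for illustration.

```python
from statistics import mean, pstdev


def zscore_outliers(values, threshold=3.0):
    """Return (index, value) pairs whose absolute z-score exceeds threshold."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    if sigma == 0:
        return []  # all values identical, so nothing can be an outlier
    return [(i, v) for i, v in enumerate(values)
            if abs(v - mu) / sigma > threshold]


# Hypothetical growth figures (percent); the last one is an abrupt drop.
growth = [2.1, 1.9, 2.0, 2.2, 1.8, 2.0, 2.1, 1.9, 2.0, -9.5]
# A lower threshold is used here because the sample is very small.
flagged = zscore_outliers(growth, threshold=2.5)
print(flagged)
```

The threshold is a tuning choice: lowering it flags more values, and very small samples limit how large a z-score can get.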
Knowledge Discovery in Databases: An Application to Market Segmentation in Retail Supermarkets  by Kellen Endler, Cassius Tadeu Scarpin, Maria Teresinha Arns Steiner, Tamires Almeida Sfeir, Claudimar Pereira Da Veiga Abstract: The purpose of this article is to present a methodology based on the extraction process of knowledge discovery in databases (KDD) to predict the expenditure of different customer profiles, considering their characteristics and the type of store they would buy from, in one of the largest retail chains in the Brazilian supermarket and hypermarket segment. These stores have different characteristics, such as physical size, product assortment and customer profile. This heterogeneity in terms of commercial offers implies a desire for consumption by customers that differs from store to store, depending on how their preferences are met. The proposed methodology was applied to a real marketing case based in a business-to-consumer (B2C) environment to aid retailers during the segmentation process. The results show that it is possible to highlight relationships between the data that enable the prediction of customers' consumption, which can contribute towards generating useful information for retail businesses. Keywords: knowledge discovery in databases; KDD; data mining; market segmentation; retail; principal component analysis; PCA; cluster analysis; multiple linear regression. DOI: 10.1504/IJBIDM.2022.10043148
Performance Evaluation of Oversampling Algorithm: MAHAKIL using Ensemble Classifiers  by C. Arun, C. Lakshmi Abstract: Class imbalance is a known problem in real-world applications: a disparity in the sample counts of different classes, which results in biased performance. The class imbalance issue has been addressed by many sampling techniques, which fall into either oversampling, which solves the issue to a greater extent, or undersampling. MAHAKIL is a diversity-based oversampling approach influenced by the theory of inheritance, in which minority samples are synthesised to balance the classes using the Mahalanobis distance measure. In this study, the performance of the MAHAKIL algorithm has been tested using various ensemble classifiers, which have proved effective due to their multi-hypothesis learning approach and better performance. The results of the experiment conducted on 20 imbalanced software defect prediction datasets using six different ensemble approaches show that XGBoost provides better performance and a reduced false alarm rate compared to other models. Keywords: class imbalance; software fault prediction; synthetic samples; over sampling techniques; MAHAKIL; false alarm rate; evolutionary algorithm; ensemble; inheritance. DOI: 10.1504/IJBIDM.2022.10043149
Machine learning based forecasting of significant daily returns in foreign exchange markets  by Firuz Kamalov, Ikhlaas Gurrib Abstract: Financial forecasting has always attracted an enormous amount of interest among researchers in quantitative analysis. The advent of modern machine learning models has introduced new tools to tackle this classical problem. In this paper, we apply machine learning algorithms to a hitherto unexplored question of forecasting instances of significant fluctuations in currency exchange rates. We carry out an extensive comparative study of ten modern machine learning methods. In our experiments, we use data on four major currency pairs over a 20-year period. A key contribution is the novel use of outlier detection methods for this purpose. Numerical experiments show that outlier detection methods substantially outperform traditional machine learning and finance techniques. In addition, we show that a recently proposed new outlier detection method PKDE produces the best overall results. Our findings hold across different currency pairs, significance levels, and time horizons indicating the robustness of the proposed method. Keywords: foreign exchange; forecasting; machine learning; outlier detection; kernel density estimation; KDE; neural networks; tail events. DOI: 10.1504/IJBIDM.2022.10043208
Using unstructured logs generated in complex large scale micro-service-based architecture for data analysis  by Anukampa Behera, Sitesh Behera, Chhabi Rani Panigrahi, Tien-Hsiung Weng Abstract: With deployments of complex large-scale micro-service architectures, the data generated from all those systems makes a typical production infrastructure huge, complicated and difficult to manage. In this scenario, logs play a major role and can be considered an important source of information in a large-scale secured environment. To date, many researchers have contributed various methods for converting unstructured logs to structured ones. However, after conversion the dimension of the generated dataset increases manyfold, making it too complex for data analysis. In this paper, we discuss techniques and methods for extracting all features from a produced structured log and reducing N-dimensional features to fixed dimensions, without compromising the quality of data, in a cost-efficient manner that can be used for any further machine learning-based analysis. Keywords: json data; micro services; data parsing; principal component analysis; PCA; multivariate data; unstructured data; tagged data; feature reduction. DOI: 10.1504/IJBIDM.2022.10043252
Approaches to Parallelize Eclat algorithm and Analyzing its Performance for K Length Prefix based Equivalence Classes  by C.G. Anupama, C. Lakshmi Abstract: Frequent item set mining (FIM) is one of the prevalent, well-known methods of data mining and a topic of interest for researchers in the field of decision making. With the arrival of the era of big data, where data is continuously generated from multidimensional sources in enormous volume and variety and in an almost unrevealed way, transforming this data into valuable knowledge that helps organisations make efficient decisions poses a challenge for present research. This leads to the problem of discovering the maximum frequent patterns in vast datasets and creating a more generalised and interpretable representation of veracity. Targeting the problems stated above, this paper suggests a parallelisation method suitable for any type of parallel environment. The implemented algorithm can be run on a single computer with a multi-core processor as well as on a cluster of such machines. Keywords: item set mining; frequent items; frequent patterns; Eclat; parallel Eclat; frequent item set mining; FIM. DOI: 10.1504/IJBIDM.2022.10043400
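For readers unfamiliar with Eclat, the sketch below shows the basic sequential idea that parallel variants build on: items are stored in a vertical layout (item to set of transaction IDs), and item sets sharing a prefix form an equivalence class that is extended by intersecting those ID sets. This is a generic textbook-style version, not the parallelisation method the paper proposes; the function name and the sample transactions are assumptions made for illustration.

```python
def eclat(transactions, min_support):
    """Return {itemset tuple: support count} for all frequent item sets."""
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def extend(prefix, candidates):
        # 'candidates' all share 'prefix', i.e. one prefix-based class.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            frequent[itemset] = len(tids)
            suffix = []
            for other, other_tids in candidates[i + 1:]:
                shared = tids & other_tids  # support via tidset intersection
                if len(shared) >= min_support:
                    suffix.append((other, shared))
            if suffix:
                extend(itemset, suffix)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items()
         if len(tids) >= min_support),
        key=lambda pair: pair[0],
    )
    extend((), start)
    return frequent


# Hypothetical market-basket data.
baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = eclat(baskets, min_support=3)
print(result)
```

Because each prefix class is extended using only its own ID sets, the classes can in principle be handed to separate workers, which is the property that parallel Eclat variants exploit.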
Mining Models for Predicting Product Quality Properties of Petroleum Products  by NG`AMBILANI ZULU, Douglas Kunda Abstract: There is a huge generation of raw data during production processes of refinery products and in most cases this data remains under-utilised for knowledge acquisition and decision making. The purpose of this study was to demonstrate how data mining techniques can be used to develop models to predict product quality properties for petroleum products. This study used petroleum refinery production raw data to build predicting models for product quality control activities. The plant and laboratory data for the period of about 18 months was mined from the refinery repositories in order to build the datasets required for analysis using Orange3 data mining software. Four data mining algorithms were chosen for experiments in order to determine the best predicting model using cross-validation technique as a validation method. This study only employed two measuring metrics, classification accuracy (CA) and root mean square error (RMSE) as performance indicators. Random forest came out as the best performing model suitable for predicting both categorical (CA) and numeric data (RMSE). The study was also able to establish the relationship between the variables that could be used in critical operational decisions. Keywords: data mining; machine learning; industries; petroleum refinery; product quality; parameter optimisation. DOI: 10.1504/IJBIDM.2023.10043436
Fraud Detection with Machine Learning - Model Comparison  by João Carlos Pacheco Junior, João Luiz Chela, Guilherme Ferreira Pelucio Salome Abstract: This work evaluates the performance of different models for predicting three types of fraudulent behaviour in a novel dataset with imbalanced data. The logistic regression model, a staple in the credit risk industry, is compared to several machine learning models. This work shows that in the binary classification case, all compared models achieved results similar to those of logistic regression. The random forest model showed superior performance when classifying credit frauds ending in lawsuits. In the multi-label classification case, logistic regression attains high levels of precision for all types of fraud, but at lower recall rates. In contrast, the random forest model achieves higher recall rates, but with lower precision rates. Keywords: fraud detection; machine learning; imbalanced data; multi-label classification. DOI: 10.1504/IJBIDM.2023.10044239
Context-aware Automated Quality Assessment of Textual Data  by Goutam Mylavarapu, K. Ashwin Viswanathan, Johnson P. Thomas Abstract: Data analysis is a crucial process in the field of data science that extracts useful information from any form of data. With the rapid growth of technology, more and more unstructured data, such as text and images, are being produced in large amounts. Apart from the analytical techniques used, the quality of the data plays a prominent role in accurate analysis. Data quality deteriorates because of poor maintenance and mediocre data generation strategies employed by amateur users. This problem escalates with the advent of big data. In this paper, we propose a quality assessment model for the textual form of unstructured data (TDQA). The context of data plays an important role in determining the quality of the data. Therefore, we automate the process of context extraction in textual data using natural language processing to identify data errors and assess quality. Keywords: automated data quality assessment; textual data; context-aware; data context; sentiment analysis; lexicon; Doc2Vec; data accuracy; data consistency. DOI: 10.1504/IJBIDM.2023.10044353
A deep regression convolutional neural network using whole image-based inferencing for dynamic visual crowd estimation  by Shen Khang Teoh, Vooi Voon Yap, Humaira Nisar Abstract: As intelligent surveillance system applications become ubiquitous, automated crowd counting solutions must be made continually faster and more accurate. This paper presents an improved convolutional neural network (CNN) architecture for accurate visual crowd counting in crowd images. The multi-column convolutional neural network (MCNN) has been widely used in previous works to predict the density map for visual crowd counting. However, this method has limitations in predicting a quality density map. Instead, the proposed model is architected using powerful CNN layers, dense layers, and one regressor node with whole image-based inference. Therefore, it is less computationally intensive and inference speed can be increased. Tested on the mall dataset, the proposed model achieved 2.01 mean absolute error and 8.53 mean square error. Moreover, benchmarking on different CNN architectures has been conducted. The proposed model shows promising counting accuracy and reasonable inference speed against the existing state-of-the-art approaches. Keywords: visual crowd counting; convolutional neural network; CNN; whole image-based inference; edge embedded platform; multi-column convolutional neural network; MCNN. DOI: 10.1504/IJBIDM.2022.10044713
Factors That Drive the Selection of Business Intelligence Tools in South African Financial Services Providers  by Bonginkosi P. Gina, Adheesh Budree Abstract: Innovation and technology advancements in information systems (IS) have resulted in a multitude of product offerings and business intelligence (BI) software tools in the market to implement business intelligence systems (BIS). As a result, a high proportion of organisations fail to employ suitable software tools that meet organisational needs. The study aimed to discover critical factors influencing the selection of BI tools. This was a quantitative study, and questionnaire survey data was collected from 92 participants. The data was analysed using the SPSS and SmartPLS-3 software packages to test the significance of influential factors. The findings showed that software tool technical factors, vendor technical factors, and opinion non-technical factors are significant. The study contributes to both academia and industry by providing influential determinants for software tool selection. It is hoped that the findings presented will contribute to a greater understanding of factors influencing the selection of BI tools among researchers and practitioners alike. Keywords: business intelligence tools; BITs; business intelligence systems; BIS; business intelligence; BI; software factors; software selection; software tool. DOI: 10.1504/IJBIDM.2023.10044714
Effect of IT Integration on Firm performance: The Mediating Role of Supply Chain Integration and Flexibility  by Gaurav Abhishek Tigga, Ganesan Kannabiran, P. Sridevi Abstract: IT integration (ITI) complements the functional and operational processes, and helps the firm develop an inimitable competitive advantage. The study examines the effect of ITI on supply chain integration (SCI), supplier flexibility (SF) and manufacturing flexibility, and their subsequent effects on firm performance (FP). The extended resource-based view has been used as the theoretical perspective to develop the research model. A survey was carried out among the manufacturing industries in India. Structural equation modelling with the partial least squares algorithm was used to analyse the hypotheses proposed in the study. The results show that ITI has a significant effect on SCI, manufacturing flexibility and SF, and subsequently affects FP. Keywords: IT integration; supply chain integration; supplier flexibility; manufacturing flexibility; firm performance. DOI: 10.1504/IJBIDM.2022.10044810
Credit Card Fraud Detection: An Evaluation of SMOTE Resampling and Machine Learning Model Performance  by Faleh Alshameri, Ran Xia Abstract: Credit card fraud has been a noted security issue that requires financial organisations to continuously improve their fraud detection system. In most cases, a credit transaction dataset is expected to have a significantly larger number of normal transactions than fraud transactions. Therefore, the accuracy of a fraud detection system depends on building a model that can adequately handle such an imbalanced dataset. The purpose of this paper is to explore one of the techniques of dataset rebalancing, the synthetic minority oversampling technique (SMOTE). To evaluate the effects of this technique on model training, we selected four basic classification algorithms, complement naïve Bayes (CNB), K-nearest neighbour (KNN), random forest and support vector machine (SVM). We then compared the performances of the four models trained on the rebalanced and original dataset using the area under precision-recall curve (AUPRC) plots. Keywords: credit card; imbalanced dataset; resampling method; synthetic minority oversampling technique; SMOTE; AUPRC; classification algorithms. DOI: 10.1504/IJBIDM.2023.10044811
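The core step of SMOTE is to create a synthetic minority sample on the line segment between an existing minority sample and one of its nearest minority-class neighbours. The sketch below illustrates only that interpolation step in plain Python; it is not the authors' experimental setup, and the function name, the parameter `k` and the sample points are assumptions made for illustration.

```python
import math
import random


def smote_sample(minority, k=2, rng=None):
    """Create one synthetic point between a minority sample and a neighbour.

    'minority' is a list of equal-length numeric tuples with at least two
    entries; 'k' is the number of nearest neighbours to choose from.
    """
    rng = rng or random.Random()
    base = rng.choice(minority)
    # Nearest minority neighbours of 'base' by Euclidean distance.
    neighbours = sorted(
        (p for p in minority if p is not base),
        key=lambda p: math.dist(base, p),
    )[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # position along the segment, in [0, 1)
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbour))


# Hypothetical minority-class (fraud) feature vectors.
fraud = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
synthetic = smote_sample(fraud, k=2, rng=random.Random(0))
print(synthetic)
```

In practice the step is repeated until the classes reach the desired ratio, and only the training split is resampled so that evaluation still reflects the original class balance.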
On Prevention of Attribute Disclosure and Identity Disclosure Against Insider Attack in Collaborative Social Network Data Publishing  by Bintu Kadhiwala, Sankita Patel Abstract: In a collaborative social network data publishing setup, privacy preservation of individuals is a vital issue. Existing privacy-preserving techniques assume that attackers are external data recipients and hence are vulnerable to insider attacks performed by colluding data providers. Additionally, these techniques protect data against identity disclosure but not against attribute disclosure. To overcome these limitations, in this paper, we address the problem of privacy-preserving data publishing for collaborative social networks. Our aim is to prevent both attribute and identity disclosure of collaborative social network data against insider attack. For this purpose, we propose an approach that utilises p-sensitive k-anonymity and m-privacy techniques. Experimental outcomes affirm that our approach preserves privacy with a reasonable increase in information loss and maintains an adequate utility of collaborative social network data. Keywords: collaborative social network data publishing; attribute disclosure; identity disclosure; insider attack; k-anonymity; m-privacy. DOI: 10.1504/IJBIDM.2023.10045007
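As background for the p-sensitive k-anonymity property mentioned in this abstract: a released table satisfies it when every group of records sharing the same quasi-identifier values contains at least k records (guarding against identity disclosure) and at least p distinct values of the sensitive attribute (guarding against attribute disclosure). The sketch below checks that property on tabular records only. It does not cover m-privacy or the social network setting of the paper, and the function name, field names and records are assumptions made for illustration.

```python
from collections import defaultdict


def is_p_sensitive_k_anonymous(records, quasi_ids, sensitive, k, p):
    """Check p-sensitive k-anonymity for a list of dict records."""
    groups = defaultdict(list)
    for rec in records:
        key = tuple(rec[q] for q in quasi_ids)  # quasi-identifier group
        groups[key].append(rec[sensitive])
    return all(
        len(values) >= k and len(set(values)) >= p
        for values in groups.values()
    )


# Hypothetical, already generalised records.
table = [
    {"age": "20-29", "zip": "130**", "disease": "flu"},
    {"age": "20-29", "zip": "130**", "disease": "asthma"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
    {"age": "30-39", "zip": "148**", "disease": "flu"},
]
qi = ["age", "zip"]
print(is_p_sensitive_k_anonymous(table, qi, "disease", k=2, p=1))
print(is_p_sensitive_k_anonymous(table, qi, "disease", k=2, p=2))
```

The second group holds two records with the same disease, so the table is 2-anonymous but fails the check for p = 2: anyone known to be in that group is revealed to have flu.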
Identification of Authorship and Prevention Fraudulent Transactions / Cybercrime using Efficient High Performance Machine Learning Techniques  by Sowmya BJ, Hanumantharaju R, Pradeep Kumar D, Srinivasa K. G Abstract: Cognitive computing refers to the usage of computer models to simulate human intelligence and thought process in a complex situation. Artificial intelligence (AI) is an augmentation to the limits of human capacity for a particular domain and works as an absolute reflection of reality; where a computer program is able to efficiently make decisions without previous explicit knowledge and instruction. The concept of cognitive intelligence was introduced. The most interesting use case for this would be an AI bot that doubles as a digital assistant. This is aimed at solving core problems in AI like open domain question answering, context understanding, aspect-based sentiment analysis, text generation, etc. The work presents a model to develop a multi-resolution RNN to identify local and global context, develop contextual embedding via transformers to pass into a seq2seq architecture and add heavy regularisation and augment data with reinforcement learning, and optimise via recursive neural networks. Keywords: cognitive computing; artificial intelligence; AI; data augmentation; human intelligence; recurrent neural network; transformer model. DOI: 10.1504/IJBIDM.2022.10045310
Forecasting With Information Extracted From The Residuals of ARIMA In Financial Time Series Using Continuous Wavelet Transform  by Heng Yew Lee, Woan Lin Beh, Kong Hoong Lem Abstract: Time series of financial or economic data are often considered to have certain trends and patterns. It is believed that the study of historical patterns helps in forecasting the future. The ARIMA model is one of the popular models for the task. However, long-term forecasting with ARIMA often appears as a straight line. This is due to ARIMA's dependency on previous values and its tendency to omit the outliers that lie outside the captured general trend. This paper sought to capture useful outlier information from the residuals of ARIMA modelling by using the continuous wavelet transform (CWT). The CWT-captured information was then added to the ARIMA forecast values to form a non-homogeneous long-term forecast. The final results were encouraging. It was also found that the choice of certain CWT-related parameters has a positive or negative effect on the forecasting outcomes. Keywords: wavelet; forecasting; autoregressive integrated moving average; ARIMA; time series; continuous wavelet transform; CWT. DOI: 10.1504/IJBIDM.2022.10045646
DAMIAN -Data Accrual Machine Intelligence with Augmented Networks for Contextually Coherent Creative Story Generation  by Sowmya BJ, Pradeep Kumar D, Hanumantharaju R, Srinivasa K. G Abstract: Cognitive computing refers to the usage of computer models to simulate human intelligence and thought process in a complex situation. Artificial intelligence (AI) is an augmentation to the limits of human capacity for a particular domain and works as an absolute reflection of reality; where a computer program is able to efficiently make decisions without previous explicit knowledge and instruction. The concept of cognitive intelligence was introduced. The most interesting use case for this would be an AI bot that doubles as a digital assistant. This is aimed at solving core problems in AI like open domain question answering, context understanding, aspect-based sentiment analysis, text generation, etc. The work presents a model to develop a multi-resolution RNN to identify local and global context, develop contextual embedding via transformers to pass into a seq2seq architecture and add heavy regularisation and augment data with reinforcement learning, and optimise via recursive neural networks. Keywords: cognitive computing; artificial intelligence; AI; data augmentation; human intelligence; recurrent neural network; transformer model. DOI: 10.1504/IJBIDM.2022.10045744
EmoRile: A Personalized Emoji Prediction Scheme Based on User Profiling  by Vandita Grover, Hema Banati Abstract: Emojis are widely used to express emotions and complement text communication. Existing approaches for emoji prediction are generic and generally utilise text or time for emoji prediction. However, research reveals that emoji usage differs among users, so individual users' preferences for certain emojis need to be captured when predicting emojis for them. In this paper, a novel emoji-usage-based profiling scheme, EmoRile, is proposed. In EmoRile, emoji-usage-based user profiles were created, which was accomplished by compiling a new dataset that also included users' information. Distinct models with different combinations of text, text sentiment, and users' preferred emojis were created for emoji prediction. These models were tested on various architectures with a very large emoji label space. Rigorous experimentation showed that even with a large label space, EmoRile predicted emojis with accuracy similar to that of existing emoji prediction approaches with a smaller label space, making it a competitive emoji prediction approach. Keywords: emojis in sentiment analysis; emoji prediction; user profile-based emojis. DOI: 10.1504/IJBIDM.2023.10045810
Brain Hemorrhage Classification from CT Scan Images using Fine-tuned Transfer Learning Deep Features  by Arpita Ghosh, Badal Soni, Ujwala Baruah Abstract: Classification of brain haemorrhage is a challenging task and needs to be solved to help advance medical treatment. Recently, efficient deep learning architectures have been developed to detect such bleeding accurately. The proposed system includes two different transfer learning strategies to train and fine-tune ImageNet pre-trained state-of-the-art architectures such as VGG 16, Inception V3, and DenseNet121. The evaluation metrics have been calculated based on the performance analysis of the employed networks. Experimental results show that the modified fine-tuned Inception V3 performs well and achieves the highest test accuracy. Keywords: transfer learning; VGG 16; Inception V3; DenseNet121; brain haemorrhage; ReLU; binary cross entropy. DOI: 10.1504/IJBIDM.2022.10046012
A Novel Classification-based Parallel Frequent Pattern Discovery Model for Decision making and Strategic planning in Retailing  by Rajiv Senapati Abstract: The exponential growth of retail transactions, together with the differing interests of a variety of customers, makes the pattern mining problem non-trivial. Hence this paper proposes a novel model for mining frequent patterns. In the proposed model, frequent pattern discovery is carried out in three phases. In the first phase, the dataset is divided into n partitions based on the time stamp. In the second phase, clustering is performed on each of the partitions in parallel to classify the customers as HIG, MIG, and LIG. In the third phase, the proposed algorithm is applied to each of the classified groups to obtain frequent patterns. Finally, the proposed model is validated using a sample dataset, and experimental results are presented to explain the capability and usefulness of the proposed model and algorithm. Further, the proposed algorithm is compared with the existing algorithm, and it is observed that the proposed algorithm performs better in terms of time complexity. Keywords: data mining; frequent pattern; association rule; classification; algorithm; decision making; retailing. DOI: 10.1504/IJBIDM.2023.10046447
Distributed Computing and Shared Memory based Utility List Buffer Miner with Parallel Frameworks for High Utility Itemset Mining  by Eduardus Hardika Sandy Atmaja, Kavita Sonawane Abstract: High Utility Itemset Mining (HUIM) is a well-known pattern mining technique. It considers the utility of the items, which leads to finding high-profit patterns that are more useful in real conditions. Handling large and complex datasets is the major challenge in HUIM, and the main problem is the exponential time complexity. The literature shows multicore approaches that address this problem by parallelizing the tasks, but they are limited to the resources of a single machine and also need a novel strategy. To address this problem, we propose new strategies, namely Distributed Computing (DC-PLB) and Shared Memory (SM-PLB) based Utility List Buffer Miner with Parallel Frameworks (PLB). They utilize cluster nodes to parallelize and distribute the tasks efficiently. Thorough experiments showed that the proposed frameworks achieved better runtime (448s) on dense datasets compared to the existing PLB (2237s), effectively addressing the challenges of handling large and complex datasets. Keywords: HUIM; PLB; DC-PLB; SM-PLB; cluster computing; parallel and distributed computing; data mining; MPI; Apache Spark. DOI: 10.1504/IJBIDM.2023.10046448
A Survey on Adoption of Blockchain in Healthcare  by Shantha Shalini K., M. Nithya Abstract: In this era of technology and automation, blockchain technology is the subject of consistent study and adoption in different sectors. Blockchain technology, with its chain of blocks, provides security and establishes a trusted environment between individuals. In the past couple of years, blockchain technology has attracted many research scholars and industrialists to study, analyse and apply the technology to their own application needs. The major advantages of blockchain technology are security, preserved user privacy, and transparency. The purpose of this paper is to provide a survey on the scope of blockchain in healthcare, where it provides high security for patient health information during sharing, and on its impact in reducing operational and capital investments. This paper also briefly covers the new business opportunities in the health sector that arise from integrating blockchain technology. Keywords: healthcare; blockchain; patient health records. DOI: 10.1504/IJBIDM.2023.10046449
An Optimized Soft Computing based Approach for Multimedia Data Mining  by M. Ravi, M. Ekambaram Naidu, G. Narsimha Abstract: Multimedia mining is a sub-field of data mining which is used to discover interesting information from multimedia databases. Multimedia data is ordered into two general categories: static media and dynamic media. Static media comprises text and pictures; dynamic media consists of audio and video. Multimedia mining refers to the analysis of large amounts of multimedia data in order to extract patterns based on their statistical relationships. Multimedia mining systems can find significant information or image patterns in a huge collection of images. In this paper, a hybrid method is proposed which exploits statistical and applied soft computing-based primitives and building blocks, i.e., a novel feature engineering algorithm, aided by an efficient modelling procedure based on convolutional neural networks. Optimal parameters are chosen, such as the number of filters, kernel size, strides, input shape and nonlinear activation function. Experiments are performed on standard web multimedia data (here, an image dataset is used as the multimedia data) and achieve multi-class image categorisation and analysis. Our results are also compared with other significant existing methods and presented in the form of an intensive comparative analysis. Keywords: knowledge discovery; supervised learning; multimedia databases; image data; soft computing; feature engineering. DOI: 10.1504/IJBIDM.2023.10046450
Variable Item Value based High Utility Itemset Recommendation Using Statistical Approach  by ABDULLAH BOKIR, V.B. Narasimha Abstract: High utility mining (HUM) has become an absolute requirement for an efficient corporate management procedure. The challenge persists in identifying the top-out or bottom-out conditions in the context of the available HUM solutions, and it is critical for enterprises to manage adequate inventory to achieve higher yield outcomes. Taking these aspects into consideration, this paper proposes a comprehensive method named "Variable Item Value-based High Utility Itemset Recommendation (VIVHUIR)". Unlike contemporary models, which perform utility mining with a constant utility factor, the proposed model uses a variable utility factor to perform utility mining based on the profitability of an itemset. In addition, the methodology for detecting drift (variability) in the utility factor is fundamentally based on the Average True Range for an itemset and the Relative Strength Index assessment for analysis, which is a unique and novel feature of the proposal. To comprehend the elements influencing profit, the proposed four-layered filtering model depends on quantities, demand, supply, and gain/loss inventory. The experimental research of the model refers to potential solutions that are pragmatic in a real-time situation. Keywords: High Utility Mining; Dynamic Utility; Average True Range; Relative Strength Index; Economic Order Quantity; Inventory Storage Cost. DOI: 10.1504/IJBIDM.2023.10047036
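The two indicators named in the abstract above can be illustrated with a minimal Python sketch. It uses the textbook definitions of Average True Range and Relative Strength Index with simple (unsmoothed) averages. How VIVHUIR applies them to itemset utilities is not stated in the abstract, so the function names, the input series and the choice of simple averaging are all assumptions made for illustration.

```python
from typing import Sequence


def average_true_range(highs: Sequence[float],
                       lows: Sequence[float],
                       closes: Sequence[float]) -> float:
    """Mean true range over consecutive periods (simple average, no smoothing)."""
    if not (len(highs) == len(lows) == len(closes)) or len(closes) < 2:
        raise ValueError("need at least two aligned periods")
    true_ranges = []
    for i in range(1, len(closes)):
        prev_close = closes[i - 1]
        # True range: the widest of the three candidate ranges for period i.
        true_ranges.append(max(highs[i] - lows[i],
                               abs(highs[i] - prev_close),
                               abs(lows[i] - prev_close)))
    return sum(true_ranges) / len(true_ranges)


def relative_strength_index(values: Sequence[float]) -> float:
    """RSI over the whole series, using simple averages of gains and losses."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    changes = [b - a for a, b in zip(values, values[1:])]
    avg_gain = sum(c for c in changes if c > 0) / len(changes)
    avg_loss = sum(-c for c in changes if c < 0) / len(changes)
    if avg_loss == 0:
        # No losses: 100 if there were gains, neutral 50 for a flat series.
        return 100.0 if avg_gain > 0 else 50.0
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```

In this sketch a steadily rising series gives an RSI of 100 and a steadily falling one gives 0; a hypothetical per-period utility value for an itemset could be passed in place of a price series.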
Multi-modal feature fusion for object detection using neighbourhood component analysis and bounding box regression  by Anamika Dhillon, Gyanendra K. Verma Abstract: Object detection has gained remarkable interest in the research area of computer vision applications. This paper presents an efficient method to detect multiple objects, consisting of two parts: 1) a training phase; 2) a testing phase. During the training phase, we first use two convolutional neural network models, namely Inception-ResNet-V2 and MobileNet-V2, for feature extraction and then fuse the features extracted from these two models using a concatenation operation. To acquire a more compact representation of the features, we utilise neighbourhood component analysis (NCA). After that, we classify the multiple objects using an SVM classifier. During the testing phase, to detect various objects in an image, a bounding box regression module based on LSTM is proposed. We performed our experiments on two datasets: wild animal camera trap and gun. In particular, our method achieves accuracy rates of 97.80% and 97.0% on the wild animal camera trap and gun datasets respectively. Keywords: deep convolution networks; object detection; neighbourhood component analysis; NCA; support vector machine; SVM; long short-term memory; LSTM. DOI: 10.1504/IJBIDM.2022.10047465
A widespread Survey on Machine Learning Techniques and User Substantiation Methods for Credit Card Fraud Detection  by JOHN BERKMANS THOBIYAS, Karthick S Abstract: In the modern digital world, credit card usage is increasing enormously every day. At the same time, credit card misuse has also become significantly more common. It causes financial losses for both cardholders and financial organisations. To avoid these losses, financial organisations develop and deploy credit card fraud detection techniques. In the future, most people will carry out the majority of their transactions online to save time. We therefore divide this review into two main parts. In the first part, we focus on classical machine learning models. In the second part, we consider what the user knows (knowledge-based methods), focusing on the development of user authentication procedures and on behavioural biometrics, which distinguish an individual's unique behaviour while using their electronic devices. By outlining the current approaches, this literature review aims to support the development of a more accurate, reliable, scalable, fast, effective, and inexpensive model of credit card fraud detection. Keywords: credit card transaction; machine learning; bio-metrics; XGBoost; SVM; random forest. DOI: 10.1504/IJBIDM.2023.10047750
Identifying influential nodes in large scale social networks using Global and local structural information  by Noosheen Shareefi, Mehdi Bateni Abstract: Given the importance of identifying influential nodes in different applications, many methods have been proposed for this task. Some of them are not accurate enough or have high time complexity. In this paper, a method named new GLS (NGLS) is developed based on the global and local search (GLS) algorithm. GLS, despite its high accuracy compared to other methods, is not fast and efficient enough. NGLS is developed to improve the efficiency and scalability of GLS. To reach this goal, the number of common neighbours of each node is counted only up to a radius of two. The execution time of NGLS has been reduced on average by 85% on real-world networks and 97% on simulated networks, while the accuracy of NGLS is the same as that of GLS. Therefore, NGLS is applicable to larger real-world networks. Keywords: influential nodes; global and local information; large networks; centrality measure; neighbour contribution; complex network; propagation; propagation models; complexity; social network analysis. DOI: 10.1504/IJBIDM.2023.10047751
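The radius-two restriction mentioned in the abstract above can be sketched as follows. This is not the NGLS algorithm itself, whose scoring details the abstract does not give; it only shows, for an undirected graph stored as a hypothetical adjacency dictionary, how a breadth-first search can be cut off at two hops and how common neighbours of two nodes can be collected.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Iterable[Hashable]]


def neighbours_within_radius(graph: Graph, source: Hashable, radius: int = 2) -> Set[Hashable]:
    """Return all nodes reachable from `source` in at most `radius` hops."""
    depth = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if depth[node] == radius:
            continue  # do not expand beyond the cut-off radius
        for nxt in graph.get(node, ()):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    del depth[source]  # the source is not its own neighbour
    return set(depth)


def common_neighbours(graph: Graph, u: Hashable, v: Hashable) -> Set[Hashable]:
    """Return the direct neighbours shared by nodes `u` and `v`."""
    return set(graph.get(u, ())) & set(graph.get(v, ()))
```

Bounding the search at two hops keeps the work per node proportional to the size of its local neighbourhood rather than the whole network, which is consistent with the efficiency motivation stated in the abstract.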
AN EFFECTIVE ABSTRACT TEXT SUMMARIZATION USING SHARK SMELL OPTIMIZED BIDIRECTIONAL ENCODER REPRESENTATIONS FROM TRANSFORMER  by Nafees Muneera, P. Sriramya Abstract: Recently, the volume of text data has grown rapidly, so information must be summarised to retrieve useful knowledge. First, the preprocessing module utilises the fixed-length stemming method, and then the segmentation module makes use of a pre-trained bidirectional encoder representations from transformers (BERT) model. The input text is segmented using feedforward and multi-head attention layers. This BERT segmentation paradigm is combined with the shark smell optimisation (SSO) methodology, and the extracted phrases are used to prepare the document stage of an Amazon product review dataset. This study aims to create a concise summary and engaging headlines that grab the attention of readers. The paper shows that the method works by combining both extractive and abstractive procedures in a pipelined technique, creating a succinct summary that is later used for headline creation. Experimentation was carried out on the publicly accessible CNN/Daily Mail datasets. Keywords: abstractive; text summarisation; optimisation; transformer; clustering; similarity index. DOI: 10.1504/IJBIDM.2023.10047979
State-of-Art approaches for Event Detection over Twitter Stream: a Survey  by Jagrati Singh, Anil Kumar Singh Abstract: In the present time, social network applications like Twitter, Facebook and YouTube have evolved as a popular way of information sharing for general users. On these platforms, valuable information appears as breaking hot news, trending topics, public opinion, and so on. Twitter is the most popular microblogging service that generates huge volumes of data with high velocity and variety (i.e., images, text and video). Due to the growth of discussed real-world events over Twitter, the event detection problem is becoming an interesting and challenging issue. Event detection is the practice of applying natural language processing and text analysis techniques to identify and extract event information from text. This survey paper explores important research works for event detection using Twitter data. We classify approaches according to feature modelling methods: vector space model, statistical model and graph model. We highlight research challenges, issues, and the limitation of existing approaches to find the research gaps for future directions. Keywords: Twitter stream; clustering; data sharing; supervised technique; unsupervised technique; semantic correlation; keyword co-occurrence; topic modelling. DOI: 10.1504/IJBIDM.2023.10048271
Data Quality Based View Selection in Big Data Integration System  by Samir Anter Abstract: An integration system is an intermediate tool between a user and a set of distributed sources. It provides transparent access to information through an interface using a unique query language. This gives the end user the illusion of accessing a homogeneous central repository. In a hybrid system, one part of the data is queried on demand whereas another part is extracted, filtered and stored in a local database. This approach is very promising for data access in a big data context. However, obtaining satisfactory results depends on the correct choice of data to materialise, and this task is even more difficult in a big data context. In this article, a novel approach is proposed to overcome this problem; it uses data quality to select the views that will be materialised. Keywords: data integration; materialised views; big data; data quality; view selection. DOI: 10.1504/IJBIDM.2023.10048381
Evaluation of Factors Involved in Predicting Indian Stock Price Using Machine Learning Algorithms  by ARCHIT A. VOHRA, Paresh J. Tanna Abstract: This study evaluates the effect of training dataset size, dimensionality and rolling dataset on the prediction accuracy of decision tree regression (DTR), support vector regression (SVR), long short-term memory (LSTM) and neural network multi-layer perceptron (NNMLP). Data for ten stocks from different sectors of the National Stock Exchange Fifty (NIFTY 50) was considered. The execution time of each model is calculated to find the fastest algorithm. Finally, the correlation between prediction accuracy and performance measures is established. The results clearly show that increasing the training dataset size does not always increase the prediction accuracy. The characteristics of the dataset are a major factor responsible for prediction accuracy. DTR and SVR have very low average execution times compared to LSTM and NNMLP. A very strong negative correlation was found between mean absolute percentage error (MAPE) and prediction accuracy. Keywords: prediction accuracy; training dataset size; rolling dataset; performance measures; regression; neural network; execution time; stock price. DOI: 10.1504/IJBIDM.2023.10048648
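For reference, the mean absolute percentage error (MAPE) named in the abstract above has a standard definition that can be sketched in a few lines of Python. The function name and the sample prices are illustrative only, and the abstract does not say how the authors define prediction accuracy, so no accuracy formula is assumed here.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero, so that case is rejected.
    """
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical closing prices and predictions, each off by 10%.
error = mape([100.0, 200.0], [110.0, 180.0])
```

A lower MAPE means predictions sit closer to the observed prices, which is why a strong negative correlation with prediction accuracy is the expected direction.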
Text Document Learning using Distributed Incremental Clustering Algorithm: Educational Certificates  by Archana Chaudhari, Preeti Mulay, Ayushi Agarwal, Krithika Iyer, Saloni Sarbhai Abstract: Technological advancements now allow each of us to learn new skills at home or through various workshops, and one way to recognise a skill is by awarding a certificate. Datasets of digital and handwritten certificates are usually in the form of images. We can use this information to analyse which subjects have recently gained popularity and how to improve the field of study at different universities. This paper therefore proposes the distributed incremental clustering with closeness factor-based algorithm (DIC2FBA) for text clustering. The work primarily focuses on a faculty development program certificates dataset that covers both text and numeric data. The proposed system uses an AWS EC2 instance and an AWS S3 bucket, which help to cluster data from multiple sites in iterative and incremental mode. Further, we compare the findings achieved using DIC2FBA with the K-means modified inter and intra clustering (KM-I2C) algorithm based on the silhouette score and the Davies-Bouldin index. The proposed system will help educational institutions understand the popular skill sets of faculty members, which can in turn be used to understand the effectiveness of such programs. Keywords: distributed incremental clustering; text document learning; educational certificates; faculty development program; FDP; AWS. DOI: 10.1504/IJBIDM.2024.10049120
Machine Learning approach for Data Analysis and Predicting Coronavirus Using COVID -19 India Dataset  by Soni Singh, Dr.K.R.Ramkumar Kumar, Ashima Kukkar Abstract: According to the World Health Organisation (WHO), the COVID-19 virus infected 83,558,756 persons worldwide in 2020, resulting in 646,949 deaths. In this research, we aim to find the link between the time series data and current circumstances in order to predict future outbreaks, and to determine which modelling technique gives the most accurate predictions. The performance of different machine learning (ML) models, namely the sigmoid function, the Facebook (FB) Prophet model, the seasonal auto-regressive integrated moving average with eXogenous factors (SARIMAX) model, the support vector machine (SVM) learning model, the linear regression (LR) model, and the polynomial regression (PR) model, is analysed along with their error rates. A comparison is also made to identify the best-suited model for prediction based on different categorisation approaches on the WHO-authenticated dataset of India. The results indicate that the PR model shows the best performance with time-series data of COVID-19, whereas the sigmoid model has consistently the smallest prediction error rates for tracking the dynamics of incidents. In contrast, the PR model provided the most realistic prediction for identifying a plateau point in the incident growth curve. Keywords: COVID-19; pandemics; analysis on India; machine learning; prediction; comparison; support vector machine; SVM. DOI: 10.1504/IJBIDM.2024.10049479
Prediction of Stock Prices of Blue-Chip Companies using Machine Learning Algorithms  by Rajvir Kaur, Anurag Sharma Abstract: Accurate stock market prediction is a very challenging task for experts due to the market's volatile nature. To determine the future value of the stock market, many studies rely on historical data. Nowadays, however, external factors such as social media and news headlines also greatly affect the stock market. This research work predicts future stock prices by using both Twitter social media and news data along with historical data to achieve high prediction results. The performance of the machine learning algorithms logistic regression, SVM, and random forest is analysed using metrics such as accuracy, precision, recall, and F1 score. The final dataset is split in an 80:20 ratio for training and testing. For each blue-chip company, the testing dataset contains 248 samples; the highest prediction accuracies, ranging from 85% to 89%, are achieved using the logistic regression algorithm. Keywords: blue-chip companies; machine learning; news headlines; social media; stock market prediction; Twitter. DOI: 10.1504/IJBIDM.2023.10049725
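The evaluation setup described in the abstract above (an 80:20 split, then accuracy, precision, recall and F1 score) can be sketched with the standard library alone. The total of 1,240 samples per company used in the usage note is an inference from 248 test samples at a 20% share, not a figure stated in the abstract; the function names, the chronological split and the binary up/down labels are likewise assumptions for illustration.

```python
from typing import Dict, List, Sequence, Tuple


def ordered_split(rows: Sequence, train_percent: int = 80) -> Tuple[List, List]:
    """Split rows into train/test parts without shuffling (keeps time order)."""
    cut = len(rows) * train_percent // 100
    return list(rows[:cut]), list(rows[cut:])


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary classification task."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(y_true),
            "precision": precision, "recall": recall, "f1": f1}
```

With a hypothetical 1,240 rows, `ordered_split(range(1240))` yields 992 training rows and 248 test rows, matching the test-set size reported in the abstract.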
AN EFFICIENT MISSING VALUE IMPUTATION AND EVALUATION USING GK-KH MEANS AND HTR-RNN  by Syavasya CVSR, A. Lakshmi Muddana Abstract: The accuracy of data mining (DM) outcomes can be affected by mining and analysing incomplete datasets with missing values (MV). A complete dataset is therefore created by imputing the MV, which makes the analysis easier. To address this issue, an effective missing values imputation (MVI) method is proposed and evaluated utilising Gaussian kernel-K harmonic means (GK-KH Means) and hyperbolic tangent radial-recurrent neural networks (HTR-RNN). At first, preprocessing is performed on the input data from the CKD dataset, in which duplicate data are removed. Next, the missing data are handled by ignoring them, and the MV are imputed utilising GK-KH Means. Next, the data are rationalised into a structured format. Then, SDRM-DHO selects the most optimal features from the extracted features. Lastly, the HTR-RNN classifier accepts these chosen features as input. The proposed work achieved more accurate missing value imputation. Keywords: missing value imputation; K harmonic means; Gaussian kernel function; recurrent neural network; swap displacement reversion operation. DOI: 10.1504/IJBIDM.2023.10049909
Detection of spammers disseminating obscene content on Twitter  by Deepali Dhaka, Surbhi Kakar, Monica Mehrotra Abstract: Spammers distributing adult content are becoming an apparent and yet intrusive problem with the increasing prevalence of online social networks among users. For improving user experience and especially preventing exposure to users of lower age groups, these accounts need to be detected efficiently. In this work, a model is proposed in which a lexicon-based approach is used to label users with their values. This study is based on the fact that users behave according to the values they possess. The amalgamation of content-based features like values, the entropy of words, lexical diversity, and context-based word embeddings are found to be robust. Among several machine learning models, XGboost performs exceedingly well with accuracy (92.28 ± 1.28%) for all features. Feature importance and their discriminative power have also been shown. A comparative study is also done with one of the latest approaches and our approach is found to be more efficient. Keywords: values; emotions; Twitter; online social network; spammer; pornographic spammer. DOI: 10.1504/IJBIDM.2022.10040432
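Two of the content-based features listed in the abstract above, the entropy of words and lexical diversity, have common textbook definitions that can be sketched briefly. The abstract does not specify the exact formulas the authors used, so Shannon entropy over word frequencies and the type-token ratio are assumed here, and whitespace tokenisation is a simplification.

```python
import math
from collections import Counter
from typing import Sequence


def word_entropy(tokens: Sequence[str]) -> float:
    """Shannon entropy (in bits) of the word-frequency distribution."""
    if not tokens:
        return 0.0
    total = len(tokens)
    counts = Counter(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def lexical_diversity(tokens: Sequence[str]) -> float:
    """Type-token ratio: distinct words divided by total words."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


# Hypothetical tweet text, lower-cased and split on whitespace.
tokens = "click here click here".lower().split()
features = (word_entropy(tokens), lexical_diversity(tokens))
```

Highly repetitive text scores low on both measures, which is one plausible reason such features help separate templated spam from ordinary posts.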
Suspicious tweet identification using machine learning approaches for improving social media marketing analysis  by Senthil Arasu Balasubramanian, Jonath Backia Seelan, Thamaraiselvan Natarajan Abstract: Social media acts as one of the eminent platforms for communication. Twitter is one of the leading social media microblogging platforms, where users can post and interact. #Hashtags indicate the Twitter trends on a certain topic. Currently, the hashtag value or trend ranking for a particular hashtag is calculated based on the cumulative number of tweets. This cumulative approach to hashtag ranking may allow the anonymous intervention of irrelevant tweets, which affects social media marketing. The proposed approach uses the relevance of tweets and #hashtags to identify the suspicious or irrelevant tweets in media marketing. The proposed research work uses the linear regression algorithm, which is one of the familiar machine learning approaches, to explain spam tweet generation and the method to identify it. The test results found that the proposed system has 84% of significance when compared to the market analysis algorithms. Keywords: tweets; hashtags; trend prediction; linear regression; social media marketing. DOI: 10.1504/IJBIDM.2022.10040478
Factors influencing the moving up the value chain by Indian IT service organisations  by B. Mahendramohan, G. Kannabiran, P. Sridevi Abstract: Indian information technology (IT) service organisations that were providing low value-added services are moving up the value chain of IT services to overcome threats of competition and automation. The purpose of this study is to evaluate the impact of Indian IT service organisations' capabilities on moving up the value chain. Using a resource-based view perspective, this research examines the influence of the service provider's capabilities on moving up the value chain. The research was conducted by collecting responses from 188 employees of Indian IT service organisations. The data were analysed using structural equation modelling. The study shows that the service provider's capabilities, namely, relationship management capability, project management capability, domain understanding and IT advancement positively impact service quality and innovativeness. The service provider's service quality and innovativeness, and the absorptive capacity of the client enhance the effectiveness in moving up the value chain. Keywords: moving-up value chain; innovativeness; service quality; project management; relationship management; domain understanding; information technology advancement; absorptive capacity. DOI: 10.1504/IJBIDM.2022.10048760
Leveraging the fog-based machine learning model for ECG-based coronary disease prediction  by R. Hanumantharaju, K.N. Shreenath, B.J. Sowmya, K.G. Srinivasa Abstract: Smart healthcare systems need a remote monitoring system based on the internet of things. Smart healthcare services are an innovative way of combining the benefits of sensors with large-scale analytics to deliver better patient care. This work provides healthcare services to patients, as well as to the healthy population, through remote monitoring using detailed calculations, tools and methods for better care. The proposed system integrates an architecture based on IoT, fog computing and machine learning (ML) algorithms. The data collected about heart diseases is loaded and filtered, and attributes are extracted at the fog layer; the classification model is built at the fog nodes. The output of the model is sent to the cloud layer to train classifiers. The cloud layer estimates the level of ML algorithms to predict disease. Results show that random forest has better feature extraction than naive Bayes, by 3% in precision, 3% in recall, and 13% in f-measure. Keywords: internet of things; IoT; machine learning; random forest; naive Bayes; fog layer; remote monitoring; feature extraction. DOI: 10.1504/IJBIDM.2022.10041200
An optimal dimension reduction strategy and experimental evaluation for Parkinson's disease classification  by D. Saidulu, R. Sasikala Abstract: The amount of data streamed and generated through various healthcare systems is increasing exponentially day by day. Applying traditional data mining algorithms to this massive data to construct automated decision support systems is a tedious and time-consuming task. In recent years, there has been increasing interest in the development of telediagnosis and telemonitoring systems for Parkinson's disease (PD). Parkinson's disease is a progressive neurodegenerative disease which affects movement characteristics. PD patients commonly face vocal impairments during the early stages of the disease. This work proposes a computationally efficient method for dimension reduction and classification of healthcare-related data. The devised framework is capable of dealing with data having both discrete and continuous features. The experimental evaluation is performed on the Parkinson's disease classification database (Sakar et al., 2018). The statistical performance metrics used are validation and test accuracy, precision, recall, F1-score, etc. There will be computational complexity advantages when this reduced-dimension data is further processed for modelling and building a prediction system. In order to demonstrate the optimality of the proposed framework, a comparative analysis is performed against significant existing approaches. Keywords: big data; learning; dimension reduction; machine learning; knowledge discovery; information retrieval. DOI: 10.1504/IJBIDM.2022.10040204
A review of scalable time series pattern recognition  by Kwan-Hua Sim, Kwan-Yong Sim, Valliappan Raman Abstract: Time series data mining helps derive new, meaningful and hidden knowledge from time series data. Thus, time series pattern recognition has been the core functionality in time series data mining applications. However, mining of unknown scalable time series patterns with variable lengths is by no means trivial. It could result in quadratic computational complexities to the search space, which is computationally untenable even with the state-of-the-art time series pattern mining algorithms. The mining of scalable unknown time series patterns also requires the superiority of the similarity measure, which is clearly beyond the comprehension of standard distance measure in time series. It has been a deadlock in the pursuit of a robust similarity measure, while trying to contain the complexity of the time series pattern search algorithm. This paper aims to provide a review of the existing literature in time series pattern recognition by highlighting the challenges and gaps in scalable time series pattern mining. Keywords: time series pattern recognition; scalable time series pattern matching; motif discovery; time series data mining; distance measure; dimension reduction; sliding window search. DOI: 10.1504/IJBIDM.2022.10041672