Forthcoming and Online First Articles

International Journal of Business Intelligence and Data Mining

International Journal of Business Intelligence and Data Mining (IJBIDM)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Open Access: Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

Register for our alerting service, which notifies you by email when new issues are published online.

We also offer feeds which provide timely updates of tables of contents, newly published articles and calls for papers.

International Journal of Business Intelligence and Data Mining (57 papers in press)

Regular Issues

  • Analysis and Prediction of Heart Disease Aid of Various Data Mining Techniques: A Survey   Order a copy of this article
    by V. Poornima, D. Gladis 
    Abstract: In recent times, heart diseases have been increasing gradually, partly owing to inherited factors. Heart disease in particular has become more common nowadays, putting individuals' lives at risk. Data mining strategies, namely decision trees, naive Bayes, neural networks, k-means clustering, associative classification, support vector machines (SVM), fuzzy methods, rough set theory and orthogonal locality preserving methodologies, are examined on heart disease databases. In this paper, we survey papers in which at least one data mining algorithm is utilised for the forecast of heart disease. This survey covers the current procedures involved in risk prediction of heart disease for classification in data mining. The survey of pertinent data mining strategies involved in risk prediction of heart disease suggests that hybrid approaches yield better prediction models than single-model approaches.
    Keywords: data mining; heart disease prediction; performance measure; fuzzy; clustering.
    DOI: 10.1504/IJBIDM.2018.10014620
  • An evaluation method for searching the functional relationships between property prices and influencing factors in the detected data   Order a copy of this article
    by Pierluigi Morano, Francesco Tajani, Vincenzo Del Giudice, Pierfrancesco De Paola, Felicia Di Liddo 
    Abstract: The economic crisis of the last decade, which started in the real estate sector, has spread awareness of the importance of advanced evaluation models as a support for assessments and for the periodic value updates of public and private property assets. With reference to a sample of recently sold properties located in the city of Rome (Italy), an innovative automated valuation model is explained and applied. The outputs are represented by different mathematical expressions able to interpret and simulate the investigated phenomena (i.e., market price formation). The application carried out highlights, in the selection phase of the best model, the fundamental condition that the valuer must adequately know the reference market. In this way, it is possible to identify the existing patterns in the detected data in terms of mathematical expressions, according to the empirical knowledge of the economic phenomena.
    Keywords: price property formation; office market; retail market; automated valuation methods; AVMs; genetic algorithm; reliable valuations.
    DOI: 10.1504/IJBIDM.2022.10035383
  • A Unified Workflow Strategy for Analysing Large Scale TripAdvisor Reviews with BOW Model   Order a copy of this article
    by Jale Bektas, Arwa Abdalmajed 
    Abstract: Nowadays, firms need to transform customer online review data properly into information to achieve goals such as gaining a competitive edge and improving the quality of service. This paper presents a unified workflow to solve the problems of analysing large-scale data, comprising 710,450 reviews for 1,134 hotels across the different touristic regions of Turkey, by using text mining methods. Firstly, a star-schema dimensional data mart is built that includes one fact table and two dimension tables. Then, a series of text mining processes, including data cleaning, tokenisation, and analysis, is applied. Text mining is implemented through the standard BOW and the extended BON model. The results show significant findings through this workflow. We propose building a dimensional model dataset before performing any text mining process, since building such a dataset will optimise the data retrieval process and help to represent the data along with different measures of interest.
    Keywords: online TripAdvisor reviews; text mining; big data; N-gram tokenisation; dimensional data mart; data mining; BOW; BON.
    DOI: 10.1504/IJBIDM.2022.10037062
  • Apriori-Roaring: Frequent Pattern Mining Method Based on Compressed Bitmaps   Order a copy of this article
    by Alexandre Colombo, Roberta Spolon, Aleardo Junior Manacero, Renata Spolon Lobato, Marcos Antônio Cavenaghi 
    Abstract: Association rule mining is one of the most common tasks in data analysis. It has a descriptive feature used to discover patterns in sets of data. Most existing approaches to data analysis have a constraint related to execution time. However, as the size of datasets used in the analysis grows, memory usage tends to be the constraint instead, and this prevents these approaches from being used. This article presents a new method for data analysis called apriori-roaring. The apriori-roaring method is designed to identify frequent items with a focus on scalability. The implementation of this method employs compressed bitmap structures, which use less memory to store the original dataset and to calculate the support metric. The results show that apriori-roaring allows the identification of frequent elements with much lower memory usage and shorter execution time. In terms of scalability, our proposed approach outperforms the various traditional approaches available.
    Keywords: frequent pattern mining; bitmap compression; data mining; association rules; knowledge discovery.
    DOI: 10.1504/IJBIDM.2022.10037305
  • Financial accounts reconciliation systems using enhanced mapping algorithm   Order a copy of this article
    by Olufunke Oluyemi Sarumi, Bolanle A. Ojokoh, Oluwafemi A. Sarumi, Olumide S. Adewale 
    Abstract: Account reconciliation has become a daunting task for many financial organisations due to the heterogeneity of the data involved in the reconciliation process, coupled with the recent data deluge in many accounting firms. Many organisations use heuristic-based algorithms for their account reconciliation process, while in some firms the process is completely manual. These methods are already overwhelmed and no longer efficient in light of the recent data explosion and are, as such, prone to errors that could expose organisations to several financial risks. In this regard, there is a need to develop a robust financial data analytic algorithm that can effectively handle the account reconciliation needs of financial organisations. In this paper, we propose a computational data analytic model that provides an efficient solution to the account reconciliation bottlenecks in financial organisations. Evaluation results show the effectiveness of our data analytic model in enabling faster decision making in financial account reconciliation systems.
    Keywords: accounts reconciliation; financial analytics; functions; fraud; big data.
    DOI: 10.1504/IJBIDM.2022.10037414
  • Privacy Preserving Data Mining - Past and Present   Order a copy of this article
    by G. SATHISH KUMAR, K. Premalatha 
    Abstract: Data mining is the process of discovering patterns and correlations within huge volumes of data to forecast outcomes. Serious challenges arise in data mining techniques due to privacy violation and sensitive information disclosure when providing datasets to third parties. It is necessary to protect users' private and sensitive data from exposure, without the authorisation of data holders or providers, when extracting useful information and revealing patterns from a dataset. In addition, internet phishing poses a growing threat on the web through the extensive spread of private information. Privacy preserving data mining (PPDM) is essential for exchanging confidential information in terms of data analysis, validation, and publishing. To achieve data privacy, a number of algorithms have been designed in the data mining sector. This article delivers a broad survey of privacy preserving data mining algorithms and the different datasets used in the research, and analyses the techniques based on certain parameters. The survey highlights the outcome of each piece of research along with its advantages and disadvantages, and will guide future researchers in PPDM to choose the appropriate techniques for their research.
    Keywords: data mining; privacy preserving data mining; PPDM; privacy preserving techniques; sensitive attributes; privacy threats.
    DOI: 10.1504/IJBIDM.2022.10037595
  • STEM: STacked Ensemble Model design for aggregation technique in Group Recommendation System   Order a copy of this article
    by Nagarajan Kumar, P. Arun Raj Kumar 
    Abstract: A group recommendation system is required to provide a list of recommended items to a group of users. The challenge lies in aggregating the preferences of all members of a group to provide well-suited suggestions. In this paper, we propose an aggregation technique using a stacked ensemble model (STEM). STEM involves two stages. In the first stage, the k-nearest neighbour (k-NN), singular value decomposition (SVD), and a combination of user-based and item-based collaborative filtering are used as base learners. In the second stage, a decision tree predictive model is used to aggregate the outputs obtained from the base learners by prioritising the most preferred items. From the experiments, it is evident that STEM provides a better group recommendation strategy than the existing techniques.
    Keywords: group recommendation system; aggregating user preferences; decision trees; stacked ensemble; machine learning.
    DOI: 10.1504/IJBIDM.2022.10037757
  • Convolutional Neural Network for Classification of SiO2 Scanning Electron Microscope Images   Order a copy of this article
    by Kavitha Jayaram, G. Prakash, V. Jayaram 
    Abstract: Recent developments in deep learning have made image and speech classification and recognition tasks possible with better accuracy. An attempt was made to automatically extract the required sections from literature published in journals, to analyse and classify them according to their application. This paper presents the classification of high-temperature materials into four categories according to their wide applications: electronic, high temperature, semiconductors, and ceramics. The challenge is to extract the unique features of SEM images, as they are microscopic with different resolutions. A total of 10,000 scanning electron microscope (SEM) images are classified into two labelled categories, namely crystalline and amorphous structures. The image classification and recognition process for SiO2 was implemented using a convolutional neural network (CNN) deep learning framework. Our algorithm successfully classified the test dataset of SEM images with a precision of 96% and an accuracy of 95.5%.
    Keywords: deep learning; machine learning; image classification; convolution neural network; CNN; material.
    DOI: 10.1504/IJBIDM.2022.10038244
  • Rule-based Database Intrusion Detections Using Coactive Artificial Neuro-Fuzzy Inference System and Genetic Algorithm   Order a copy of this article
    by Anitarani Brahma, SUVASINI PANIGRAHI, Neelamani Samal, Debasis Gountia 
    Abstract: Recently, fuzzy systems with learning and adaptation capabilities have been gaining considerable interest in research communities. In the current approach, two of the most successful soft computing approaches with learning capabilities, neural networks and genetic algorithms, are hybridised with the approximate reasoning method of fuzzy systems. The objective of this paper is to develop a coactive neuro-fuzzy inference system with a genetic algorithm-based database intrusion detection system that can detect malicious transactions in databases very efficiently. Experimental investigation and comparative assessment have been conducted against an existing statistical database intrusion detection technique to justify the efficacy of the proposed system.
    Keywords: fuzzy inference system; database intrusion detection; neural network; genetic algorithm; artificial neuro-fuzzy inference system; coactive artificial neuro-fuzzy inference system.
    DOI: 10.1504/IJBIDM.2022.10038259
  • A Regression Model to Evaluate Interactive Question Answering using GEP   Order a copy of this article
    by Mohammad Mehdi Hosseini 
    Abstract: Evaluation plays a pivotal role in interactive question answering (IQA) systems. However, much uncertainty still exists around evaluating IQA systems, and there is practically no specific methodology for doing so. One of the main challenges in designing an assessment method for IQA systems lies in the fact that it is rarely possible to predict the interaction part; to this end, humans need to be involved in the evaluation process. In this paper, an appropriate model is presented by introducing a set of characteristic features for evaluating IQA systems. Data were collected from four IQA systems over various timespans. For the purpose of analysis, pre-processing is performed on each conversation, and the statistical characteristics of the conversations are extracted to form the characteristic matrix. The characteristic matrix is classified into three separate clusters using k-means. Then, an equation is allotted to each of the clusters with an application of gene expression programming (GEP). The results reveal that the proposed model has the least error, with an average root mean square error of 0.09 between the real data and the GEP model.
    Keywords: evaluation; interactive question; answering systems; nonlinear regression; gene expression programming; GEP; feature extraction.
    DOI: 10.1504/IJBIDM.2022.10038261
  • Examining the impact of business intelligence related practices on organizational performance in Oman   Order a copy of this article
    Abstract: Business intelligence can greatly enhance organisational capabilities in devising profitable business actions and activities. It provides an understanding of both current and future trends relating to customers, markets, competitors or regulation and, most importantly, an understanding of organisations' own capabilities to compete. Business intelligence is arguably one of the key drivers of organisational competitiveness. This paper examines the extent to which organisations in Oman embrace business intelligence and the contributions of the different business intelligence components to organisational performance. A quantitative empirical approach is used, with the Microsoft Excel data analysis tool pack as the investigative tool, to analyse and develop a regression model to better understand the impact of business intelligence-related components on organisational performance. The findings show a strong correlation between business intelligence and organisational performance. They also show that having the right IT functionalities, with capable employees using them, is the key to performance enhancement. Furthermore, having IT infrastructure without the appropriate functionalities and personnel, or not embracing business intelligence, will not result in any performance gain.
    Keywords: business intelligence; business intelligence components; organisational performance; Oman.
    DOI: 10.1504/IJBIDM.2022.10038337
  • Next location prediction using Transformers   Order a copy of this article
    by Salah Eddine Henouda, Laallam Fatima Zohra, Okba KAZAR, Abdessamed Sassi 
    Abstract: This work seeks to solve the next location prediction problem for mobile users. Chiefly, we focus on the ROBERTA architecture (robustly optimised BERT approach) in order to build a next location prediction model through the use of a subset of a large real mobility trace database, made available to the public through the CRAWDAD project. ROBERTA, a well-known model in natural language processing (NLP), works by predicting hidden sections of text based on a language masking strategy. The current paper follows a similar architecture to ROBERTA and proposes a new combination of the BERT WordPiece tokeniser and ROBERTA for location prediction, which we call WP-BERTA. The results demonstrate that our proposed model WP-BERTA outperformed the state-of-the-art models, providing a significant improvement in next location prediction accuracy. In particular, WP-BERTA outperformed Markovian models, support vector machines (SVM), convolutional neural networks (CNNs), and long short-term memory networks (LSTMs).
    Keywords: machine learning; deep learning; transformer; neural networks; Wi-Fi; mobility traces; next location prediction; big data.
    DOI: 10.1504/IJBIDM.2022.10038854
  • Supervised and Unsupervised learning for characterizing the industrial material defects   Order a copy of this article
    by P. Radha, N. Selvakumar, J. Raja Sekar, J.V. Johnsonselva 
    Abstract: Ultrasonic-based NDT is used in industries to examine internal defects without damaging the components, since the materials used in industrial standard components must be 100% perfect. Ultrasonic signals are difficult to interpret, and the domain expert has to concentrate on every sampling point to identify a defect. Hence, the existing ultrasonic-based NDT method is improved by applying IoT, machine learning and deep learning techniques to process the ultrasonic data. This work integrates NDT and IoT to analyse the properties of materials using a deep learning-based supervised model, and filters outliers using an unsupervised model, namely a density-based clustering method. After the different categories of defects are analysed, notifications are sent to the mobile devices of various stakeholders, using advanced communication techniques, so that defective components can be repaired or replaced and expensive experimentation or maintenance avoided.
    Keywords: ultrasonic testing; internet of things; IoT; machine learning; density based clustering; deep learning; deep neural network; DNN.
    DOI: 10.1504/IJBIDM.2022.10039148
  • An Optimal Dimension Reduction Strategy and Experimental Evaluation for Parkinson’s Disease Classification   Order a copy of this article
    by Saidulu D, Sasikala Ramasamy 
    Abstract: The amount of data streamed and generated through various healthcare systems is increasing exponentially day by day. Applying traditional data mining algorithms to this massive data to construct automated decision support systems is a tedious and time-consuming task. In recent years, there has been increasing interest in the development of telediagnosis and telemonitoring systems for Parkinson's disease (PD). Parkinson's disease is a progressive neurodegenerative disease which affects movement characteristics. PD patients commonly face vocal impairments during the early stages of the disease. This work proposes a computationally efficient method for dimension reduction and classification of healthcare-related data. The devised framework is capable of dealing with data having discrete as well as continuous features. The experimental evaluation is performed on the Parkinson's disease classification database (Sakar et al., 2018). The statistical performance metrics used are validation and test accuracy, precision, recall, F1-score, etc. There are computational complexity advantages when this reduced-dimension data is further processed for modelling and building a prediction system. In order to prove the optimality of the proposed framework, a comparative analysis is performed with significant existing approaches.
    Keywords: big data; learning; dimension reduction; machine learning; knowledge discovery; information retrieval.
    DOI: 10.1504/IJBIDM.2022.10040204
  • Detection of Spammers disseminating obscene content on Twitter   Order a copy of this article
    by Deepali Dhaka, Surbhi Kakar, Monica Mehrotra 
    Abstract: Spammers distributing adult content are becoming an apparent yet intrusive problem with the increasing prevalence of online social networks. To improve the user experience and, especially, to prevent exposure for users of lower age groups, these accounts need to be detected efficiently. In this work, a model is proposed in which a lexicon-based approach is used to label users with their values. This study is based on the fact that users behave according to the values they possess. The amalgamation of content-based features, such as values, the entropy of words and lexical diversity, with context-based word embeddings is found to be robust. Among several machine learning models, XGBoost performs exceedingly well, with an accuracy of 92.28 ± 1.28% for all features. Feature importance and the discriminative power of the features are also shown. A comparative study with one of the latest approaches finds our approach to be more efficient.
    Keywords: values; emotions; Twitter; online social network; spammer; pornographic spammer.
    DOI: 10.1504/IJBIDM.2022.10040432
  • Suspicious Tweet Identification Using Machine Learning Approaches for Improving Social Media Marketing Analysis   Order a copy of this article
    by Senthil Arasu Balasubramanian, Jonath BackiaSeelan, Thamaraiselvan Natarajan 
    Abstract: Social media acts as one of the eminent platforms for communication. Twitter is one of the leading social media microblogging platforms, where users can post and interact. #Hashtags specify the Twitter trends on a certain topic. Currently, the hashtag value or trend ranking for a particular hashtag is calculated based on the cumulative number of tweets. This kind of cumulative hashtag ranking may result in the anonymous intervention of irrelevant tweets, which affects social media marketing. The proposed approach uses the relevance of tweets and #hashtags to identify suspicious or irrelevant tweets and thereby improve media marketing. The proposed research work uses the linear regression algorithm, one of the familiar machine learning approaches, to explain spam tweet generation and the method of identifying it. The test results show that the proposed system achieves 84% significance when compared with existing market analysis algorithms.
    Keywords: tweets; hashtags; trend prediction; linear regression; social media marketing.
    DOI: 10.1504/IJBIDM.2022.10040478
  • An Evolutionary-based Approach for Providing Accurate and Novel Recommendations   Order a copy of this article
    by Chemseddine Berbague, Hassina Seridi, Nour El-Islam Karabadji, Panagiotis Symeonidis, Markus Zanker 
    Abstract: For memory-based collaborative filtering, the quality of the target user's neighbourhood plays an important role in providing him/her with successful item recommendations. Existing techniques for neighbourhood selection aim to maximise the pairwise similarity between the target user and his/her neighbours, which mainly improves only the recommendation accuracy. However, these methods do not consider other important aspects of successful recommendations, such as providing diversified and novel item recommendations, which also strongly affect users' satisfaction. In this paper, we linearly combine two probabilistic criteria for selecting the right neighbourhood of a target user to provide him/her with accurate and novel item recommendations. The combination of these two probabilistic quality measures forms a fitness function, which guides the evolution of a genetic algorithm. For each target user, the genetic algorithm explores the user's whole search space and selects the most suitable neighbourhood for him/her. Thus, our approach strikes a balance between the accuracy and the novelty of the provided item recommendations, as is experimentally shown on the MovieLens dataset.
    Keywords: genetic algorithm; neighbourhood selection; novelty; diversity; relevancy; cold start problem.
    DOI: 10.1504/IJBIDM.2022.10040584
  • Leveraging the Fog based Machine Learning Model for ECG based Coronary disease prediction   Order a copy of this article
    by Hanumantharaju R, Shreenath KN, Sowmya BJ, K.G. Srinivasa 
    Abstract: Smart healthcare systems need a remote monitoring system based on the internet of things. Smart healthcare services are an innovative way of synergising the benefits of sensors with large-scale analytics to deliver better patient care. The work provides healthcare services to the sick as well as to the healthy population through remote observation, using detailed calculations, tools and methods for better care. The proposed system integrates an architecture based on IoT, fog computing and machine learning (ML) algorithms. The collected heart disease data are loaded, filtered and have their attributes extracted at the fog layer, and the classification model is built at the fog nodes. The output of the model is sent to the cloud layer to train classifiers. The cloud layer evaluates the ML algorithms to predict disease. Results show that random forest has better feature extraction than naive Bayes, outperforming it by 3% in precision, 3% in recall and 13% in F-measure.
    Keywords: internet of things; IoT; machine learning; random forest; naive Bayes; fog layer; remote monitoring; feature extraction.
    DOI: 10.1504/IJBIDM.2022.10041200
  • A predictive model of electricity quality indicator in distribution subsidiaries   Order a copy of this article
    by Ana Flávia L. Gonçalves, Rafael Frinhani, Bruno G. Batista, Rafael P. Pagan, Edvard M. De Oliveira, Bruno T. Kuehne, João Paulo R. R. Leite, João Víctor De M. S. Gomes 
    Abstract: Electricity concessionaires pay out large financial amounts annually in compensation to consumers who experience service unavailability. Availability of the energy supply is a major challenge because the distribution infrastructure is constantly affected by climatic, environmental, and social causes. To assist decision making in mitigating grid failures, this study aims to predict the number of incidences of electricity shortage for consumers. A predictive model was developed using predictive data analysis and conforms to a knowledge discovery process. A hybrid classifier was developed from the model, using both unsupervised and supervised methods. The experiments were carried out with real incidence and climatic data from four subsidiaries of an energy concessionaire. The results show the forecasting model's feasibility, with classification accuracy between 58.33% and 91.66%. The results also show that peculiarities in terms of geographic location, energy demand, and climatic conditions make it difficult to use a generic prediction model.
    Keywords: electric quality indicator; predictive data analysis; machine learning; unsupervised methods; supervised methods; knowledge discovery in data.
    DOI: 10.1504/IJBIDM.2022.10041550
  • Real-Time Predictive Big Data Analytics System: Forecasting Stock Trend Using Technical Indicators   Order a copy of this article
    by Myat Cho Mon Oo  
    Abstract: The emergence of financial big data in stocks has caused dramatic changes, and predictive analytics systems require a scalable architecture to intelligently process these data. In this paper, a real-time predictive big data analytics (RPBA) system is proposed, using technical indicators to predict stock market trends. Scalable random forest (SRF) is enhanced as a financial instrument through hyperparameter optimisation. This paper explores a novel alternative combining feature engineering and enhanced SRF to maximise the desired measure of stock prediction models, based on data from four stock periods: inactive, sub-active, active, and strong-active. The empirical findings indicate that the proposed RPBA system can provide high predictability of 85% for short-term and 99% for long-term predictions over eight real-time financial stock markets.
    Keywords: big data; predictive analytics system; technical indicators; stock trend.
    DOI: 10.1504/IJBIDM.2022.10041467
    by Alok Khode, Sagar Jambhorkar 
    Abstract: Patents are critical intellectual assets for any business. With the rapid increase in patent filings, patent prior art retrieval has become an important task. The goal of prior art retrieval is to find documents relevant to a patent application. Due to the special nature of patent documents, relying only on keyword-based queries does not prove effective in patent retrieval. Previous works have used the international patent classification (IPC) to improve the effectiveness of keyword-based search. However, these systems have used a two-stage retrieval process, employing the IPC mostly to filter patent documents or to re-rank the documents retrieved by a keyword-based query. In the approach proposed in this paper, weighted IPC code hierarchies are explored to augment keyword-based search, thereby eliminating the need for an additional processing step. Experiments on the CLEF-IP 2011 benchmark dataset show that the proposed approach outperforms the baseline on MAP, recall and PRES.
    Keywords: patent retrieval; prior art search; international patent classification; IPC; query formulation; query expansion; information retrieval; IPC hierarchy; weighted IPC.
    DOI: 10.1504/IJBIDM.2022.10041582
  • A Review of Scalable Time Series Pattern Recognition   Order a copy of this article
    by Kwan Hua Sim, Kwan Yong Sim, Valliappan Raman 
    Abstract: Time series data mining helps derive new, meaningful and hidden knowledge from time series data. Thus, time series pattern recognition has been the core functionality in time series data mining applications. However, mining unknown scalable time series patterns with variable lengths is by no means trivial. It can result in quadratic computational complexity in the search space, which is computationally untenable even with state-of-the-art time series pattern mining algorithms. The mining of scalable unknown time series patterns also requires a superior similarity measure, which is clearly beyond the capability of standard distance measures for time series. There has been a deadlock in the pursuit of a robust similarity measure while trying to contain the complexity of the time series pattern search algorithm. This paper aims to provide a review of the existing literature on time series pattern recognition by highlighting the challenges and gaps in scalable time series pattern mining.
    Keywords: time series pattern recognition; scalable time series pattern matching; motif discovery; time series data mining; distance measure; dimension reduction; sliding window search.
    DOI: 10.1504/IJBIDM.2022.10041672
  • Optimizing the number of course sections given optimal course sequence to support student retention   Order a copy of this article
    by Akash Gupta, Amir Gharehgozli, Seung-Kuk Paik 
    Abstract: Although higher education institutions strive to create environments that foster student retention, many students depart before graduation. It is therefore paramount to understand the important factors that drive student retention. We observed that student retention is tied to the student's grade point average (GPA) and, subsequently, that the GPA is correlated with the order in which students enroll in courses. In this study, initially using statistical methods, we determine the best order for taking core courses. Then, we develop a prescriptive model using mixed-integer linear programming. This model determines the optimal number of sections to be offered for each course so that the maximum number of students can follow the optimal course order in a resource-constrained environment. We also propose heuristic subroutines to solve the proposed model and determine the optimal number of sections for each course. In addition, we highlight the social and demographic factors that influence student retention. This study helps college administrations plan courses so that student retention can be improved.
    Keywords: education; student success; data analytics; retention; course sequence.
    DOI: 10.1504/IJBIDM.2022.10042240
  • Exploring Appropriate ERP Framework towards Indian Small and Medium Enterprises using Decision Tree   Order a copy of this article
    by Aveek Basu, Sraboni Dutta, Sanchita Ghosh 
    Abstract: Small and medium enterprises (SMEs) enhance the outcome of their various business processes by implementing an ERP framework. However, they are in a muddle while selecting the appropriate ERP, as an on-premise solution entails a large upfront capital expense, which ultimately raises questions about the sustainability of these small firms, especially in this pandemic situation. A cloud-based ERP system can reduce the risk to a certain level due to its low infrastructure cost and flexible payment options, but has its own constraints. Thus, selection of the appropriate ERP is always a challenge, which motivates the current researchers to explore a decision tree-based technique to predict the most suitable framework to be adopted by an SME in a specific situation. The inferences drawn from the decision tree clearly show the efficacy of the implemented technique, as the right decision can be derived easily by traversing the tree.
    Keywords: enterprise resource planning; ERP; Cloud ERP; on premise ERP; hybrid ERP; small and medium enterprise; SME; decision tree.
    DOI: 10.1504/IJBIDM.2022.10042760
  • A cluster and label approach for classifying imbalanced data streams in the presence of scarcely labeled data   Order a copy of this article
    by Kiran Bhowmick, Meera Narvekar 
    Abstract: Classifying imbalanced data streams is often a challenging task, primarily due to the continuous flow of infinite data and the unavailability of class labels. The problem is two-fold when the stream is imbalanced in nature. Due to the characteristics of data streams, it is impossible to store and process the entire data while also dealing with the imbalance. There is a need for a solution that can cope with the unavailability of class labels and classify imbalanced data streams. This paper proposes a semi-supervised learning (SSL)-based model to classify scarcely labelled imbalanced data streams. A modified cluster and label SSL approach is proposed that uses expectation maximisation for clustering and similarity-based label propagation for labelling the unlabelled clusters. The model also employs a novel imbalance-sensitive cluster merge technique to deal with the imbalanced data. The results show that the model outperforms standard stream classification algorithms.
    Keywords: data streams; classification; imbalanced data; semi-supervised learning; scarcely labelled; cluster and label; micro cluster; label propagation.
    DOI: 10.1504/IJBIDM.2022.10042780
  • Application of a Record Linkage Software to Identify Mortality of Enrollees of Large Integrated Health Care Organization   Order a copy of this article
    by Yichen Zhou, Zhi Liang, Sungching Glenn, Wansu Chen, Fagen Xie 
    Abstract: Information on mortality is important for the improvement of public health and the conduct of medical research. Health care organisations typically lack complete and accurate information on mortality. This paper proposes a comprehensive process to link the records of the enrollees of a health care organisation with the 2015 death records obtained from the State of California via a commercial data linkage software package. The developed linkage process successfully identified 23,628 and 21,009 death records of health plan enrollees from the state file after the initial and second post-linkage, respectively. Validation of the linkage process against the death records documented in the internal systems of the organisation achieved a sensitivity of 97.5% and a positive predictive value of 88.7% at the time of initial linkage, the latter increasing to 99.4% within three years as more information became available. The linkage process demonstrated high accuracy and can be utilised to support various business needs.
    Keywords: data cleaning; data standardisation; data matching; mortality linkage.
    DOI: 10.1504/IJBIDM.2022.10042864
  • Exploring Outliers in Global Economic Dataset having the Impact of Covid-19 Pandemic   Order a copy of this article
    by Anindita Desarkar, Ajanta Das, Chitrita Chaudhuri 
    Abstract: An outlier is a value that lies outside most of the other values in a dataset. Outlier exploration is hugely important in many industry applications, such as medical diagnosis, credit card fraud detection and intrusion detection systems. Similarly, in the economic domain, it can be applied to analyse many unexpected events and harvest new knowledge, such as a sudden stock market crash, a mismatch between a country's per capita income and its overall development, an abrupt change in the unemployment rate, or a steep fall in bank interest rates. These situations can arise for several reasons, of which the present covid-19 pandemic is a leading one. This motivates the present researchers to identify a few such vulnerable areas in the economic sphere and ferret out the most affected countries for each of them. Two well-known machine learning techniques, DBSCAN and Z-score, are utilised to gain these insights, which can serve as a guideline towards subsequently improving the overall scenario.
    Keywords: economic outlier; machine learning; gross domestic product; GDP; per capita; human development index; HDI; covid-19 pandemic; total death percentage.
    DOI: 10.1504/IJBIDM.2022.10043040
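As a rough illustration of one of the two techniques named in the abstract above, the sketch below flags Z-score outliers in a small series; the growth figures and the threshold of 2 are hypothetical, not taken from the paper.

```python
# Minimal Z-score outlier detection sketch. The "growth" figures below
# are made-up illustrative values, not real economic data.
from statistics import mean, stdev

def z_score_outliers(values, threshold=2.0):
    """Return the values whose Z-score magnitude exceeds the threshold."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs((v - mu) / sigma) > threshold]

# Seven "normal" years plus one pandemic-style shock.
growth = [2.1, 1.8, 2.4, 2.0, 1.9, 2.2, -9.5, 2.3]
print(z_score_outliers(growth))  # -> [-9.5]
```

DBSCAN, the paper's other technique, instead marks points as outliers when they fall in no density-connected cluster, which suits multi-dimensional indicators such as GDP per capita versus HDI.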
    by Kellen Endler, Cassius Tadeu Scarpin, Maria Teresinha Arns Steiner, Tamires Almeida Sfeir, Claudimar Pereira Da Veiga 
    Abstract: The purpose of this article is to present a methodology based on the knowledge discovery in databases (KDD) process to predict the expenditure of different customer profiles, considering their characteristics and the type of store they would buy from, in one of the largest retail chains in the Brazilian supermarket and hypermarket segment. These stores have different characteristics, such as physical size, product assortment and customer profile. This heterogeneity in terms of commercial offers means that customers' desire for consumption differs from store to store, depending on how their preferences are met. The proposed methodology was applied to a real marketing case in a business-to-consumer (B2C) environment to aid retailers during the segmentation process. The results show that it is possible to highlight relationships in the data that enable the prediction of customers' consumption, which can contribute towards generating useful information for retail businesses.
    Keywords: knowledge discovery in databases; KDD; data mining; market segmentation; retail; principal component analysis; PCA; cluster analysis; multiple linear regression.
    DOI: 10.1504/IJBIDM.2022.10043148
  • Performance Evaluation of Oversampling Algorithm: MAHAKIL using Ensemble Classifiers   Order a copy of this article
    by C. Arun, C. Lakshmi 
    Abstract: Class imbalance is a known problem that exists in real-world applications and consists of a disparity in the number of samples of different classes, which results in biased performance. The class imbalance issue has been addressed by many sampling techniques, which fall into either oversampling approaches, which solve the issue to a greater extent, or undersampling approaches. MAHAKIL is a diversity-based oversampling approach influenced by the theory of inheritance, in which minority samples are synthesised to balance the classes using the Mahalanobis distance measure. In this study, the performance of the MAHAKIL algorithm is tested using various ensemble classifiers, which have proved effective due to their multi-hypothesis learning approach and better performance. The results of experiments conducted on 20 imbalanced software defect prediction datasets using six different ensemble approaches show that XGBoost provides better performance and a reduced false alarm rate compared to the other models.
    Keywords: class imbalance; software fault prediction; synthetic samples; over sampling techniques; MAHAKIL; false alarm rate; evolutionary algorithm; ensemble; inheritance.
    DOI: 10.1504/IJBIDM.2022.10043149
  • Machine learning based forecasting of significant daily returns in foreign exchange markets   Order a copy of this article
    by Firuz Kamalov, Ikhlaas Gurrib 
    Abstract: Financial forecasting has always attracted an enormous amount of interest among researchers in quantitative analysis. The advent of modern machine learning models has introduced new tools to tackle this classical problem. In this paper, we apply machine learning algorithms to a hitherto unexplored question of forecasting instances of significant fluctuations in currency exchange rates. We carry out an extensive comparative study of ten modern machine learning methods. In our experiments, we use data on four major currency pairs over a 20-year period. A key contribution is the novel use of outlier detection methods for this purpose. Numerical experiments show that outlier detection methods substantially outperform traditional machine learning and finance techniques. In addition, we show that a recently proposed new outlier detection method PKDE produces the best overall results. Our findings hold across different currency pairs, significance levels, and time horizons indicating the robustness of the proposed method.
    Keywords: foreign exchange; forecasting; machine learning; outlier detection; kernel density estimation; KDE; neural networks; tail events.
    DOI: 10.1504/IJBIDM.2022.10043208
  • Using unstructured logs generated in complex large scale micro-service-based architecture for data analysis   Order a copy of this article
    by Anukampa Behera, Sitesh Behera, Chhabi Rani Panigrahi, Tien-Hsiung Weng 
    Abstract: With the deployment of complex large-scale micro-service architectures, the data generated by all these systems makes a typical production infrastructure huge, complicated and difficult to manage. In this scenario, logs play a major role and can be considered an important source of information in a large-scale secured environment. To date, many researchers have contributed various methods for converting unstructured logs to structured ones. However, after conversion, the dimensionality of the generated dataset increases many-fold, becoming too complex for data analysis. In this paper, we discuss techniques and methods to extract all features from a produced structured log and to reduce N-dimensional features to fixed dimensions, without compromising the quality of the data, in a cost-efficient manner that can be used for any further machine learning-based analysis.
    Keywords: json data; micro services; data parsing; principal component analysis; PCA; multivariate data; unstructured data; tagged data; feature reduction.
    DOI: 10.1504/IJBIDM.2022.10043252
  • Approaches to Parallelize Eclat algorithm and Analyzing its Performance for K Length Prefix based Equivalence Classes   Order a copy of this article
    by C.G. Anupama, C. Lakshmi 
    Abstract: Frequent itemset mining (FIM) is one of the most prevalent and well-known methods of data mining and a topic of interest for researchers in the field of decision making. With the establishment of the big data era, where data of enormous volume and variety is continuously generated from multidimensional sources in an almost unrevealed way, transforming this data into valuable knowledge that helps organisations make efficient decisions poses a challenge for present research. This leads to the problem of discovering the maximal frequent patterns in vast datasets and of creating a more generalised and interpretable representation of veracity. Targeting the problems stated above, this paper suggests a parallelisation method suitable for any type of parallel environment. The implemented algorithm can be run on a single computer with a multi-core processor as well as on a cluster of such machines.
    Keywords: item set mining; frequent items; frequent patterns; Eclat; parallel Eclat; frequent item set mining; FIM.
    DOI: 10.1504/IJBIDM.2022.10043400
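For readers unfamiliar with Eclat, the serial algorithm that the paper above parallelises, the sketch below shows its core idea: each item is stored as a vertical tidset (the set of transaction IDs containing it), and frequent itemsets are grown depth-first by intersecting tidsets. The toy transaction database and minimum support are illustrative only; the paper's contribution, partitioning this search by k-length prefix equivalence classes across workers, is not reproduced here.

```python
# Minimal serial Eclat sketch: vertical tidsets plus depth-first
# tidset intersection. Toy data, not the paper's datasets.
def eclat(transactions, min_support):
    # Build the vertical representation: item -> set of transaction ids.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    frequent = {}

    def recurse(prefix, candidates):
        for i, (item, tids) in enumerate(candidates):
            if len(tids) >= min_support:
                itemset = prefix + (item,)
                frequent[itemset] = len(tids)
                # Extend the prefix by intersecting with later items' tidsets.
                rest = [(it2, tids & t2) for it2, t2 in candidates[i + 1:]]
                recurse(itemset, rest)

    recurse((), sorted(tidsets.items()))
    return frequent

db = [{'a', 'b', 'c'}, {'a', 'b'}, {'a', 'c'}, {'b', 'c'}]
print(eclat(db, min_support=2))
```

In a prefix-based parallelisation, each equivalence class (all itemsets sharing the same k-length prefix) forms an independent `recurse` call, so classes can be distributed across cores or cluster nodes without shared state.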
  • Mining Models for Predicting Product Quality Properties of Petroleum Products   Order a copy of this article
    by Ng'ambilani Zulu, Douglas Kunda 
    Abstract: A huge amount of raw data is generated during the production processes of refinery products, and in most cases this data remains under-utilised for knowledge acquisition and decision making. The purpose of this study was to demonstrate how data mining techniques can be used to develop models that predict the product quality properties of petroleum products. This study used raw petroleum refinery production data to build predictive models for product quality control activities. Plant and laboratory data covering a period of about 18 months were mined from the refinery repositories in order to build the datasets required for analysis using the Orange3 data mining software. Four data mining algorithms were chosen for experiments in order to determine the best predicting model, with cross-validation as the validation method. This study employed two metrics, classification accuracy (CA) and root mean square error (RMSE), as performance indicators. Random forest came out as the best performing model, suitable for predicting both categorical (CA) and numeric (RMSE) data. The study was also able to establish relationships between the variables that could be used in critical operational decisions.
    Keywords: data mining; machine learning; industries; petroleum refinery; product quality; parameter optimisation.
    DOI: 10.1504/IJBIDM.2023.10043436
  • Fraud Detection with Machine Learning - Model Comparison   Order a copy of this article
    by João Carlos Pacheco Junior, João Luiz Chela, Guilherme Ferreira Pelucio Salome 
    Abstract: This work evaluates the performance of different models for predicting three types of fraudulent behaviour in a novel dataset with imbalanced data. The logistic regression model, a staple in the credit risk industry, is compared to several machine learning models. This work shows that in the binary classification case, all compared models achieved results similar to the logistic regression. The random forest model showed superior performance when classifying credit frauds ending in lawsuits. In the multi-label classification case, the logistic regression attains high levels of precision for all types of fraud, but at lower recall rates, whereas the random forest model achieves higher recall rates, but with lower precision.
    Keywords: fraud detection; machine learning; imbalanced data; multi-label classification.
    DOI: 10.1504/IJBIDM.2023.10044239
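The precision/recall trade-off described in the abstract above can be made concrete with a toy computation; the labels and predictions below are invented for illustration and are not the paper's data.

```python
# Toy illustration of the precision/recall trade-off between a
# conservative classifier (like the logistic regression described
# above) and an aggressive one (like the random forest).
def precision_recall(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

y_true = [1, 1, 1, 1, 0, 0, 0, 0]          # 1 = fraud, 0 = legitimate
conservative = [1, 0, 0, 0, 0, 0, 0, 0]    # few fraud calls: precise, low recall
aggressive = [1, 1, 1, 1, 1, 1, 0, 0]      # many fraud calls: full recall, less precise
print(precision_recall(y_true, conservative))  # -> (1.0, 0.25)
print(precision_recall(y_true, aggressive))    # precision 2/3, recall 1.0
```

Which profile is preferable depends on the relative cost of missed frauds versus false alarms, which is why the paper reports both metrics per model.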
    by Goutam Mylavarapu, K. Ashwin Viswanathan, Johnson P. Thomas 
    Abstract: Data analysis is a crucial process in the field of data science that extracts useful information from any form of data. With the rapid growth of technology, more and more unstructured data, such as text and images, is being produced in large amounts. Apart from the analytical techniques used, the quality of the data plays a prominent role in accurate analysis. Data quality becomes inferior due to poor maintenance and the mediocre data generation strategies employed by amateur users. This problem escalates with the advent of big data. In this paper, we propose a quality assessment model for the textual form of unstructured data (TDQA). The context of data plays an important role in determining its quality. Therefore, we automate the process of context extraction from textual data using natural language processing to identify data errors and assess quality.
    Keywords: automated data quality assessment; textual data; context-aware; data context; sentiment analysis; lexicon; Doc2Vec; data accuracy; data consistency.
    DOI: 10.1504/IJBIDM.2023.10044353
  • A deep regression convolutional neural network using whole image-based inferencing for dynamic visual crowd estimation   Order a copy of this article
    by Shen Khang Teoh, Vooi Voon Yap, Humaira Nisar 
    Abstract: As intelligent surveillance system applications become ubiquitous, automated crowd counting solutions must be made continually faster and more accurate. This paper presents an improved convolutional neural network (CNN) architecture for accurate visual crowd counting in crowd images. The multi-column convolutional neural network (MCNN) is widely used in previous works to predict the density map for visual crowd counting. However, this method has limitations in predicting a quality density map. Instead, the proposed model is architected using powerful CNN layers, dense layers, and one regressor node with whole image-based inference. It is therefore less computationally intensive, and inference speed can be increased. Tested on the mall dataset, the proposed model achieved a mean absolute error of 2.01 and a mean square error of 8.53. Moreover, benchmarking against different CNN architectures has been conducted. The proposed model shows promising counting accuracy and reasonable inference speed compared to the existing state-of-the-art approaches.
    Keywords: visual crowd counting; convolutional neural network; CNN; whole image-based inference; edge embedded platform; multi-column convolutional neural network; MCNN.
    DOI: 10.1504/IJBIDM.2022.10044713
  • Factors That Drive the Selection of Business Intelligence Tools in South African Financial Services Providers   Order a copy of this article
    by Bonginkosi P. Gina, Adheesh Budree 
    Abstract: Innovation and technology advancements in information systems (IS) have resulted in a multitude of product offerings and business intelligence (BI) software tools in the market for implementing business intelligence systems (BIS). As a result, a high proportion of organisations fail to employ suitable software tools that meet organisational needs. The study aimed to discover the critical factors influencing the selection of BI tools. This was a quantitative study, and questionnaire-survey data was collected from 92 participants. The data was analysed using the SPSS and SmartPLS-3 software packages to test the significance of influential factors. The findings showed that software tool technical factors, vendor technical factors, and opinion non-technical factors are significant. The study contributes to both academia and industry by providing the influential determinants for software tool selection. It is hoped that the findings presented will contribute to a greater understanding of the factors influencing the selection of BI tools among researchers and practitioners alike.
    Keywords: business intelligence tools; BITs; business intelligence systems; BIS; business intelligence; BI; software factors; software selection; software tool.
    DOI: 10.1504/IJBIDM.2023.10044714
  • Effect of IT Integration on Firm performance: The Mediating Role of Supply Chain Integration and Flexibility   Order a copy of this article
    by Gaurav Abhishek Tigga, Ganesan Kannabiran, P. Sridevi 
    Abstract: IT integration (ITI) complements functional and operational processes, and helps the firm develop an inimitable competitive advantage. The study examines the effect of ITI on supply chain integration (SCI), supplier flexibility (SF) and manufacturing flexibility, and their subsequent effects on firm performance (FP). The extended resource-based view is used as the theoretical perspective to develop the research model. A survey was carried out among the manufacturing industries in India. Structural equation modelling with the partial least squares algorithm was used to test the hypotheses proposed in the study. The results show that ITI has a significant effect on SCI, manufacturing flexibility and SF, and subsequently affects FP.
    Keywords: IT integration; supply chain integration; supplier flexibility; manufacturing flexibility; firm performance.
    DOI: 10.1504/IJBIDM.2022.10044810
  • Credit Card Fraud Detection: An Evaluation of SMOTE Resampling and Machine Learning Model Performance   Order a copy of this article
    by Faleh Alshameri, Ran Xia 
    Abstract: Credit card fraud has been a noted security issue that requires financial organisations to continuously improve their fraud detection system. In most cases, a credit transaction dataset is expected to have a significantly larger number of normal transactions than fraud transactions. Therefore, the accuracy of a fraud detection system depends on building a model that can adequately handle such an imbalanced dataset. The purpose of this paper is to explore one of the techniques of dataset rebalancing, the synthetic minority oversampling technique (SMOTE). To evaluate the effects of this technique on model training, we selected four basic classification algorithms, complement naïve Bayes (CNB), K-nearest neighbour (KNN), random forest and support vector machine (SVM). We then compared the performances of the four models trained on the rebalanced and original dataset using the area under precision-recall curve (AUPRC) plots.
    Keywords: credit card; imbalanced dataset; resampling method; synthetic minority oversampling technique; SMOTE; AUPRC; classification algorithms.
    DOI: 10.1504/IJBIDM.2023.10044811
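The core idea of SMOTE, evaluated in the paper above, is to synthesise new minority samples by interpolating between a minority point and one of its nearest minority neighbours. The sketch below illustrates that idea only; the toy points, neighbour count and sample count are hypothetical, and the paper's experiments use the imbalanced-learn style implementation with full classifier training, not this simplification.

```python
# Hedged sketch of SMOTE-style oversampling: each synthetic point is a
# random interpolation between a minority sample and one of its k
# nearest minority neighbours. Toy data, not the paper's dataset.
import math
import random

def smote(minority, n_synthetic, k=2, seed=0):
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (Euclidean).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

fraud = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3)]  # toy minority (fraud) class
new_points = smote(fraud, n_synthetic=4)
print(len(new_points))  # -> 4 synthetic fraud-like samples
```

Because each synthetic point lies on a segment between two real minority points, the technique densifies the minority region rather than duplicating samples, which is what allows the classifiers compared in the paper to learn a less biased decision boundary.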
  • On Prevention of Attribute Disclosure and Identity Disclosure Against Insider Attack in Collaborative Social Network Data Publishing   Order a copy of this article
    by Bintu Kadhiwala, Sankita Patel 
    Abstract: In a collaborative social network data publishing setup, the privacy preservation of individuals is a vital issue. Existing privacy-preserving techniques assume that attackers come from among the external data recipients and hence are vulnerable to insider attacks performed by colluding data providers. Additionally, these techniques protect data against identity disclosure but not against attribute disclosure. To overcome these limitations, in this paper, we address the problem of privacy-preserving data publishing for collaborative social networks. Our motive is to prevent both attribute and identity disclosure of collaborative social network data under insider attack. For this purpose, we propose an approach that utilises the p-sensitive k-anonymity and m-privacy techniques. Experimental outcomes affirm that our approach preserves privacy with a reasonable increase in information loss and maintains adequate utility of the collaborative social network data.
    Keywords: collaborative social network data publishing; attribute disclosure; identity disclosure; insider attack; k-anonymity; m-privacy.
    DOI: 10.1504/IJBIDM.2023.10045007
  • Identification of Authorship and Prevention Fraudulent Transactions / Cybercrime using Efficient High Performance Machine Learning Techniques   Order a copy of this article
    by Sowmya BJ, Hanumantharaju R, Pradeep Kumar D, Srinivasa K. G 
    Abstract: Cognitive computing refers to the use of computer models to simulate human intelligence and thought processes in complex situations. Artificial intelligence (AI) is an augmentation of the limits of human capacity for a particular domain and works as an absolute reflection of reality, where a computer program is able to efficiently make decisions without previous explicit knowledge and instruction. The concept of cognitive intelligence is introduced; the most interesting use case for this would be an AI bot that doubles as a digital assistant. The work is aimed at solving core problems in AI such as open-domain question answering, context understanding, aspect-based sentiment analysis and text generation. It presents a model that develops a multi-resolution RNN to identify local and global context, develops contextual embeddings via transformers to pass into a seq2seq architecture, adds heavy regularisation, augments data with reinforcement learning, and optimises via recursive neural networks.
    Keywords: cognitive computing; artificial intelligence; AI; data augmentation; human intelligence; recurrent neural network; transformer model.
    DOI: 10.1504/IJBIDM.2022.10045310
  • Forecasting With Information Extracted From The Residuals of ARIMA In Financial Time Series Using Continuous Wavelet Transform   Order a copy of this article
    by Heng Yew Lee, Woan Lin Beh, Kong Hoong Lem 
    Abstract: Time series of financial or economic data are often considered to have certain trends and patterns, and it is believed that the study of historical patterns helps in forecasting the future. The ARIMA model is one of the popular models for this task. However, long-term forecasting with ARIMA often appears as a straight line, due to ARIMA's dependency on previous values and its tendency to omit the outliers that lie outside the captured general trend. This paper sought to capture useful outlier information from the residuals of ARIMA modelling by using the continuous wavelet transform (CWT). The CWT-captured information was then added to the ARIMA forecast values to form a non-homogeneous long-term forecast. The final results were encouraging. It was also found that the choice of certain CWT-related parameters has positive or negative effects on the forecasting outcomes.
    Keywords: wavelet; forecasting; autoregressive integrated moving average; ARIMA; time series; continuous wavelet transform; CWT.
    DOI: 10.1504/IJBIDM.2022.10045646
  • DAMIAN -Data Accrual Machine Intelligence with Augmented Networks for Contextually Coherent Creative Story Generation   Order a copy of this article
    by Sowmya BJ, Pradeep Kumar D, Hanumantharaju R, Srinivasa K. G 
    Abstract: Cognitive computing refers to the use of computer models to simulate human intelligence and thought processes in complex situations. Artificial intelligence (AI) is an augmentation of the limits of human capacity for a particular domain and works as an absolute reflection of reality, where a computer program is able to efficiently make decisions without previous explicit knowledge and instruction. The concept of cognitive intelligence is introduced; the most interesting use case for this would be an AI bot that doubles as a digital assistant. The work is aimed at solving core problems in AI such as open-domain question answering, context understanding, aspect-based sentiment analysis and text generation. It presents a model that develops a multi-resolution RNN to identify local and global context, develops contextual embeddings via transformers to pass into a seq2seq architecture, adds heavy regularisation, augments data with reinforcement learning, and optimises via recursive neural networks.
    Keywords: cognitive computing; artificial intelligence; AI; data augmentation; human intelligence; recurrent neural network; transformer model.
    DOI: 10.1504/IJBIDM.2022.10045744
  • EmoRile: A Personalized Emoji Prediction Scheme Based on User Profiling   Order a copy of this article
    by Vandita Grover, Hema Banati 
    Abstract: Emojis are widely used to express emotions and complement text communication. Existing approaches to emoji prediction are generic and generally utilise text or time for emoji prediction. However, research reveals that emoji usage differs among users, so individual users' preferences for certain emojis need to be captured when predicting emojis for them. In this paper, a novel emoji-usage-based profiling scheme, EmoRile, is proposed. In EmoRile, emoji-usage-based user profiles were created, which was accomplished by compiling a new dataset that also includes user information. Distinct models with different combinations of text, text sentiment, and users' preferred emojis were created for emoji prediction. These models were tested on various architectures with a very large emoji label space. Rigorous experimentation showed that even with a large label space, EmoRile predicted emojis with accuracy similar to existing emoji prediction approaches that use a smaller label space, making it a competitive emoji prediction approach.
    Keywords: emojis in sentiment analysis; emoji prediction; user profile-based emojis.
    DOI: 10.1504/IJBIDM.2023.10045810
  • Brain Hemorrhage Classification from CT Scan Images using Fine-tuned Transfer Learning Deep Features   Order a copy of this article
    by Arpita Ghosh, Badal Soni, Ujwala Baruah 
    Abstract: Classification of brain haemorrhage is a challenging task that needs to be solved to help advance medical treatment. Recently, it has been observed that efficient deep learning architectures have been developed to detect such bleeding accurately. The proposed system includes two different transfer learning strategies to train and fine-tune ImageNet pre-trained state-of-the-art architectures such as VGG16, Inception V3 and DenseNet121. Evaluation metrics were calculated based on the performance analysis of the employed networks. Experimental results show that the modified fine-tuned Inception V3 performs well and achieved the highest test accuracy.
    Keywords: transfer learning; VGG 16; Inception V3; DenseNet121; brain haemorrhage; ReLU; binary cross entropy.
    DOI: 10.1504/IJBIDM.2022.10046012
  • A Novel Classification-based Parallel Frequent Pattern Discovery Model for Decision making and Strategic planning in Retailing   Order a copy of this article
    by Rajiv Senapati 
    Abstract: The exponential growth of retail transactions, combined with the different interests of a variety of customers, makes the pattern mining problem non-trivial. Hence, this paper proposes a novel model for mining frequent patterns. In the proposed model, frequent pattern discovery is carried out in three phases. In the first phase, the dataset is divided into n partitions based on the time stamp. In the second phase, clustering is performed on each partition in parallel to classify the customers as HIG, MIG and LIG. In the third phase, the proposed algorithm is applied to each of the classified groups to obtain frequent patterns. Finally, the proposed model is validated using a sample dataset, and experimental results are presented to explain the capability and usefulness of the proposed model and algorithm. Further, the proposed algorithm is compared with an existing algorithm, and it is observed that the proposed algorithm performs better in terms of time complexity.
    Keywords: data mining; frequent pattern; association rule; classification; algorithm; decision making; retailing.
    DOI: 10.1504/IJBIDM.2023.10046447
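    The three-phase flow described in the abstract above (time-based partitioning, customer classification, per-group frequent pattern mining) can be sketched roughly as follows. This is an illustrative reading only, not the paper's actual algorithm: the toy transactions, the spend thresholds, and the use of average spend as a stand-in for the clustering step are all assumptions.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Toy transactions: (timestamp, customer_id, items, basket_value) -- hypothetical data.
transactions = [
    (1, "c1", {"milk", "bread"}, 120),
    (1, "c2", {"milk", "bread", "butter"}, 480),
    (2, "c1", {"milk", "jam"}, 150),
    (2, "c3", {"bread", "butter"}, 900),
    (3, "c2", {"milk", "bread"}, 500),
    (3, "c3", {"milk", "bread", "jam"}, 950),
]

# Phase 1: divide the dataset into partitions based on the time stamp.
partitions = defaultdict(list)
for ts, cust, items, value in transactions:
    partitions[ts].append((cust, items, value))

# Phase 2: classify customers as HIG/MIG/LIG. A simple average-spend rule
# stands in here for the clustering the paper performs per partition.
spend = defaultdict(list)
for ts, cust, items, value in transactions:
    spend[cust].append(value)

def group(avg):
    return "HIG" if avg >= 700 else "MIG" if avg >= 300 else "LIG"

customer_group = {c: group(sum(v) / len(v)) for c, v in spend.items()}

# Phase 3: mine frequent itemsets within each group (naive Apriori-style count).
def frequent_itemsets(baskets, min_support=2, max_size=2):
    counts = Counter()
    for items in baskets:
        for k in range(1, max_size + 1):
            for combo in combinations(sorted(items), k):
                counts[combo] += 1
    return {c: n for c, n in counts.items() if n >= min_support}

by_group = defaultdict(list)
for ts, cust, items, value in transactions:
    by_group[customer_group[cust]].append(items)

patterns = {g: frequent_itemsets(b) for g, b in by_group.items()}
```

    Mining per group means a pattern such as ("bread", "milk") is reported with the support it has inside one customer segment rather than diluted across the whole transaction log.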
  • Distributed Computing and Shared Memory based Utility List Buffer Miner with Parallel Frameworks for High Utility Itemset Mining   Order a copy of this article
    by Eduardus Hardika Sandy Atmaja, Kavita Sonawane 
    Abstract: High utility itemset mining (HUIM) is a well-known pattern mining technique. It considers the utility of items, which leads to finding high-profit patterns that are more useful in real conditions. Handling large and complex datasets is the major challenge in HUIM, the main problem being exponential time complexity. The literature shows multicore approaches that solve this problem by parallelising the tasks, but these are limited to single-machine resources and need a novel strategy. To address this problem, we propose new strategies, namely distributed computing (DC-PLB) and shared memory (SM-PLB) based utility list buffer miner with parallel frameworks (PLB). They utilise cluster nodes to parallelise and distribute the tasks efficiently. Thorough experiments proved that the proposed frameworks achieve better runtime (448 s) on dense datasets compared to the existing PLB (2,237 s) and effectively address the challenges of handling large and complex datasets.
    Keywords: HUIM; PLB; DC-PLB; SM-PLB; cluster computing; parallel and distributed computing; data mining; MPI; Apache Spark.
    DOI: 10.1504/IJBIDM.2023.10046448
  • A Survey on Adoption of Blockchain in Healthcare   Order a copy of this article
    by Shantha Shalini K., M. Nithya 
    Abstract: In this era of technology and automation, blockchain technology is moving towards consistent study and adoption in different sectors. Blockchain technology, with its chain of blocks, provides security and establishes a trusted environment between individuals. In the past couple of years, blockchain technology has attracted many research scholars and industrialists to study, analyse and apply the technology to their own application needs. The major advantages of blockchain technology are security, preserved user privacy and transparency. The purpose of this paper is to provide a survey of the scope of blockchain in healthcare, providing high security for patient health information during sharing, and its impact on reducing operational and capital investments. This paper also briefs on the new business opportunities in the health sector that integrate blockchain technology.
    Keywords: healthcare; blockchain; patient health records.
    DOI: 10.1504/IJBIDM.2023.10046449
  • An Optimized Soft Computing based Approach for Multimedia Data Mining   Order a copy of this article
    by M. Ravi, M. Ekambaram Naidu, G. Narsimha 
    Abstract: Multimedia mining is a sub-field of data mining which is exploited to discover interesting information from multimedia databases. Multimedia data falls into two broad categories: static media, comprising text and pictures, and dynamic media, comprising audio and video. Multimedia mining refers to the analysis of large amounts of multimedia data in order to extract patterns based on their statistical relationships, and multimedia mining systems can discover meaningful information or image patterns from a huge collection of images. In this paper, a hybrid method is proposed which exploits statistical and applied soft computing-based primitives and building blocks, i.e., a novel feature engineering algorithm aided with a convolutional neural network-based efficient modelling procedure. The optimal parameters are chosen, such as the number of filters, kernel size, strides, input shape and nonlinear activation function. Experiments are performed on standard web multimedia data (here, an image dataset is exploited as multimedia data) and multi-class image categorisation and analysis are achieved. The obtained results are also compared with other significant existing methods and presented in the form of an intensive comparative analysis.
    Keywords: knowledge discovery; supervised learning; multimedia databases; image data; soft computing; feature engineering.
    DOI: 10.1504/IJBIDM.2023.10046450
  • Variable Item Value based High Utility Itemset Recommendation Using Statistical Approach   Order a copy of this article
    by Abdullah Bokir, V.B. Narasimha 
    Abstract: High utility mining (HUM) has become an absolute requirement for an efficient corporate management procedure. The challenge of identifying top-out or bottom-out conditions persists in the available HUM solutions, and it is critical for enterprises to manage adequate inventory to achieve higher yield outcomes. Taking these aspects into consideration, this paper proposes a comprehensive method named variable item value-based high utility itemset recommendation (VIVHUIR). Unlike contemporary models, which perform utility mining with a constant utility factor, the proposed model focuses on a variable utility factor to perform utility mining based on the profitability of an itemset. In addition, the methodology for detecting drift (variability) in the utility factor is fundamentally based on the average true range of an itemset and a relative strength index assessment, which is a unique and novel feature of the proposal. To comprehend the elements influencing profit, the proposed four-layered filtering model depends on quantities, demand, supply, and gain/loss inventory. The experimental study of the model points to potential solutions that are pragmatic in a real-time situation.
    Keywords: high utility mining; dynamic utility; average true range; relative strength index; economic order quantity; inventory storage cost.
    DOI: 10.1504/IJBIDM.2023.10047036
  • Multi-modal feature fusion for object detection using neighbourhood component analysis and bounding box regression   Order a copy of this article
    by Anamika Dhillon, Gyanendra K. Verma 
    Abstract: Object detection has gained remarkable interest in the research area of computer vision applications. This paper presents an efficient method to detect multiple objects and it contains two parts: 1) a training phase; 2) a testing phase. During the training phase, we first exploit two convolutional neural network models, namely Inception-ResNet-V2 and MobileNet-V2, for feature extraction, and then fuse the features extracted from these two models using a concatenation operation. To acquire a more compact representation of the features, we utilise neighbourhood component analysis (NCA). After that, we classify the multiple objects using an SVM classifier. During the testing phase, to detect various objects in an image, a bounding box regression module is proposed by applying LSTM. We have performed our experiments on two datasets: wild animal camera trap and gun. In particular, our method achieves accuracy rates of 97.80% and 97.0% on the wild animal camera trap and gun datasets, respectively.
    Keywords: deep convolution networks; object detection; neighbourhood component analysis; NCA; support vector machine; SVM; long short-term memory; LSTM.
    DOI: 10.1504/IJBIDM.2022.10047465
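    The fuse-then-compact pipeline in the abstract above can be illustrated in miniature. Everything here is a stand-in: the two "backbones" are toy functions (not the actual Inception-ResNet-V2/MobileNet-V2 models), and a simple variance filter only mimics the dimensionality-reduction role of NCA, which in reality learns a supervised projection.

```python
# Toy stand-ins for feature extractors from two backbone CNNs (hypothetical).
def backbone_a(image):
    # pretend 4-dim feature vector
    return [image * w for w in (0.1, 0.5, 0.9, 0.2)]

def backbone_b(image):
    # pretend 3-dim feature vector
    return [image * w for w in (0.3, 0.7, 0.4)]

# Fusion by concatenation, as the abstract describes.
def fuse(image):
    return backbone_a(image) + backbone_b(image)

images = [1.0, 2.0, 3.0, 4.0]
features = [fuse(x) for x in images]

# Compacting step: keep the k dimensions with the highest variance across
# the dataset (an unsupervised proxy for NCA's compact representation).
def variance(col):
    m = sum(col) / len(col)
    return sum((v - m) ** 2 for v in col) / len(col)

k = 3
cols = list(zip(*features))
top = sorted(range(len(cols)), key=lambda i: variance(cols[i]), reverse=True)[:k]
compact = [[f[i] for i in sorted(top)] for f in features]
```

    The compact vectors would then be handed to a classifier such as an SVM; the design point is that fusion widens the representation while the reduction step keeps only the informative dimensions.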
  • Solving a restriction of Bayesian network in giving domain knowledge by introducing factor nodes   Order a copy of this article
    by Yutaka Iwakami, Hironori Takuma, Motoi Iwashita 
    Abstract: A Bayesian network is a probabilistic inference model that is effective for decision-making in business, such as product development. Multiple events are represented as oval nodes and their relationships are drawn as edges among them. However, in order to obtain a sufficient effect, it is necessary to appropriately configure domain knowledge; for example, more customer response to the product leads to more clarity of requirements for the product. Such domain knowledge is configured as an edge connecting nodes, but in some cases the structural constraints of a Bayesian network prevent this configuration. In this study, the authors propose a method to avoid this constraint by introducing redundant factor nodes generated by applying factor analysis to the data related to the domain knowledge. With this approach more domain knowledge can be applied to the Bayesian network, and the accuracy of decision-making in business is expected to improve.
    Keywords: model improvement; data extraction; data driven insight; probabilistic inference; decision-making; product development; Bayesian network; factor analysis; key goal indicator; KGI; key performance indicator; KPI.
    DOI: 10.1504/IJBIDM.2022.10036731
  • Privacy preservation of the user data and properly balancing between privacy and utility   Order a copy of this article
    by N. Yuvaraj, K. Praghash, T. Karthikeyan 
    Abstract: Privacy and utility are trade-off factors, where the performance of one factor must be sacrificed to achieve the other. If privacy is achieved without publishing the data, then efficient utility cannot be achieved; hence the original dataset tends to get published without privacy. It is therefore essential to maintain an equilibrium between the privacy and utility of datasets. In this paper, we propose a new privacy utility method, where privacy is maintained by lightweight elliptical curve cryptography (ECC) and utility is maintained through ant colony optimisation (ACO) clustering. Initially, the datasets are clustered using ACO and then the privacy of the clustered datasets is maintained using ECC. The proposed method has been evaluated on medical datasets and compared with existing methods through several performance metrics such as clustering accuracy, F-measure, data utility, and privacy metrics. The analysis shows that the proposed method obtains improved privacy preservation using the clustering algorithm compared to existing methods.
    Keywords: ant colony optimisation; ACO; elliptical curve cryptography; ECC; privacy preservation; utility.
    DOI: 10.1504/IJBIDM.2022.10035576
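    The ECC primitive mentioned in the abstract above rests on elliptic-curve point arithmetic. The sketch below shows the core operations (point addition and double-and-add scalar multiplication) and a Diffie-Hellman-style shared secret on a deliberately tiny curve; the curve, prime and private keys are toy values for illustration only, and real deployments use standardised 256-bit curves. The paper's own ECC scheme and its ACO clustering step are not reproduced here.

```python
P_MOD = 97   # tiny prime field, illustrative only
A, B = 2, 3  # curve: y^2 = x^3 + 2x + 3 (mod 97)
G = (3, 6)   # a point on the curve: 6^2 = 36 = 3^3 + 2*3 + 3 (mod 97)

def inv(x):
    # modular inverse via Fermat's little theorem (P_MOD is prime)
    return pow(x, P_MOD - 2, P_MOD)

def ec_add(p, q):
    # point addition; None represents the point at infinity
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p == q:
        s = (3 * x1 * x1 + A) * inv(2 * y1) % P_MOD  # tangent slope
    else:
        s = (y2 - y1) * inv(x2 - x1) % P_MOD         # chord slope
    x3 = (s * s - x1 - x2) % P_MOD
    return (x3, (s * (x1 - x3) - y1) % P_MOD)

def ec_mul(k, p):
    # double-and-add scalar multiplication
    r = None
    while k:
        if k & 1:
            r = ec_add(r, p)
        p = ec_add(p, p)
        k >>= 1
    return r

# Diffie-Hellman-style agreement: both sides derive the same point.
alice_priv, bob_priv = 13, 29
shared = ec_mul(alice_priv, ec_mul(bob_priv, G))
```

    Because scalar multiplication commutes, ec_mul(a, ec_mul(b, G)) equals ec_mul(b, ec_mul(a, G)), which is what makes such a key agreement usable for protecting clustered records before publication.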
  • Chaotic activities recognising during the pre-processing event data phase   Order a copy of this article
    by Zineb Lamghari, Rajaa Saidi, Maryam Radgui, Moulay Driss Rahmani 
    Abstract: Process mining aims at obtaining insights into business processes by extracting knowledge from event data. Indeed, the quality of events is a crucial element in generating process models that reflect business process reality. To this end, pre-processing methods have appeared that clean events of deficiencies (noise, incompleteness and infrequent behaviours), up to the limit of chaotic activities' emergence. Chaotic activities are executed arbitrarily in the process and impact the quality of discovered models. Previously, a supervised learning approach was proposed that uses labelled samples to detect chaotic activities. This puts forward the difficulty of defining chaotic activities when there is no ground knowledge of which activities are truly chaotic. To that end, we develop an approach for recognising chaotic activities without labelled training data, using unsupervised learning techniques.
    Keywords: pre-processing; process discovery; process mining; chaotic activity; business process intelligence; machine learning algorithms.
    DOI: 10.1504/IJBIDM.2022.10035223
  • Harnessing the meteorological effect for predicting the retail price of rice in Bangladesh   Order a copy of this article
    by Abdullah Al Imran, Zaman Wahid, Alpana Akhi Prova, Md. Hannan 
    Abstract: Bangladesh has seen a steep price hike over the last couple of years in one of the most consumed foods taken by millions of people every single day: rice. The impact of this phenomenon, however, is critical, especially to those striving for daily meals. Thus, understanding the latent facts is vital for policymakers for better strategic measures and decision-making. In this paper, we have applied five different machine learning algorithms to predict the retail price of rice, find out the topmost factors responsible for the price hike, and determine the best model that produces the highest prediction results. Leveraging six evaluation metrics, we found that random forest produces the best result, with an explained variance score of 0.87 and an R2 score of 0.86, whereas gradient boosting produces the least, meanwhile discovering that average wind speed is the topmost reason for rice price hikes in retail markets.
    Keywords: data mining; rice price prediction; pattern mining; regression; retail markets.
    DOI: 10.1504/IJBIDM.2022.10035542
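    The two headline regression metrics quoted in the abstract above, explained variance and R2, can be computed by hand as below. The price values are made-up toy numbers, not data from the paper; the point is only how the two scores differ (explained variance ignores a constant bias in the errors, R2 does not).

```python
def mean(xs):
    return sum(xs) / len(xs)

def r2_score(y_true, y_pred):
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean(y_true)) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def explained_variance(y_true, y_pred):
    # 1 - Var(errors) / Var(y_true); unlike R^2, a constant bias
    # in the predictions is not penalised
    err = [t - p for t, p in zip(y_true, y_pred)]
    var_err = sum((e - mean(err)) ** 2 for e in err) / len(err)
    var_true = sum((t - mean(y_true)) ** 2 for t in y_true) / len(y_true)
    return 1 - var_err / var_true

y_true = [40.0, 42.0, 45.0, 50.0]  # toy retail prices, hypothetical
y_pred = [41.0, 41.5, 46.0, 49.0]

r2 = r2_score(y_true, y_pred)
ev = explained_variance(y_true, y_pred)
```

    Whenever the prediction errors have a nonzero mean, explained variance exceeds R2, which is why papers commonly report both.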
  • Predicting students' academic performance using machine learning techniques: a literature review   Order a copy of this article
    by Aya Nabil, Mohammed Seyam, Ahmed Abou-Elfetouh 
    Abstract: The amount of students' data stored in educational databases is increasing rapidly. These databases contain hidden patterns and useful information about students' behaviour and performance. Data mining is the most effective method to analyse the stored educational data. Educational data mining (EDM) is the process of applying different data mining techniques in educational environments to analyse huge amounts of educational data. Several researchers applied different machine learning techniques to analyse students' data and extract hidden knowledge from them. Prediction of students' academic performance is necessary for educational environments to measure the quality of the learning process. Therefore, it is one of the most common applications of EDM. In this survey paper, we present a review of data mining techniques, EDM and its applications, and discuss previous studies in predicting students' academic performance. An analysis of different machine learning techniques used in previous studies is also presented in this paper.
    Keywords: data mining; educational data mining; EDM; prediction; student academic performance; machine learning techniques; deep learning.
    DOI: 10.1504/IJBIDM.2022.10035540
  • Customer segmentation using various machine learning techniques   Order a copy of this article
    by Samyuktha Palangad Othayoth, Raja Muthalagu 
    Abstract: In the retail industry and marketing, customer segmentation is one of the most important tasks. Proper customer segmentation can help managers enhance the quality of products and provide better services for the targeted segments. Various machine learning-based customer segmentation techniques are used to gain insight into customers' behaviour and the potential customers that could be targeted to maximise profit. Based on previous studies, this paper proposes improved machine learning models for customer segmentation in e-commerce. Agglomerative clustering algorithms have been implemented to segment the customers with a new metric for customer behaviour. We have also proposed a systematic approach for combining an agglomerative clustering algorithm and a filtering-based recommender system to improve customer experience and customer retention. In the experiment, the results were compared with the K-means clustering model, and it was found that BLS greatly reduced training time while guaranteeing accuracy.
    Keywords: customer segmentation; agglomerative clustering algorithms; machine learning algorithms; K-means.
    DOI: 10.1504/IJBIDM.2022.10036753
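    Agglomerative clustering, the core technique of the abstract above, merges the closest clusters bottom-up until the desired number of segments remains. The sketch below is a minimal single-linkage version on one-dimensional customer-spend values; the data and the choice of linkage are illustrative assumptions, not the paper's model or metric.

```python
# Toy 1-D customer features (e.g., total spend); hypothetical values.
customers = {"c1": 10.0, "c2": 12.0, "c3": 55.0, "c4": 58.0, "c5": 120.0}

def agglomerative(points, n_clusters):
    """Naive single-linkage agglomerative clustering on 1-D data."""
    clusters = [[name] for name in points]

    def dist(a, b):
        # single linkage: distance of the closest pair across two clusters
        return min(abs(points[x] - points[y]) for x in a for y in b)

    while len(clusters) > n_clusters:
        # find the closest pair of clusters and merge them
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]),
        )
        clusters[i] += clusters.pop(j)
    return [sorted(c) for c in clusters]

segments = agglomerative(customers, n_clusters=3)
```

    Unlike K-means, no initial centroids are needed and the merge history forms a dendrogram, which is one reason agglomerative methods are attractive for exploratory customer segmentation.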