Forthcoming and Online First Articles

International Journal of Business Intelligence and Data Mining

International Journal of Business Intelligence and Data Mining (IJBIDM)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

Register for our alerting service, which notifies you by email when new issues are published online.

We also offer RSS feeds which provide timely updates of tables of contents, newly published articles and calls for papers.

International Journal of Business Intelligence and Data Mining (29 papers in press)

Regular Issues

  • Analysis and Prediction of Heart Disease Aid of Various Data Mining Techniques: A Survey   Order a copy of this article
    by V. Poornima, D. Gladis 
    Abstract: In recent times, health diseases have been increasing gradually owing to inherited factors. Heart disease in particular has become more common nowadays, putting individuals' lives at risk. Data mining strategies, namely decision tree, naive Bayes, neural network, K-means clustering, association classification, support vector machine (SVM), fuzzy, rough set theory and orthogonal local preserving methodologies, are examined on heart disease databases. In this paper, we survey distinct papers in which at least one data mining algorithm is utilised for the forecast of heart disease. This survey covers the current procedures involved in vulnerability prediction of heart disease for classification in data mining. A survey of the pertinent data mining strategies involved in risk prediction of heart disease shows that a hybrid approach gives a better prediction model than a single-model approach.
    Keywords: data mining; heart disease prediction; performance measure; fuzzy; clustering.
    DOI: 10.1504/IJBIDM.2018.10014620
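For readers new to the techniques this survey covers, the usual comparison protocol can be sketched as follows. This is an illustrative example only: the dataset is synthetic (not a real heart disease database), and scikit-learn is assumed.

```python
# Illustrative comparison of three classifiers named in the survey above
# on a synthetic "heart disease"-style dataset (not the authors' data).
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

# 300 patients, 13 numeric attributes, binary disease label
X, y = make_classification(n_samples=300, n_features=13, random_state=0)

scores = {}
for name, clf in [("decision tree", DecisionTreeClassifier(random_state=0)),
                  ("naive Bayes", GaussianNB()),
                  ("SVM", SVC())]:
    # 5-fold cross-validated accuracy, as commonly reported in the surveyed work
    scores[name] = cross_val_score(clf, X, y, cv=5).mean()

for name, acc in scores.items():
    print(f"{name}: {acc:.3f}")
```

A hybrid approach, as the survey concludes, would combine such base learners rather than rely on any single one.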
     
  • Exploring Outliers in Global Economic Dataset having the Impact of Covid-19 Pandemic   Order a copy of this article
    by Anindita Desarkar, Ajanta Das, Chitrita Chaudhuri 
    Abstract: An outlier is a value that lies outside most of the other values in a dataset. Outlier exploration has huge importance in almost all industry applications, such as medical diagnosis, credit card fraud and intrusion detection systems. Similarly, in the economic domain it can be applied to analyse many unexpected events to harvest new knowledge, such as a sudden crash of the stock market, a mismatch between a country's per capita income and overall development, an abrupt change in the unemployment rate and a steep fall in bank interest. These situations can arise for several reasons, of which the present COVID-19 pandemic is a leading one. This motivates the present researchers to identify a few such vulnerable areas in the economic sphere and ferret out the most affected countries for each of them. Two well-known machine-learning techniques, DBSCAN and Z-score, are utilised to get these insights, which can serve as a guideline towards improving the overall scenario subsequently.
    Keywords: economic outlier; machine learning; gross domestic product; GDP; per capita; human development index; HDI; covid-19 pandemic; total death percentage.
    DOI: 10.1504/IJBIDM.2022.10043040
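The two detectors named above can be sketched on a toy series; the values below are made up, and scikit-learn and SciPy are assumed.

```python
# Minimal sketch of the two outlier detectors named above (DBSCAN and
# Z-score) on a toy "GDP per capita" series; values are illustrative.
import numpy as np
from scipy import stats
from sklearn.cluster import DBSCAN

gdp = np.array([1.1, 1.2, 1.0, 1.3, 1.15, 9.5]).reshape(-1, 1)  # 9.5 is anomalous

# Z-score rule: flag points more than 2 standard deviations from the mean
z = np.abs(stats.zscore(gdp.ravel()))
z_outliers = np.where(z > 2)[0]

# DBSCAN: points labelled -1 fall in no dense cluster, i.e., outliers
labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(gdp)
db_outliers = np.where(labels == -1)[0]
print(z_outliers, db_outliers)  # both flag index 5
```

On real economic indicators, both methods would be applied per-feature after normalisation.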
     
  • KNOWLEDGE DISCOVERY IN DATABASES: AN APPLICATION TO MARKET SEGMENTATION IN RETAIL SUPERMARKETS   Order a copy of this article
    by Kellen Endler, Cassius Tadeu Scarpin, Maria Teresinha Arns Steiner, Tamires Almeida Sfeir, Claudimar Pereira Da Veiga 
    Abstract: The purpose of this article is to present a methodology based on the extraction process of knowledge discovery in databases (KDD) to predict the expenditure of different customer profiles, considering their characteristics, and the type of store they would buy from, in one of the largest retail chains in the Brazilian supermarket and hypermarket segment. These stores have different characteristics, such as physical size, product assortment and customer profile. This heterogeneity in terms of commercial offers implies a desire for consumption by customers that differs from store to store, depending on how their preferences are met. The proposed methodology was applied to a real marketing case based in a business-to-consumer (B2C) environment to aid retailers during the segmentation process. The results show that it is possible to highlight relationships between the data that enabled the prediction of customers’ consumption, which can contribute towards generating useful information to retail businesses.
    Keywords: knowledge discovery in databases; KDD; data mining; market segmentation; retail; principal component analysis; PCA; cluster analysis; multiple linear regression.
    DOI: 10.1504/IJBIDM.2022.10043148
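The KDD steps listed in the keywords above (PCA, cluster analysis, multiple linear regression) chain together naturally. The sketch below is a hedged illustration on random customer data, not the article's methodology or dataset; scikit-learn is assumed.

```python
# Hedged sketch of a PCA -> clustering -> per-segment regression pipeline,
# as suggested by the keywords above; features and sizes are illustrative.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                      # 200 customers, 6 raw attributes
spend = X @ rng.normal(size=6) + rng.normal(scale=0.1, size=200)

X2 = PCA(n_components=3).fit_transform(X)          # dimensionality reduction
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X2)

# One expenditure model per customer segment
models = {s: LinearRegression().fit(X2[segments == s], spend[segments == s])
          for s in np.unique(segments)}
print(sorted(models))
```

Fitting a separate regression per segment is what lets expenditure predictions differ from store profile to store profile.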
     
  • Mining Models for Predicting Product Quality Properties of Petroleum Products   Order a copy of this article
    by Ng'ambilani Zulu, Douglas Kunda 
    Abstract: Refinery production processes generate huge volumes of raw data, and in most cases these data remain under-utilised for knowledge acquisition and decision making. The purpose of this study was to demonstrate how data mining techniques can be used to develop models that predict product quality properties of petroleum products. This study used petroleum refinery production raw data to build predictive models for product quality control activities. Plant and laboratory data for a period of about 18 months were mined from the refinery repositories in order to build the datasets required for analysis using Orange3 data mining software. Four data mining algorithms were chosen for experiments in order to determine the best predictive model, using cross-validation as the validation method. This study employed two measuring metrics, classification accuracy (CA) and root mean square error (RMSE), as performance indicators. Random forest came out as the best-performing model, suitable for predicting both categorical (CA) and numeric (RMSE) data. The study was also able to establish relationships between variables that could be used in critical operational decisions.
    Keywords: data mining; machine learning; industries; petroleum refinery; product quality; parameter optimisation.
    DOI: 10.1504/IJBIDM.2023.10043436
     
  • Fraud Detection with Machine Learning - Model Comparison   Order a copy of this article
    by João Carlos Pacheco Junior, João Luiz Chela, Guilherme Ferreira Pelucio Salome 
    Abstract: This work evaluates the performance of different models for predicting three types of fraudulent behaviour in a novel dataset with imbalanced data. The logistic regression model, a staple in the credit risk industry, is compared to several machine learning models. This work shows that in the binary classification case, all compared models achieve results similar to logistic regression. The random forest model shows superior performance when classifying credit frauds ending in lawsuits. In the multi-label classification case, logistic regression attains high precision for all types of fraud but at lower recall rates, whereas the random forest model achieves higher recall rates but with lower precision.
    Keywords: fraud detection; machine learning; imbalanced data; multi-label classification.
    DOI: 10.1504/IJBIDM.2023.10044239
     
  • CONTEXT-AWARE AUTOMATED QUALITY ASSESSMENT OF TEXTUAL DATA   Order a copy of this article
    by Goutam Mylavarapu, K. Ashwin Viswanathan, Johnson P. Thomas 
    Abstract: Data analysis is a crucial process in the field of data science that extracts useful information from any form of data. With the rapid growth of technology, more and more unstructured data, such as text and images, are being produced in large amounts. Apart from the analytical techniques used, the quality of the data plays a prominent role in accurate analysis. Data quality becomes inferior due to poor maintenance and the mediocre data generation strategies employed by amateur users. This problem escalates with the advent of big data. In this paper, we propose a quality assessment model for the textual form of unstructured data (TDQA). The context of data plays an important role in determining its quality. Therefore, we automate the process of context extraction in textual data using natural language processing to identify data errors and assess quality.
    Keywords: automated data quality assessment; textual data; context-aware; data context; sentiment analysis; lexicon; Doc2Vec; data accuracy; data consistency.
    DOI: 10.1504/IJBIDM.2023.10044353
     
  • Factors That Drive the Selection of Business Intelligence Tools in South African Financial Services Providers   Order a copy of this article
    by Bonginkosi P. Gina, Adheesh Budree 
    Abstract: Innovation and technology advancements in information systems (IS) have resulted in a multitude of product offerings and business intelligence (BI) software tools in the market to implement business intelligence systems (BIS). As a result, a high proportion of organisations fail to employ suitable software tools meeting organisational needs. The study aimed to discover critical factors influencing the selection of BI tools. This was a quantitative study, and questionnaire-surveyed data was collected from 92 participants. The data was analysed by employing SPSS and SmartPLS-3 software to test the significance of influential factors. The findings showed that software tool technical factors, vendor technical factors, and opinion non-technical factors are significant. The study contributes to both academia and industry by providing influential determinants for software tool selection. It is hoped that the findings presented will contribute to a greater understanding of the factors influencing the selection of BI tools for researchers and practitioners alike.
    Keywords: business intelligence tools; BITs; business intelligence systems; BIS; business intelligence; BI; software factors; software selection; software tool.
    DOI: 10.1504/IJBIDM.2023.10044714
     
  • Credit Card Fraud Detection: An Evaluation of SMOTE Resampling and Machine Learning Model Performance   Order a copy of this article
    by Faleh Alshameri, Ran Xia 
    Abstract: Credit card fraud has been a noted security issue that requires financial organisations to continuously improve their fraud detection system. In most cases, a credit transaction dataset is expected to have a significantly larger number of normal transactions than fraud transactions. Therefore, the accuracy of a fraud detection system depends on building a model that can adequately handle such an imbalanced dataset. The purpose of this paper is to explore one of the techniques of dataset rebalancing, the synthetic minority oversampling technique (SMOTE). To evaluate the effects of this technique on model training, we selected four basic classification algorithms, complement naïve Bayes (CNB), K-nearest neighbour (KNN), random forest and support vector machine (SVM). We then compared the performances of the four models trained on the rebalanced and original dataset using the area under precision-recall curve (AUPRC) plots.
    Keywords: credit card; imbalanced dataset; resampling method; synthetic minority oversampling technique; SMOTE; AUPRC; classification algorithms.
    DOI: 10.1504/IJBIDM.2023.10044811
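The core idea of SMOTE evaluated above is simple enough to state in a few lines: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbours. The from-scratch sketch below illustrates that idea only; it is not the imbalanced-learn implementation used in practice, and the data is invented.

```python
# From-scratch sketch of the SMOTE interpolation idea (illustrative only).
import numpy as np

def smote(X_min, n_new, k=3, seed=0):
    """Generate n_new synthetic samples from minority-class rows X_min."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        # k nearest neighbours of sample i (excluding itself at distance 0)
        nn = np.argsort(d)[1:k + 1]
        j = rng.choice(nn)
        gap = rng.random()                     # interpolation factor in [0, 1)
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(out)

fraud = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]])
synthetic = smote(fraud, n_new=8)
print(synthetic.shape)  # (8, 2)
```

Because each synthetic point is a convex combination of two real minority points, the rebalanced dataset never extrapolates beyond the minority class's region.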
     
  • On Prevention of Attribute Disclosure and Identity Disclosure Against Insider Attack in Collaborative Social Network Data Publishing   Order a copy of this article
    by Bintu Kadhiwala, Sankita Patel 
    Abstract: In a collaborative social network data publishing setup, privacy preservation of individuals is a vital issue. Existing privacy-preserving techniques assume the existence of attackers from external data recipients and hence are vulnerable to insider attacks performed by colluding data providers. Additionally, these techniques protect data against identity disclosure but not against attribute disclosure. To overcome these limitations, in this paper, we address the problem of privacy-preserving data publishing for collaborative social networks. Our motive is to prevent both attribute and identity disclosure of collaborative social network data against insider attack. For this purpose, we propose an approach that utilises p-sensitive k-anonymity and m-privacy techniques. Experimental outcomes affirm that our approach preserves privacy with a reasonable increase in information loss and maintains an adequate utility of collaborative social network data.
    Keywords: collaborative social network data publishing; attribute disclosure; identity disclosure; insider attack; k-anonymity; m-privacy.
    DOI: 10.1504/IJBIDM.2023.10045007
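The two conditions combined above can be checked in a few lines: every quasi-identifier group must contain at least k records (k-anonymity) and at least p distinct sensitive values (p-sensitivity). The toy table and attribute names below are illustrative, not from the paper.

```python
# Toy check of p-sensitive k-anonymity on a generalised table.
from collections import defaultdict

def is_p_sensitive_k_anonymous(records, quasi_ids, sensitive, k, p):
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[a] for a in quasi_ids)].append(r[sensitive])
    # Each equivalence class needs >= k rows and >= p distinct sensitive values
    return all(len(g) >= k and len(set(g)) >= p for g in groups.values())

table = [
    {"age": "30-39", "zip": "394*", "disease": "flu"},
    {"age": "30-39", "zip": "394*", "disease": "cold"},
    {"age": "40-49", "zip": "395*", "disease": "flu"},
    {"age": "40-49", "zip": "395*", "disease": "asthma"},
]
print(is_p_sensitive_k_anonymous(table, ("age", "zip"), "disease", k=2, p=2))  # True
```

The m-privacy component additionally requires the guarantee to hold against any coalition of up to m colluding data providers, which this sketch does not model.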
     
  • EmoRile: A Personalized Emoji Prediction Scheme Based on User Profiling   Order a copy of this article
    by Vandita Grover, Hema Banati 
    Abstract: Emojis are widely used to express emotions and complement text communication. Existing approaches for emoji prediction are generic and generally utilise text or time for emoji prediction. However, research reveals that emoji usage differs among users, so individual users' preferences for certain emojis need to be captured when predicting emojis for them. In this paper, a novel emoji-usage-based profiling scheme, EmoRile, is proposed. In EmoRile, emoji-usage-based user profiles are created, accomplished by compiling a new dataset that also includes user information. Distinct models with different combinations of text, text sentiment and users' preferred emojis were created for emoji prediction. These models were tested on various architectures with a very large emoji label space. Rigorous experimentation showed that even with a large label space, EmoRile predicted emojis with accuracy similar to existing emoji prediction approaches that use a smaller label space, making it a competitive emoji prediction approach.
    Keywords: emojis in sentiment analysis; emoji prediction; user profile-based emojis.
    DOI: 10.1504/IJBIDM.2023.10045810
     
  • Brain Hemorrhage Classification from CT Scan Images using Fine-tuned Transfer Learning Deep Features   Order a copy of this article
    by Arpita Ghosh, Badal Soni, Ujwala Baruah 
    Abstract: Classification of brain haemorrhage is a challenging task that needs to be solved to help advance medical treatment. Recently, it has been observed that efficient deep learning architectures have been developed to detect such bleeding accurately. The proposed system includes two different transfer learning strategies to train and fine-tune ImageNet pre-trained state-of-the-art architectures such as VGG16, Inception V3 and DenseNet121. The evaluation metrics have been calculated based on the performance analysis of the employed networks. Experimental results show that the modified fine-tuned Inception V3 performs well and achieves the highest test accuracy.
    Keywords: transfer learning; VGG 16; Inception V3; DenseNet121; brain haemorrhage; ReLU; binary cross entropy.
    DOI: 10.1504/IJBIDM.2022.10046012
     
  • A Novel Classification-based Parallel Frequent Pattern Discovery Model for Decision making and Strategic planning in Retailing   Order a copy of this article
    by Rajiv Senapati 
    Abstract: The exponential growth of retail transactions, with the differing interests of a variety of customers, makes the pattern mining problem non-trivial. Hence this paper proposes a novel model for mining frequent patterns. As per the proposed model, frequent pattern discovery is carried out in three phases. In the first phase, the dataset is divided into n partitions based on the time stamp. In the second phase, clustering is performed in each of the partitions in parallel to classify the customers as HIG, MIG and LIG. In the third phase, the proposed algorithm is applied to each of the classified groups to obtain frequent patterns. Finally, the proposed model is validated using a sample dataset, and experimental results are presented to explain the capability and usefulness of the proposed model and algorithm. Further, the proposed algorithm is compared with an existing algorithm, and it is observed that the proposed algorithm performs better in terms of time complexity.
    Keywords: data mining; frequent pattern; association rule; classification; algorithm; decision making; retailing.
    DOI: 10.1504/IJBIDM.2023.10046447
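The third phase described above (frequent pattern discovery within each customer group) rests on support counting. A minimal brute-force sketch is shown below; the transactions and threshold are invented, and the authors' actual algorithm is more efficient than this enumeration.

```python
# Minimal support counting for frequent itemsets within one customer group.
from itertools import combinations
from collections import Counter

def frequent_itemsets(transactions, min_support):
    counts = Counter()
    for t in transactions:
        for size in range(1, len(t) + 1):
            for itemset in combinations(sorted(t), size):
                counts[itemset] += 1
    # Keep only itemsets whose support meets the threshold
    return {s: c for s, c in counts.items() if c >= min_support}

# Transactions of one classified group (e.g., the HIG cluster)
hig = [{"milk", "bread"}, {"milk", "bread", "eggs"}, {"milk", "eggs"}]
print(frequent_itemsets(hig, min_support=2))
```

Running the same counting independently per partition and per group is what makes the three-phase model parallelisable.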
     
  • Distributed Computing and Shared Memory based Utility List Buffer Miner with Parallel Frameworks for High Utility Itemset Mining   Order a copy of this article
    by Eduardus Hardika Sandy Atmaja, Kavita Sonawane 
    Abstract: High utility itemset mining (HUIM) is a well-known pattern mining technique. It considers the utility of items, which leads to finding high-profit patterns that are more useful in real conditions. Handling large and complex datasets is the major challenge in HUIM, the main problem being exponential time complexity. The literature shows multicore approaches that parallelise the tasks, but these are limited to single-machine resources and need a novel strategy. To address this problem, we propose new strategies, namely distributed computing (DC-PLB) and shared memory (SM-PLB) based utility list buffer miner with parallel frameworks (PLB). They utilise cluster nodes to parallelise and distribute the tasks efficiently. Thorough experiments proved that the proposed frameworks achieved better runtime (448 s) on dense datasets compared to the existing PLB (2,237 s), effectively addressing the challenges of handling large and complex datasets.
    Keywords: HUIM; PLB; DC-PLB; SM-PLB; cluster computing; parallel and distributed computing; data mining; MPI; Apache Spark.
    DOI: 10.1504/IJBIDM.2023.10046448
     
  • An Optimized Soft Computing based Approach for Multimedia Data Mining   Order a copy of this article
    by M. Ravi, M. Ekambaram Naidu, G. Narsimha 
    Abstract: Multimedia mining is a sub-field of data mining which is exploited to discover fascinating information from multimedia databases. The mined media falls into two general categories: static media, which comprises text and pictures, and dynamic media, which comprises audio and video. Multimedia mining refers to the analysis of large amounts of multimedia data in order to extract patterns based on their statistical relationships. Multimedia mining frameworks can discover significant information or image patterns from a colossal collection of images. In this paper, a hybrid method is proposed which exploits statistical and applied soft computing-based primitives and building blocks, i.e., a novel feature engineering algorithm aided by an efficient convolutional neural network-based modelling procedure. The optimal parameters are chosen, such as the number of filters, kernel size, strides, input shape and nonlinear activation function. Experiments are performed on standard web multimedia data (here, an image dataset is exploited as multimedia data), achieving multi-class image categorisation and analysis. Our results are compared with other significant existing methods and presented in the form of an intensive comparative analysis.
    Keywords: knowledge discovery; supervised learning; multimedia databases; image data; soft computing; feature engineering.
    DOI: 10.1504/IJBIDM.2023.10046450
     
  • Variable Item Value based High Utility Itemset Recommendation Using Statistical Approach   Order a copy of this article
    by ABDULLAH BOKIR, V.B. Narasimha 
    Abstract: High utility mining has become an absolute requirement for an efficient corporate management procedure. The challenge persists in identifying the top-out or bottom-out conditions in the context of the available HUM solutions, and it is critical for enterprises to manage adequate inventory to have higher-yield outcomes. Taking these aspects into consideration, this paper proposes a comprehensive method named variable item value-based high utility itemset recommendation (VIVHUIR). Unlike contemporary models, which perform utility mining with a constant utility factor, the proposed model focuses on a variable utility factor to perform utility mining based on the profitability of an itemset. In addition, the drift (variability) detection methodology for the utility factor is fundamentally based on the average true range of an itemset and a relative strength index assessment, which is a novel feature of the proposal. To comprehend the elements influencing profit, the proposed four-layered filtering model depends on quantities, demand, supply, and gain/loss inventory. The experimental study of the model points to potential solutions that are pragmatic in a real-time situation.
    Keywords: high utility mining; dynamic utility; average true range; relative strength index; economic order quantity; inventory storage cost.
    DOI: 10.1504/IJBIDM.2023.10047036
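The two drift indicators named above have standard textbook definitions: the average true range (ATR) averages the true range over n periods, and the relative strength index (RSI) is 100 − 100/(1 + avg gain / avg loss). The sketch below computes both in their simple (unsmoothed) form on invented prices; it illustrates the indicators only, not the paper's four-layered filtering model.

```python
# Textbook ATR and RSI on made-up prices (simple averages, no smoothing).
def atr(highs, lows, closes, n):
    # True range: max of (high-low), |high-prev close|, |low-prev close|
    trs = [max(h - l, abs(h - pc), abs(l - pc))
           for h, l, pc in zip(highs[1:], lows[1:], closes[:-1])]
    return sum(trs[-n:]) / n

def rsi(closes, n):
    deltas = [b - a for a, b in zip(closes[:-1], closes[1:])][-n:]
    gain = sum(d for d in deltas if d > 0) / n
    loss = sum(-d for d in deltas if d < 0) / n
    return 100.0 if loss == 0 else 100 - 100 / (1 + gain / loss)

closes = [10, 11, 12, 11, 13, 14, 13, 15]
highs = [c + 1 for c in closes]
lows = [c - 1 for c in closes]
print(round(atr(highs, lows, closes, 5), 2), round(rsi(closes, 5), 1))
```

High ATR signals volatile (drifting) utility for an itemset, while RSI above or below its usual 70/30 bands flags the top-out and bottom-out conditions mentioned above.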
     
  • Multi-modal feature fusion for object detection using neighbourhood component analysis and bounding box regression   Order a copy of this article
    by Anamika Dhillon, Gyanendra K. Verma 
    Abstract: Object detection has gained remarkable interest in the research area of computer vision applications. This paper presents an efficient method to detect multiple objects and it contains two parts: 1) training phase; 2) testing phase. During training phase, firstly we have exploited two convolutional neural network models namely Inception-ResNet-V2 and MobileNet-V2 for feature extraction and then we fuse the features extracted from these two models by using concatenation operation. To acquire a more compact presentation of features, we have utilised neighbourhood component analysis (NCA). After that, we classify the multiple objects by using SVM classifier. During the testing phase, to detect various objects in an image, a bounding box regression module is proposed by applying LSTM. We have performed our experiments on two datasets; wild animal camera trap and gun. In particular, our method achieves an accuracy rate of 97.80% and 97.0% on wild animal camera trap and gun datasets respectively.
    Keywords: deep convolution networks; object detection; neighbourhood component analysis; NCA; support vector machine; SVM; long short-term memory; LSTM.
    DOI: 10.1504/IJBIDM.2022.10047465
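The NCA compaction step described above slots between feature fusion and the SVM classifier. The sketch below shows that arrangement with scikit-learn; the random features stand in for the concatenated Inception-ResNet-V2/MobileNet-V2 output and are not the authors' data.

```python
# Sketch of NCA-based feature compaction feeding an SVM, as in the
# pipeline above; random vectors stand in for fused CNN features.
import numpy as np
from sklearn.neighbors import NeighborhoodComponentsAnalysis
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 32))        # stand-in for concatenated CNN features
y = rng.integers(0, 3, size=120)      # three object classes

# NCA learns a linear projection that improves nearest-neighbour separability,
# yielding a more compact representation before classification.
pipe = make_pipeline(
    NeighborhoodComponentsAnalysis(n_components=8, random_state=0),
    SVC())
pipe.fit(X, y)
print(pipe.predict(X[:5]))
```

The bounding-box regression stage (LSTM-based) operates downstream of this classifier and is not shown here.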
     
  • Identifying influential nodes in large scale social networks using Global and local structural information   Order a copy of this article
    by Noosheen Shareefi, Mehdi Bateni 
    Abstract: Given the importance of identifying influential nodes in different applications, many methods have been proposed for this task. Some of them are not accurate enough or have high time complexity. In this paper, a method named new GLS (NGLS) is developed based on the global and local search (GLS) algorithm. GLS, despite its high accuracy compared to other methods, is not fast and efficient enough. NGLS is developed to improve the efficiency and scalability of GLS. To reach this goal, the number of common neighbours of each node is counted only up to a radius of two. The execution time of NGLS is reduced on average by 85% on real-world networks and 97% on simulated networks, while the accuracy of NGLS remains the same as that of GLS. Therefore, NGLS is applicable to larger real-world networks.
    Keywords: influential nodes; global and local information; large networks; centrality measure; neighbour contribution; complex network; propagation; propagation models; complexity; social network analysis.
    DOI: 10.1504/IJBIDM.2023.10047751
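The radius-two restriction that distinguishes NGLS can be sketched in plain Python. The adjacency list and the exact scoring rule below are illustrative assumptions, not the authors' published algorithm: the point is only that common-neighbour counts are taken against nodes at most two hops away.

```python
# Illustrative radius-two common-neighbour scoring (assumed rule, not
# the exact NGLS definition).
def neighbours_within_two(adj, u):
    one_hop = adj[u]
    two_hop = {w for v in one_hop for w in adj[v]} - {u}
    return one_hop | two_hop

def ngls_score(adj, u):
    # Sum of common-neighbour counts with every node in u's radius-2 ball
    return sum(len(adj[u] & adj[v]) for v in neighbours_within_two(adj, u))

adj = {1: {2, 3}, 2: {1, 3, 4}, 3: {1, 2}, 4: {2, 5}, 5: {4}}
print({u: ngls_score(adj, u) for u in adj})
```

Restricting the search to the two-hop ball is what turns the global common-neighbour computation into a local one, giving the reported speed-ups.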
     
  • AN EFFECTIVE ABSTRACT TEXT SUMMARIZATION USING SHARK SMELL OPTIMIZED BIDIRECTIONAL ENCODER REPRESENTATIONS FROM TRANSFORMER   Order a copy of this article
    by Nafees Muneera, P. Sriramya 
    Abstract: Recently, the amount of text data has increased rapidly, and this information must be summarised to retrieve useful knowledge. First, the preprocessing module utilises the fixed-length stemming method, and then the segmentation module makes use of pre-trained bidirectional encoder representations from transformers (BERT). The input text is segmented with the utilisation of feedforward and multi-head attention layers. This BERT segmentation paradigm is combined with the shark smell optimisation (SSO) methodology, and the extracted phrases are employed to prepare the document stage of a dataset of Amazon merchandise assessments. This study aspires to create a concise summary and invigorating headlines that grab the focus of readers. This paper shows that the method amalgamates extractive and abstractive procedures in a pipelined technique, creating a succinct summary that is later utilised for headline creation. Experimentation was executed on the publicly accessible CNN/Daily Mail datasets.
    Keywords: abstractive; text summarisation; optimisation; transformer; clustering; similarity index.
    DOI: 10.1504/IJBIDM.2023.10047979
     
  • State-of-Art approaches for Event Detection over Twitter Stream: a Survey   Order a copy of this article
    by Jagrati Singh, Anil Kumar Singh 
    Abstract: At present, social network applications like Twitter, Facebook and YouTube have evolved into a popular way of information sharing for general users. On these platforms, valuable information appears as breaking news, trending topics, public opinion, and so on. Twitter is the most popular microblogging service, generating huge volumes of data with high velocity and variety (i.e., images, text and video). Due to the growth of real-world events discussed on Twitter, event detection has become an interesting and challenging problem. Event detection is the practice of applying natural language processing and text analysis techniques to identify and extract event information from text. This survey paper explores important research works on event detection using Twitter data. We classify approaches according to feature modelling methods: vector space model, statistical model and graph model. We highlight research challenges, issues and the limitations of existing approaches to identify research gaps for future directions.
    Keywords: Twitter stream; clustering; data sharing; supervised technique; unsupervised technique; semantic correlation; keyword co-occurrence; topic modelling.
    DOI: 10.1504/IJBIDM.2023.10048271
     
  • Data Quality Based View Selection in Big Data Integration System   Order a copy of this article
    by Samir Anter 
    Abstract: An integration system is an intermediate tool between a user and a set of distributed sources. It provides transparent access to information through an interface using a unique query language. This gives the end user the illusion of accessing a homogeneous central repository. In a hybrid system, one part of the data is queried on demand, whereas another part is extracted, filtered and stored in a local database. This approach is very promising for data access in a big data context. However, obtaining satisfactory results depends on the correct choice of data to materialise, and this task is even more difficult in a big data context. In this article, a novel approach is proposed to overcome the above problem, using data quality to select the views that will be materialised.
    Keywords: data integration; materialised views; big data; data quality; view selection.
    DOI: 10.1504/IJBIDM.2023.10048381
     
  • Evaluation of Factors Involved in Predicting Indian Stock Price Using Machine Learning Algorithms   Order a copy of this article
    by ARCHIT A. VOHRA, Paresh J. Tanna 
    Abstract: This study evaluates the effect of training dataset size, dimensionality and rolling datasets on the prediction accuracy of decision tree regression (DTR), support vector regression (SVR), long short-term memory (LSTM) and neural network multi-layer perceptron (NNMLP) models. Data for ten stocks from different sectors of the National Stock Exchange Fifty (NIFTY 50) was considered. The execution time for each model is calculated to find the fastest algorithm. Finally, the correlation between prediction accuracy and performance measures is established. The results clearly show that increasing the training dataset size does not always increase prediction accuracy; the characteristics of the dataset are one major factor responsible for prediction accuracy. DTR and SVR have very low average execution times compared to LSTM and NNMLP. A very strong negative correlation was found between the mean absolute percentage error (MAPE) and prediction accuracy.
    Keywords: prediction accuracy; training dataset size; rolling dataset; performance measures; regression; neural network; execution time; stock price.
    DOI: 10.1504/IJBIDM.2023.10048648
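The MAPE measure whose negative correlation with accuracy is reported above has a one-line definition: the mean of |actual − predicted| / actual, expressed as a percentage. A minimal sketch on invented prices:

```python
# Mean absolute percentage error, the performance measure discussed above.
import numpy as np

def mape(actual, predicted):
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return 100 * np.mean(np.abs(actual - predicted) / actual)

actual = [100.0, 102.0, 101.0, 105.0]
predicted = [101.0, 101.0, 103.0, 104.0]
print(round(mape(actual, predicted), 2))  # 1.23
```

Since MAPE is an error measure, a strong negative correlation with prediction accuracy is exactly what one would expect.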
     
  • Text Document Learning using Distributed Incremental Clustering Algorithm: Educational Certificates   Order a copy of this article
    by Archana Chaudhari, Preeti Mulay, Ayushi Agarwal, Krithika Iyer, Saloni Sarbhai 
    Abstract: Technological advancements now allow each of us to learn new skills at home or through various workshops, and one of the ways to recognise a skill is by providing certificates. Digital and handwritten certificate datasets are usually images. We can use this information to analyse which subjects have recently gained popularity and how to improve fields of study at different universities. Therefore, this paper proposes a distributed incremental clustering with closeness factor-based algorithm (DIC2FBA) for text clustering. It primarily focuses on a faculty development programme (FDP) certificates dataset that covers both text and numeric data. The proposed system uses an AWS EC2 instance and an AWS S3 bucket, which help to cluster data from multiple sites in iterative and incremental mode. Further, we compare the findings achieved using the DIC2FBA with the K-means modified inter and intra clustering (KM-I2C) algorithm based on the silhouette score and Davies-Bouldin index. The proposed system will help educational institutions understand the popular skill sets of faculties, which can further be used to understand the effectiveness of such programmes.
    Keywords: distributed incremental clustering; text document learning; educational certificates; faculty development program; FDP; AWS.
    DOI: 10.1504/IJBIDM.2024.10049120
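The two cluster-validity measures used in the comparison above are both available in scikit-learn; the toy two-cluster dataset below is illustrative, not the certificate data.

```python
# Silhouette score and Davies-Bouldin index on a toy two-cluster dataset.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score, davies_bouldin_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

sil = silhouette_score(X, labels)          # higher is better, in [-1, 1]
dbi = davies_bouldin_score(X, labels)      # lower is better, >= 0
print(round(sil, 2), round(dbi, 2))
```

A higher silhouette score together with a lower Davies-Bouldin index is the pattern one clustering algorithm must show over another to be judged better by these measures.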
     
  • Machine Learning approach for Data Analysis and Predicting Coronavirus Using COVID -19 India Dataset   Order a copy of this article
    by Soni Singh, Dr.K.R.Ramkumar Kumar, Ashima Kukkar 
    Abstract: According to the World Health Organisation (WHO), COVID-19 had infected 83,558,756 people worldwide in 2020, resulting in 646,949 deaths. In this research, we aim to find the link between time-series data and current circumstances to predict future outbreaks, and to determine which technique is best suited for modelling accurate predictions. The performance of different machine learning (ML) models, such as the sigmoid function, Facebook (FB) prophet model, seasonal auto-regressive integrated moving average with eXogenous factors (SARIMAX) model, support vector machine (SVM) learning model, linear regression (LR) model, and polynomial regression (PR) model, is analysed along with their error rates. A comparison is also made to evaluate the best-suited model for prediction based on different categorisation approaches on the WHO-authenticated dataset of India. The results state that the PR model shows the best performance with COVID-19 time-series data, whereas the sigmoid model has the consistently smallest prediction error rates for tracking the dynamics of incidents. Moreover, the PR model provided the most realistic prediction for identifying a plateau point in the incidents' growth curve.
    Keywords: COVID-19; pandemics; analysis on India; machine learning; prediction; comparison; support vector machine; SVM.
    DOI: 10.1504/IJBIDM.2024.10049479
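The regression family compared in the abstract starts from ordinary least squares; a minimal pure-Python linear fit on hypothetical daily counts (not the WHO data) shows the basic fitting step that the LR and PR models build on:

```python
# Ordinary least-squares fit y = slope * x + intercept, closed form:
# slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)

def linear_fit(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

days = [0, 1, 2, 3, 4]
cases = [10.0, 14.0, 18.0, 22.0, 26.0]   # hypothetical, perfectly linear counts
slope, intercept = linear_fit(days, cases)
print(slope, intercept)   # 4.0 10.0
```

A polynomial regression model extends this by fitting powers of x as additional regressors.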
     
  • Prediction of Stock Prices of Blue-Chip Companies using Machine Learning Algorithms   Order a copy of this article
    by Rajvir Kaur, Anurag Sharma 
    Abstract: Accurate stock market prediction is a very challenging task for experts due to the market's volatile nature. Several studies determine the future value of the stock market from historical data alone, but nowadays external factors such as social media and news headlines also greatly affect the stock market. This research work predicts future stock prices by using both Twitter social media and news data along with historical data to obtain high prediction results. The performance of the machine learning algorithms logistic regression, SVM, and random forest is analysed using metrics such as accuracy, precision, recall, and F1-score. To train and test the models, the final dataset is divided in an 80:20 ratio. For each blue-chip company, the testing dataset contains 248 samples; the highest prediction accuracies, ranging from 85% to 89%, were achieved using the logistic regression algorithm.
    Keywords: blue-chip companies; machine learning; news headlines; social media; stock market prediction; Twitter.
    DOI: 10.1504/IJBIDM.2023.10049725
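The four metrics named in the abstract all derive from confusion-matrix counts; the counts below are hypothetical for a 248-sample test set (20% of the data, matching the abstract's split), not the paper's results:

```python
# Accuracy, precision, recall and F1 from confusion-matrix counts
# (tp = true positives, fp = false positives, fn = false negatives,
#  tn = true negatives).

def classification_metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = classification_metrics(tp=110, fp=14, fn=16, tn=108)
print(f"accuracy={acc:.3f} precision={prec:.3f} recall={rec:.3f} F1={f1:.3f}")
```
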
     
  • An Efficient Missing Value Imputation and Evaluation using GK-KH Means and HTR-RNN   Order a copy of this article
    by Syavasya CVSR, A. Lakshmi Muddana 
    Abstract: The accuracy of data mining (DM) outcomes can be affected by mining and analysing incomplete datasets with missing values (MV). A complete dataset is therefore created by imputing the MV, which makes the analysis easier. To combat this issue, an effectual missing value imputation (MVI) method is proposed and evaluated utilising Gaussian kernel-K harmonic means (GK-KH Means) and hyperbolic tangent radial-recurrent neural networks (HTR-RNN). At first, preprocessing is performed on the input CKD dataset, in which duplicate records are eradicated. Next, the missing entries are identified and the MV are imputed utilising GK-KH Means, after which the data are rationalised into a structured format. Then, SDRM-DHO selects the optimal features from the extracted features. Lastly, the HTR-RNN classifier accepts these chosen features as input. The proposed work achieves more accurate missing value imputation.
    Keywords: missing value imputation; K harmonic means; Gaussian kernel function; recurrent neural network; swap displacement reversion operation.
    DOI: 10.1504/IJBIDM.2023.10049909
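The general idea of clustering-based imputation can be sketched with a deliberately simplified stand-in: plain k-means on the complete rows (not the paper's GK-KH Means), with each missing value filled from the nearest centroid measured on the observed features only. All data below are hypothetical:

```python
# Simplified clustering-based imputation sketch: plain Lloyd's k-means,
# then nearest-centroid fill for missing (None) entries.

def kmeans(points, k=2, iters=10):
    centroids = list(points[:k])  # naive initialisation, fine for a sketch
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            groups[j].append(p)
        centroids = [tuple(sum(vals) / len(g) for vals in zip(*g)) if g else centroids[j]
                     for j, g in enumerate(groups)]
    return centroids

def impute(rows, centroids):
    """Fill None entries from the nearest centroid over observed features."""
    filled = []
    for row in rows:
        obs = [i for i, v in enumerate(row) if v is not None]
        nearest = min(centroids,
                      key=lambda c: sum((row[i] - c[i]) ** 2 for i in obs))
        filled.append(tuple(v if v is not None else nearest[i]
                            for i, v in enumerate(row)))
    return filled

complete = [(1.0, 2.0), (1.2, 1.8), (8.0, 9.0), (8.2, 9.4)]
centroids = kmeans(complete, k=2)
print(impute([(1.1, None), (None, 9.2)], centroids))
```

GK-KH Means replaces the hard nearest-centroid assignment with Gaussian-kernel distances and harmonic-mean weighting, but the fill-from-cluster-representative step is analogous.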
     
  • Modeling of Electro chemical machining Parameters by Dimensional Analysis and Artificial Neural Network   Order a copy of this article
    by Senthilkumar C 
    Abstract: Metal matrix composites (MMCs) are gaining increasing attention for applications in various industries due to their light weight and greater wear resistance compared with conventional materials. Manufacturers, however, face difficulty in machining MMCs because the abrasive nature of the reinforcing particles shortens tool life. Electrochemical machining (ECM) is a widely used non-conventional process for removing material in the die making, aerospace, and automobile industries, and it can machine materials of any hardness. Hence, in the present study, ECM was used to machine a metal matrix composite made by the stir casting process. Because directly correlating the independent parameters with the dependent responses of the ECM process is quite difficult, models were developed using Buckingham's π theorem and an artificial neural network (ANN). Finally, the experimental values were compared with the predicted values of both models, and high prediction accuracy was found.
    Keywords: electro chemical machining; ECM; metal removal rate; MRR; surface roughness; dimensional analysis; artificial neural network; ANN.
    DOI: 10.1504/IJBIDM.2023.10050368
     
  • Using advanced web ontology language properties for deriving novel and consistent association rules   Order a copy of this article
    by Eliot Bytyci, Lule Ahmedi 
    Abstract: Association rule mining has long been used to discover relationships in data. On the other hand, using ontology properties can lead to the discovery of new knowledge, which can be combined with raw data to increase the number of association rules generated. Exploiting these properties can also prevent the creation of additional, erroneous rules. Three domain ontologies are employed in the studies to support both assertions and to determine which attributes are likely to have a greater impact on rule creation. Initial enrichment of the ontologies with the same type of properties is followed by the application of association rule algorithms to each ontology. The results are contrasted with those produced by applying association rules to raw data. The work's contribution can be divided into two categories: creating new rules and preventing the creation of conflicting rules.
    Keywords: association rule mining; ARM; ontology; web ontology language; OWL; advanced properties; semantic web.
    DOI: 10.1504/IJBIDM.2024.10051202
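The mining step underlying the abstract rests on support and confidence; a minimal sketch on hypothetical transactions makes the two measures concrete (ontology-derived attributes would simply appear as extra items in each transaction):

```python
# Support = fraction of transactions containing an itemset;
# confidence of rule A -> B = support(A union B) / support(A).

transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk", "butter"},
]

def support(itemset):
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    return support(antecedent | consequent) / support(antecedent)

# Example rule: {bread} -> {milk}
print(support({"bread", "milk"}), confidence({"bread"}, {"milk"}))
```
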
     
  • Machine Learning Models for Predicting Customer Churn: A case study in a Software-as-a-Service Inventory Management Company.   Order a copy of this article
    by Naragain Phumchusri, Phongsatorn Amornvetchayakul 
    Abstract: Software-as-a-service (SaaS) is a software licensing model that allows access to software on a subscription basis using external servers. This article proposes customer churn prediction models for a SaaS inventory management company in Thailand. The main focus of this work is seeking the most suitable customer churn prediction model for this case-study company, which currently faces a high churn rate. The paper explores four machine learning algorithms: logistic regression, support vector machine, decision tree (DT) and random forest. The results show that the optimised DT model outperforms the other classification models on the recall scorer, with validated testing scores of 94.4% recall and 88.2% F1-score. Moreover, feature importance scores are investigated for practical insights to identify features that are significantly related to churn behaviour. These findings can help the case-study company identify customers who are going to churn more precisely, and enhance the effectiveness of managerial decisions and marketing actions.
    Keywords: churn prediction; machine learning; software-as-a-service; SaaS.
    DOI: 10.1504/IJBIDM.2024.10051203
     
  • Discovery of Dangerous Self-Medication Methods with patients, by using Social Network Mining   Order a copy of this article
    by Reza Samizadeh, Morteza Khavanin Zadeh, Mahsa Jadidi, Mohammad Rezapour, Sahar Vatankhah 
    Abstract: Nowadays, social networks have replaced traditional media as sources of information, and unfortunately some people around the world turn to writings that are easily accessible on these networks instead of reading books. The present study categorises Persian texts on the Telegram social network at Jam Hospital and some Iranian websites concerning metabolic disease, obesity, and diabetes. Data classification was performed with text mining algorithms, and naive Bayes was more accurate than the support vector machine. The results show that 'Venustat' is one of the treatments emphasised by users, who recommend it to each other; in medical science, this drug has many complications and should not be used arbitrarily. Another drug, 'Super Slim', is very dangerous and is also strongly recommended by users. Therefore, raising public awareness is necessary to avoid reliance on unscientific media content and to facilitate access to medical services such as telemedicine.
    Keywords: text mining; sentiment analysis; data mining; support vector machine; naive Bayes; social network mining; health; obesity and metabolic; diabetes mellitus; self-medication.
    DOI: 10.1504/IJBIDM.2023.10052199