Template-Type: ReDIF-Article 1.0
Author-Name: Shambhu Kumar
Author-X-Name-First: Shambhu
Author-X-Name-Last: Kumar
Author-Name: Arti Jain
Author-X-Name-First: Arti
Author-X-Name-Last: Jain
Author-Name: Dinesh C.S. Bisht
Author-X-Name-First: Dinesh C.S.
Author-X-Name-Last: Bisht
Title: Enhancing link prediction in dynamic social networks: a novel algorithm integrating global and local topological structures
Abstract: The link prediction problem has gained significant importance due to the emergence of many social networks. Existing link prediction algorithms in social networks often prioritise local or global attributes, yielding satisfactory performance on specific network types but with limitations like reduced accuracy or higher computational burden. This paper presents a novel link prediction approach that integrates global and local topological structures, assessing node similarity through a similarity index formula between two node pairs that is based on three key features: the number of common neighbours between nodes with some penalty factor introduced for each common node, node influence, and the shortest path distance between unconnected nodes. Evaluation using AUC has been performed against seven datasets and demonstrates significant improvement over baseline and state-of-the-art methods, enhancing accuracy by 30% and 6.75%. This highlights the efficacy of integrating global and local features for more accurate link prediction.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 26-53
Issue: 1
Volume: 17
Year: 2025
Keywords: social network; link prediction; common neighbour; similarity measure; degree centrality; node distance.
File-URL: http://www.inderscience.com/link.php?id=144611
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:17:y:2025:i:1:p:26-53

Template-Type: ReDIF-Article 1.0
Author-Name: Serkan Alkan
Author-X-Name-First: Serkan
Author-X-Name-Last: Alkan
Title: Comparative analysis of distance measures in stock network construction and cluster analysis
Abstract: The mutual information (MI) metric and the Pearson correlation metric are both widely used in cluster analysis and stock network construction. This paper presents a detailed comparison between the MI metric and the Pearson correlation metric. To detect nonlinear relationships, polynomial and natural cubic spline regressions are proposed as alternatives to the MI metric. The methodology for computing model-fitting indices for determining network adjacencies is explained in detail, along with a comparison of the results with the MI methodology. This study employs two data sets derived from the log returns of the daily adjusted closing prices of 402 stocks in the S&P 500 index to measure the impact of a financial crisis on nonlinearity: one covering the crisis period from January 2007 to December 2009, and the other covering the non-crisis period between January 2012 and December 2015. The local and global properties of hierarchical stock networks are compared using the minimum spanning tree for each distance measure. The graph-theoretic internal cluster validity indices and external indices are also used to investigate the relationship between the performance of the community detection algorithm and the selection of metrics.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 75-102
Issue: 1
Volume: 17
Year: 2025
Keywords: financial networks; mutual information; Pearson correlation; regression models; community detection.
File-URL: http://www.inderscience.com/link.php?id=144614
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:17:y:2025:i:1:p:75-102

Template-Type: ReDIF-Article 1.0
Author-Name: Ambily Balaram
Author-X-Name-First: Ambily
Author-X-Name-Last: Balaram
Author-Name: Nedunchezhian Raju
Author-X-Name-First: Nedunchezhian
Author-X-Name-Last: Raju
Title: A frequent itemset generation approach in data mining using transaction-labelling dynamic itemset counting method
Abstract: A significant amount of data is generated, gathered, stored, and evaluated in real-world applications as a result of technology breakthroughs. Data mining (DM) combines a number of disciplines to efficiently discover hidden patterns from vast archives of historical information. To significantly reduce complexities associated with data, the proposed method, transaction-labelling dynamic itemset counting (TL-DIC), utilises a labelling approach on the given transactional database to logically arrange and process the underlying transactions. This method generates frequent itemsets thereby improving the performance of conventional dynamic itemset counting (DIC) method. Based on experimental findings, the average scan count in DIC and M-Apriori is 4% and 3.66%, respectively higher than TL-DIC, for different support counts. TL-DIC executes 20% and 16% quicker than DIC and M-Apriori, respectively, in terms of execution time. These results validate the proposed approach's efficacy in creating frequent itemsets from large datasets.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 54-74
Issue: 1
Volume: 17
Year: 2025
Keywords: data mining; association rule mining; ARM; dynamic itemset counting method; DIC; frequent itemset generation; transaction labelling; TL; labelling; complexities; scan count; transactional database; minimum support threshold; transaction-labelling dynamic itemset counting; TL-DIC.
File-URL: http://www.inderscience.com/link.php?id=144615
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:17:y:2025:i:1:p:54-74

Template-Type: ReDIF-Article 1.0
Author-Name: Wallace Anacleto Pinheiro
Author-X-Name-First: Wallace Anacleto
Author-X-Name-Last: Pinheiro
Author-Name: Ricardo Q.A. Fernandes
Author-X-Name-First: Ricardo Q.A.
Author-X-Name-Last: Fernandes
Author-Name: Ana Bárbara Sapienza Pinheiro
Author-X-Name-First: Ana Bárbara Sapienza
Author-X-Name-Last: Pinheiro
Title: Sorting paired points: a dissimilarity measure based on sorting of series
Abstract: We propose a new dissimilarity measure, sorting different time series and measuring their absolute and relative degree of disorganisation. This work compares this strategy with the state-of-the-art of dissimilarities or similarities measures, such as DTW, maximal information coefficient (MIC) and complexity-invariant distance (CID). Two clustering algorithms, one deterministic and one non-deterministic, K-means and hierarchical, allow us to analyse their results. To infer the accuracy, we use two different indexes, maximal HITS, and adjusted Rand index. The results of the experiments, over 128 different datasets, demonstrate that the proposed approach provides more accurate results for different domains using the proposed metrics.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 1-25
Issue: 1
Volume: 17
Year: 2025
Keywords: clustering; similarity; time series; entropy; sorting.
File-URL: http://www.inderscience.com/link.php?id=144620
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:17:y:2025:i:1:p:1-25

Template-Type: ReDIF-Article 1.0
Author-Name: Nongyao Nai-arun
Author-X-Name-First: Nongyao
Author-X-Name-Last: Nai-arun
Author-Name: Warachanan Choothong
Author-X-Name-First: Warachanan
Author-X-Name-Last: Choothong
Title: Ensemble learning models for predicting the gaming addiction behaviours of adolescents
Abstract: This paper proposes: 1) to create a prediction model for the game addiction of adolescents using six data mining algorithms; 2) to optimise the models by adjusting the parameters; 3) to create an ensemble model. Bagging and boosting algorithms were investigated for improving the models. Data were collected from eight Northern Rajabhat Universities in Thailand. The results found that bagging with neural network had shown the highest performance with an accuracy of 99.35%, followed by the boosting with neural network (99.02%), the model with the best-optimised parameters of the neural network algorithm achieved by adjusting the learning rate. The best model was used to develop a web application for predicting the gaming addiction behaviours of adolescents, which would contribute to solve the problem.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 103-125
Issue: 1
Volume: 17
Year: 2025
Keywords: classification; ensemble learning; bagging; boosting; neural network; random forest; optimisation; gaming addiction behaviours.
File-URL: http://www.inderscience.com/link.php?id=144623
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:17:y:2025:i:1:p:103-125

Template-Type: ReDIF-Article 1.0
Author-Name: Hari Lal Bhaskar
Author-X-Name-First: Hari Lal
Author-X-Name-Last: Bhaskar
Title: Analysis and evaluation of business process management tools and techniques in the Industry 4.0
Abstract: The purpose of this paper is to analyse and evaluate the different tools and techniques of business process management (BPM) as well as selection and adoption factors for process mining tools in Industry 4.0 for BPM. This paper also discusses that how tools and techniques of process mining can be used to drive the pedals of microeconomics principles. This paper discusses the core concepts of BPM and process mining tool in Industry 4.0 as well as evaluation of different types of models, etc. A tactical roadmap has been provided with a lot of comparative analysis for selecting a process mining tool or software for initiating a business process optimisation or BPR program. This work lies in the fact that how the modern-day digitally enabled organisation, Industry 4.0 to be specific, can actually benefit and re-organise its legacy systems using data-driven business insights, in order to achieve operational excellence.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 165-199
Issue: 2
Volume: 17
Year: 2025
Keywords: business process management; BPM; digital transformation; digitalisation; process mining; Industry 4.0; BPM tools; industrial internet of things; IIoT.
File-URL: http://www.inderscience.com/link.php?id=146584
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:17:y:2025:i:2:p:165-199

Template-Type: ReDIF-Article 1.0
Author-Name: Mrunal Prakash Gavali
Author-X-Name-First: Mrunal Prakash
Author-X-Name-Last: Gavali
Author-Name: Abhishek Verma
Author-X-Name-First: Abhishek
Author-X-Name-Last: Verma
Title: Ensemble of large self-supervised transformers for improving speech emotion recognition
Abstract: Speech emotion recognition (SER) is a challenging and active field of collaborative, social robotics to improve human-robot interaction (HRI) and affective computing as a feedback mechanism. More recently self-supervised learning (SSL) approaches have become an important method for learning speech representations. We present results of experiments on the challenging large-scale speech emotion RAVDESS dataset. Six very large state-of-the-art self-supervised learning transformer models were trained on the speech emotion dataset. Wav2Vec2.0-XLSR-53 was the most successful of the six level-0 models and achieved classification accuracy of 93%. We propose majority voting ensemble models that combined three and five level-0 models. The five-model and three-model majority voting ensemble models achieved 96.88% and 96.53% accuracy respectively and thereby significantly outperformed the best level-0 model and surpassed the state-of-the-art.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 217-244
Issue: 2
Volume: 17
Year: 2025
Keywords: speech emotion recognition; SER; self-supervised learning; SSL; emotion AI; transformers; speech processing; acoustic features.
File-URL: http://www.inderscience.com/link.php?id=146585
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:17:y:2025:i:2:p:217-244

Template-Type: ReDIF-Article 1.0
Author-Name: Sowjanya Yerramaneni
Author-X-Name-First: Sowjanya
Author-X-Name-Last: Yerramaneni
Author-Name: Sudheer K. Reddy
Author-X-Name-First: Sudheer K.
Author-X-Name-Last: Reddy
Title: A review on breast cancer detection using machine learning techniques
Abstract: One of the major diseases that has a high mortality rate in women is breast cancer. As the death rate of women has been increasing every year, it is necessary to decrease this number to detect the cancerous cells accurately by employing various methods. This paper presents a review of various works on the detection of breast cancer using various machine learning techniques such as decision tree, random forest, K-nearest neighbour, support vector machine, logistic regression and Naïve Bayes classifier. In addition, the paper also covers various deep neural network techniques and the comparison of various works. It follows various steps, namely pre-processing of breast image, mass detection, feature selection and image segmentation, feature extraction and classification. These steps are applied on various datasets namely, Wisconsin dataset, ImageNet, BreakHis, histopathological images and MIAS. The performance of various models has been examined and made a comparative study by considering accuracy, sensitivity and specificity metrics. Authors of this paper presented an overview of the current developments in cancer research by leveraging machine learning, deep learning and transformer models. Further, the authors also proposed the future scope of the work.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 142-164
Issue: 2
Volume: 17
Year: 2025
Keywords: breast cancer; classification models; machine learning; neural networks; deep learning.
File-URL: http://www.inderscience.com/link.php?id=146586
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:17:y:2025:i:2:p:142-164

Template-Type: ReDIF-Article 1.0
Author-Name: Sena Kumcu
Author-X-Name-First: Sena
Author-X-Name-Last: Kumcu
Author-Name: Bahar Özyörük
Author-X-Name-First: Bahar
Author-X-Name-Last: Özyörük
Title: An approach to improve the healthcare purchase decision: an application in a healthcare centre in Türkiye
Abstract: For the healthcare sector, the right supplier selection and order quantity allocation decisions for the healthcare sector are crucial because the healthcare sector must deliver its products and services to its patients properly and on time. However, in this sector, supplier selection and order allocation decision are still not given enough attention. For this reason, there is a significant research and application gap in the literature. In this study, first, in order to determine the annual purchasing needs of the medical equipment that are vital for a healthcare centre in Ankara, Türkiye, always-better-control vital-essential-desirable (ABC-VED) analysis were used. Then six different scenarios for determined vital equipments were created by using goal programming model with GAMS (24.1.3) program to help the decision maker improve the purchase decision process. This proposed approach increases the efficiency of the decision process by providing the decision maker with alternative decision plans.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 127-141
Issue: 2
Volume: 17
Year: 2025
Keywords: healthcare procurement practices; supplier selection; order allocation; goal programming; ABC-VED analysis.
File-URL: http://www.inderscience.com/link.php?id=146587
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:17:y:2025:i:2:p:127-141

Template-Type: ReDIF-Article 1.0
Author-Name: Blanka Bártová
Author-X-Name-First: Blanka
Author-X-Name-Last: Bártová
Author-Name: Vladislav Bína
Author-X-Name-First: Vladislav
Author-X-Name-Last: Bína
Title: Training an artificial neural network for an effective PCB defect detection
Abstract: The printed circuit boards (PCBs) are crucial components of most electronic devices. In the last decades, the PCBs' manufacturing process was significantly improved, mainly by surface mounted technology (SMT) and automatic optical inspection (AOI) implementation. The real data as an output from the AOI device used for our analysis have been composed in a real manufacturing company. The currently used AOI solution achieves an accuracy of 95.82%. The goal of our study was to train an artificial neural network (ANN) to detect the defect PCBs with the highest possible accuracy. Different approaches have been used for ANN training, such as the experimental approach, regression, and Taguchi method. The resulted PCA-ANN model combines principal components analysis (PCA) method for data dimensionality reduction and ANN for low quality products detection. Our proposed model increases the AOI accuracy rate by 3.95%.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 200-216
Issue: 2
Volume: 17
Year: 2025
Keywords: artificial neural network; ANN; Taguchi; printed circuit board; PCB; defect; detection; surface mounted technology; SMT; regression; data mining; networks training; quality management; Industry 4.0.
File-URL: http://www.inderscience.com/link.php?id=146588
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:17:y:2025:i:2:p:200-216

Template-Type: ReDIF-Article 1.0
Author-Name: Xin Zhang
Author-X-Name-First: Xin
Author-X-Name-Last: Zhang
Author-Name: Zhixin Kang
Author-X-Name-First: Zhixin
Author-X-Name-Last: Kang
Author-Name: Guanlin Gao
Author-X-Name-First: Guanlin
Author-X-Name-Last: Gao
Author-Name: Xinyan Shi
Author-X-Name-First: Xinyan
Author-X-Name-Last: Shi
Title: Analysing and forecasting COVID-19 vaccination - evidence from a Native American community in North Carolina, USA
Abstract: This study examines the determining factors of vaccination decisions for adults and children in a historical tribal region and evaluates various machine learning models in their predicting powers. COVID-19 vaccination data were investigated; though, the proposed method may be used for evaluating other vaccination data. We administrated a survey and collected cross-sectional data (e.g., socio-demographics, COVID-19 testing behaviours, vaccination status, and people's knowledge about, attitude toward, and belief in the vaccines), developed new features and built predicting models (e.g., random forest, neural network, and decision tree), and evaluated their performance against the benchmark logistic regression models. The results show that people, who tested more frequently, believed vaccination is a social responsibility, and were provided with paid leaves from employers are more likely to be fully vaccinated and vaccinate their children. Our results also show that not all machine learning models outperform the logistic regression model.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 245-271
Issue: 3
Volume: 17
Year: 2025
Keywords: COVID-19 vaccination intention; feature design and evaluation; vaccination forecasting; machine learning; Bayesian-correlation; model evaluation.
File-URL: http://www.inderscience.com/link.php?id=148835
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:17:y:2025:i:3:p:245-271

Template-Type: ReDIF-Article 1.0
Author-Name: Jyotirmayee Rautaray
Author-X-Name-First: Jyotirmayee
Author-X-Name-Last: Rautaray
Author-Name: Sangram Panigrahi
Author-X-Name-First: Sangram
Author-X-Name-Last: Panigrahi
Author-Name: Ajit Kumar Nayak
Author-X-Name-First: Ajit Kumar
Author-X-Name-Last: Nayak
Title: Multi-document text summarisation using DL-BILSTM model with hybrid algorithms
Abstract: With the overwhelming amount of information available online, it becomes challenging for users to access relevant data. Automated techniques are essential to effectively filter and extract valuable information from vast datasets. Recently, text summarisation has emerged as a key method for distilling relevant content from lengthy documents. This work introduces a novel deep learning-based approach for multi-document text summarisation. The proposed system begins with preprocessing tasks such as stop word removal, sentence and paragraph chunking, stemming, and lemmatisation. Textual phrases are transformed into vector space models using TF-ISF and sentence scores are evaluated. A deep learning-based bidirectional long short-term memory model is employed for summarisation. Additionally, cat swarm optimisation and aquila optimisers refine DL model's parameters. The approach is validated using DUC 2002, DUC 2003, and DUC 2005 datasets, demonstrating superior performance across various metrics including Rouge scores, BLEU scores, cohesion, sensitivity, positive predictive value, and readability when compared to other summarisation methods.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 334-363
Issue: 3
Volume: 17
Year: 2025
Keywords: multi-document text summarisation; MDTS; BiLSTM; term frequency-inverse sentence frequency; deep learning; Aquila optimiser; cat swarm optimisation; CSO; natural language processing; NLP.
File-URL: http://www.inderscience.com/link.php?id=148836
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:17:y:2025:i:3:p:334-363

Template-Type: ReDIF-Article 1.0
Author-Name: Bibi Saqia
Author-X-Name-First: Bibi
Author-X-Name-Last: Saqia
Author-Name: Khairullah Khan
Author-X-Name-First: Khairullah
Author-X-Name-Last: Khan
Author-Name: Atta Ur Rahman
Author-X-Name-First: Atta Ur
Author-X-Name-Last: Rahman
Title: Identifying immoral posts on social media platforms: a review
Abstract: Social media has become an integral part of our lives, connecting people across different parts of the world. Recently, there has been an increasing concern over the proliferation of immoral content on social media platforms. The ease and speed of communication on social media have made it a popular platform for people to express their opinions. Still, it has also led to the spread of harmful and immoral content. Hate speech, cyberbullying, and other forms of immoral behaviour are common on social media platforms, which can have serious consequences for the individuals involved and the wider community. Current literature reviews have normally fixated on a specific class of immoral posts as hate speech. According to the study, no review has been dedicated to overall categories of immoral post-identification. This paper describes a systematic literature review of computational approaches, resources, challenges, and research gaps about overall categories of immoral post-identification.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 296-333
Issue: 3
Volume: 17
Year: 2025
Keywords: immoral posts; social media; cyberbullying; hate speech; challenges and issues.
File-URL: http://www.inderscience.com/link.php?id=148837
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:17:y:2025:i:3:p:296-333

Template-Type: ReDIF-Article 1.0
Author-Name: Nur Izzaty
Author-X-Name-First: Nur
Author-X-Name-Last: Izzaty
Author-Name: Adelia Shinta
Author-X-Name-First: Adelia
Author-X-Name-Last: Shinta
Author-Name: Riski Arifin
Author-X-Name-First: Riski
Author-X-Name-Last: Arifin
Author-Name: Sri Rahmawati
Author-X-Name-First: Sri
Author-X-Name-Last: Rahmawati
Title: Sentiment analysis on customer reviews in Indonesian marketplace using natural language processing (a case study of organic face mask)
Abstract: The increasing development of technology nowadays has led to the transformation of customers behaviour in purchasing products, from offline to online through marketplace. One of the most popular marketplaces in Indonesia is Shopee with the best seller skincare product is organic face mask. This study aims to analyse the sentiment of customer reviews using natural language processing (NLP) and term frequency-inversed document frequency (TF-IDF). The result revealed that from 882 reviews extracted, 89.7% was classified as positive reviews (rating 4 and 5) and the rest as much as 10.3% was the negative ones (rating 1 and 2). The sentiments were visualised using word cloud. Among the positive reviews were 'very good', 'quickly absorbed', and 'convenient'. Meanwhile, among the negative reviews were 'disappointed', 'delivery', and 'acne'. In summary, the performance metrics used for the evaluation of the classification model showed that the model accuracy reached 95%.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 364-381
Issue: 3
Volume: 17
Year: 2025
Keywords: customer reviews; natural language processing; NLP; sentiment analysis; term frequency-inverse document frequency; TF-IDF; skincare; organic face mask.
File-URL: http://www.inderscience.com/link.php?id=148839
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:17:y:2025:i:3:p:364-381

Template-Type: ReDIF-Article 1.0
Author-Name: Shini Lawrance
Author-X-Name-First: Shini
Author-X-Name-Last: Lawrance
Author-Name: J.R. Jeba
Author-X-Name-First: J.R.
Author-X-Name-Last: Jeba
Title: Ensemble model with improved DCNN for big data classification by handling class imbalance problem
Abstract: This research suggests a big data classification model that uses an improved deep convolutional neural network (IDCNN) and has five phases. In the first stage, Z-score normalisation is employed for preprocessing the input data. The second phase involves processing the preprocessed data for improved class imbalance using SMOTE-ENC. Then, the subsequent phase involves extracting the collection of features, which also includes raw data and features based on correlation, entropy, and MI. Then, in the fourth phase, to guarantee appropriate feature selection, an improved recursive feature elimination (IRFE) approach is employed for the selection of features is performed using the extracted features. Finally, ensemble classification using a collection of classifiers like Bi-LSTM, SVM, RNN and IDCNN is performed depending on the features that have been chosen. The IDCNN classifier is used in this case to categorise the final result by taking Bi-LSTM, SVM and RNN output scores as input.
Journal: Int. J. of Data Mining, Modelling and Management
Pages: 272-295
Issue: 3
Volume: 17
Year: 2025
Keywords: data; classification; class imbalance; deep convolutional neural network; DCNN; improved recursive feature elimination; IRFE.
File-URL: http://www.inderscience.com/link.php?id=148853
File-Format: text/html
File-Restriction: Access to full text is restricted to subscribers.
Handle: RePEc:ids:ijdmmm:v:17:y:2025:i:3:p:272-295