Forthcoming and Online First Articles

International Journal of Data Mining, Modelling and Management (IJDMMM)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with the Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

International Journal of Data Mining, Modelling and Management (20 papers in press)

Regular Issues

  • Discrete Cuckoo Search for 0-1 knapsack problem
    by Aziz Ouaarab 
    Abstract: This paper addresses space management optimisation problems such as the 0-1 knapsack problem (KP) using the discrete cuckoo search (DCS) algorithm. The proposed approach adapts three main components: the objective function, the solution representation, and the step move operator. A simplified design of these three components is proposed without introducing any additional technique, especially in the search process for the optimal solution. Three sets of benchmark instances taken from the literature are used to test the performance of DCS. Experimental results show that DCS is effective in solving different types of 0-1 KP instances. Comparisons with other state-of-the-art algorithms show that DCS is a competitive approach that outperforms most of them.
    Keywords: 0-1 knapsack problem; discrete cuckoo search; DCS; combinatorial optimisation; Lévy flights; approximate algorithm.
    DOI: 10.1504/IJDMMM.2024.10064048
     
  • Early Stage Analysis of Breast Cancer Using Intelligent System
    by Arpita Nath Boruah, Mrinal Goswami 
    Abstract: Breast cancer (BC) poses a considerable global health concern for women's well-being worldwide. It is crucial to develop a system that can proactively identify the critical risk factors associated with BC. The present study introduces an intelligent system for BC by analysing risk factors (IS-BC-analysing-RF), which utilises decision tree rules to accurately identify the primary risk factors underlying BC. The rules are processed using the proposed score function to retain the most relevant ones. Finally, using the sequential search approach, the critical risk factors are identified along with their respective ranges. Simulation results on the University of California at Irvine (UCI) repository BC dataset indicate that the proposed IS-BC-analysing-RF system is highly significant and has the potential to effectively mitigate the risk of BC by targeting and managing one or two crucial risk factors.
    Keywords: decision system; breast cancer; decision tree; machine learning; risk factor.
    DOI: 10.1504/IJDMMM.2024.10064214
     
  • A Novel LWT-based Robust Watermark Strategy for Colour Images
    by Prachee Dewangan, Debabala Swain, Monalisa Swain 
    Abstract: With the progress of information technology, digital data larceny and duplication have become much easier. Image watermarking in cryptography is a major domain that provides manifold security features such as confidentiality, authenticity and integrity. This research introduces a robust watermarking scheme for colour images. The proposed technique segments the colour image into three layers: red, green and blue. The lifting wavelet transform (LWT) and differential histogram shifting are used to embed text watermark information into the R layer. The performance of the proposed technique was assessed using the SIPI image dataset. Test outputs show that the proposed scheme maintains the balance between imperceptibility and robustness. The scheme has better resistance against all types of attacks, such as different noises, filter effects and image compression. In addition, the text watermark can be successfully extracted under different types of tampering, such as content removal attacks and content addition attacks.
    Keywords: robust watermarking; geometric attack; fragile attack; dual watermark; lifting wavelet transform.
    DOI: 10.1504/IJDMMM.2024.10064256
     
  • Detecting Driver Mutations in Colorectal Cancer through Big Data Analysis
    by Amna Sethi, Muhammad Saad Khan, Fatima Hashmi, Saim Ali Akber 
    Abstract: Colorectal cancer (CRC) is a complex disease that poses a significant challenge to global health, with profound impacts on morbidity and mortality. There is a need to identify genetic biomarkers for early diagnosis of the disease. In this study, a comprehensive analysis of CRC genomes was conducted to identify consistent mutations in both coding and non-coding regions, highlighting their pivotal role in CRC pathogenesis. The results revealed consistent mutations in coding regions that validated known CRC driver genes. Consistent non-coding mutations were also identified within transcription factors binding sites (TFBS) in CRC cell lines. The statistical significance of these mutations suggests their potential impact on gene regulation, leading to the development and progression of CRC. They might act as potential biomarkers for early diagnosis of the disease. To conclude, the findings of this study might provide novel therapeutic targets and diagnostic markers for personalised medicine.
    Keywords: colorectal cancer; CRC; driver mutations; driver genes; biomarkers; transcription factors binding sites; TFBS.
    DOI: 10.1504/IJDMMM.2024.10064784
     
  • Enhancing Link Prediction in Dynamic Social Networks: A Novel Algorithm Integrating Global and Local Topological Structures
    by Shambhu Kumar, Arti Jain, Dinesh Bisht
    Abstract: The link prediction problem has gained significant importance due to the emergence of many social networks. Existing link prediction algorithms in social networks often prioritise local or global attributes, yielding satisfactory performance on specific network types but with limitations such as reduced accuracy or higher computational burden. This paper presents a novel link prediction approach that integrates global and local topological structures, assessing the similarity of a pair of nodes through a similarity index based on three key features: the number of common neighbours between the nodes, with a penalty factor introduced for each common node; node influence; and the shortest path distance between unconnected nodes. Evaluation using AUC on seven datasets demonstrates significant improvement over baseline and state-of-the-art methods, enhancing accuracy by 30% and 6.75%, respectively. This highlights the efficacy of integrating global and local features for more accurate link prediction.
    Keywords: social network; link prediction; common neighbour; similarity measure; degree centrality; node distance.
    DOI: 10.1504/IJDMMM.2025.10064902
     
  • Comparative Analysis of Distance Measures in Stock Network Construction and Cluster Analysis
    by Serkan Alkan 
    Abstract: The mutual information (MI) metric and the Pearson correlation metric are both widely used in cluster analysis and stock network construction. This paper presents a detailed comparison between the MI metric and the Pearson correlation metric. To detect nonlinear relationships, polynomial and natural cubic spline regressions are proposed as alternatives to the MI metric. The methodology for computing model-fitting indices for determining network adjacencies is explained in detail, along with a comparison of the results with the MI methodology. This study employs two data sets derived from the log returns of the daily adjusted closing prices of 402 stocks in the S&P500 index to measure the impact of a financial crisis on nonlinearity: one covering the crisis period from January 2007 to December 2009, and the other covering the non-crisis period between January 2012 and December 2015. The local and global properties of hierarchical stock networks are compared using the minimum spanning tree for each distance measure. The graph-theoretic internal cluster validity indices and external indices are also used to investigate the relationship between the performance of the community detection algorithm and the selection of metrics.
    Keywords: financial networks; mutual information; Pearson correlation; regression models; community detection.
    DOI: 10.1504/IJDMMM.2025.10065097
     
  • Analysis and Evaluation of Business Process Management (BPM) Tools and Techniques in the Industry 4.0
    by Hari Lal Bhaskar  
    Abstract: The purpose of this paper is to analyse and evaluate the different tools and techniques of business process management (BPM), as well as the selection and adoption factors for process mining tools in Industry 4.0 for BPM. The paper also discusses how the tools and techniques of process mining can be used to drive the pedals of microeconomics principles. It covers the core concepts of BPM and process mining tools in Industry 4.0, as well as the evaluation of different types of models. A tactical roadmap is provided, with extensive comparative analysis, for selecting a process mining tool or software for initiating a business process optimisation or BPR program. The contribution of this work lies in showing how the modern-day digitally enabled organisation, Industry 4.0 to be specific, can benefit from and re-organise its legacy systems using data-driven business insights in order to achieve operational excellence.
    Keywords: Business Process Management (BPM); Digital Transformation; Digitalization; Process Mining; Industry 4.0; BPM tools; Industrial Internet of Things (IIoT).
    DOI: 10.1504/IJDMMM.2025.10065406
     
  • A Frequent Itemset Generation Approach in Data Mining using Transaction-Labelling Dynamic Itemset Counting (TL-DIC) Method
    by Ambily Balaram, Nedunchezhian Raju 
    Abstract: A significant amount of data is generated, gathered, stored, and evaluated in real-world applications as a result of technology breakthroughs. Data mining (DM) combines a number of disciplines to efficiently discover hidden patterns from vast archives of historical information. To significantly reduce the complexities associated with data, the proposed method, transaction-labelling dynamic itemset counting (TL-DIC), applies a labelling approach to the given transactional database to logically arrange and process the underlying transactions. The method generates frequent itemsets, thereby improving the performance of the conventional dynamic itemset counting (DIC) method. Based on experimental findings, the average scan count in DIC and M-Apriori is 4% and 3.66% higher, respectively, than in TL-DIC for different support counts. TL-DIC executes 20% and 16% faster than DIC and M-Apriori, respectively. These results validate the proposed approach's efficacy in creating frequent itemsets from large datasets.
    Keywords: data mining; association rule mining; ARM; dynamic itemset counting method; DIC; frequent itemset generation; transaction labelling; TL; labelling.
    DOI: 10.1504/IJDMMM.2025.10065414
     
  • Sorting Paired Points: A Dissimilarity Measure Based on Sorting of Series
    by Wallace Pinheiro, Ricardo Q. A. Fernandes, Ana Bárbara Sapienza Pinheiro 
    Abstract: We propose a new dissimilarity measure that sorts different time series and measures their absolute and relative degree of disorganisation. This work compares this strategy with state-of-the-art dissimilarity and similarity measures, such as dynamic time warping (DTW), maximal information coefficient (MIC) and complexity-invariant distance (CID). Two clustering algorithms, one non-deterministic (K-means) and one deterministic (hierarchical), allow us to analyse the results. To assess accuracy, we use two different indices: maximal HITS and the adjusted Rand index. The results of the experiments, over 128 different datasets, demonstrate that the proposed approach provides more accurate results for different domains using the proposed metrics.
    Keywords: clustering; similarity; time series; entropy; sorting.
    DOI: 10.1504/IJDMMM.2025.10065723
     
  • Ensemble of Large Self-Supervised Transformers for Improving Speech Emotion Recognition
    by Mrunal Gavali, Abhishek Verma 
    Abstract: Speech emotion recognition (SER) is a challenging and active field of collaborative, social robotics to improve human-robot interaction (HRI) and affective computing as a feedback mechanism. More recently, self-supervised learning (SSL) approaches have become an important method for learning speech representations. We present results of experiments on the challenging large-scale speech emotion RAVDESS dataset. Six very large state-of-the-art self-supervised learning transformer models were trained on the speech emotion dataset. Wav2vec2.0-XLSR-53 was the most successful of the six level-0 models and achieved classification accuracy of 93%. We propose majority voting ensemble models that combined three and five level-0 models. The five-model and three-model majority voting ensemble models achieved 96.88% and 96.53% accuracy respectively and thereby significantly outperformed the best level-0 model and surpassed the state-of-the-art.
    Keywords: Speech Emotion Recognition; self-supervised learning; Emotion AI; transformers; speech processing; acoustic features.
    DOI: 10.1504/IJDMMM.2025.10065871
     
  • Ensemble Learning Models for Predicting the Gaming Addiction Behaviours of Adolescents
    by Nongyao Nai-arun, Warachanan Choothong 
    Abstract: This paper aims: 1) to create a prediction model for the game addiction of adolescents using six data mining algorithms; 2) to optimise the models by adjusting the parameters; 3) to create an ensemble model. Bagging and boosting algorithms were investigated for improving the models. Data were collected from eight Northern Rajabhat Universities in Thailand. The results showed that bagging with a neural network achieved the highest performance, with an accuracy of 99.35%, followed by boosting with a neural network (99.02%). The best-optimised parameters of the neural network algorithm were achieved by adjusting the learning rate. The best model was used to develop a web application for predicting the gaming addiction behaviours of adolescents, which could contribute to solving the problem.
    Keywords: classification; ensemble learning; bagging; boosting; neural network; random forest; optimisation; gaming addiction behaviours.
    DOI: 10.1504/IJDMMM.2025.10065942
     
  • A Review on Breast Cancer Detection using Machine Learning Techniques
    by Sowjanya Yerramaneni, Sudheer Reddy K. 
    Abstract: Breast cancer is one of the major diseases with a high mortality rate in women. As the death rate among women has been increasing every year, it is necessary to decrease this number by detecting cancerous cells accurately using various methods. This paper presents a review of various works on the detection of breast cancer using various machine learning techniques such as decision tree, random forest, K-nearest neighbour, support vector machine, logistic regression and Na
    Keywords: breast cancer; classification models; machine learning; neural networks; deep learning.
    DOI: 10.1504/IJDMMM.2025.10065995
     
  • An Approach to Improve the Healthcare Purchase Decision: An Application in a Healthcare Center in Turkey
    by Sena Kumcu, Bahar Özyörük 
    Abstract: The right supplier selection and order quantity allocation decisions are crucial for the healthcare sector, which must deliver its products and services to its patients properly and on time. However, supplier selection and order allocation decisions are still not given enough attention in this sector. For this reason, there is a significant research and application gap in the literature. In this study, first, in order to determine the annual purchasing needs of the medical equipment that is vital for a healthcare centre in Ankara, T
    Keywords: healthcare procurement practices; supplier selection; order allocation; goal programming; ABC-VED analysis.
    DOI: 10.1504/IJDMMM.2025.10066154
     
  • Analysing and Forecasting COVID-19 Vaccination - Evidence from a Native American Community in North Carolina, USA
    by Xin Zhang, Zhixin Kang, Guanlin Gao, Xinyan Shi 
    Abstract: This study examines the determining factors of vaccination decisions for adults and children in a historical tribal region and evaluates the predictive power of various machine learning models. COVID-19 vaccination data were investigated, although the proposed method may be used for evaluating other vaccination data. We administered a survey and collected cross-sectional data (e.g., socio-demographics, COVID-19 testing behaviours, vaccination status, and people's knowledge about, attitude toward, and belief in the vaccines), developed new features and built predictive models (e.g., random forest, neural network, and decision tree), and evaluated their performance against the benchmark logistic regression models. The results show that people who tested more frequently, believed that vaccination is a social responsibility, and were provided with paid leave by their employers are more likely to be fully vaccinated and to vaccinate their children. Our results also show that not all machine learning models outperform the logistic regression model.
    Keywords: COVID-19 vaccination intention; feature design and evaluation; vaccination forecasting; machine learning; Bayesian-correlation; model evaluation.
    DOI: 10.1504/IJDMMM.2025.10066364
     
  • Multi-Document Text Summarisation using DL-BiLSTM Model with Hybrid Algorithms
    by Jyotirmayee Rautaray, Sangram Panigrahi, Ajit Kumar Nayak 
    Abstract: With the overwhelming amount of information available online, it becomes challenging for users to access relevant data. Automated techniques are essential to effectively filter and extract valuable information from vast datasets. Recently, text summarisation has emerged as a key method for distilling relevant content from lengthy documents. This work introduces a novel deep learning-based approach for multi-document text summarisation. The proposed system begins with pre-processing tasks such as stop word removal, sentence and paragraph chunking, stemming, and lemmatisation. Textual phrases are transformed into vector space models using TF-ISF, and sentence scores are evaluated. A deep learning-based bidirectional long short-term memory model is employed for summarisation. Additionally, the cat swarm optimisation and Aquila optimisers refine the DL model's parameters. The approach is validated using the DUC 2002, DUC 2003, and DUC 2005 datasets, demonstrating superior performance across various metrics including ROUGE scores, BLEU scores, cohesion, sensitivity, positive predictive value, and readability when compared to other summarisation methods.
    Keywords: multi-document text summarisation; MDTS; BiLSTM; term frequency-inverse sentence frequency; deep learning; Aquila optimiser; cat swarm optimisation; CSO; natural language processing; NLP.
    DOI: 10.1504/IJDMMM.2025.10066438
     
  • Training an Artificial Neural Network for an Effective PCB Defect Detection
    by Blanka Bartova, Vladislav Bina 
    Abstract: Printed Circuit Boards (PCBs) are crucial components of most electronic devices. In the last decades, the PCB manufacturing process has been significantly improved, mainly by the implementation of Surface Mounted Technology (SMT) and Automatic Optical Inspection (AOI). The real data used for our analysis, produced as output by an AOI device, were collected in a real manufacturing company. The currently used AOI solution achieves an accuracy of 95.82%. The goal of our study was to train an Artificial Neural Network (ANN) to detect defective PCBs with the highest possible accuracy. Different approaches have been used for ANN training, such as the experimental approach, regression, and the Taguchi method. The resulting PCA-ANN model combines the Principal Components Analysis (PCA) method for data dimensionality reduction with an ANN for the detection of low-quality products. Our proposed model increases the AOI accuracy rate by 3.95%.
    Keywords: ANN; Taguchi; PCB; defect; detection; SMT; regression; data mining; networks training; quality management; Industry 4.0.
    DOI: 10.1504/IJDMMM.2025.10066541
     
  • Identifying Immoral Posts on Social Media Platforms: a Review
    by Bibi Saqia, Khairullah Khan, Atta Ur Rahman 
    Abstract: Social media has become an integral part of our lives, connecting people across different parts of the world. Recently, there has been increasing concern over the proliferation of immoral content on social media platforms. The ease and speed of communication on social media have made it a popular platform for people to express their opinions, but it has also led to the spread of harmful and immoral content. Hate speech, cyberbullying, and other forms of immoral behaviour are common on social media platforms and can have serious consequences for the individuals involved and the wider community. Current literature reviews have normally focused on a specific class of immoral posts, such as hate speech. According to the study, no review has been dedicated to the overall categories of immoral post identification. This paper describes a systematic literature review of computational approaches, resources, challenges, and research gaps concerning the overall categories of immoral post identification.
    Keywords: immoral posts; social media; cyberbullying; hate speech; challenges and issues.
    DOI: 10.1504/IJDMMM.2025.10066845
     
  • Sentiment Analysis of Danish Health Care Industries' Financial Text
    by Rudra Pratap Deb Nath, Emil Bækdahl, Magnus Brogaard Larsen, Jakob Skallebæk, Jesper Juul Severinsen 
    Abstract: Sentiment analysis enables organisations to gain insights into market trends and customer opinions expressed in textual format. It quantifies textual opinions by classifying them as positive, negative, or neutral. We present a system for performing sentiment analysis on Danish texts related to the Danish healthcare industry. The system is composed of two components: domain-specific sentiment lexicon (DSSL) generator and dependency tree-based sentence analyser (DTSA). To generate DSSL, we use company stock prices to automatically label the sentiments of financial news articles based on the point-wise mutual information method and achieve performance improvements compared to existing general sentiment lexicons. Our DTSA is based on a data structure called a dependency tree, which describes how words in a text are connected. Depending on the types of connections between the words, we apply different rules to compute a sentiment value. This approach, in conjunction with DSSL, performs best in three-class sentence classification compared to systems using different sentiment lexicons and/or sentiment analysis components. We achieve an accuracy of 53% and the best F1 scores.
    Keywords: Sentiment Analysis; Danish Text Mining; Business Intelligence; Knowledge Discovery; Natural Language Processing; ETL.
    DOI: 10.1504/IJDMMM.2025.10066891
     
  • Lung Disease Classification using Deep Learning 1-D Convolutional Neural Network
    by J. Viji Gripsy, Divya T 
    Abstract: Healthcare plays a crucial role in human life, particularly in the early diagnosis of diseases such as lung cancer, which affects people worldwide. Early detection of lung cancer can significantly improve treatment outcomes. This paper proposes a 1-D CNN deep learning architecture to classify patients into low, medium, and high-risk categories for lung cancer. The model achieves 97% training accuracy and 96.33% test accuracy, outperforming existing classification algorithms in accuracy, precision, recall, F1-score, and AUC. These results highlight the effectiveness of the proposed architecture in the early diagnosis of lung cancer.
    Keywords: lung disease; classification; 1-D convolutional neural network; 1-D CNN; prediction.
    DOI: 10.1504/IJDMMM.2025.10066898
     
  • Sentiment Analysis on Customers' Review in Indonesian Marketplace using Natural Language Processing (a Case Study of Organic Face Mask)
    by Nur Izzaty, Adelia Shinta, Riski Arifin, Sri Rahmawati 
    Abstract: The increasing development of technology has transformed customers' purchasing behaviour from offline to online through marketplaces. One of the most popular marketplaces in Indonesia is Shopee, where the best-selling skincare product is the organic face mask. This study aims to analyse the sentiment of customers' reviews using natural language processing (NLP) and term frequency-inverse document frequency (TF-IDF). The results revealed that, of the 882 reviews extracted, 89.7% were classified as positive (ratings 4 and 5) and the remaining 10.3% as negative (ratings 1 and 2). The sentiments were visualised using a word cloud. Terms appearing in the positive reviews included 'very good', 'quickly absorbed', and 'convenient', while terms in the negative reviews included 'disappointed', 'delivery', and 'acne'. In summary, the performance metrics used to evaluate the classification model showed that the model accuracy reached 95%.
    Keywords: customers review; natural language processing; NLP; sentiment analysis; term frequency-inverse document frequency; TF-IDF; skincare; organic face mask.
    DOI: 10.1504/IJDMMM.2025.10066900