Forthcoming Articles

International Journal of Data Mining, Modelling and Management (IJDMMM)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Online First articles are also listed here. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

International Journal of Data Mining, Modelling and Management (12 papers in press)

Regular Issues

  • Entity Resolution: a Novel Graph Embedding Approach Using RandomDeep
    by Nour Mekki, Djamel Berrabah, Abdelhamid Malki 
    Abstract: The exponential growth of digital information necessitates robust methods for entity resolution to ensure data quality and integration across datasets. This paper presents three novel node embedding algorithms for entity resolution in graph databases: RandomDeep, Refined embedding, and Combined embedding. RandomDeep integrates Iterative Deepening Depth First Search with deep learning to capture structural and semantic characteristics. Refined embedding enhances initial Graph Convolutional Network (GCN) embeddings through random walk-based refinement. Combined embedding merges outputs from complementary algorithms to produce versatile representations adaptable to diverse graph structures. A two-stage graph summarization technique supports this approach: initially as a blocking method to reduce computational complexity, and later during merging to consolidate redundant nodes. Evaluation datasets (DBLP-Scholar, Amazon-Google, Cora, and Yellow-Yelp) demonstrate the methods' effectiveness, with Area Under Cover Precision and Recall values ranging from 0.50 to 0.97 and F-measure values between 0.67 and 0.94. These results showcase accurate, efficient entity resolution in graph databases.
    Keywords: Entity Resolution; graph databases; node embedding; graph summarization; data quality.
    DOI: 10.1504/IJDMMM.2026.10069148
     
  • Context-Specific Multi-Class Data Analytics for Improving Online Conversation through Deep Learning
    by Dhanasekaran K, Nadana Ravishankar, Goyal S. B, Sardar M. N. Islam 
    Abstract: Social networks have emerged as a platform for disseminating information rapidly to friends, relatives, and the public. An effective text classification strategy can improve the effectiveness of online discussion. This has been a great motivation behind text analytics research. Several text classification approaches have been developed to enhance information extraction performance and address its challenges. However, traditional text data analytics are based on limited contextual and static resources and require effective intelligent techniques for automatically extracting features from the container. To address these issues, we proposed and developed a context-specific multi-class data analytics architecture based on deep learning. This approach improved the performance of data analytics and focused mainly on extracting various types of information that describe several attributes to improve the online conversation. The experimental results showed that the proposed multi-class data analytics provides promising results in terms of classification accuracy, validation accuracy, validation loss, precision, recall, and F1-measure in support of text classification for information extraction.
    Keywords: Convolutional neural network; Data analytics; Information extraction; Clustering; Deep learning.
    DOI: 10.1504/IJDMMM.2026.10069923
     
  • MoDA-TL - Monitoring Domestic Animals using Convolutional Neural Networks and Transfer Learning
    by Alex A. Do Amaral, Raimundo V. Costa Filho, Mário W. De L. Moreira 
    Abstract: In recent years, computer vision has made significant advances, expanding its knowledge and applications in various fields. An important example is the use of this technology to improve the recognition of different types of animals. This paper proposes an intelligent surveillance system that can individually identify each animal in a specific location and clearly indicate dangerous or unsuitable areas during monitoring, ensuring the safety of both people and the animals being monitored. In this context, deep learning algorithms, such as convolutional neural networks (CNN), are used to produce machine learning models capable of detecting and identifying objects in digital images. The study utilises the You Only Look Once (YOLO) version 8 model and achieves 99.5% accuracy in animal recognition, demonstrating its effectiveness in monitoring. Additionally, a comparison between a model trained from random weight initialisation and another based on transfer learning reveals that the latter outperforms across various metrics, showing 99.5% accuracy, 99.3% recall, 99.5% mAP50, and 77.5% mAP50-95. These results highlight the advantage of transfer learning in optimising performance.
    Keywords: Artificial Intelligence; Deep Learning; Neural Networks; Computer Vision; Image Recognition.
    DOI: 10.1504/IJDMMM.2026.10070032
     
  • Hybrid Kernel Support Vector Penalised Regression Model for Forewarning Pest Incidence using Weather Variables
    by Naranammal Narayanasamy, Krishna S. R. Priya 
    Abstract: Crop pest incidence and development are impacted by environmental factors. Therefore, a weather-based machine learning model would be an effective scientific measure for forewarning pests. In many cases, however, the raw data are complex and exhibit nonlinearity and multicollinearity, so a robust model is needed to forecast such data. The present study is an attempt to develop hybrid models such as kernel support vector ridge and kernel support vector elastic net regression (KSVENR) to forewarn crop pests of cotton. Weekly pest incidence data of sucking pests such as aphids, jassid, thrips and whitefly from 2015-16 to 2022-23 have been used for the study. The results reveal that the KSVENR model outperformed other penalised models by 43%, 42%, 40% and 33% for forewarning pest incidence of aphids, jassid, thrips and whitefly respectively. The proposed model would be a good tool for forecasting nonlinear data with multicollinearity.
    Keywords: Time series; Modelling; Forecasting; Nonlinear; Multicollinearity; Data Analysis; Machine Learning; Hybrid Model.
    DOI: 10.1504/IJDMMM.2026.10070953
     
  • ATESA: Audio Text Emotion & Sentiment Analyser- a Sentiment & Emotion Analysis Tool based on Deep Learning Methods
    by Pallavi Shukla, Rakesh Kumar, Vijay Dwivedi, Ashutosh Singh 
    Abstract: Sentiment analysis (SA) identifies sentiments in text, reviews, tweets, audio, images, and videos. Sentiment integrates emotion and thinking, with emotions being temporary while sentiments last longer. Emotion recognition and sentiment polarity analysis are gaining popularity in natural language processing due to their ability to mine social media data. This study applies machine learning (ML) classifiers such as random forest, logistic regression, support vector machine, and decision tree to classify text and speech as positive, negative, or neutral. Additionally, it explores available sentiment analysis tools and introduces the audio text emotion and sentiment analyser (ATESA). ATESA leverages ensemble-oriented classification techniques using deep learning, specifically bidirectional long short-term memory recurrent neural networks (Bi-LSTM-RNN). It processes text, Twitter data, and speech converted into text. Experimental results show that ATESA achieves 92% accuracy, outperforming other algorithms.
    Keywords: Sentiment Analysis Tool; Bi-LSTM; RNN; TFIDF; Deep Learning.
    DOI: 10.1504/IJDMMM.2026.10071047
     
  • Advancements in Mental Health Diagnosis: Leveraging Delta Feature Extraction Framework and PWSA Ensemble for Motion Data Analysis
    by S. Annapoorani, Lakshmi M. 
    Abstract: Depression affects over 350 million people globally and can become a serious health issue, especially when prolonged and ranging from mild to severe. Physical activity data offers a cost-effective and accessible approach to aid in diagnosing mental illnesses. This study introduces the Delta feature extraction framework (D-FEF), which extracts delta series and relevant features from original time series data, subsequently selecting a significant feature set. A probabilistic weighted selection algorithm (PWSA) with SMOTE generates multiple hypotheses using training data based on modified distributions, creating an ensemble of classifiers to predict healthy controls, depressive disorder, and schizophrenia. The PWSA classifier, utilising the D-FEF feature selection process, achieved 92.94% accuracy, outperforming all other tested methods. The technique's performance was evaluated on mental health datasets, including Depresjon and Psykose, and compared against state-of-the-art approaches. The proposed D-FEF and PWSA methodology demonstrates promising results for the classification of mental health conditions using physical activity data.
    Keywords: Actigraphy data; mental health; feature engineering; feature selection; ensemble machine learning algorithm.
    DOI: 10.1504/IJDMMM.2026.10072023
     
  • D-HUP Tree: Distributed HUP Tree for Scalable High Utility Itemset Mining
    by Chintan Rajput, Mathe John Kenny Kumar, Dipti Rana 
    Abstract: High utility itemset mining (HUIM) extracts useful information from datasets. As volume, velocity and variety increase, traditional methods struggle with computational efficiency in terms of runtime and memory utilisation. The proposed work introduces a new approach called distributed-high utility pattern tree (D-HUP Tree), which combines a HUP Tree data structure with the Hadoop distributed computing framework, thereby improving runtime and memory management and enabling parallel processing. Experimental results clearly illustrate that the proposed methodology reduces computation complexity without compromising the quality of discovered high utility itemsets, providing a substantial contribution to the high utility itemset mining field.
    Keywords: High Utility Itemset Mining; HUP Tree; Distributed Itemset Mining; Map Reduce.
    DOI: 10.1504/IJDMMM.2026.10072676
     
  • Mining Maximal Empty Rectangles
    by Dwipen Laskar, Irani Hazarika, Farha Naznin, Anjana Kakoti Mahanta 
    Abstract: Interval data with k dimensions can be represented as a hyperrectangle. All the domains of an interval dataset can be represented as a bounded hyperrectangle, which can be treated as the universe or bounding region. Empty hyperrectangles within this bounding hyperrectangle are regions having no intersections with any other hyperrectangle represented by any data in the dataset. A maximal empty hyperrectangle is an empty hyperrectangle that is not properly contained in any other empty hyperrectangle. In a 2D interval dataset, the problem of mining all maximal empty hyperrectangles can be reduced to mining all maximal empty rectangles within the bounding rectangle of the dataset. In this paper, a two-step dynamic algorithm called AMER-Miner is proposed for mining all maximal empty rectangles contained in the bounding rectangle of a 2D interval dataset. The proposed method has been tested on two real-life datasets and one synthetic dataset, and experimental results are reported.
    Keywords: Interval data; Empty interval; Empty rectangle; Hyperrectangle.
    DOI: 10.1504/IJDMMM.2026.10072741
     
  • Analysis of the Debt Status of Households in Poor Areas based on Economic Capital using Two-Class Boosted Decision Trees
    by Pita Jarupunphol, Wipawan Buathong, Suthasinee Kuptabut 
    Abstract: This study examines household debt determinants in Kut Bak district, Thailand, using a two-class boosted decision tree (TBDT) model to analyse 301 households across 30 financial, asset, and socio-economic variables. Compared with logistic regression, decision tree, random forest, and XGBoost, the model demonstrates superior performance, achieving an accuracy of 0.922, precision of 0.975, recall of 0.867, F1-score of 0.918, and AUC of 0.948. Key findings reveal that limited savings, minimal state assistance, and low ownership of productive assets significantly increase debt likelihood. Specific thresholds, such as savings below 4,500 units and cash reserves of 50 units or less, are strongly associated with indebtedness. The study highlights the model's effectiveness in predicting debt status and provides actionable insights for policymakers and organisations to enhance financial stability in rural communities. These results contribute to understanding socio-economic factors driving household debt in disadvantaged areas.
    Keywords: data mining; household debt; machine learning; socio-economic factors; two-class boosted decision tree.
    DOI: 10.1504/IJDMMM.2026.10072842
     
  • Predicting Building Fitness Ranks Using a Hybrid Evaluation Method on Earthquake Damage Data
    by Moram Vishnu Vardhana Rao, Aparna Chaparala, Kusuma Kumari Katakam, G.Apparao Naidu, N. D. S. S. Kiran Relangi, Radhika Sajja 
    Abstract: Structural health monitoring systems (SHMS) play a crucial role in assessing the status of structures. This article presents a feature selection method and examines five machine-learning classifiers. We propose a fuzzy rough set features selection (FRSFS) methodology to identify the more strongly correlated attributes in datasets and remove irrelevant ones. FRSFS effectively reduces features by eliminating duplicates, null values, missing data, and other irrelevant features, facilitating the extraction of pertinent information. To enhance classification accuracy, a novel modified-KNN (M-KNN) classifier is introduced, which is designed to address undesirable structural health characteristics. Comprehensive analysis, comparing various machine-learning classifiers and their growth rates using classification metrics like accuracy, mean squared error (MSE), precision, recall, and F1-score, demonstrates the effectiveness of FRSFS and M-KNN. The suggested methodology is slightly more accurate (99.76%) than other classifiers like naive Bayes (NB), stochastic gradient descent (SGD), K-nearest neighbours (KNN), logistic regression (LR), and random forest (RF).
    Keywords: Structural Health Monitoring Systems (SHMS); Fuzzy Rough Set Features Selection (FRSFS); Rough Set Theory (RST); Modified-KNN (M-KNN); Mean Squared Error (MSE); Engineering-Based Tools (EBT).
    DOI: 10.1504/IJDMMM.2026.10076121
     
  • Predicting Stock Prices using state-of-art Machine Learning Models with Enhanced Feature Representation and Momentum Indicator Selection
    by Haoyu Wang, Dejun Xie, Yujian Liu 
    Abstract: This study examines advanced machine learning approaches for predicting next-period prices of the CSI 300 Index, a key benchmark of the Chinese stock market. Momentum-based technical indicators, adapted from Yin and Yang (2016), which are subjected to a residual correction process and serve as continuous features for regression tasks, are utilized as the primary features for this analysis. A comparative study is conducted among Deep Forest (DF), Support Vector Regression (SVR), Ridge Regression (RR), Gated Recurrent Units (GRU), Long Short-Term Memory (LSTM), Neural Networks (NN), Random Forests (RF), and XGBoost. Dimensionality reduction techniques, including Principal Component Analysis, Factor Analysis, and Autoencoders, are applied to further enhance model evaluation. Empirical findings demonstrate that Ridge Regression, GRU, and LSTM, particularly when combined with dimensionality reduction, outperform other popular machine learning algorithms in stock price prediction.
    Keywords: Stock price prediction; quantitative finance; applied machine learning models; deep learning.
    DOI: 10.1504/IJDMMM.2026.10076902
     
  • Modeling the Combined Effect of Node Centrality Measures in Predicting Links in Complex Networks
    by Nurmeen Basharat, Irfan Yousuf 
    Abstract: Using a variety of centrality measures and machine learning models, we introduce a novel generic strategy for link prediction in this work. We use some of the most important node centrality measures to capture the network's global, local and quasi-local structure. To create a dataset for machine learning models, the centrality scores act as the features of a node in the network, whereas the presence or absence of an edge produces positive or negative samples respectively. We test our approach on twelve publicly available graphs and use multiple performance metrics to find that Light Gradient Boosting Machine (LGBM) outperforms many other machine learning models. We achieve a maximum of 97% and a minimum of 75% ROC-AUC. We also find that there is no single machine learning model that can perform well on all types of graphs or networks.
    Keywords: Complex Network; Social Networks; Link Prediction; Node Centrality Measures; Link Prediction Models.
    DOI: 10.1504/IJDMMM.2027.10076947
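
The dataset construction described in the last abstract above (centrality scores as node features, edge presence or absence as positive or negative samples) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it uses a single measure (degree centrality) on a toy graph, it assumes a node pair is encoded by the two endpoint scores, and the function names `degree_centrality` and `build_link_dataset` are hypothetical.

```python
from itertools import combinations


def degree_centrality(nodes, edges):
    """Degree of each node divided by (n - 1), the usual normalisation."""
    degree = {v: 0 for v in nodes}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    scale = max(len(nodes) - 1, 1)  # guard against a single-node graph
    return {v: d / scale for v, d in degree.items()}


def build_link_dataset(nodes, edges):
    """Return (features, labels) with one row per unordered node pair.

    Features: centrality scores of the two endpoints (an assumed encoding).
    Label: 1 if the pair is an edge (positive sample), else 0 (negative).
    """
    centrality = degree_centrality(nodes, edges)
    edge_set = {frozenset(e) for e in edges}
    features, labels = [], []
    for u, v in combinations(sorted(nodes), 2):
        features.append((centrality[u], centrality[v]))
        labels.append(1 if frozenset((u, v)) in edge_set else 0)
    return features, labels


# Toy undirected graph: the path a-b-c-d.
NODES = ["a", "b", "c", "d"]
EDGES = [("a", "b"), ("b", "c"), ("c", "d")]
X, y = build_link_dataset(NODES, EDGES)
```

The resulting `X` and `y` could then be passed to any classifier. In a realistic setting the centrality scores would typically be computed on a training graph from which the held-out edges have been removed, so that the features do not leak the labels.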