Forthcoming Articles
International Journal of Data Mining and Bioinformatics

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.
Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.
Online First articles are also listed here. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.
International Journal of Data Mining and Bioinformatics (14 papers in press)

Special Issue on: OA Big Data Industrial Application and Computing Innovation Part One
Abstract: In the current diagnosis of cancer, the analysis of pathological section images and molecular markers (such as HER2, hormone receptor status, etc.) is usually performed separately, which can easily lead to difficulties in early identification, deviations in subtype classification, and limitations in personalized treatment decisions. This research addresses this problem by establishing a breast cancer diagnosis model based on a vision transformer (ViT) and a fully connected neural network (FCNN). The experimental results show that the diagnostic model established in this study performs the best in terms of accuracy (0.963), recall rate (0.947), precision (0.952), and F1 score (0.950). In addition, the model shows high accuracy in classifying eight breast cancer subtypes in the cancer histopathological image dataset. The diagnostic model established in this study helps promote the development of precision medicine for cancer and improve the efficiency of clinical treatment, and has important practical value in reducing cancer mortality. Keywords: breast cancer diagnosis; vision transformer; ViT; fully connected neural network; FCNN; molecular markers; prognosis prediction. DOI: 10.1504/IJDMB.2026.10077478
Abstract: The rapid expansion of online education has made the analysis of users' implicit behaviours - viewed through the lens of nonlinear and complex data - a crucial avenue for enhancing educational effectiveness. To address this, we introduce a random forest-fuzzy comprehensive evaluation (RF-FCE) method embedded within a clustering framework. Leveraging multiple clustering techniques, we first identify distinct category-specific influence patterns across different courses. Subsequently, we integrate fuzzy comprehensive evaluation with machine learning to analyse implicit behavioural data, examining both the intrinsic factors that affect course outcomes and the complex interactions between these factors and course quality. Our findings reveal significant variations in user engagement and learning outcomes across courses of differing quality, with these variations exerting a substantial influence on learning behaviours. In summary, this study offers a structured and robust analytical approach for examining implicit user behaviours in online education, demonstrating both methodological innovation and practical utility for improving course design and delivery. Keywords: online course; data mining technology; implicit behaviour; cluster analysis; online education; course quality evaluation; behaviour analysis. DOI: 10.1504/IJDMB.2026.10077083
Abstract: This paper proposes a scheduling optimisation framework GCN-EDQNet that integrates multi-scale feature fusion, bidirectional edge detection, and graph convolutional network (GCN). Data centre resources are modelled as a graph over distributed nodes; a multi-scale module builds hierarchical representations to capture spatial heterogeneity in resource distributions. A bidirectional edge-detection subnetwork then identifies scheduling-sensitive regions and produces an edge heatmap - assigning higher weights to edges connecting nodes with high load variation - that guides the GCN to prioritise structurally salient, mutation-prone areas. This explicit weighting mechanism enables the GCN to focus on bottleneck-prone regions and improve structure-aware feature learning. Finally, a reinforcement learning strategy enables adaptive task allocation and migration. Experiments on two public datasets show that GCN-EDQNet outperforms conventional approaches in task completion time, load variance, scheduling success rate, and energy efficiency. These results highlight a structure-aware, intelligent paradigm for data centre resource scheduling with clear theoretical and practical value. Keywords: dynamic load balancing; graph neural networks; GNNs; edge detection; data centre scheduling optimisation. DOI: 10.1504/IJDMB.2026.10078019
Abstract: Current wearable motion tracking devices show significant differences in heart rate monitoring, inaccurate calorie tracking, and measurement errors in exercise speed and distance. Based on this, this paper optimises the design of wearable motion tracking devices. Firstly, this paper establishes a real-time data mining model for sports training wearable devices using fuzzy algorithms, and determines the heterogeneity of sports training data. Then, this paper constructs a feature extraction model and processes the data using thresholding and Savitzky-Golay filtering. Subsequently, this paper elaborates on the calibration method of sensors in wearable motion tracking devices, and finally tests the application of the device in training monitoring and evaluation. The research results indicate that the heart rate of student 11 measured by the device in this paper is 84 beats per minute under normal conditions and 123 beats per minute under high-intensity exercise. Keywords: sports tracking; training monitoring; wearable devices; data processing; device calibration; physiological parameter monitoring. DOI: 10.1504/IJDMB.2026.10077968
Abstract: This study adopts modality-specific feature extraction for text, visual, and audio inputs. Task predictions and modality representations are embedded into an adaptive graph, which is further augmented by introducing an attenuated higher-order common-neighbour similarity matrix within a heterogeneous graph neural network. This formulation is used to guide node aggregation and to support interpretability through explicit graph-based relational modelling. Based on these components, an attention-aware graph embedding model is constructed for downstream analysis. Across the Alibaba and IMDB datasets, the proposed method achieves average gains of 6.13% (Macro-F1) and 6.57% (Micro-F1) over graph embedding baselines. On IMDB, it further improves accuracy by 4.1%, F1-score by 5.9%, and reduces mean absolute error by 6.2%. These results suggest that the proposed graph-based fusion strategy can provide measurable gains on the considered benchmarks while enabling adaptive estimation of inter-modal interaction weights. Keywords: multimodal information; visual analytics; graph embedding networks; attention perception; adaptive graphs. DOI: 10.1504/IJDMB.2026.10077884
Abstract: This paper proposes an attention-based multi-scale deformation prediction network (AMSD-Net) for nonlinear mechanical response modelling. Using multi-dimensional physical parameters of pipeline steels as inputs, AMSD-Net integrates a hierarchical feature extraction backbone composed of Inception modules, squeeze-and-excitation (SE) channel attention, and convolutional block attention module (CBAM) spatial attention to capture deformation characteristics at different spatial scales. Parallel multi-scale convolutional pathways and a dual-attention mechanism are employed to recalibrate channel-wise and spatial features in a data-driven manner. Experimental evaluations on simulation datasets generated from X70 and X90 pipeline steels show that AMSD-Net achieves lower root mean square error and mean absolute error in stress, strain, and deformation prediction compared with representative baseline models, while maintaining stable fitting behaviour across the elastic-plastic transition region. AMSD-Net outperforms conventional baselines in predicting nonlinear deformation and failure strength, enabling more efficient and accurate data-driven pipeline integrity assessment. Keywords: nonlinear deformation prediction; multi-scale feature extraction; pipeline steel modelling. DOI: 10.1504/IJDMB.2026.10077477

Regular Issues
by Qianqian Sun, Wei Qi, Fei Tan, Hongwei Zhang
Abstract: This study aimed to identify key biomarkers for venous thromboembolism (VTE) following blunt trauma. Using bioinformatics analysis of public gene expression datasets (GSE19151 and GSE36809), we screened for differentially expressed genes (DEGs) and constructed co-expression networks. Functional enrichment analysis revealed critical biological pathways, and a protein-protein interaction network was established to pinpoint central hub genes. Six hub genes (MRPL15, MRPL3, MYC, RPLP0, TP53, and CD3D) were identified, with MRPL3 and MYC showing promising diagnostic potential (AUC of 0.71 and 0.76, respectively). These findings suggest that these genes may serve as novel biomarkers for the early diagnosis of trauma-associated VTE, offering a foundation for future clinical validation and targeted therapeutic strategies. Keywords: venous thromboembolism; VTE; biomarkers; diagnosis. DOI: 10.1504/IJDMB.2026.10076808

Special Issue on: The Development of Novel Integrative Bioinformatics Based Machine Learning Techniques and Multi Omics Data Integration Part 2
by Dhiraj Kumar Singh, Prashant Ranjan, Sahar Qazi, Bimal Prasad Jit, Amit Kumar Verma, Riyaz Ahmad Mir
Abstract: Genetic alterations in normal brain cells lead to the development of brain tumours (BT). The incidence of newly diagnosed cases is rising over time. Understanding the molecular biology of paediatric brain tumours is crucial for advancing novel therapeutic approaches to prevent or effectively manage this disease. The R2TP complex, a co-chaperone conserved from yeast to mammals and comprising RUVBL1, RUVBL2, PIH1D1, and RPAP3 in humans, plays a crucial role in the assembly and maturation of various multi-subunit complexes. This study evaluates the expression of PIH1D1 and p53 in paediatric brain cancers using The Cancer Genome Atlas (TCGA) data through the UALCAN portal. Our analysis revealed elevated expression levels of PIH1D1 in paediatric brain tumours across all age groups compared to normal tissues, suggesting its potential as an early detection marker and a prognostic indicator. Additionally, p53 emerged as a promising target for brain tumour treatment, warranting exploration for age-specific applications. Keywords: R2TP; PIH1D1; paediatric brain tumour; TCGA; UALCAN; CBTTC. DOI: 10.1504/IJDMB.2025.10067136

ICEP and ILEP: two new approaches to identify community of complex biological network
by Mamata Das, K. Selvakumar, P.J.A. Alphonse
Abstract: Understanding the internal modular organisation of protein-protein interactions is crucial for deciphering molecular-level biological processes. Recognition of network communities enhances our comprehension of the biological origins of disease pathogenesis. This research introduces two innovative community detection algorithms, iterative credit-edge pruning (ICEP) and iterative load-based edge pruning (ILEP), designed to identify communities within complex biological networks.
Our algorithms are evaluated using real-world data from the Omicron dataset, and their performance is compared with four established algorithms: Girvan-Newman, Louvain, Leiden, and the label propagation algorithm. Validation of the community structures is achieved through modularity. Among the techniques compared, our proposed method, ICEP, stands out with the highest modularity score of 0.885, outperforming all other approaches. The alternative method, ILEP, also achieves a notable modularity score of 0.698, surpassing the Girvan-Newman method. By implementing ICEP and ILEP, we gain profound insights into the structural organisation and interconnections within the Omicron virus. Keywords: protein interaction network; omicron; community detection; modularity; graphlet; centrality. DOI: 10.1504/IJDMB.2025.10067341

BMSD-CDE: a robust community detection ensemble method for biomarker identification
by Bikash Baruah, Manash P. Dutta, Subhasish Banerjee, Dhruba K. Bhattacharyya
Abstract: Community detection algorithms (CDAs) are crucial for identifying cohesive groups within complex networks. However, individual CDAs often fall short of accurately uncovering all hidden communities due to their inherent biases and limitations. These algorithms are typically designed with specific objectives, which may inadvertently lead to the oversight of certain community types, resulting in partial or imprecise outcomes. To address these limitations, we propose BMSD-community detection ensemble (CDE), a novel ensemble method that integrates six prominent CDAs - FastGreedy, Infomap, LabelProp, LeadingEigen, Louvain, and Walktrap. By strategically combining the outputs of these diverse algorithms using p-value references and elite genes, BMSD-CDE enhances the accuracy and robustness of community detection. This ensemble approach provides a more reliable foundation for downstream analyses, particularly in identifying potential biomarkers.
Applied to esophageal squamous cell carcinoma (ESCC), BMSD-CDE reveals a set of genes - F2RL3, ATP6V1C2, CGN, CAD, ANGPT2, ALDH2, CLDN7, and DTX2 - as potential biomarkers. These findings are supported by extensive topological and biological analyses across normal and disease conditions using four distinct datasets. Keywords: potential biomarker; community detection algorithm; CDA; ensemble algorithm; topological experiment; ESCC; biological validation; community detection ensemble; CDE. DOI: 10.1504/IJDMB.2025.10067623

Multi-epitopes prediction for designing a candidate vaccine against Ebola virus: a reverse vaccinology and immunoinformatics approach
by Swati Mohanty, Himanshu Singh
Abstract: Over a span of four decades, the Ebola virus disease (EVD) outbreak has wreaked havoc starting from Central African countries through to different parts of the world including Asian countries. Guinea was the first to witness the catastrophe followed by many African and Asian countries including Liberia and Sierra Leone. In this study, an immunoinformatics approach including both B cell and T cell epitopes has been used for candidate vaccine development against EVD. The prediction of B cell and T cell epitopes was done by targeting the glycoprotein (GP) and VP40 proteins of Ebolavirus and an antigenic multi-epitope vaccine construct was designed. The vaccine construct was then docked with human immunogenic Toll-like Receptor 4 (TLR 4) having binding energy - 13,883.1 and in silico immune simulation was done to predict the immunogenic potential of the vaccine construct with the CAI of 0.94 and the GC content 54.35 as it showed efficient expression in Escherichia coli (E. coli) K12 strain which produced vaccine in wide scale. The Ebola virus vaccine construct designed through the immunoinformatics approach in this study could be useful in combatting EVD. Keywords: Ebola virus; epitope-based vaccine; molecular docking; immunoinformatics; reverse vaccinology.
DOI: 10.1504/IJDMB.2025.10068508

Downregulation of CENPA and CCNB1 as a factor predicting the poor prognosis of acute myeloid leukaemia: a systems biological approach
by Mohammad Hossein Shams, Saeid Afshar, Elmira Parto Beiragh, Azin Atabakhsh, Hassan Rafieemehr
Abstract: Acute myeloid leukaemia (AML) is a complex hematologic malignancy. The present study takes a novel approach using bioinformatics to identify the primary molecular markers involved in AML pathogenesis. The differential expression of GEO microarray data (LogFC ≤ -1 / ≥1, adj. P-value ≤ 0.01, P-value ≤ 0.01) is analysed, and then the corresponding protein-protein interaction (PPI) network is drawn and examined using Cytoscape 3.6. The findings are validated externally and clinically using the GEPIA database and a survival curve. This study also identified important transcription factors (TF) affecting the expression of hub genes. The key finding is that the downregulation of CENPA and CCNB1 is associated with shorter overall survival in AML, with FOXM1 identified as a potential regulating TF. It also suggests that disruption in various cellular features such as cell cycle, replication, and cell signalling may play roles in the pathogenesis of AML. Keywords: CENPA; CCNB1; systems biology; FOXM1; molecular markers; gene expression profiling. DOI: 10.1504/IJDMB.2025.10069104

Skin image analysis for detecting monkeypox disease: utilising new model M-Net, a non-invasive deep learning model
by Vinod Kumar Yadav, Rajitha Bakthula
Abstract: Skin and skin-related diseases pose a significant public health challenge worldwide, leading to major concerns in medical diagnosis. Various environmental factors, including bacteria, fungi, and viruses, can contribute to these conditions, resulting in a growing number of individuals affected by skin diseases. Most physicians rely on manual biopsy tests for skin disease diagnosis, which can cause delays in timely treatment.
Therefore, there is a high demand for automated skin disease classification systems to provide quick and accurate results. Deep learning (DL) has recently shown remarkable effectiveness in image-based classification tasks, such as identifying skin cancer, rosacea, melanocytic nevus, tumour cells, and COVID-19 patients. Consequently, DL can also be adapted to detect monkeypox skin disease. In this article, we propose a novel approach consisting of two phases. First, new HR, UOR, and BR algorithms will be used to preprocess the images. Second, a custom CNN model will be developed for monkeypox classification. The proposed model is compared with existing approaches in the literature and demonstrates superior performance, achieving an accuracy of 95%. Keywords: image pre-processing; classification; hair removal; object removal; background removal; data augmentation. DOI: 10.1504/IJDMB.2025.10071008

Machine learning approaches for disease genes prediction
by Priya Sadana, Isha Kansal, Vikas Khullar
Abstract: The identification of genes involved in human hereditary diseases frequently necessitates the examination of a large number of potential candidate genes, which can be time-consuming and expensive. Genome-wide techniques such as association studies and linkage analysis frequently select many hundreds of positional candidates. Earlier binary classification methods used disease-causing and healthy genes as positive and negative training sets but risked including unknown disease genes. This work aims to discuss machine learning-based methods for disease susceptibility gene identification. Recent advancements include complex methods like ensemble and deep learning. Then, we evaluated several well-known machine learning-based disease gene prediction algorithms. We concluded by discussing the pros and cons of different methods and their interpretability and reliability.
A comparative study demonstrates the effectiveness of proposed approaches, contributing to the advancement of disease gene identification methodologies while highlighting their interpretability and reliability. Keywords: neurological disorder; gene prediction; binary classification; semi supervised learning; SSL. DOI: 10.1504/IJDMB.2025.10069769
