International Journal of Data Mining and Bioinformatics (IJDMB) Inderscience Publishers - linking academia, business and industry through research

Forthcoming Articles

International Journal of Data Mining and Bioinformatics

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Online First articles are also listed here. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

International Journal of Data Mining and Bioinformatics (16 papers in press)

Regular Issues

Skin cancer detection and segmentation utilising LadderNet by stock exchange white shark optimisation enabled deep learning
by Anuradha Govada, Virendra Singh Kushwah, Ruth Ramya Kalangi, Vimala Shanmugam
Abstract: Skin cancer is one of the most common types of cancer worldwide, with increasing incidence rates over the past few decades. This paper proposes an optimisation-enabled deep learning approach for skin cancer segmentation and detection. Initially, the input image is pre-processed by a bilateral filter. After that, skin lesion segmentation is performed using LadderNet, which is tuned by stock exchange trading white shark optimisation (SEWSO). Here, SEWSO is the combination of stock exchange trading optimisation (SETO) and white shark optimisation (WSO). Moreover, the segmented image undergoes data augmentation by rotation, shifting and random brightness techniques. Thereafter, feature extraction is performed to obtain the desired features. At last, the feature vector is subjected to skin cancer detection, which is accomplished by employing SqueezeNet tuned by SEWSO. This approach delivered high accuracy, sensitivity, and specificity of 93.90%, 95.00%, and 94.70%, respectively.
Keywords: skin cancer; SqueezeNet; neural networks; NNs; white shark optimisation; WSO; stock exchange trading optimisation; SETO.
DOI: 10.1504/IJDMB.2025.10070407

Special Issue on: The Development of Novel Integrative Bioinformatics Based Machine Learning Techniques and Multi Omics Data Integration Part 2

Machine learning algorithm for lung cancer classification using ADASYN with standard random forest
by J. Viji Gripsy, T. Divya
Abstract: Lung cancer is a type of cancer that develops in the lungs. Early identification of lung cancer symptoms may lead to successful treatment. The dataset contains duplicate characteristics as well as an imbalanced class distribution, making lung cancer classification a challenging task. This study presents a novel approach that combines ADASYN with the standard random forest (ASRF) model to enhance the efficacy of lung cancer identification. The ASRF, as described, offers interpretable outcomes by using feature significance, hence providing significant insights into the aspects that contribute to judgments on the classification of lung cancer. The classification algorithm is used to ascertain the existence or absence of lung cancer in a given patient. In a comparison with the existing SVM, MLP, RF and GB methods, the ASRF technique achieved 93.5% precision, 94.7% recall, 94.1% F-measure, and 94% accuracy.
Keywords: lung cancer; LC; RF; ASRF; MLP; support vector machine; SVM; GB.
DOI: 10.1504/IJDMB.2025.10065391
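
The F-measure quoted in the abstract above is the harmonic mean of precision and recall, so the three reported figures can be checked against one another. The following is a minimal sketch of that arithmetic only; the function name `f_measure` is an illustrative assumption and nothing here reproduces the authors' ASRF pipeline.

```python
def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, expressed as fractions.
reported_f = f_measure(0.935, 0.947)
print(f"{reported_f:.3f}")  # 0.941
```

The result rounds to 94.1%, which agrees with the F-measure stated in the abstract.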

Modified VGG-16 model for COVID-19 chest X-ray images: optimal binary severity assessment
by Manoranjan Dash
Abstract: COVID-19, a pandemic disease caused by the SARS-CoV-2 virus, has swept across the globe. One potential weapon in the fight against COVID-19 could be early detection through the use of chest X-ray images. In this paper, I have used a modified VGG-16 deep learning model for binary classification of COVID-19 chest X-ray images. There are 16 weight layers in the standard VGG-16 model. In the suggested modified VGG model, the total number of weight layers has been reduced from 16 to 9 (eight convolutional layers and one fully connected layer). According to the results, the modified VGG-16 model performs better than the other three models (CNN, KNN and VGG-16) in terms of quantitative measures of accuracy, sensitivity and specificity. The dataset used for the proposed work consists of 24,000 chest X-ray images of the lung collected from an online repository, comprising 12,000 for each class (healthy and pneumonia).
Keywords: deep learning; classification; COVID-19; SARS-CoV-2; modified VGG-16.
DOI: 10.1504/IJDMB.2025.10065665

Revealing novel biomarkers for oesophageal squamous cell carcinoma through integrated single-cell RNA sequencing analysis
by Bikash Baruah, Manash P. Dutta, Subhasish Banerjee, Dhruba K. Bhattacharyya
Abstract: This study employs single-cell RNA sequencing (scRNA-seq) to analyse oesophageal squamous cell carcinoma (ESCC), identifying 10 potential biomarkers (ALDH2, ANGPT2, APPL1, ARPC2, CAD, CALM1, CLDN7, CLTB, F2RL3, LPAR1) associated with radiation exposure. Methodology involves scRNA-seq for data partitioning, pre-processing, clustering, and differential expression analysis. Dysregulated genes are identified through comprehensive gene ontology (GO) annotations, and ESCC-related pathways are explored via the Kyoto encyclopedia of genes and genomes (KEGG) database. Analysis of 38 genes reveals distinct patterns under radiation exposure, enriching understanding of ESCC-related processes, components, and functions. This research provides a holistic view of ESCC’s molecular landscape, emphasising the clinical significance of identified biomarkers and contributing significantly to the understanding of this complex malignancy.
Keywords: oesophageal squamous cell carcinoma; ESCC; single-cell RNA sequencing; scRNA-seq; differential expression analysis; gene ontology; GO; pathway analysis; potential biomarker.
DOI: 10.1504/IJDMB.2025.10065927

A novel suppressed segmentation framework for hyper spectral image processing in earlier cancer detection
by Kaushal Kishor, Manoj Singhal, Rajesh Kumar Maurya, Pramod Kumar Sagar, Rupak Sharma, Satya Prakash Yadav
Abstract: This paper presents a novel suppressed segmentation framework for hyperspectral image processing in early cancer detection. This framework integrates recent advances in deep learning models and image segmentation to support decision-making in early cancer diagnosis. The proposed framework facilitates the fast and accurate segmentation of tumour tissues and other aberrations within hyperspectral images. The key modules of this framework encompass input pre-processing, noisy additive analysis, random area cropping, augmented context representation, hierarchical segmentation, and post-processing. Experiments performed on real-world datasets show that the proposed framework yields segmentation accuracy similar to other leading segmentation techniques while offering improved speed and robustness. The proposed model obtained 95.32% accuracy, 92.89% sensitivity, 91.50% specificity, 94.25% precision and 92.51% F1-score. This proposed method offers an optimised workflow for fast and accurate segmentation of tumour tissues in early cancer diagnosis.
Keywords: image processing; deep learning; suppressed segmentation framework; SSF; hyperspectral image processing; HSIP; earlier cancer detection; traditional imaging techniques; hyperspectral imaging; hierarchical segmentation.
DOI: 10.1504/IJDMB.2025.10066121

Age invariant face recognition method based on enhanced convolutional neural network
by Bin Fang
Abstract: Research on age invariant face recognition can not only improve the robustness of facial recognition systems, but also provide guidance for the development and application of facial recognition technology. Aiming at the problems of low peak signal-to-noise ratio, low recognition accuracy and long recognition time of traditional age invariant face recognition methods, an age invariant face recognition method based on an enhanced convolutional neural network is proposed. The captured images are enhanced using a bilateral filtering algorithm. The SURF algorithm is employed to extract facial features and remove age-related interference features, completing the selection of facial image features. These selected features are then input into the enhanced CNN to obtain the age invariant face recognition results. The experimental results demonstrate that the proposed method achieves a maximum image peak signal-to-noise ratio of 56.85 dB, recognition accuracy in the range of 96.1% to 97.6%, and a maximum recognition time of 78.96 ms.
Keywords: enhanced convolutional neural network; age invariant; face recognition; bilateral filtering algorithm; SURF algorithm.
DOI: 10.1504/IJDMB.2025.10066150
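
The peak signal-to-noise ratio cited in the abstract above is conventionally defined as 10·log10(MAX² / MSE), where MSE is the mean squared error between a reference image and a processed one. A minimal sketch of that standard definition follows, assuming 8-bit greyscale images given as lists of rows; the function name `psnr` and the toy pixel values are hypothetical and are not taken from the paper.

```python
import math


def psnr(reference, distorted, max_value=255.0):
    """Peak signal-to-noise ratio in dB for two equal-sized greyscale images."""
    total, count = 0.0, 0
    for ref_row, dis_row in zip(reference, distorted):
        for r, d in zip(ref_row, dis_row):
            total += (r - d) ** 2
            count += 1
    mse = total / count
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10 * math.log10(max_value ** 2 / mse)


# Toy 2x2 images that differ by one grey level in two pixels (MSE = 0.5).
reference = [[100, 100], [100, 100]]
distorted = [[101, 100], [100, 99]]
print(round(psnr(reference, distorted), 2))
```

A higher value means the processed image is closer to the reference.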

Deep mining of elderly health data based on improved association clustering
by Bo Yang
Abstract: To deeply process the health data of the elderly, this paper designs a deep mining method for elderly health data based on an improved association clustering approach. Initially, health data samples from the elderly are collected. The Apriori algorithm is enhanced with interest constraints, connectivity operations are employed to generate candidate itemsets, and those that do not meet the requirements are eliminated. Associated feature quantities are then extracted from the health data. Subsequently, a fuzzy K-means algorithm with weight attributes is incorporated as the core method, and a balance coefficient is calculated using the principle of balanced contribution. Finally, the improved fuzzy K-means algorithm is utilised to complete data classification, detect abnormal data points, and achieve deep mining of the health data. The results indicate that the proposed method has a false alarm rate of less than 3.21% and a false negative rate of less than 1.81%, demonstrating a superior mining effect compared to the comparison method.
Keywords: association rules; clustering algorithm; the elderly; health data; deep mining.
DOI: 10.1504/IJDMB.2025.10066985
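
The candidate generation and elimination described in the abstract above follow the general shape of the Apriori algorithm: join frequent itemsets into larger candidates, then discard those that fail a threshold. The sketch below shows only textbook Apriori with a minimum-support threshold. The paper's interest constraints and its fuzzy K-means stage are not reproduced, and the function name and toy health records are hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {c for c in current if support(c) >= min_support}
    frequent = {c: support(c) for c in current}

    k = 2
    while current:
        # Join step: union pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1
    return frequent


# Hypothetical toy records, one set of conditions per person.
records = [
    {"hypertension", "diabetes"},
    {"hypertension", "diabetes", "arthritis"},
    {"hypertension", "arthritis"},
    {"diabetes"},
]
frequent = apriori(records, min_support=0.5)
```

With this toy input, the pair of diabetes and arthritis falls below the threshold, so the three-item candidate is pruned without its support ever being counted.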

In silico study discerns PIH1D1 and p53 to be promising prognostic markers for children's brain cancer
by Dhiraj Kumar Singh, Prashant Ranjan, Sahar Qazi, Bimal Prasad Jit, Amit Kumar Verma, Riyaz Ahmad Mir
Abstract: Genetic alterations in normal brain cells lead to the development of brain tumours (BT). The incidence of newly diagnosed cases is on the rise over time. Understanding the molecular biology of paediatric brain tumours is crucial for advancing novel therapeutic approaches to prevent or effectively manage this disease. The R2TP complex, a conserved co-chaperone from yeast to mammals, including RUVBL1, RUVBL2, PIH1D1, and RPAP3 in humans, plays a crucial role in the assembly and maturation of various multi-subunit complexes. This study evaluates the expression of PIH1D1 and p53 in paediatric brain cancers using The Cancer Genome Atlas (TCGA) data through the UALCAN portal. Our analysis revealed elevated expression levels of PIH1D1 in paediatric brain tumours across all age groups compared to normal tissues, suggesting its potential as an early detection marker and a prognostic indicator. Additionally, p53 emerged as a promising target for brain tumour treatment, warranting exploration for age-specific applications.
Keywords: R2TP; PIH1D1; paediatric brain tumour; TCGA; UALCAN; CBTTC.
DOI: 10.1504/IJDMB.2025.10067136

ICEP and ILEP: two new approaches to identify community of complex biological network
by Mamata Das, K. Selvakumar, P.J.A. Alphonse
Abstract: Understanding the internal modular organization of protein-protein interactions is crucial for deciphering molecular-level biological processes. Recognition of network communities enhances our comprehension of the biological origins of disease pathogenesis. This research introduces two innovative community detection algorithms, Iterative Credit-Edge Pruning (ICEP) and Iterative Load-Based Edge Pruning (ILEP), designed to identify communities within complex biological networks. Our algorithms are evaluated using real-world data from the Omicron dataset, and their performance is compared with four established algorithms: Girvan-Newman, Louvain, Leiden, and the Label Propagation algorithm. Validation of the community structures is achieved through modularity. Among the techniques compared, our proposed method, ICEP, stands out with the highest modularity score of 0.885, outperforming all other approaches. The alternative method, ILEP, also achieves a notable modularity score of 0.698, surpassing the Girvan Newman method. By implementing ICEP and ILEP, we gain profound insights into the structural organization and interconnections within the Omicron virus.
Keywords: protein interaction network; omicron; community detection; modularity; graphlet; centrality.
DOI: 10.1504/IJDMB.2025.10067341
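
The modularity score used for validation in the abstract above compares the share of edges that fall inside communities with the share expected by chance. The sketch below implements the standard Newman modularity, assuming an undirected, unweighted graph without self-loops. It is not the ICEP or ILEP algorithm, and the function name and toy graph are hypothetical.

```python
def modularity(edges, communities):
    """Newman modularity Q = sum over communities of L_c/m - (d_c/(2m))^2."""
    m = len(edges)
    community_of = {
        node: idx for idx, group in enumerate(communities) for node in group
    }
    intra = [0] * len(communities)       # edges with both ends in community c
    degree_sum = [0] * len(communities)  # total degree of nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(l / m - (d / (2 * m)) ** 2 for l, d in zip(intra, degree_sum))


# Toy graph: two triangles joined by a single bridge edge.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("c", "d"),
]
q_split = modularity(edges, [{"a", "b", "c"}, {"d", "e", "f"}])
q_single = modularity(edges, [{"a", "b", "c", "d", "e", "f"}])
```

Splitting the toy graph into its two triangles gives Q = 5/14, while placing every node in one community gives Q = 0, which is why higher modularity is read as stronger community structure.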

BMSD-CDE: a robust community detection ensemble method for biomarker identification
by Bikash Baruah, Manash P. Dutta, Subhasish Banerjee, Dhruba K. Bhattacharyya
Abstract: Community detection algorithms (CDAs) are crucial for identifying cohesive groups within complex networks. However, individual CDAs often fall short of accurately uncovering all hidden communities due to their inherent biases and limitations. These algorithms are typically designed with specific objectives, which may inadvertently lead to the oversight of certain community types, resulting in partial or imprecise outcomes. To address these limitations, we propose BMSD-community detection ensemble (CDE), a novel ensemble method that integrates six prominent CDAs: FastGreedy, Infomap, LabelProp, LeadingEigen, Louvain, and Walktrap. By strategically combining the outputs of these diverse algorithms using p-value references and elite genes, BMSD-CDE enhances the accuracy and robustness of community detection. This ensemble approach provides a more reliable foundation for downstream analyses, particularly in identifying potential biomarkers. Applied to esophageal squamous cell carcinoma (ESCC), BMSD-CDE reveals a set of genes (F2RL3, ATP6V1C2, CGN, CAD, ANGPT2, ALDH2, CLDN7, and DTX2) as potential biomarkers. These findings are supported by extensive topological and biological analyses across normal and disease conditions using four distinct datasets.
Keywords: potential biomarker; community detection algorithm; CDA; ensemble algorithm; topological experiment; ESCC; biological validation; community detection ensemble; CDE.
DOI: 10.1504/IJDMB.2025.10067623

Multi-epitopes prediction for designing a candidate vaccine against Ebola virus: a reverse vaccinology and immunoinformatics approach
by Swati Mohanty, Himanshu Singh
Abstract: Over a span of four decades, the Ebola virus disease (EVD) outbreak has wreaked havoc, starting from Central African countries through to different parts of the world including Asian countries. Guinea was the first to witness the catastrophe, followed by many African and Asian countries including Liberia and Sierra Leone. In this study, an immunoinformatics approach covering both B cell and T cell epitopes has been used for candidate vaccine development against EVD. The prediction of B cell and T cell epitopes was done by targeting the glycoprotein (GP) and VP40 proteins of Ebolavirus, and an antigenic multi-epitope vaccine construct was designed. The vaccine construct was then docked with human immunogenic Toll-like receptor 4 (TLR4), with a binding energy of 13,883.1, and an in silico immune simulation was done to predict the immunogenic potential of the vaccine construct. With a CAI of 0.94 and a GC content of 54.35, the construct showed efficient expression in the Escherichia coli (E. coli) K12 strain, which would allow the vaccine to be produced at large scale. The Ebola virus vaccine construct designed through the immunoinformatics approach in this study could be useful in combatting EVD.
Keywords: Ebola virus; epitope-based vaccine; molecular docking; immunoinformatics; reverse vaccinology.
DOI: 10.1504/IJDMB.2025.10068508
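
The GC content quoted in the abstract above is the percentage of guanine and cytosine bases in the construct's coding sequence. A minimal sketch of that calculation follows; the function name `gc_content` and the short example sequence are hypothetical, and the codon adaptation index (CAI) is not computed here.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C bases in a DNA sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


# Hypothetical six-base sequence: four of six bases are G or C.
example_gc = gc_content("ATGCGC")
print(f"{example_gc:.2f}")  # 66.67
```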

Downregulation of CENPA and CCNB1 as a factor predicting the poor prognosis of acute myeloid leukaemia: a systems biological approach
by Mohammad Hossein Shams, Saeid Afshar, Elmira Parto Beiragh, Azin Atabakhsh, Hassan Rafieemehr
Abstract: Acute myeloid leukaemia (AML) is a complex hematologic malignancy. The present study takes a novel approach using bioinformatics to identify the primary molecular markers involved in AML pathogenesis. The differential expression of GEO microarray data (LogFC ≤ -1 / ≥1, adj. P-value ≤ 0.01, P-value ≤ 0.01) is analysed, and then the corresponding protein-protein interaction (PPI) network is drawn and examined using Cytoscape 3.6. The findings are validated externally and clinically using the GEPIA database and a survival curve. This study also identified important transcription factors (TF) affecting the expression of hub genes. The key finding is that the downregulation of CENPA and CCNB1 is associated with shorter overall survival in AML, with FOXM1 identified as a potential regulating TF. It is also suggested that disruption in various cellular features such as cell cycle, replication, and cell signalling may play roles in the pathogenesis of AML.
Keywords: CENPA; CCNB1; systems biology; FOXM1; molecular markers; gene expression profiling.
DOI: 10.1504/IJDMB.2025.10069104
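
The differential expression thresholds given in the abstract above (LogFC ≤ -1 or ≥ 1, adjusted P-value ≤ 0.01) amount to a simple filter over a results table. The sketch below illustrates that filter only. The function name, the tuple layout and the placeholder gene rows are hypothetical and do not come from the study's data.

```python
def filter_degs(rows, lfc_cutoff=1.0, adj_p_cutoff=0.01):
    """Keep genes with |logFC| >= lfc_cutoff and adjusted P-value <= adj_p_cutoff."""
    selected = []
    for gene, log_fc, adj_p in rows:
        if abs(log_fc) >= lfc_cutoff and adj_p <= adj_p_cutoff:
            direction = "down" if log_fc < 0 else "up"
            selected.append((gene, direction))
    return selected


# Placeholder rows: (gene, logFC, adjusted P-value). Values are made up.
rows = [
    ("GENE_A", -1.8, 0.002),  # passes both cutoffs, downregulated
    ("GENE_B", 0.4, 0.001),   # fold change too small
    ("GENE_C", 2.1, 0.03),    # adjusted P-value too large
    ("GENE_D", 1.2, 0.009),   # passes both cutoffs, upregulated
]
degs = filter_degs(rows)
```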

An efficient attention encoder decoder-based residual-UNet for the segmentation of liver and lung tumour
by Rakesh Kumar Donthi, Ram Chandra Bhushan, N. Lakshmipathi Anantha, P. Dileep, U. Srinivasarao
Abstract: Liver and lung cancer are two types of cancer that occur in the human body. Liver and lung tumour segmentation is a basic process in treating and diagnosing these diseases. The automated detection of these two cancers involves stages like dataset collection, pre-processing, and optimisation-based segmentation. Datasets like 3DIRCADb for the liver and LIDC-IDRI for the lung are initially obtained. Then, the hybrid deep learning model with optimisation is applied for the segmentation process. The deep learning model, an attention encoder decoder-based residual-UNet, is used to segment the liver and extract the region of interest. The same process is carried out for lung tumour segmentation. The metaheuristic optimisation fire hawk algorithm is introduced. The segmentation performance of the proposed liver and lung segmentation model is evaluated using different measures. On the liver and lung datasets, the proposed approach achieves dice values of 0.901 and 0.916, respectively.
Keywords: lung cancer; liver cancer; automated detection; region of interest; fire hawk algorithm.
DOI: 10.1504/IJDMB.2026.10069197
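
The dice values reported in the abstract above measure the overlap between a predicted segmentation mask and the ground-truth mask, defined as 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch of that definition, assuming flattened binary masks of 0s and 1s; the function name and the toy masks are hypothetical.

```python
def dice_coefficient(predicted, target):
    """Dice = 2|A ∩ B| / (|A| + |B|) for two binary masks of equal length."""
    if len(predicted) != len(target):
        raise ValueError("masks must have the same length")
    intersection = sum(p and t for p, t in zip(predicted, target))
    total = sum(predicted) + sum(target)
    # Two empty masks agree perfectly, so report 1.0 instead of dividing by zero.
    return 1.0 if total == 0 else 2 * intersection / total


# Toy masks: two pixels overlap, and each mask has three foreground pixels.
predicted = [1, 1, 0, 0, 1]
target = [1, 0, 0, 1, 1]
dice = dice_coefficient(predicted, target)
```

A value of 1 indicates perfect overlap and 0 indicates none, so the toy masks here score 2/3.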

The role of SP100 in tumour immune microenvironment and prognosis of head and neck squamous cell carcinoma
by Mengmeng Zhang, Yu Bai, Shengli Han, Jun Shu
Abstract: This study aims to investigate the role and prognostic value of SP100 in the tumour immune microenvironment of HNSCC. A comprehensive bioinformatics analysis was conducted. We first investigated the expression of SP100 and its association with overall survival (OS) in pan-cancers; then the relationship between SP100 and the tumour immune microenvironment in HNSCC was analysed, the SP100-related genes were identified, and finally, a new gene signature was established. SP100 showed differential expression, correlated with OS in various cancer types, and was positively associated with immune cell infiltration and immune checkpoints. KEGG analysis showed a focus on ‘antigen processing and presentation’ and ‘natural killer cell-mediated cytotoxicity’. The key SP100-related genes were SP110, MT2A, and CAV1. High SP100 expression negatively correlated with prognosis and positively associated with tumour immune infiltration in HNSCC. Thus, SP100 may serve as a new prognostic biomarker as well as a new target for immune therapy in HNSCC.
Keywords: head and neck squamous cell carcinoma; SP100; tumour immune infiltration; gene signature; prognosis.
DOI: 10.1504/IJDMB.2026.10069591

Machine learning approaches for disease genes prediction
by Priya Sadana, Isha Kansal, Vikas Khullar
Abstract: The identification of genes involved in human hereditary diseases frequently necessitates the examination of a large number of potential candidate genes, which can be time-consuming and expensive. Genome-wide techniques such as association studies and linkage analysis frequently select many hundreds of positional candidates. This work aims to discuss machine learning-based methods for disease susceptibility gene identification. Disease genes are already linked to diseases, while non-disease genes are a random subset of the larger population of unrelated genes. The methodology followed in this paper included a critical review to identify the literature related to the title. Here, we try to identify the significant ongoing research in this domain. Earlier binary classification methods used disease-causing and healthy genes as positive and negative training sample sets, although the negative sets could potentially include unknown disease-related genes. Unary and semi-supervised classification are more practical ways to define non-disease genes. Recent advancements include complex methods like ensemble and deep learning. Then, we evaluated several well-known machine learning-based disease gene prediction algorithms. We concluded by discussing the pros and cons of different methods and their interpretability and reliability.
Keywords: neurological disorder; gene prediction; binary classification; semi supervised learning; SSL.
DOI: 10.1504/IJDMB.2025.10069769

Skin image analysis for detecting monkeypox disease: utilising new model M-Net, a non-invasive deep learning model
by Vinod Kumar Yadav, Rajitha Bakthula
Abstract: Skin and skin-related diseases pose a significant public health challenge worldwide, leading to major concerns in medical diagnosis. Various environmental factors, including bacteria, fungi, and viruses, can contribute to these conditions, resulting in a growing number of individuals affected by skin diseases. Most physicians rely on manual biopsy tests for skin disease diagnosis, which can cause delays in timely treatment. Therefore, there is a high demand for automated skin disease classification systems to provide quick and accurate results. Deep learning (DL) has recently shown remarkable effectiveness in image-based classification tasks, such as identifying skin cancer, rosacea, melanocytic nevus, tumour cells, and COVID-19 patients. Consequently, DL can also be adapted to detect monkeypox skin disease. In this article, we propose a novel approach consisting of two phases. First, new HR, UOR, and BR algorithms will be used to preprocess the images. Second, a custom CNN model will be developed for monkeypox classification. The proposed model is compared with existing approaches in the literature and demonstrates superior performance, achieving an accuracy of 95%.
Keywords: image pre-processing; classification; hair removal; object removal; background removal; data augmentation.
DOI: 10.1504/IJDMB.2025.10071008
