Forthcoming and Online First Articles

International Journal of Data Mining and Bioinformatics (IJDMB)

Forthcoming articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.

Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.

Articles marked with this shopping trolley icon are available for purchase - click on the icon to send an email request to purchase.

Online First articles are published online here, before they appear in a journal issue. Online First articles are fully citeable, complete with a DOI. They can be cited, read, and downloaded. Online First articles are published as Open Access (OA) articles to make the latest research available as early as possible.

Articles marked with this Open Access icon are Online First articles. They are freely available and openly accessible to all without any restriction except the ones stated in their respective CC licenses.

Register for our alerting service, which notifies you by email when new issues are published online.

We also offer RSS feeds which provide timely updates of tables of contents, newly published articles and calls for papers.

International Journal of Data Mining and Bioinformatics (7 papers in press)

Regular Issues

  • Analyzing SEER Cancer Data Using Signed Maximal Frequent Itemset Networks
    by Yunuscan Kocak, Tansel Ozyer 
    Abstract: Background: Evaluating patient prognosis is an important factor for predicting the effects and consequences of diseases. With the advancement in population-level data collection and with the development in statistical models, it has become possible to develop systems capable of analyzing disease prognosis. Powered by data mining and machine learning techniques, systems can find interesting properties within a dataset and predict unseen cases. Initial and important steps in this process are known as feature extraction and feature selection. Feature extraction is the process of developing new features based on existing features whereas feature selection is the process of deciding which features will be used within the model. Methods: Grouping many features into a single one and understanding relationships between features has been proven as a good approach for selecting strong features. In this work, a novel network-based feature extraction method is presented and tested on two cancer cases, namely (1) Lung and Bronchus cancer, and (2) Pancreatic cancer. Named as Signed Maximal Frequent Itemset Network, the proposed method uses maximal frequent itemsets as actors in a network and extracts features by considering their co-occurrence and the structure of the subgraph. Maximal frequent itemsets are selected as actors to have a compact representation of the features and a network is created to model the relationship between these actors. Results: For performance comparison, extracted features and original features are tested by employing some of the well-accepted and tested machine learning algorithms. In both cases, the best results are obtained when itemsets are used as features, and combining extracted and original features increases performance, which is measured with the root mean square error metric. The root mean square error is reported as 13.74 and 7.60 for Lung and Bronchus cancer and Pancreatic cancer, respectively. To investigate patterns on prediction, the top 10 maximal itemsets are selected with the recursive feature elimination method and their distributions are analyzed. Conclusions: For most of the cases, features created from itemsets and extracted from the network increased the performance of the well-known machine learning algorithms compared to the original features. Itemset analysis confirmed previously known knowledge. As a result of the conducted analysis, it has been realized that survival months are low for cases where information on the disease was unknown or blank, and higher for cases when chemotherapy was given and the primary site was labelled, such as head of the pancreas.
    Keywords: cancer data analysis; frequent pattern mining; machine learning; network analysis; signed networks; maximal frequent itemsets; feature selection; lung cancer; pancreatic cancer.

  • Multiple-Ensemble Methods for Prediction of Alzheimer Disease
    by Ashutosh Mishra 
    Abstract: Alzheimer's disease (AD) is a neurodegenerative disease for which no permanent cure is yet available. However, its prediction at an early stage may increase the life span of a person by many years. The main challenge is to detect AD at an early stage and select the features responsible for it. The objective of this study was to predict AD at an early stage and identify the features that facilitate early prediction using ensemble learning. First, we applied different machine-learning and deep-learning models to the ADNI dataset. The proposed multiple ensemble method overcomes the limitations of existing models by applying feature selection for early prediction. The best ensemble model uses the top 6 selected features and achieves an accuracy of 96.71% with a higher ROC. Our model performed well compared with other machine and deep learning models.
    Keywords: Alzheimer Disease (AD); Machine Learning (ML); Ensemble Learning (EL); Deep Learning (DL); Feature Selection.

  • Diagnostic and prognostic value of HSPD1 in esophageal cancer
    by Xin Chen, Can Luo, Yuting Bai, Xi Zhou, Lei Xu, Xiaolan Guo, Qing Wu, Xiaowu Zhong 
    Abstract: HSPD1 is a potential biomarker in many cancers. However, its role in esophageal cancer (ESCA) is poorly understood. Among patients with ESCA, high HSPD1 expression is linked to a poor outcome. As suggested by Cox analysis results combined with the ROC (receiver operating characteristic) graph, HSPD1 is an independent outcome predictor for the ESCA population and has diagnostic value. Moreover, HSPD1 is linked to immune infiltration, genetic alteration and methylation in ESCA, and is also involved in biological processes such as the chaperonin-containing T-complex, the PI3K/Akt signalling pathway, and the thyroid hormone signalling pathway. According to a final analysis of drug susceptibility, low HSPD1 expression is correlated with resistance to 23 drugs. These findings provide new insights into the probable role of HSPD1 as a predictor in ESCA diagnosis and prognosis.
    Keywords: HSPD1; Esophageal cancer; Bioinformatics; Prognosis; Diagnosis; Biomarkers.
    DOI: 10.1504/IJDMB.2021.10048132

  • Smart Variant Filtering
    by Vladimir Kovacevic, Predrag Obradovic 
    Abstract: Variant filtering as a part of the genome reconstruction process is used for identifying falsely called variants. Availability of truth set variants published for several human DNA samples enabled the creation of the machine learning-based Smart Variant Filtering tool and framework for filtering germline variants. Conceptually, the framework consists of selecting an optimal machine learning algorithm, configuration, and set of features, and producing a model used for filtering novel variants. By direct comparison, we demonstrated that the presented solution outperforms the variant filtering currently used within most secondary DNA analyses. Smart Variant Filtering increases the precision of called single nucleotide variants (removes false positives) by up to 0.2% while keeping the overall f-score higher by 0.12-0.27% than in existing solutions. The precision of calling insertions and deletions is increased by up to 7.8%, while the f-score increase is in the range of 0.1 to 3.2%.
    Keywords: genomic variant filtering; variant calling; machine learning.

  • Comorbidities and risk factors impact of COVID-19 in Mexico: A Feature Utility Metrics Approach
    by Eduardo Emmanuel Rodríguez López, Daniel Hernández González, Francisco Javier Álvarez Rodríguez, Julio Cesar Ponce Gallegos 
    Abstract: By applying Machine Learning, it is possible to determine the impact of the main comorbidities and risk factors associated with COVID-19 based on an analysis of official Mexican Secretary of Health data. This analysis was performed using Feature Utility Metrics: Mutual Information (MI), Permutation Importance (PI), and Partial Dependence Plot (PDP) with two different learning models (RandomForest and XGBoost), finding similarities between these metrics. According to these models, the main comorbidities and risk factors associated with COVID-19 are Age, Gender, Obesity, Diabetes, and Hypertension. For MI and PI (RandomForest), the main risk factor is Age, while for PI (XGBoost) it is Obesity. Finally, the PDP graph for Age shows that the associated probability of risk of COVID-19 infection increases considerably after 60 years of age. Therefore, it was confirmed that the main comorbidities and risk factors associated with COVID-19 in Mexico are consistent with the diseases and conditions most present in the population.
    Keywords: Comorbidities; COVID-19 risk factors; mutual information; permutation importance; feature utility metrics.
    DOI: 10.1504/IJDMB.2021.10048434

  • Protein complex prediction based on dense subgraph merging
    by Tushar Ranjan Sahoo, Swati Vipsita, Sabyasachi Patra 
    Abstract: Protein complex prediction is an essential task in cell biology to understand and analyze protein-protein interaction (PPI) networks, further bringing about the knowledge of many important biological functions. In this article, the authors present a PROtein COmplex Prediction technique based on Dense Subgraph Merging (PROCOP), which considers the inherent organization of proteins and the regions with heavy interactions in PPI networks. The work is intended to isolate the dense regions of the PPI network by a simple neighbourhood search, followed by a merging strategy based on the weighted cluster density. Two or more dense regions are merged iteratively to produce biologically meaningful protein complexes. The predicted protein complexes are evaluated and analyzed using the PPI networks of S. cerevisiae and Homo sapiens. The performance of the proposed algorithm is on par with most of the existing algorithms and outperforms them in terms of evaluation metrics like F-measure and accuracy.
    Keywords: biological network; protein complex; induced subgraph; subgraph merging; clustering.
    DOI: 10.1504/IJDMB.2021.10048571

  • Integrative Analysis of Molecular Genetic Targets and Pathways in Colorectal Cancer Through Screening Large-Scale Microarray Data
    by Elif ONUR, Tuba DENKÇEKEN 
    Abstract: Our aim was to perform comprehensive analyses of mRNAs and miRNAs in the early diagnosis of Colorectal Cancer (CRC) via Principal Component Analysis (PCA)-based Unsupervised Feature Extraction (UFE) and additional bioinformatics approaches. miRNA and mRNA expression profiling studies of CRC were downloaded from GEO. PCA-based UFE was used to define significant mRNAs and miRNAs. The target genes of the identified miRNAs were determined, and the common gene clusters were determined with the mRNAs analyzed from GEO. Functional enrichment analysis was conducted with DAVID. The PPI network was established with STRING, and the mRNA-miRNA regulatory network was established with Cytoscape. Determined hub-miRNAs/hub-genes were verified using TCGA. PPI, Cytoscape, and TCGA verification analysis demonstrated that three hub-genes and five hub-miRNAs were significant in CRC. Dysregulation of these may contribute to CRC development and may be considered a new target in CRC therapy.
    Keywords: Bioinformatics; Colorectal cancer; mRNA; miRNA; Microarray; Principal Component Analysis-based Unsupervised Feature Extraction.
    DOI: 10.1504/IJDMB.2021.10048645