International Journal of Data Mining and Bioinformatics
These articles have been peer-reviewed and accepted for publication but are pending final changes, are not yet published and may not appear here in their final order of publication until they are assigned to issues. Therefore, the content conforms to our standards but the presentation (e.g. typesetting and proof-reading) is not necessarily up to the Inderscience standard. Additionally, titles, authors, abstracts and keywords may change before publication. Articles will not be published until the final proofs are validated by their authors.
Forthcoming articles must be purchased for the purposes of research, teaching and private study only. These articles can be cited using the expression "in press". For example: Smith, J. (in press). Article Title. Journal Title.
International Journal of Data Mining and Bioinformatics (6 papers in press)
An Integrated Approach for DNA-Damage Detection from Comet-Images of Drosophila Melanogaster by Mukerrem Bahar Baskir, Fahriye Zemheri Navruz Abstract: Image processing is a popular technique in data mining that lets researchers extract various results from images produced in an experimental study. In this study, we propose an approach for making inferences from comet assay images, which are used to identify genotoxins that cause disorders in chromosome and DNA structure. The proposed approach has three phases: i) creating comet assay images after feeding mineral oil (1.19 µl/L) for 24, 48 and 72 hours to Drosophila melanogaster, a well-known in vivo model organism; ii) transforming these comet images into quantitative images using texture analysis; iii) clustering the quantitative images to detect DNA damage in the comet images by the similarities among the 24-, 48- and 72-hour experiments and the control group. The accuracy rate of the clustering analysis is 95%. Consequently, the proposed approach yields convenient and precise results for the detection of DNA damage in Drosophila melanogaster. Keywords: Image processing; comet assay; texture; clustering; accuracy; Drosophila Melanogaster.
Gradient Boosting Tree for 1H-MRS Alzheimer Diagnosis by Defu Liu, Guowu Yang, Fengmao Lv, Yuchen Li, Jinzhao Wu Abstract: In recent years, increasing attention has been drawn to early-onset Alzheimer's disease (EOAD). Brain metabolites measured by proton magnetic resonance spectroscopy (1H-MRS) are effective biomarkers for EOAD because they are highly sensitive to the metabolic changes in the brains of dementia patients. This work aims to design an effective computer-aided EOAD diagnosis system by mining 1H-MRS data with advanced machine learning techniques. Specifically, our method first adopts a gradient boosting decision tree (GBDT) to learn the 1H-MRS biomarkers of EOAD patients, which are then used to construct the final classifier for Alzheimer diagnosis. To validate our proposal, we conducted comprehensive experiments, and the experimental results demonstrate the effectiveness of our method. Keywords: Early-onset Alzheimer's Disease; Proton Magnetic Resonance Spectroscopy; Alzheimer Biomarker; Gradient Boosting Decision Tree.
Evaluation of Pseudo-Haptic Interfaces for Perceiving Virtual Weights by Jun Lee, Jee-In Kim, HyeongSeok Kim Abstract: Lifting a virtual object is a common task in virtual reality (VR) environments. If users could perform the lifting task with a more realistic perception of virtual weight, their sense of presence in the VR environment would greatly improve. Force Arrow is a pseudo-haptic interface for improving virtual weight perception. It was originally proposed to create a cognitive illusion of virtual weight for a user lifting virtual dumbbells with visual guidance. The user was guided to control velocity and force during the lifting tasks while experiencing the virtual weights of the dumbbells. The interface has since been extended with additional visual guidance that changes the sizes of the virtual dumbbells when their virtual weights change. The extended interface is called Force Arrow 2. The two pseudo-haptic interfaces were compared in terms of their effectiveness in generating virtual weights and their usability. The experimental results showed that the Force Arrow 2 interface guided its users during the lifting tasks more effectively than the previous version. Keywords: Virtual Reality; Virtual Weight Perception; Pseudo Haptic Feedback; Virtual Object Manipulation.
A Fast and Novel Approach Based on Grouping and Weighted mRMR for Feature Selection and Classification of Protein Sequence Data by Kiranpreet Kaur, Nagamma Patil Abstract: Here, a three-stage feature selection approach is proposed for feature vectors obtained from protein sequence data. Along with the relevance and redundancy of features, the method also takes the conflicting nature of features into account. In the first stage, features are ranked and the most irrelevant features are removed; in the second stage, conflicting features are grouped together; and in the third stage, a fast approach based on weighted Minimum Redundancy Maximum Relevance (wMRMR) is applied to the grouped features. Further, to reduce the time consumed by feature selection, the third stage is implemented in a parallel fashion. Classification methods such as Decision Tree, Naive Bayes and k-Nearest Neighbor are used to analyze the performance of the proposed approach. The proposed approach is observed to increase classification accuracy in comparison to state-of-the-art methods. In addition, the method reduces the computations involved and, when applied in a parallel fashion, results in a drastic reduction in time consumption. Keywords: bioinformatics; feature selection; protein sequence data; filter method; mRMR; classification.
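As an illustration of the relevance-versus-redundancy trade-off mentioned in the abstract above, the following is a minimal sketch of plain greedy mRMR, not the authors' weighted, grouped or parallel variant. It assumes numeric features and numeric class labels, and uses absolute Pearson correlation as a stand-in for both relevance and redundancy; the function names `pearson` and `mrmr` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def mrmr(features, labels, k):
    """Greedily pick k feature names by relevance minus mean redundancy.

    features: dict mapping feature name -> list of values (one per sample)
    labels:   list of numeric class labels (one per sample)
    Assumption: |Pearson correlation| stands in for the relevance and
    redundancy measures; the paper's weighting scheme is not reproduced.
    """
    relevance = {name: abs(pearson(col, labels)) for name, col in features.items()}
    selected = []
    remaining = set(features)
    while remaining and len(selected) < k:
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                abs(pearson(features[name], features[s])) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        # sorted() makes tie-breaking deterministic (alphabetical order).
        best = max(sorted(remaining), key=score)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With a feature `a`, a near-duplicate `b`, and a weaker but less correlated feature `c`, the sketch picks `a` first and then prefers `c` over `b`, because `b` adds little beyond what `a` already provides.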
The Correlation-Based Redundancy Ensemble Multi-Filter for Gene selection by Abdulrauf Sharifai, Zurinahni Binti Zainol Abstract: Microarray data analysis is notoriously challenging because the data comprise a large number of features (genes) but only a small number of samples. Gene selection attempts to find a highly discriminative subset of genes for cancer detection and classification. Various methods have been proposed for gene selection in the literature; however, most existing methods focus predominantly on selecting relevant gene subsets. The selected genes therefore often include redundant genes, which may reduce the performance and increase the complexity of the learning algorithm. This paper proposes a Correlation-Based Redundancy Multiple Filter Approach (CBRMFA). Three filter methods are employed to select relevant gene subsets with diverse classification ability. The top N genes with the highest ranking scores from each filter are combined to form a new ranking list. A correlation-based redundancy measure is then used to eliminate the redundant genes in the data set. Finally, a wrapper approach, the sequential forward search algorithm, is used to select the optimal gene subsets. Experimental results on four benchmark microarray datasets show that the proposed CBRMFA method achieved outstanding results on two of the four datasets in terms of classification accuracy, sensitivity, specificity and the minimum number of genes selected, compared with state-of-the-art algorithms. Keywords: Multi-filter; Correlation-Based Redundancy; ensemble method; Microarray data set; gene selection.
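The combine-then-prune steps described in the abstract above can be sketched roughly as follows. This is an illustrative sketch only, not the CBRMFA implementation: it assumes the combined list is the order-preserving union of each filter's top N genes, that redundancy means an absolute Pearson correlation at or above a threshold (0.9 here, an arbitrary choice), and it omits the final wrapper search. The names `combine_top_n` and `drop_redundant` are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def combine_top_n(rankings, n):
    """Union of the top-n genes of each filter's ranked list, in first-seen order.

    Assumption: the abstract does not say how the lists are merged; a simple
    order-preserving union is used here.
    """
    combined = []
    for ranked in rankings:
        for gene in ranked[:n]:
            if gene not in combined:
                combined.append(gene)
    return combined


def drop_redundant(genes, expression, threshold=0.9):
    """Keep a gene only if it is not highly correlated with any gene already kept.

    expression: dict mapping gene name -> list of expression values per sample
    threshold:  assumed cut-off on |Pearson correlation| (not from the paper)
    """
    kept = []
    for gene in genes:
        if all(abs(pearson(expression[gene], expression[k])) < threshold for k in kept):
            kept.append(gene)
    return kept
```

A gene whose expression profile is almost a scaled copy of an earlier gene is dropped, while genes with only moderate correlation to the kept set survive for a subsequent wrapper stage.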
A New Scalable Approach for Missing Value Imputation in High-Throughput Microarray Data on Apache Spark by Madhuri Gupta, Bharat Gupta Abstract: Acquisition of high-dimensional data such as gene expression and proteomic data is performed using High-Throughput Technology (HTT). Data extracted using HTT contain a large number of missing values, which degrade the performance of analytical techniques. Gene expression data plays a vital role in healthcare research, and reconstructing its missing values is a challenging task. In this research work, a scalable technique, PC-ImNN, is proposed; it combines Pearson correlation with Monte Carlo and a modified Nearest Neighbor method to predict missing values. Monte Carlo is a technique that uses repeated random sampling to make numerical estimates of unknown parameters. Pearson correlation is combined with Monte Carlo to maintain the distribution of the estimated data points. The Nearest Neighbor technique is applied to find the nearest estimated data points. The proposed model is compared with five existing imputation techniques (missForest, wNNSelect, KNNimpute, missPALLasso, mean). The results show that the proposed technique performs better than the other imputation techniques in terms of mean square error and imputation accuracy. The Apache Spark data processing engine is used to make the proposed technique scalable and to speed up its performance. Keywords: Missing Value; Pearson’s Correlation; K-Nearest Neighbor; Mean Square Error; Monte Carlo Method; SVM; Microarray Data.
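To make the idea of correlation-guided nearest-neighbor imputation in the abstract above more concrete, here is a minimal single-machine sketch. It is not PC-ImNN: the Monte Carlo step and the Spark parallelization are omitted, neighbors are assumed to be complete rows, and the weighting by positive Pearson correlation is an assumption chosen for simplicity. The function name `impute_row` is hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def impute_row(target, candidates, k=2):
    """Fill None entries of `target` using the k most correlated complete rows.

    target:     list of floats with None marking missing values
    candidates: non-empty list of complete rows of the same length
    Assumption: neighbors are weighted by max(r, 0), where r is the Pearson
    correlation computed over the target's observed positions.
    """
    observed = [i for i, v in enumerate(target) if v is not None]
    t_obs = [target[i] for i in observed]
    scored = [
        (pearson(t_obs, [row[i] for i in observed]), row) for row in candidates
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    neighbors = [(max(r, 0.0), row) for r, row in scored[:k]]
    total = sum(w for w, _ in neighbors)

    filled = list(target)
    for i, value in enumerate(target):
        if value is None:
            if total == 0:
                # No positively correlated neighbor: fall back to a plain mean.
                filled[i] = sum(row[i] for _, row in neighbors) / len(neighbors)
            else:
                filled[i] = sum(w * row[i] for w, row in neighbors) / total
    return filled
```

The sketch averages raw neighbor values, so it is only sensible when rows are on comparable scales; a fuller treatment would normalize rows first or regress the target on its neighbors.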