Title: A novel distance-based iterative sequential KNN algorithm for estimation of missing values in microarray gene expression data

Authors: Chandra Das; Shilpi Bose; Matangini Chattopadhyay; Samiran Chattopadhyay

Addresses: Department of Information Technology, Netaji Subhash Engineering College, Kolkata, West Bengal, India; Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, West Bengal, India; School of Education Technology, Jadavpur University, Kolkata, West Bengal, India; Department of Information Technology, Jadavpur University, Kolkata, West Bengal, India

Abstract: Missing entries in DNA microarray gene expression datasets cause severe problems in downstream analysis, because most analysis methods require complete datasets. Although several missing value prediction methods have been proposed to solve this problem, they have limitations that may affect the performance of various analysis algorithms. To address this, a novel distance-based iterative sequential K-nearest neighbour imputation method (ISKNNimpute) is proposed. The proposed distance is a hybrid of a modified Euclidean distance and the Pearson correlation coefficient. The method modifies KNN estimation so that previously estimated values are reused, following both an iterative and a sequential approach. The performance of ISKNNimpute is evaluated on various time-series and non-time-series microarray datasets and compared with several widely used existing imputation techniques. The experimental results show that ISKNNimpute consistently produces better results than the other existing methods.
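The abstract does not give the exact form of the hybrid distance or of the update schedule, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' ISKNNimpute. It assumes that the "modified" Euclidean distance is the root-mean-square difference over co-observed entries, that the hybrid is a weighted sum `alpha * euclid + (1 - alpha) * (1 - |r|)`, that neighbours are weighted by inverse distance, that the sequential pass imputes genes in order of increasing missing count (reusing each imputed gene as a candidate for later genes), and that the iterative phase simply re-estimates every incomplete gene for a fixed number of passes. The names `hybrid_distance`, `impute_row`, `isknn_impute`, `alpha` and `n_iter` are hypothetical.

```python
import math

def hybrid_distance(x, y, alpha=0.5):
    # Hypothetical hybrid distance; None marks a missing entry.
    # Only positions observed in both profiles are compared.
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    if n < 2:
        return math.inf  # too few shared observations to compare
    # Assumed "modified" Euclidean: RMS difference over shared positions.
    euc = math.sqrt(sum((a - b) ** 2 for a, b in pairs) / n)
    mx = sum(a for a, _ in pairs) / n
    my = sum(b for _, b in pairs) / n
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    r = sxy / math.sqrt(sxx * syy) if sxx > 0 and syy > 0 else 0.0
    # Assumed hybridisation: weighted sum of the two dissimilarities.
    return alpha * euc + (1 - alpha) * (1 - abs(r))

def impute_row(values, mask, candidates, k, alpha):
    # Distances use only the gene's originally observed entries (mask).
    observed = [v if m else None for v, m in zip(values, mask)]
    scored = sorted(((hybrid_distance(observed, c, alpha), c) for c in candidates),
                    key=lambda t: t[0])
    nearest = [(d, c) for d, c in scored[:k] if math.isfinite(d)]
    if not nearest:
        return None
    weights = [1.0 / (d + 1e-9) for d, _ in nearest]  # inverse-distance weights
    total = sum(weights)
    out = list(values)
    for j, m in enumerate(mask):
        if not m:
            out[j] = sum(w * c[j] for w, (_, c) in zip(weights, nearest)) / total
    return out

def isknn_impute(data, k=3, alpha=0.5, n_iter=2):
    masks = [[v is not None for v in row] for row in data]
    # Fallback for a gene with no usable neighbour: observed column mean.
    col_means = []
    for j in range(len(data[0])):
        obs = [row[j] for row in data if row[j] is not None]
        col_means.append(sum(obs) / len(obs) if obs else 0.0)
    filled = [list(row) for row in data]
    incomplete = [i for i, m in enumerate(masks) if not all(m)]
    # Sequential pass: genes with the fewest missing entries go first.
    incomplete.sort(key=lambda i: masks[i].count(False))
    pool = [filled[i] for i, m in enumerate(masks) if all(m)]
    for i in incomplete:
        est = impute_row(filled[i], masks[i], pool, k, alpha)
        if est is None:
            est = [v if m else col_means[j]
                   for j, (v, m) in enumerate(zip(filled[i], masks[i]))]
        filled[i] = est
        pool.append(est)  # reuse this gene's estimates for later genes
    # Iterative passes: re-estimate each incomplete gene from all other genes.
    for _ in range(n_iter):
        for i in incomplete:
            others = [filled[g] for g in range(len(filled)) if g != i]
            est = impute_row(filled[i], masks[i], others, k, alpha)
            if est is not None:
                filled[i] = est
    return filled
```

For example, `isknn_impute([[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0], [1.0, None, 3.0, 4.0]], k=2)` fills the missing entry from the two complete genes; because the first gene matches the incomplete one exactly on its observed entries, its near-zero distance gives it almost all of the weight. The weighted-sum hybrid is only one plausible way to combine a magnitude-based and a shape-based dissimilarity; the paper's actual formula may differ.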

Keywords: DNA microarrays; gene expression data; missing value estimation; correlated genes; co-expressed genes; K-nearest neighbour; kNN imputation; missing values; modified Euclidean distance; Pearson correlation coefficient; bioinformatics.

DOI: 10.1504/IJBRA.2016.080719

International Journal of Bioinformatics Research and Applications, 2016 Vol.12 No.4, pp.312 - 342

Received: 24 Feb 2015
Accepted: 14 Mar 2016

Published online: 05 Dec 2016
