Title: Missing value imputation in DNA microarray gene expression data: a comparative study of an improved collaborative filtering method with decision tree based approach
Authors: Sujay Saha; Anupam Ghosh; Saikat Bandopadhyay; Kashi Nath Dey
Addresses: Department of Computer Science and Engineering, Heritage Institute of Technology, Kolkata 700107, India; Department of Computer Science and Engineering, Netaji Subhash Engineering College, Kolkata, India; Department of Computer Science and Engineering, Heritage Institute of Technology, Kolkata 700107, India; Department of Computer Science and Engineering, University of Calcutta, Kolkata, India
Abstract: DNA microarrays are used to study the expression levels of thousands of genes simultaneously under various conditions. Unfortunately, microarray experiments can generate datasets with multiple missing values. In this work, the first proposed approach (CFBRSTFDV) uses a fuzzy difference vector (FDV) together with rough set based collaborative filtering to estimate the missing values. We also propose a decision tree based approach combined with a genetic algorithm (GADTreeImpute) to impute the same missing values. We applied the proposed algorithms to three benchmark datasets: yeast gene expression data, human tumour cell data and prostate cancer data. We first measured the performance of both approaches using the RMSE metric. The estimates were then validated through classification, with performance measured by metrics such as percentage classification accuracy, precision and recall.
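The RMSE evaluation mentioned in the abstract can be illustrated with a minimal Python sketch. It assumes the plain (unnormalised) root mean squared error computed only over entries that were artificially masked; the abstract does not say which RMSE variant the paper uses. The function name `rmse_on_missing` and the toy matrices are hypothetical and are not taken from the paper.

```python
import math


def rmse_on_missing(original, imputed, missing_mask):
    """Plain RMSE over the entries flagged True in missing_mask.

    Assumption: errors are measured only where values were hidden
    and then imputed, not over the whole expression matrix.
    """
    squared_errors = [
        (o - p) ** 2
        for row_o, row_p, row_m in zip(original, imputed, missing_mask)
        for o, p, m in zip(row_o, row_p, row_m)
        if m
    ]
    if not squared_errors:
        raise ValueError("mask selects no entries")
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy 2-gene x 3-condition matrix (made-up numbers, not from the paper).
original = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
imputed = [[1.0, 2.5, 3.0], [4.0, 5.0, 5.5]]
mask = [[False, True, False], [False, False, True]]

score = rmse_on_missing(original, imputed, mask)
print(score)
```

A lower score means the imputed values lie closer to the held-out true values, so two imputation methods can be compared by scoring them on the same mask.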
Keywords: missing value estimation; DNA microarray; collaborative filtering; fuzzy set theory; rough set theory; decision tree; genetic algorithm.
International Journal of Computational Science and Engineering, 2019 Vol.18 No.2, pp.130 - 139
Received: 03 Feb 2017
Accepted: 27 Nov 2017
Published online: 14 Feb 2019