Title: Unsupervised selection of informative genes in microarray gene expression data

Authors: Samaneh Liaghat; Eghbal G. Mansoori

Addresses: School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran; School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran

Abstract: Because DNA microarrays have produced massive amounts of high-dimensional gene expression data in recent years, gene selection has become one of the bottlenecks in analysing gene expression datasets. This paper presents a framework for unsupervised gene selection based on maximising the dependency between the sample similarity matrices computed before and after deleting a gene, using a novel estimator of the Hilbert-Schmidt independence criterion (HSIC). The key idea is that eliminating genes that are redundant and/or strongly related to other genes has little effect on pairwise sample similarity. To deal with diagonally dominant matrices, the dynamic range of the matrix values is reduced. Additionally, the gap statistic and k-means clustering are used to speed up the proposed method. Experimental validation is conducted on several microarray gene expression datasets, and the results show that the gene selection scheme works well in practice.
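
The core idea of the abstract, scoring a gene by how much the sample similarity matrix changes when that gene is deleted, can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: it uses the standard biased empirical HSIC estimator, trace(KHLH)/(n-1)^2, rather than the authors' novel estimator; it uses a linear kernel as the similarity matrix; and it omits the dynamic-range reduction, gap statistic and k-means steps. The function names (linear_kernel, hsic, removal_scores) are hypothetical.

```python
import numpy as np


def linear_kernel(X):
    """Sample-by-sample similarity matrix (rows of X are samples).

    Assumption: a linear kernel; the paper's similarity measure may differ.
    """
    return X @ X.T


def hsic(K, L):
    """Standard biased empirical HSIC between two n x n kernel matrices.

    This is the textbook estimator, not the paper's novel estimator.
    """
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def removal_scores(X):
    """For each gene (column), HSIC between the similarity matrices
    computed with and without that gene.

    A high score means deleting the gene barely changes the pairwise
    sample similarity, so the gene is a candidate for elimination.
    """
    K_full = linear_kernel(X)
    scores = np.empty(X.shape[1])
    for j in range(X.shape[1]):
        X_without_j = np.delete(X, j, axis=1)  # drop gene j
        scores[j] = hsic(K_full, linear_kernel(X_without_j))
    return scores


# Toy data: 20 samples, 4 genes; gene 3 is constant and carries no information.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 4))
X_demo[:, 3] = 5.0
print(removal_scores(X_demo))
```

In this toy setting the constant gene receives the highest score, because removing it leaves the centred similarity matrix unchanged. Scores from an unnormalised HSIC with a linear kernel depend on each gene's scale, so a practical pipeline would likely standardise the expression values first.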

Keywords: DNA microarrays; gene expression data; unsupervised gene selection; Hilbert-Schmidt independence criterion; HSIC; bioinformatics; dependency maximisation; similarity matrices; gap statistics; k-means clustering.

DOI: 10.1504/IJAPR.2016.082237

International Journal of Applied Pattern Recognition, 2016 Vol.3 No.4, pp.351 - 367

Received: 23 Jun 2016
Accepted: 04 Jul 2016

Published online: 13 Feb 2017
