Authors: Samaneh Liaghat; Eghbal G. Mansoori
Addresses: School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran; School of Electrical and Computer Engineering, Shiraz University, Shiraz, Iran
Abstract: Because DNA microarrays have produced massive amounts of high-dimensional gene expression data in recent years, gene selection has become one of the bottlenecks in the analysis of gene expression datasets. This paper presents a framework for unsupervised gene selection based on maximising the dependency between the sample similarity matrices computed before and after deleting a gene, using a novel estimator of the Hilbert-Schmidt independence criterion (HSIC). The key idea is that eliminating genes that are redundant and/or strongly related to other genes has little effect on pairwise sample similarity. To deal with diagonally dominant matrices, the dynamic range of the matrix values is reduced. In addition, the gap statistic and k-means clustering are used to speed up the proposed method. Experimental validation on several microarray gene expression datasets shows that the proposed gene selection scheme works well in practice.
Keywords: DNA microarrays; gene expression data; unsupervised gene selection; Hilbert-Schmidt independence criterion; HSIC; bioinformatics; dependency maximisation; similarity matrices; gap statistics; k-means clustering.
International Journal of Applied Pattern Recognition, 2016 Vol.3 No.4, pp.351 - 367
Received: 23 Jun 2016
Accepted: 04 Jul 2016
Published online: 13 Feb 2017
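
The dependency-maximisation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses the standard biased empirical HSIC estimator, tr(KHLH)/(n-1)^2, rather than the novel estimator the paper proposes, and it assumes an RBF sample similarity with a fixed bandwidth. The function names `rbf_similarity`, `hsic_biased`, `deletion_scores`, and the parameter `gamma` are hypothetical. The dynamic-range reduction, gap statistic, and k-means speed-ups mentioned in the abstract are omitted.

```python
import numpy as np


def rbf_similarity(X, gamma):
    """Pairwise RBF similarity between samples (rows of X).

    The RBF form and the bandwidth `gamma` are assumptions of this sketch.
    """
    sq = np.sum(X ** 2, axis=1)
    # Squared Euclidean distances, clipped at 0 to absorb rounding error.
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    return np.exp(-gamma * d2)


def hsic_biased(K, L):
    """Standard biased empirical HSIC: tr(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def deletion_scores(X, gamma=None):
    """HSIC between the similarity matrices before and after deleting each gene.

    X has shape (n_samples, n_genes). A high score means the sample
    similarities barely change when that gene (column) is removed.
    """
    n_genes = X.shape[1]
    if gamma is None:
        gamma = 1.0 / n_genes  # same bandwidth before and after deletion
    K_full = rbf_similarity(X, gamma)
    scores = np.empty(n_genes)
    for g in range(n_genes):
        K_minus = rbf_similarity(np.delete(X, g, axis=1), gamma)
        scores[g] = hsic_biased(K_full, K_minus)
    return scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(12, 5))  # synthetic data, for illustration only
    scores = deletion_scores(X)
    # Genes with the highest score are the first candidates for elimination.
    print(np.argsort(scores)[::-1])
```

Under these assumptions, a gene whose removal leaves the pairwise sample similarities essentially unchanged receives a score close to the HSIC of the full similarity matrix with itself, which matches the abstract's intuition about redundant genes.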