Title: Data shuffling and statistical analysis on microarray data for gene selection: a comparative study on filtering methods

Authors: Zejin Ding, Yan-Qing Zhang, Yichuan Zhao

Addresses: Department of Computer Science, Georgia State University, P.O. Box 3994, Atlanta, GA 30302, USA; Department of Computer Science, Georgia State University, P.O. Box 3994, Atlanta, GA 30302, USA; Department of Mathematics and Statistics, Georgia State University, 750 COE, 30 Pryor Street, Atlanta, GA 30303, USA

Abstract: Computational analysis has been broadly used to discover disease-relevant genes from microarray expression data. In this paper, we extend a traditional statistical metric to a second level to measure gene-disease relations, testing whether such a relation can be replicated by randomly shuffling the gene expression data. The traditional metric can be considered a first-level metric; the relevance of each gene is then verified through second-level significance testing based on the first-level metric calculated on the original data and on the shuffled data. We show that this method can also produce high classification performance compared with other filter-based methods.
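
The two-level idea in the abstract can be illustrated with a minimal sketch. The abstract does not name the first-level metric, the shuffling scheme, or the significance test, so everything concrete here is an assumption made for illustration: the first-level metric is taken to be an absolute Welch t-statistic, shuffling permutes each gene's expression values across samples, and the second-level test is an empirical p-value. The names `t_score`, `shuffle_p_value`, `select_genes`, `alpha`, and `n_shuffles` are hypothetical and do not come from the paper.

```python
import random
import statistics


def t_score(values, labels):
    """First-level metric (assumed): absolute Welch t-statistic between two classes."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    diff = abs(statistics.mean(a) - statistics.mean(b))
    se = (statistics.variance(a) / len(a) + statistics.variance(b) / len(b)) ** 0.5
    if se == 0.0:
        # Degenerate case: no within-class spread in either class.
        return float("inf") if diff > 0 else 0.0
    return diff / se


def shuffle_p_value(values, labels, n_shuffles=1000, seed=0):
    """Second-level test (assumed): share of shuffles whose first-level
    score matches or exceeds the score on the original data."""
    rng = random.Random(seed)
    observed = t_score(values, labels)
    shuffled = list(values)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break any link between expression and class
        if t_score(shuffled, labels) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_shuffles + 1)


def select_genes(expression, labels, alpha=0.05, n_shuffles=1000, seed=0):
    """Keep genes whose relation to the class labels is rarely replicated by shuffling."""
    selected = []
    for i, (gene, values) in enumerate(sorted(expression.items())):
        if shuffle_p_value(values, labels, n_shuffles, seed + i) < alpha:
            selected.append(gene)
    return selected
```

Here `expression` is assumed to map a gene name to a list of expression values, one per sample, and `labels` holds a 0/1 class label for each sample in the same order. A gene is kept only when its first-level score on the original data is rarely matched on shuffled data, which is one plausible reading of the second-level verification described above.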

Keywords: gene selection; feature selection; microarray data; statistical analysis; data shuffling; SVM; support vector machines; filtering methods.

DOI: 10.1504/IJFIPM.2010.039119

International Journal of Functional Informatics and Personalised Medicine, 2010 Vol.3 No.3, pp.183 - 203

Published online: 17 Mar 2011
