Title: Feature cluster selection for high-throughput data analysis

Authors: Lei Yu

Addresses: Department of Computer Science, Binghamton University, P.O. Box 6000, Binghamton, NY 13902-6000, USA

Abstract: Feature selection is effective in selecting predictive gene sets for microarray classification. However, the large number of predictive gene sets and the disparity among them present a challenge for identifying potential biomarkers. To facilitate biomarker identification, we present a new data mining task, feature cluster selection, which selects a small number of coherent and predictive feature clusters from a full set of features. We provide both a theoretical definition and an empirical formulation of the new problem, and propose an efficient algorithm, 3M. Experiments on microarray data show that the 3M algorithm can select predictive and statistically significant gene clusters.

Keywords: bioinformatics; data mining; feature selection; feature cluster selection; biomarker identification; high-throughput data; predictive gene sets; microarray classification; gene clusters.

DOI: 10.1504/IJDMB.2009.024850

International Journal of Data Mining and Bioinformatics, 2009, Vol. 3, No. 2, pp. 177-191

Published online: 01 May 2009
