Title: Irrelevant gene elimination for Partial Least Squares based Dimension Reduction by using feature probes

Authors: Xue-Qiang Zeng, Guo-Zheng Li, Geng-Feng Wu, Jack Y. Yang, Mary Qu Yang

Addresses: School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China. ' Department of Control Science and Engineering, Tongji University, Shanghai 201804, China. ' School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China. ' Department of Radiation Oncology, Massachusetts General Hospital and Harvard Medical School, Harvard University, Boston, Massachusetts 02114, USA. ' National Human Genome Research Institute, National Institutes of Health (NIH), US Department of Health and Human Services, Bethesda, MD 20852, USA; Oak Ridge, D.O.E., USA

Abstract: Gene expression data are hard to analyse because they contain only a few observations but thousands of measured genes. Partial Least Squares based Dimension Reduction (PLSDR) is well suited to such high-dimensional problems, but irrelevant features introduce errors into the dimension reduction process. Here, feature selection is applied to filter the data, and an algorithm named PLSDRg is described that integrates PLSDR with gene elimination, guided by t-statistic scores on standardised probes. Experimental results on six microarray data sets show that PLSDRg effectively and reliably improves the generalisation performance of classifiers.
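
The gene elimination idea summarised in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the probes are generated or how the threshold is chosen, so the function names `t_score` and `eliminate_genes`, the use of standard normal noise as probes, the Welch form of the t-statistic, and the 95% quantile cut-off are all assumptions made here for illustration. The sketch covers only the filtering step; the retained genes would then be passed on to the PLS dimension reduction.

```python
import math
import random
import statistics


def t_score(values, labels):
    """Absolute Welch t-statistic of one feature for binary labels (0 or 1)."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    diff = abs(statistics.mean(a) - statistics.mean(b))
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    if se == 0.0:
        # Both classes are constant: perfectly separated or identical.
        return math.inf if diff > 0.0 else 0.0
    return diff / se


def eliminate_genes(X, labels, n_probes=100, quantile=0.95, seed=0):
    """Return indices of genes that score higher than most random probes.

    X is a list of samples, each a list of gene expression values.
    The probe design and the quantile rule are assumptions of this sketch.
    """
    rng = random.Random(seed)
    n_samples = len(X)
    n_genes = len(X[0])

    # Probes are standard normal noise, unrelated to the labels by construction.
    probe_scores = sorted(
        t_score([rng.gauss(0.0, 1.0) for _ in range(n_samples)], labels)
        for _ in range(n_probes)
    )
    threshold = probe_scores[int(quantile * (n_probes - 1))]

    # Keep a gene only if it looks more relevant than the probes do.
    kept = []
    for j in range(n_genes):
        column = [row[j] for row in X]
        if t_score(column, labels) > threshold:
            kept.append(j)
    return kept
```

Because the t-statistic is invariant to the scale of a feature, probes drawn from a standard normal distribution can be compared directly with gene scores. Raising `quantile` makes the filter stricter, and `n_probes` controls how well the noise distribution is estimated.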

Keywords: PLS; partial least squares; dimension reduction; gene selection; microarray analysis; data mining; bioinformatics; irrelevant gene elimination; feature selection; gene expression; classifier performance.

DOI: 10.1504/IJDMB.2009.023886

International Journal of Data Mining and Bioinformatics, 2009 Vol.3 No.1, pp.85 - 103

Received: 02 Oct 2007
Accepted: 19 Jul 2008

Published online: 17 Mar 2009
