Title: Enhanced classification for high-throughput data with an optimal projection and hybrid classifier

Authors: Joon Jin Song; Jingying Zhang

Addresses: Center for Statistical Research and Consulting, Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72701, USA; Department of Mathematical Sciences, University of Arkansas, Fayetteville, AR 72701, USA

Abstract: Recently developed high-throughput screening technologies allow scientists to conduct millions of biological and medical tests simultaneously and rapidly. A major bottleneck in analysing the resulting data is reducing their inherent high dimensionality. Principal Component Analysis (PCA) is a popular tool for dimensionality reduction that typically selects a few Principal Components (PCs) ranked by their variances (eigenvalues). Since this selection approach is not always effective in reducing dimensionality, we consider a different ranking criterion, the canonical variate criterion. To further enhance classification performance, we propose an integrated classification framework that combines this criterion with two hybrid classification methods, and we compare it with several popular classification methods using leave-one-out cross-validation. Three real high-throughput data sets are analysed to illustrate the methods.
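
The idea of ranking PCs by class separation rather than by variance can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not state the exact form of the canonical variate criterion, so the code assumes a per-component ratio of between-class to within-class sum of squares. The function name `rank_pcs_by_class_separation` and its parameters are hypothetical.

```python
import numpy as np


def rank_pcs_by_class_separation(X, y, n_keep=2):
    """Rank principal components by a between/within-class ratio.

    Assumption: this ratio stands in for the canonical variate criterion
    named in the abstract; the paper's exact formula is not given there.
    Returns (indices of the top n_keep PCs, ratio per PC, PC directions).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    Xc = X - X.mean(axis=0)  # centre each feature

    # SVD of the centred data: rows of Vt are PC directions,
    # ordered by decreasing variance (the usual eigenvalue ranking).
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    keep = s > 1e-10 * s[0]  # drop numerically null components
    scores = (U * s)[:, keep]  # PC scores, shape (n_samples, n_pcs)
    Vt = Vt[keep]

    between = np.zeros(scores.shape[1])
    within = np.zeros(scores.shape[1])
    for label in np.unique(y):
        group = scores[y == label]
        mean = group.mean(axis=0)
        # Scores have overall mean zero, so the group mean is already
        # the deviation from the grand mean.
        between += len(group) * mean ** 2
        within += ((group - mean) ** 2).sum(axis=0)

    # Guard against division by zero for perfectly separated components.
    ratio = between / np.maximum(within, np.finfo(float).tiny)
    order = np.argsort(ratio)[::-1]  # most discriminative PC first
    return order[:n_keep], ratio, Vt
```

With data in which the highest-variance direction is unrelated to the class labels, a ranking of this kind can promote a lower-variance component that the eigenvalue ordering would place later or discard.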

Keywords: PCA; principal component analysis; CVA; canonical variate analysis; hybrid classification methods; high throughput data; optimal projection; bioinformatics; high throughput screening; high dimensionality; dimension reduction; microarray data sets; nuclear magnetic resonance; NMR spectra data.

DOI: 10.1504/IJDMB.2014.057783

International Journal of Data Mining and Bioinformatics, 2014 Vol.9 No.1, pp.106 - 120

Received: 19 Mar 2011
Accepted: 15 Mar 2012

Published online: 21 Oct 2014
