Authors: Vikas K. Garg, M.N. Murty
Addresses: Department of Computer Science and Automation, Indian Institute of Science, Bangalore-560 012, India; Department of Computer Science and Automation, Indian Institute of Science, Bangalore-560 012, India
Abstract: Training SVMs on high-dimensional feature vectors in one shot incurs high computational cost. A low-dimensional representation reduces computational overhead and improves classification speed. Low dimensionality also reduces the risk of over-fitting and tends to improve the generalisation ability of classification algorithms. For many important applications, the dimensionality may remain prohibitively high despite feature selection. In this paper, we address these issues primarily in the context of handwritten digit data. In particular, we make the following contributions: 1) we introduce the α-minimum feature cover (α-MFC) problem and prove it to be NP-hard; 2) we investigate the efficacy of a divide-and-conquer ensemble method for SVMs based on segmentation of the feature space (FS-SVMs); 3) we propose a greedy algorithm for finding an approximate α-MFC using FS-SVMs.
Keywords: dimensionality reduction; classification; greedy algorithms; support vector machines; SVMs; approximation algorithms; feature selection; feature subspace; handwritten digits; digit recognition; handwriting.
International Journal of Data Mining, Modelling and Management, 2009 Vol.1 No.4, pp.411 - 436
Published online: 29 Oct 2009