Title: Efficient data selection approach in projected feature space for fast training support vector machines

Authors: Sonia Chaibi; Mohamed Tayeb Laskri

Addresses: LRS Laboratory, Computer Science Department, Badji Mokhtar-Annaba University – UBMA, P.O. Box 12, Annaba, 23000 Annaba, Algeria; LRS Laboratory, Computer Science Department, Badji Mokhtar-Annaba University – UBMA, P.O. Box 12, Annaba, 23000 Annaba, Algeria

Abstract: Support vector machines (SVMs) have shown superior performance compared to other machine learning techniques, especially in classification problems. One limitation, however, is the long training time, which increases with the size of the data. This problem has been investigated thoroughly, and different algorithms for classification have been used with varying degrees of success. Among them, clustering techniques have shown considerable success in reducing the amount of SVM training data. However, once these solutions are applied to large-scale datasets, it becomes clear that clustering alone is insufficient. In this paper, we tackle the problem of how to combine clustering methods with feature reduction techniques to efficiently minimise the complexity of SVM training. Several experiments on different datasets show that the proposed solution can be a promising way to train SVMs quickly on large-scale datasets.
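
The abstract does not spell out the procedure, so the following is only an illustrative sketch of the general idea it names: project the data into a reduced feature space, cluster there, and keep a subset of points for SVM training. Everything specific is an assumption made for illustration, not the authors' method: the restriction to two classes and 2-D inputs, the two-class Fisher discriminant used as the projection, the hypothetical helper names (`fisher_direction`, `kmeans_1d`, `select_boundary_points`), the choice of k = 3, the rule of keeping the cluster nearest the opposite class, and the synthetic Gaussian data.

```python
import random

def mean(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def fisher_direction(X0, X1):
    """Two-class Fisher discriminant for 2-D inputs: w = Sw^-1 (m1 - m0)."""
    m0, m1 = mean(X0), mean(X1)
    s = [[0.0, 0.0], [0.0, 0.0]]  # within-class scatter matrix Sw
    for rows, m in ((X0, m0), (X1, m1)):
        for r in rows:
            d = [r[0] - m[0], r[1] - m[1]]
            for a in range(2):
                for b in range(2):
                    s[a][b] += d[a] * d[b]
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    inv = [[s[1][1] / det, -s[0][1] / det],
           [-s[1][0] / det, s[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def kmeans_1d(values, k, iters=20):
    """Plain Lloyd's k-means on scalars; returns (labels, centres)."""
    srt = sorted(values)
    # Initialise centres at evenly spaced quantiles of the data.
    centres = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]

    def assign():
        return [min(range(k), key=lambda c: abs(v - centres[c]))
                for v in values]

    for _ in range(iters):
        labels = assign()
        for c in range(k):
            members = [v for v, l in zip(values, labels) if l == c]
            if members:
                centres[c] = sum(members) / len(members)
    return assign(), centres

def select_boundary_points(X0, X1, k=3):
    """Assumed selection rule: per class, keep the cluster whose centre
    lies closest to the other class's mean in the projected space."""
    w = fisher_direction(X0, X1)
    p0 = [r[0] * w[0] + r[1] * w[1] for r in X0]
    p1 = [r[0] * w[0] + r[1] * w[1] for r in X1]
    mu0, mu1 = sum(p0) / len(p0), sum(p1) / len(p1)
    kept = []
    for X, p, other_mu, y in ((X0, p0, mu1, 0), (X1, p1, mu0, 1)):
        labels, centres = kmeans_1d(p, k)
        near = min(range(k), key=lambda c: abs(centres[c] - other_mu))
        kept += [(r, y) for r, l in zip(X, labels) if l == near]
    return kept

# Synthetic two-class data (an assumption, not one of the paper's datasets).
random.seed(0)
X0 = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
X1 = [[random.gauss(4, 1), random.gauss(4, 1)] for _ in range(200)]
reduced = select_boundary_points(X0, X1)
print(len(reduced), "of", len(X0) + len(X1), "points kept")
```

In a sketch like this, the retained `(point, label)` pairs would then be handed to any SVM trainer in place of the full dataset. The intuition is that points near the class boundary are the most likely support vectors, so training on them alone should cost less while changing the decision boundary little.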

Keywords: support vector machines; K-means clustering; SVM training; linear discriminant analysis; feature reduction; data mining; data selection; projected feature space; fast training SVM.

DOI: 10.1504/IJBIDM.2014.068349

International Journal of Business Intelligence and Data Mining, 2014 Vol.9 No.3, pp.179 - 194

Received: 11 Jun 2014
Accepted: 10 Jul 2014

Published online: 10 Apr 2015
