Authors: S. Asharaf
Addresses: Indian Institute of Information Technology and Management – Kerala, Technopark Campus, Trivandrum, Kerala 695581, India
Abstract: Support vector machines (SVMs) are hyperplane classifiers defined in a kernel-induced feature space. The high computational and space requirements of solving the conventional SVM problem prohibit its use in applications involving large datasets. The core vector machine (CVM) is a suitable technique for scaling SVMs to large-scale pattern classification problems. However, in applications where the datasets are unbalanced, the performance of CVM is observed to be poor in terms of both generalisation and training time. In such scenarios, CVM performance depends heavily on the ordering of the data points belonging to the two classes within the dataset. In this paper, we propose two training schemes that improve the performance of CVM irrespective of the ordering of patterns belonging to different classes within the dataset. These methods employ selective sampling-based training of CVM using novel kernel-based clustering algorithms. Empirical studies on several synthetic and real-world datasets show that the proposed strategies improve the performance of CVM on large datasets.
Keywords: support vector machines; SVM; CF tree; clustering; kernel function; selective sampling; core vector machines; CVM; pattern classification; training schemes.
International Journal of Granular Computing, Rough Sets and Intelligent Systems, 2013 Vol.3 No.1, pp.20-43
Received: 26 Mar 2012
Accepted: 23 Jan 2013
Published online: 22 May 2013