Title: Supervised and semi-supervised learning in text classification using enhanced KNN algorithm: a comparative study of supervised and semi-supervised classification in text categorisation

Authors: M.A. Wajeed; T. Adilakshmi

Addresses: School of Computer Science & Informatics, Sreenidhi Institute of Science & Technology, Ghatkesar, Hyderabad, India; Department of CSE, Vasavi College of Engineering, Ibrahimbagh, Hyderabad, India

Abstract: Efficient decision-making requires knowledge in the form of experience, which is obtained through learning. This paper explores the learning process in text classification using the semi-supervised learning paradigm and compares the results with the accuracy of a supervised classifier. Semi-supervised learning can be applied when only a limited amount of training data is available. In the traditional K-nearest neighbour algorithm, all features are given similar weights in all classes, which is not reasonable: a few features may play a vital role in some classes, while in others their presence has no impact. The paper discusses assigning different weights to the features in different classes based on the concept of variance. Finally, to gain insight into the semi-supervised learning paradigm, the supervised and semi-supervised paradigms are compared in text classification. The results show that the semi-supervised learning paradigm can be applied where very limited training data is available and still yield reasonable classifier accuracy.

Keywords: text classification; semi-supervised learning; vector generation; variance; enhanced KNN; K-nearest neighbour; supervised learning; text categorisation; classifier accuracy.

DOI: 10.1504/IJISTA.2012.052497

International Journal of Intelligent Systems Technologies and Applications, 2012 Vol.11 No.3/4, pp.179 - 195

Received: 07 Mar 2012
Accepted: 11 Sep 2012

Published online: 06 Mar 2013
