Authors: R. Jayashree; K. Srikantamurthy; Basavaraj S. Anami
Addresses: Department of Computer Science, PES Institute of Technology, Bangalore, India; Department of Computer Science, PES School of Engineering, Bangalore, India; Department of Computer Science, KLE Institute of Technology, Hubli, India
Abstract: Better information retrieval techniques are needed to address the problem of information explosion. A major portion of the data available online is text, which gives rise to a huge feature space; hence, structured organisation and retrieval are very important. Information retrieval in the context of Indian languages is not uncommon, but IR in the South Indian language Kannada is quite new. This work focuses on sentence-level text classification in the Kannada language, which is a fine-grained approach to text classification; here, we look at the suitability of classifiers such as naïve Bayesian, bag of words and support vector machine (SVM) for this task. Dimensionality reduction is carried out using two different approaches, minimum term frequency and stop word removal, and the performance of the above-mentioned classifiers is noted.
Keywords: sentence level classification; Kannada language; text classification; naive Bayes; bag of words; BOW; single label; multi label; SVM models; support vector machines; information retrieval; India; minimum term frequency; stop word removal.
International Journal of Computational Vision and Robotics, 2015 Vol.5 No.3, pp.254 - 270
Received: 19 Jul 2013
Accepted: 10 Jun 2014
Published online: 20 Aug 2015
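
Illustration: the two dimensionality reduction steps named in the abstract (stop word removal and a minimum term frequency cutoff) can be sketched ahead of a sentence-level naïve Bayes classifier as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the whitespace tokenisation, the add-one smoothing, and the toy English sentences (standing in for Kannada text) are all hypothetical choices made for the example.

```python
from collections import Counter
import math


def build_vocabulary(sentences, stop_words=frozenset(), min_term_freq=1):
    """Keep terms that are not stop words and occur at least min_term_freq times."""
    counts = Counter(tok for s in sentences for tok in s.split())
    return sorted(t for t, c in counts.items()
                  if c >= min_term_freq and t not in stop_words)


def train_naive_bayes(sentences, labels, vocab):
    """Multinomial naive Bayes with add-one smoothing, restricted to vocab."""
    vocab_set = set(vocab)
    class_counts = Counter(labels)
    term_counts = {c: Counter() for c in class_counts}
    for sentence, label in zip(sentences, labels):
        term_counts[label].update(t for t in sentence.split() if t in vocab_set)
    model = {}
    for c in class_counts:
        total = sum(term_counts[c].values()) + len(vocab)
        log_prior = math.log(class_counts[c] / len(labels))
        log_like = {t: math.log((term_counts[c][t] + 1) / total) for t in vocab}
        model[c] = (log_prior, log_like)
    return model


def classify(sentence, model):
    """Return the label with the highest log posterior; unknown terms are ignored."""
    scores = {}
    for c, (log_prior, log_like) in model.items():
        scores[c] = log_prior + sum(log_like[t] for t in sentence.split()
                                    if t in log_like)
    return max(scores, key=scores.get)


# Toy data (hypothetical; English placeholders, one label per sentence).
sentences = [
    "the team won the match",
    "the player scored a goal",
    "the bank raised the interest rate",
    "the market rate fell",
]
labels = ["sports", "sports", "finance", "finance"]
stop_words = frozenset({"the", "a"})

vocab = build_vocabulary(sentences, stop_words, min_term_freq=1)
model = train_naive_bayes(sentences, labels, vocab)
prediction = classify("the player won", model)
```

Raising `min_term_freq` shrinks the vocabulary further, trading feature space size against the risk of discarding rare but discriminative terms; on real data the cutoff would need to be tuned against classifier accuracy.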