Title: Automatic Punjabi poetry classification using machine learning algorithms with reduced feature set

Authors: Jasleen Kaur; Jatinderkumar R. Saini

Addresses: Shroff S.R. Rotary Institute of Chemical Technology, Ankleshwar, Gujarat, India; Uka Tarsadia University, Bardoli, Gujarat, India | Narmada College of Computer Application, Bharuch, Gujarat, India; Uka Tarsadia University, Bardoli, Gujarat, India

Abstract: With the advent of Unicode encoding, content in Indian dialects is steadily growing on the internet. The body of literary writing in the Punjabi language available on the web, particularly poetry, is growing day by day. Classifying poems by topic is therefore an important task, and a challenging one from a computational linguistics point of view. A corpus of 240 poems in four categories was collected manually and passed to a pre-processing phase. Gain ratio was used to rank features. K-nearest neighbour (kNN), Naïve Bayes (NB), support vector machine (SVM) and hyperpipes (HP) classifiers were trained and tested. The results indicate that Naïve Bayes outperformed all other classifiers when using the top-ranked 60% of features, and that hyperpipes was the least efficient classifier. The results also show a 15% increase in accuracy when gain ratio is used as the feature selection technique.
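
Illustrative sketch: the feature-ranking step named in the abstract can be pictured with a small Python example. This is not the authors' implementation. It assumes discrete (here binary term-presence) features, uses the standard definition of gain ratio (information gain divided by split information), and runs on a made-up toy matrix. The function names `entropy`, `gain_ratio` and `top_features`, and the toy data, are hypothetical; only the 60% cut-off comes from the abstract.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def gain_ratio(feature, labels):
    """Gain ratio of one discrete feature with respect to the class labels."""
    total = len(labels)
    # Conditional entropy H(labels | feature): weighted entropy of each partition.
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    info_gain = entropy(labels) - conditional
    split_info = entropy(feature)
    # A feature that takes a single value has no split information; score it 0.
    return info_gain / split_info if split_info > 0 else 0.0


def top_features(rows, labels, fraction):
    """Rank feature columns by gain ratio and keep the top `fraction` of them."""
    n_features = len(rows[0])
    scores = [gain_ratio([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    keep = max(1, round(fraction * n_features))
    return ranked[:keep], scores


# Toy data (assumed): 4 documents, 3 binary term-presence features, 2 classes.
rows = [
    [1, 1, 1],
    [1, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
]
labels = ["a", "a", "b", "b"]

kept, scores = top_features(rows, labels, fraction=0.6)
```

In this toy matrix the first column separates the two classes perfectly, so it ranks first. The second column is independent of the class and the third is constant, so both score zero. The reduced feature columns in `kept` would then be passed to whichever classifier is being evaluated.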

Keywords: classification; gain ratio; Punjabi poetry; naive Bayes; support vector machines; SVM; hyperpipes; poem classification; machine learning; reduced feature sets; Indian dialects; India; poems; K-nearest neighbour; kNN; feature selection.

DOI: 10.1504/IJAISC.2016.081353

International Journal of Artificial Intelligence and Soft Computing, 2016 Vol.5 No.4, pp.311 - 319

Received: 05 Jan 2016
Accepted: 24 Aug 2016

Published online: 05 Jan 2017
