Authors: Jasleen Kaur; Jatinderkumar R. Saini
Addresses: Shroff S.R. Rotary Institute of Chemical Technology, Block No. 402, Ankleshwar, Valia Road, Vataria, Gujarat, India; Uka Tarsadia University, Maliba Campus, Bardoli-Mahuva Road, Tarsadi, Bardoli, Gujarat, India ' Narmada College of Computer Application, Bharuch, Gujarat, India; Uka Tarsadia University, Maliba Campus, Bardoli-Mahuva Road, Tarsadi, Bardoli, Gujarat, India
Abstract: Automatic classification of poetic content is very challenging from a computational linguistics point of view. For a library suggestion framework, poems can be grouped along different dimensions, for example, poet, era, assumptions, and topic. In this work, a content-based Punjabi poetry classifier was built using the Weka toolset. Four classes were manually populated with 2,034 poems. The NAFE, LIPA, RORE, and PHSP classes comprise 505, 399, 529, and 601 poems, respectively. These poems were passed through several pre-processing sub-stages: tokenisation, noise removal, stop word removal, and special symbol removal. A total of 31,938 tokens were extracted after the pre-processing layer and weighted using the term frequency (TF) and term frequency-inverse document frequency (TF-IDF) weighting schemes. Based on the poetic elements of poetry, two poetic features (orthographic and phonemic) were tested in building a classifier with machine learning algorithms. Naive Bayes, support vector machine (SVM), hyper pipes, and K-nearest neighbour algorithms were evaluated with the two poetic features. The results revealed that the addition of poetic features does not boost the performance of the Punjabi poetry classification task. With poetic features, the best-performing algorithm is SVM, and the highest accuracy (71.98%) is achieved with orthographic features.
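The pre-processing and weighting stages named in the abstract can be illustrated with a minimal sketch. It is written in plain Python and is not the authors' Weka configuration. The stop-word set, the function names, and the IDF variant (log of the number of documents over document frequency) are assumptions made for illustration only; the abstract does not specify them.

```python
import math
import unicodedata
from collections import Counter

# Hypothetical stop-word list; the list used in the paper is not given here.
STOP_WORDS = {"the", "a", "of"}


def preprocess(text):
    """Tokenise on whitespace after replacing punctuation, symbols and
    digits (Unicode categories P*, S*, N*) with spaces, then drop stop words."""
    cleaned = "".join(
        " " if unicodedata.category(ch)[0] in "PSN" else ch for ch in text
    )
    return [t for t in cleaned.lower().split() if t not in STOP_WORDS]


def term_frequency(tokens):
    """TF weighting: raw count of each term in one document."""
    return Counter(tokens)


def tf_idf(docs_tokens):
    """TF-IDF weighting with idf = log(N / df), one common variant (assumed)."""
    n_docs = len(docs_tokens)
    doc_freq = Counter()
    for tokens in docs_tokens:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs_tokens:
        tf = term_frequency(tokens)
        weights.append(
            {term: count * math.log(n_docs / doc_freq[term])
             for term, count in tf.items()}
        )
    return weights


if __name__ == "__main__":
    # Toy English documents standing in for poems.
    poems = ["The river of night, the river!", "A night of stars 123"]
    tokenised = [preprocess(p) for p in poems]
    for doc_weights in tf_idf(tokenised):
        print(doc_weights)
```

Under this IDF variant, a term that occurs in every document receives weight zero, so TF-IDF favours terms that help separate one class of poems from another, whereas plain TF keeps the raw counts.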
Keywords: classification; computational; poetic; linguistic; orthographic; phonemic.
International Journal of Computational Intelligence Studies, 2018 Vol.7 No.2, pp.124 - 137
Received: 16 Aug 2017
Accepted: 08 Dec 2017
Published online: 13 Sep 2018