Title: A review of recent advances in text mining of Indian languages

Authors: Prabin Kumar Panigrahi; Nishikant Bele

Addresses: Indian Institute of Management Indore, Indore, 453556, Madhya Pradesh, India; ITS Institute of Management, Greater Noida 201308, Uttar Pradesh, India

Abstract: Text mining of English-language text has been researched extensively in the past, and a significant body of resources, tools and techniques has been developed. India is a country of high linguistic diversity, and a large amount of textual data is available in Indian languages. Knowledge can be discovered from this text by applying text-mining techniques. However, because of the characteristics of Indian languages, the tools, techniques and resources available for mining English text cannot be applied directly to text in Indian languages. We could not find any comprehensive literature describing research on mining text written in Indian languages. In this paper, we review the research work done so far, the availability of language resources, and the various challenges of text-mining tasks in Indian languages.

Keywords: text mining; Indian languages; language corpora; feature extraction; language resources; classification; sentiment analysis; natural language processing; NLP; Hindi; India; Indian texts.

DOI: 10.1504/IJBIS.2016.078905

International Journal of Business Information Systems, 2016 Vol.23 No.2, pp.175 - 193

Received: 07 Feb 2015
Accepted: 26 Feb 2015

Published online: 05 Sep 2016
