Authors: Prabin Kumar Panigrahi; Nishikant Bele
Addresses: Indian Institute of Management Indore, Indore, 453556, Madhya Pradesh, India; ITS Institute of Management, Greater Noida 201308, Uttar Pradesh, India
Abstract: Text mining in English has been researched extensively, and a significant number of resources, tools and techniques have been developed for it. India is a country of high linguistic diversity, and a large amount of textual data is available in Indian languages. Knowledge can be discovered from this text by applying text-mining techniques. However, because of the characteristics of Indian languages, the tools, techniques and resources available for mining English text cannot be applied directly to text in Indian languages. We could not find any comprehensive literature describing research on mining text written in Indian languages. In this paper, we review the research done so far, the availability of language resources and the various challenges of text mining tasks in Indian languages.
Keywords: text mining; Indian languages; language corpora; feature extraction; language resources; classification; sentiment analysis; natural language processing; NLP; Hindi; India; Indian texts.
International Journal of Business Information Systems, 2016 Vol.23 No.2, pp.175 - 193
Received: 07 Feb 2015
Accepted: 26 Feb 2015
Published online: 05 Sep 2016