Title: Hybrid methodologies for summarisation of Kannada language text documents

Authors: R. Jayashree; K. Srikanta Murthy; Basavaraj S. Anami

Addresses: Department of Computer Science, PES Institute of Technology, Bangalore, India; Department of Computer Science, PES Institute of Technology, Bangalore, India; Department of Computer Science, KLE Institute of Technology, Hubli, India

Abstract: The problem of information explosion is becoming a serious concern. In this regard, any new methodology developed to solve issues related to information retrieval (IR) draws wide attention. Text summarisation is a prominent field of NLP that may provide a promising solution to these issues. Text summarisation, or text document summarisation, conveys the meaning of a document quickly and concisely, without the reader having to read the whole document. In this work, we have developed hybrid methodologies for summarising a given document in the Kannada language. The approach is new in that we use a combination of feature selection methods as a pre-processing step for summarisation. We have devised four different methodologies for text document summarisation, all of which focus on text extraction, which, as stated earlier, is an open approach: (a) summarisation based on keywords, (b) summarisation based on sentence ranking, (c) summarisation based on Jaccard's similarity score and (d) summarisation based on a neural network approach.
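
The abstract does not say how the Jaccard similarity score is turned into a summary, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes each sentence is scored by its mean Jaccard similarity to every other sentence and that the top-scoring sentences are extracted in their original order. The function names (tokens, jaccard, summarise) and the whitespace-based tokenisation are hypothetical choices made for this example; real Kannada text would likely need language-specific tokenisation and stemming.

```python
import string


def tokens(sentence):
    """Split on whitespace and strip ASCII punctuation (a crude, assumed tokeniser)."""
    words = (w.strip(string.punctuation).lower() for w in sentence.split())
    return {w for w in words if w}


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B| of two sets; 0.0 if both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def summarise(sentences, k=2):
    """Return the k sentences with the highest mean Jaccard similarity
    to the other sentences, kept in their original document order."""
    token_sets = [tokens(s) for s in sentences]
    scores = []
    for i, ti in enumerate(token_sets):
        others = [jaccard(ti, tj) for j, tj in enumerate(token_sets) if j != i]
        scores.append(sum(others) / len(others) if others else 0.0)
    # Rank sentence indices by score, keep the best k, then restore document order.
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = ["the cat sat", "the cat ran", "dogs bark loudly"]
    print(summarise(doc, k=2))
```

Scoring by mean similarity favours sentences that share vocabulary with much of the document, which is one common way to approximate centrality in extractive summarisation; other uses of the Jaccard score (for example, similarity to a title or to a keyword set) would be equally consistent with the abstract.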

Keywords: document summaries; text extraction; feature selection; keywords; sentence ranking; similarity scores; GSS; IDF; chi-square; neural networks; Jaccard similarity; thematic words; summarisation; Kannada language text; information retrieval.

DOI: 10.1504/IJKEDM.2014.066238

International Journal of Knowledge Engineering and Data Mining, 2014 Vol.3 No.1, pp.82 - 114

Received: 10 Feb 2014
Accepted: 01 Sep 2014

Published online: 08 Dec 2014
