Title: The impact of feature selection on text summarisation
Authors: R. Jayashree; K. Srikanta Murthy; Basavaraj S. Anami; Alex Pappachen James
Addresses: Department of Computer Science, PES Institute of Technology, Bangalore, India; Department of Computer Science, PES School of Engineering, Bangalore, India; Department of Computer Science, KLE Institute of Technology, Hubli, India; Department of Electrical and Electronic Engineering, Griffith University, Kazakhstan
Abstract: The applicability of feature selection methods to text document summarisation is a relatively unexplored topic in information retrieval. Because feature selection techniques can identify the key features within a text document, they could help produce better summaries. In this paper, we put this premise to the test by treating feature selection as an essential preprocessing step for text document summarisation. We explore several feature selection methods and their role in text document summarisation. The corpus used is the Technology Development for Indian Languages (TDIL) corpus, which consists of 483 documents belonging to four categories: aesthetics, commerce, social sciences and natural sciences.
Keywords: feature selection; text summarisation; text document summaries; feature extraction; rank; score; term frequency; Galavotti-Sebastiani-Simi; GSS; inverse document frequency; IDF; word occurrence count.
DOI: 10.1504/IJAPR.2014.068344
International Journal of Applied Pattern Recognition, 2014 Vol.1 No.4, pp.377 - 400
Received: 31 Dec 2013
Accepted: 15 May 2014
Published online: 10 Apr 2015