Title: The impact of feature selection on text summarisation

Authors: R. Jayashree; K. Srikanta Murthy; Basavaraj S. Anami; Alex Pappachen James

Addresses: Department of Computer Science, PES Institute of Technology, Bangalore, India; Department of Computer Science, PES School of Engineering, Bangalore, India; Department of Computer Science, KLE Institute of Technology, Hubli, India; Department of Electrical and Electronic Engineering, Griffith University, Kazakhstan

Abstract: The applicability of feature selection methods to text document summarisation is a relatively unexplored topic in information retrieval. The ability of feature selection techniques to identify key features within a text document could produce better summaries. In this paper, we put this premise to the test by treating feature selection as an essential preprocessing step for text document summarisation. We explore several feature selection methods and their role in text document summarisation. The corpus used is the Technology Development for Indian Languages (TDIL) corpus, which consists of 483 documents belonging to four categories: aesthetics, commerce, social sciences and natural sciences.

Keywords: feature selection; text summarisation; text document summaries; feature extraction; rank; score; term frequency; Galavotti-Sebastiani-Simi; GSS; inverse document frequency; IDF; word occurrence count.

DOI: 10.1504/IJAPR.2014.068344

International Journal of Applied Pattern Recognition, 2014, Vol. 1, No. 4, pp. 377-400

Received: 31 Dec 2013
Accepted: 15 May 2014

Published online: 10 Apr 2015
