Title: Document summarisation using combination and reduction of extracted sentences

Authors: Gautam Kumar Parai, Tejaswi Tenneti, Pranip Kumar Borah, Saurav Shah, Sudip Sanyal

Addresses: Indian Institute of Information Technology Allahabad, Deoghat, Jhalwa, Allahabad, 211012, India (all authors)

Abstract: An ideal summariser should produce a summary that contains all the crucial information in the original text while staying within the required summary size. We propose a novel method for single-document summarisation of English text. We start by combining semantically related sentences using a rule-based approach, to avoid losing important information and to maintain coherence in the resulting summary. The rules for combination rely on surface indicators present in the sentences, i.e., cue phrases. We then extract important text from the combined sentences using lexical chains. This is followed by a sentence-reduction step that removes superfluous phrases from the extracted sentences, again using a rule-based approach. The rules for sentence pruning use discourse trees generated from intra-sentential rhetorical relations. Human subjects judged the summaries produced by our system to be more concise and coherent than extraction-based summaries of the same documents.
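
The first two stages described in the abstract (cue-phrase-based combination, then extraction) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cue-phrase list, the stopword list, and all function names are hypothetical, and a simple word-frequency score stands in for lexical chains, which would need a lexical resource such as a thesaurus. The discourse-tree-based sentence-reduction step is not covered.

```python
import re
from collections import Counter

# Hypothetical cue phrases; the paper's actual rule set is not given in the abstract.
CUE_PHRASES = ("however", "therefore", "moreover", "for example", "in addition")

# Small illustrative stopword list (an assumption, not taken from the paper).
STOPWORDS = {"a", "an", "the", "of", "to", "and", "is", "are", "in", "at", "it",
             "however", "therefore", "moreover"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def combine_by_cue_phrases(sentences):
    """Attach a sentence to its predecessor when it opens with a cue phrase."""
    units = []
    for sentence in sentences:
        if units and sentence.lower().startswith(CUE_PHRASES):
            units[-1] = units[-1] + " " + sentence
        else:
            units.append(sentence)
    return units


def content_words(unit):
    """Lowercased alphabetic tokens with stopwords removed."""
    return [w for w in re.findall(r"[a-z]+", unit.lower()) if w not in STOPWORDS]


def score_units(units):
    """Frequency-based proxy for lexical-chain strength (a simplification)."""
    freq = Counter(w for unit in units for w in content_words(unit))
    return [sum(freq[w] for w in set(content_words(unit))) for unit in units]


def summarise(text, max_units=1):
    """Combine related sentences, then keep the top-scoring units in document order."""
    units = combine_by_cue_phrases(split_sentences(text))
    scores = score_units(units)
    ranked = sorted(range(len(units)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:max_units])
    return " ".join(units[i] for i in keep)


if __name__ == "__main__":
    sample = "Cats sleep a lot. However, cats hunt at night. Dogs bark."
    print(summarise(sample, max_units=1))
```

Combining before extraction means a sentence that opens with a cue phrase such as "However" is never selected without the sentence it depends on, which is the coherence argument the abstract makes.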

Keywords: document summaries; cue phrases; lexical chains; discourse structure; rhetorical relations; sentence combination; sentence extraction; sentence reduction; reasoning-based intelligent systems; English text.

DOI: 10.1504/IJRIS.2009.028018

International Journal of Reasoning-based Intelligent Systems, 2009 Vol.1 No.3/4, pp.191 - 199

Published online: 27 Aug 2009