Title: A classification-based summarisation model for summarising text documents

Authors: M. Esther Hannah; Saswati Mukherjee

Addresses: Department of Computer Applications, St. Joseph's College of Engineering, Sholinganallur, Chennai, TN, India; Department of Information Science and Technology, College of Engineering, Anna University, Guindy Campus, Chennai, TN, India

Abstract: The work presents CBS, a classification-based summarisation model that performs automatic text summarisation through classification. Summarisation systems are much needed, since the web is overloaded with information and extracting the informative sentences from a document is essential. We used 60% of the documents in the DUC 2002 corpus for training the CBS model and the remaining 40% for testing it. Summaries were generated at lengths of 10%, 20%, and 30% of the total number of sentences in the original input document and then evaluated. The CBS model is evaluated using two metrics: 1) compression ratios; 2) ROUGE metrics. The test results indicate that the CBS model performs optimally relative to the other systems, and hence offers a novel solution to text summarisation.
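
The evaluation setup described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the random shuffle with a fixed seed, the round-half-up rule for the summary length, and the whitespace tokenisation are all assumptions made for illustration, and `rouge1_recall` is a simplified unigram-recall stand-in for the ROUGE metrics, not the official ROUGE toolkit.

```python
import random
from collections import Counter


def split_documents(doc_ids, train_percent=60, seed=0):
    """Shuffle document ids and split them into (training, test) lists.

    Assumption: the split is random; the abstract only states 60% / 40%.
    """
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)
    cut = len(ids) * train_percent // 100
    return ids[:cut], ids[cut:]


def summary_length(num_sentences, percent):
    """Number of sentences to keep for a summary of `percent` % length.

    Assumption: round half up, and always keep at least one sentence.
    """
    return max(1, (num_sentences * percent + 50) // 100)


def rouge1_recall(candidate, reference):
    """Simplified ROUGE-1 recall: clipped unigram overlap / reference length."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / sum(ref.values())


if __name__ == "__main__":
    train, test = split_documents(range(10))
    print(len(train), len(test))
    for percent in (10, 20, 30):
        print(percent, summary_length(20, percent))
    print(rouge1_recall("the cat sat", "the cat sat on the mat"))
```

Integer percentages are used instead of floating-point ratios so that the summary length is computed exactly, without rounding surprises.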

Keywords: training; classification; feature extraction; summarisation models; decision trees; document summaries; text documents; modelling; text summarisation.

DOI: 10.1504/IJICT.2014.063217

International Journal of Information and Communication Technology, 2014 Vol.6 No.3/4, pp.292 - 308

Received: 05 Mar 2013
Accepted: 24 Jun 2013

Published online: 26 Jul 2014
