Title: Web document summarisation using pointwise mutual information (PMI) from web resources
Authors: Atul Kumar Srivastava; Dhiraj Pandey; Alok Aggarwal; Sunil Gupta
Addresses: APJ Abdul Kalam Technical University, Uttar Pradesh, Lucknow, 226031, India; JSS Academy of Technical Education Noida, Noida, 201301, India; University of Petroleum and Energy Studies, Dehradun, 248007, India; University of Petroleum and Energy Studies, Dehradun, 248007, India
Abstract: Large amounts of data are now generated over the internet, and it is impractical for humans to summarise such volumes of text manually. Automatic text summarisation systems are deployed to deal with this challenge. Text summarisation is the field of data mining that identifies the most relevant text in a document. In this paper, we propose a web-based text summarisation approach that generates a good-quality summary based on the total pointwise mutual information (TPMI) scores of the sentences. A sample document from the DUC dataset is pre-processed by tokenisation, stop-word removal and stemming. Based on the extracted words, the TPMI is estimated by calculating the pointwise mutual information (PMI) of the occurrences of words on a web search engine. To provide evidence for the robustness of the proposed system, the approach is compared with well-known text summarisation techniques based on sentence length and mean score. The results show that our method outperforms the other techniques, achieving the closest mean score and generating good-quality summaries on sentences of different lengths.
Keywords: document summarisation; text summarisation; PMI; point-wise mutual information.
DOI: 10.1504/IJSSE.2022.127991
International Journal of System of Systems Engineering, 2022 Vol.12 No.4, pp.329 - 353
Received: 18 Aug 2021
Accepted: 25 Oct 2021
Published online: 03 Jan 2023
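
Illustrative sketch: The abstract describes scoring sentences by TPMI, estimated from the PMI of word occurrences on a web search engine, but it does not give the formulas. The Python sketch below shows one plausible reading and is not the authors' implementation. It assumes the standard definition PMI(x, y) = log2(p(x, y) / (p(x) p(y))), with probabilities estimated as hit counts divided by a total page count, and it assumes that a sentence's TPMI is the sum of PMI over all distinct word pairs in the sentence. The hit counts, the TOTAL_PAGES value, the stop-word list and the function names are all hypothetical; stemming is omitted for brevity, and a real system would query a search engine instead of a fixed table.

```python
import math
from itertools import combinations

# Hypothetical stand-in for search-engine hit counts (made-up numbers).
TOTAL_PAGES = 1_000_000
HITS = {
    frozenset(["storm"]): 1000,
    frozenset(["flood"]): 2000,
    frozenset(["damage"]): 4000,
    frozenset(["storm", "flood"]): 64,
    frozenset(["storm", "damage"]): 32,
    frozenset(["flood", "damage"]): 16,
}

# Tiny illustrative stop-word list (assumption, not from the paper).
STOP_WORDS = {"the", "a", "and", "of", "caused"}


def preprocess(sentence):
    """Lowercase, split on whitespace, and drop stop words (no stemming here)."""
    return [w for w in sentence.lower().split() if w not in STOP_WORDS]


def pmi(x, y):
    """PMI from hit counts: log2( hits(x,y) * N / (hits(x) * hits(y)) )."""
    joint = HITS.get(frozenset([x, y]), 0)
    hits_x = HITS.get(frozenset([x]), 0)
    hits_y = HITS.get(frozenset([y]), 0)
    if joint == 0 or hits_x == 0 or hits_y == 0:
        return 0.0  # assumption: unseen words or pairs contribute nothing
    return math.log2(joint * TOTAL_PAGES / (hits_x * hits_y))


def tpmi(sentence):
    """Assumed TPMI: sum of PMI over all distinct word pairs in the sentence."""
    words = sorted(set(preprocess(sentence)))
    return sum(pmi(x, y) for x, y in combinations(words, 2))


def summarise(sentences, k=1):
    """Keep the k highest-scoring sentences, in their original order."""
    top = sorted(sentences, key=tpmi, reverse=True)[:k]
    return [s for s in sentences if s in top]


if __name__ == "__main__":
    doc = ["The storm caused flood and damage", "The flood caused damage"]
    for s in doc:
        print(f"{tpmi(s):.2f}  {s}")
    print(summarise(doc, k=1))
```

Under these assumptions, sentences whose words co-occur on the web more often than chance receive higher scores, so they are preferred for the summary. Summing over pairs favours longer sentences; a length-normalised variant would be an equally reasonable choice.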