Title: Modelling on web summarisation based on structure analysis and vectorisation similarity

Authors: Kai Gao; Hong-xia Ma; Radha Ganesan

Addresses: School of Information Science and Engineering, Hebei University of Science and Technology, Hebei 050018, China; School of Information Science and Engineering, Hebei University of Science and Technology, Hebei 050018, China; System Engineering and Architecture Team, IBM India Private Limited, No. 22/14, North Gangai Amman Koil 4th Street, Vadapalani, Chennai, Tamil Nadu, 600026, India

Abstract: With the rapid development of the internet, extracting useful information has become increasingly important. Web summarisation, the task of abstracting key content from large volumes of web data, has become an integral part of search engines and digital libraries. Because weighted keywords can be regarded as a condensed version of the content, this paper proposes a statistics-based summarisation approach that combines structure analysis with keyword vectorisation similarity. The approach uses a structure vector space model, candidate selection and summary generation. Experimental results show that the approach is feasible. Open problems and directions for further work are presented at the end of the paper.
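
The abstract does not give the weighting scheme or the details of the structure vector space model, so the following Python sketch is only an illustration of the general idea of keyword vectorisation similarity, not the authors' method. It assumes raw term frequency as the keyword weight and cosine similarity as the similarity measure; the function names `tokenize`, `cosine` and `summarise` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphabetic tokens; no stop-word removal or stemming.
    return re.findall(r"[a-z]+", text.lower())


def cosine(u, v):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def summarise(sentences, n=1):
    # Assumption: the document keyword vector is the term frequency
    # over all sentences (a stand-in for the paper's weighted keywords).
    keyword_vec = Counter(t for s in sentences for t in tokenize(s))
    scored = [
        (cosine(Counter(tokenize(s)), keyword_vec), i)
        for i, s in enumerate(sentences)
    ]
    # Keep the n most similar candidates, then restore document order.
    top = sorted(scored, key=lambda p: (-p[0], p[1]))[:n]
    return [sentences[i] for _, i in sorted(top, key=lambda p: p[1])]


if __name__ == "__main__":
    page = [
        "web summarisation extracts key web content",
        "the weather is nice",
        "search engines use web summarisation",
    ]
    print(summarise(page, n=1))
```

A structure-aware variant could multiply each term weight by a factor that depends on where the term appears in the page (for example, title versus body), but any such factors would be further assumptions beyond what the abstract states.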

Keywords: modelling; web summarisation; document layer model; DLM; vectorisation similarity; internet; information extraction; information retrieval; search engines; digital libraries; weighted keywords; web pages; structure analysis.

DOI: 10.1504/IJMIC.2013.057570

International Journal of Modelling, Identification and Control, 2013 Vol. 20 No. 4, pp. 368-378

Published online: 27 Sep 2014
