Authors: Te-Min Chang, Wen-Feng Hsiao
Addresses: Department of Information Management, National Sun Yat-sen University, 70, Lien-hai Road, Kaohsiung 804, Taiwan; Department of Information Management, National Pingtung Institute of Commerce, 51 Min-Sheng E. Road, Pingtung 900, Taiwan
Abstract: This research focuses on developing a hybrid automatic text summarisation approach, KCS, to enhance the quality of summaries. KCS employs the K-mixture probabilistic model to establish term weight distributions in a statistical sense. It further identifies the lexical relations between nouns and nouns, as well as between nouns and verbs, to derive the connective strength (CS) of nouns. Sentences are ranked and extracted according to the accumulated CS values they contain. We conduct two experiments to justify the proposed approach. The results show that the K-mixture model itself is more conducive to document classification than the traditional TFIDF weighting scheme, since the best macro F-measure increases from 0.63 to 0.67. It is, however, still no better than the more complex linguistic-based approach that takes nouns' CS into consideration. Most importantly, our proposed approach, KCS, performs best among all approaches considered (with the best macro F-measure of 0.8). This implies that KCS can extract more representative sentences from the document, and its feasibility in text summarisation applications is thus justified.
Keywords: automatic text summarisation; statistical approach; linguistics; summary quality; probabilistic modelling; representative sentences; nouns; verbs.
International Journal of Intelligent Systems Technologies and Applications, 2011 Vol.10 No.2, pp.111 - 127
Published online: 11 Mar 2011