Title: Design and analysis of genetic algorithm based Chinese keyword extracting

Authors: Kai Gao; Hua-Ping Zhang; Yun-Feng Xu; Guo-Jiang Gao; Yang-Jie Li

Addresses: School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang city, Hebei Province 050051, China (all authors)

Abstract: Effectively analysing and extracting useful knowledge from web data is becoming increasingly important. As weighted keywords can be considered condensed versions of documents, this paper presents a novel Chinese keyword extraction algorithm based on a genetic algorithm, together with paragraph analysis, Chinese segmentation, and synonym and unlisted-term processing. Guided by genetic algorithm training and by term extraction results produced manually by experts, the approach can produce optimised and useful results, especially in some domains. It can be used to train the term weights within the lexicons. The experimental results and their analysis show the feasibility of the approach.

Keywords: text modelling; keyword extraction; unlisted terms; lexicons; genetic algorithms; Chinese keywords; information retrieval.

DOI: 10.1504/IJCAT.2013.055564

International Journal of Computer Applications in Technology, 2013 Vol.48 No.1, pp.27 - 35

Published online: 31 Jul 2013
