Title: Research on double-array-trie tree-based lexicon and its application on micro-blog content analysing

Authors: Kai Gao; Er-Liang Zhou; Dong-Ru Ruan; Radha Ganesan

Addresses: School of Information Science and Engineering, Hebei University of Science and Technology, Hebei 050018, China; School of Information Science and Engineering, Hebei University of Science and Technology, Hebei 050018, China; School of Information Science and Engineering, Hebei University of Science and Technology, Hebei 050018, China; System Engineering and Architecture Team, IBM India Private Limited, No. 22/14, North Gangai Amman Koil 4th Street, Vadapalani Chennai, Tamil Nadu 600008, India

Abstract: This paper presents a novel algorithm for constructing a lexicon based on a double-array trie and applies it to Chinese word segmentation. Compared with traditional hash-based and binary-based algorithms, the proposed approach can improve space utilisation and retrieval efficiency so as to minimise unnecessary comparisons. The paper also presents applications of the approach to micro-blog content analysis and public opinion discovery. The experimental results and the applications show the feasibility of the approach; existing problems and future work are also discussed.
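
For readers unfamiliar with the data structure, the following is a minimal sketch of a generic double-array trie used for forward maximum matching segmentation. It is not the authors' algorithm, which the abstract does not detail. The class name `DoubleArrayTrie`, the helper `segment`, the character-coding scheme, the separate `term` array for word endings, and the sample lexicon are all assumptions made for illustration.

```python
class DoubleArrayTrie:
    """Generic double-array trie (illustrative sketch).

    A transition from state s on character code c leads to
    t = base[s] + c, and is valid only if check[t] == s.
    """

    def __init__(self, words):
        # Assumed coding scheme: each distinct character gets a code >= 1.
        chars = sorted({ch for w in words for ch in w})
        self.code = {ch: i + 1 for i, ch in enumerate(chars)}
        self.base = [0]      # base[s]: offset for the children of state s
        self.check = [0]     # check[t]: parent state of slot t, -1 if free
        self.term = [False]  # term[s]: True if a lexicon word ends at s
        # Build an ordinary nested-dict trie first, then pack it.
        root = {}
        for w in words:
            node = root
            for ch in w:
                node = node.setdefault(ch, {})
            node[""] = True  # end-of-word marker
        self._place(root, 0)

    def _grow(self, size):
        while len(self.base) < size:
            self.base.append(0)
            self.check.append(-1)
            self.term.append(False)

    def _place(self, node, s):
        self.term[s] = "" in node
        kids = [(self.code[ch], child) for ch, child in node.items() if ch != ""]
        if not kids:
            return
        max_code = max(c for c, _ in kids)
        b = 1
        while True:  # first-fit search for an offset with all slots free
            self._grow(b + max_code + 1)
            if all(self.check[b + c] == -1 for c, _ in kids):
                break
            b += 1
        self.base[s] = b
        for c, _ in kids:          # reserve every child slot before recursing
            self.check[b + c] = s
        for c, child in kids:
            self._place(child, b + c)

    def _step(self, s, ch):
        c = self.code.get(ch)
        if c is None:
            return -1
        t = self.base[s] + c
        return t if t < len(self.check) and self.check[t] == s else -1

    def longest_match(self, text, start):
        """Return the end index of the longest lexicon word at text[start:]."""
        s, end = 0, start
        for i in range(start, len(text)):
            s = self._step(s, text[i])
            if s < 0:
                break
            if self.term[s]:
                end = i + 1
        return end


def segment(trie, text):
    """Forward maximum matching; unknown characters become single tokens."""
    out, i = [], 0
    while i < len(text):
        j = trie.longest_match(text, i)
        if j == i:
            j = i + 1
        out.append(text[i:j])
        i = j
    return out


lexicon = ["微博", "内容", "分析", "内容分析", "舆情"]  # hypothetical sample lexicon
trie = DoubleArrayTrie(lexicon)
print(segment(trie, "微博内容分析"))
```

Each lookup step costs one addition and one array comparison, which is the usual motivation for preferring this layout over hashing or binary search within a node. The first-fit offset search used here is the simplest packing strategy and is not tuned for build speed.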

Keywords: segmentation; double-array-trie tree; microblogs; tree-based lexicon; blog content analysis; space utilisation; retrieval efficiency; public opinion discovery.

DOI: 10.1504/IJCAT.2015.073594

International Journal of Computer Applications in Technology, 2015 Vol.52 No.4, pp.277 - 284

Published online: 13 Dec 2015
