Title: Double array structures based on byte segmentation for n-gram

Authors: Masao Fuketa; Kazuhiro Morita; Jun-ichi Aoe

Addresses: Department of Information Science and Intelligent Systems, University of Tokushima, Minami Josanjima Tokushima-shi, Tokushima, 770-8506, Japan ' Department of Information Science and Intelligent Systems, University of Tokushima, Minami Josanjima Tokushima-shi, Tokushima, 770-8506, Japan ' Department of Information Science and Intelligent Systems, University of Tokushima, Minami Josanjima Tokushima-shi, Tokushima, 770-8506, Japan

Abstract: Retrieving keywords is used in many applications, and the most demands for retrieving keywords are speed and compactness. The double array is one of the retrieval methods, and combines fast retrieval speed with compactness. In previous researches for the double array, one-by-one-byte character in the key traversed in the trie. Therefore, this paper proposes retrieval algorithms and data structures for the double array method traversed every n byte. Moreover, the proposed method is applied to the compact double array which is the compression method of the double array. From experimental observations, the size of the proposed method becomes 41% compared with that of the original double array and 63% compared with that of the compact double array. The retrieval speed of the proposed method becomes 1.2 times faster than the original double array.

Keywords: trie; n-gram; compression; keyword retrieval; double array structures; byte segmentation; retrieval algorithms; data structures.

DOI: 10.1504/IJCAT.2015.071971

International Journal of Computer Applications in Technology, 2015 Vol.52 No.2/3, pp.110 - 116

Published online: 26 Sep 2015 *

Full-text access for editors Full-text access for subscribers Purchase this article Comment on this article