Title: A fast search method of similar strings from dictionaries

Authors: Masao Fuketa, El-Sayed Atlam, Nobuo Fujisawa, Hiroshi Hanafusa, Kazuhiro Morita, Jun-ichi Aoe

Addresses: Department of Information Science and Intelligent Systems, University of Tokushima, Tokushima 770-8506, Japan (all authors)

Abstract: The World Wide Web is growing ever more rapidly, making a wealth of information available. At the same time, demand for retrieving strings similar to an input string from dictionaries has been increasing. Edit distance is needed to retrieve information from large amounts of data based on the similarity between two strings. However, a drawback of this method is its time cost, because the input string must be compared with every string in the dictionaries. This study proposes a new technique for retrieving similar strings from dictionaries at high speed. The presented method can retrieve all similar strings 14 times faster than unigram methods, even when the edit distance is 3.
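
To make the cost described in the abstract concrete, the following is a minimal sketch of the exhaustive baseline, in which the input string is compared with every dictionary entry by edit distance. It is not the technique proposed in the paper, which the abstract does not detail. The function names `edit_distance` and `similar_strings`, the parameter `max_distance`, and the sample words are assumptions made for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with the standard two-row dynamic program."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def similar_strings(query: str, dictionary: list[str], max_distance: int) -> list[str]:
    """Exhaustive baseline (hypothetical helper): scan every dictionary entry."""
    return [word for word in dictionary if edit_distance(query, word) <= max_distance]


if __name__ == "__main__":
    words = ["sitting", "kitchen", "mitten", "dog"]  # illustrative data only
    print(similar_strings("kitten", words, 2))
```

Each query in this sketch costs one full dynamic-programming comparison per dictionary entry, which is the time consumption the abstract identifies as the drawback of the plain approach.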

Keywords: edit distance; similar strings; N-gram; substrings; dictionaries; information retrieval.

DOI: 10.1504/IJCAT.2011.041655

International Journal of Computer Applications in Technology, 2011 Vol. 40, No. 4, pp. 265-272

Published online: 28 Jul 2011
