Authors: M. Priya; R. Kalpana
Addresses: Bharathiyar College of Engineering and Technology, Karaikal, 609609, India; Pondicherry Engineering College, Puducherry, 605014, India
Abstract: Many problems in natural language processing, data mining, information retrieval and bioinformatics can be formalised as string transformation. In string transformation, given an input string, the system generates the k most likely output strings corresponding to that input. Existing rule-based methods for approximate keyword search use two processes, learning and generation, which improve both the accuracy and the efficiency of search, though not to the expected level. A new genetic-algorithm-based approach is introduced to generate rules. The generated rules are then learned by applying a maximum-a-likelihood function, which selects the best rules and produces a rule dictionary. The query keyword is searched in the database by constructing a tree-based index, the Aho-Corasick tree, and performing pattern matching against the rule dictionary, so that the relevant documents are retrieved even when the keyword contains misspellings. Experimental results show improvements in both accuracy and efficiency compared with existing methods.
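The abstract states that an Aho-Corasick tree is used to match a query against the rule dictionary and that the system returns the k most likely output strings. The sketch below illustrates that idea under loudly stated assumptions: the rule dictionary is modelled as a hypothetical mapping from a source substring to `(replacement, probability)` pairs, each candidate comes from applying one matching rule once, and candidates are ranked by log-probability. The genetic-algorithm rule generation and the maximum-a-likelihood learning step are not shown, and the names `AhoCorasick` and `top_k_corrections` are illustrative only, not the authors' implementation.

```python
from collections import deque
import heapq
import math


class AhoCorasick:
    """Minimal Aho-Corasick automaton over a set of pattern strings."""

    def __init__(self, patterns):
        self.goto = [{}]   # goto[state][char] -> next state
        self.fail = [0]    # failure link of each state
        self.out = [[]]    # patterns recognised on reaching each state
        for p in patterns:
            self._add(p)
        self._build_failure_links()

    def _add(self, pattern):
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self):
        # Breadth-first order guarantees a state's failure target is finished first.
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def find_all(self, text):
        """Yield (start_index, pattern) for every pattern occurrence in text."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for p in self.out[state]:
                yield i - len(p) + 1, p


def top_k_corrections(query, rules, k):
    """Apply each matching rule once and return the k highest-scoring outputs.

    rules: hypothetical rule dictionary {source: [(replacement, probability), ...]}.
    Returns a list of (candidate, log_probability), best first.
    """
    automaton = AhoCorasick(rules.keys())
    scores = {}
    for start, lhs in automaton.find_all(query):
        for rhs, prob in rules[lhs]:
            candidate = query[:start] + rhs + query[start + len(lhs):]
            score = math.log(prob)
            # Keep the best score when several rules yield the same string.
            if score > scores.get(candidate, -math.inf):
                scores[candidate] = score
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


if __name__ == "__main__":
    rules = {"ie": [("ei", 0.7)], "c": [("s", 0.1)], "cie": [("cei", 0.8)]}
    print(top_k_corrections("recieve", rules, 2))
```

With the toy rules above, both the `cie` and `ie` rules rewrite "recieve" to "receive", and the higher-probability rule sets its score, so "receive" ranks ahead of "resieve". The automaton scans the query once regardless of how many rule patterns the dictionary holds, which is the usual reason for choosing Aho-Corasick for multi-pattern matching.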
Keywords: bigram dice coefficient; rule dictionary; divide-and-conquer; error correction; maximum-a-likelihood.
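The keywords list the bigram Dice coefficient, a standard string-similarity measure defined as twice the number of shared character bigrams divided by the total number of bigrams in both strings. The record does not say exactly how the measure is applied, so the following is only a generic sketch of the definition; counting bigrams as a multiset (rather than a set) and returning 1.0 for two identical strings too short to have bigrams are assumptions made here, and `dice_coefficient` is an illustrative name.

```python
from collections import Counter


def bigrams(s):
    """Return the multiset of adjacent character pairs in s."""
    return Counter(s[i:i + 2] for i in range(len(s) - 1))


def dice_coefficient(a, b):
    """Bigram Dice coefficient: 2 * |shared bigrams| / (|bigrams(a)| + |bigrams(b)|)."""
    ba, bb = bigrams(a), bigrams(b)
    total = sum(ba.values()) + sum(bb.values())
    if total == 0:
        # Neither string has a bigram (length <= 1); fall back to exact equality.
        return 1.0 if a == b else 0.0
    shared = sum((ba & bb).values())  # multiset intersection keeps minimum counts
    return 2 * shared / total


if __name__ == "__main__":
    print(dice_coefficient("night", "nacht"))
```

For "night" and "nacht" the bigram multisets {ni, ig, gh, ht} and {na, ac, ch, ht} share only "ht", giving 2 × 1 / 8 = 0.25.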
International Journal of Advanced Intelligence Paradigms, 2023 Vol.25 No.3/4, pp.374 - 381
Received: 13 Sep 2017
Accepted: 10 Mar 2018
Published online: 19 Jul 2023