Intelligent typo correction for text mining through machine learning
by Yinghao Huang; Yi Lu Murphey; Yao Ge
International Journal of Knowledge Engineering and Data Mining (IJKEDM), Vol. 3, No. 2, 2015

Abstract: Typo detection and correction is an important process in many text mining applications. This research focuses on automatic typo detection and correction for text documents that are unstructured, contain many grammar and spelling errors, and use many self-invented terms that can be interpreted only through domain-specific knowledge. In this paper we present an intelligent typo detection and correction (ITDC) system. Its 'intelligence' lies in automatically identifying and accurately correcting a broad range of typos, from simple typos such as character duplication, omission, transposition and substitution, to complex spelling errors such as word boundary errors and unconventional use of acronyms. ITDC utilises general language knowledge and domain-specific knowledge extracted by machine learning algorithms. It is evaluated through a case study that involves the automatic processing of automotive fault diagnostic text documents. The experimental results show that the proposed system outperforms some state-of-the-art spell-checking systems.
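
The four simple typo classes named in the abstract (duplication, omission, transposition, substitution) can each be read as a single character-level edit. The sketch below is only an illustration of that reading, not the ITDC system described in the paper, whose algorithms are not given here. The function name `classify_simple_typo`, the class labels and the sample words are assumptions made for this example.

```python
from typing import Optional


def classify_simple_typo(typo: str, intended: str) -> Optional[str]:
    """Label the single-character edit that turns `intended` into `typo`.

    Hypothetical helper for illustration only; returns None when the pair
    is identical or is not explained by one of the four simple classes.
    """
    if typo == intended:
        return None
    n, m = len(typo), len(intended)

    # Duplication: one extra character that repeats its left neighbour.
    if n == m + 1:
        for i in range(1, n):
            if typo[i] == typo[i - 1] and typo[:i] + typo[i + 1:] == intended:
                return "duplication"
        return None

    # Omission: exactly one character of the intended word is missing.
    if n == m - 1:
        for i in range(m):
            if intended[:i] + intended[i + 1:] == typo:
                return "omission"
        return None

    if n == m:
        diffs = [i for i in range(n) if typo[i] != intended[i]]
        # Substitution: exactly one position differs.
        if len(diffs) == 1:
            return "substitution"
        # Transposition: two adjacent characters are swapped.
        if (len(diffs) == 2 and diffs[1] == diffs[0] + 1
                and typo[diffs[0]] == intended[diffs[1]]
                and typo[diffs[1]] == intended[diffs[0]]):
            return "transposition"
    return None


if __name__ == "__main__":
    # Made-up examples, not data from the paper.
    for typo in ["engiine", "engne", "enigne", "engime"]:
        print(typo, "->", classify_simple_typo(typo, "engine"))
```

A classifier like this presupposes that the intended word is already known. The complex errors the abstract mentions, such as word boundary errors and unconventional acronyms, are not covered by single-character edits, which is presumably where the domain-specific knowledge comes in.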

Online publication date: Wed, 19-Aug-2015
