Improving post-processing optical character recognition documents with Arabic language using spelling error detection and correction
by Iyad Abu Doush; Ahmed M. Al-Trad
International Journal of Reasoning-based Intelligent Systems (IJRIS), Vol. 8, No. 3/4, 2016

Abstract: Optical character recognition (OCR) is used to convert scanned documents into text. The resulting text needs to be validated for correctness. The problem is harder for Arabic text because of the complexity of the Arabic language. This research explores ways of improving OCR spell-checking effectiveness by proposing a post-processing system for Arabic OCR based on three different approaches: Microsoft Office Word with the Google online suggestion system, the Ayaspell spell checker with the Google online suggestion system, and the Google online suggestion system alone. We used precision and recall to evaluate the effectiveness of the proposed OCR post-processing. The results show that Microsoft Office Word with Google outperforms the other approaches, with an accuracy of 0.49.
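
The abstract names precision and recall as the evaluation measures but does not say how they were computed. As a minimal sketch, assuming the standard definitions applied to spelling error detection (flagged words compared against words known to be wrong), the calculation could look like the following. The function name `precision_recall` and the sample token positions are hypothetical and are not taken from the paper.

```python
def precision_recall(flagged, true_errors):
    """Return (precision, recall) for flagged items against known errors.

    precision = correctly flagged / all flagged
    recall    = correctly flagged / all true errors
    """
    flagged, true_errors = set(flagged), set(true_errors)
    true_positives = len(flagged & true_errors)
    precision = true_positives / len(flagged) if flagged else 0.0
    recall = true_positives / len(true_errors) if true_errors else 0.0
    return precision, recall


# Hypothetical token positions for illustration only (not data from the paper).
flagged = {2, 5, 7, 9}      # positions the spell checker marked as errors
true_errors = {2, 5, 8}     # positions that really contain OCR errors
precision, recall = precision_recall(flagged, true_errors)
```

In this made-up example two of the four flagged positions are real errors, and two of the three real errors were found, so precision and recall differ. The paper may define its measures over corrections rather than detections, which this sketch does not cover.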

Online publication date: Fri, 17-Mar-2017
