Title: Improving post-processing optical character recognition documents with Arabic language using spelling error detection and correction

Authors: Iyad Abu Doush; Ahmed M. Al-Trad

Addresses: Computer Science Department, Yarmouk University, Irbid, Jordan; Computer Science Department, Yarmouk University, Irbid, Jordan

Abstract: Optical character recognition (OCR) is used to convert scanned documents into text, and the resulting text needs to be validated for correctness. The problem is more severe for Arabic text because of the complexity of the Arabic language. This research explores ways of improving the effectiveness of OCR spell checking by proposing an Arabic OCR post-processing system based on three approaches: Microsoft Office Word combined with the Google online suggestion system, the Ayaspell spell checker combined with the Google online suggestion system, and the Google online suggestion system alone. We used precision and recall to evaluate the effectiveness of the proposed OCR post-processing. The results show that combining Microsoft Office Word with Google outperforms the other approaches, with an accuracy of 0.49.
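The abstract evaluates the approaches with precision and recall but does not say how those quantities are counted. The sketch below shows one plausible word-level formulation, assuming the OCR output, the post-processed output, and the ground truth are already aligned one token to one token. The function `correction_scores`, the `Scores` type, and the sample sentence are hypothetical illustrations, not the paper's own definitions or data.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    precision: float
    recall: float


def correction_scores(ocr_words, corrected_words, truth_words):
    """Word-level precision/recall for OCR post-processing (hypothetical sketch).

    Assumes the three token lists are already aligned one-to-one.
    precision = correct fixes / words the post-processor changed
    recall    = correct fixes / words the OCR got wrong
    """
    if not (len(ocr_words) == len(corrected_words) == len(truth_words)):
        raise ValueError("token lists must be aligned and of equal length")
    changed = 0        # words the post-processor altered
    true_errors = 0    # OCR words that differ from the ground truth
    correct_fixes = 0  # altered words that now match the ground truth
    for ocr, out, gold in zip(ocr_words, corrected_words, truth_words):
        if ocr != gold:
            true_errors += 1
        if out != ocr:
            changed += 1
            if out == gold:
                correct_fixes += 1
    precision = correct_fixes / changed if changed else 0.0
    recall = correct_fixes / true_errors if true_errors else 0.0
    return Scores(precision, recall)


if __name__ == "__main__":
    # Hypothetical sample: ground truth, raw OCR output, post-processed output.
    truth = ["ذهب", "الولد", "إلى", "المدرسة", "صباحا"]
    ocr = ["ذهب", "الوند", "الى", "المدرسه", "صباحا"]
    out = ["ذهبت", "الولد", "الى", "المدرسه", "صباحا"]
    s = correction_scores(ocr, out, truth)
    print(f"precision={s.precision:.2f} recall={s.recall:.2f}")
    # prints: precision=0.50 recall=0.33
```

Under this formulation, precision drops when the post-processor alters words that were already correct or replaces an error with the wrong word, while recall drops for every OCR error that is not restored to the ground-truth form.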

Keywords: post-processing; Arabic OCR; optical character recognition; Arabic spell checker; Arabic language; spelling errors; error detection; error correction; precision and recall.

DOI: 10.1504/IJRIS.2016.082957

International Journal of Reasoning-based Intelligent Systems, 2016 Vol.8 No.3/4, pp.91 - 103

Received: 27 Oct 2014
Accepted: 27 Aug 2015

Published online: 17 Mar 2017
