Title: Multi Level Feature Priority algorithm based text extraction from heterogeneous and hybrid textual images

Authors: Gopalan Chitrakala, D. Manjula

Addresses: Department of Computer Science and Engineering, Easwari Engineering College, Anna University, Chennai 600 089, Tamil Nadu, India; Department of Computer Science and Engineering, College of Engineering, Anna University, Chennai 600 025, Tamil Nadu, India

Abstract: This paper presents a unified approach for extracting text from heterogeneous and hybrid textual images (images containing both scene text and caption text) and from document images with variations in illumination, transformation/perspective projection, font size and radially changing/angular text. The strength of the technique lies in producing a small number of features, with low running time, for extracting text from heterogeneous images at various priority levels. The proposed feature selection algorithm is evaluated with three common Machine-Learning (ML) algorithms, and its effectiveness is shown by comparison with three feature selection methods. Qualitative analysis shows encouraging performance of the proposed text extraction system in comparison with edge-, Connected-Component- (CC) and texture-based text extraction algorithms.

Keywords: text extraction; feature selection; NSCT; non sub-sampled contourlet transform; grey-level run length matrix; caption text; scene text; document images.

DOI: 10.1504/IJSISE.2009.033759

International Journal of Signal and Imaging Systems Engineering, 2009 Vol.2 No.4, pp.183 - 195

Received: 12 Jun 2009
Accepted: 09 Jan 2010

Published online: 30 Jun 2010
