Title: Research on born-digital image text extraction based on conditional random field

Authors: Zhang Jian; Cheng RenHong; Wang Kai; Zhao Hong

Addresses: College of Computer and Control Engineering, Nankai University, 94#, Weijin Road, Tianjin, China; College of Software, Nankai University, 94#, Weijin Road, Tianjin, China; College of Computer and Control Engineering, Nankai University, 94#, Weijin Road, Tianjin, China; College of Computer and Control Engineering, Nankai University, 94#, Weijin Road, Tianjin, China

Abstract: With the number of digital videos and digital images in e-mails and web pages increasing tremendously, text extraction from images has become more important than ever. Born-digital images are generated directly on a computer, and the text they contain is an important aid to the semantic understanding of the images. Although many methods for text extraction from natural scene images have been proposed in recent years, text detection and extraction from born-digital images remains a challenge. This paper proposes a novel method to segment the text connected components (CCs) from a born-digital image. First, binarisation based on wavelet theory is applied to the given image to obtain all candidate text CCs. Second, the extracted CCs are classified with a conditional random field (CRF), a probabilistic graphical model that has been widely used in natural language processing, to label the text CCs. Experimental results show that the proposed method can effectively extract text from born-digital images.
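
The candidate-extraction step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a fixed grey-level threshold stands in for the wavelet-based binarisation, the CRF classification step is not shown, and the names `binarise`, `connected_components` and `bounding_box` are hypothetical helpers introduced only for this example.

```python
from collections import deque


def binarise(gray, threshold=128):
    """Mark pixels darker than `threshold` as foreground (1).

    Assumption: a fixed threshold is used here purely as a stand-in
    for the wavelet-based binarisation mentioned in the abstract.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def connected_components(binary):
    """Group foreground pixels into 8-connected components.

    Returns a list of components, each a list of (row, col) pixels,
    in the order their first pixel is met in a row-major scan.
    """
    rows = len(binary)
    cols = len(binary[0]) if rows else 0
    seen = [[False] * cols for _ in range(rows)]
    components = []
    for r in range(rows):
        for c in range(cols):
            if binary[r][c] != 1 or seen[r][c]:
                continue
            # Breadth-first flood fill starting from an unvisited foreground pixel.
            seen[r][c] = True
            queue = deque([(r, c)])
            component = []
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            components.append(component)
    return components


def bounding_box(component):
    """Return (top, left, bottom, right) of a component, inclusive."""
    ys = [p[0] for p in component]
    xs = [p[1] for p in component]
    return min(ys), min(xs), max(ys), max(xs)


# Tiny greyscale image: dark pixels (0) form two separate strokes.
gray = [
    [255, 0, 255, 255, 255],
    [255, 0, 255, 0, 0],
    [255, 255, 255, 255, 0],
]
candidates = connected_components(binarise(gray))
boxes = [bounding_box(cc) for cc in candidates]
```

In a pipeline like the one the abstract outlines, each candidate CC would then be described by features and passed to a classifier that labels it as text or non-text; that stage is deliberately left out of this sketch.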

Keywords: born-digital images; text detection; text extraction; conditional random field; CRF; image binarisation; wavelet theory; probabilistic graph model; image segmentation; image label; digital images; semantics.

DOI: 10.1504/IJHPSA.2014.059873

International Journal of High Performance Systems Architecture, 2014 Vol.5 No.1, pp.39 - 49

Received: 13 Aug 2013
Accepted: 07 Nov 2013

Published online: 12 Jul 2014
