Title: A new image binarisation technique for segmentation of text from digital images

Authors: Ranjit Ghoshal; Sayan Das; Aditya Saha

Addresses: Department of Information Technology, St. Thomas' College of Engineering and Technology, Kolkata, West Bengal, India; Department of Computer Science, St. Thomas' College of Engineering and Technology, Kolkata, West Bengal, India; Department of Computer Science, St. Thomas' College of Engineering and Technology, Kolkata, West Bengal, India

Abstract: Text segmentation in digital images is a prerequisite for many image analysis and interpretation tasks. In this article, we propose an effective binarisation technique for segmenting text from digital images. The binarisation produces numerous connected components, both text and non-text, so the possible text components must then be separated from the rest. To distinguish between text and non-text components, a set of features is considered. For training, we use two feature files that we prepared, one for text and one for non-text. K-nearest neighbour (K-NN) and support vector machine (SVM) classifiers are applied to this two-class classification problem. The experiments are based on the ICDAR 2011 born-digital dataset. Our binarisation technique is also applied to the publicly available Street View Text (SVT), DIBCO 2009 and ICDAR 2011 Robust Reading Competition datasets. We achieved both binarisation and the separation of text from non-text components.
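
The pipeline outlined in the abstract (binarise, extract connected components, compute features, classify each component as text or non-text) can be illustrated with a minimal sketch. It is not the authors' method: the abstract does not specify the binarisation rule, the feature set or the classifier settings, so the global mean threshold, the two bounding-box features (aspect ratio and fill ratio), the value k = 3 and the toy training vectors below are all assumptions chosen for illustration. Only a plain K-NN vote is shown, and the SVM classifier is omitted.

```python
from collections import deque
import math

def binarise(gray, threshold=None):
    """Stand-in binarisation (assumption): global mean threshold, dark pixels -> 1."""
    if threshold is None:
        flat = [v for row in gray for v in row]
        threshold = sum(flat) / len(flat)
    return [[1 if v < threshold else 0 for v in row] for row in gray]

def connected_components(binary):
    """4-connected labelling by breadth-first search; returns lists of (row, col)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    comps = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] == 1 and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                pixels = []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and binary[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                comps.append(pixels)
    return comps

def features(pixels):
    """Hypothetical features: bounding-box aspect ratio and fill ratio."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return (width / height, len(pixels) / (width * height))

def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors nearest to query (Euclidean)."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)

# Toy greyscale image: one dark vertical stroke and one dark horizontal bar.
gray = [
    [255, 255, 255, 255, 255, 255],
    [255,  10, 255, 255, 255, 255],
    [255,  10, 255,  20,  20,  20],
    [255,  10, 255, 255, 255, 255],
    [255, 255, 255, 255, 255, 255],
]

# Made-up training data standing in for the "text" and "non-text" feature files.
train = [
    ((0.4, 0.9), "text"), ((0.5, 0.8), "text"), ((0.6, 0.7), "text"),
    ((3.0, 1.0), "non-text"), ((4.0, 0.95), "non-text"), ((5.0, 1.0), "non-text"),
]

comps = connected_components(binarise(gray))
labels = [knn_predict(train, features(p)) for p in comps]
```

In this toy setting the tall stroke lands nearest the "text" vectors and the wide bar nearest the "non-text" vectors. With real data the feature set and classifier would need to follow the full article.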

Keywords: binarisation; connected component; feature extraction; K-NN classifier; SVM classifier; text segmentation.

DOI: 10.1504/IJAIP.2021.114585

International Journal of Advanced Intelligence Paradigms, 2021 Vol.19 No.1, pp.84 - 100

Received: 10 Aug 2017
Accepted: 20 Dec 2017

Published online: 08 Apr 2021
