Authors: Ranjit Ghoshal; Sayan Das; Aditya Saha
Addresses: Department of Information Technology, St. Thomas' College of Engineering and Technology, Kolkata, West Bengal, India; Department of Computer Science, St. Thomas' College of Engineering and Technology, Kolkata, West Bengal, India; Department of Computer Science, St. Thomas' College of Engineering and Technology, Kolkata, West Bengal, India
Abstract: Text segmentation in digital images is a prerequisite for many image analysis and interpretation tasks. In this article, we propose an effective binarisation technique for text segmentation from digital images. The binarisation produces numerous text as well as non-text connected components, so the possible text components must then be separated from the rest. To distinguish between text and non-text components, a set of features is considered. During training, we use two feature files, text and non-text, that we prepared ourselves. K-nearest neighbour (K-NN) and support vector machine (SVM) classifiers are considered for this two-class classification problem. The experiments are based on the ICDAR 2011 born-digital dataset. Our binarisation technique is also applied to the publicly available Street View Text (SVT), DIBCO 2009 and ICDAR 2011 Robust Reading Competition datasets. We have achieved success both in binarisation and in separating text from non-text components.
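The pipeline outlined in the abstract (binarise, extract connected components, compute features, classify each component as text or non-text) can be illustrated with a minimal sketch. The abstract does not state the binarisation rule, the feature set, or the classifier settings, so everything specific here is an assumption made for illustration: a mean-intensity threshold stands in for the proposed binarisation, components use 4-connectivity, the two features are aspect ratio and bounding-box fill ratio, and the training samples are invented. It is not the authors' implementation, and only the K-NN branch is shown.

```python
from collections import Counter, deque
from math import dist


def binarise(gray, threshold=None):
    """Mark pixels darker than the threshold as foreground (1).

    Assumption: a global mean-intensity threshold, used only as a
    stand-in for the binarisation technique proposed in the article.
    """
    if threshold is None:
        flat = [p for row in gray for p in row]
        threshold = sum(flat) / len(flat)
    return [[1 if p < threshold else 0 for p in row] for row in gray]


def connected_components(binary):
    """Return foreground components as lists of (row, col), 4-connectivity."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for r in range(h):
        for c in range(w):
            if binary[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue = deque([(r, c)])
                pixels = []
                while queue:  # breadth-first flood fill
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                components.append(pixels)
    return components


def features(pixels):
    """Hypothetical feature vector: (aspect ratio, bounding-box fill ratio)."""
    rows = [p[0] for p in pixels]
    cols = [p[1] for p in pixels]
    height = max(rows) - min(rows) + 1
    width = max(cols) - min(cols) + 1
    return (width / height, len(pixels) / (width * height))


def knn_predict(train, query, k=3):
    """Majority vote among the k training samples nearest to the query."""
    nearest = sorted(train, key=lambda item: dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


if __name__ == "__main__":
    # Toy greyscale image: a vertical stroke and a horizontal stroke.
    gray = [
        [255, 255, 255, 255, 255, 255],
        [255, 0, 255, 255, 255, 255],
        [255, 0, 255, 0, 0, 0],
        [255, 0, 255, 255, 255, 255],
        [255, 255, 255, 255, 255, 255],
    ]
    # Invented training samples, standing in for the two feature files.
    train = [
        ((0.4, 0.9), "text"), ((0.5, 0.8), "text"), ((0.6, 0.7), "text"),
        ((3.5, 1.0), "non-text"), ((4.0, 0.9), "non-text"), ((2.8, 0.95), "non-text"),
    ]
    for comp in connected_components(binarise(gray)):
        print(features(comp), knn_predict(train, features(comp)))
```

In practice the K-NN vote would be replaced or complemented by an SVM trained on the same two feature files, and the feature vector would be richer than the two quantities assumed here.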
Keywords: binarisation; connected component; feature extraction; K-NN classifier; SVM classifier; text segmentation.
International Journal of Advanced Intelligence Paradigms, 2021 Vol.19 No.1, pp.84 - 100
Received: 10 Aug 2017
Accepted: 20 Dec 2017
Published online: 28 Apr 2021