Title: Fuzzy text/non-text classification of document images based on morphological operator, wavelet transform, and strong feature vector

Authors: Mobina Ranjbar Malidareh; Amir Masoud Molaei

Addresses: Electrical and Computer Engineering Faculty, Babol Noshirvani University of Technology, Babol, Iran; Institute of Electronics, Communications, and Information Technology, Queen's University Belfast, Belfast, UK

Abstract: In text retrieval systems, classifying textual and non-textual content is a preliminary step toward accessing the semantic information in document images. This paper proposes a new structure based on a morphological operator, the wavelet transform, and strong feature vector extraction for classifying textual and non-textual content in document images regardless of the text language. In this structure, the image is segmented by an effective mechanism. After the patterns of textual and non-textual areas in the images are learned, a fuzzy classifier determines the text and non-text regions. Texture features such as coarseness, directionality, contrast, and roughness, together with features extracted from the wavelet transform sub-bands, are used to classify and label the regions. The proposed method is evaluated on a database of textual and non-textual images derived from document images available on the Internet. The simulation results show the high efficiency of the proposed method in segmenting and classifying image components; it achieves an accuracy of 90.1% for the classification of image regions.
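
Illustrative sketch (not from the paper): the abstract mentions features taken from wavelet sub-bands and a fuzzy decision between text and non-text regions. The short Python example below shows one way such a feature could be computed, using a one-level 2-D Haar transform written by hand. The choice of the Haar wavelet, the function names, the detail-energy ratio feature, and the ramp thresholds (0.05 and 0.30) are all assumptions made for illustration; the authors' actual features, wavelet, and trained fuzzy classifier are not specified in the abstract.

```python
def haar_dwt2(img):
    """One-level orthonormal 2-D Haar transform (hypothetical helper).

    img is a list of rows with even height and width.
    Returns four sub-bands: approximation, two directional details, diagonal detail.
    """
    approx, det_a, det_b, det_d = [], [], [], []
    for r in range(0, len(img), 2):
        row_ll, row_a, row_b, row_d = [], [], [], []
        for c in range(0, len(img[0]), 2):
            p, q = img[r][c], img[r][c + 1]
            s, t = img[r + 1][c], img[r + 1][c + 1]
            row_ll.append((p + q + s + t) / 2.0)  # local average
            row_a.append((p + q - s - t) / 2.0)   # difference between the two rows
            row_b.append((p - q + s - t) / 2.0)   # difference between the two columns
            row_d.append((p - q - s + t) / 2.0)   # diagonal difference
        approx.append(row_ll)
        det_a.append(row_a)
        det_b.append(row_b)
        det_d.append(row_d)
    return approx, det_a, det_b, det_d


def detail_ratio(img):
    """Fraction of the block's energy that falls in the three detail sub-bands."""
    energies = [sum(v * v for row in band for v in row) for band in haar_dwt2(img)]
    total = sum(energies)
    return 0.0 if total == 0 else sum(energies[1:]) / total


def text_membership(ratio, low=0.05, high=0.30):
    """Ramp-shaped fuzzy membership in the 'text' class (thresholds are assumed)."""
    if ratio <= low:
        return 0.0
    if ratio >= high:
        return 1.0
    return (ratio - low) / (high - low)


# Toy blocks: a flat region and a high-frequency checkerboard.
flat = [[5, 5, 5, 5] for _ in range(4)]
checker = [[(r + c) % 2 for c in range(4)] for r in range(4)]
flat_score = text_membership(detail_ratio(flat))
checker_score = text_membership(detail_ratio(checker))
```

In this toy setup a flat block has no detail energy and receives membership 0, while the checkerboard puts half of its energy in the diagonal sub-band and saturates the ramp. A real system would learn the membership functions from labelled regions rather than fix them by hand.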

Keywords: fuzzy classification; morphological operator; segmentation; strong feature vector; text/non-text separation; wavelet transform.

DOI: 10.1504/IJCVR.2024.141814

International Journal of Computational Vision and Robotics, 2024 Vol.14 No.6, pp.677 - 692

Received: 06 Dec 2021
Accepted: 15 Mar 2023

Published online: 02 Oct 2024
