Title: Clustering-based word segmentation from off-line handwritten Uyghur text-line images

Authors: Askar Hamdulla; Aysadet Abliz; Abdusalam Dawut; Kamil Moydin; Palidan Tuerxun

Addresses: Institute of Information Science and Engineering, Xinjiang University, Urumqi 830046, China (all authors)

Abstract: This paper proposes a clustering-based method for segmenting words in handwritten Uyghur text images. First, each pre-processed text-line image is projected onto the vertical direction to obtain the initial candidate segmentation points, and the lengths of the blank spaces between connected domains and of the text regions are recorded. A clustering algorithm then classifies the blank spaces into two categories: 'within word' gaps and 'between words' gaps. A first merge is performed according to the clustering results. To handle the remaining over-segmentation, a threshold-based merging method that combines text-region length and blank-space length is proposed, yielding the final segmentation points. Experimental results show that the method can effectively solve the word segmentation problem in handwritten text images.
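
The first stages described in the abstract (vertical projection, gap measurement, two-class clustering of gaps, first merge) can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not name the clustering algorithm, so a simple one-dimensional 2-means is assumed here. The function names (`column_profile`, `find_regions`, `two_means_1d`, `segment_words`) and the input format (a binarised line image as a list of rows, with 1 for ink and 0 for background) are hypothetical. The second, threshold-based merge is left out because the abstract does not state its rule.

```python
def column_profile(image):
    """Vertical projection: number of ink pixels (value 1) in each column."""
    return [sum(column) for column in zip(*image)]


def find_regions(profile):
    """Return (start, end) column ranges, end exclusive, that contain ink."""
    regions, start = [], None
    for x, count in enumerate(profile):
        if count > 0 and start is None:
            start = x
        elif count == 0 and start is not None:
            regions.append((start, x))
            start = None
    if start is not None:
        regions.append((start, len(profile)))
    return regions


def two_means_1d(values, iterations=50):
    """Assumed clustering step: 1-D 2-means; returns the cut between clusters."""
    lo, hi = float(min(values)), float(max(values))
    if lo == hi:
        # All gaps have the same width, so there is nothing to separate.
        # This sketch then treats every gap as a 'within word' gap.
        return hi
    for _ in range(iterations):
        cut = (lo + hi) / 2
        small = [v for v in values if v <= cut]
        large = [v for v in values if v > cut]
        new_lo, new_hi = sum(small) / len(small), sum(large) / len(large)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return (lo + hi) / 2


def segment_words(image):
    """Merge ink regions separated by 'within word' gaps into word ranges."""
    regions = find_regions(column_profile(image))
    if len(regions) < 2:
        return regions
    # Width of the blank space between each pair of neighbouring regions.
    gaps = [right[0] - left[1] for left, right in zip(regions, regions[1:])]
    cut = two_means_1d(gaps)
    words = [regions[0]]
    for gap, region in zip(gaps, regions[1:]):
        if gap > cut:
            words.append(region)                     # 'between words' gap
        else:
            words[-1] = (words[-1][0], region[1])    # 'within word' gap
    return words
```

For a toy line whose ink regions are separated by gaps of 1, 5, 1 and 6 columns, the two narrow gaps fall into the 'within word' cluster and the five regions merge into three words. On real handwriting the projection profile would be noisier, which is presumably why the paper adds the second merge based on text-region length and blank-space length.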

Keywords: Uyghur handwritten text; word segmentation; clustering; colouring process.

DOI: 10.1504/IJICT.2020.106312

International Journal of Information and Communication Technology, 2020 Vol.16 No.3, pp.214 - 229

Received: 04 Jan 2019
Accepted: 27 Mar 2019

Published online: 02 Apr 2020
