Title: A new unsupervised method for boundary perception and word-like segmentation of sequence
Authors: Arko Banerjee; Arun K. Pujari; Bibudhendu Pati; Chhabi Rani Panigrahi
Addresses: College of Engineering and Management, Kolaghat, West Bengal, India; Biju Patnaik University of Technology, Rourkela, Odisha, India | Central University of Rajasthan, Ajmer, Rajasthan, India | Rama Devi Women's University, Bhubaneswar, Odisha, India | Rama Devi Women's University, Bhubaneswar, Odisha, India
Abstract: In cognitive science research on natural language processing, motor learning and visual perception, perceiving boundary points and segmenting a continuous string or sequence is one of the fundamental problems. Boundary perception can also be viewed as a machine learning problem, either supervised or unsupervised. In the supervised approach to determining boundary points for segmenting a sequence, some pre-segmented training examples are necessary. In the unsupervised mode, learning is accomplished without any training data; hence, the frequency of occurrence of symbols within the sequence is normally used as the cue. Most earlier algorithms use this cue while scanning the sequence in the forward direction. In this paper, we propose a novel approach that extracts the possible boundary points by scanning the sequence bi-directionally. We show that such an extension from unidirectional to bi-directional scanning is not trivial and requires judicious consideration of data structure and algorithm. We propose a new algorithm that traverses the sequence unidirectionally but extracts the information bi-directionally. Our method yields better segmentation, as demonstrated by rigorous experimentation on several datasets.
Keywords: boundary perception; sequence segmentation; trie data structure.
DOI: 10.1504/IJCSE.2020.111437
International Journal of Computational Science and Engineering, 2020 Vol.23 No.3, pp.286 - 295
Received: 27 Jan 2020
Accepted: 22 Apr 2020
Published online: 26 Nov 2020
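
Illustration: The frequency cue mentioned in the abstract can be made concrete with a small sketch. The code below is not the algorithm proposed in the paper, which is not described in this record. It is a generic branching-entropy heuristic, assumed here purely for illustration: a gap between two symbols looks like a boundary when the symbol after the preceding n-gram is hard to predict (forward cue) and the symbol before the following n-gram is also hard to predict (backward cue). The function names, the use of plain dictionaries instead of a trie, and the choice to sum the two entropies are all assumptions.

```python
from collections import defaultdict
import math


def branching_entropy(seq, n, reverse=False):
    """Map each n-gram to the entropy (in bits) of the symbol that follows it.

    With reverse=True the sequence is reversed first, so the result describes
    the symbol that *precedes* each (reversed) n-gram in the original order.
    """
    if reverse:
        seq = seq[::-1]
    followers = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq) - n):
        followers[seq[i:i + n]][seq[i + n]] += 1
    entropy = {}
    for context, counts in followers.items():
        total = sum(counts.values())
        entropy[context] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


def boundary_scores(seq, n=2):
    """Score every gap position using forward plus backward entropy."""
    forward = branching_entropy(seq, n)
    backward = branching_entropy(seq, n, reverse=True)
    scores = {}
    for pos in range(n, len(seq) - n + 1):
        left = seq[pos - n:pos]            # n-gram ending at the gap
        right = seq[pos:pos + n][::-1]     # n-gram starting at the gap, reversed
        scores[pos] = forward.get(left, 0.0) + backward.get(right, 0.0)
    return scores


if __name__ == "__main__":
    scores = boundary_scores("abcabdabe", n=2)
    for pos in sorted(scores):
        print(pos, round(scores[pos], 3))
```

Higher scores mark gaps that are more likely to be boundaries under this heuristic. A trie, as named in the keywords, would be one way to store the n-gram counts for many context lengths at once; the dictionaries above are a simplification.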