Title: Speaking rate control based on time-scale modification and its effects on the performance of speech recognition
Authors: Jin Ah Kang; Seung Ho Choi
Addresses: School of Information and Communications, Gwangju Institute of Science and Technology, Gwangju 500-712, Korea; Department of Electronic and IT Media Engineering, Seoul National University of Science and Technology, Seoul 139-743, Korea
Abstract: In this paper, we describe the influence of speaking rate on speech recognition. The speaking rate of input speech is controlled by applying a time-scale modification (TSM) algorithm, and speaking rate normalisation is achieved by selecting the TSM scale factor. The scale factor used in training and testing of a speech recognition system is selected according to a maximum likelihood criterion during HMM decoding. Experimental results show that optimal selection of the TSM scale factor for speaking rate normalisation reduces the word error rate (WER) by 47.6% compared to the baseline.
Keywords: speaking rate control; time-scale modification; TSM; speech recognition.
International Journal of Engineering Systems Modelling and Simulation, 2014 Vol.6 No.1/2, pp.31 - 36
Available online: 23 Dec 2013
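
The scale-factor selection described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `naive_time_scale` is a hypothetical stand-in that stretches a frame sequence by linear interpolation (a real TSM algorithm operates on the waveform and preserves pitch), and `make_toy_scorer` is a hypothetical stand-in for the log-likelihood that an HMM decoder would return. The convention that a scale factor above 1 lengthens the utterance is also an assumption. Only the outer loop, which tries each candidate scale factor and keeps the one with the highest likelihood, reflects the maximum likelihood criterion named in the abstract.

```python
from typing import Callable, List, Sequence, Tuple


def naive_time_scale(frames: Sequence[float], alpha: float) -> List[float]:
    """Stretch (alpha > 1) or compress (alpha < 1) a sequence by linear interpolation.

    Hypothetical stand-in for a TSM algorithm, used only to make the sketch runnable.
    """
    if not frames:
        raise ValueError("frames must not be empty")
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    n_in = len(frames)
    n_out = max(1, round(n_in * alpha))
    if n_in == 1 or n_out == 1:
        return [frames[0]] * n_out
    step = (n_in - 1) / (n_out - 1)  # spacing of output samples on the input axis
    out = []
    for i in range(n_out):
        pos = i * step
        lo = min(int(pos), n_in - 1)
        hi = min(lo + 1, n_in - 1)
        frac = pos - lo
        out.append((1.0 - frac) * frames[lo] + frac * frames[hi])
    return out


def make_toy_scorer(expected_len: int) -> Callable[[Sequence[float]], float]:
    """Return a toy score that peaks when the sequence has the expected length.

    Hypothetical stand-in for the log-likelihood produced by HMM decoding.
    """
    def score(frames: Sequence[float]) -> float:
        return -float((len(frames) - expected_len) ** 2)
    return score


def select_scale_factor(
    frames: Sequence[float],
    candidates: Sequence[float],
    time_scale: Callable[[Sequence[float], float], List[float]],
    log_likelihood: Callable[[Sequence[float]], float],
) -> Tuple[float, float]:
    """Return the candidate scale factor with the highest likelihood, and that likelihood."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    best_alpha, best_score = candidates[0], float("-inf")
    for alpha in candidates:
        score = log_likelihood(time_scale(frames, alpha))
        if score > best_score:
            best_alpha, best_score = alpha, score
    return best_alpha, best_score


if __name__ == "__main__":
    utterance = [float(i) for i in range(100)]  # toy frame sequence
    alpha, score = select_scale_factor(
        utterance, [0.8, 0.9, 1.0, 1.1, 1.2], naive_time_scale, make_toy_scorer(120)
    )
    print(alpha, score)
```

In a real system the two stand-ins would be replaced by an actual TSM routine and by the recogniser's decoding score; the candidate grid shown here is likewise illustrative.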