Title: Speaking rate control based on time-scale modification and its effects on the performance of speech recognition

Authors: Jin Ah Kang; Seung Ho Choi

Addresses: School of Information and Communications, Gwangju Institute of Science and Technology, Gwangju 500-712, Korea; Department of Electronic and IT Media Engineering, Seoul National University of Science and Technology, Seoul 139-743, Korea

Abstract: This paper describes the influence of speaking rate on speech recognition performance. The speaking rate of the input speech is controlled by a time-scale modification (TSM) algorithm, and speaking rate normalisation is achieved by selecting the TSM scale factor. For both training and testing of the speech recognition system, the scale factor is selected according to a maximum likelihood criterion during hidden Markov model (HMM) decoding. Experimental results show that optimal selection of the TSM scale factor in speaking rate normalisation can reduce the word error rate (WER) by 47.6% compared to the baseline.
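
The selection step described in the abstract can be illustrated with a minimal sketch: each candidate scale factor is applied to the input, the modified signal is scored, and the factor with the highest log-likelihood is kept. This is not the authors' implementation. The names select_scale_factor, toy_tsm and toy_log_likelihood are hypothetical, the toy TSM is plain linear-interpolation resampling (a real TSM algorithm would preserve pitch), the toy scorer merely stands in for an HMM decoder score, and the convention that the scale factor is the ratio of output to input duration is an assumption.

```python
from typing import Callable, List, Sequence, Tuple


def select_scale_factor(
    samples: Sequence[float],
    factors: Sequence[float],
    tsm: Callable[[Sequence[float], float], List[float]],
    log_likelihood: Callable[[Sequence[float]], float],
) -> Tuple[float, float]:
    """Return the (factor, score) pair that maximises the log-likelihood."""
    if not factors:
        raise ValueError("at least one candidate scale factor is required")
    best_factor, best_score = factors[0], float("-inf")
    for alpha in factors:
        # Time-scale the input with this candidate, then score the result.
        score = log_likelihood(tsm(samples, alpha))
        if score > best_score:
            best_factor, best_score = alpha, score
    return best_factor, best_score


def toy_tsm(samples: Sequence[float], alpha: float) -> List[float]:
    """Placeholder for a TSM algorithm: resample to round(len * alpha) samples.

    Assumption: alpha is output duration divided by input duration.
    """
    n_out = max(2, round(len(samples) * alpha))
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * last / (n_out - 1)  # fractional read position in the input
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def toy_log_likelihood(samples: Sequence[float], expected_len: int = 100) -> float:
    """Stand-in for a decoder score: highest when the duration matches
    the length the (hypothetical) acoustic model expects."""
    return -((len(samples) - expected_len) ** 2) / 2.0


# A 125-sample utterance scored against a model that "expects" 100 samples.
utterance = [0.0] * 125
candidates = [0.6, 0.8, 1.0, 1.2]
best, score = select_scale_factor(utterance, candidates, toy_tsm, toy_log_likelihood)
print(best, score)
```

In a real system the scorer would be the log-likelihood returned by the recogniser for the time-scaled utterance, and the candidate set would be chosen to cover the expected range of speaking rates.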

Keywords: speaking rate control; time-scale modification; TSM; speech recognition.

DOI: 10.1504/IJESMS.2014.058421

International Journal of Engineering Systems Modelling and Simulation, 2014 Vol.6 No.1/2, pp.31 - 36

Available online: 23 Dec 2013