Title: Developing in-vehicular noise robust children ASR system using Tandem-NN-based acoustic modelling
Authors: Virender Kadyan; Shashi Bala; Puneet Bawa; Mohit Mittal
Addresses: Department of Informatics, School of Computer Science, University of Petroleum & Energy Studies (UPES), Dehradun, Uttarakhand, India; Department of Computer Science & Engineering, Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India; Department of Computer Science & Engineering, Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India; Department of Information Science & Engineering, Kyoto Sangyo University, Kyoto, Japan
Abstract: Processing children's speech is challenging because of data scarcity and inefficient modelling of input feature vectors, and the accuracy of the modelling phase depends on the extracted input features. In this paper, posterior probabilities over a phone set are estimated by a discriminatively trained model used as a neural-net pre-processor. This Neural Network (NN) classifier is first trained on the original speech, and context-independent phone posterior probabilities are then estimated by the Tandem-NN system. The output vectors are employed as the default features for Deep Neural Network-Hidden Markov Model (DNN-HMM) models. The performance of the system built on the original data is improved by extending it with data augmentation. To assess the robustness of the augmented speech, various in-vehicle data are investigated, and its performance is found to be superior to that of the other systems. Finally, all augmented data are combined to overcome the data scarcity challenge and enhance system performance, giving a relative improvement of 23.77% over the baseline system.
Keywords: children speech recognition; data augmentation; GFCC; multi-layer perceptron; Tandem-NN.
International Journal of Vehicle Autonomous Systems, 2020 Vol.15 No.3/4, pp.296 - 306
Received: 25 Mar 2020
Accepted: 04 Dec 2020
Published online: 26 Jul 2021