Title: Developing in-vehicular noise robust children ASR system using Tandem-NN-based acoustic modelling

Authors: Virender Kadyan; Shashi Bala; Puneet Bawa; Mohit Mittal

Addresses: Department of Informatics, School of Computer Science, University of Petroleum & Energy Studies (UPES), Dehradun, Uttarakhand, India; Department of Computer Science & Engineering, Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India; Department of Computer Science & Engineering, Institute of Engineering and Technology, Chitkara University, Rajpura, Punjab, India; Department of Information Science & Engineering, Kyoto Sangyo University, Kyoto, Japan

Abstract: Processing children's speech is challenging because of data scarcity and input feature vectors that are poorly suited to acoustic modelling. The accuracy of the modelling phase depends on the extracted input features. In this paper, posterior probabilities over a phone set are estimated by a discriminatively trained model that acts as a neural-net pre-processor. This Neural Network (NN) classifier is first trained on the original speech, and context-independent phone posterior probabilities are then estimated with the Tandem-NN system. The output vectors are used as the input features for Deep Neural Network-Hidden Markov Model (DNN-HMM) acoustic models. The performance of the system trained on the original data is improved by extending the training set through data augmentation. To assess the robustness of the augmented speech, various in-vehicle data are investigated, and the augmented system is found to outperform the other systems. Finally, all augmented data are combined to overcome data scarcity and enhance system performance, giving a relative improvement of 23.77% over the baseline system.
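
The tandem idea summarised in the abstract can be illustrated with a minimal sketch. It follows the conventional tandem recipe (log phone posteriors, decorrelated by PCA, appended to the acoustic features) and is not necessarily the authors' exact pipeline. All names, dimensions, and the random placeholder weights are assumptions for illustration: a real system would use a trained NN classifier and real acoustic features such as GFCC.

```python
import numpy as np

def softmax(z):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def mlp_posteriors(feats, w1, b1, w2, b2):
    """One-hidden-layer MLP mapping each frame to phone posteriors.

    Hypothetical stand-in for the discriminatively trained NN classifier.
    """
    hidden = np.tanh(feats @ w1 + b1)
    return softmax(hidden @ w2 + b2)

def tandem_features(feats, posteriors, n_components, eps=1e-10):
    """Build tandem features: log posteriors -> PCA -> append to feats."""
    log_post = np.log(posteriors + eps)          # make distribution less skewed
    centered = log_post - log_post.mean(axis=0, keepdims=True)
    # PCA via SVD: rows of vt are principal directions, strongest first.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    reduced = centered @ vt[:n_components].T
    return np.hstack([feats, reduced])

# Placeholder data: random "acoustic features" and untrained weights.
rng = np.random.default_rng(0)
n_frames, feat_dim, hidden_dim, n_phones = 200, 13, 32, 40
feats = rng.standard_normal((n_frames, feat_dim))
w1 = rng.standard_normal((feat_dim, hidden_dim)) * 0.1
b1 = np.zeros(hidden_dim)
w2 = rng.standard_normal((hidden_dim, n_phones)) * 0.1
b2 = np.zeros(n_phones)

posteriors = mlp_posteriors(feats, w1, b1, w2, b2)
tandem = tandem_features(feats, posteriors, n_components=10)
print(tandem.shape)  # (200, 23): 13 acoustic dims + 10 reduced posterior dims
```

In a complete system, the resulting `tandem` matrix would be passed on as the input features to the DNN-HMM acoustic model in place of the raw acoustic features.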

Keywords: children speech recognition; data augmentation; GFCC; multi-layer perceptron; Tandem-NN.

DOI: 10.1504/IJVAS.2020.116461

International Journal of Vehicle Autonomous Systems, 2020 Vol.15 No.3/4, pp.296 - 306

Received: 25 Mar 2020
Accepted: 04 Dec 2020
Published online: 26 Jul 2021
