
Title: Robust optimal sub-band wavelet cepstral coefficient method for speech recognition

Authors: John Sahaya Rani Alex; Nithya Venkatesan

Addresses: School of Electronics Engineering, VIT University, Chennai, Tamil Nadu, India; School of Electrical Engineering, VIT University, Chennai, Tamil Nadu, India

Abstract: This paper proposes a robust feature extraction technique for speech recognition systems operating in adverse environments. The efficacy of a speech recognition system depends on its feature extraction method. The paper proposes auditory-scale-like filter banks built by optimal sub-band tree structuring based on the wavelet transform. The optimised wavelet filter banks, together with energy, logarithm, discrete cosine transform and cepstral mean normalisation blocks, form a robust feature extraction method. The method is validated on a hidden Markov model (HMM)-based single-Gaussian isolated word recognition system under additive white Gaussian noise, street noise and airport noise at different noise levels. Compared with Fourier transform-based methods such as the mel-frequency cepstral coefficient (MFCC) and perceptual linear predictive (PLP) methods, the wavelet transform-based method yielded significant improvement across all noise levels. Experiments were also performed with higher-dimensional MFCC features that include delta and acceleration features (MFCC_D_A). The study shows that the wavelet transform-based method increases recognition accuracy by 13% over MFCC_D_A for non-stationary noises.
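
The block chain named in the abstract (wavelet filter bank, energy, logarithm, discrete cosine transform, cepstral mean normalisation) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not specify the wavelet, the optimised sub-band tree, or the framing parameters, so the sketch assumes a Haar wavelet, a plain dyadic tree that repeatedly splits the low band, and 256-sample frames with a 128-sample hop. All function names are hypothetical.

```python
import math


def haar_split(x):
    """One Haar analysis step: (approximation, detail), each half the length of x."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def frame_cepstra(frame, depth=4, eps=1e-10):
    """Sub-band log-energies of one frame followed by a DCT-II.

    Assumption: a dyadic tree (only the low band is split again), used here
    as a stand-in for the paper's optimised sub-band tree.
    """
    bands = []
    approx = list(frame)
    for _ in range(depth):
        approx, detail = haar_split(approx)
        bands.append(detail)
    bands.append(approx)
    bands.reverse()  # lowest-frequency band first

    # Energy block followed by the logarithm block (eps avoids log(0)).
    log_e = [math.log(sum(v * v for v in band) + eps) for band in bands]

    # Unnormalised DCT-II of the log-energies gives the cepstral coefficients.
    n = len(log_e)
    return [
        sum(log_e[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
        for k in range(n)
    ]


def cepstral_mean_normalise(frames):
    """Subtract the per-coefficient mean taken over all frames."""
    count = len(frames)
    dim = len(frames[0])
    means = [sum(f[k] for f in frames) / count for k in range(dim)]
    return [[f[k] - means[k] for k in range(dim)] for f in frames]


def extract_features(signal, frame_len=256, hop=128, depth=4):
    """Frame the signal (no window function, for brevity) and run the chain."""
    starts = range(0, len(signal) - frame_len + 1, hop)
    cepstra = [frame_cepstra(signal[i:i + frame_len], depth) for i in starts]
    return cepstral_mean_normalise(cepstra)
```

Calling `extract_features` on a list of samples returns one coefficient vector per frame, with `depth + 1` coefficients each. A real system would replace the dyadic tree with the optimised wavelet packet tree and pass the vectors to the HMM recogniser.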

Keywords: speech recognition; feature extraction; wavelet transform; robust; noisy environments; mel-frequency cepstral coefficient; MFCC; perceptual linear predictive; PLP.

DOI: 10.1504/IJCAET.2019.098137

International Journal of Computer Aided Engineering and Technology, 2019 Vol.11 No.2, pp.163 - 173

Received: 17 May 2016
Accepted: 08 Dec 2016
Published online: 05 Mar 2019
