Authors: John Sahaya Rani Alex; Nithya Venkatesan
Addresses: School of Electronics Engineering, VIT University, Chennai, Tamil Nadu, India; School of Electrical Engineering, VIT University, Chennai, Tamil Nadu, India
Abstract: This paper proposes a robust feature extraction technique for speech recognition systems operating in adverse environments. The efficacy of a speech recognition system depends on its feature extraction method. The proposed technique builds auditory-scale-like filter banks using optimal sub-band tree structuring based on the wavelet transform. The optimised wavelet filter banks, followed by energy, logarithm, discrete cosine transform and cepstral mean normalisation blocks, form the feature extraction method. The method is validated on a hidden Markov model (HMM)-based single-Gaussian isolated word recognition system under additive white Gaussian noise, street noise and airport noise at different noise levels. Compared with Fourier transform-based methods such as mel-frequency cepstral coefficient (MFCC) and perceptual linear predictive (PLP) methods, the wavelet transform-based method yielded significant improvement across all noise levels. Experiments were also performed with higher-dimensional MFCC features that include delta and acceleration coefficients (MFCC_D_A). The wavelet transform-based method increased recognition accuracy by 13% over MFCC_D_A for non-stationary noises.
Keywords: speech recognition; feature extraction; wavelet transform; robust; noisy environments; mel-frequency cepstral coefficient; MFCC; perceptual linear predictive; PLP.
International Journal of Computer Aided Engineering and Technology, 2019 Vol.11 No.2, pp.163 - 173
Received: 17 May 2016
Accepted: 08 Dec 2016
Published online: 26 Dec 2018
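
Illustrative sketch: the abstract describes a chain of wavelet filter bank, energy, logarithm, discrete cosine transform and cepstral mean normalisation blocks. The Python sketch below shows one way such a chain could be wired together; it is not the authors' implementation. Several choices are assumptions made only for illustration: the Haar wavelet, a plain dyadic tree of depth 3 (standing in for the optimised sub-band tree, which the abstract does not specify), an unnormalised DCT-II, and the hypothetical function names `haar_split`, `subband_energies`, `dct2` and `wavelet_cepstra`.

```python
import numpy as np

def haar_split(x):
    """One Haar analysis step: return (low band, high band) at half length."""
    x = x[: len(x) // 2 * 2]                 # drop a trailing odd sample
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def subband_energies(frame, depth=3):
    """Energy per sub-band of a dyadic tree (assumed tree, not the paper's)."""
    bands, low = [], np.asarray(frame, dtype=float)
    for _ in range(depth):
        low, high = haar_split(low)          # keep splitting the low band
        bands.append(high)
    bands.append(low)
    bands.reverse()                          # order bands from low to high frequency
    return np.array([np.sum(b ** 2) for b in bands])

def dct2(v):
    """Unnormalised DCT-II of a 1-D vector."""
    n = len(v)
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    return np.cos(np.pi * k * (2 * m + 1) / (2 * n)) @ v

def wavelet_cepstra(frames, depth=3, eps=1e-10):
    """Energy -> log -> DCT per frame, then cepstral mean normalisation."""
    feats = np.array(
        [dct2(np.log(subband_energies(f, depth) + eps)) for f in frames]
    )
    return feats - feats.mean(axis=0)        # subtract per-coefficient mean over frames

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frames = rng.standard_normal((10, 256))  # 10 synthetic frames of 256 samples
    print(wavelet_cepstra(frames).shape)
```

With `depth=3` each frame yields four coefficients (three high bands plus the final low band). A real system would replace the dyadic tree with a sub-band tree chosen to approximate an auditory scale, as the abstract indicates.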