Article: Hindi viseme recognition using subspace DCT features. Journal: International Journal of Applied Pattern Recognition (IJAPR), 2014 Vol.1 No.3, pp.257 - 272. Publisher: Inderscience Publishers.

Title: Hindi viseme recognition using subspace DCT features

Authors: Priyanka Varshney; Omar Farooq; Prashant Upadhyaya

Addresses: Department of Electronics Engineering, Aligarh Muslim University, Aligarh 202002, India (all three authors)

Abstract: Many automatic speech recognition (ASR) applications operate in noisy background conditions; hence, robustness has become an important area of research. Automated recognition of human speech using visual features alongside audio has proved useful under these conditions. In this paper, the use of visual information is proposed to increase the recognition performance and robustness of a Hindi viseme recognition system. A database comprising ten Hindi sentences uttered by five different speakers was prepared. Audio features based on mel frequency cepstral coefficients (MFCC) were extracted, and a subspace-based discrete cosine transform (DCT) was applied to extract visual features. The visual features were integrated with the audio features before a discriminant function-based classifier was applied to five Hindi viseme classes. Integrating visual features improved viseme recognition for both clean and noisy speech. A maximum improvement of 6.67% in viseme recognition accuracy was found at −10 dB SNR using a Mahalanobis distance-based classifier for speech corrupted by car noise.
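
The fusion and classification steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes per-frame audio and visual feature vectors are already available (synthetic data stands in for them here), concatenates the two, and assigns each vector to the class with the smallest Mahalanobis distance. The function names, the feature dimensions (13 audio, 6 visual) and the synthetic data are hypothetical choices made only for illustration.

```python
import numpy as np


def fuse(audio, visual):
    """Concatenate audio and visual feature vectors frame by frame."""
    return np.concatenate([audio, visual], axis=1)


def fit_class_stats(features, labels):
    """Estimate a mean vector and an inverse covariance matrix per class."""
    stats = {}
    for c in np.unique(labels):
        x = features[labels == c]
        mean = x.mean(axis=0)
        cov = np.cov(x, rowvar=False)
        # Small ridge term (an assumption) keeps the covariance invertible.
        cov = cov + 1e-6 * np.eye(cov.shape[0])
        stats[int(c)] = (mean, np.linalg.inv(cov))
    return stats


def classify(stats, x):
    """Return the class whose mean is nearest to x in Mahalanobis distance."""
    best_class, best_dist = None, np.inf
    for c, (mean, inv_cov) in stats.items():
        d = x - mean
        dist = float(d @ inv_cov @ d)  # squared Mahalanobis distance
        if dist < best_dist:
            best_class, best_dist = c, dist
    return best_class


# Synthetic stand-in data: 5 classes, 40 frames each (hypothetical sizes).
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(5), 40)
audio = rng.normal(size=(200, 13)) + 3.0 * labels[:, None]
visual = rng.normal(size=(200, 6)) + 3.0 * labels[:, None]

fused = fuse(audio, visual)
stats = fit_class_stats(fused, labels)
predicted = np.array([classify(stats, x) for x in fused])
accuracy = float((predicted == labels).mean())
print(f"training-set accuracy on synthetic data: {accuracy:.2f}")
```

Because the synthetic classes are deliberately well separated, the accuracy printed here says nothing about the results reported in the paper; the sketch only shows the mechanics of feature concatenation and minimum-distance classification.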

Keywords: automatic speech recognition; viseme recognition; phoneme recognition; feature extraction; mel frequency cepstral coefficient; MFCC; discrete wavelet transform; DWT; discrete cosine transform; DCT; Hindi; audio features; video features; car noise; vehicle noise.

DOI: 10.1504/IJAPR.2014.065768

International Journal of Applied Pattern Recognition, 2014 Vol.1 No.3, pp.257 - 272

Received: 30 Mar 2013
Accepted: 31 Aug 2013
Published online: 29 Nov 2014
