Multimodal systems for speech recognition
Online publication date: Mon, 04-May-2020
by Orken Zh. Mamyrbayev; Keylan Alimhan; Beibut Amirgaliyev; Bagashar Zhumazhanov; Dinara Mussayeva; Farida Gusmanova
International Journal of Mobile Communications (IJMC), Vol. 18, No. 3, 2020
Abstract: In this article, we implement a multimodal recognition system for Kazakh speech that combines acoustic speech recognition with lip-movement recognition. In the feature extraction phase, several methods are used: voice activity detection (VAD), mel-frequency cepstral coefficients, perceptual linear prediction, relative perceptual linear prediction, and their first-order time derivatives. The article considers the main problems of Kazakh speech recognition, VAD algorithms and speech segmentation, and lip movement recognition. It describes probabilistic modelling of audiovisual speech based on coupled hidden Markov models (HMMs), information fusion methods that assign weight coefficients to the audio and video speech modalities, and the parametric representation of signals. Quantitative results on multimodal recognition of continuous Kazakh speech indicate high accuracy and reliability of the automatic system. The approach has also been evaluated and compared in terms of computational time and recognition speed, and the comparison gives interesting results.
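The abstract mentions fusing the audio and video modalities with weight coefficients but does not give the exact formula. A common convention in audiovisual HMM recognition is to weight each stream's log-likelihood by an exponent, with the weights summing to one, so that a noisy audio channel can be down-weighted in favour of the lips. The sketch below illustrates only that weighting step under this assumption. All function names, the constraint `lambda_video = 1 - lambda_audio`, and the toy scores are hypothetical and not taken from the paper. A coupled HMM additionally lets the audio and video chains occupy different states at the same time, which this sketch does not model.

```python
from typing import Dict, List, Tuple

# Per-frame (audio, video) log-likelihoods for one hypothesis (hypothetical format).
FrameScores = List[Tuple[float, float]]


def fused_log_likelihood(log_b_audio: float, log_b_video: float,
                         lambda_audio: float) -> float:
    """Weighted combination of audio and video stream log-likelihoods.

    Assumption: the video weight is 1 - lambda_audio, so the two stream
    exponents sum to one (a common multistream-HMM convention, not
    necessarily the paper's exact scheme).
    """
    if not 0.0 <= lambda_audio <= 1.0:
        raise ValueError("lambda_audio must lie in [0, 1]")
    lambda_video = 1.0 - lambda_audio
    return lambda_audio * log_b_audio + lambda_video * log_b_video


def sequence_score(frames: FrameScores, lambda_audio: float) -> float:
    """Sum the fused per-frame log-likelihoods over an utterance."""
    return sum(fused_log_likelihood(a, v, lambda_audio) for a, v in frames)


def best_hypothesis(hypotheses: Dict[str, FrameScores],
                    lambda_audio: float) -> str:
    """Return the hypothesis label with the highest fused score."""
    return max(hypotheses,
               key=lambda h: sequence_score(hypotheses[h], lambda_audio))


if __name__ == "__main__":
    # Toy numbers (not from the paper): audio favours "A", video favours "B".
    hyps = {
        "A": [(-1.0, -4.0), (-1.5, -4.5)],
        "B": [(-3.0, -1.0), (-3.5, -1.2)],
    }
    print(best_hypothesis(hyps, lambda_audio=0.8))  # clean audio, trust audio -> A
    print(best_hypothesis(hyps, lambda_audio=0.2))  # noisy audio, trust lips -> B
```

In this toy case the decision flips from "A" to "B" as the audio weight drops, which is the behaviour weighted fusion is meant to provide when acoustic conditions degrade. In practice the weights would be tuned on held-out data or estimated from an acoustic noise measure.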