
Title: Application of deep learning approach for recognition of voiced Odia digits

Authors: Prithviraj Mohanty; Jyoti Prakash Sahoo; Ajit Kumar Nayak

Addresses: Department of Computer Science and Information Technology, ITER, S'O'A (Deemed to be University), Bhubaneswar, India (all three authors)

Abstract: Automatic speech recognition for a regional language such as Odia is a challenging research problem. Recognition of voiced Odia digits helps in designing automatic voice dialler systems. This study uses a deep learning approach to recognise voiced Odia digits. After feature extraction using MFCC, the spectrogram representation of the voiced samples is given as input to the deep learning models. Performance metrics are obtained from several experiments that vary the number of epochs and the train-validate-test split of the dataset. The experiments show that the CNN model achieves an accuracy of 91.72% at 500 epochs with an 80-10-10 split, higher than the other two models, which use VSL and DNN. The reported results also indicate that the proposed CNN model has better average recognition accuracy than contemporary models such as HMM and SVM.
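The 80-10-10 train-validate-test split mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_indices`, the fixed seed, and the dataset size of 1000 samples are assumptions made purely for illustration, and the paper's actual dataset size is not stated here.

```python
import random


def split_indices(n_samples, percents=(80, 10, 10), seed=0):
    """Shuffle sample indices and split them into train/validation/test lists.

    `percents` holds integer percentages that must sum to 100.
    Integer arithmetic avoids floating-point rounding surprises.
    """
    if sum(percents) != 100:
        raise ValueError("percentages must sum to 100")

    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle

    n_train = n_samples * percents[0] // 100
    n_val = n_samples * percents[1] // 100

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


# Hypothetical dataset of 1000 recorded digit samples (illustrative size only).
train_idx, val_idx, test_idx = split_indices(1000)
print(len(train_idx), len(val_idx), len(test_idx))
```

Shuffling before slicing keeps each subset a random sample of the recordings. A stratified split per digit class would be a reasonable alternative when class balance matters.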

Keywords: automatic speech recognition; ASR; convolutional neural network; CNN; deep neural network; DNN; MFCC; HMM; SVM; spectrogram.

DOI: 10.1504/IJCSE.2022.126254

International Journal of Computational Science and Engineering, 2022 Vol.25 No.5, pp.513 - 522

Received: 01 May 2021
Accepted: 01 Sep 2021
Published online: 18 Oct 2022


