Title: Isolated spoken word recognition using packed-MFCC on padded-voice signal for unscripted languages

Authors: Rajdev Tiwari; Vidha Sharma; Ramesh Chandra Sahoo

Addresses: Greater Noida Institute of Technology, Plot No. 7, Knowledge Park-2, Greater Noida, India; Greater Noida Institute of Technology, Plot No. 7, Knowledge Park-2, Greater Noida, India; Utkal University, Vani Vihar, Bhubaneswar, Odisha, India

Abstract: Voice-based applications such as Alexa, Siri and Google Assistant have become very common. These voice-operated devices are built for scripted languages, which have their own alphabets, phonemes and grammar, whereas languages of oral tradition do not have all of these. Because of this fundamental difference between scripted and unscripted languages, techniques used for languages like English are unsuited to languages of oral tradition. In this paper, an isolated spoken word recognition system for unscripted languages is modelled. The model uses a packed Mel frequency cepstral coefficients (packed-MFCC) feature computed over a padded-voice signal, with a support vector machine (SVM) as the classifier. The model is tested and compared against various other statistical features and against other classifiers such as K-nearest neighbour (KNN) and stochastic gradient descent (SGD). SVM is found to give the best recognition accuracy on a data set of 8,900 samples of Kurukh, a language spoken by the Oraon community.
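The pipeline named in the abstract (padded voice signal, packed MFCC feature, SVM classifier) can be illustrated with a minimal sketch. The abstract does not define "packed" or give any parameters, so everything below is an assumption: "packing" is taken to mean flattening the per-frame MFCC matrix of a zero-padded signal into one fixed-length vector, the function names (`pad_signal`, `mfcc`, `packed_mfcc`) are hypothetical, the frame sizes and filter counts are arbitrary, and synthetic tones stand in for real Kurukh recordings. It should not be read as the authors' implementation.

```python
import numpy as np
from sklearn.svm import SVC


def pad_signal(signal, target_len):
    """Zero-pad (or truncate) a 1-D signal to exactly target_len samples."""
    signal = np.asarray(signal, dtype=float)[:target_len]
    return np.pad(signal, (0, target_len - len(signal)))


def mel_filterbank(n_filters, n_fft, sr):
    """Triangular mel filters, shape (n_filters, n_fft // 2 + 1)."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank


def mfcc(signal, sr, n_fft=256, hop=128, n_filters=20, n_coeffs=13):
    """Return MFCC frames, shape (n_frames, n_coeffs)."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Small constant avoids log(0) in the silent (zero-padded) frames.
    log_mel = np.log(power @ mel_filterbank(n_filters, n_fft, sr).T + 1e-10)
    # Unnormalised DCT-II over the filter axis, keeping the first n_coeffs.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return log_mel @ basis.T


def packed_mfcc(signal, sr, target_len):
    """Assumed 'packing': pad the signal, then flatten all MFCC frames."""
    return mfcc(pad_signal(signal, target_len), sr).ravel()


# Synthetic stand-in data: two "words" represented by noisy tones of varying length.
rng = np.random.default_rng(0)
sr, target_len = 8000, 8000


def tone(freq, n_samples):
    t = np.arange(n_samples) / sr
    return np.sin(2 * np.pi * freq * t) + 0.05 * rng.standard_normal(n_samples)


signals = [tone(f, rng.integers(3000, 7000)) for f in (300, 1200) for _ in range(10)]
labels = [0] * 10 + [1] * 10

# Every utterance maps to a vector of the same length, whatever its duration.
X = np.stack([packed_mfcc(s, sr, target_len) for s in signals])
clf = SVC(kernel="linear").fit(X, labels)
predictions = clf.predict(X)
```

Padding before feature extraction is what makes the flattened vectors equal in length, which a standard SVM requires. The linear kernel is an arbitrary choice for this sketch, and a real evaluation would score the classifier on held-out recordings rather than on the training set.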

Keywords: speech recognition; language translation; support vector machine; SVM; K-nearest neighbour; KNN; stochastic gradient descent; SGD; packed-MFCC; isolated word recognition; word error rate; WER; oral tradition languages; Oraon.

DOI: 10.1504/IJCVR.2022.121186

International Journal of Computational Vision and Robotics, 2022 Vol.12 No.2, pp.120-140

Received: 27 Aug 2020
Accepted: 29 Nov 2020

Published online: 28 Feb 2022
