Article: Recurrent neural network-based speech recognition using MATLAB
Journal: International Journal of Intelligent Enterprise (IJIE) 2020 Vol.7 No.1/2/3 pp.56 - 66

Title: Recurrent neural network-based speech recognition using MATLAB

Authors: Praveen Edward James; Mun Hou Kit; Chockalingam Aravind Vaithilingam; Alan Tan Wee Chiat

Addresses: School of Engineering, Taylor's University, Taylor's University Lakeside Campus, No. 1, Jalan Taylor's, 47500 Subang Jaya, Selangor, Malaysia; School of Engineering, Taylor's University, Taylor's University Lakeside Campus, No. 1, Jalan Taylor's, 47500 Subang Jaya, Selangor, Malaysia; School of Engineering, Taylor's University, Taylor's University Lakeside Campus, No. 1, Jalan Taylor's, 47500 Subang Jaya, Selangor, Malaysia; Multimedia University, Jalan Ayer Keroh Lama, 75450 Bukit Beruang, Melaka, Malaysia

Abstract: The purpose of this paper is to design an efficient recurrent neural network (RNN)-based speech recognition system in software using long short-term memory (LSTM). The design process involves speech acquisition, pre-processing, feature extraction, training and pattern recognition tasks for a spoken sentence recognition system using LSTM-RNN. The network has five layers, namely an input layer, a fully connected layer, a hidden LSTM layer, a SoftMax layer and a sequential output layer. A vocabulary of 80 words, which constitute 20 sentences, is used. The depth of the hidden layer is set to 20, 42 and 60, and the accuracy of each system is determined. The results reveal that a maximum accuracy of 89% is achieved when the depth of the hidden layer is 42. Since the depth of the hidden layer is fixed for a task, increased performance can be achieved by increasing the number of hidden layers.
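
To give a rough sense of how the three hidden-layer settings differ in model size, the sketch below counts trainable parameters for a standard LSTM layer followed by a fully connected layer. It is an illustration only and not the authors' MATLAB implementation. Several things are assumptions: that "depth of the hidden layer" means the number of LSTM hidden units, that the LSTM output feeds the fully connected layer (the abstract does not confirm the layer order), and that each frame has 13 features (`NUM_FEATURES` is a hypothetical value, since the abstract does not state the feature dimension). The 20 output classes and the hidden sizes 20, 42 and 60 come from the abstract.

```python
def lstm_param_count(input_size, hidden_size):
    """Trainable parameters of one standard LSTM layer (no peepholes).

    Each of the four gates (input, forget, cell candidate, output) has
    input weights, recurrent weights and a bias vector.
    """
    return 4 * hidden_size * (input_size + hidden_size + 1)


def dense_param_count(input_size, output_size):
    """Trainable parameters of a fully connected layer (weights + biases)."""
    return input_size * output_size + output_size


NUM_FEATURES = 13            # ASSUMPTION: features per frame, not given in the abstract
NUM_CLASSES = 20             # from the abstract: 20 sentences
HIDDEN_SIZES = (20, 42, 60)  # from the abstract: hidden-layer settings tested


def total_params(hidden_size, num_features=NUM_FEATURES, num_classes=NUM_CLASSES):
    """Parameters of LSTM -> fully connected (assumed order); softmax adds none."""
    return (lstm_param_count(num_features, hidden_size)
            + dense_param_count(hidden_size, num_classes))


if __name__ == "__main__":
    for h in HIDDEN_SIZES:
        print(f"hidden units = {h:2d}: {total_params(h)} trainable parameters")
```

Under these assumptions the parameter count grows roughly quadratically with the number of hidden units, because the recurrent weight matrix scales with the square of the hidden size. A different feature dimension or layer order would change the absolute numbers but not that trend.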

Keywords: speech recognition; feature extraction; pre-processing; recurrent neural network; RNN; long short-term memory; LSTM; hidden layer; MATLAB.

DOI: 10.1504/IJIE.2020.104645

International Journal of Intelligent Enterprise, 2020 Vol.7 No.1/2/3, pp.56 - 66

Received: 29 Jun 2018
Accepted: 25 Sep 2018
Published online: 27 Jan 2020
