Title: The optimisation of speech recognition based on convolutional neural network

Authors: Weipeng Jing; Tao Jiang; Xingge Zhang; Liangkuan Zhu

Addresses: College of Information and Computer Engineering, Northeast Forestry University, China; Heilongjiang Province Engineering Technology Research Center for Forestry Ecological Big Data Storage and High Performance (Cloud) Computing, Harbin, China ' College of Information and Computer Engineering, Northeast Forestry University, China; Heilongjiang Province Engineering Technology Research Center for Forestry Ecological Big Data Storage and High Performance (Cloud) Computing, Harbin, China ' Architecture Design Group of CPU, Suzhou PowerCore Technology Company, Jiangsu Province, Suzhou, China ' College of Electromechanical Engineering, Northeast Forestry University, Harbin, China

Abstract: A convolutional neural network (CNN) is introduced as the acoustic model in a speech recognition system based on mobile computing. To improve recognition accuracy, two optimisation methods are proposed for CNN-based speech recognition. Firstly, to address the problem that existing pooling algorithms ignore the locally relevant characteristics of speech data, a dynamic adaptive pooling (DA-pooling) algorithm is proposed for the pooling layer of the CNN model. The DA-pooling algorithm calculates the Spearman correlation coefficient of the extracted data to determine data correlation, then selects an appropriate pooling strategy for data of differing correlation according to weight. Secondly, to address the problem that traditional dropout hides neuron nodes randomly, a sparseness-based dropout strategy is proposed for the fully connected layer of the CNN model. By adding a unit sparseness determination mechanism at the output stage of each network unit, the influence of smaller units on the model results is reduced, thereby improving the generalisation ability of the model. Experimental results show that these strategies can improve the performance of CNN-based acoustic models.
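
The abstract does not spell out how DA-pooling maps correlation to a pooling choice, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes (hypothetically) that the Spearman coefficient is computed between two adjacent pooling regions, and that its absolute value is used as a weight blending average pooling and max pooling. The function names `ranks`, `spearman` and `adaptive_pool`, and the blending rule itself, are illustrative assumptions.

```python
def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman coefficient, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant input: coefficient undefined, treated as 0 here
    return cov / (var_x * var_y) ** 0.5


def adaptive_pool(region_a, region_b):
    """Pool region_a, blending mean and max by its correlation with region_b.

    Assumption (not from the paper): strongly correlated regions lean
    towards average pooling, weakly correlated ones towards max pooling.
    """
    weight = abs(spearman(region_a, region_b))
    mean_val = sum(region_a) / len(region_a)
    max_val = max(region_a)
    return weight * mean_val + (1 - weight) * max_val
```

Under these assumptions, two perfectly rank-correlated regions reduce to plain average pooling, while a region paired with a constant neighbour falls back to max pooling. The paper's actual weighting scheme may differ.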

Keywords: convolutional neural network; CNN; speech recognition; dynamic adaptive pooling; DA-pooling; overfitting; sparseness.

DOI: 10.1504/IJHPCN.2019.097502

International Journal of High Performance Computing and Networking, 2019 Vol.13 No.2, pp.222 - 231

Received: 31 Jul 2016
Accepted: 12 Sep 2016

Published online: 25 Jan 2019
