Article: Research on x-vector speaker recognition algorithm based on Kaldi. Journal: International Journal of Computing Science and Mathematics (IJCSM) 2022 Vol.15 No.3 pp.199 - 212

Title: Research on x-vector speaker recognition algorithm based on Kaldi

Authors: Hong Zhao; Lupeng Yue; Weijie Wang; Xiangyan Zeng

Addresses: School of Computer Science, Lanzhou University of Technology, Gansu, Lanzhou, 730050, China; School of Computer Science, Lanzhou University of Technology, Gansu, Lanzhou, 730050, China; School of Computer Science, Lanzhou University of Technology, Gansu, Lanzhou, 730050, China; Department of Mathematics and Computer Science, Fort Valley State University, Fort Valley, GA, 31030, Georgia, USA

Abstract: This paper presents a convolutional neural network with an attention mechanism for analysing the spectrogram in an x-vector-based speaker recognition system. First, a convolutional neural network (CNN) extracts features from the spectrogram. Then, an attention mechanism computes the frame weights in the statistics pooling layer. Finally, probabilistic linear discriminant analysis (PLDA) is used as the back-end classifier. The system is implemented with the Kaldi speech recognition toolkit and evaluated on the VoxCeleb1 database. Experimental results show that combining the spectrogram with the CNN yields a relative improvement of 6.7% in equal error rate (EER) over the x-vector baseline system. Applying the attention mechanism in the statistics pooling layer yields a further relative improvement of 26.1%. Overall, the proposed method outperforms state-of-the-art methods on the VoxCeleb1 database.
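The abstract does not give the attention equations, so the following is only a hedged illustration of the general idea of attention-weighted statistics pooling. It follows the widely used attentive statistics pooling formulation of Okabe et al. (2018), which may differ from the authors' own design: each frame-level vector h_t receives a score e_t = vᵀ tanh(W h_t + b), the scores are softmax-normalised into frame weights α_t, and the utterance-level vector concatenates the weighted mean and weighted standard deviation. The parameters `W`, `b` and `v` are hypothetical toy values here; in a real x-vector network they would be learned jointly with the rest of the model.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def softmax(scores: Sequence[float]) -> Vector:
    """Numerically stable softmax over per-frame attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def frame_scores(frames: Sequence[Vector], W: Sequence[Vector],
                 b: Vector, v: Vector) -> Vector:
    """Scalar score e_t = v . tanh(W h_t + b) for every frame h_t."""
    scores = []
    for h in frames:
        hidden = [math.tanh(sum(w * x for w, x in zip(row, h)) + b_i)
                  for row, b_i in zip(W, b)]
        scores.append(sum(v_i * a for v_i, a in zip(v, hidden)))
    return scores


def attentive_stats_pooling(frames: Sequence[Vector], W: Sequence[Vector],
                            b: Vector, v: Vector,
                            eps: float = 1e-8) -> Tuple[Vector, Vector]:
    """Return (alpha, pooled), where pooled = weighted mean ++ weighted std."""
    alpha = softmax(frame_scores(frames, W, b, v))
    dim = len(frames[0])
    mean = [sum(a * h[d] for a, h in zip(alpha, frames)) for d in range(dim)]
    second = [sum(a * h[d] ** 2 for a, h in zip(alpha, frames))
              for d in range(dim)]
    # Var = E[h^2] - E[h]^2; clamp at eps to absorb rounding error near zero
    std = [math.sqrt(max(s - m * m, eps)) for s, m in zip(second, mean)]
    return alpha, mean + std


if __name__ == "__main__":
    # Toy input: 4 frames of 3-dimensional frame-level features
    frames = [[1.0, 0.0, 2.0], [0.5, 1.0, 1.0], [2.0, -1.0, 0.0], [1.5, 0.5, 1.0]]
    # Hypothetical attention parameters (learned in a real network)
    W = [[0.5, -0.2, 0.1], [0.0, 0.3, -0.4]]
    b = [0.0, 0.1]
    v = [1.0, -1.0]
    weights, pooled = attentive_stats_pooling(frames, W, b, v)
    print([round(w, 3) for w in weights])
    print(len(pooled))  # 6: three means followed by three standard deviations
```

When every score is equal (for example with `v` set to zeros), the weights become uniform and the function reduces to the plain mean-and-standard-deviation statistics pooling used in the standard x-vector architecture. This is the sense in which attention only changes how much each frame contributes to the pooled statistics.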

Keywords: spectrogram; attention mechanism; x-vector; speaker recognition; Kaldi.

DOI: 10.1504/IJCSM.2022.124725

International Journal of Computing Science and Mathematics, 2022 Vol.15 No.3, pp.199 - 212

Received: 20 Jun 2021
Accepted: 28 Sep 2021
Published online: 08 Aug 2022


