Title: Automatic speaker verification system using three dimensional static and contextual variation-based features with two dimensional convolutional neural network

Authors: Aakshi Mittal; Mohit Dua

Addresses: Department of Computer Engineering, National Institute of Technology Kurukshetra, Haryana, India; Department of Computer Engineering, National Institute of Technology Kurukshetra, Haryana, India

Abstract: Automatic speaker verification (ASV) systems are being used as potential alternatives for authentication in security systems. This paper discusses the development of an ASV system trained on the logical access (LA) and physical access (PA) sets of the ASVspoof 2019 dataset. ASV systems have two parts: a frontend and a backend. The frontend of the proposed system extracts 30 static, 30 first-order, and 30 second-order constant Q cepstral coefficient (CQCC) features from each frame of an audio signal. These features are reshaped into three-dimensional (3D) tensors of two-dimensional (2D) slices with a chosen fixed number of frames. A two-dimensional convolutional neural network (2D CNN) is trained in the backend on these features. The proposed system achieves 0.055 equal error rate (EER) and 0.101 tandem detection cost function (tDCF) for the LA set, and 0.062 EER and 0.122 tDCF for the PA set of this dataset.
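
The reshaping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the 30 static CQCCs per frame have already been extracted, and it assumes a simple central-difference delta, a wrap-around padding rule, and a slice order of static, first-order, second-order. The names `delta`, `fix_length`, `to_3d_tensor`, and the value of `FIXED_FRAMES` are hypothetical, since the abstract does not state the chosen number of frames.

```python
from typing import List

Matrix = List[List[float]]  # one 2D slice: frames x coefficients

N_COEFFS = 30       # static CQCCs per frame, as stated in the abstract
FIXED_FRAMES = 400  # hypothetical value; the abstract does not give it


def delta(feats: Matrix) -> Matrix:
    """Central difference along time, repeating the edge frames.

    This delta scheme is an assumption; the abstract only says that
    first- and second-order coefficients are extracted.
    """
    n = len(feats)
    out = []
    for t in range(n):
        prev = feats[max(t - 1, 0)]
        nxt = feats[min(t + 1, n - 1)]
        out.append([(b - a) / 2.0 for a, b in zip(prev, nxt)])
    return out


def fix_length(feats: Matrix, n_frames: int) -> Matrix:
    """Truncate long inputs, or wrap around short ones, to n_frames rows."""
    if not feats:
        raise ValueError("need at least one frame")
    return [list(feats[t % len(feats)]) for t in range(n_frames)]


def to_3d_tensor(static: Matrix, n_frames: int = FIXED_FRAMES) -> List[Matrix]:
    """Stack static, first-order and second-order features as three 2D slices."""
    d1 = delta(static)  # 30 first-order coefficients per frame
    d2 = delta(d1)      # 30 second-order coefficients per frame
    return [fix_length(m, n_frames) for m in (static, d1, d2)]


if __name__ == "__main__":
    # Toy input: 5 frames whose coefficients all equal the frame index.
    toy = [[float(t)] * N_COEFFS for t in range(5)]
    tensor = to_3d_tensor(toy, n_frames=8)
    print(len(tensor), len(tensor[0]), len(tensor[0][0]))  # 3 8 30
```

Under these assumptions each utterance becomes a 3 x frames x 30 tensor, so the 90 per-frame features (30 static, 30 first-order, 30 second-order) are presented to a 2D CNN as three aligned 2D slices instead of one flat vector per frame.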

Keywords: contextual variation; three dimensional features; CQCC features; 2D CNN.

DOI: 10.1504/IJSI.2021.118608

International Journal of Swarm Intelligence, 2021 Vol.6 No.2, pp.143 - 153

Received: 27 Jun 2020
Accepted: 27 Nov 2020
Published online: 29 Oct 2021
