Automatic speaker verification system using three dimensional static and contextual variation-based features with two dimensional convolutional neural network
Online publication date: Fri, 29-Oct-2021
by Aakshi Mittal; Mohit Dua
International Journal of Swarm Intelligence (IJSI), Vol. 6, No. 2, 2021
Abstract: Automatic speaker verification (ASV) systems are being used as potential alternatives for authentication in security systems. This paper discusses the development of an ASV system trained on the logical access (LA) and physical access (PA) sets of the ASVspoof 2019 dataset. ASV systems have two parts: a frontend and a backend. The frontend of the proposed system extracts 30 static, 30 first-order and 30 second-order constant Q cepstral coefficients (CQCC) from each frame of an audio signal. These features are reshaped into three-dimensional (3D) tensors of two-dimensional (2D) slices, each with a chosen fixed number of frames. A two-dimensional convolutional neural network (2D CNN) is trained in the backend with these features. The proposed system achieves an equal error rate (EER) of 0.055 and a tandem detection cost function (tDCF) of 0.101 on the LA set, and an EER of 0.062 and a tDCF of 0.122 on the PA set.
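The feature pipeline and the EER metric summarised in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' code: it takes a precomputed matrix of 30 static CQCCs per frame as input (CQCC extraction itself, which involves a constant-Q transform, is omitted), computes first- and second-order coefficients with the common regression formula over ±2 frames, stacks the three 2D slices (frames × coefficients) into one 3D tensor, and forces a fixed frame count by truncation or repeat-padding. The helper names (`deltas`, `fix_length`, `build_tensor`, `equal_error_rate`), the channel-first layout, the 400-frame default and the padding strategy are all hypothetical choices. `equal_error_rate` shows how an EER such as the reported 0.055 is defined: the operating point at which the false acceptance and false rejection rates meet.

```python
import numpy as np


def deltas(feats: np.ndarray, n: int = 2) -> np.ndarray:
    """Regression-based deltas along the time axis.

    feats has shape (n_frames, n_coeffs). Edge frames are handled by
    repeating the first/last frame (an assumption; toolkits differ).
    """
    n_frames = feats.shape[0]
    padded = np.pad(feats, ((n, n), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, n + 1))
    out = np.zeros_like(feats, dtype=float)
    for k in range(1, n + 1):
        # frame t+k minus frame t-k, weighted by k
        out += k * (padded[n + k : n + k + n_frames] - padded[n - k : n - k + n_frames])
    return out / denom


def fix_length(feats: np.ndarray, n_frames: int) -> np.ndarray:
    """Truncate, or repeat-pad by tiling, to exactly n_frames rows (hypothetical strategy)."""
    if feats.shape[0] >= n_frames:
        return feats[:n_frames]
    reps = int(np.ceil(n_frames / feats.shape[0]))
    return np.tile(feats, (reps, 1))[:n_frames]


def build_tensor(static: np.ndarray, n_frames: int = 400) -> np.ndarray:
    """Stack static, first-order and second-order coefficients as three 2D slices.

    Returns shape (3, n_frames, n_coeffs). The 400-frame default is illustrative only.
    """
    d1 = deltas(static)   # first-order (velocity) coefficients
    d2 = deltas(d1)       # second-order (acceleration) coefficients
    stacked = np.stack([static, d1, d2], axis=0)  # (3, T, C)
    return np.stack([fix_length(s, n_frames) for s in stacked], axis=0)


def equal_error_rate(bona_scores, spoof_scores) -> float:
    """EER: where false-acceptance rate equals false-rejection rate.

    Higher score = more likely bona fide. Every observed score is tried as a
    threshold; the mean of FAR and FRR at the closest crossing is returned.
    """
    bona = np.asarray(bona_scores, dtype=float)
    spoof = np.asarray(spoof_scores, dtype=float)
    thresholds = np.sort(np.concatenate([bona, spoof]))
    frr = np.array([(bona < t).mean() for t in thresholds])   # bona fide rejected
    far = np.array([(spoof >= t).mean() for t in thresholds])  # spoofs accepted
    i = np.argmin(np.abs(far - frr))
    return float((far[i] + frr[i]) / 2)


if __name__ == "__main__":
    static = np.random.default_rng(0).normal(size=(250, 30))  # stand-in for real CQCCs
    print(build_tensor(static).shape)  # (3, 400, 30)
    print(equal_error_rate([0.6, 0.4], [0.5, 0.3]))  # 0.5
```

A 2D CNN could then treat the three slices as input channels, much as it treats the colour planes of an image; the abstract does not say which tensor layout the original system uses, so the channel-first ordering here is only one plausible reading.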