Title: Defending against audio adversarial examples based on multiple-sub-detectors

Authors: Keiichi Tamura; Hajime Ito

Addresses: Graduate School of Information Sciences, Hiroshima City University, 3-4-1, Ozuka-Higashi, Asa-Minami-Ku, Hiroshima, 731-3194, Japan; Faculty of Information Sciences, Hiroshima City University, 3-4-1, Ozuka-Higashi, Asa-Minami-Ku, Hiroshima, 731-3194, Japan

Abstract: Audio adversarial examples are audio inputs crafted to deceive speech-to-text (STT) transcription neural networks. In this paper, we propose a new defence method that bases its decision on multiple sub-detectors to secure STT transcription neural networks against adversarial examples. The method utilises a detector for audio adversarial examples that consists of three different sub-detectors. The results of the three sub-detectors are then combined to make the final decision, which makes the detector more robust and accurate than a single detection method. Experiments were conducted to evaluate the method using 2,000 voice samples, comprising 1,000 normal voice samples and 1,000 audio adversarial examples, created on the Mozilla implementation of DeepSpeech. The experimental results confirmed that the proposed method offers better protection against audio adversarial examples than our previous method.
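
The decision step described in the abstract, in which the outputs of three sub-detectors are combined into one verdict, can be illustrated with a minimal Python sketch. The abstract does not say how the results are combined, so the strict majority vote used here is an assumption, not the authors' rule. The names `SubDetector`, `combine_votes` and `detect`, and the three toy sub-detectors, are hypothetical placeholders that do not correspond to the sub-detectors in the paper.

```python
from typing import Callable, Sequence

# Hypothetical type: a sub-detector takes audio samples and returns
# True if it judges the input to be an adversarial example.
SubDetector = Callable[[Sequence[float]], bool]


def combine_votes(votes: Sequence[bool]) -> bool:
    """Assumed decision rule: flag the input when a strict majority votes True."""
    return sum(votes) * 2 > len(votes)


def detect(audio: Sequence[float], sub_detectors: Sequence[SubDetector]) -> bool:
    """Run every sub-detector on the same audio and combine their votes."""
    votes = [detector(audio) for detector in sub_detectors]
    return combine_votes(votes)


# Toy stand-ins for illustration only; real sub-detectors would inspect
# the audio or the STT transcription, not these simple statistics.
def peak_detector(audio: Sequence[float]) -> bool:
    return max(abs(x) for x in audio) > 0.9


def mean_detector(audio: Sequence[float]) -> bool:
    return abs(sum(audio) / len(audio)) > 0.5


def length_detector(audio: Sequence[float]) -> bool:
    return len(audio) > 4


if __name__ == "__main__":
    sample = [0.95, 0.8, 0.7, 0.6]  # made-up values, not real audio
    print(detect(sample, [peak_detector, mean_detector, length_detector]))
```

With three sub-detectors, a majority vote tolerates one wrong sub-detector in either direction. Other combination rules (any-of, all-of, weighted scores) fit the same interface by replacing `combine_votes`.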

Keywords: adversarial example; speech-to-text; STT; audio adversarial example; computer security; decision making.

DOI: 10.1504/IJCISTUDIES.2022.129025

International Journal of Computational Intelligence Studies, 2022 Vol.11 No.3/4, pp.253 - 278

Received: 31 Mar 2022
Accepted: 16 Nov 2022

Published online: 14 Feb 2023
