Title: Adversarial attack model based on deep neural network interpretability and artificial fish swarm algorithm

Authors: Yamin Li

Addresses: School of Electronic and Electrical Engineering, Zhengzhou University of Science and Technology, Zhengzhou 450064, China

Abstract: To address the problem of model information leakage caused by the interpretability of deep neural networks (DNNs), this paper demonstrates the feasibility of using the Grad-CAM interpretation method to generate adversarial samples in a white-box setting, and proposes an untargeted black-box attack algorithm. The new algorithm first improves the fitness function according to the relation between the interpretation region and the positions of the perturbed pixels. The artificial fish swarm algorithm is then improved to continuously reduce the perturbation value while increasing the number of perturbed pixels. The improved artificial fish swarm algorithm borrows the mass and acceleration calculations of gravitational search to adjust the visual field and step size of each artificial fish, improving the adaptive ability of the algorithm during optimisation. In the experiments, the proposed algorithm achieves an average attack success rate of 93.91% across the AlexNet, VGG-19, ResNet-50 and SqueezeNet models. Compared with the one-pixel attack algorithm, the running time increases by 10%, but the success rate increases by 16.64%. The results show that the artificial fish swarm algorithm based on the interpretation method can carry out adversarial attacks effectively.
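The abstract describes adapting each fish's visual field and step size from gravitational-search-style masses. A minimal sketch of one way this could work, assuming a minimisation fitness; the function names (gsa_masses, adapt_visual_and_step) and the linear shrinking rule with a 0.9 damping constant are illustrative assumptions, not the paper's implementation:

```python
# Hypothetical sketch: derive a gravitational-search-style "mass" from each
# artificial fish's fitness, then shrink the visual field and step size of
# fitter (higher-mass) fish so they search locally while weaker fish explore.
import numpy as np

def gsa_masses(fitness):
    """Normalised masses as in gravitational search (minimisation assumed):
    the best fish approaches mass 1, the worst approaches mass 0."""
    worst, best = fitness.max(), fitness.min()
    m = (fitness - worst) / (best - worst + 1e-12)
    return m / (m.sum() + 1e-12)

def adapt_visual_and_step(fitness, visual_max=1.0, step_max=0.5):
    """Assumed adaptive rule: visual field and step size decrease linearly
    with normalised mass, refining the search around good solutions."""
    m = gsa_masses(fitness)
    m_norm = m / (m.max() + 1e-12)
    visual = visual_max * (1.0 - 0.9 * m_norm)
    step = step_max * (1.0 - 0.9 * m_norm)
    return visual, step

# Toy usage: five fish with random fitness values (lower = better).
rng = np.random.default_rng(0)
fit = rng.random(5)
visual, step = adapt_visual_and_step(fit)
print("visual fields:", visual)
print("step sizes:  ", step)
```

Under this rule the best fish would receive the smallest visual field and step, which matches the abstract's stated goal of reducing the perturbation value as the search converges.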

Keywords: adversarial attack model; deep neural network interpretability; artificial fish swarm; gradient-weighted class activation mapping; Grad-CAM.

DOI: 10.1504/IJESDF.2024.140749

International Journal of Electronic Security and Digital Forensics, 2024 Vol.16 No.5, pp.614–632

Received: 22 Jan 2023
Accepted: 20 Apr 2023

Published online: 02 Sep 2024
