Title: Multi-agent reinforcement learning based on self-satisfaction in sparse reward scenarios

Authors: Baofu Fang; Dandan Tang; Zaijun Wang; Hao Wang

Addresses: School of Computer Science and Information Engineering, Hefei University of Technology, No. 193 Tunxi Road, Baohe District, Hefei City, China; School of Computer Science and Information Engineering, Hefei University of Technology, No. 193 Tunxi Road, Baohe District, Hefei City, China; CAAC Academy of Flight Technology and Safety, Civil Aviation Flight University of China, Guanghan, Sichuan, 618307, China and Key Laboratory of Flight Techniques and Flight Safety, CAAC, Guanghan, Sichuan, 618307, China; School of Computer Science and Information Engineering, Hefei University of Technology, No. 193 Tunxi Road, Baohe District, Hefei City, China

Abstract: To address the sparse reward problem in multi-agent reinforcement learning, this paper proposes a self-satisfaction emotional model based on the role of human emotions in decision-making. The model, composed of thirst for knowledge, recognition, and psychological gap, serves as intrinsic motivation and generates an internal emotional reward that effectively supplements the external reward, alleviating the sparse reward problem. Based on this model, a self-satisfaction-based multi-agent reinforcement learning algorithm is proposed to accelerate the agents' convergence. Compared with baseline algorithms in multi-agent pursuit scenarios, the proposed algorithm converges to the best strategy and converges faster. In addition, its success rate in partially observable scenes increases by about 20%, and the required time steps decrease by about 25%. Experiments show the algorithm is effective and robust.
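The reward-shaping idea summarised in the abstract can be sketched in Python as follows. This is a minimal illustrative sketch, assuming a simple weighted linear combination of the three emotional components; the function names, weights, and formula are hypothetical and do not reproduce the paper's actual emotional model.

    # Illustrative sketch only: the weighted-sum form, weights, and names
    # below are assumptions, not the paper's actual emotional model.
    def self_satisfaction(thirst_for_knowledge, recognition, psychological_gap,
                          w_k=0.5, w_r=0.3, w_g=0.2):
        # Combine the three components into one intrinsic emotional reward;
        # the psychological gap is assumed to reduce satisfaction.
        return (w_k * thirst_for_knowledge
                + w_r * recognition
                - w_g * psychological_gap)

    def shaped_reward(external_reward, thirst_for_knowledge, recognition,
                      psychological_gap, beta=0.1):
        # Total reward = sparse external reward + scaled intrinsic reward,
        # so the agent still receives a learning signal on steps where the
        # external reward is zero.
        return external_reward + beta * self_satisfaction(
            thirst_for_knowledge, recognition, psychological_gap)

    # A step with zero external reward still yields a nonzero signal:
    print(shaped_reward(0.0, thirst_for_knowledge=0.8, recognition=0.4,
                        psychological_gap=0.2))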

Keywords: reinforcement learning; sparse reward; self-satisfaction; internal emotional reward.

DOI: 10.1504/IJBIC.2025.143667

International Journal of Bio-Inspired Computation, 2025 Vol.25 No.1, pp.56 - 67

Received: 19 Jul 2022
Accepted: 09 Jun 2023

Published online: 03 Jan 2025
