Title: Application of Q-learning based on adaptive greedy considering negative rewards in football match system

Authors: Fei Xue; Juntao Li; Ruiping Yuan; Tao Liu; Tingting Dong

Addresses: School of Information, Beijing Wuzi University, Beijing, 101149, China (Fei Xue; Juntao Li; Ruiping Yuan; Tao Liu); College of Computer Science and Technology, Beijing University of Technology, Beijing, 100124, China (Tingting Dong)

Abstract: Aiming at the problems that multi-robot task allocation in a robot soccer system tends to fall into local optima and suffers from poor real-time performance, a new multi-robot task allocation method is proposed. First, to speed up and improve the search for optimal actions, and to address the shortcoming that traditional Q-learning often fails to propagate negative values, we propose a new way to propagate them: a Q-learning method based on negative rewards. Next, to adapt to a dynamic external environment, an adaptive ε-greedy method, whose mode of operation is determined by the value of ε, is proposed. The method builds on the classical ε-greedy strategy: during problem solving, ε changes adaptively as needed, giving a better balance between exploration and exploitation in reinforcement learning. Finally, we apply the method to a robot football match system. Experiments show that the Q-learning method that propagates negative rewards effectively avoids dangerous actions, and that the adaptive ε-greedy strategy adapts to the external environment better and faster, improving the speed of convergence.
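To make the two ideas in the abstract concrete, the following is a minimal, illustrative sketch (not the authors' exact algorithm): tabular Q-learning on a 1-D track where a "danger" cell yields a negative reward that the standard update propagates backwards, combined with a hypothetical adaptive ε schedule standing in for the paper's adaptive ε-greedy rule. All names and constants here are assumptions for illustration.

```python
import random

ALPHA, GAMMA = 0.5, 0.9
ACTIONS = [-1, +1]          # move left / right on a 1-D track
GOAL, DANGER = 5, 0         # terminal cells: goal (+1), danger (-1)

def reward(state):
    if state == GOAL:
        return 1.0
    if state == DANGER:
        return -1.0          # negative reward, propagated like any other
    return 0.0

def adaptive_epsilon(episode, total, eps_max=0.9, eps_min=0.05):
    # Hypothetical adaptive schedule: explore heavily at first, then
    # shift toward exploitation as experience accumulates (a linear
    # decay stand-in for the paper's adaptive epsilon-greedy rule).
    return eps_max - (eps_max - eps_min) * episode / max(total - 1, 1)

def choose_action(q, state, eps):
    if random.random() < eps:
        return random.choice(ACTIONS)                     # explore
    return max(ACTIONS, key=lambda a: q[(state, a)])      # exploit

def train(episodes=500, seed=0):
    random.seed(seed)
    q = {(s, a): 0.0 for s in range(GOAL + 1) for a in ACTIONS}
    for ep in range(episodes):
        eps = adaptive_epsilon(ep, episodes)
        state = 2                                         # fixed start cell
        while state not in (GOAL, DANGER):
            action = choose_action(q, state, eps)
            nxt = min(max(state + action, DANGER), GOAL)
            r = reward(nxt)
            # Standard Q-update; because r can be negative, bad outcomes
            # are backed up through preceding states just as good ones are.
            best_next = 0.0 if nxt in (GOAL, DANGER) else max(
                q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += ALPHA * (r + GAMMA * best_next
                                           - q[(state, action)])
            state = nxt
    return q

if __name__ == "__main__":
    q = train()
    # After training, stepping toward the danger cell from cell 1 should
    # be valued lower than stepping toward the goal.
    print(q[(1, -1)], q[(1, +1)])
```

Once the negative reward has been experienced even once, the value of the danger-ward action turns negative and the greedy policy steers away from it, which is the "avoiding dangerous actions" effect the abstract describes.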

Keywords: task assignment; Q-learning; negative reward; ε-greedy algorithm; adaptive technology.

DOI: 10.1504/IJWMC.2019.099860

International Journal of Wireless and Mobile Computing, 2019 Vol.16 No.3, pp.233 - 240

Received: 24 Jul 2018
Accepted: 18 Sep 2018

Published online: 24 May 2019
