Authors: Kunal Karda; Namit Dubey; Abhas Kanungo; Varun Gupta
Addresses: Department of Computer Science and Engineering, Acropolis Technical Campus, Indore, India; Department of Computer Science and Engineering, Acropolis Technical Campus, Indore, India; Department of Electronics and Instrumentation Engineering, KIET Group of Institutions, Delhi – NCR, Ghaziabad – 201206, UP, India; Department of Electronics and Instrumentation Engineering, KIET Group of Institutions, Delhi – NCR, Ghaziabad – 201206, UP, India
Abstract: Actor-critic models are generally prone to overestimated Q-values and sub-optimal policies. Our proposed approach builds on a value-based deep reinforcement learning algorithm, the twin delayed deep deterministic policy gradient algorithm (TD3). The approach is applied to complex reinforcement learning problems in which a half-humanoid robot, an ant, and a half-cheetah must cover a path. Such problems can only be solved by an algorithm that works on continuous action spaces without much delay in propagating results during model inference. The proposed model has been adapted to converge faster to optimal Q-values. TD3 uses two deep neural networks to learn two Q-values, Q1 and Q2; in the proposed approach, the average of these Q-values is taken as the input for the final Q-value, unlike other reinforcement learning algorithms such as DDPG, which is prone to overestimating Q-values. The proposed approach also introduces a self-adjusting noise-clipping function, which makes it harder for the policy to exploit Q-function errors and thereby further improves performance.
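The averaged Q-value target described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the NumPy formulation, and the default values (`sigma=0.2`, `clip=0.5`, `gamma=0.99`) are hypothetical choices made for the example. The abstract does not specify how the self-adjusting noise clipping is computed, so a fixed clip bound stands in for it here.

```python
import numpy as np


def smoothed_action(target_action, rng, sigma=0.2, clip=0.5, max_action=1.0):
    """Add clipped Gaussian noise to the target policy's action.

    A fixed `clip` bound is an assumption; the abstract's self-adjusting
    clipping rule is not given, so it is not reproduced here.
    """
    noise = rng.normal(0.0, sigma, size=target_action.shape)
    noise = np.clip(noise, -clip, clip)
    return np.clip(target_action + noise, -max_action, max_action)


def td_target(reward, done, q1_next, q2_next, gamma=0.99, mode="mean"):
    """Build the bootstrap target from the two critics' next-state estimates.

    mode="min"  : the clipped double-Q target of standard TD3
    mode="mean" : the average of Q1 and Q2, as the abstract describes
    """
    if mode == "min":
        q_next = np.minimum(q1_next, q2_next)
    elif mode == "mean":
        q_next = 0.5 * (q1_next + q2_next)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Terminal transitions (done == 1) do not bootstrap from the next state.
    return reward + gamma * (1.0 - done) * q_next


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    action = smoothed_action(np.zeros((4, 3)), rng)
    reward = np.array([1.0, 0.0])
    done = np.array([0.0, 1.0])
    q1 = np.array([10.0, 5.0])
    q2 = np.array([6.0, 7.0])
    print(td_target(reward, done, q1, q2, gamma=0.5, mode="mean"))
    print(td_target(reward, done, q1, q2, gamma=0.5, mode="min"))
```

Comparing the two modes on the same batch shows the design difference: the minimum always picks the lower critic estimate, while the mean sits between the two estimates.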
Keywords: TD3; Q-values; deep neural networks; half-humanoid robot; ant; half-cheetah; reinforcement learning.
International Journal of Applied Pattern Recognition, 2022 Vol.7 No.1, pp.15 - 23
Received: 28 Sep 2020
Accepted: 02 Jul 2021
Published online: 14 Apr 2022