Title: Deep reinforcement learning collision avoidance using policy gradient optimisation and Q-learning
Authors: Shady A. Maged; Bishoy H. Mikhail
Addresses: Mechatronics Engineering Department, Faculty of Engineering, Ain Shams University, Cairo, Egypt; Mechatronics Engineering Department, Faculty of Engineering, Ain Shams University, Cairo, Egypt
Abstract: This paper proposes the use of trust region policy optimisation (TRPO) and proximal policy optimisation (PPO), both descendants of the policy gradient optimisation method, together with the deep Q-learning network (DQN), for Lidar-based differential robots, using Turtlebot and OpenAI's Baselines implementations of these methods. The simulation results showed that all three algorithms are well suited to obstacle avoidance and robot navigation, with TRPO and PPO holding a clear advantage in complex environments. The learned policy can be deployed in a fully decentralised manner, as it is not constrained by any robot parameters or communication protocols.
Keywords: robot operating system; ROS; robotics; reinforcement learning; deep learning; deep Q-learning; trust region optimisation; proximal policy optimisation; PPO; trust region policy optimisation; TRPO; deep Q-learning network; DQN; Q-learning; autonomous; differential robot; obstacle avoidance; navigation; tensorflow.
International Journal of Computational Vision and Robotics, 2020 Vol.10 No.3, pp.260 - 274
Received: 02 Apr 2019
Accepted: 01 Jun 2019
Published online: 28 Apr 2020