Title: Deep reinforcement learning collision avoidance using policy gradient optimisation and Q-learning

Authors: Shady A. Maged; Bishoy H. Mikhail

Addresses: Mechatronics Engineering Department, Faculty of Engineering, Ain Shams University, Cairo, Egypt ' Mechatronics Engineering Department, Faculty of Engineering, Ain Shams University, Cairo, Egypt

Abstract: The use of trust region policy optimisation (TRPO) and proximal policy optimisation (PPO), both derived from the policy gradient optimisation method, and of the deep Q-learning network (DQN) in Lidar-based differential robots is proposed, using Turtlebot and OpenAI's Baselines implementations of these methods. The simulation results showed that all three algorithms are well suited to obstacle avoidance and robot navigation, with a clear advantage for TRPO and PPO in complex environments. The learned policies can be deployed in a fully decentralised manner, as they are not constrained by any robot parameters or communication protocols.

Keywords: robot operating system; ROS; robotics; reinforcement learning; deep learning; deep Q-learning; trust region optimisation; proximal policy optimisation; PPO; trust region policy optimisation; TRPO; deep Q-learning network; DQN; Q-learning; autonomous; differential robot; obstacle avoidance; navigation; tensorflow.

DOI: 10.1504/IJCVR.2020.107253

International Journal of Computational Vision and Robotics, 2020 Vol.10 No.3, pp.260 - 274

Received: 02 Apr 2019
Accepted: 01 Jun 2019

Published online: 11 May 2020
