Deep reinforcement learning collision avoidance using policy gradient optimisation and Q-learning
by Shady A. Maged; Bishoy H. Mikhail
International Journal of Computational Vision and Robotics (IJCVR), Vol. 10, No. 3, 2020

Abstract: The use of trust region policy optimisation (TRPO) and proximal policy optimisation (PPO), two descendants of the policy gradient optimisation method, together with a deep Q-learning network (DQN) is proposed for lidar-based differential-drive robots, using a Turtlebot and OpenAI's Baselines implementations. The simulation results show that all three algorithms are suitable for obstacle avoidance and robot navigation, with a clear advantage for TRPO and PPO in complex environments. The learned policies can be deployed in a fully decentralised manner, as they are not constrained by any robot parameters or communication protocols.
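The two families of methods the abstract compares can be summarised by their core update rules. The following NumPy sketch shows the standard formulations — the clipped surrogate objective that PPO minimises and the Bellman target that DQN regresses towards — as an illustration of the general techniques, not code from the paper; function names and parameter defaults are the author's own choices.

```python
import numpy as np

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """Standard PPO clipped surrogate objective (to be minimised).

    ratio = pi_new(a|s) / pi_old(a|s); clipping the ratio to
    [1 - clip_eps, 1 + clip_eps] limits how far each update can
    move the policy, which is what makes PPO stable in practice.
    """
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the elementwise minimum gives a pessimistic bound on
    # the policy improvement; negate because we minimise the loss.
    return -np.mean(np.minimum(unclipped, clipped))

def dqn_target(reward, q_next, gamma=0.99):
    """Bellman target used by DQN: r + gamma * max_a' Q(s', a')."""
    return reward + gamma * np.max(q_next)
```

For example, with a probability ratio of 1.5 and a positive advantage of 1.0, the clipped term caps the objective at 1.2 (for clip_eps = 0.2), so the loss is -1.2 rather than -1.5; large favourable ratios give no extra gradient incentive beyond the clip boundary.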

Online publication date: Mon, 11-May-2020
