
Title: Convergence and numerical stability of action-dependent heuristic dynamic programming algorithms based on RLS learning for online DLQR optimal control

Authors: Guilherme Bonfim De Sousa; Patrícia Helena Moraes Rêgo

Addresses: Master Program in Computer Engineering and Systems, State University of Maranhão, São Luís, MA, Brazil; Mathematics and Computing Department, State University of Maranhão, São Luís, MA, Brazil

Abstract: This paper develops a novel approximate dynamic programming (ADP) algorithm based on RLS learning for approximating the optimal control solution online in real time, and analyses its numerical stability. ADP was developed to make dynamic programming techniques usable in real time, but it carries a non-negligible mathematical complexity owing to the size of the algorithm's internal matrices and the need to invert some of them. To improve the numerical stability and computational cost of ADP algorithms, specifically in the context of action-dependent heuristic dynamic programming and optimal control, UDU^T-type unitary transformations are integrated into actor-critic architectures, yielding algorithms better suited to implementation in real-world optimal control systems. The control and stabilisation of an inverted pendulum on a motor-driven cart serves as the study platform for evaluating the convergence and numerical stability of the parameters estimated by the proposed algorithm.
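
The abstract does not give the update equations, so the following is only an illustrative sketch of how a UDU^T factorisation can be carried through an RLS step without forming or inverting the covariance matrix. It assumes a Bierman-style rank-one update with a forgetting factor, which may differ from the authors' formulation. All names (`ud_rls_update`, `rls_update`, `demo`, `theta`, `h`, `lam`) are hypothetical and chosen for illustration.

```python
import numpy as np


def ud_rls_update(theta, U, d, h, y, lam=1.0):
    """One RLS step with the covariance held as P = U diag(d) U^T.

    Assumed Bierman-style update (not taken from the paper): U is unit
    upper triangular, d holds the diagonal of D, lam is the forgetting factor.
    """
    n = h.size
    U, d = U.copy(), d.copy()
    f = U.T @ h              # f = U^T h
    v = d * f                # v = D f
    err = y - h @ theta      # a priori prediction error
    b = np.zeros(n)          # accumulates the unnormalised gain P h
    alpha = lam
    for j in range(n):
        alpha_prev = alpha
        alpha = alpha_prev + f[j] * v[j]
        d[j] = d[j] * alpha_prev / (alpha * lam)
        b[j] = v[j]
        p = -f[j] / alpha_prev
        for i in range(j):
            u_old = U[i, j]
            U[i, j] = u_old + b[i] * p
            b[i] += u_old * v[j]
    gain = b / alpha         # equals P h / (lam + h^T P h)
    return theta + gain * err, U, d


def rls_update(theta, P, h, y, lam=1.0):
    """Conventional RLS step on the full covariance, used as a reference."""
    Ph = P @ h
    gain = Ph / (lam + h @ Ph)
    theta = theta + gain * (y - h @ theta)
    P = (P - np.outer(gain, Ph)) / lam
    return theta, P


def demo(steps=50, n=3, lam=0.98, delta=100.0, seed=0):
    """Run both updates on the same noise-free synthetic regression data."""
    rng = np.random.default_rng(seed)
    theta_true = rng.normal(size=n)
    theta_ud, U, d = np.zeros(n), np.eye(n), np.full(n, delta)
    theta_ref, P = np.zeros(n), delta * np.eye(n)
    for _ in range(steps):
        h = rng.normal(size=n)
        y = h @ theta_true
        theta_ud, U, d = ud_rls_update(theta_ud, U, d, h, y, lam)
        theta_ref, P = rls_update(theta_ref, P, h, y, lam)
    return theta_ud, U @ np.diag(d) @ U.T, theta_ref, P, theta_true
```

Calling `demo()` returns the parameter estimates and reconstructed covariance from both variants so they can be compared. In exact arithmetic the two updates are algebraically equivalent; the factored form is usually preferred because the covariance it represents stays symmetric, and stays positive definite as long as the entries of `d` remain positive.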

Keywords: action-dependent heuristic dynamic programming; discrete linear quadratic regulator; recursive least-squares; numerical stability.

DOI: 10.1504/IJCSE.2019.103962

International Journal of Computational Science and Engineering, 2019 Vol.20 No.3, pp.317 - 334

Received: 23 Dec 2017
Accepted: 16 Dec 2018
Published online: 03 Dec 2019


