Electrical, Information Engineering and Mechatronics 2011 pp 1763-1770
Online Actor-Critic Learning for Motion Control of Non-holonomic Mobile Robot
This paper presents a control structure for non-holonomic mobile robots built on an online algorithm, based on policy iteration, that learns the continuous-time (CT) optimal control solution for an infinite-horizon cost. The algorithm learns online, in real time, the solution of the Hamilton–Jacobi–Bellman (HJB) equation, which underlies optimal control design. It finds, in real time, suitable approximations of both the optimal cost and the optimal control policy while guaranteeing closed-loop stability. The method is implemented as an actor/critic structure in which the actor and critic neural networks (NNs) are adapted simultaneously in continuous time. Simulation examples show the effectiveness of the new algorithm.
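The idea of adapting a critic (value estimate) and an actor (policy) at the same time, driven by the continuous-time HJB residual, can be illustrated on a much simpler problem than the robot. The sketch below is a minimal, hypothetical example, not the paper's controller: it assumes a scalar linear plant dx/dt = A·x + B·u with cost ∫(Q·x² + R·u²)dt (the names `A`, `B`, `Q`, `R`, `simulate`, `alpha_c`, `alpha_a`, `reset_period` are all illustrative). The critic uses a single quadratic basis function and a normalized gradient step on the HJB residual. The actor update is a deliberately simplified heuristic (the actor weight is pulled toward the critic weight) rather than the published tuning law, and periodic state resets stand in for a persistently exciting probing signal. In this linear-quadratic case the HJB equation reduces to the algebraic Riccati equation, so with A = B = Q = R = 1 the learned weights can be checked against the exact value 1 + √2.

```python
import math

# Hypothetical scalar plant dx/dt = A*x + B*u, cost = integral of Q*x^2 + R*u^2.
A, B, Q, R = 1.0, 1.0, 1.0, 1.0


def simulate(alpha_c=50.0, alpha_a=1.0, dt=1e-3, t_end=40.0, reset_period=1.0):
    """Simplified synchronous actor-critic for the scalar linear-quadratic case.

    Critic: V(x) ~= w_c * x**2 (basis phi(x) = x**2).
    Actor:  u = -(B / R) * w_a * x, i.e. u = -(1/2) R^-1 B dV/dx with weight w_a.
    Plant state and both weights are integrated together (forward Euler).
    Returns the final (w_c, w_a).
    """
    x, w_c, w_a = 1.0, 2.0, 2.0  # w_a = 2 gives a stabilizing initial policy
    steps_per_reset = int(round(reset_period / dt))
    for k in range(int(round(t_end / dt))):
        if k % steps_per_reset == 0:
            x = 1.0  # re-excite the state (assumed stand-in for probing noise)
        u = -(B / R) * w_a * x
        x_dot = A * x + B * u
        # sigma = dphi/dx * x_dot for phi(x) = x**2
        sigma = 2.0 * x * x_dot
        # Continuous-time Bellman (HJB) residual under the current policy
        e = w_c * sigma + Q * x**2 + R * u**2
        # Normalized gradient descent on 0.5 * e**2 with respect to w_c
        w_c_dot = -alpha_c * sigma * e / (sigma**2 + 1.0) ** 2
        # Simplified actor law (assumption): track the critic weight
        w_a_dot = -alpha_a * (w_a - w_c)
        x += dt * x_dot
        w_c += dt * w_c_dot
        w_a += dt * w_a_dot
    return w_c, w_a


def riccati_solution():
    """Positive root of 2*A*p - (B**2 / R)*p**2 + Q = 0 (scalar ARE)."""
    return (R / B**2) * (A + math.sqrt(A**2 + B**2 * Q / R))


if __name__ == "__main__":
    w_c, w_a = simulate()
    print("critic:", w_c, "actor:", w_a, "exact:", riccati_solution())
```

The critic step size is chosen larger than the actor's so that the value estimate settles for the current policy before the policy moves much, which is the same time-scale intuition behind policy iteration.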
Keywords: Lyapunov stability; Non-holonomic mobile robots; Neural networks; Actor-critic algorithms; Hamilton–Jacobi–Bellman equation
This work was supported by the NSFC (60727002, 60774003, 60921001, 90916024), the MOE (20030006003), the COSTIND (A2120061303) and the National 973 Program (2005CB321902).