Online Actor-Critic Learning for Motion Control of Non-holonomic Mobile Robot

Conference paper
Part of the Lecture Notes in Electrical Engineering book series (LNEE, volume 138)


This paper presents a control structure for non-holonomic mobile robots based on an online policy-iteration algorithm that learns the continuous-time (CT) optimal control solution with an infinite-horizon cost. The algorithm learns online, in real time, the solution of the Hamilton–Jacobi–Bellman (HJB) equation used for optimal control design. The method finds suitable approximations of both the optimal cost and the control policy in real time while also guaranteeing closed-loop stability. It is implemented as an actor/critic structure in which the actor and critic neural networks (NNs) are adapted simultaneously in continuous time. Simulation examples show the effectiveness of the new algorithm.
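
For orientation, the optimal control problem named in the abstract is commonly posed as below. This is only a sketch of the standard input-affine formulation found in the continuous-time actor-critic literature. The symbols f, g, Q and R, and the input-affine form of the robot model, are assumptions, since the abstract does not state them, and the paper's own notation may differ.

```latex
% Assumed input-affine dynamics and infinite-horizon cost
% (f, g, Q, R are illustrative symbols, not taken from the abstract).
\begin{align}
  \dot{x} &= f(x) + g(x)\,u, \\
  V(x_0) &= \int_{0}^{\infty} \bigl( Q(x) + u^{\top} R\, u \bigr)\, dt .
\end{align}

% HJB equation for the optimal cost V^* and the resulting optimal policy u^*.
\begin{align}
  0 &= Q(x) + \nabla V^{*\top} f(x)
       - \tfrac{1}{4}\, \nabla V^{*\top} g(x)\, R^{-1} g(x)^{\top} \nabla V^{*},
       \qquad V^{*}(0) = 0, \\
  u^{*}(x) &= -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V^{*}(x).
\end{align}

% Policy iteration: evaluate the current policy u^{(i)}, then improve it.
\begin{align}
  0 &= Q(x) + u^{(i)\top} R\, u^{(i)}
       + \nabla V^{(i)\top} \bigl( f(x) + g(x)\, u^{(i)} \bigr), \\
  u^{(i+1)}(x) &= -\tfrac{1}{2}\, R^{-1} g(x)^{\top} \nabla V^{(i)}(x).
\end{align}
```

Under these assumptions, a critic NN would approximate the cost V and an actor NN the policy u, with both sets of weights tuned at the same time rather than in the alternating evaluation and improvement steps written above.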


Lyapunov stability; Non-holonomic mobile robots; Neural networks; Actor-critic algorithms; Hamilton–Jacobi–Bellman equation



This work was supported by the NSFC (60727002, 60774003, 60921001, 90916024), the MOE (20030006003), the COSTIND (A2120061303) and the National 973 Program (2005CB321902).



Copyright information

© Springer-Verlag London Limited 2012

Authors and Affiliations

  1. The Seventh Research Division and the Department of Systems and Control, Beihang University (BUAA), Beijing, China
  2. Key Laboratory of Mathematics, Informatics and Behavioral Semantics (LMIB), Ministry of Education, SMSS, Beihang University (BUAA), Beijing, China
  3. Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, School of Computer Science and Technology, Beijing University of Posts and Telecommunications, Beijing, China
  4. School of Electrical Engineering and Automation, Henan Polytechnic University, Jiaozuo, China
