
Inverse reinforcement learning control for trajectory tracking of a multirotor UAV

  • Seungwon Choi
  • Suseong Kim
  • H. Jin Kim
Regular Papers: Robot and Applications

Abstract

The main purpose of this paper is to learn the control performance of an expert by imitating demonstrations of a multirotor UAV (unmanned aerial vehicle) flown by an expert pilot. First, we collect a set of expert demonstrations for the task we want to learn and extract a representative trajectory from the dataset, where the representative trajectory consists of a sequence of states and inputs. The trajectory is obtained using a hidden Markov model (HMM) and dynamic time warping (DTW). The multirotor then learns to track this trajectory for imitation. Although the demonstrations provide a feed-forward input at each time step, applying this input directly can degrade the stability of the multirotor because of insufficient data for generalization and numerical issues. A controller is therefore needed that generates input commands for the desired flight maneuver. To design such a controller, we learn a hidden reward function of quadratic form from the demonstrated flights using inverse reinforcement learning. After finding the reward function that minimizes the trajectory tracking error, we design a reinforcement-learning-based controller with this reward function. Simulations and experiments on a multirotor UAV show successful imitation results.
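As a rough illustration of this pipeline (a minimal sketch, not the authors' implementation), the code below aligns the demonstrations with DTW and averages them into a representative trajectory (a simplified stand-in for the HMM/DTW step), then uses particle swarm optimization to search for diagonal quadratic-cost weights Q and R such that an LQR tracker derived from them minimizes the tracking error along that trajectory. All names (`demos`, `dtw_align`, `pso_weights`, the discrete-time model matrices `A`, `B`) are illustrative assumptions.

```python
# Minimal sketch under the assumptions stated above; not the paper's implementation.
import numpy as np
from scipy.linalg import solve_discrete_are

def dtw_align(ref, traj):
    """Align `traj` to `ref` with dynamic time warping; return a warped copy of traj."""
    n, m = len(ref), len(traj)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(ref[i - 1] - traj[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack to collect, for every reference index, its matched trajectory samples.
    matches = [[] for _ in range(n)]
    i, j = n, m
    while i > 0 and j > 0:
        matches[i - 1].append(traj[j - 1])
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return np.array([np.mean(m_, axis=0) for m_ in matches])

def representative_trajectory(demos):
    """Average DTW-aligned demonstrations into one reference trajectory."""
    ref = demos[0]
    aligned = [dtw_align(ref, d) for d in demos]
    return np.mean(aligned, axis=0)

def tracking_error(weights, A, B, x_ref):
    """Simulate an LQR tracker built from candidate quadratic-cost weights."""
    nx, nu = B.shape
    Q, R = np.diag(weights[:nx]), np.diag(weights[nx:])
    P = solve_discrete_are(A, B, Q, R)
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)   # discrete LQR gain
    x, err = x_ref[0].copy(), 0.0
    for k in range(len(x_ref) - 1):
        u = -K @ (x - x_ref[k])
        x = A @ x + B @ u
        err += np.sum((x - x_ref[k + 1]) ** 2)
    return err

def pso_weights(A, B, x_ref, n_particles=20, iters=50):
    """Global-best particle swarm search over diagonal Q/R entries."""
    dim = A.shape[0] + B.shape[1]
    pos = np.random.uniform(0.1, 10.0, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([tracking_error(p, A, B, x_ref) for p in pos])
    gbest = pbest[np.argmin(pbest_val)].copy()
    for _ in range(iters):
        r1, r2 = np.random.rand(*pos.shape), np.random.rand(*pos.shape)
        vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, 1e-3, 100.0)
        vals = np.array([tracking_error(p, A, B, x_ref) for p in pos])
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], vals[improved]
        gbest = pbest[np.argmin(pbest_val)].copy()
    return gbest
```

Given recorded demonstrations and a discrete-time model (A, B), one would call `x_ref = representative_trajectory(demos)` and then `weights = pso_weights(A, B, x_ref)` to obtain cost weights whose induced LQR gain reproduces the demonstrated tracking behavior, in the spirit of the quadratic-reward learning described above.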

Keywords

Inverse reinforcement learning · Learning from demonstration · Multirotor control · Particle swarm optimization


Copyright information

© Institute of Control, Robotics and Systems and The Korean Institute of Electrical Engineers and Springer-Verlag GmbH Germany 2017

Authors and Affiliations

  1. Department of Mechanical and Aerospace Engineering, Seoul National University, Seoul, Korea
