Encyclopedia of Machine Learning and Data Mining

2017 Edition
| Editors: Claude Sammut, Geoffrey I. Webb

Autonomous Helicopter Flight Using Reinforcement Learning

  • Adam Coates
  • Pieter Abbeel
  • Andrew Y. Ng
Reference work entry
DOI: https://doi.org/10.1007/978-1-4899-7687-1_16


Helicopter flight is a highly challenging control problem. While it is possible to obtain controllers for simple maneuvers (like hovering) by traditional manual design procedures, this approach is tedious and typically requires many hours of adjustment and flight testing, even for an experienced control engineer. For complex maneuvers, such as aerobatic routines, this approach is likely infeasible. In contrast, reinforcement learning (RL) algorithms enable faster and more automated design of controllers. Model-based RL algorithms have been applied successfully to autonomous helicopter flight, including hovering, forward flight, and, via apprenticeship learning methods, expert-level aerobatics. In model-based RL, one first builds a model of the helicopter dynamics and specifies the task using a reward function. Then, given the model and the reward function, the RL algorithm finds a controller that maximizes the expected sum of rewards accumulated over time.
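The model-based recipe above (fit a dynamics model, write down a reward, then optimize a controller against the model) can be illustrated with the simplest instance: a linear model with a quadratic reward, for which the optimal controller is the classical LQR solution (cf. Anderson and Moore in the Recommended Reading). The sketch below uses a hypothetical two-state, one-input "hover" model; the matrices are made up for illustration, whereas a real system would identify them from flight data.

```python
import numpy as np

# Hypothetical linearized hover model x' = A x + B u (illustrative numbers;
# a real helicopter model would be identified from recorded flight data).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])

# Reward r(x, u) = -(x'Qx + u'Ru): penalize deviation from hover and control effort.
Q = np.eye(2)
R = np.array([[0.1]])

# Solve the discrete-time Riccati equation by fixed-point iteration,
# giving the gain K of the optimal linear controller u = -K x.
P = Q.copy()
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ (A - B @ K)

# Roll out the closed loop from a perturbed start; the controller that
# maximizes the (negative quadratic) reward drives the state back to hover.
x = np.array([[1.0], [0.0]])
for _ in range(100):
    u = -K @ x
    x = A @ x + B @ u

print("final deviation from hover:", float(np.linalg.norm(x)))
```

Real helicopter dynamics are nonlinear and stochastic, so the work cited in this entry uses richer learned models and more general policy-search or trajectory-optimization methods; the LQR sketch only shows the shape of the model-plus-reward pipeline.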

Motivation and...


Recommended Reading

  1. Abbeel P, Coates A, Hunter T, Ng AY (2008) Autonomous autorotation of an RC helicopter. In: ISER 11, Athens
  2. Abbeel P, Coates A, Quigley M, Ng AY (2007) An application of reinforcement learning to aerobatic helicopter flight. In: NIPS 19, Vancouver, pp 1–8
  3. Abbeel P, Ganapathi V, Ng AY (2006) Learning vehicular dynamics with application to modeling helicopters. In: NIPS 18, Vancouver
  4. Abbeel P, Ng AY (2004) Apprenticeship learning via inverse reinforcement learning. In: Proceedings of the international conference on machine learning, Banff. ACM, New York
  5. Abbeel P, Ng AY (2005a) Exploration and apprenticeship learning in reinforcement learning. In: Proceedings of the international conference on machine learning, Bonn. ACM, New York
  6. Abbeel P, Ng AY (2005b) Learning first order Markov models for control. In: NIPS 18, Vancouver
  7. Abbeel P, Quigley M, Ng AY (2006) Using inaccurate models in reinforcement learning. In: ICML ’06: proceedings of the 23rd international conference on machine learning, Pittsburgh. ACM, New York, pp 1–8
  8. Anderson B, Moore J (1989) Optimal control: linear quadratic methods. Prentice-Hall, Princeton
  9. Bagnell J, Schneider J (2001) Autonomous helicopter control using reinforcement learning policy search methods. In: International conference on robotics and automation, Seoul. IEEE
  10. Brafman RI, Tennenholtz M (2002) R-max, a general polynomial time algorithm for near-optimal reinforcement learning. J Mach Learn Res 3:213–231
  11. Coates A, Abbeel P, Ng AY (2008) Learning for control from multiple demonstrations. In: Proceedings of the 25th international conference on machine learning (ICML ’08), Helsinki
  12. Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38
  13. Dunbabin M, Brosnan S, Roberts J, Corke P (2004) Vibration isolation for autonomous helicopter flight. In: Proceedings of the IEEE international conference on robotics and automation, New Orleans, vol 4, pp 3609–3615
  14. Gavrilets V, Martinos I, Mettler B, Feron E (2002a) Control logic for automated aerobatic flight of miniature helicopter. In: AIAA guidance, navigation and control conference, Monterey. Massachusetts Institute of Technology, Cambridge
  15. Gavrilets V, Martinos I, Mettler B, Feron E (2002b) Flight test and simulation results for an autonomous aerobatic helicopter. In: AIAA/IEEE digital avionics systems conference, Irvine
  16. Gavrilets V, Mettler B, Feron E (2001) Nonlinear model for a small-size acrobatic helicopter. In: AIAA guidance, navigation and control conference, Montreal, pp 1593–1600
  17. Jacobson DH, Mayne DQ (1970) Differential dynamic programming. Elsevier, New York
  18. Kakade S, Kearns M, Langford J (2003) Exploration in metric state spaces. In: Proceedings of the international conference on machine learning, Washington, DC
  19. Kearns M, Koller D (1999) Efficient reinforcement learning in factored MDPs. In: Proceedings of the 16th international joint conference on artificial intelligence, Stockholm. Morgan Kaufmann, San Francisco
  20. Kearns M, Singh S (2002) Near-optimal reinforcement learning in polynomial time. Mach Learn J 49(2–3):209–232
  21. La Civita M (2003) Integrated modeling and robust control for full-envelope flight of robotic helicopters. PhD thesis, Carnegie Mellon University, Pittsburgh
  22. La Civita M, Papageorgiou G, Messner WC, Kanade T (2006) Design and flight testing of a high-bandwidth \(\mathcal{H}_{\infty }\) loop shaping controller for a robotic helicopter. J Guid Control Dyn 29(2):485–494
  23. Leishman J (2000) Principles of helicopter aerodynamics. Cambridge University Press, Cambridge
  24. Nelder JA, Mead R (1965) A simplex method for function minimization. Comput J 7:308–313
  25. Ng AY, Coates A, Diel M, Ganapathi V, Schulte J, Tse B et al (2004) Autonomous inverted helicopter flight via reinforcement learning. In: International symposium on experimental robotics, Singapore. Springer, Berlin
  26. Ng AY, Jordan M (2000) Pegasus: a policy search method for large MDPs and POMDPs. In: Proceedings of the uncertainty in artificial intelligence 16th conference, Stanford. Morgan Kaufmann, San Francisco
  27. Ng AY, Kim HJ, Jordan M, Sastry S (2004) Autonomous helicopter flight via reinforcement learning. In: NIPS 16, Vancouver
  28. Ng AY, Russell S (2000) Algorithms for inverse reinforcement learning. In: Proceedings of the 17th international conference on machine learning, San Francisco. Morgan Kaufmann, San Francisco, pp 663–670
  29. Saripalli S, Montgomery JF, Sukhatme GS (2003) Visually-guided landing of an unmanned aerial vehicle. IEEE Trans Robot Auton Syst 19(3):371–380
  30. Seddon J (1990) Basic helicopter aerodynamics. AIAA education series. American Institute of Aeronautics and Astronautics, El Segundo
  31. Tischler MB, Cauffman MG (1992) Frequency response method for rotorcraft system identification: flight application to BO-105 coupled rotor/fuselage dynamics. J Am Helicopter Soc 37:3–17

Copyright information

© Springer Science+Business Media New York 2017

Authors and Affiliations

  1. Stanford University, Stanford, USA
  2. EECS Department, UC Berkeley, USA
  3. Stanford University, Stanford, USA
  4. Computer Science Department, Stanford University, Stanford, USA