Safe Exploration Techniques for Reinforcement Learning – An Overview

  • Martin Pecka
  • Tomas Svoboda
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8906)

Abstract

We overview different approaches to safety in (semi)autonomous robotics. In particular, we focus on how a robot can behave safely when it is requested to explore unknown states. The presented methods are studied from the viewpoint of reinforcement learning, a partially supervised machine learning method. To collect training data for such an algorithm, the robot must explore its state space freely, which can lead to dangerous situations. The role of safe exploration is to provide a framework that allows exploration while preserving safety. The examined methods range from simple algorithms to sophisticated methods based on previous experience or state prediction. Our overview also addresses the issue of how to define safety in real-world applications, since absolute safety is unachievable in a continuous and stochastic real world. In the conclusion we suggest several directions that deserve more thorough research.
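The idea of allowing exploration while preserving safety can be illustrated with a small sketch. The following Python fragment is not taken from the paper; the grid world, the cliff states, and the safe_actions filter are purely illustrative assumptions showing one common safe-exploration pattern: restricting epsilon-greedy Q-learning to actions that a simple safety rule labels as admissible.

    # Minimal, hypothetical sketch of safe exploration via action masking.
    # All names (GOAL, CLIFF, safe_actions, ...) are illustrative assumptions,
    # not the paper's notation or method.
    import random

    ROWS, COLS = 4, 4
    GOAL = (3, 3)
    CLIFF = {(1, 1), (2, 2)}          # "dangerous" states the robot must never enter
    ACTIONS = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def step(state, action):
        # Deterministic grid dynamics, clamped to the board.
        r, c = state
        dr, dc = ACTIONS[action]
        nxt = (max(0, min(ROWS - 1, r + dr)), max(0, min(COLS - 1, c + dc)))
        reward = 10.0 if nxt == GOAL else -1.0
        return nxt, reward, nxt == GOAL

    def safe_actions(state):
        # Safety filter: only actions whose successor avoids the cliff may be chosen.
        allowed = [a for a in ACTIONS if step(state, a)[0] not in CLIFF]
        return allowed or list(ACTIONS)   # fall back if no action is labeled safe

    Q = {((r, c), a): 0.0 for r in range(ROWS) for c in range(COLS) for a in ACTIONS}
    alpha, gamma, eps = 0.5, 0.95, 0.2

    for episode in range(300):
        state, done = (0, 0), False
        while not done:
            allowed = safe_actions(state)
            if random.random() < eps:                 # explore, but only among safe actions
                action = random.choice(allowed)
            else:                                     # exploit current value estimates
                action = max(allowed, key=lambda a: Q[(state, a)])
            nxt, reward, done = step(state, action)
            best_next = max(Q[(nxt, a)] for a in ACTIONS)
            Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
            state = nxt

    print("Greedy action from start:", max(ACTIONS, key=lambda a: Q[((0, 0), a)]))

In this toy setting the safety filter is hand-written; the methods surveyed in the paper differ mainly in how such a notion of safety is defined and obtained, e.g. from previous experience, demonstrations, or state prediction.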

Keywords

Safe exploration · Policy search · Reinforcement learning

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Martin Pecka¹
  • Tomas Svoboda¹,²
  1. Center for Machine Perception, Dept. of Cybernetics, Faculty of Electrical Engineering, Czech Technical University in Prague, Prague, Czech Republic
  2. Czech Institute of Informatics, Robotics, and Cybernetics, Czech Technical University in Prague, Prague, Czech Republic