Learning Options for an MDP from Demonstrations

  • Conference paper
Artificial Life and Computational Intelligence (ACALCI 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8955)

Abstract

The options framework provides a foundation for using hierarchical actions in reinforcement learning. An agent equipped with options can, at any point in time, choose to execute a macro-action composed of many primitive actions instead of a single primitive action. Such macro-actions can be hand-crafted or learned, and previous work has learned them by exploring the environment. Here we take a different perspective and present an approach to learning options from a set of expert demonstrations. Empirical results are presented in a setting similar to the one used in other work in this area.
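
To make the option abstraction in the abstract concrete, the following is a minimal Python sketch of an option in the Sutton-Precup-Singh sense: an initiation set, an intra-option policy over primitive actions, and a termination condition. The names, types, and the subgoal example are illustrative assumptions, not the implementation used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Set

State = int
Action = int

@dataclass
class Option:
    """A temporally extended action: initiation set, intra-option policy,
    and termination condition (field names are illustrative)."""
    initiation_set: Set[State]             # states where the option may be invoked
    policy: Callable[[State], Action]      # maps a state to a primitive action
    termination: Callable[[State], float]  # beta(s): probability of terminating in s

    def can_start(self, state: State) -> bool:
        return state in self.initiation_set

# Hypothetical example: keep taking primitive action 1 ("move right") from
# any of the states 0..8 until a hypothetical subgoal state 9 is reached.
go_right = Option(
    initiation_set=set(range(9)),
    policy=lambda s: 1,
    termination=lambda s: 1.0 if s == 9 else 0.0,
)

assert go_right.can_start(3) and go_right.termination(9) == 1.0
```

An agent that selects this option repeatedly follows its internal policy until the termination condition fires, after which control returns to the agent's top-level policy; this is the "macro-action" behaviour the abstract refers to.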


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Tamassia, M., Zambetta, F., Raffe, W., Li, X. (2015). Learning Options for an MDP from Demonstrations. In: Chalup, S.K., Blair, A.D., Randall, M. (eds) Artificial Life and Computational Intelligence. ACALCI 2015. Lecture Notes in Computer Science (LNAI), vol 8955. Springer, Cham. https://doi.org/10.1007/978-3-319-14803-8_18

  • DOI: https://doi.org/10.1007/978-3-319-14803-8_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14802-1

  • Online ISBN: 978-3-319-14803-8

  • eBook Packages: Computer Science, Computer Science (R0)
