Learning Options for an MDP from Demonstrations

  • Conference paper
Artificial Life and Computational Intelligence (ACALCI 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 8955)

Abstract

The options framework provides a foundation for using hierarchical actions in reinforcement learning. An agent equipped with options can, at any point in time, choose to execute a macro-action composed of many primitive actions instead of a single primitive action. Such macro-actions can be hand-crafted or learned, and previous work has learned them by exploring the environment. Here we take a different perspective and present an approach to learning options from a set of expert demonstrations. Empirical results are presented in a setting similar to the one used in other work in this area.
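
To make the option abstraction in the abstract concrete, the following is a minimal Python sketch of an option in the Sutton-Precup-Singh sense: an initiation set, an intra-option policy over primitive actions, and a termination condition. The names, types, and the subgoal example are illustrative assumptions, not the implementation used in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Set

State = int
Action = int

@dataclass
class Option:
    """A temporally extended action: initiation set, intra-option policy,
    and termination condition (field names are illustrative)."""
    initiation_set: Set[State]             # states where the option may be invoked
    policy: Callable[[State], Action]      # maps a state to a primitive action
    termination: Callable[[State], float]  # beta(s): probability of terminating in s

    def can_start(self, state: State) -> bool:
        return state in self.initiation_set

# Hypothetical example: keep taking primitive action 1 ("move right") from
# any of the states 0..8 until a hypothetical subgoal state 9 is reached.
go_right = Option(
    initiation_set=set(range(9)),
    policy=lambda s: 1,
    termination=lambda s: 1.0 if s == 9 else 0.0,
)

assert go_right.can_start(3) and go_right.termination(9) == 1.0
```

An agent that selects this option repeatedly follows its internal policy until the termination condition fires, after which control returns to the agent's top-level policy; this is the "macro-action" behaviour the abstract refers to.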


Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Tamassia, M., Zambetta, F., Raffe, W., Li, X. (2015). Learning Options for an MDP from Demonstrations. In: Chalup, S.K., Blair, A.D., Randall, M. (eds) Artificial Life and Computational Intelligence. ACALCI 2015. Lecture Notes in Computer Science (LNAI), vol 8955. Springer, Cham. https://doi.org/10.1007/978-3-319-14803-8_18

  • DOI: https://doi.org/10.1007/978-3-319-14803-8_18

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-14802-1

  • Online ISBN: 978-3-319-14803-8

  • eBook Packages: Computer Science, Computer Science (R0)
