Abstract
The options framework provides a foundation for using hierarchical actions in reinforcement learning. At any point in time, an agent equipped with options can choose to execute either a primitive action or a macro-action composed of many primitive actions. Such macro-actions can be hand-crafted or learned, and previous work has learned them by exploring the environment. Here we take a different perspective and present an approach for learning options from a set of expert demonstrations. Empirical results are presented in a setting similar to those used in other work in this area.
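For readers unfamiliar with the framework, an option in the sense of Sutton, Precup and Singh (1999) is defined by three components: an initiation set, an intra-option policy, and a termination condition. The sketch below is a minimal illustration of that structure only; the names and the toy "move right" option are assumptions for exposition, not the implementation used in this paper.

```python
from dataclasses import dataclass
from typing import Callable, Hashable, Set

State = Hashable
Action = int

@dataclass
class Option:
    """An option: where it may start, how it acts, and when it stops."""
    initiation_set: Set[State]             # states in which the option may be invoked
    policy: Callable[[State], Action]      # intra-option policy over primitive actions
    termination: Callable[[State], float]  # probability of terminating in each state

    def can_start(self, s: State) -> bool:
        return s in self.initiation_set

# Toy example on a 1-D corridor of states 0..5:
# keep taking action 1 ("right") until state 5 is reached.
move_right = Option(
    initiation_set={0, 1, 2, 3, 4},
    policy=lambda s: 1,
    termination=lambda s: 1.0 if s == 5 else 0.0,
)
```

An agent choosing `move_right` in state 2 would follow its policy for several steps, making it behave as a single macro-action from the agent's point of view.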
© 2015 Springer International Publishing Switzerland
Cite this paper
Tamassia, M., Zambetta, F., Raffe, W., Li, X. (2015). Learning Options for an MDP from Demonstrations. In: Chalup, S.K., Blair, A.D., Randall, M. (eds) Artificial Life and Computational Intelligence. ACALCI 2015. Lecture Notes in Computer Science(), vol 8955. Springer, Cham. https://doi.org/10.1007/978-3-319-14803-8_18
DOI: https://doi.org/10.1007/978-3-319-14803-8_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14802-1
Online ISBN: 978-3-319-14803-8