Abstract
Applications of Reinforcement Learning (RL) suffer from high sample complexity due to sparse reward signals and inadequate exploration. Novelty Search (NS) can guide exploration as an auxiliary task by encouraging the agent to exhibit unseen behaviors. However, NS has critical drawbacks in scalability and generalizability because it is based on instance learning. To address these challenges, we previously proposed a generic approach that uses unsupervised learning to learn representations of agent behaviors and treats reconstruction losses as novelty scores. That approach, however, considered only fixed-length sequences and did not exploit the sequential structure of behaviors. Here, we extend it with sequential auto-encoders that capture sequential dependencies. Experimental results on benchmark tasks show that this sequence learning aids exploration and outperforms previous novelty search methods.
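To make the idea concrete, below is a minimal PyTorch sketch of the kind of sequential auto-encoder described above: a GRU encoder compresses a trajectory of per-step behavior features into a latent state, a GRU decoder reconstructs the trajectory from that state, and the reconstruction error serves as a novelty score. The module and function names (`SeqAutoencoder`, `novelty_score`) and the shapes involved are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class SeqAutoencoder(nn.Module):
    """GRU encoder-decoder over a behavior trajectory (illustrative sketch)."""
    def __init__(self, feat_dim, hidden_dim=64):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.decoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, feat_dim)

    def forward(self, seq):
        # seq: (batch, time, feat_dim) behavior characterizations
        _, h = self.encoder(seq)           # latent summary of the trajectory
        dec_in = torch.zeros_like(seq)
        dec_in[:, 1:] = seq[:, :-1]        # teacher forcing: shifted inputs
        dec_out, _ = self.decoder(dec_in, h)
        return self.out(dec_out)           # reconstructed trajectory

def novelty_score(model, seq):
    """Per-trajectory reconstruction error used as a novelty bonus."""
    with torch.no_grad():
        recon = model(seq)
        # High reconstruction error means the behavior is unlike those seen so far.
        return ((recon - seq) ** 2).mean(dim=(1, 2))
```

Under this scheme, trajectories the auto-encoder reconstructs poorly are, by construction, unlike the behaviors it was trained on, so the reconstruction loss can be added to the task reward as an exploration bonus.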
Keywords
- Reinforcement Learning
- Exploration
- Novelty Search
- Representation Learning
- Sequence Learning
Acknowledgement
In part, the authors of this work were supported by the Fraunhofer Research Center for Machine Learning (RCML) within the Fraunhofer Cluster of Excellence Cognitive Internet Technologies (CCIT). We gratefully acknowledge this support.