Abstract
Applying reinforcement learning algorithms in real-world domains is challenging because relevant state information is often embedded in a stream of high-dimensional sensor data. This paper describes a novel algorithm for learning task-relevant features through interactions with the environment. The key idea is that a feature is likely to be useful to the degree that its dynamics can be controlled by the actions of the agent. We describe an algorithm that discovers such features and demonstrate its effectiveness in an artificial domain.
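The abstract's key idea can be illustrated with a minimal sketch (this is not the paper's algorithm; the scoring function, toy data, and all names are assumptions for illustration): score each candidate feature by how much knowing the agent's action improves a linear prediction of that feature's next value. Features whose dynamics the agent can control score high.

```python
import numpy as np

rng = np.random.default_rng(0)

def contingency_score(f, f_next, a):
    """Reduction in mean squared error when predicting f_next from (f, a)
    rather than from f alone. Larger values suggest the feature's dynamics
    are contingent on the agent's actions."""
    X_no_action = np.column_stack([f, np.ones_like(f)])
    X_action = np.column_stack([f, a, np.ones_like(f)])

    def mse(X):
        w, *_ = np.linalg.lstsq(X, f_next, rcond=None)
        return np.mean((X @ w - f_next) ** 2)

    return mse(X_no_action) - mse(X_action)

# Toy data: feature 0 is driven partly by the action, feature 1 drifts on its own.
T = 1000
a = rng.choice([-1.0, 1.0], size=T)
f0 = rng.normal(size=T)
f0_next = 0.9 * f0 + 0.5 * a + 0.1 * rng.normal(size=T)  # controllable
f1 = rng.normal(size=T)
f1_next = 0.9 * f1 + 0.1 * rng.normal(size=T)            # uncontrollable

print("controllable feature score:  ", contingency_score(f0, f0_next, a))
print("uncontrollable feature score:", contingency_score(f1, f1_next, a))
```

Run on the toy data, the controllable feature receives a clearly larger score, matching the intuition that action-dependent dynamics signal task relevance.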
Copyright information
© 2014 Springer International Publishing Switzerland
Cite this paper
Sprague, N. (2014). Contingent Features for Reinforcement Learning. In: Wermter, S., et al. Artificial Neural Networks and Machine Learning – ICANN 2014. ICANN 2014. Lecture Notes in Computer Science, vol 8681. Springer, Cham. https://doi.org/10.1007/978-3-319-11179-7_44
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11178-0
Online ISBN: 978-3-319-11179-7
eBook Packages: Computer Science (R0)