Gradient Algorithms for Exploration/Exploitation Trade-Offs: Global and Local Variants

Tokic, Michel; Palm, Günther

doi:10.1007/978-3-642-33212-8_6

Michel Tokic^22,23 &
Günther Palm²²

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 7477))

Included in the following conference series:

IAPR Workshop on Artificial Neural Networks in Pattern Recognition

1412 Accesses
4 Citations

Abstract

Gradient-following algorithms are deployed for efficient adaptation of exploration parameters in temporal-difference learning with discrete action spaces. Global and local variants are evaluated in discrete and continuous state spaces. The global variant is memory efficient in terms of requiring exploratory data only for starting states. In contrast, the local variant requires exploratory data for each state of the state space, but produces exploratory behavior only in states with improvement potential. Our results suggest that gradient-based exploration can be efficiently used in combination with off- and on-policy algorithms such as Q-learning and Sarsa.

Download to read the full chapter text

Chapter PDF

Generalized exploration in policy search

Article 13 July 2017

Exploration Versus Exploitation in Model-Based Reinforcement Learning: An Empirical Study

Reinforcement Learning

Keywords

References

Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Google Scholar
Wiering, M.: Explorations in Efficient Reinforcement Learning. PhD thesis, University of Amsterdam, Amsterdam (1999)
Google Scholar
Thrun, S.B.: Efficient exploration in reinforcement learning. Technical Report CMU-CS-92-102, Carnegie Mellon University, Pittsburgh, PA, USA (1992)
Google Scholar
Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. The Journal of Machine Learning Research 3, 397–422 (2002)
MathSciNet Google Scholar
van Eck, N.J., van Wezel, M.: Application of reinforcement learning to the game of Othello. Computers and Operations Research 35, 1999–2017 (2008)
Article MathSciNet MATH Google Scholar
Faußer, S., Schwenker, F.: Learning a strategy with neural approximated temporal-difference methods in english draughts. In: Proceedings of the 20th International Conference on Pattern Recognition, ICPR 2010, pp. 2925–2928. IEEE Computer Society (2010)
Google Scholar
Rummery, G.A., Niranjan, M.: On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University (1994)
Google Scholar
Daw, N.D., O’Doherty, J.P., Dayan, P., Seymour, B., Dolan, R.J.: Cortical substrates for exploratory decisions in humans. Nature 441(7095), 876–879 (2006)
Article Google Scholar
Tokic, M., Palm, G.: Adaptive exploration using stochastic neurons. In: Proceedings of the 22nd International Conference on Artificial Neural Networks, Lausanne, Switzerland. Springer (to appear, 2012)
Google Scholar
Watkins, C.: Learning from Delayed Rewards. PhD thesis, University of Cambridge, England (1989)
Google Scholar
George, A.P., Powell, W.B.: Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming. Machine Learning 65(1), 167–198 (2006)
Article Google Scholar
Grzes, M., Kudenko, D.: Online learning of shaping rewards in reinforcement learning. Neural Networks 23(4), 541–550 (2010)
Article Google Scholar
Nouri, A., Littman, M.L.: Multi-resolution exploration in continuous spaces. In: Koller, D., Schuurmans, D., Bengio, Y., Bottou, L. (eds.) Advances in Neural Information Processing Systems, vol. 21, pp. 1209–1216 (2009)
Google Scholar
Tokic, M., Palm, G.: Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax. In: Bach, J., Edelkamp, S. (eds.) KI 2011. LNCS, vol. 7006, pp. 335–346. Springer, Heidelberg (2011)
Chapter Google Scholar
Tokic, M., Ertle, P., Palm, G., Söffker, D., Voos, H.: Robust Exploration/Exploitation Trade-Offs in Safety-Critical applications. In: Proceedings of the 8th International Symposium on Fault Detection, Supervision and Safety of Technical Processes, Mexico City, Mexico. IFAC (to appear, 2012)
Google Scholar
Williams, R.J.: Simple statistical Gradient-Following algorithms for connectionist reinforcement learning. Machine Learning 8, 229–256 (1992)
MATH Google Scholar
Singh, S., Sutton, R.S.: Reinforcement learning with replacing eligibility traces. Machine Learning 22, 123–158 (1996)
MATH Google Scholar

Download references

Author information

Authors and Affiliations

Institute of Neural Information Processing, University of Ulm, Germany
Michel Tokic & Günther Palm
Institute of Applied Research, University of Applied Sciences, Ravensburg-Weingarten, Germany
Michel Tokic

Authors

Michel Tokic
View author publications
You can also search for this author in PubMed Google Scholar
Günther Palm
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Fondazione Bruno Kessler (FBK), 38123, Trento, Italy
Nadia Mana
Institute of Neural Information Processing, University of Ulm, 89069, Ulm, Germany
Friedhelm Schwenker
Dipartimento di Ingegneria dell’Informazione, Università di Siena, 53100, Siena, Italy
Edmondo Trentin

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Tokic, M., Palm, G. (2012). Gradient Algorithms for Exploration/Exploitation Trade-Offs: Global and Local Variants. In: Mana, N., Schwenker, F., Trentin, E. (eds) Artificial Neural Networks in Pattern Recognition. ANNPR 2012. Lecture Notes in Computer Science(), vol 7477. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33212-8_6

Download citation

DOI: https://doi.org/10.1007/978-3-642-33212-8_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-33211-1
Online ISBN: 978-3-642-33212-8
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Societies and partnerships

The International Association for Pattern Recognition (opens in a new tab)

Gradient Algorithms for Exploration/Exploitation Trade-Offs: Global and Local Variants

Abstract

Chapter PDF