IAPR Workshop on Artificial Neural Networks in Pattern Recognition

ANNPR 2012: Artificial Neural Networks in Pattern Recognition, pp 60–71

Gradient Algorithms for Exploration/Exploitation Trade-Offs: Global and Local Variants

  • Michel Tokic &
  • Günther Palm
  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 7477)

Abstract

Gradient-following algorithms are used to adapt exploration parameters efficiently in temporal-difference learning with discrete action spaces. Global and local variants are evaluated in discrete and continuous state spaces. The global variant is memory efficient because it requires exploratory data only for starting states. In contrast, the local variant requires exploratory data for each state of the state space, but produces exploratory behavior only in states with improvement potential. Our results suggest that gradient-based exploration can be used efficiently in combination with both off-policy and on-policy algorithms such as Q-learning and Sarsa.
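
As a rough illustration of what following a gradient on an exploration parameter can look like, the sketch below adapts the mean of a Gaussian from which an ε-greedy exploration rate is drawn once per episode, using a REINFORCE-style update driven by the episode return. It is not the algorithm from the paper. The Gaussian sampling of ε, the five-state chain task, the running-average baseline, all constants, and the names `clip01`, `reinforce_update`, `epsilon_greedy`, `run_episode`, and `train` are assumptions made purely for illustration. Keeping a single parameter tied to the starting state is loosely analogous to the global variant described in the abstract; a local variant would presumably keep one such parameter per state.

```python
import random


def clip01(x):
    """Clip a value to the interval [0, 1]."""
    return min(1.0, max(0.0, x))


def reinforce_update(mu, sigma, raw_sample, ret, baseline, lr):
    """One REINFORCE-style step on the mean of a Gaussian.

    (raw_sample - mu) / sigma**2 is the derivative of ln N(raw_sample; mu, sigma)
    with respect to mu; it is scaled by how much the return beat the baseline.
    """
    grad_log = (raw_sample - mu) / sigma ** 2
    return clip01(mu + lr * (ret - baseline) * grad_log)


def epsilon_greedy(q_row, epsilon, rng):
    """Pick a random action with probability epsilon, else a greedy one (ties broken randomly)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_row))
    best = max(q_row)
    return rng.choice([a for a, q in enumerate(q_row) if q == best])


def run_episode(q, epsilon, rng, n_states=5, max_steps=20, alpha=0.5, gamma=0.9):
    """Q-learning on a hypothetical chain: action 1 moves right, action 0 moves left.

    Reaching the right end yields reward 1 and ends the episode.
    Returns the undiscounted sum of rewards.
    """
    s, ret = 0, 0.0
    for _ in range(max_steps):
        a = epsilon_greedy(q[s], epsilon, rng)
        s2 = min(n_states - 1, s + 1) if a == 1 else max(0, s - 1)
        done = s2 == n_states - 1
        r = 1.0 if done else 0.0
        target = r if done else r + gamma * max(q[s2])
        q[s][a] += alpha * (target - q[s][a])
        ret += r
        s = s2
        if done:
            break
    return ret


def train(episodes=200, seed=0, sigma=0.2, lr=0.01):
    """Adapt one exploration mean (kept for the start state) across episodes."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(5)]
    mu, baseline = 0.5, 0.0
    for _ in range(episodes):
        raw = rng.gauss(mu, sigma)               # stochastic draw of the exploration rate
        ret = run_episode(q, clip01(raw), rng)   # act with the clipped rate
        mu = reinforce_update(mu, sigma, raw, ret, baseline, lr)
        baseline += 0.1 * (ret - baseline)       # running average of returns
    return mu, q
```

Calling `train()` returns the adapted mean and the learned Q-table. The update raises the mean when a higher-than-average sample coincided with a better-than-baseline return and lowers it in the opposite case, which is the basic mechanism a gradient-following exploration scheme relies on.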

Keywords

  • reinforcement learning
  • exploration/exploitation


Author information

Authors and Affiliations

  1. Institute of Neural Information Processing, University of Ulm, Germany

    Michel Tokic & Günther Palm

  2. Institute of Applied Research, University of Applied Sciences, Ravensburg-Weingarten, Germany

    Michel Tokic

Editor information

Editors and Affiliations

  1. Fondazione Bruno Kessler (FBK), 38123, Trento, Italy

    Nadia Mana

  2. Institute of Neural Information Processing, University of Ulm, 89069, Ulm, Germany

    Friedhelm Schwenker

  3. Dipartimento di Ingegneria dell’Informazione, Università di Siena, 53100, Siena, Italy

    Edmondo Trentin

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tokic, M., Palm, G. (2012). Gradient Algorithms for Exploration/Exploitation Trade-Offs: Global and Local Variants. In: Mana, N., Schwenker, F., Trentin, E. (eds) Artificial Neural Networks in Pattern Recognition. ANNPR 2012. Lecture Notes in Computer Science, vol 7477. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-33212-8_6

  • DOI: https://doi.org/10.1007/978-3-642-33212-8_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-33211-1

  • Online ISBN: 978-3-642-33212-8

  • eBook Packages: Computer Science, Computer Science (R0)

Published in cooperation with The International Association for Pattern Recognition (http://www.iapr.org/)
