Estimators for State Evaluation and Action Selection

Lorenz, Uwe

doi:10.1007/978-3-662-68311-8_5

Abstract

As a rule, the available resources are not sufficient to represent the control policy, the value function, or the model in tabular form. This chapter therefore introduces parameterized estimators that let us estimate the value of states or probabilistic action preferences, even for situations that have not previously been observed in exactly the same form.
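The chapter's own code is not part of this preview, so the following is only a minimal Python sketch of the idea, assuming linear function approximation over a small hand-made feature vector. It shows a parameterized state-value estimate v̂(s, w) = w · x(s) trained with semi-gradient TD(0), and a softmax policy over linear action preferences trained with a REINFORCE-style update (both standard techniques, e.g. as described in Sutton and Barto 2018). The names `features`, `LinearValueEstimator`, `SoftmaxPolicy` and the grid-position states are hypothetical illustrations, not taken from the chapter.

```python
import math
import random

# Hypothetical feature map: a state is a grid position (x, y) and is
# described by a small feature vector instead of its own table entry.
def features(state):
    x, y = state
    return [1.0, x / 10.0, y / 10.0]

def dot(w, f):
    return sum(wi * fi for wi, fi in zip(w, f))

class LinearValueEstimator:
    """v_hat(s, w) = w . x(s), trained with semi-gradient TD(0)."""

    def __init__(self, n_features, alpha=0.1, gamma=0.9):
        self.w = [0.0] * n_features
        self.alpha = alpha
        self.gamma = gamma

    def value(self, state):
        return dot(self.w, features(state))

    def td0_update(self, s, reward, s_next, terminal=False):
        # TD target: bootstrap from the current estimate of the next state.
        target = reward if terminal else reward + self.gamma * self.value(s_next)
        delta = target - self.value(s)
        # For a linear estimator the gradient w.r.t. w is the feature vector.
        for i, fi in enumerate(features(s)):
            self.w[i] += self.alpha * delta * fi
        return delta

class SoftmaxPolicy:
    """pi(a | s, theta) proportional to exp(theta[a] . x(s)), one weight vector per action."""

    def __init__(self, actions, n_features):
        self.actions = list(actions)
        self.theta = {a: [0.0] * n_features for a in self.actions}

    def probabilities(self, state):
        f = features(state)
        prefs = [dot(self.theta[a], f) for a in self.actions]
        m = max(prefs)  # subtract the maximum for numerical stability
        exps = [math.exp(p - m) for p in prefs]
        total = sum(exps)
        return {a: e / total for a, e in zip(self.actions, exps)}

    def sample(self, state, rng=random):
        probs = self.probabilities(state)
        return rng.choices(self.actions, weights=[probs[a] for a in self.actions])[0]

    def reinforce_update(self, state, action, ret, alpha=0.1):
        # Gradient of log pi(action | s) w.r.t. theta[b] is (1[b == action] - pi(b | s)) * x(s).
        f = features(state)
        probs = self.probabilities(state)
        for b in self.actions:
            coeff = alpha * ret * ((1.0 if b == action else 0.0) - probs[b])
            self.theta[b] = [t + coeff * fi for t, fi in zip(self.theta[b], f)]

if __name__ == "__main__":
    v = LinearValueEstimator(3)
    v.td0_update((9, 9), 1.0, None, terminal=True)
    # (8, 9) was never visited, yet its estimate changed through the shared weights.
    print(v.value((8, 9)))
    pi = SoftmaxPolicy(["left", "right"], 3)
    pi.reinforce_update((3, 4), "left", 1.0)
    print(pi.probabilities((3, 4)))
```

Because the weights are shared across all states, a single update at (9, 9) also raises the estimate for the unvisited neighbor (8, 9); this generalization is what a table cannot provide. Replacing the linear map by a neural network changes how v̂ and the preferences are computed, but not the structure of the updates.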

"Just as feathers are irrelevant to flying, we may discover in time that neurons and synapses are insignificant for intelligence." (Ethem Alpaydin; Alpaydin 2019)

Notes

1. Also called "efferent nerve cells" or "motor neurons".
2. Author: Zoran Sevarac; Copyright 2010 Neuroph Project http://neuroph.sourceforge.net. Licensed under the Apache License, Version 2.0 (the "License"); http://www.apache.org/licenses/LICENSE-2.0. Further details can be found in the files of the cited program code.
3. Author: Zoran Sevarac; Copyright 2010 Neuroph Project http://neuroph.sourceforge.net. Licensed under the Apache License, Version 2.0 (the "License"); http://www.apache.org/licenses/LICENSE-2.0. Further details can be found in the files of the cited program code.
4. S. Kakade and J. Langford. "Approximately optimal approximate reinforcement learning". In: ICML. Vol. 2. 2002, pp. 267–274.

References

Alpaydin E (2019) Maschinelles Lernen, 2nd extended ed. De Gruyter Studium, Berlin/Boston
Churchland PS, Sejnowski TJ (1997) Grundlagen zur Neuroinformatik und Neurobiologie: The Computational Brain in deutscher Sprache. Vieweg (Computational Intelligence)
Frochte J (2019) Maschinelles Lernen: Grundlagen und Algorithmen in Python, 2nd ed. Hanser, München
Fyfe C (2007) Hebbian learning and negative feedback networks. Springer (Advanced Information and Knowledge Processing), Dordrecht. http://gbv.eblib.com/patron/FullRecord.aspx?p=371973
Hassabis D (2014) DeepMind artificial intelligence @ FDOT14. https://www.youtube.com/watch?v=EfGD2qveGdQ
Hebb D (1949) The organization of behavior. John Wiley & Sons, New York
Kandel E (2009) Auf der Suche nach dem Gedächtnis: Die Entstehung einer neuen Wissenschaft des Geistes, 4th ed. (paperback). Goldmann, München (Goldmann, 15570)
Kim B, Pavlus J (2019) A new approach to understanding how machines think. Quanta Magazine. https://www.quantamagazine.org/been-kim-is-building-a-translator-for-artificial-intelligence-20190110/
Mnih V, Kavukcuoglu K, Silver D et al. (2015) Human-level control through deep reinforcement learning. Nature 518:529–533. https://web.stanford.edu/class/psych209/Readings/MnihEtAlHassibis15NatureControlDeepRL.pdf
Ribeiro MT, Singh S, Guestrin C (2016) "Why should I trust you?" Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp 1135–1144. https://arxiv.org/abs/1602.04938
Schulman J, Levine S, Moritz P, Jordan M, Abbeel P (2015) Trust region policy optimization. In: Proceedings of the 32nd International Conference on Machine Learning, PMLR 37:1889–1897
Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O (2017) Proximal policy optimization algorithms. https://arxiv.org/abs/1707.06347v2
Sutton RS, Barto A (2018) Reinforcement learning: an introduction, 2nd ed. The MIT Press (Adaptive Computation and Machine Learning), Cambridge, MA/London
Turing A (1937) On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society 42 (ISSN 0024-6115), pp 230–265. https://londmathsoc.onlinelibrary.wiley.com/doi/abs/10.1112/plms/s2-42.1.230

Author information

Authors and Affiliations

Neckargemünd, Baden-Württemberg, Germany
Uwe Lorenz

Corresponding author

Correspondence to Uwe Lorenz.

5.1 Electronic Supplementary Material

Supplementary material 1 (ZIP, 19577 kB)

About this chapter

Cite this chapter

Lorenz, U. (2024). Schätzer für Zustandsbewertung und Aktionsauswahl. In: Reinforcement Learning. Springer Vieweg, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-68311-8_5

DOI: https://doi.org/10.1007/978-3-662-68311-8_5
Published: 05 April 2024
Publisher Name: Springer Vieweg, Berlin, Heidelberg
Print ISBN: 978-3-662-68310-1
Online ISBN: 978-3-662-68311-8
eBook Packages: Computer Science and Engineering (German Language)