Gradient-Based Learning

Jung, Alexander

doi:10.1007/978-981-99-7972-1_5

Abstract

In the following, we consider ML methods that use a parametrized hypothesis space \(\mathcal {H}\). Each hypothesis \(h^{(\mathbf{w})} \in \mathcal {H}\) in this space is characterized by a specific weight vector \(\mathbf{w}\in \mathbb {R}^{n}\). Moreover, we consider ML methods that use a loss function \(L({(\mathbf{x},y)},{h^{(\mathbf{w})}})\) such that the average loss, or empirical risk, \( f(\mathbf{w}) :=(1/m) \sum _{i=1}^{m} L({(\mathbf{x}^{(i)},y^{(i)})},{h^{(\mathbf{w})}})\) depends smoothly on the weight vector \(\mathbf{w}\).
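To make the definition of the empirical risk concrete, the following minimal sketch evaluates \(f(\mathbf{w})\) and its gradient for one particular choice that the abstract does not prescribe: linear hypotheses \(h^{(\mathbf{w})}(\mathbf{x}) = \mathbf{w}^{T}\mathbf{x}\) with the squared error loss. The function names, the toy data and the learning rate `lr` are assumptions made for illustration only.

```python
import numpy as np

def empirical_risk(w, X, y):
    """f(w) = (1/m) * sum_i (y_i - w^T x_i)^2 (assumed squared error loss)."""
    residuals = y - X @ w
    return np.mean(residuals ** 2)

def gradient(w, X, y):
    """Gradient of f at w for the assumed loss: -(2/m) * X^T (y - X w)."""
    m = X.shape[0]
    return -(2.0 / m) * X.T @ (y - X @ w)

def gradient_step(w, X, y, lr):
    """One gradient step: move against the gradient, scaled by lr."""
    return w - lr * gradient(w, X, y)

# Hypothetical toy data: m = 50 data points with n = 3 features, noise-free labels.
rng = np.random.default_rng(0)
m, n = 50, 3
X = rng.normal(size=(m, n))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w = np.zeros(n)
risk_before = empirical_risk(w, X, y)
for _ in range(200):
    w = gradient_step(w, X, y, lr=0.1)
risk_after = empirical_risk(w, X, y)
```

Because this \(f(\mathbf{w})\) is a smooth (here quadratic) function of \(\mathbf{w}\), the gradient exists at every point, and repeated steps with a sufficiently small `lr` drive the empirical risk down.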

Notes

1. A function \(f: \mathbb {R}^{n} \rightarrow \mathbb {R}\) is called smooth if it has continuous partial derivatives of all orders. In particular, for a smooth function \(f(\mathbf{w})\) we can define the gradient \(\nabla f(\mathbf{w})\) at every point \(\mathbf{w}\).
2. Computing a full eigenvalue decomposition of \(\mathbf{X}^{T} \mathbf{X}\) has essentially the same complexity as empirical risk minimization by directly solving (4.11), which is what we want to avoid by using the "cheaper" Algorithm 1.
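
The trade-off in note 2 can be illustrated with a small sketch. Neither (4.11) nor Algorithm 1 appears in this excerpt, so the code assumes that (4.11) is the linear system \(\mathbf{X}^{T}\mathbf{X}\mathbf{w} = \mathbf{X}^{T}\mathbf{y}\) of least-squares linear regression and that Algorithm 1 is plain gradient descent on the corresponding empirical risk. The step size uses the bound \(\lambda_{\max}(\mathbf{X}^{T}\mathbf{X}) \le \Vert \mathbf{X}\Vert_{F}^{2}\), a choice made for this sketch rather than taken from the chapter, so that no eigenvalue decomposition is needed.

```python
import numpy as np

# Hypothetical toy data: m = 200 data points, n = 5 features, noisy linear labels.
rng = np.random.default_rng(1)
m, n = 200, 5
X = rng.normal(size=(m, n))
y = X @ rng.normal(size=n) + 0.1 * rng.normal(size=m)

# Direct route (assumed form of (4.11)): form X^T X and solve the linear system.
w_direct = np.linalg.solve(X.T @ X, X.T @ y)

# Iterative route (assumed form of Algorithm 1): gradient descent on
# f(w) = (1/m) * ||y - X w||^2, using only matrix-vector products with X.
# The squared Frobenius norm of X upper-bounds the largest eigenvalue of X^T X,
# so this learning rate is safe without any eigenvalue decomposition.
frob_sq = np.sum(X ** 2)
lr = m / (2.0 * frob_sq)

w = np.zeros(n)
for _ in range(500):
    grad = -(2.0 / m) * X.T @ (y - X @ w)
    w = w - lr * grad
```

On this toy problem both routes end at the same weight vector. The iterative route never forms or factorizes \(\mathbf{X}^{T}\mathbf{X}\), which is the kind of saving the note alludes to.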

Author information

Authors and Affiliations

Department of Computer Science, Aalto University, Espoo, Finland
Alexander Jung

Corresponding author

Correspondence to Alexander Jung.

About this chapter

Cite this chapter

Jung, A. (2024). Gradientenbasiertes Lernen. In: Maschinelles Lernen. Springer, Singapore. https://doi.org/10.1007/978-981-99-7972-1_5

DOI: https://doi.org/10.1007/978-981-99-7972-1_5
Published: 14 February 2024
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-7971-4
Online ISBN: 978-981-99-7972-1
eBook Packages: Computer Science and Engineering (German Language)