Abstract
Artificial neural networks provide a distributed computing technology that can be trained to approximate any computable function, and have enabled substantial advances in areas such as computer vision, robotics, speech recognition, and natural language processing. This chapter provides an introduction to artificial neural networks, with a review of the early history of perceptron learning. It presents a mathematical notation for multi-layer neural networks and shows how such networks can be iteratively trained by back-propagation of errors using labeled training data. It derives the back-propagation algorithm as a distributed form of gradient descent that can be scaled to train arbitrarily large networks given sufficient data and computing power.
Learning Objectives: This chapter provides an introduction to the training and use of artificial neural networks, and prepares students to understand fundamental concepts of deep learning that are described and used in later chapters.
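As a minimal illustration of the single-neuron model \(a = f(\vec{X};\vec{\omega },b)\) defined in the notation list below, the following Python sketch computes an activation and applies one gradient-descent step. The function names, the sigmoid activation, and the squared-error cost are illustrative assumptions for concreteness, not the chapter's exact formulation.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, squashing z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(X, w, b):
    """A single artificial neuron: a = f(X; w, b) = sigmoid(w . X + b)."""
    return sigmoid(np.dot(w, X) + b)

def train_step(X, y, w, b, eta=0.01):
    """One gradient-descent step on the squared-error cost C = (a - y)^2 / 2."""
    a = neuron(X, w, b)
    delta = (a - y) * a * (1.0 - a)  # dC/dz, using sigmoid'(z) = a * (1 - a)
    return w - eta * delta * X, b - eta * delta
```

For a two-feature input, for example, `w, b = train_step(np.array([0.5, -1.0]), 1.0, np.zeros(2), 0.0)` performs one update; iterating over a labeled training set is the perceptron-style learning the chapter reviews.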
Abbreviations
- \(x_{d}\): A feature. An observed or measured value.
- \(\vec{X}\): A vector of \(D\) features.
- \(D\): The number of dimensions of the vector \(\vec{X}\).
- \(y\): A dependent variable to be estimated.
- \(a = f(\vec{X};\vec{\omega },b)\): A model that predicts an activation \(a\) from a vector \(\vec{X}\).
- \(\vec{\omega },b\): The parameters of the model \(f(\vec{X};\vec{\omega },b)\).
- \(\{ \vec{X}_{m} \}\), \(\{ y_{m} \}\): A set of training samples with indicator values.
- \(\vec{X}_{m}\): A feature vector from a training set.
- \(y_{m}\): An indicator or target value for \(\vec{X}_{m}\).
- \(M\): The number of training samples in the set \(\{ \vec{X}_{m} \}\).
- \(a_{m} = f(\vec{X}_{m};\vec{\omega },b)\): The output activation for a training sample \(\vec{X}_{m}\).
- \(\delta_{m}^{out} = a_{m} - y_{m}\): The error from using \(\vec{\omega }\) and \(b\) to estimate \(y_{m}\) from \(\vec{X}_{m}\).
- \(C(\vec{X}_{m},y_{m};\vec{\omega },b)\): The cost (or loss) for computing \(a_{m}\) from \(\vec{X}_{m}\) using \((\vec{\omega },b)\).
- \(\vec{\nabla }C(\vec{X}_{m},y_{m};\vec{\omega },b)\): The gradient (vector derivative) of the cost.
- \(L\): The number of layers in a neural network.
- \(l\): An index for the \(l^{th}\) layer in a network, \(1 \le l \le L\).
- \(a_{j}^{(l)}\): The activation output of unit \(j\) of layer \(l\).
- \(w_{ij}^{(l)}\): The weight from unit \(i\) of layer \(l-1\) to unit \(j\) of layer \(l\).
- \(b_{j}^{(l)}\): The bias for unit \(j\) of layer \(l\).
- \(\eta\): The learning rate, typically a small value (e.g., 0.01).
- \(\delta_{j,m}^{(l)}\): The error for unit \(j\) of layer \(l\), from sample \(\vec{X}_{m}\).
- \(\Delta w_{ij,m}^{(l)} = -a_{i}^{(l-1)} \delta_{j,m}^{(l)}\): Update for the weight from unit \(i\) of layer \(l-1\) to unit \(j\) of layer \(l\) (see the sketch following this list).
- \(\Delta b_{j,m}^{(l)} = -\delta_{j,m}^{(l)}\): Update for the bias of unit \(j\) of layer \(l\).
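The update rules above assemble into the full back-propagation procedure. The sketch below is a hedged illustration in Python: the function name, sigmoid activations, and squared-error cost are assumptions for concreteness, not the chapter's exact derivation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(X, y, W, B, eta=0.01):
    """One stochastic-gradient step for a fully connected network.

    W[l] has shape (units in layer l+1, units in layer l); B[l] is the
    bias vector for layer l+1. Sigmoid activations and the squared-error
    cost are assumed for concreteness.
    """
    # Forward pass: store each layer's activation a^(l), with a^(0) = X.
    activations = [X]
    for Wl, bl in zip(W, B):
        activations.append(sigmoid(Wl @ activations[-1] + bl))

    # Output error: delta^out = a - y, scaled by sigmoid'(z) = a * (1 - a).
    a = activations[-1]
    delta = (a - y) * a * (1.0 - a)

    # Backward pass: for each layer, first propagate the error to the
    # previous layer through the current weights, then apply the updates
    # from the notation list, scaled by the learning rate eta:
    #   Delta w_ij^(l) = -a_i^(l-1) delta_j^(l),  Delta b_j^(l) = -delta_j^(l).
    for l in range(len(W) - 1, -1, -1):
        a_prev = activations[l]
        delta_prev = (W[l].T @ delta) * a_prev * (1.0 - a_prev) if l > 0 else None
        W[l] -= eta * np.outer(delta, a_prev)
        B[l] -= eta * delta
        delta = delta_prev
    return W, B
```

For example, a network with layer sizes (2, 3, 1) would use `W = [np.random.randn(3, 2), np.random.randn(1, 3)]` and `B = [np.zeros(3), np.zeros(1)]`; repeating `backprop_step` over the \(M\) training samples realizes the iterative gradient-descent training described in the abstract.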