Abstract
Artificial neural networks provide a distributed computing technology that can be trained to approximate any computable function, and have enabled substantial advances in areas such as computer vision, robotics, speech recognition, and natural language processing. This chapter provides an introduction to artificial neural networks, with a review of the early history of perceptron learning. It presents a mathematical notation for multi-layer neural networks and shows how such networks can be iteratively trained by back-propagation of errors using labeled training data. It derives the back-propagation algorithm as a distributed form of gradient descent that can be scaled to train arbitrarily large networks given sufficient data and computing power.
Learning Objectives: This chapter provides an introduction to the training and use of artificial neural networks, and prepares students to understand fundamental concepts of deep learning that are described and used in later chapters.
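As a minimal illustration of the single-neuron model \(a = f(\vec{X};\vec{\omega },b)\) defined in the notation list below, the following Python sketch computes an activation and applies one gradient-descent step. The function names, the sigmoid activation, and the squared-error cost are illustrative assumptions for concreteness, not the chapter's exact formulation.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation, squashing z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(X, w, b):
    """A single artificial neuron: a = f(X; w, b) = sigmoid(w . X + b)."""
    return sigmoid(np.dot(w, X) + b)

def train_step(X, y, w, b, eta=0.01):
    """One gradient-descent step on the squared-error cost C = (a - y)^2 / 2."""
    a = neuron(X, w, b)
    delta = (a - y) * a * (1.0 - a)  # dC/dz, using sigmoid'(z) = a * (1 - a)
    return w - eta * delta * X, b - eta * delta
```

For a two-feature input, for example, `w, b = train_step(np.array([0.5, -1.0]), 1.0, np.zeros(2), 0.0)` performs one update; iterating over a labeled training set is the perceptron-style learning the chapter reviews.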
Abbreviations
- \(x_{d}\): A feature. An observed or measured value.
- \(\vec{X}\): A vector of \(D\) features.
- \(D\): The number of dimensions of the vector \(\vec{X}\).
- \(y\): A dependent variable to be estimated.
- \(a = f(\vec{X};\vec{\omega },b)\): A model that predicts an activation \(a\) from a vector \(\vec{X}\).
- \(\vec{\omega },b\): The parameters of the model \(f(\vec{X};\vec{\omega },b)\).
- \(\{ \vec{X}_{m} \}\), \(\{ y_{m} \}\): A set of training samples with indicator values.
- \(\vec{X}_{m}\): A feature vector from a training set.
- \(y_{m}\): An indicator or target value for \(\vec{X}_{m}\).
- \(M\): The number of training samples in the set \(\{ \vec{X}_{m} \}\).
- \(a_{m} = f(\vec{X}_{m};\vec{\omega },b)\): The output activation for a training sample \(\vec{X}_{m}\).
- \(\delta_{m}^{out} = a_{m} - y_{m}\): The error from using \(\vec{\omega }\) and \(b\) to estimate \(y_{m}\) from \(\vec{X}_{m}\).
- \(C(\vec{X}_{m},y_{m};\vec{\omega },b)\): The cost (or loss) for computing \(a_{m}\) from \(\vec{X}_{m}\) using \((\vec{\omega },b)\).
- \(\vec{\nabla }C(\vec{X}_{m},y_{m};\vec{\omega },b)\): The gradient (vector derivative) of the cost.
- \(L\): The number of layers in a neural network.
- \(l\): An index for the \(l^{th}\) layer in a network, \(1 \le l \le L\).
- \(a_{j}^{(l)}\): The activation output of unit \(j\) of layer \(l\).
- \(w_{ij}^{(l)}\): The weight from unit \(i\) of layer \(l-1\) to unit \(j\) of layer \(l\).
- \(b_{j}^{(l)}\): The bias for unit \(j\) of layer \(l\).
- \(\eta\): The learning rate, typically a small value (e.g., 0.01).
- \(\delta_{j,m}^{(l)}\): The error for unit \(j\) of layer \(l\), from sample \(\vec{X}_{m}\).
- \(\Delta w_{ij,m}^{(l)} = -a_{i}^{(l-1)} \delta_{j,m}^{(l)}\): Update for the weight from unit \(i\) of layer \(l-1\) to unit \(j\) of layer \(l\) (see the sketch following this list).
- \(\Delta b_{j,m}^{(l)} = -\delta_{j,m}^{(l)}\): Update for the bias of unit \(j\) of layer \(l\).
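The update rules above assemble into the full back-propagation procedure. The sketch below is a hedged illustration in Python: the function name, sigmoid activations, and squared-error cost are assumptions for concreteness, not the chapter's exact derivation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(X, y, W, B, eta=0.01):
    """One stochastic-gradient step for a fully connected network.

    W[l] has shape (units in layer l+1, units in layer l); B[l] is the
    bias vector for layer l+1. Sigmoid activations and the squared-error
    cost are assumed for concreteness.
    """
    # Forward pass: store each layer's activation a^(l), with a^(0) = X.
    activations = [X]
    for Wl, bl in zip(W, B):
        activations.append(sigmoid(Wl @ activations[-1] + bl))

    # Output error: delta^out = a - y, scaled by sigmoid'(z) = a * (1 - a).
    a = activations[-1]
    delta = (a - y) * a * (1.0 - a)

    # Backward pass: for each layer, first propagate the error to the
    # previous layer through the current weights, then apply the updates
    # from the notation list, scaled by the learning rate eta:
    #   Delta w_ij^(l) = -a_i^(l-1) delta_j^(l),  Delta b_j^(l) = -delta_j^(l).
    for l in range(len(W) - 1, -1, -1):
        a_prev = activations[l]
        delta_prev = (W[l].T @ delta) * a_prev * (1.0 - a_prev) if l > 0 else None
        W[l] -= eta * np.outer(delta, a_prev)
        B[l] -= eta * delta
        delta = delta_prev
    return W, B
```

For example, a network with layer sizes (2, 3, 1) would use `W = [np.random.randn(3, 2), np.random.randn(1, 3)]` and `B = [np.zeros(3), np.zeros(1)]`; repeating `backprop_step` over the \(M\) training samples realizes the iterative gradient-descent training described in the abstract.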