Machine Learning with Neural Networks

Chapter in: Human-Centered Artificial Intelligence (ACAI 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13500)

Abstract

Artificial neural networks provide a distributed computing technology that can be trained to approximate any computable function, and have enabled substantial advances in areas such as computer vision, robotics, speech recognition and natural language processing. This chapter provides an introduction to Artificial Neural Networks, with a review of the early history of perceptron learning. It presents a mathematical notation for multi-layer neural networks and shows how such networks can be iteratively trained by back-propagation of errors using labeled training data. It derives the back-propagation algorithm as a distributed form of gradient descent that can be scaled to train arbitrarily large networks given sufficient data and computing power.

Learning Objectives: This chapter provides an introduction to the training and use of Artificial Neural Networks, and prepares students to understand fundamental concepts of deep-learning that are described and used in later chapters.
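
As a concrete illustration of the kind of iterative training described above, the following sketch trains a single sigmoid unit \(a = f(\vec{X};\vec{\omega },b)\) by gradient descent on a squared-error cost, using the notation defined under Abbreviations below. This is a minimal sketch in Python; the sigmoid activation, squared-error cost, AND-function example and all identifiers are illustrative assumptions, not details taken from the chapter.

import numpy as np

def sigmoid(z):
    # Logistic activation function
    return 1.0 / (1.0 + np.exp(-z))

def train_neuron(X, y, eta=0.01, epochs=1000):
    # X: (M, D) array of feature vectors; y: (M,) array of indicator values.
    M, D = X.shape
    w = np.zeros(D)   # weight vector (omega in the chapter's notation)
    b = 0.0           # bias
    for _ in range(epochs):
        for m in range(M):
            a_m = sigmoid(X[m] @ w + b)            # output activation a_m
            delta_out = a_m - y[m]                 # output error, delta_m^out = a_m - y_m
            grad = delta_out * a_m * (1.0 - a_m)   # derivative of the cost w.r.t. the net input
            w -= eta * grad * X[m]                 # gradient-descent update of the weights
            b -= eta * grad                        # gradient-descent update of the bias
    return w, b

# Example usage (illustrative): learn the logical AND of two binary features.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0, 0, 0, 1], dtype=float)
w, b = train_neuron(X, y, eta=0.5, epochs=5000)

The same per-sample update, applied layer by layer, is what back-propagation generalizes to multi-layer networks, as sketched after the notation table below.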

Abbreviations

\(x_{d}\) :

A feature. An observed or measured value.

\(\vec{X}\) :

A vector of D features.

\(D\) :

The number of dimensions of the vector \(\vec{X}\).

\(y\) :

A dependent variable to be estimated.

\(a = f(\vec{X};\vec{\omega },b)\) :

A model that predicts an activation \(a\) from a vector \(\vec{X}\).

\(\vec{\omega },b\) :

The parameters of the model \(f(\vec{X};\vec{\omega },b)\).

\(\{ \vec{X}_{m} \} ,\{ y_{m} \}\) :

A set of training samples with indicator values.

\(\vec{X}_{m}\) :

A feature vector from a training set.

\(y_{m}\) :

An indicator or target value for \(\vec{X}_{m}\).

M :

The number of training samples in the set \(\{ \vec{X}_{m} \}\).

\(a_{m} = f(\vec{X}_{m} ;\vec{\omega },b)\) :

The output activation for a training sample, \(\vec{X}_{m}\).

\(\delta_{m}^{out} = a_{m} - y_{m}\) :

The error from using \(\vec{\omega }\) and \(b\) to compute \(y_{m}\) from \(\vec{X}_{m}\).

\(C(\vec{X}_{m} ,y_{m} ;\vec{\omega },b)\) :

The cost (or loss) for computing \(a_{m}\) from \(\vec{X}_{m}\) using \((\vec{\omega },b)\).

\(\vec{\nabla }C(\vec{X}_{m} ,y_{m} ;\vec{\omega },b)\) :

The gradient (vector derivative) of the cost.

\(L\) :

The number of layers in a neural network.

\(l\) :

An index for the \(l^{th}\) layer of the network, \(1 \le l \le L\).

\(a_{j}^{(l)}\) :

The activation output of the jth neuron of the lth layer.

\(w_{ij}^{(l)}\) :

The weight from unit i of layer l–1 to unit j of layer l.

\(b_{j}^{(l)}\) :

The bias for unit j of layer l.

\(\eta\) :

A variable learning rate, typically very small (e.g., 0.01).

\(\delta_{j,m}^{(l)}\) :

Error for the jth neuron of layer l, from sample \(\vec{X}_{m}\).

\(\Delta w_{ij,m}^{(l)} = - a_{i}^{(l - 1)} \delta_{j,m}^{(l)}\) :

Update for the weight from unit i of layer l–1 to unit j of layer l.

\(\Delta b_{j,m}^{(l)} = {-}\delta_{j,m}^{(l)}\) :

Update for bias for unit j of layer l.
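
To make the notation above concrete, the following sketch applies the listed error and update rules to one training sample \((\vec{X}_{m} ,y_{m} )\) in a small fully connected network. It is an illustrative Python sketch, assuming sigmoid activations and a squared-error cost; the function name, layer sizes and NumPy representation are assumptions, not taken from the chapter (note that the Python lists are 0-indexed, while the chapter's layer index \(l\) runs from 1 to \(L\)).

import numpy as np

def sigmoid(z):
    # Logistic activation function
    return 1.0 / (1.0 + np.exp(-z))

def backprop_one_sample(X_m, y_m, W, b, eta=0.01):
    # W and b are lists of length L; W[l] maps the activations of layer l
    # to layer l+1 (layer 0 is the input X_m) and has shape (N_l, N_{l+1}).
    L = len(W)
    a = [X_m]                                    # a[0] = input feature vector
    for l in range(L):                           # forward pass
        a.append(sigmoid(a[l] @ W[l] + b[l]))

    # Output error weighted by the sigmoid derivative: (a_m - y_m) * a * (1 - a)
    delta = (a[L] - y_m) * a[L] * (1.0 - a[L])
    for l in reversed(range(L)):                 # backward pass
        Delta_w = -np.outer(a[l], delta)         # Delta_w_ij = -a_i^(l-1) * delta_j^(l)
        Delta_b = -delta                         # Delta_b_j  = -delta_j^(l)
        if l > 0:                                # back-propagate the error to the layer below
            delta = (W[l] @ delta) * a[l] * (1.0 - a[l])
        W[l] += eta * Delta_w                    # apply the weight update
        b[l] += eta * Delta_b                    # apply the bias update
    return W, b

# Example usage (illustrative sizes): D = 3 inputs, one hidden layer of 4 units, 2 outputs.
rng = np.random.default_rng(0)
W = [rng.normal(size=(3, 4)), rng.normal(size=(4, 2))]
b = [np.zeros(4), np.zeros(2)]
W, b = backprop_one_sample(np.array([0.2, 0.5, 0.1]), np.array([1.0, 0.0]), W, b, eta=0.1)

In practice the per-sample corrections \(\Delta w_{ij,m}^{(l)}\) and \(\Delta b_{j,m}^{(l)}\) are usually averaged over a batch of training samples before being applied, which is how this gradient-descent procedure is scaled to larger networks and data sets.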

Author information

Corresponding author: James L. Crowley.

Copyright information

© 2023 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Crowley, J.L. (2023). Machine Learning with Neural Networks. In: Chetouani, M., Dignum, V., Lukowicz, P., Sierra, C. (eds) Human-Centered Artificial Intelligence. ACAI 2021. Lecture Notes in Computer Science, vol 13500. Springer, Cham. https://doi.org/10.1007/978-3-031-24349-3_3

  • DOI: https://doi.org/10.1007/978-3-031-24349-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-24348-6

  • Online ISBN: 978-3-031-24349-3

  • eBook Packages: Computer Science, Computer Science (R0)
