Multi-Layer Perceptrons

Part of the book series: Texts in Computer Science (TCS)

Abstract

Having described the structure, the operation and the training of (artificial) neural networks in a general fashion in the preceding chapter, we turn in this and the subsequent chapters to specific forms of (artificial) neural networks. We start with the best-known and most widely used form, the so-called multi-layer perceptron (MLP), which is closely related to the networks of threshold logic units studied in a previous chapter. Multi-layer perceptrons exhibit a strictly layered structure and may employ activation functions other than a step function at a crisp threshold.
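
As a rough illustration of this layered structure (a minimal sketch, not code from the chapter; all names and dimensions below are chosen purely for illustration), the following Python/NumPy fragment propagates an input vector strictly layer by layer, using the logistic function as a smooth replacement for a step at a crisp threshold:

    import numpy as np

    def logistic(x):
        # Smooth, differentiable replacement for a step function
        # at a crisp threshold.
        return 1.0 / (1.0 + np.exp(-x))

    def mlp_forward(x, weights, biases):
        # Strictly layered structure: each layer receives only the
        # outputs of the immediately preceding layer.
        out = np.asarray(x, dtype=float)
        for W, b in zip(weights, biases):
            out = logistic(W @ out + b)
        return out

    # Illustrative network: 2 inputs, 3 hidden neurons, 1 output neuron.
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
    biases = [np.zeros(3), np.zeros(1)]
    print(mlp_forward([0.5, -1.0], weights, biases))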

Notes

  1.

    Conservative logic is a mathematical model of computation and of the computational power of computers in which the fundamental physical principles that govern computing machines are explicitly taken into account. Among these principles are, for instance, that the speed at which information can travel and the amount of information that can be stored in the state of a finite system are both finite (Fredkin and Toffoli 1982).

  2.

    In the following we implicitly assume that the output function of all neurons is the identity; only the activation functions are replaced.

  3.

    Note that this approach is not easily transferred to functions with multiple arguments. For this to be possible, the influences of the two or more inputs have to be independent in a certain sense.

  4.

    Note, however, that with this approach the sum of squared errors is minimized in the transformed space (coordinates \(x' = \ln x\) and \(y' = \ln y\)), which does not imply that it is also minimized in the original space (coordinates x and y). Nevertheless, this approach usually yields very good results, or at least an initial solution that may then be improved by other means (see the first sketch after these notes).

  5.

    Note again that with this procedure the sum of squared errors is minimized in the transformed space (coordinates \(x\) and \(z = \ln\frac{Y-y}{y}\)), but this does not imply that it is also minimized in the original space (coordinates x and y); cf. the preceding footnote and the second sketch after these notes.

  6.

    This holds unless the output function is not differentiable. However, we usually assume (implicitly) that the output function is the identity and thus does not introduce any problems.

  7.

    In order to avoid this factor right from the start, the error of an output neuron is sometimes defined as \(e_{u}^{(l)} = \frac{1}{2} (o_{u}^{(l)} - \operatorname{out}_{u}^{(l)})^{2}\). In this way the factor 2 simply cancels in the derivation (see the short derivation after these notes).

  8.

    Note that the bias value \(\theta_{u}\) is already contained in the extended weight vector (see the illustration after these notes).
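
The transformation mentioned in note 4 can be sketched as follows (a minimal sketch, assuming that the function to be fitted has the form \(y = a x^{b}\), so that \(\ln y = \ln a + b \ln x\) is linear in the transformed coordinates; the data values are made up for illustration):

    import numpy as np

    # Made-up example data (for illustration only), roughly following y = a * x**b.
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.1, 4.3, 6.1, 8.5, 10.2])

    # Linear regression in the transformed coordinates x' = ln x, y' = ln y:
    # ln y = ln a + b * ln x.
    b, ln_a = np.polyfit(np.log(x), np.log(y), 1)
    a = np.exp(ln_a)

    # The squared errors are minimized for ln y, not for y itself, so the
    # result is only an (usually good) approximation in the original space.
    print("a =", a, "b =", b)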
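
Similarly, the transformation in note 5 can be sketched as follows (a minimal sketch; it assumes a logistic function of the form \(y = \frac{Y}{1 + e^{a + bx}}\) with a known saturation value Y, and made-up data):

    import numpy as np

    Y = 6.0  # assumed known saturation value of the logistic function
    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([0.4, 1.0, 3.0, 5.0, 5.6])

    # For y = Y / (1 + exp(a + b*x)) the transform z = ln((Y - y) / y)
    # yields the linear relationship z = a + b*x.
    z = np.log((Y - y) / y)
    b, a = np.polyfit(x, z, 1)

    # As in the previous sketch, the squared errors are minimized in the
    # transformed coordinates (x, z), not in the original coordinates (x, y).
    print("a =", a, "b =", b)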
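
The remark in note 7 amounts to the following short derivation, written out here for convenience:

\[
e_{u}^{(l)} = \bigl(o_{u}^{(l)} - \operatorname{out}_{u}^{(l)}\bigr)^{2}
\;\;\Rightarrow\;\;
\frac{\partial e_{u}^{(l)}}{\partial \operatorname{out}_{u}^{(l)}}
= -2\,\bigl(o_{u}^{(l)} - \operatorname{out}_{u}^{(l)}\bigr),
\]
whereas with the additional factor \(\frac{1}{2}\)
\[
e_{u}^{(l)} = \tfrac{1}{2}\bigl(o_{u}^{(l)} - \operatorname{out}_{u}^{(l)}\bigr)^{2}
\;\;\Rightarrow\;\;
\frac{\partial e_{u}^{(l)}}{\partial \operatorname{out}_{u}^{(l)}}
= -\bigl(o_{u}^{(l)} - \operatorname{out}_{u}^{(l)}\bigr).
\]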
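
Note 8 refers to the extended weight and input vectors; one common convention (assumed here purely for illustration; the chapter's exact sign convention may differ) absorbs the bias into the weight vector via an extra input that is fixed to 1:

\[
\mathbf{w}_{u} = \bigl(-\theta_{u},\, w_{u p_{1}},\, \ldots,\, w_{u p_{n}}\bigr)^{\top},
\qquad
\mathbf{in}_{u} = \bigl(1,\, \operatorname{out}_{p_{1}},\, \ldots,\, \operatorname{out}_{p_{n}}\bigr)^{\top},
\]
so that
\[
\mathbf{w}_{u}^{\top}\,\mathbf{in}_{u}
= \sum_{i=1}^{n} w_{u p_{i}}\operatorname{out}_{p_{i}} - \theta_{u}.
\]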

References

  • S.E. Fahlman. An Empirical Study of Learning Speed in Backpropagation Networks. In: Touretzky et al. (1988)

  • E. Fredkin and T. Toffoli. Conservative Logic. International Journal of Theoretical Physics 21(3/4):219–253. Plenum Press, New York, NY, USA, 1982

  • R.A. Jacobs. Increased Rates of Convergence Through Learning Rate Adaptation. Neural Networks 1:295–307. Pergamon Press, Oxford, United Kingdom, 1988

  • A. Pinkus. Approximation Theory of the MLP Model in Neural Networks. Acta Numerica 8:143–196. Cambridge University Press, Cambridge, United Kingdom, 1999

  • M. Riedmiller and H. Braun. Rprop—A Fast Adaptive Learning Algorithm. Technical Report, University of Karlsruhe, Karlsruhe, Germany, 1992

  • M. Riedmiller and H. Braun. A Direct Adaptive Method for Faster Backpropagation Learning: The RPROP Algorithm. Int. Conf. on Neural Networks (ICNN-93, San Francisco, CA), 586–591. IEEE Press, Piscataway, NJ, USA, 1993

  • D.E. Rumelhart, G.E. Hinton and R.J. Williams. Learning Representations by Back-Propagating Errors. Nature 323:533–536, 1986

  • T. Tollenaere. SuperSAB: Fast Adaptive Backpropagation with Good Scaling Properties. Neural Networks 3:561–573, 1990

  • D. Touretzky, G. Hinton and T. Sejnowski (eds.). Proc. of the Connectionist Models Summer School (Carnegie Mellon University). Morgan Kaufmann, San Mateo, CA, USA, 1988

  • P.J. Werbos. Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences. Ph.D. Thesis, Harvard University, Cambridge, MA, USA, 1974

Copyright information

© 2013 Springer-Verlag London

About this chapter

Cite this chapter

Kruse, R., Borgelt, C., Klawonn, F., Moewes, C., Steinbrecher, M., Held, P. (2013). Multi-Layer Perceptrons. In: Computational Intelligence. Texts in Computer Science. Springer, London. https://doi.org/10.1007/978-1-4471-5013-8_5

  • DOI: https://doi.org/10.1007/978-1-4471-5013-8_5

  • Publisher Name: Springer, London

  • Print ISBN: 978-1-4471-5012-1

  • Online ISBN: 978-1-4471-5013-8
