Modifications and Extensions to a Feed-Forward Neural Network

Part of the book series: Undergraduate Topics in Computer Science (UTICS)

Abstract

This chapter explores modifications and extensions to simple feed-forward neural networks, all of which carry over to other neural network architectures. The problem of local minima, one of the main problems in machine learning, is explored in detail. The main strategy against local minima is regularization, achieved by adding a regularization term when learning; both L1 and L2 regularization are explored and explained in detail. The chapter also addresses the learning rate and shows how to incorporate it into backpropagation, in both the static and the dynamic setting. Momentum, a technique that also helps against local minima by adding inertia to gradient descent, is covered next. The chapter then explores stochastic gradient descent in the form of learning with batches and pure online learning, and concludes with a view of the vanishing and exploding gradient problems, setting the stage for deep learning.
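
As a quick preview of the techniques listed above, here is a minimal sketch (our own illustration, not code from the chapter) of a single gradient-descent update combining a learning rate, L2 regularization, and momentum; the names eta, lam, and mu are illustrative.

```python
# A minimal sketch of one gradient-descent step combining three of the
# techniques covered in the chapter: a learning rate (eta), L2
# regularization (lam), and momentum (mu). All names are illustrative.

def sgd_step(w, grad, velocity, eta=0.1, lam=0.01, mu=0.9):
    """One parameter update with L2 regularization and momentum."""
    # L2 regularization adds lam * w to the gradient of the loss
    regularized_grad = grad + lam * w
    # Momentum keeps a running "velocity" that adds inertia to the descent
    new_velocity = mu * velocity - eta * regularized_grad
    return w + new_velocity, new_velocity

w, v = 1.0, 0.0
for _ in range(100):
    grad = 2 * w          # gradient of the toy loss f(w) = w**2
    w, v = sgd_step(w, grad, v)
print(w)                  # close to the minimum at 0
```

The momentum term reuses the previous step's direction, so the descent keeps moving through small flat regions instead of stalling there.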

Notes

  1.

    We will be using a modification of the explanation offered by [3]. Note that this book is available online at http://neuralnetworksanddeeplearning.com.

  2.

    We take the idea for this abstraction from Geoffrey Hinton’s courses.

  3.

    This is actually also a technique used to prevent overfitting, called early stopping.

  4.

    You can use the learning rate to force a gradient explosion: if you want to see one for yourself, try an \(\eta \) value of 5 or 10.
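
The effect described in this note can be sketched numerically (our own toy example, not from the book): on the loss f(w) = w**2, a small learning rate converges while a large one diverges.

```python
# A toy illustration of how an overly large learning rate makes
# gradient descent diverge on the simple loss f(w) = w**2.

def descend(eta, steps=10, w=1.0):
    for _ in range(steps):
        w = w - eta * 2 * w   # the gradient of w**2 is 2w
    return w

print(abs(descend(eta=0.1)))  # shrinks toward the minimum at 0
print(abs(descend(eta=5.0)))  # blows up: |w| grows ninefold each step
```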

  5.

    We have been clumsy around several things, and this section is intended to redefine them a bit to make them more precise.

  6.

    We could also use a non-random selection. One of the most interesting ideas here is learning the simplest instances first and then proceeding to the trickier ones; this approach is called curriculum learning. For more on this see [13].

  7.

    This is similar to reinforcement learning, which is, along with supervised and unsupervised learning, one of the three main areas of machine learning, but we have decided against including it in this volume, since it falls outside the idea of a first introduction to deep learning. If the reader wishes to learn more, we refer her to [14].

  8.

    Suppose, for the sake of clarification, that it is non-randomly divided: the first batch contains training samples 1 to 1000, the second 1001 to 2000, etc.
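
This batching can be sketched as follows (our own illustration; here indices start at 0 rather than 1):

```python
# A sketch of the non-random batching described above: 10,000 samples
# split into mini-batches of 1000, so batch 0 holds samples 0..999,
# batch 1 holds samples 1000..1999, and so on.

def make_batches(data, batch_size):
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

samples = list(range(10000))          # stand-ins for training samples
batches = make_batches(samples, 1000)
print(len(batches))                   # 10 batches
print(batches[1][0], batches[1][-1])  # second batch: 1000 1999
```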

  9.

    A single hidden layer with two neurons in it. If it were (3, 2, 4, 1), we would know it has two hidden layers, the first with two neurons and the second with four.
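
This notation can be made concrete with a small sketch (our own illustration): each pair of consecutive layer sizes determines the shape of one weight matrix.

```python
# Each consecutive pair of layer sizes in the notation above determines
# the shape of one weight matrix: a layer of size m feeding a layer of
# size n needs an n-by-m matrix.

def weight_shapes(layers):
    return [(layers[i + 1], layers[i]) for i in range(len(layers) - 1)]

print(weight_shapes((3, 2, 4, 1)))  # [(2, 3), (4, 2), (1, 4)]
```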

  10.

    Ok, we have adjusted the values to make this statement true. Several of the derivatives we need will soon take values between 0 and 1, but the sigmoid derivatives are mathematically bounded between 0 and 0.25, and if we have many layers (e.g. 8), the sigmoid derivatives would dominate backpropagation.
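
The domination described in this note can be sketched numerically (our own illustration): the sigmoid derivative is at most 0.25, so a product of one such factor per layer shrinks exponentially with depth.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # The derivative of the sigmoid, maximal (0.25) at z = 0
    s = sigmoid(z)
    return s * (1.0 - s)

# Even in the best case, z = 0 at every layer, eight layers multiply
# eight factors of 0.25 into the gradient:
product = 1.0
for _ in range(8):
    product *= sigmoid_prime(0.0)
print(product)  # 0.25**8, about 1.5e-05: the gradient all but vanishes
```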

  11.

    If the regular approach was something like making a clay statue (removing clay, but sometimes adding some), the intuition behind initializing the weights to large values would be taking a block of stone or wood and starting to chip away pieces.

References

  1. A.N. Tikhonov, On the stability of inverse problems. Dokl. Akad. Nauk SSSR 39(5), 195–198 (1943)

  2. A.N. Tikhonov, Solution of incorrectly formulated problems and the regularization method. Sov. Math. 4, 1035–1038 (1963)

  3. M.A. Nielsen, Neural Networks and Deep Learning (Determination Press, 2015)

  4. R. Tibshirani, Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser B (Methodol.) 58(1), 267–288 (1996)

  5. A. Ng, Feature selection, L1 versus L2 regularization, and rotational invariance, in Proceedings of the International Conference on Machine Learning (2004)

  6. D.L. Donoho, Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)

  7. E.J. Candes, J. Romberg, T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)

  8. J. Wen, J.L. Zhao, S.W. Luo, Z. Han, The improvements of BP neural network learning algorithm, in Proceedings of 5th International Conference on Signal Processing (IEEE Press, 2000), pp. 1647–1649

  9. D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning internal representations by error propagation. Parallel Distrib. Process. 1, 318–362 (1986)

  10. G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors (2012)

  11. G.E. Dahl, T.N. Sainath, G.E. Hinton, Improving deep neural networks for LVCSR using rectified linear units and dropout, in IEEE International Conference on Acoustic Speech and Signal Processing (IEEE Press, 2013), pp. 8609–8613

  12. N. Srivastava, G.E. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)

  13. Y. Bengio, J. Louradour, R. Collobert, J. Weston, Curriculum learning, in Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, New York, NY, USA, (ACM, 2009), pp. 41–48

  14. R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction (MIT Press, Cambridge, 1998)

  15. S. Hochreiter, Untersuchungen zu dynamischen neuronalen Netzen, Diploma thesis, Technische Universität Munich, 1991

  16. S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

  17. S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, in A Field Guide to Dynamical Recurrent Neural Networks, ed. by S.C. Kremer, J.F. Kolen (IEEE Press, 2001)

Author information

Correspondence to Sandro Skansi.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Skansi, S. (2018). Modifications and Extensions to a Feed-Forward Neural Network. In: Introduction to Deep Learning. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-73004-2_5

  • DOI: https://doi.org/10.1007/978-3-319-73004-2_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73003-5

  • Online ISBN: 978-3-319-73004-2

  • eBook Packages: Computer Science (R0)
