Modifications and Extensions to a Feed-Forward Neural Network

Part of the book series: Undergraduate Topics in Computer Science (UTICS)

Abstract

This chapter explores modifications and extensions to simple feed-forward neural networks, all of which carry over to other neural network architectures. The problem of local minima, one of the main problems in machine learning, is explored in detail. The main strategy against local minima is regularization, achieved by adding a regularization term when learning; both L1 and L2 regularization are explored and explained in detail. The chapter also addresses the learning rate and shows how to incorporate it into backpropagation, in both the static and the dynamic setting. Momentum, a technique that also helps against local minima by adding inertia to gradient descent, is covered next. The chapter then explores stochastic gradient descent in the form of learning with batches and pure online learning, and concludes with a view of the vanishing and exploding gradient problems, setting the stage for deep learning.
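
As a quick preview of the techniques listed above, here is a minimal sketch (our own illustration, not code from the chapter) of a single gradient-descent update combining a learning rate, L2 regularization, and momentum; the names eta, lam, and mu are illustrative.

```python
# A minimal sketch of one gradient-descent step combining three of the
# techniques covered in the chapter: a learning rate (eta), L2
# regularization (lam), and momentum (mu). All names are illustrative.

def sgd_step(w, grad, velocity, eta=0.1, lam=0.01, mu=0.9):
    """One parameter update with L2 regularization and momentum."""
    # L2 regularization adds lam * w to the gradient of the loss
    regularized_grad = grad + lam * w
    # Momentum keeps a running "velocity" that adds inertia to the descent
    new_velocity = mu * velocity - eta * regularized_grad
    return w + new_velocity, new_velocity

w, v = 1.0, 0.0
for _ in range(100):
    grad = 2 * w          # gradient of the toy loss f(w) = w**2
    w, v = sgd_step(w, grad, v)
print(w)                  # close to the minimum at 0
```

The momentum term reuses the previous step's direction, so the descent keeps moving through small flat regions instead of stalling there.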

Notes

  1.

    We will be using a modification of the explanation offered by [3]. Note that this book is available online at http://neuralnetworksanddeeplearning.com.

  2.

    We take the idea for this abstraction from Geoffrey Hinton’s courses.

  3.

    This is actually also a technique used to prevent overfitting, called early stopping.

  4.

    You can use the learning rate to force a gradient explosion: if you want to see one for yourself, try an \(\eta \) value of 5 or 10.
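
The effect described in this note can be sketched numerically (our own toy example, not from the book): on the loss f(w) = w**2, a small learning rate converges while a large one diverges.

```python
# A toy illustration of how an overly large learning rate makes
# gradient descent diverge on the simple loss f(w) = w**2.

def descend(eta, steps=10, w=1.0):
    for _ in range(steps):
        w = w - eta * 2 * w   # the gradient of w**2 is 2w
    return w

print(abs(descend(eta=0.1)))  # shrinks toward the minimum at 0
print(abs(descend(eta=5.0)))  # blows up: |w| grows ninefold each step
```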

  5.

    We have been clumsy around several things, and this section is intended to redefine them a bit to make them more precise.

  6.

    We could also use a non-random selection. One of the most interesting ideas here is learning the simplest instances first and then proceeding to the trickier ones; this approach is called curriculum learning. For more on this see [13].

  7.

    This is similar to reinforcement learning, which is, along with supervised and unsupervised learning, one of the three main areas of machine learning, but we have decided against including it in this volume, since it falls outside the idea of a first introduction to deep learning. If the reader wishes to learn more, we refer her to [14].

  8.

    Suppose, for the sake of clarification, that it is non-randomly divided: the first batch contains training samples 1 to 1000, the second 1001 to 2000, etc.
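
This batching can be sketched as follows (our own illustration; here indices start at 0 rather than 1):

```python
# A sketch of the non-random batching described above: 10,000 samples
# split into mini-batches of 1000, so batch 0 holds samples 0..999,
# batch 1 holds samples 1000..1999, and so on.

def make_batches(data, batch_size):
    return [data[i:i + batch_size] for i in range(0, len(data), batch_size)]

samples = list(range(10000))          # stand-ins for training samples
batches = make_batches(samples, 1000)
print(len(batches))                   # 10 batches
print(batches[1][0], batches[1][-1])  # second batch: 1000 1999
```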

  9.

    A single hidden layer with two neurons in it. If it were (3, 2, 4, 1), we would know it has two hidden layers, the first with two neurons and the second with four.
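
This notation can be made concrete with a small sketch (our own illustration): each pair of consecutive layer sizes determines the shape of one weight matrix.

```python
# Each consecutive pair of layer sizes in the notation above determines
# the shape of one weight matrix: a layer of size m feeding a layer of
# size n needs an n-by-m matrix.

def weight_shapes(layers):
    return [(layers[i + 1], layers[i]) for i in range(len(layers) - 1)]

print(weight_shapes((3, 2, 4, 1)))  # [(2, 3), (4, 2), (1, 4)]
```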

  10.

    Ok, we have adjusted the values to make this statement true. Several of the derivatives we need will soon take values between 0 and 1, but the sigmoid derivatives are mathematically bounded between 0 and 0.25, and if we have many layers (e.g. 8), the sigmoid derivatives would dominate backpropagation.
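
The domination described in this note can be sketched numerically (our own illustration): the sigmoid derivative is at most 0.25, so a product of one such factor per layer shrinks exponentially with depth.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    # The derivative of the sigmoid, maximal (0.25) at z = 0
    s = sigmoid(z)
    return s * (1.0 - s)

# Even in the best case, z = 0 at every layer, eight layers multiply
# eight factors of 0.25 into the gradient:
product = 1.0
for _ in range(8):
    product *= sigmoid_prime(0.0)
print(product)  # 0.25**8, about 1.5e-05: the gradient all but vanishes
```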

  11.

    If the regular approach was something like making a clay statue (removing clay, but sometimes adding some), the intuition behind initializing the weights to large values would be taking a block of stone or wood and starting to chip away pieces.

References

  1. A.N. Tikhonov, On the stability of inverse problems. Dokl. Akad. Nauk SSSR 39(5), 195–198 (1943)

  2. A.N. Tikhonov, Solution of incorrectly formulated problems and the regularization method. Sov. Math. 4, 1035–1038 (1963)

  3. M.A. Nielsen, Neural Networks and Deep Learning (Determination Press, 2015)

  4. R. Tibshirani, Regression shrinkage and selection via the lasso. J. Roy. Stat. Soc. Ser B (Methodol.) 58(1), 267–288 (1996)

  5. A. Ng, Feature selection, L1 versus L2 regularization, and rotational invariance, in Proceedings of the International Conference on Machine Learning (2004)

  6. D.L. Donoho, Compressed sensing. IEEE Trans. Inf. Theory 52(4), 1289–1306 (2006)

  7. E.J. Candes, J. Romberg, T. Tao, Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inf. Theory 52(2), 489–509 (2006)

  8. J. Wen, J.L. Zhao, S.W. Luo, Z. Han, The improvements of BP neural network learning algorithm, in Proceedings of 5th International Conference on Signal Processing (IEEE Press, 2000), pp. 1647–1649

  9. D.E. Rumelhart, G.E. Hinton, R.J. Williams, Learning internal representations by error propagation. Parallel Distrib. Process. 1, 318–362 (1986)

  10. G.E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Improving neural networks by preventing co-adaptation of feature detectors (2012)

  11. G.E. Dahl, T.N. Sainath, G.E. Hinton, Improving deep neural networks for LVCSR using rectified linear units and dropout, in IEEE International Conference on Acoustic Speech and Signal Processing (IEEE Press, 2013), pp. 8609–8613

  12. N. Srivastava, G.E. Hinton, A. Krizhevsky, I. Sutskever, R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 15, 1929–1958 (2014)

  13. Y. Bengio, J. Louradour, R. Collobert, J. Weston, Curriculum learning, in Proceedings of the 26th Annual International Conference on Machine Learning, ICML 2009, New York, NY, USA, (ACM, 2009), pp. 41–48

  14. R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction (MIT Press, Cambridge, 1998)

  15. S. Hochreiter, Untersuchungen zu dynamischen neuronalen Netzen, Diploma thesis, Technische Universität Munich, 1991

  16. S. Hochreiter, J. Schmidhuber, Long short-term memory. Neural Comput. 9(8), 1735–1780 (1997)

  17. S. Hochreiter, Y. Bengio, P. Frasconi, J. Schmidhuber, Gradient flow in recurrent nets: the difficulty of learning long-term dependencies, in A Field Guide to Dynamical Recurrent Neural Networks, ed. by S.C. Kremer, J.F. Kolen (IEEE Press, 2001)

Author information

Correspondence to Sandro Skansi.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Skansi, S. (2018). Modifications and Extensions to a Feed-Forward Neural Network. In: Introduction to Deep Learning. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-73004-2_5

  • DOI: https://doi.org/10.1007/978-3-319-73004-2_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73003-5

  • Online ISBN: 978-3-319-73004-2

  • eBook Packages: Computer Science (R0)
