Abstract
This chapter introduces the first deep learning architecture of the book, convolutional neural networks. It starts by redefining the way a logistic regression accepts data, and defines 1D and 2D convolutional layers as a natural extension of logistic regression. The chapter also details how to connect the layers and how to handle the resulting dimensionality problems. The local receptive field is introduced as a core concept of any convolutional architecture, and its connection with the vanishing gradient problem is explored. The idea of padding is introduced in the visual setting, as is the stride of the local receptive field. Pooling is explored both in the general setting and as max-pooling. A complete convolutional neural network for classifying MNIST is then presented in Keras code, and all the details of the code are explained in comments and illustrations. The final section of the chapter presents the modifications needed to adapt convolutional networks, which are primarily visual classifiers, to work with text and language.
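The sliding of a local receptive field over an image, as described above, can be sketched in a few lines of plain Python. This is a minimal illustration only, not the chapter's Keras code; the image values and the 'diagonal detector' filter are made up for the example.

```python
# A minimal sketch of a 2D convolution with one 3x3 filter
# (no padding, stride 1). Values are illustrative only.

def conv2d(image, kernel):
    """Slide `kernel` over `image`; each output cell is the sum
    of elementwise products over the local receptive field."""
    k = len(kernel)
    out_rows = len(image) - k + 1
    out_cols = len(image[0]) - k + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            s = sum(image[i + a][j + b] * kernel[a][b]
                    for a in range(k) for b in range(k))
            row.append(s)
        output.append(row)
    return output

image = [[1, 0, 1, 0],
         [0, 1, 0, 1],
         [1, 0, 1, 0],
         [0, 1, 0, 1]]
kernel = [[1, 0, 0],
          [0, 1, 0],
          [0, 0, 1]]   # a made-up 'diagonal detector'

feature_map = conv2d(image, kernel)   # a 2x2 feature map
print(feature_map)   # [[3, 0], [0, 3]]
```

Note how the 4 by 4 input shrinks to a 2 by 2 feature map when no padding is used; this is the dimensionality issue the chapter discusses.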
Notes
- 1.
Yann LeCun once said in an interview that he prefers the name ‘convolutional network’ to ‘convolutional neural network’.
- 2.
An image in this sense is any 2D array with values between 0 and 255. In Fig. 6.1 we have numbered the positions, and you may think of them as ‘cell numbers’, in the sense that they will contain some value, but the number on the image denotes only their order. In addition, note that if we have e.g. 100 by 100 RGB images, each image would be a 3D array (tensor) with dimensions (100, 100, 3). The last dimension of the array would hold the three channels, red, green and blue.
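The shapes described in the note above can be illustrated with nested Python lists; the 3 by 3 sizes here are toy values, not the 100 by 100 from the note.

```python
# Sketch: a grayscale image is a 2D array of values in 0..255;
# an RGB image adds a third dimension for the channels.
# Sizes here (3x3) are toy values for illustration.

grayscale = [[0, 128, 255],
             [64, 32, 16],
             [200, 100, 50]]          # shape (3, 3)

# An RGB pixel holds three channel values (red, green, blue),
# so a 3x3 RGB image has shape (3, 3, 3).
rgb = [[[r, 0, 0] for r in row] for row in grayscale]

rows, cols = len(rgb), len(rgb[0])
channels = len(rgb[0][0])
print(rows, cols, channels)   # 3 3 3
```

For the note's 100 by 100 RGB images the same structure simply grows to shape (100, 100, 3).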
- 3.
Here you might notice how important weight initialization is. We do have some techniques that are better than random initialization, but finding a good weight initialization strategy remains an important open research problem.
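One example of a better-than-random strategy mentioned in the note is Glorot (Xavier) initialization, which scales the random range by the layer sizes. The sketch below is an assumption of what such a strategy looks like, not a method from the chapter; the layer sizes 784 and 100 are borrowed from the MNIST setting.

```python
import random
import math

random.seed(0)

n_in, n_out = 784, 100

# Plain random initialization: weights uniform in [-1, 1].
plain = [random.uniform(-1, 1) for _ in range(n_in * n_out)]

# A common better-than-random strategy (Glorot/Xavier-style):
# scale the range by the layer sizes so the signal neither
# explodes nor shrinks as it passes through the layer.
limit = math.sqrt(6.0 / (n_in + n_out))
glorot = [random.uniform(-limit, limit) for _ in range(n_in * n_out)]

print(max(abs(w) for w in glorot) <= limit)   # True
```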
- 4.
If we use padding, we keep the same size but still expand the depth. Padding is useful when there is possibly important information on the edges of the image.
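The size-keeping effect of padding in the note above follows from the usual output-size arithmetic for a convolutional layer; the helper name below is ours, and the 28 by 28 input with a 3 by 3 filter is an illustrative choice.

```python
# Sketch of the standard output-size arithmetic for a conv layer:
# out = (in - filter + 2 * padding) // stride + 1
# The function name and the numbers are illustrative.

def conv_output_size(in_size, filter_size, padding=0, stride=1):
    return (in_size - filter_size + 2 * padding) // stride + 1

# Without padding a 28x28 image shrinks under a 3x3 filter...
print(conv_output_size(28, 3))              # 26
# ...but padding of 1 on each side keeps the 28x28 size.
print(conv_output_size(28, 3, padding=1))   # 28
```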
- 5.
You have everything you need in this book to get the array (tensor) with the feature maps, and even to squash it to 2D, but you might have to search the Internet to find out how to visualize the tensor as an image. Consider it a good (but advanced) Python exercise.
- 6.
If it has 100 neurons per layer, with only one output neuron, that makes a total of \(784\cdot 100 + 100\cdot 100+ 100\cdot 100 + 100\cdot 1 = 98500\) parameters, and that is without the biases!
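The count in the note above can be checked directly, and extending it with the biases (one per non-input neuron) shows how quickly fully connected layers accumulate parameters:

```python
# Re-doing the parameter count from the note: a fully connected
# network with layer sizes 784 -> 100 -> 100 -> 100 -> 1.

layers = [784, 100, 100, 100, 1]

# Each pair of adjacent layers contributes in_size * out_size weights.
weights = sum(a * b for a, b in zip(layers, layers[1:]))
print(weights)   # 98500, as in the note

# Adding the biases (one per non-input neuron) gives even more:
biases = sum(layers[1:])
print(weights + biases)   # 98801
```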
- 7.
Which is, mathematically speaking, a tensor.
- 8.
Remember how we can convert a 28 by 28 matrix into a 784-dimensional vector.
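The conversion recalled in the note above is a row-by-row flattening; a toy 2 by 3 matrix is shown first so the ordering is visible:

```python
# Flattening a matrix into a vector, row by row.
matrix = [[1, 2, 3],
          [4, 5, 6]]
vector = [value for row in matrix for value in row]
print(vector)   # [1, 2, 3, 4, 5, 6]

# The same comprehension turns a 28x28 image into a
# 784-dimensional vector (all-zero toy image here).
image = [[0] * 28 for _ in range(28)]
flat = [v for row in image for v in row]
print(len(flat))   # 784
```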
- 9.
Keras calls them ‘Dense’.
- 10.
Practically every paper has a ‘trickiest part’, and it is your job to learn how to decode it, since it is often the most important part of the paper.
- 11.
The whole alphabet will not fit on a page, but you can easily imagine how the example would expand to the full English alphabet.
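The reduced alphabet in the note above suggests a one-hot encoding of characters, which is the usual way character-level convolutional networks receive text; the four-letter alphabet and the helper below are illustrative, and the full 26-letter version is just a wider matrix.

```python
# Sketch of character-level one-hot encoding on a toy alphabet.
alphabet = "abcd"
index = {ch: i for i, ch in enumerate(alphabet)}

def one_hot(text):
    """Each character becomes a row with a single 1 in the
    column of that character's alphabet position."""
    return [[1 if i == index[ch] else 0
             for i in range(len(alphabet))]
            for ch in text]

encoded = one_hot("bad")
print(encoded)   # [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1]]
```

The resulting matrix can then be treated much like a (very narrow) image by a convolutional layer.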
- 12.
A couple of hours each day—not a literal week.
© 2018 Springer International Publishing AG
Cite this chapter
Skansi, S. (2018). Convolutional Neural Networks. In: Introduction to Deep Learning. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-73004-2_6
Print ISBN: 978-3-319-73003-5
Online ISBN: 978-3-319-73004-2