Convolutional Neural Networks

Introduction to Deep Learning

Abstract

This chapter introduces the first deep learning architecture of the book, convolutional neural networks. It starts by redefining the way a logistic regression accepts data, and defines 1D and 2D convolutional layers as a natural extension of the logistic regression. The chapter also details how to connect the layers and how to handle the resulting dimensionality problems. The local receptive field is introduced as a core concept of any convolutional architecture, and its connection with the vanishing gradient problem is explored. The idea of padding is introduced in the visual setting, as is the stride of the local receptive field. Pooling is explored both in the general setting and as max-pooling. A complete convolutional neural network for classifying MNIST is then presented in Keras code, and all the details of the code are explained through comments and illustrations. The final section of the chapter presents the modifications needed to adapt convolutional networks, which are primarily visual classifiers, to work with text and language.
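
As a rough orientation, a minimal Keras sketch of a convolutional MNIST classifier along the lines described above could look as follows; the particular layer sizes, optimizer and number of training epochs are illustrative assumptions, not the chapter's exact listing.

    from keras.datasets import mnist
    from keras.models import Sequential
    from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
    from keras.utils import to_categorical

    # Load MNIST, add the single greyscale channel and scale pixels to [0, 1]
    (x_train, y_train), (x_test, y_test) = mnist.load_data()
    x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255
    x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255
    y_train, y_test = to_categorical(y_train, 10), to_categorical(y_test, 10)

    # Two convolutional blocks followed by a small fully connected classifier
    model = Sequential([
        Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
        MaxPooling2D((2, 2)),
        Conv2D(64, (3, 3), activation='relu'),
        MaxPooling2D((2, 2)),
        Flatten(),
        Dense(128, activation='relu'),
        Dense(10, activation='softmax'),
    ])
    model.compile(optimizer='adam', loss='categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(x_train, y_train, epochs=3, batch_size=128,
              validation_data=(x_test, y_test))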


Notes

  1.

    Yann LeCun once said in an interview that he prefers the name ‘convolutional network’ to ‘convolutional neural network’.

  2.

    An image in this sense is any 2D array with values between 0 and 255. In Fig. 6.1 we have numbered the positions, and you may think of them as ‘cell numbers’, in the sense that they will contain some value, but the number on the image denotes only their order. In addition, note that if we have e.g. 100 by 100 RGB images, each image would be a 3D array (tensor) with dimensions (100, 100, 3). The last dimension of the array would hold the three channels, red, green and blue.
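
    A minimal NumPy sketch of this representation (the shapes are the ones from the note; the use of NumPy and random pixel values is just for illustration):

        import numpy as np

        # A greyscale image: a 2D array with values between 0 and 255
        grey = np.random.randint(0, 256, size=(100, 100), dtype=np.uint8)

        # A 100 by 100 RGB image: a 3D array (tensor) with dimensions (100, 100, 3);
        # the last dimension holds the three channels, red, green and blue
        rgb = np.random.randint(0, 256, size=(100, 100, 3), dtype=np.uint8)

        print(grey.shape, rgb.shape)   # (100, 100) (100, 100, 3)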

  3.

    Here you might notice how important weight initialization is. We do have some techniques that are better than random initialization, but finding a good weight initialization strategy remains an important open research problem.
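
    As an illustration, Keras lets you choose the initializer per layer; in this sketch the layer sizes are arbitrary and ‘glorot_uniform’ is simply Keras’s default scheme, not a recommendation taken from the chapter:

        from keras.layers import Dense
        from keras import initializers

        # Naive random (normal) initialization versus the Glorot/Xavier scheme,
        # one of the strategies that usually behaves better than pure randomness
        random_layer = Dense(100, activation='relu',
                             kernel_initializer=initializers.RandomNormal(stddev=0.05))
        glorot_layer = Dense(100, activation='relu',
                             kernel_initializer='glorot_uniform')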

  4.

    If we use padding, we keep the same size but still expand the depth. Padding is useful when there is possibly important information at the edges of the image.
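
    A small sketch of the effect in Keras (the input size and the number of filters are illustrative assumptions):

        from keras.models import Sequential
        from keras.layers import Conv2D

        # 'valid' (no padding) shrinks a 28 by 28 input to 26 by 26, while 'same'
        # pads the edges and keeps the size; both expand the depth to 32 filters
        no_pad = Sequential([Conv2D(32, (3, 3), padding='valid',
                                    input_shape=(28, 28, 1))])
        with_pad = Sequential([Conv2D(32, (3, 3), padding='same',
                                      input_shape=(28, 28, 1))])

        print(no_pad.output_shape)    # (None, 26, 26, 32)
        print(with_pad.output_shape)  # (None, 28, 28, 32)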

  5.

    You have everything you need in this book to get the array (tensor) with the feature maps, and even to squash it to 2D, but you might have to search the Internet to find out how to visualize the tensor as an image. Consider it a good (but advanced) Python exercise.
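
    If you want a starting point for the exercise, one common approach (an assumption here, not the chapter’s own code) is to build a second Keras model that exposes an intermediate layer and to draw each feature map with matplotlib:

        import numpy as np
        import matplotlib.pyplot as plt
        from keras.models import Sequential, Model
        from keras.layers import Conv2D

        # A tiny untrained convolutional layer, just to illustrate the mechanics;
        # in practice you would reuse the trained network from the chapter
        model = Sequential([Conv2D(32, (3, 3), activation='relu',
                                   input_shape=(28, 28, 1))])
        x = np.random.rand(1, 28, 28, 1)

        # A second model whose output is the first layer's output: the feature maps
        feature_model = Model(inputs=model.input, outputs=model.layers[0].output)
        feature_maps = feature_model.predict(x)   # shape (1, 26, 26, 32)

        # Squash each feature map to 2D and show it as a greyscale image
        for i in range(feature_maps.shape[-1]):
            plt.subplot(4, 8, i + 1)
            plt.imshow(feature_maps[0, :, :, i], cmap='gray')
            plt.axis('off')
        plt.show()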

  6.

    If it has 100 neurons per layer, with only one output neuron, that makes a total of \(784\cdot 100 + 100\cdot 100 + 100\cdot 100 + 100\cdot 1 = 98500\) parameters, and that is without the biases!
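
    The count is easy to check in a few lines of Python (the layout assumed here is the one from the note: an input of 784, three hidden layers of 100 neurons, and one output neuron):

        # Weight matrices: 784 -> 100 -> 100 -> 100 -> 1
        weights = 784 * 100 + 100 * 100 + 100 * 100 + 100 * 1   # 98500
        # One bias per neuron in every layer after the input
        biases = 100 + 100 + 100 + 1                             # 301
        print(weights, weights + biases)                         # 98500 98801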

  7.

    Which is, mathematically speaking, a tensor.

  8.

    Remember how we can convert a 28 by 28 matrix into a 784-dimensional vector.
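
    In NumPy (used here only for illustration) this is a single reshape:

        import numpy as np

        image = np.arange(28 * 28).reshape(28, 28)   # a 28 by 28 matrix
        vector = image.reshape(784)                  # the same values as a 784-dimensional vector

        print(image.shape, vector.shape)             # (28, 28) (784,)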

  9.

    Keras calls them ‘Dense’.

  10.

    Naturally, every paper will have a ‘trickiest part’, and it is your job to learn how to decode it, since it is often the most important part of the paper.

  11.

    The whole alphabet would not fit on the page, but you can easily imagine how the example expands to the full English alphabet.
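
    As a rough illustration of how such an encoding expands, here is a one-hot sketch over a three-letter ‘alphabet’; the scheme is an assumption in the spirit of character-level convolutional networks, not the book’s exact figure:

        import numpy as np

        alphabet = ['a', 'b', 'c']   # imagine the full English alphabet here
        index = {ch: i for i, ch in enumerate(alphabet)}

        def one_hot(word):
            # Each character becomes a row with a single 1; a word becomes a 2D array
            codes = np.zeros((len(word), len(alphabet)))
            for row, ch in enumerate(word):
                codes[row, index[ch]] = 1
            return codes

        print(one_hot('cab'))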

  12.

    A couple of hours each day, not a literal week.


Author information

Correspondence to Sandro Skansi.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Skansi, S. (2018). Convolutional Neural Networks. In: Introduction to Deep Learning. Undergraduate Topics in Computer Science. Springer, Cham. https://doi.org/10.1007/978-3-319-73004-2_6

  • DOI: https://doi.org/10.1007/978-3-319-73004-2_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-73003-5

  • Online ISBN: 978-3-319-73004-2

  • eBook Packages: Computer Science, Computer Science (R0)
