Abstract
Convolutional neural networks are designed to work with grid-structured inputs that have strong spatial dependencies in local regions of the grid. The most obvious example of grid-structured data is a 2-dimensional image, which exhibits such dependencies because adjacent spatial locations often have similar pixel color values. An additional dimension captures the different color channels, which creates a 3-dimensional input volume. Therefore, the features in a convolutional neural network have dependencies among one another based on spatial distance.
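To make the notion of a 3-dimensional input volume concrete, the following minimal sketch (in NumPy, with illustrative sizes and variable names that are not taken from the chapter) slides a single 3 × 3 filter over an RGB image; each output value depends only on a small local patch, which is exactly the spatial locality that convolutional networks exploit:

```python
import numpy as np

# A 32 x 32 RGB image: height x width x depth (the color channels
# contribute the third dimension of the input volume).
image = np.random.rand(32, 32, 3)

# One 3 x 3 filter that spans the full depth of the input volume.
filt = np.random.rand(3, 3, 3)

H, W, _ = image.shape
F = filt.shape[0]

# Valid convolution with stride 1: each output entry is the sum of an
# elementwise product between the filter and a local 3 x 3 x 3 patch.
out = np.zeros((H - F + 1, W - F + 1))
for i in range(H - F + 1):
    for j in range(W - F + 1):
        out[i, j] = np.sum(image[i:i + F, j:j + F, :] * filt)

print(out.shape)  # (30, 30)
```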
Notes
- 1.
Here, it is assumed that (L_q − F_q) is exactly divisible by S_q in order to obtain a clean fit of the convolution filter with the original image. Otherwise, some ad hoc modifications are needed to handle edge effects, which is generally not a desirable solution. A worked example of this divisibility condition appears in the first sketch after these notes.
- 2.
In recent years, the term “subsampling” has also been used to refer to other operations that reduce the spatial footprint. Therefore, there is some difference between the classical and modern usage of this term.
- 3.
The top-5 error rate makes more sense for image data, where a single image might contain objects of multiple classes. Throughout this chapter, we use the term “error rate” to refer to the top-5 error rate; a minimal computation is shown in the second sketch after these notes.
- 4.
- 5.
Personal communication from Matthew Zeiler.
- 6.
The original architecture also contained auxiliary classifiers, which have largely been omitted in recent implementations.
- 7.
Typically, a 3 × 3 filter is used with both stride and padding of 1, which preserves the spatial size of the input (see the first sketch after these notes). This trend started with the principles in VGG, and was adopted by ResNet.
- 8.
Under normal circumstances, one backpropagates to a hidden layer only as an intermediate step in computing gradients with respect to the incoming weights of that layer. Therefore, backpropagation to the input layer is never really needed in traditional training. However, backpropagation to the input layer works identically to backpropagation to a hidden layer (see the final sketch after these notes).
- 9.
Example available at http://deeplearning.net/software/theano/tutorial/conv_arithmetic.html.
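As a worked illustration of notes 1 and 7, the following sketch (the helper name conv_output_size is hypothetical) computes the spatial output size (L_q − F_q + 2P_q)/S_q + 1 of a convolution and flags the case where the divisibility condition fails:

```python
def conv_output_size(L, F, S=1, P=0):
    """Spatial output size of a convolution: (L - F + 2P)/S + 1.

    Raises an error when (L - F + 2P) is not divisible by S, the case
    that requires ad hoc handling of edge effects (note 1).
    """
    if (L - F + 2 * P) % S != 0:
        raise ValueError("(L - F + 2P) is not divisible by S; the "
                         "filter does not fit the input cleanly")
    return (L - F + 2 * P) // S + 1

print(conv_output_size(L=32, F=5, S=3))       # (32 - 5)/3 + 1 = 10
print(conv_output_size(L=32, F=3, S=1, P=1))  # 3 x 3 filter, stride/padding 1:
                                              # output size 32 equals input (note 7)
```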
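For note 3, the top-5 error rate can be computed as follows (a minimal NumPy sketch on made-up scores; a prediction counts as correct when the true label is among the five highest-scoring classes):

```python
import numpy as np

def top5_error(scores, labels):
    # scores: (n_samples, n_classes) array of class scores;
    # labels: (n_samples,) array of true class indices.
    top5 = np.argsort(scores, axis=1)[:, -5:]       # 5 highest-scoring classes
    hits = np.any(top5 == labels[:, None], axis=1)  # true label among them?
    return 1.0 - hits.mean()

scores = np.random.rand(100, 1000)  # e.g., 1000 ImageNet classes
labels = np.random.randint(0, 1000, size=100)
print(top5_error(scores, labels))
```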
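Finally, for note 8, a one-layer NumPy sketch (with a toy loss chosen only for illustration) shows that the gradient with respect to the input x is obtained by the same chain-rule step as the gradient with respect to any hidden layer, even though ordinary training never needs it:

```python
import numpy as np

np.random.seed(0)
W = np.random.randn(4, 10)      # weights of a single layer
x = np.random.randn(10)         # input vector

h = np.maximum(W @ x, 0.0)      # forward pass: h = ReLU(W x)
loss = 0.5 * np.sum(h ** 2)     # toy loss L = ||h||^2 / 2

grad_h = h                      # dL/dh
grad_pre = grad_h * (h > 0)     # backprop through the ReLU
grad_W = np.outer(grad_pre, x)  # gradient used for training the weights
grad_x = W.T @ grad_pre         # the same chain rule, applied to the input

print(grad_x.shape)             # (10,)
```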