Abstract
Supervised training of a convolutional network for object classification should make explicit any information related to the class of the object and disregard any auxiliary information associated with the capture of the image or the variation within the object class. Does this happen in practice? Although this seems to hold for the very final layers of the network, in earlier layers we find that it does not: strong spatial information remains implicit. This paper addresses the phenomenon, in particular exploiting the image representation at the first fully connected layer, i.e. the global image descriptor which has recently been shown to be most effective in a range of visual recognition tasks. We empirically demonstrate evidence for this finding in the context of four different tasks: 2D landmark detection, 2D object keypoint prediction, estimation of the RGB values of the input image, and recovery of the semantic label of each pixel. We base our investigation on a simple framework with ridge regression common across these tasks, and present results which all support our insight. Such spatial information can be used for computing correspondence of landmarks to good accuracy, and should potentially be useful for improving the training of convolutional nets for classification purposes.
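The framework the abstract describes can be sketched as follows: a single ridge regressor maps a network's global image descriptor to spatial targets such as landmark coordinates. This is a minimal illustration, not the authors' implementation; the 4096-dimensional descriptors and the landmark targets below are synthetic stand-ins for features that would, in the paper, come from the first fully connected layer of a pretrained ConvNet.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# Synthetic stand-ins for first-fully-connected-layer descriptors
# (hypothetical dimensionality of 4096, as in common AlexNet-style nets).
n_train, n_test, d = 200, 50, 4096
X_train = rng.standard_normal((n_train, d))
X_test = rng.standard_normal((n_test, d))

# Hypothetical targets: (x, y) coordinates of 5 landmarks, flattened to 10 values,
# generated from a linear map plus noise so the regression has signal to recover.
W_true = rng.standard_normal((d, 10)) / np.sqrt(d)
Y_train = X_train @ W_true + 0.01 * rng.standard_normal((n_train, 10))
Y_test = X_test @ W_true

# One ridge regressor predicts all landmark coordinates from the descriptor.
model = Ridge(alpha=1.0)
model.fit(X_train, Y_train)
pred = model.predict(X_test)

mean_err = np.abs(pred - Y_test).mean()
print(f"mean absolute landmark error: {mean_err:.3f}")
```

The same pipeline applies to the other tasks in the paper by changing only the target vector (per-pixel RGB values or semantic labels instead of landmark coordinates), which is what makes ridge regression a convenient common probe across tasks.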
Copyright information
© 2015 Springer International Publishing Switzerland
Cite this paper
Razavian, A.S., Azizpour, H., Maki, A., Sullivan, J., Ek, C.H., Carlsson, S. (2015). Persistent Evidence of Local Image Properties in Generic ConvNets. In: Paulsen, R., Pedersen, K. (eds) Image Analysis. SCIA 2015. Lecture Notes in Computer Science(), vol 9127. Springer, Cham. https://doi.org/10.1007/978-3-319-19665-7_21
Print ISBN: 978-3-319-19664-0
Online ISBN: 978-3-319-19665-7