Part Detector Discovery in Deep Convolutional Neural Networks

  • Marcel Simon
  • Erik Rodner
  • Joachim Denzler
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9004)

Abstract

Current fine-grained classification approaches often rely on robust localization of object parts to extract localized feature representations suitable for discrimination. However, part localization is a challenging task because appearance and pose vary widely. In this paper, we show how pre-trained convolutional neural networks can be used for robust and efficient object part discovery and localization without having to train the network on the target dataset. Our approach, called “part detector discovery” (PDD), analyzes the gradient maps of the network outputs and finds activation centers that are spatially related to annotated semantic parts or bounding boxes. This allows us not only to obtain excellent performance on the CUB200-2011 dataset but also, in contrast to previous approaches, to perform detection and bird classification jointly, without requiring a bounding box annotation during testing or ground-truth parts during training.
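
The idea of reading part locations off gradient maps can be illustrated with a small sketch. It is not the authors' implementation: a single filter followed by a ReLU stands in for one channel of a deep network, the activation center is taken to be the pixel with the largest absolute gradient, and a channel is selected by its mean distance to the annotated part. The function names `conv_relu_gradient_map`, `activation_center`, and `select_detector` are hypothetical and chosen for illustration only.

```python
# Toy sketch (assumptions: one filter + ReLU replaces a deep CNN channel;
# the activation center is the argmax of the absolute gradient map).

def conv_relu_gradient_map(image, kernel):
    """Gradient of sum(ReLU(valid cross-correlation)) w.r.t. each input pixel."""
    h, w = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    grad = [[0.0] * w for _ in range(h)]
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            pre = sum(image[i + a][j + b] * kernel[a][b]
                      for a in range(kh) for b in range(kw))
            if pre > 0:  # ReLU passes gradient only for positive pre-activations
                for a in range(kh):
                    for b in range(kw):
                        grad[i + a][j + b] += kernel[a][b]
    return grad


def activation_center(grad):
    """(row, col) of the first pixel with the largest absolute gradient."""
    best, best_pos = -1.0, (0, 0)
    for r, row in enumerate(grad):
        for c, value in enumerate(row):
            if abs(value) > best:
                best, best_pos = abs(value), (r, c)
    return best_pos


def select_detector(kernels, images, part_locations):
    """Index of the channel whose centers lie closest, on average, to the part."""
    def mean_distance(kernel):
        total = 0.0
        for image, (pr, pc) in zip(images, part_locations):
            r, c = activation_center(conv_relu_gradient_map(image, kernel))
            total += ((r - pr) ** 2 + (c - pc) ** 2) ** 0.5
        return total / len(images)
    return min(range(len(kernels)), key=lambda k: mean_distance(kernels[k]))


if __name__ == "__main__":
    image = [[0.0] * 5 for _ in range(5)]
    image[2][2] = 1.0  # a single bright pixel plays the role of the part
    kernels = [[[-1.0]], [[1.0]]]  # two toy 1x1 "channels"
    print(select_detector(kernels, [image], [(2, 2)]))
```

In a real network the gradient map would come from backpropagating a channel's output to the input image; the selection step is the only place where part or bounding box annotations are needed, which is consistent with the claim that the network itself is not retrained.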

Keywords

Input Image, Convolutional Neural Network, Deep Neural Network, Part Detection, Convolutional Layer
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Computer Vision Group, Friedrich Schiller University of Jena, Jena, Germany
