Patch-Level Spatial Layout for Classification and Weakly Supervised Localization

  • Valentina Zadrija
  • Josip Krapac
  • Jakob Verbeek
  • Siniša Šegvić
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9358)


We propose a discriminative patch-level model that combines appearance and spatial layout cues. We start from a block-sparse model of patch appearance based on the normalized Fisher vector representation. The appearance model is responsible for (i) selecting a discriminative subset of visual words, and (ii) identifying distinctive patches assigned to the selected subset. These patches are further filtered by a sparse spatial model operating on a novel representation of pairwise patch layout. We evaluate the proposed pipeline in image classification and weakly supervised localization experiments on a public traffic sign dataset. The results show a significant advantage of the combined model over state-of-the-art appearance models.
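The abstract does not give the model's equations, so the following is only a rough, hypothetical illustration of how a block-sparse linear model on Fisher vectors can select visual words and score individual patches. It assumes (i) a diagonal-covariance GMM vocabulary, (ii) Fisher vector components restricted to the gradients with respect to the Gaussian means, (iii) L2 normalization only (power normalization would break the per-patch decomposition), and (iv) a generic group-lasso penalty whose proximal step zeroes whole per-visual-word blocks. All function names are made up for this sketch; it is not the authors' implementation and does not cover the spatial layout model.

```python
import numpy as np


def soft_assign(x, means, sigmas, weights):
    """Posteriors p(k | x_i) of each patch descriptor under a diagonal-covariance GMM.

    x: (N, D) patch descriptors; means, sigmas: (K, D); weights: (K,).
    """
    diff = (x[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    log_p = (np.log(weights)[None, :]
             - np.log(sigmas).sum(axis=1)[None, :]
             - 0.5 * (diff ** 2).sum(axis=2))
    log_p -= log_p.max(axis=1, keepdims=True)  # stabilize before exponentiating
    gamma = np.exp(log_p)
    return gamma / gamma.sum(axis=1, keepdims=True)


def patch_embeddings(x, means, sigmas, weights):
    """Per-patch Fisher embeddings w.r.t. the GMM means, shape (N, K, D).

    Summing over patches gives the mean-gradient Fisher vector up to a global
    1/N factor, which the later L2 normalization cancels.
    """
    gamma = soft_assign(x, means, sigmas, weights)
    diff = (x[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    return gamma[:, :, None] * diff / np.sqrt(weights)[None, :, None]


def group_soft_threshold(w, lam):
    """Proximal operator of lam * sum_k ||w_k||_2 (group lasso).

    Row w_k is the classifier block for visual word k; rows whose norm is at
    most lam become exactly zero, i.e. that visual word is dropped.
    """
    norms = np.linalg.norm(w, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return w * scale


def patch_contributions(x, w, means, sigmas, weights):
    """Per-patch scores of a linear model on the L2-normalized Fisher vector.

    Their sum equals the image-level score w . fv / ||fv||.
    """
    phi = patch_embeddings(x, means, sigmas, weights)  # (N, K, D)
    fv = phi.sum(axis=0)                               # image-level vector (K, D)
    norm = np.linalg.norm(fv) + 1e-12                  # global L2 normalization
    return np.einsum("nkd,kd->n", phi, w) / norm


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K, D, N = 8, 4, 100                # toy vocabulary size, descriptor dim, patches
    means = rng.normal(size=(K, D))
    sigmas = np.ones((K, D))
    weights = np.full(K, 1.0 / K)
    x = rng.normal(size=(N, D))        # stand-in for real patch descriptors
    w = group_soft_threshold(rng.normal(size=(K, D)), lam=2.0)
    selected = np.flatnonzero(np.linalg.norm(w, axis=1) > 0)
    scores = patch_contributions(x, w, means, sigmas, weights)
    print("selected visual words:", selected)
    print("top-5 patches:", np.argsort(scores)[::-1][:5])
```

Under these assumptions, patches whose dominant visual words were zeroed out contribute little to the score, which is one way to read "identifying distinctive patches assigned to the selected subset".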


Keywords: Spatial Layout · Fisher Vector (FV) · Visual Word Pairs · Patch Appearance · Patch Contribution



This work has been fully supported by the Croatian Science Foundation under project I-2433-2014.



Copyright information

© Springer International Publishing Switzerland 2015

Open Access This chapter is licensed under the terms of the Creative Commons Attribution-NonCommercial 2.5 International License, which permits any noncommercial use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Authors and Affiliations

  • Valentina Zadrija (1), corresponding author
  • Josip Krapac (1)
  • Jakob Verbeek (2)
  • Siniša Šegvić (1)
  1. Faculty of Electrical Engineering and Computing, University of Zagreb, Zagreb, Croatia
  2. INRIA Rhone-Alpes, Grenoble, France
