Learning Discriminative Mid-Level Patches for Fast Scene Classification

  • Angran Lin
  • Xuhui Jia
  • Kwok Ping Chan
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9493)

Abstract

Discriminative mid-level patch-based approaches have become increasingly popular in the past few years. Their popularity can be attributed to the fact that discriminative patches accumulate low-level features into high-level descriptors for objects and images. Unfortunately, state-of-the-art algorithms for discovering such patches rely heavily on SVM-related techniques, which consume considerable computational resources during training. To overcome this shortcoming and apply discriminative part-based techniques to more complicated computer vision problems with larger datasets, we propose a fast, simple yet powerful way to mine part classifiers automatically with only class labels provided. Our experiments show that our method, Fast Exemplar Clustering, is 20 times faster than the commonly used SVM-based methods while attaining competitive accuracy on scene classification.
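The abstract only sketches the idea at a high level; the full Fast Exemplar Clustering procedure is given in the paper itself. As a hedged illustration of the general approach it describes (mining discriminative mid-level patches using only class labels, without per-patch SVM training), the Python sketch below clusters patches around exemplar patches via nearest-neighbour search and keeps the most class-pure clusters. The function name, the HOG-like patch descriptors, the neighbourhood size k and the purity ranking are illustrative assumptions, not the authors' exact algorithm.

```python
# Illustrative sketch only: this is NOT the paper's Fast Exemplar Clustering algorithm,
# just a minimal example of mining discriminative mid-level patches by clustering around
# exemplars instead of training per-exemplar SVMs. Assumptions: patches are described by
# fixed-length (e.g. HOG-like) feature vectors, and only image-level class labels are given.
import numpy as np

def mine_discriminative_patches(features, labels, target_class, k=10, top_m=50):
    """features: (N, D) array of patch descriptors; labels: (N,) class label of each patch's image."""
    feats = np.asarray(features, dtype=np.float64)
    labels = np.asarray(labels)
    # Candidate exemplars are patches drawn from images of the target class.
    exemplar_ids = np.flatnonzero(labels == target_class)
    clusters = []
    for e in exemplar_ids:
        # Distances from the exemplar to every other patch (no SVM training involved).
        d = np.linalg.norm(feats - feats[e], axis=1)
        d[e] = np.inf                                  # exclude the exemplar itself
        nn = np.argsort(d)[:k]                         # k nearest neighbours form the cluster
        purity = np.mean(labels[nn] == target_class)   # fraction of neighbours from the class
        clusters.append((purity, e, nn))
    # Keep the most class-pure clusters as candidate discriminative mid-level patches.
    clusters.sort(key=lambda c: c[0], reverse=True)
    return clusters[:top_m]
```

Such a nearest-neighbour pass replaces the expensive iterative SVM retraining used by earlier patch-discovery pipelines, which is where a large speed-up of the kind reported in the abstract would plausibly come from.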

Keywords

Discriminative mid-level patches · Fast scene classification · Fast exemplar clustering

Acknowledgement

This work is supported by the Hong Kong RGC General Research Fund GRF HKU/710412E.

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Computer Science, The University of Hong Kong, Hong Kong, China
