Beyond SIFT for Image Categorization by Bag-of-Scenes Analysis

  • Sébastien Paris
  • Xanadu Halkias
  • Hervé Glotin
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 318)


In this paper, we address the general problem of image/object categorization with a novel approach referred to as Bag-of-Scenes (BoS). Our approach is efficient both for low-semantic applications, such as texture classification, and for higher-semantic tasks, such as natural scene recognition. It is based on the widely used combination of (i) sparse coding (Sc), (ii) max-pooling and (iii) Spatial Pyramid Matching (SPM) techniques, applied to histograms of multi-scale Local Binary/Ternary Patterns (LBP/LTP) used as local features. This approach can be considered as a two-layer hierarchical architecture. The first layer quickly encodes the local spatial patch structure via histograms of LBP/LTP, while the second layer encodes the relationships between pre-analyzed LBP/LTP scenes/objects. To provide comparative results, we also introduce an alternative two-layer architecture in which the first layer directly encodes multi-scale Differential Vector (DV) local patches instead of histograms of LBP/LTP. Our method outperforms SIFT-based approaches using Sc techniques and can be trained efficiently with a simple linear SVM. Our BoS method achieves accuracies of \(87.46\,\%\) and \(90.35\,\%\) on the Scene-15 and UIUC-Sport datasets, respectively.
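The first layer described above turns each local patch into histograms of binary and ternary codes. The abstract does not specify the sampling scheme, so the following is only a minimal sketch under stated assumptions: 8 neighbours taken at integer offsets on a square ring of radius r (one ring per scale), a fixed LTP threshold t, the LTP split into an "upper" and a "lower" binary pattern as is common practice, and a hypothetical helper name `lbp_ltp_histograms`. It is not the authors' implementation.

```python
from typing import List, Sequence, Tuple


def ring_offsets(r: int) -> List[Tuple[int, int]]:
    # Assumed sampling: 8 neighbours (dy, dx) on a square ring of radius r, clockwise.
    return [(-r, -r), (-r, 0), (-r, r), (0, r), (r, r), (r, 0), (r, -r), (0, -r)]


def lbp_ltp_histograms(img: Sequence[Sequence[float]],
                       radii: Sequence[int] = (1, 2),
                       t: float = 5.0) -> List[int]:
    """Hypothetical helper: for each radius, concatenate a 256-bin LBP histogram
    and two 256-bin LTP histograms (upper and lower patterns)."""
    h, w = len(img), len(img[0])
    feats: List[int] = []
    for r in radii:
        lbp, ltp_up, ltp_lo = [0] * 256, [0] * 256, [0] * 256
        offs = ring_offsets(r)
        # Only pixels whose whole ring lies inside the patch are coded.
        for y in range(r, h - r):
            for x in range(r, w - r):
                c = img[y][x]
                code = up = lo = 0
                for k, (dy, dx) in enumerate(offs):
                    n = img[y + dy][x + dx]
                    if n >= c:
                        code |= 1 << k   # LBP bit: neighbour >= centre
                    if n >= c + t:
                        up |= 1 << k     # LTP state +1
                    elif n <= c - t:
                        lo |= 1 << k     # LTP state -1
                lbp[code] += 1
                ltp_up[up] += 1
                ltp_lo[lo] += 1
        feats += lbp + ltp_up + ltp_lo
    return feats
```

For a constant 5x5 patch, every neighbour equals its centre, so all nine radius-1 LBP codes fall in bin 255 while both LTP patterns fall in bin 0 (no neighbour differs from the centre by at least t). Concatenating scales is one simple way to make the descriptor multi-scale; the paper may combine scales differently.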


Keywords: Image categorization; Scenes categorization; Fine-grained visual categorization; Non-parametric local patterns; Multi-scale LBP/LTP; Dictionary learning; Sparse coding; LASSO; Max-pooling; SPM; Linear SVM
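The second layer named in the abstract and keywords (sparse coding with a LASSO penalty, max-pooling, SPM) can be sketched as follows. This is an illustration under assumptions, not the paper's pipeline: the dictionary `atoms` is taken as given (the dictionary-learning step is omitted), the LASSO problem is solved with plain ISTA (iterative soft-thresholding), pooling keeps the maximum absolute code value per cell, and the pyramid uses 1x1, 2x2 and 4x4 grids. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def soft_threshold(v: float, thr: float) -> float:
    # Proximal operator of thr * |v| (the shrinkage step of ISTA).
    return math.copysign(max(abs(v) - thr, 0.0), v)


def lasso_ista(x: Vector, atoms: Sequence[Vector], lam: float, n_iter: int = 200) -> Vector:
    """Approximately solve min_a 0.5*||x - sum_k a_k*atoms[k]||^2 + lam*||a||_1."""
    # Step 1/L with L = squared Frobenius norm of D, an upper bound on the
    # Lipschitz constant of the quadratic term's gradient.
    L = sum(v * v for atom in atoms for v in atom) or 1.0
    a = [0.0] * len(atoms)
    for _ in range(n_iter):
        r = list(x)  # residual r = x - D a
        for ak, atom in zip(a, atoms):
            if ak != 0.0:
                for i, v in enumerate(atom):
                    r[i] -= ak * v
        # Gradient step (a + D^T r / L) followed by shrinkage.
        a = [soft_threshold(ak + sum(v * ri for v, ri in zip(atom, r)) / L, lam / L)
             for ak, atom in zip(a, atoms)]
    return a


def spm_max_pool(codes: Sequence[Vector], positions: Sequence[Tuple[float, float]],
                 levels: Sequence[int] = (0, 1, 2)) -> Vector:
    """Max-pool |codes| in each cell of a 2^l x 2^l grid per level l.
    positions are (x, y) patch centres normalised to [0, 1)."""
    K = len(codes[0])
    pooled: Vector = []
    for lvl in levels:
        g = 2 ** lvl
        cells = [[0.0] * K for _ in range(g * g)]
        for code, (px, py) in zip(codes, positions):
            cx, cy = min(int(px * g), g - 1), min(int(py * g), g - 1)
            cell = cells[cy * g + cx]
            for k, v in enumerate(code):
                cell[k] = max(cell[k], abs(v))
        for cell in cells:
            pooled += cell
    return pooled
```

With an orthonormal dictionary the LASSO solution reduces to soft-thresholding the input, which gives an easy sanity check. The concatenated pooled vector (21 cells times K atoms with the default pyramid) is what a linear SVM would then be trained on; the classifier itself is not shown.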

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Sébastien Paris (1), corresponding author
  • Xanadu Halkias (2)
  • Hervé Glotin (2, 3)

  1. DYNI Team, LSIS CNRS UMR 7296, Aix-Marseille University, Marseille, France
  2. DYNI Team, LSIS CNRS UMR 7296, Université Sud Toulon-Var, La Garde, France
  3. Institut Universitaire de France, Paris, France
