What Image Classifiers Really See – Visualizing Bag-of-Visual Words Models

  • Christian Hentschel
  • Harald Sack
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8935)


Bag-of-Visual-Words (BoVW) features, which quantize and count local gradient distributions in images much as words are counted in texts, have proven to be powerful image representations. In combination with supervised machine learning approaches, models for nearly every visual concept can be learned. BoVW feature extraction, however, cascades multiple stages (local feature detection and extraction, vector quantization, and nearest-neighbor assignment), which makes the obtained image features, and thus the overall classification results, very difficult to interpret. In this work, we present an approach that provides an intuitive heat-map-like visualization of the influence each image pixel has on the overall classification result. We compare three classifiers (AdaBoost, Random Forest, and linear SVM) trained on the Caltech-101 benchmark dataset with respect to their individual classification performance and the generated model visualizations. The obtained visualizations not only allow intuitive interpretation of the classification results but also help to identify sources of misclassification caused by badly chosen training examples.
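The stages named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how pixel influence is computed, so the sketch assumes a linear model with one weight per visual word and spreads each weight over a square support region around the keypoint. The names `assign_words`, `bovw_histogram`, `influence_heatmap`, and the `radius` parameter are hypothetical, and the descriptors are toy 2-D vectors rather than real gradient-based features.

```python
import numpy as np

def assign_words(descriptors, codebook):
    """Nearest-neighbor assignment: index of the closest codebook vector per descriptor."""
    # Squared Euclidean distances, shape (n_descriptors, n_words).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def bovw_histogram(words, n_words):
    """Count visual-word occurrences and L1-normalize the counts."""
    hist = np.bincount(words, minlength=n_words).astype(float)
    return hist / max(hist.sum(), 1.0)

def influence_heatmap(shape, positions, words, word_weights, radius):
    """Add each descriptor's word weight to every pixel in its square support region.

    Assumption: word_weights holds one weight per visual word, for example the
    coefficients of a linear classifier. This is an illustrative choice only.
    """
    heat = np.zeros(shape, dtype=float)
    for (row, col), word in zip(positions, words):
        r0, r1 = max(row - radius, 0), min(row + radius + 1, shape[0])
        c0, c1 = max(col - radius, 0), min(col + radius + 1, shape[1])
        heat[r0:r1, c0:c1] += word_weights[word]
    return heat

# Toy data: two visual words, three local descriptors with pixel positions.
codebook = np.array([[0.0, 0.0], [1.0, 1.0]])
descriptors = np.array([[0.1, -0.1], [0.9, 1.2], [1.1, 0.8]])
positions = [(1, 1), (4, 4), (4, 5)]
word_weights = np.array([-0.5, 2.0])  # hypothetical per-word model weights

words = assign_words(descriptors, codebook)
hist = bovw_histogram(words, len(codebook))
heat = influence_heatmap((6, 6), positions, words, word_weights, radius=1)
```

In this toy setup, pixels covered by several descriptors of a positively weighted word accumulate larger values, which is the intuition behind a heat-map-like view of pixel influence. Non-linear models such as AdaBoost or Random Forest would need a different per-word importance measure, which the sketch does not cover.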


Support Vector Machine; Random Forest; Visual Word; Support Vector Machine Model; Scale Invariant Feature Transform
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Christian Hentschel (1)
  • Harald Sack (1)
  1. Hasso Plattner Institute for Software Systems Engineering, Potsdam, Germany
