What Image Classifiers Really See – Visualizing Bag-of-Visual Words Models

Hentschel, Christian; Sack, Harald

doi:10.1007/978-3-319-14445-0_9

Christian Hentschel¹⁹ &
Harald Sack¹⁹

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 8935))

Included in the following conference series:

International Conference on Multimedia Modeling

3762 Accesses
3 Citations

Abstract

Bag-of-Visual-Words (BoVW) features which quantize and count local gradient distributions in images similar to counting words in texts have proven to be powerful image representations. In combination with supervised machine learning approaches, models for nearly every visual concept can be learned. BoVW feature extraction, however, is performed by cascading multiple stages of local feature detection and extraction, vector quantization and nearest neighbor assignment that makes interpretation of the obtained image features and thus the overall classification results very difficult. In this work, we present an approach for providing an intuitive heat map-like visualization of the influence each image pixel has on the overall classification result. We compare three different classifiers (AdaBoost, Random Forest and linear SVM) that were trained on the Caltech-101 benchmark dataset based on their individual classification performance and the generated model visualizations. The obtained visualizations not only allow for intuitive interpretation of the classification results but also help to identify sources of misclassification due to badly chosen training examples.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 39.99; Price excludes VAT (USA)

Softcover Book: USD 54.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Breiman, L.: Random forests. Machine Learning (2001)
Google Scholar
Csurka, G., Dance, C.R., Fan, L., Willamowski, J., Bray, C., Maupertuis, D.: Visual Categorization with Bags of Keypoints. In: Workshop on Statistical Learning in Computer Vision, ECCV (2004)
Google Scholar
Cutrell, E., Guan, Z.: What are you looking for?: an eye-tracking study of information usage in web search. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems
Google Scholar
Fergus, R., Perona, P.: Learning generative visual models from few training examples: An incremental bayesian approach tested on 101 object categories. In: 2004 Conference on Computer Vision and Pattern Recognition Workshop (2004)
Google Scholar
Freund, Y., Schapire, R.E.: A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 55(1) (August 1997)
Google Scholar
Hentschel, C., Gerke, S., Mbanya, E.: Classifying images at scene level: Comparing global and local descriptors. In: Detyniecki, M., García-Serrano, A., Nürnberger, A., Stober, S. (eds.) AMR 2011. LNCS, vol. 7836, pp. 72–82. Springer, Heidelberg (2013)
Chapter Google Scholar
Jiang, Y.G., Yang, J., Ngo, C.W., Hauptmann, A.G.: Representations of Keypoint-Based Semantic Concept Detection: A Comprehensive Study. IEEE Transactions on Multimedia 12(1) (2010)
Google Scholar
Jurie, F., Triggs, B.: Creating efficient codebooks for visual recognition. In: IEEE International Conference on Computer Vision, ICCV 2005 (2005)
Google Scholar
Lowe, D.G.: Object recognition from local scale-invariant features. In: IEEE International Conference on Computer Vision, ICCV 1999 (1999)
Google Scholar
Pedregosa, F., Varoquaux, G., Gramfort, et al.: Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12 (2011)
Google Scholar
Sivic, J., Zisserman, A.: Video google: a text retrieval approach to object matching in videos. In: IEEE International Conference on Computer Vision, ICCV 2003 (2003)
Google Scholar
Snoek, C.G.M., Worring, M.: Concept-Based Video Retrieval. Foundations and Trends in Information Retrieval 2(4) (2009)
Google Scholar
Yang, K., Zhang, L., Wang, M., Zhang, H.: Semantic point detector. In: Proceedings of the 19th ACM ..., pp. 1209–1212 (2011), http://dl.acm.org/citation.cfm?id=2071976
Yang, L.Y.L., Jin, R.J.R., Sukthankar, R., Jurie, F.: Unifying discriminative visual codebook generation with classifier training for object category recognition. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition (2008)
Google Scholar
Yao, B., Khosla, A., Fei-Fei, L.: Combining randomization and discrimination for fine-grained image categorization. In: CVPR 2011 (2011)
Google Scholar
Zhang, J., Marszalek, M., Lazebnik, S., Schmid, C.: Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study. International Journal of Computer Vision 73(2) (2006)
Google Scholar

Download references

Author information

Authors and Affiliations

Hasso Plattner Institute for Software Systems Engineering, Potsdam, Germany
Christian Hentschel & Harald Sack

Authors

Christian Hentschel
View author publications
You can also search for this author in PubMed Google Scholar
Harald Sack
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

University of Technology, P.O. Box 123, 2007, Sydney, NSW, Australia
Xiangjian He , Dacheng Tao & Muhammad Abul Hasan , &
University of Newcastle, University Dr, Callaghan, 2308, NSW, Australia
Suhuai Luo
National Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, 95, Zhongguancun East Road, 100190, Beijing, P.R. China
Changsheng Xu
Shanghai Jitotong University, 800 Dong Chuan Rd, 200240, Shanghai, China
Jie Yang

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Hentschel, C., Sack, H. (2015). What Image Classifiers Really See – Visualizing Bag-of-Visual Words Models. In: He, X., Luo, S., Tao, D., Xu, C., Yang, J., Hasan, M.A. (eds) MultiMedia Modeling. MMM 2015. Lecture Notes in Computer Science, vol 8935. Springer, Cham. https://doi.org/10.1007/978-3-319-14445-0_9

Download citation

DOI: https://doi.org/10.1007/978-3-319-14445-0_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-14444-3
Online ISBN: 978-3-319-14445-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics