BoVW-CAM: Visual Explanation from Bag of Visual Words

da Silva, Arnaldo Vitor Barros; Alves Pereira, Luis Filipe

doi:10.1007/978-3-031-21689-3_4

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 13654 ))

Included in the following conference series:

Brazilian Conference on Intelligent Systems

852 Accesses

Abstract

Classical computer vision solutions were used to extract image features designed by human experts for encoding visual scenes into vectors. Machine learning algorithms were then applied to model such vector space and assign labels to unseen vectors. Alternatively, such space could be composed of histograms generated using the Bag of Visual Words (BoVW) that compute the number of occurrences of clustered handcrafted features/descriptors in each image. Currently, Deep Learning methods automatically learn image features that maximize the accuracy of classification and object recognition. Still, Deep Learning fails in terms of interpretability. To tackle this issue, methods such as Grad-CAM allow the visualization of regions from input images that support the predictions generated by Convolutional Neural Networks (CNNs), i.e. visual explanations. However, there is a lack of similar visualization techniques for handcrafted features. This fact obscures the comparison between modern CNNs and classical methods for image classification. In this work, we present the BoVW-CAM that indicates the most important image regions for each prediction given by the BoVW technique. This way, we show a novel approach to compare the performance of learned and handcrafted features in the image domain.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Notes

1.
https://www.kaggle.com/datasets/shaunthesheep/microsoft-catsvsdogs-dataset.

References

Alahi, A., Ortiz, R., Vandergheynst, P.: Freak: fast retina keypoint. In: 2012 IEEE Conference on Computer Vision and Pattern Recognition, pp. 510–517. IEEE (2012)
Google Scholar
Alcantarilla, P.F., Bartoli, A., Davison, A.J.: KAZE features. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7577, pp. 214–227. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-33783-3_16
Chapter Google Scholar
Bay, H., Tuytelaars, T., Van Gool, L.: SURF: speeded up robust features. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 404–417. Springer, Heidelberg (2006). https://doi.org/10.1007/11744023_32
Chapter Google Scholar
Calonder, M., Lepetit, V., Strecha, C., Fua, P.: BRIEF: binary robust independent elementary features. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010. LNCS, vol. 6314, pp. 778–792. Springer, Heidelberg (2010). https://doi.org/10.1007/978-3-642-15561-1_56
Chapter Google Scholar
Csurka, G., Dance, C., Fan, L., Willamowski, J., Bray, C.: Visual categorization with bags of keypoints. In: Workshop on Statistical Learning in Computer Vision, ECCV, Prague, vol. 1, pp. 1–2 (2004)
Google Scholar
Dice, L.R.: Measures of the amount of ecologic association between species. Ecology 26(3), 297–302 (1945)
Article Google Scholar
Forgy, E.W.: Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics 21, 768–769 (1965)
Google Scholar
Garcia-Garcia, A., Orts-Escolano, S., Oprea, S., Villena-Martinez, V., Garcia-Rodriguez, J.: A review on deep learning techniques applied to semantic segmentation. arXiv preprint arXiv:1704.06857 (2017)
Hearst, M.A., Dumais, S.T., Osuna, E., Platt, J., Scholkopf, B.: Support vector machines. IEEE Intell. Syst. App. 13(4), 18–28 (1998)
Article Google Scholar
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
Article Google Scholar
Leutenegger, S., Chli, M., Siegwart, R.Y.: Brisk: Binary robust invariant scalable keypoints. In: 2011 International Conference on Computer Vision, pp. 2548–2555. IEEE (2011)
Google Scholar
Lin, W., Hasenstab, K., Moura Cunha, G., Schwartzman, A.: Comparison of handcrafted features and convolutional neural networks for liver MR image adequacy assessment. Sci. Rep. 10(1), 1–11 (2020)
Article Google Scholar
Lipton, Z.C.: The mythos of model interpretability: in machine learning, the concept of interpretability is both important and slippery. Queue 16(3), 31–57 (2018)
Article Google Scholar
Lowe, D.G.: Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vision 60(2), 91–110 (2004)
Article Google Scholar
Lu, D., Weng, Q.: A survey of image classification methods and techniques for improving classification performance. Int. J. Remote Sens. 28(5), 823–870 (2007)
Article Google Scholar
Myles, A.J., Feudale, R.N., Liu, Y., Woody, N.A., Brown, S.D.: An introduction to decision tree modeling. J. Chemom. J. Chemom. Soc. 18(6), 275–285 (2004)
Google Scholar
Nanni, L., Ghidoni, S., Brahnam, S.: Handcrafted vs non-handcrafted features for computer vision classification. Pattern Recogn. 71, 158–172 (2017)
Article Google Scholar
Rosten, E., Drummond, T.: Machine learning for high-speed corner detection. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 430–443. Springer, Heidelberg (2006). https://doi.org/10.1007/11744023_34
Chapter Google Scholar
Rublee, E., Rabaud, V., Konolige, K., Bradski, G.: ORB: an efficient alternative to sift or surf. In: 2011 International Conference on Computer Vision, pp. 2564–2571. IEEE (2011)
Google Scholar
Saba, T.: Computer vision for microscopic skin cancer diagnosis using handcrafted and non-handcrafted features. Microsc. Res. Tech. 84(6), 1272–1283 (2021)
Article Google Scholar
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 618–626 (2017)
Google Scholar
da Silva, A.V.B., Pereira, L.F.A.: Handcrafted vs. learned features for automatically detecting violence in surveillance footage. In: Anais do XLIX Seminário Integrado de Software e Hardware, pp. 82–91. SBC (2022)
Google Scholar
Zar, J.H.: Spearman rank correlation. Encyclop. Biostatist. 7, 1–7 (2005)
Google Scholar
Zhou, B., Khosla, A., Lapedriza, A., Oliva, A., Torralba, A.: Learning deep features for discriminative localization. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2929 (2016)
Google Scholar
Zou, Z., Shi, Z., Guo, Y., Ye, J.: Object detection in 20 years: a survey. arXiv preprint arXiv:1905.05055 (2019)

Download references

Author information

Authors and Affiliations

Universidade Federal do Agreste de Pernambuco, Avenida Bom Pastor, Garanhuns, Pernambuco, 55292-270, Brazil
Arnaldo Vitor Barros da Silva & Luis Filipe Alves Pereira

Authors

Arnaldo Vitor Barros da Silva
View author publications
You can also search for this author in PubMed Google Scholar
Luis Filipe Alves Pereira
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Arnaldo Vitor Barros da Silva .

Editor information

Editors and Affiliations

Federal University of Rio Grande do Norte, Natal, Brazil
João Carlos Xavier-Junior
Federal University of Bahia, Salvador, Brazil
Ricardo Araújo Rios

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

da Silva, A.V.B., Alves Pereira, L.F. (2022). BoVW-CAM: Visual Explanation from Bag of Visual Words. In: Xavier-Junior, J.C., Rios, R.A. (eds) Intelligent Systems. BRACIS 2022. Lecture Notes in Computer Science(), vol 13654 . Springer, Cham. https://doi.org/10.1007/978-3-031-21689-3_4

Download citation

DOI: https://doi.org/10.1007/978-3-031-21689-3_4
Published: 19 November 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-21688-6
Online ISBN: 978-3-031-21689-3
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

BoVW-CAM: Visual Explanation from Bag of Visual Words