Abstract
The ability of artificial intelligence systems to explain their decisions is central to building user confidence and structuring smart human-machine interactions; as AI becomes prominent in everyday use-cases, expressing the rationale behind a system's output is an important aspect of human-machine interaction. In this paper, we introduce a novel framework that uses Grenander's pattern theory structures to produce inherently explainable, symbolic representations of activity interpretations. These representations provide semantically rich and coherent interpretations of video activity using connected structures of detected (grounded) concepts, such as objects and actions, that are bound by semantics through background concepts that are not directly observed, i.e., contextualization cues. We use contextualization cues to establish semantic relationships among concepts and thereby infer a deeper interpretation of events than what can be directly sensed. We propose six questions that can be used to gain insight into a model's ability to justify its decisions and to enhance its ability to interact with humans. The six questions are designed to (1) build an understanding of how the model infers interpretations, (2) enable us to walk through its decision-making process, and (3) expose its drawbacks so that they can possibly be addressed. We demonstrate the viability of this idea on video data using a dialog model that draws on interpretations to generate explanations grounded in both video data and semantics.
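To make the structure described above concrete, the following is a minimal sketch (not the authors' implementation) of a pattern-theory style interpretation: grounded generators (detected objects and actions) are linked through ungrounded contextualization-cue generators drawn from a background semantic resource. All class names, relation labels, and scores here are illustrative assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Generator:
    concept: str
    grounded: bool          # True if directly detected in the video
    score: float = 1.0      # detection confidence for grounded generators

@dataclass
class Bond:
    src: Generator
    dst: Generator
    relation: str           # e.g. "UsedFor", "AtLocation" (illustrative)
    strength: float         # semantic affinity between the two concepts

@dataclass
class Interpretation:
    generators: list = field(default_factory=list)
    bonds: list = field(default_factory=list)

    def energy(self) -> float:
        # A simple coherence measure: sum of detection confidences of
        # grounded generators plus the semantic strengths of all bonds.
        g = sum(x.score for x in self.generators if x.grounded)
        b = sum(x.strength for x in self.bonds)
        return g + b

# Example: "cut" and "cucumber" are grounded; "knife" is a contextualization
# cue that semantically links them even though no knife was detected.
cut = Generator("cut", grounded=True, score=0.8)
cucumber = Generator("cucumber", grounded=True, score=0.9)
knife = Generator("knife", grounded=False)

interp = Interpretation(
    generators=[cut, cucumber, knife],
    bonds=[Bond(cut, knife, "UsedFor", 0.7),
           Bond(knife, cucumber, "UsedFor", 0.6)],
)
print(round(interp.energy(), 2))  # 0.8 + 0.9 + 0.7 + 0.6 = 3.0
```

Because the interpretation is an explicit graph of concepts and bonds, each of the six questions can be answered by walking this structure, e.g. by reporting which ungrounded generators and bonds contributed to the chosen interpretation.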
Acknowledgements
This research was supported in part by NSF grants IIS-1217676 and CNS-1513126. The authors would also like to thank Daniel Sawyer for his invaluable insights during discussions.
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this chapter
Aakur, S.N., de Souza, F.D.M., Sarkar, S. (2018). On the Inherent Explainability of Pattern Theory-Based Video Event Interpretations. In: Escalante, H., et al. Explainable and Interpretable Models in Computer Vision and Machine Learning. The Springer Series on Challenges in Machine Learning. Springer, Cham. https://doi.org/10.1007/978-3-319-98131-4_11
Print ISBN: 978-3-319-98130-7
Online ISBN: 978-3-319-98131-4