On the Inherent Explainability of Pattern Theory-Based Video Event Interpretations

  • Sathyanarayanan N. Aakur
  • Fillipe D. M. de Souza
  • Sudeep Sarkar
Part of The Springer Series on Challenges in Machine Learning book series (SSCML)


The ability of artificial intelligence systems to explain their decisions is central to building user confidence and to structuring smart human-machine interactions. As AI becomes more prominent in general, everyday use cases, expressing the rationale behind a system’s output becomes an important aspect of that interaction. In this paper, we introduce a novel framework that integrates Grenander’s pattern theory structures to produce inherently explainable, symbolic representations of activity interpretations. These representations provide semantically rich and coherent interpretations of video activity as connected structures of detected (grounded) concepts, such as objects and actions, that are bound together semantically through background concepts that are not directly observed, i.e., contextualization cues. We use contextualization cues to establish semantic relationships among concepts and thereby infer a deeper interpretation of events than what can be directly sensed. We propose six questions that give insight into the model’s ability to justify its decisions and that enhance its ability to interact with humans. The six questions are designed to (1) build an understanding of how the model infers interpretations, (2) let us walk through its decision-making process, and (3) expose its drawbacks so that they can possibly be addressed. We demonstrate the viability of this idea on video data using a dialog model that uses interpretations to generate explanations grounded in both the video data and semantics.
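The abstract describes an interpretation as a connected structure in which grounded concepts are bound to each other through unobserved contextualization cues, and an explanation as a walk through that structure. The Python sketch below is a hypothetical illustration of that idea only, not the authors' pattern theory implementation: the Interpretation class, its add_bond and explain methods, the toy concepts, and the bonds are all assumptions made for this example. The relation labels RelatedTo and AtLocation are borrowed from ConceptNet's relation vocabulary, but the bonds here are hand-written, not queried from ConceptNet.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class Interpretation:
    """Hypothetical interpretation graph (illustrative only, not the authors' model).

    grounded: concepts assumed to be detected directly in the video.
    cues: contextualization cues, i.e. background concepts that are not observed.
    edges: undirected semantic bonds, stored as concept -> [(neighbor, relation)].
    """

    grounded: set[str] = field(default_factory=set)
    cues: set[str] = field(default_factory=set)
    edges: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def add_bond(self, a: str, relation: str, b: str) -> None:
        # Record the bond in both directions so it can be traversed either way.
        self.edges.setdefault(a, []).append((b, relation))
        self.edges.setdefault(b, []).append((a, relation))

    def _tag(self, concept: str) -> str:
        # Label each concept by how it entered the interpretation.
        if concept in self.grounded:
            kind = "grounded"
        elif concept in self.cues:
            kind = "cue"
        else:
            kind = "unknown"
        return f"{concept} [{kind}]"

    def explain(self, src: str, dst: str) -> list[str]:
        """Return the shortest chain of bonds linking src to dst (breadth-first search).

        Each step is rendered as "a [kind] -Relation- b [kind]"; an empty list means
        no chain exists (or src == dst).
        """
        parent: dict[str, tuple[str, str] | None] = {src: None}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            if node == dst:
                break
            for neighbor, relation in self.edges.get(node, []):
                if neighbor not in parent:
                    parent[neighbor] = (node, relation)
                    queue.append(neighbor)
        if dst not in parent:
            return []
        # Walk back from dst to src, then reverse to get the forward chain.
        steps: list[str] = []
        node = dst
        while parent[node] is not None:
            prev, relation = parent[node]
            steps.append(f"{self._tag(prev)} -{relation}- {self._tag(node)}")
            node = prev
        return list(reversed(steps))


# Toy example: "cereal" is never seen, but it links the observed "milk" and "bowl".
interp = Interpretation(grounded={"pour", "milk", "bowl"}, cues={"cereal"})
interp.add_bond("milk", "RelatedTo", "cereal")
interp.add_bond("cereal", "AtLocation", "bowl")
interp.add_bond("pour", "RelatedTo", "milk")

for step in interp.explain("milk", "bowl"):
    print(step)
```

Here the chain from milk to bowl runs through the unobserved cue cereal, which is the kind of justification a "why" question could surface. Breadth-first search is used only because it returns the shortest chain; the framework described above builds interpretations with pattern theory structures, which this sketch does not attempt to model.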


Keywords: Explainability, Activity interpretation, ConceptNet, Semantics



This research was supported in part by NSF grants IIS 1217676 and CNS-1513126. The authors would also like to thank Daniel Sawyer for his invaluable insights during discussion.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  • Sathyanarayanan N. Aakur (1, corresponding author)
  • Fillipe D. M. de Souza (1)
  • Sudeep Sarkar (1)
  1. Department of Computer Science and Engineering, University of South Florida, Tampa, USA
