KI - Künstliche Intelligenz

, Volume 31, Issue 4, pp 321–330 | Cite as

Declarative Reasoning about Space and Motion with Video

  • Jakob Suchan
Technical Contribution


We present a commonsense theory of space and motion for representing and reasoning about motion patterns in video data, to perform declarative (deep) semantic interpretation of visuo-spatial sensor data, e.g., coming from object tracking, eye tracking data, movement trajectories. The theory has been implemented within constraint logic programming to support integration into large scale AI projects. The theory is domain independent and has been applied in a range of domains, in which the capability to semantically interpret motion in visuo-spatial data is central. In this paper, we demonstrate its capabilities in the context of cognitive film studies for analysing visual perception of spectators by integrating the visual structure of a scene and spectators gaze acquired from eye tracking experiments.


Cognitive vision Spatio-temporal dynamics Semantic interpretation of video 


  1. 1.
    Al-Omari M, Chinellato E, Gatsoulis Y, Hogg DC, Cohn AG (2016) Unsupervised grounding of textual descriptions of object features and actions in video. In: Baral C, Delgrande JP, Wolter F (eds) Principles of knowledge representation and reasoning: Proceedings of the fifteenth international conference, KR 2016, Cape Town, South Africa, April 25–29, 2016, pp 505–508. AAAI PressGoogle Scholar
  2. 2.
    Allen J F (1983) Maintaining knowledge about temporal intervals. Commun ACM 26(11):832–843 (ISSN 0001-0782) CrossRefzbMATHGoogle Scholar
  3. 3.
    Aloimonos Y, Fermüller C (2015) The cognitive dialogue: a new model for vision implementing common sense reasoning. Image Vis Comput 34:42–44. doi: 10.1016/j.imavis.2014.10.010 (ISSN 0262-8856)
  4. 4.
    Bhatt M, Lee JH, Schultz C (2011) CLP(QS): a declarative spatial reasoning framework. COSIT 2011—spatial information theory. Springer, Berlin, pp 210–230 (ISBN 978-3-642-23195-7) Google Scholar
  5. 5.
    Bhatt M, Suchan J, Freksa C (2013a) Rotunde—a smart meeting cinematography initiative—tools, datasets, and benchmarks for cognitive interpretation and control. In: Space, time, and ambient intelligence. Papers from the 2013 AAAI Workshop, Bellevue, Washington, USA, July 14, 2013, volume WS-13-14 of AAAI Workshops. AAAIGoogle Scholar
  6. 6.
    Bhatt M, Suchan J, Schultz CPL (2013b) Cognitive interpretation of everyday activities - toward perceptual narrative based visuo-spatial scene interpretation. In: Finlayson MA, Fisseni B, Löwe B, Meister JC (eds) 2013 workshop on computational models of narrative, CMN 2013, August 4–6, 2013, Hamburg, Germany, volume 32 of OASICS. Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, pp 24–29Google Scholar
  7. 7.
    Bhatt M, Suchan J, Kondyli V, Schultz CPL (2016a) Embodied visuo-locomotive experience analysis: immersive reality based summarisation of experiments in environment-behaviour studies. In: Jain E, Jörg S (eds) Proceedings of the ACM symposium on applied perception, SAP 2016, Anaheim, California, USA, July 22–23, 2016, p 133. ACMGoogle Scholar
  8. 8.
    Bhatt M, Suchan J, Schultz CPL, Kondyli V, Goyal S (2016b) Artificial intelligence for predictive and evidence based architecture design. In: Schuurmans D, Wellman MP (eds) Proceedings of the thirtieth AAAI conference on artificial intelligence, February 12–17, 2016, Phoenix, Arizona, USA, pp 4349–4350. AAAI PressGoogle Scholar
  9. 9.
    Cohn A, Hogg D, Bennett B, Devin V, Galata A, Magee D, Needham C, Santos P (2006) Cognitive vision: integrating symbolic qualitative representations with computer vision. In: Christensen H, Nagel H-H (eds) Cognitive vision systems, volume 3948 of lecture notes in computer science, vol 3948. Springer, Berlin, pp 221–246. doi: 10.1007/11414353_14 (ISBN 978-3-540-33971-7) Google Scholar
  10. 10.
    Dubba KSR, Cohn AG, Hogg DC (2010) Event model learning from complex videos using ILP. In: Proceedings of ECAI, volume 215 of frontiers in artificial intelligence and applications, pp 93–98. IOS PressGoogle Scholar
  11. 11.
    Dubba KSR, Cohn AG, Hogg DC, Bhatt M, Dylla F (2015) Learning relational event models from video. J Artif Intell Res (JAIR) 53:41–90Google Scholar
  12. 12.
    Gärdenfors P (2000) Conceptual spaces—the geometry of thought. MIT Press, Cambridge ISBN 978-0-262-07199-4Google Scholar
  13. 13.
    Guesgen HW (1989) Spatial reasoning based on Allen’s temporal logic. Technical Report TR-89-049. International Computer Science Institute, BerkeleyGoogle Scholar
  14. 14.
    Hazarika SM, Cohn AG (2002) Abducing qualitative spatio-temporal histories from partial observations. In: KR’02 Proceedings of the eights international conference on principles of knowledge representation and reasoning. Morgan Kaufmann, San Francisco, pp 14–25Google Scholar
  15. 15.
    Lieto A, Chella A, Frixione M (2017) Conceptual spaces for cognitive architectures: a lingua franca for different levels of representation. CoRR. arxiv:1701.00464
  16. 16.
    Mandler J M, Pagn Cnovas C (2014) On defining image schemas. Lang Cogn 6(4):510?532. doi: 10.1017/langcog.2014.14 CrossRefGoogle Scholar
  17. 17.
    Moratz R (2006) Representing relative direction as a binary relation of oriented points. In ECAI, pp 407–411Google Scholar
  18. 18.
    Muller P (1998) A qualitative theory of motion based on spatio-temporal primitives. In: Cohn AG, Schubert LK, Shapiro SC (eds) Proceedings of the sixth international conference on principles of knowledge representation and reasoning (KR’98), Trento, Italy, June 2–5, 1998, pp 131–143. Morgan KaufmannGoogle Scholar
  19. 19.
    Randell DA, Cui Z, Cohn A (1992) A spatial logic based on regions and connection. In: KR’92. Principles of knowledge representation and reasoning: Proceedings of the third international conference, pp 165–176. Morgan Kaufmann, San Mateo, CaliforniaGoogle Scholar
  20. 20.
    Rohrbach M, Rohrbach A, Regneri M, Amin S, Andriluka M, Pinkal M, Schiele B (2016) Recognizing fine-grained and composite activities using hand-centric features and script data. Int J Comput Vis 119(3):346–373CrossRefMathSciNetGoogle Scholar
  21. 21.
    Rohrbach A, Torabi A, Rohrbach M, Tandon N, Pal C, Larochelle H, Courville A, Schiele B (2017) Movie description. Int J Comput Vis 123(1):94–120Google Scholar
  22. 22.
    Schultz CPL, Bhatt M (2014) Declarative spatial reasoning with Boolean combinations of axis-aligned rectangular polytopes. In: ECAI 2014—21st European conference on artificial intelligence, 18–22 August 2014, Prague, Czech Republic—including prestigious applications of intelligent systems (PAIS 2014), pp 795–800. doi: 10.3233/978-1-61499-419-0-795
  23. 23.
    Schultz CPL, Bhatt M, Suchan J (2016) Probabilistic spatial reasoning in constraint logic programming. In: Schockaert S, Senellart P (eds) Scalable uncertainty management—10th international conference, SUM 2016, Nice, France, September 21–23, 2016, Proceedings, volume 9858 of lecture notes in computer science, pp 289–302. SpringerGoogle Scholar
  24. 24.
    Scivos A, Nebel B (2004) the finest of its class: the natural, point-based ternary calculus LR for qualitative spatial reasoning. In Freksa C, et al (2005) Spatial cognition IV. Reasoning, action, interaction: international co nference spatial cognition. Lecture notes in computer science, vol 3343, Springer, Berlin, volume 3343, pp 283–303Google Scholar
  25. 25.
    Song YC, Kautz H, Allen J, Swift M, Li Y, Luo J, Zhang C (2013) A Markov logic framework for recognizing complex events from multimodal data. In: Proceedings of the 15th ACM on international conference on multimodal interaction, ICMI ’13, pp 141–148, New York, NY, USA, 2013. ACM. doi: 10.1145/2522848.2522883 (ISBN 978-1-4503-2129-7)
  26. 26.
    Spranger M, Suchan J, Bhatt M, Eppe M (2014) Grounding dynamic spatial relations for embodied (robot) interaction. In: PRICAI 2014: trends in artificial intelligence—13th Pacific Rim International conference on artificial intelligence, Gold Coast, QLD, Australia, December 1–5, 2014. Proceedings, volume 8862, pp 958–971. Springer. doi: 10.1007/978-3-319-13560-1_83
  27. 27.
    Spranger M, Suchan J, Bhatt M (2016) Robust natural language processing—combining reasoning, cognitive semantics, and construction grammar for spatial language. In: Kambhampati S (ed) Proceedings of the twenty-fifth international joint conference on artificial intelligence, IJCAI 2016, New York, NY, USA, 9–15 July 2016, pp 2908–2914. IJCAI/AAAI PressGoogle Scholar
  28. 28.
    Sridhar M, Cohn AG, Hogg DC (2011) Benchmarking qualitative spatial calculi for video activity analysis. In: Proceedings of IJCAI workshop benchmarks and applications of spatial reasoning, pp 15–20Google Scholar
  29. 29.
    Srinivasan A (2001) The Aleph manual. Accessed 18 Aug 2017
  30. 30.
    Suchan J, Bhatt M (2016a) Semantic question-answering with video and eye-tracking data: AI foundations for human visual perception driven cognitive film studies. In: Kambhampati S (ed) Proceedings of the twenty-fifth international joint conference on artificial intelligence, IJCAI 2016, New York, NY, USA, 9–15 July 2016, pp 2633–2639. IJCAI/AAAI PressGoogle Scholar
  31. 31.
    Suchan J, Bhatt M (2016b) The geometry of a scene: On deep semantics for visual perception driven cognitive film, studies. In: 2016 IEEE winter conference on applications of computer vision, WACV 2016, Lake Placid, NY, USA, March 7–10, 2016, pp 1–9. IEEE Computer SocietyGoogle Scholar
  32. 32.
    Suchan J, Bhatt M, Santos PE (2014) Perceptual narratives of space and motion for semantic interpretation of visual data. In: de Agapito L, Bronstein MM, Rother C (eds) Computer vision—ECCV 2014 workshops—Zurich, Switzerland, September 6–7 and 12, 2014, Proceedings, Part II, volume 8926 of lecture notes in computer science, pp 339–354. SpringerGoogle Scholar
  33. 33.
    Suchan J, Bhatt M, Schultz CPL (2016) Deeply semantic inductive spatio-temporal learning. CoRR. arxiv:1608.02693
  34. 34.
    Tran S, Davis LS (2008) Event modeling and recognition using Markov logic networks. Computer Vision-ECCV 2008, pp 610 – 623Google Scholar
  35. 35.
    Tu K, Meng M, Lee MW, Choe TE, Zhu SC (2014) Joint video and text parsing for understanding events and answering queries. IEEE Multimed 21(2):42–70CrossRefGoogle Scholar
  36. 36.
    Vernon D (2006) The space of cognitive vision. In: Christensen HI, Nagel HH (eds) Cognitive vision systems. Lecture notes in computer science, vol 3948. Springer, Berlin, HeidelbergGoogle Scholar
  37. 37.
    Vernon D (2008) Cognitive vision: the case for embodied perception. Image Vis Comput 26(1):127–140CrossRefGoogle Scholar
  38. 38.
    Walega P, Bhatt M, Schultz C (2015) ASPMT(QS): non-monotonic spatial reasoning with answer set programming modulo theories. In: LPNMR: logic programming and nonmonotonic reasoning—13th international conferenceGoogle Scholar
  39. 39.
    Yang Y, Aloimonos Y, Fermüller C, Aksoy EE (2015) Learning the semantics of manipulation action. In: Proceedings of the 53rd annual meeting of the association for computational linguistics and the 7th international joint conference on natural language processing of the Asian Federation of Natural Language Processing, ACL 2015, July 26–31, 2015, Beijing, China, Volume 1: long papers, pp 676–686. The Association for Computer LinguisticsGoogle Scholar
  40. 40.
    Yu H, Siddharth N, Barbu A, Siskind JM (2015) A compositional framework for grounding language inference, generation, and acquisition in video. J Artif Intell Res 52:601–713. doi: 10.1613/jair.4556 zbMATHMathSciNetGoogle Scholar
  41. 41.
    Zampogiannis K, Yang Y, Fermüller C, Aloimonos Y (2015) Learning the spatial semantics of manipulation actions through preposition grounding. In: IEEE international conference on robotics and automation, ICRA 2015, Seattle, WA, USA, 26–30 May, 2015, pp 1389–1396. IEEE. doi: 10.1109/ICRA.2015.7139371

Copyright information

© Springer-Verlag GmbH Deutschland 2017

Authors and Affiliations

  1. 1.Human-Centred Cognitive Assistance LaboratoryUniversity of BremenBremenGermany

Personalised recommendations