Perceptual Narratives of Space and Motion for Semantic Interpretation of Visual Data

  • Jakob Suchan
  • Mehul Bhatt
  • Paulo E. Santos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8926)


We propose a commonsense theory of space and motion for the high-level semantic interpretation of dynamic scenes. The theory provides primitives for commonsense representation and reasoning with qualitative spatial relations, depth profiles, and spatio-temporal change; these may be combined with probabilistic methods for modelling and hypothesising event and object relations. The proposed framework has been implemented as a general activity abstraction and reasoning engine, which we demonstrate by generating declaratively grounded visuo-spatial narratives of perceptual input from vision and depth sensors for a benchmark scenario.

Our long-term goal is to provide general tools (integrating different aspects of space, action, and change) necessary for tasks such as real-time human activity interpretation and dynamic sensor control within the purview of cognitive vision, interaction, and control.
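As a rough illustration of the kind of qualitative spatial primitive the abstract refers to, the sketch below maps metric bounding boxes to RCC-style topological relations (DC, EC, PO, NTPP, EQ). This is not the authors' implementation (which is declarative and logic-programming based); all names are illustrative, and tangential-containment cases are folded into partial overlap for brevity.

```python
# Hypothetical sketch: abstracting metric bounding boxes into
# qualitative (RCC-style) topological relations, one primitive a
# commonsense theory of space and motion could build on.
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def rcc_relation(a: Box, b: Box) -> str:
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    # No overlap at all -> disconnected.
    if ax2 < bx1 or bx2 < ax1 or ay2 < by1 or by2 < ay1:
        return "DC"
    # Touching only along a boundary -> externally connected.
    if ax2 == bx1 or bx2 == ax1 or ay2 == by1 or by2 == ay1:
        return "EC"
    if a == b:
        return "EQ"
    # One box strictly inside the other -> non-tangential proper part.
    if ax1 > bx1 and ay1 > by1 and ax2 < bx2 and ay2 < by2:
        return "NTPP"
    if bx1 > ax1 and by1 > ay1 and bx2 < ax2 and by2 < ay2:
        return "NTPPi"
    # Everything else (including tangential containment, simplified
    # away here) is treated as partial overlap.
    return "PO"
```

Tracking such relations frame by frame yields the qualitative spatio-temporal change (e.g., a DC-to-EC-to-PO transition) that high-level event hypotheses can be grounded in.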





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Cognitive Systems, University of Bremen, Bremen, Germany
  2. Centro Universitário da FEI, São Paulo, Brazil
