Determining Interacting Objects in Human-Centric Activities via Qualitative Spatio-Temporal Reasoning

  • Hajar Sadeghi Sokeh
  • Stephen Gould
  • Jochen Renz
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9007)


Understanding the activities taking place in a video is a challenging problem in Artificial Intelligence. Complex video sequences contain many activities and involve a multitude of interacting objects. Determining which objects are relevant to a particular activity is the first step in understanding that activity; indeed, many objects in the scene are irrelevant to the main activity taking place. In this work, we consider human-centric activities and aim to identify which objects in the scene are involved in the activity. We take an activity-agnostic approach and rank every moving object in the scene by how likely it is to be involved in the activity. We use a comprehensive spatio-temporal representation that captures the joint movement between humans and each object, and then apply supervised machine learning techniques to recognize relevant objects based on these features. We evaluate our approach on the challenging Mind’s Eye dataset.
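The abstract describes the pipeline only at a high level: describe each human-object pair with qualitative spatio-temporal relations, then score and rank every moving object with a supervised model. The sketch below illustrates that idea under stated assumptions and is not the authors' representation. The relation vocabulary (touch/near/far crossed with approach/stable/recede), the thresholds `NEAR_PX` and `EPS_PX`, and the hand-set `WEIGHTS` (a stand-in for a trained classifier such as a linear SVM) are all hypothetical; tracks are assumed to be per-frame bounding boxes aligned frame by frame.

```python
import math
from collections import Counter

# Axis-aligned box (x_min, y_min, x_max, y_max) in pixels; a track is one box per frame.
Box = tuple[float, float, float, float]

# Hypothetical thresholds, not taken from the paper.
NEAR_PX = 50.0  # gaps below this (but above 0) count as "near"
EPS_PX = 2.0    # gap changes within +/- EPS_PX count as "stable"

DIST = ("touch", "near", "far")
MOTION = ("approach", "stable", "recede")
KEYS = [(d, m) for d in DIST for m in MOTION]  # 9 joint relations

# Hypothetical hand-set weights standing in for a trained classifier (e.g. a linear SVM).
WEIGHTS = {
    ("touch", "stable"): 2.0,
    ("touch", "approach"): 1.5,
    ("near", "approach"): 1.0,
    ("near", "stable"): 0.5,
    ("far", "stable"): -1.0,
    ("far", "recede"): -0.5,
}


def box_gap(a: Box, b: Box) -> float:
    """Euclidean gap between two boxes; 0.0 when they touch or overlap."""
    dx = max(0.0, max(a[0], b[0]) - min(a[2], b[2]))
    dy = max(0.0, max(a[1], b[1]) - min(a[3], b[3]))
    return math.hypot(dx, dy)


def qual_distance(gap: float) -> str:
    """Map a metric gap to a coarse qualitative distance."""
    if gap == 0.0:
        return "touch"
    return "near" if gap < NEAR_PX else "far"


def qual_motion(prev_gap: float, gap: float) -> str:
    """Qualitative relative motion between two consecutive frames."""
    if gap < prev_gap - EPS_PX:
        return "approach"
    if gap > prev_gap + EPS_PX:
        return "recede"
    return "stable"


def relation_histogram(human: list[Box], obj: list[Box]) -> list[float]:
    """Fraction of frame transitions falling into each (distance, motion) relation."""
    gaps = [box_gap(h, o) for h, o in zip(human, obj)]
    counts = Counter(
        (qual_distance(g), qual_motion(p, g)) for p, g in zip(gaps, gaps[1:])
    )
    n = max(1, len(gaps) - 1)
    return [counts[k] / n for k in KEYS]


def rank_objects(human: list[Box], objects: dict[str, list[Box]]) -> list[tuple[str, float]]:
    """Score each object track against the human track; most relevant first."""
    scores = {
        name: sum(WEIGHTS.get(k, 0.0) * f for k, f in zip(KEYS, relation_histogram(human, track)))
        for name, track in objects.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy tracks: a static person, a ball moving into contact, a car drifting away.
human = [(100, 100, 140, 200)] * 5
ball = [(300 - 40 * t, 150, 320 - 40 * t, 170) for t in range(5)]
car = [(600 + 10 * t, 100, 700 + 10 * t, 160) for t in range(5)]
ranking = rank_objects(human, {"ball": ball, "car": car})
print(ranking)  # → [('ball', 0.625), ('car', -0.5)]
```

Binning metric gaps into a handful of qualitative relations makes the feature vector insensitive to exact pixel distances, which is the usual motivation for qualitative spatio-temporal reasoning. In a real system the weights would be learned from labelled human-object tracks rather than set by hand.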


Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Hajar Sadeghi Sokeh (1)
  • Stephen Gould (1)
  • Jochen Renz (1)
  1. The Australian National University, Canberra, Australia