Predicting Human Actions Taking into Account Object Affordances

  • Vibekananda Dutta
  • Teresa Zielinska
Open Access


Anticipating intentional human actions is essential for many applications involving service robots and social robots. Assistive robots must now reason beyond the present and predict future actions. This is difficult because the problem is non-Markovian and depends on rich contextual information. The task requires capturing the subtle details of human movement that may indicate a future action. This paper presents a probabilistic method for action prediction in human-object interactions. The key idea of our approach is the description of the so-called object affordance, a concept that allows us to deliver a trajectory visualizing a possible future action. Extensive experiments were conducted to show the effectiveness of our method in action prediction. For evaluation we used a new RGB-D activity video dataset recorded by the Sez3D depth sensors. The dataset contains several human activities, each composed of different actions.


Intention recognition, Human-object relation, Object affordance, Action prediction, Feature extraction, Probability distribution



The initial stage of this work was supported by the “HERITAGE” EU program (Grant Agreement 2012-2648/001-001 EM Action 2 Partnership). The later stages were supported by Preludium 11 (Grant No. 2016/21/N/ST7/01614), funded by the Polish National Science Center (NCN).


Copyright information

© The Author(s) 2018

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors and Affiliations

  1. Institute of Aeronautics and Applied Mechanics, Warsaw University of Technology, Warsaw, Poland
