Predicting Human Actions Taking into Account Object Affordances

Dutta, Vibekananda; Zielinska, Teresa

doi:10.1007/s10846-018-0815-7


Open access
Published: 04 April 2018

Volume 93, pages 745–761 (2019)

Journal of Intelligent & Robotic Systems


Abstract

Anticipating intentional human actions is essential for many applications involving service and social robots. Assistive robots must reason beyond the present by predicting future actions. This is difficult because the task is non-Markovian and depends on rich contextual information, and it requires capturing the subtle details of human movement that may signal a future action. This paper presents a probabilistic method for action prediction in human-object interactions. The key idea of our approach is a description of object affordances, a concept that allows us to generate a trajectory visualizing a possible future action. Extensive experiments demonstrate the effectiveness of our method for action prediction. For evaluation, we used a new RGB-D activity video dataset recorded with Senz3D depth sensors. The dataset contains several human activities composed of different actions.
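The abstract describes the method only at a high level. As a rough illustration of how object affordances can enter a probabilistic action predictor, the following minimal sketch combines an affordance prior P(action | object) with a Gaussian likelihood of an observed hand-motion feature using Bayes' rule. It is not the authors' model: the object names, action labels, the choice of feature (vertical hand velocity) and every numeric value (`AFFORDANCE_PRIOR`, `MOTION_MODEL`) are hypothetical assumptions made for illustration.

```python
import math

# HYPOTHETICAL affordance priors P(action | object): which actions each object
# affords and how likely each is a priori. Values are invented for illustration.
AFFORDANCE_PRIOR = {
    "cup": {"drink": 0.6, "pour": 0.3, "move": 0.1},
    "bottle": {"pour": 0.7, "drink": 0.2, "move": 0.1},
}

# HYPOTHETICAL motion model: for each action, the mean and standard deviation
# of one observed hand feature (here, vertical hand velocity in m/s).
MOTION_MODEL = {
    "drink": (0.30, 0.10),
    "pour": (0.10, 0.10),
    "move": (0.00, 0.05),
}


def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Density of a univariate normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def predict_action(obj: str, hand_velocity: float) -> dict[str, float]:
    """Posterior P(action | object, feature) via Bayes' rule.

    Only actions afforded by the object appear in the result, so
    non-afforded actions implicitly get zero probability.
    """
    prior = AFFORDANCE_PRIOR[obj]
    # Prior times likelihood for each afforded action.
    unnormalized = {
        action: p * gaussian_pdf(hand_velocity, *MOTION_MODEL[action])
        for action, p in prior.items()
    }
    # Normalize so the posterior sums to 1.
    total = sum(unnormalized.values())
    return {action: w / total for action, w in unnormalized.items()}


if __name__ == "__main__":
    posterior = predict_action("cup", 0.28)
    best = max(posterior, key=posterior.get)
    print(best, round(posterior[best], 3))
```

With a cup in hand and a fast upward hand motion, "drink" dominates the posterior; with a bottle and a slower motion, "pour" does. Because this is a single Bayes update on one feature value, it ignores the temporal, non-Markovian structure the abstract emphasizes and does not produce a future trajectory; a fuller model would accumulate evidence over time.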



Acknowledgements

The initial stage of the work was supported by the “HERITAGE” EU program (Grant Agreement 2012-2648/001-001 EM Action 2 Partnership), and the later stages were supported by Preludium 11 (Grant No. 2016/21/N/ST7/01614), funded by the Polish National Science Center (NCN).

Author information

Authors and Affiliations

Institute of Aeronautics and Applied Mechanics, Warsaw University of Technology, ul. Nowowiejska 24, 00-665, Warsaw, Poland
Vibekananda Dutta & Teresa Zielinska


Corresponding author

Correspondence to Vibekananda Dutta.

Additional information

A preliminary version of this paper was presented at the International Workshop on Robot Motion and Control (RoMoCo), 2017, Poland.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.


About this article

Cite this article

Dutta, V., Zielinska, T. Predicting Human Actions Taking into Account Object Affordances. J Intell Robot Syst 93, 745–761 (2019). https://doi.org/10.1007/s10846-018-0815-7


Received: 18 November 2017
Accepted: 14 March 2018
Published: 04 April 2018
Issue Date: 15 March 2019
DOI: https://doi.org/10.1007/s10846-018-0815-7
