Abstract
Anticipating human intentional actions is essential for many applications involving service and social robots. Assisting robots must reason beyond the present and predict future actions, a task made difficult by the non-Markovian nature of human activities and the rich contextual information involved; it requires capturing the subtle details of human movements that may imply a future action. This paper presents a probabilistic method for action prediction in human-object interactions. The key idea of our approach is the description of so-called object affordances, a concept which allows us to deliver a trajectory visualizing a possible future action. Extensive experiments were conducted to show the effectiveness of our method in action prediction. For evaluation we recorded a new RGB-D activity video dataset with the Creative Senz3D depth sensor; the dataset contains several human activities, each composed of different actions.
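To make the idea concrete, here is a minimal sketch (not the authors' implementation) of probabilistic action prediction: a Bayesian belief over a small set of candidate actions is updated frame by frame, scoring each action by how well the observed hand-object distance matches that action's affordance model. The action names, Gaussian affordance models, and all parameter values are illustrative assumptions.

```python
import math

def gaussian_pdf(x, mean, std):
    """Likelihood of an observation under a 1-D Gaussian affordance model."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))

# Hypothetical affordance models: expected hand-to-object distance (meters)
# associated with each candidate action, plus an observation noise level.
AFFORDANCES = {
    "reach_cup":   {"mean": 0.10, "std": 0.05},
    "reach_phone": {"mean": 0.30, "std": 0.05},
}

def predict_action(distances, prior=None):
    """Update P(action) from a sequence of observed hand-object distances."""
    belief = dict(prior) if prior else {a: 1.0 / len(AFFORDANCES) for a in AFFORDANCES}
    for d in distances:
        # Bayes update: multiply by the likelihood of this observation ...
        for action, model in AFFORDANCES.items():
            belief[action] *= gaussian_pdf(d, model["mean"], model["std"])
        # ... then renormalize so the belief remains a probability distribution.
        total = sum(belief.values())
        belief = {a: p / total for a, p in belief.items()}
    return belief

# Hand steadily closing in on the cup (distances shrinking toward 0.10 m).
belief = predict_action([0.28, 0.22, 0.15, 0.11])
print(max(belief, key=belief.get))  # → reach_cup
```

In the paper's setting the likelihood would come from the learned affordance description rather than a fixed Gaussian, but the filtering structure — accumulate evidence over frames, report the most probable future action — is the same.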
Acknowledgements
The initial stage of this work was supported by the EU "HERITAGE" program (Grant Agreement 2012-2648/001-001 EM Action 2 Partnership); the later stages were supported by the Preludium 11 grant (No. 2016/21/N/ST7/01614) funded by the Polish National Science Center (NCN).
Additional information
A preliminary version of this paper was presented at the 11th International Workshop on Robot Motion and Control (RoMoCo), 2017, Poland.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Dutta, V., Zielinska, T. Predicting Human Actions Taking into Account Object Affordances. J Intell Robot Syst 93, 745–761 (2019). https://doi.org/10.1007/s10846-018-0815-7