
Group Activity Prediction with Sequential Relational Anticipation Model

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12366)

Abstract

In this paper, we propose a novel approach to predicting group activities given only the beginning frames of an incomplete activity execution. Existing action prediction approaches (we define an action as the behavior performed by a single person, and an activity as the behavior performed by a group of people) learn to enhance the representation power of the partial observation (the beginning frames with an incomplete activity execution, as opposed to the full observation with a complete execution). For group activity prediction, however, how the relations among people's actions and positions evolve over time is an important cue. To this end, we propose a sequential relational anticipation model (SRAM) that summarizes the relational dynamics in the partial observation and progressively anticipates group representations with rich discriminative information. Our model explicitly anticipates both activity features and positions with two graph auto-encoders, aiming to learn a discriminative group representation for group activity prediction. Experimental results on two widely used datasets demonstrate that our approach significantly outperforms state-of-the-art activity prediction methods.
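The abstract describes the architecture only at a high level. As a rough, unofficial sketch of the idea, the following PyTorch-style code pairs two graph auto-encoders, one anticipating per-person activity features and one anticipating positions, and pools their latent states into a group representation for classification. All module names, dimensions, and the pooling choice are our own illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class GraphConv(nn.Module):
    """One graph convolution layer: ReLU(A X W), Kipf & Welling style."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # x:   (N, in_dim)  per-person features
        # adj: (N, N)       row-normalized affinity between the N people
        return torch.relu(self.linear(adj @ x))

class GraphAutoEncoder(nn.Module):
    """Encodes partially observed per-person signals and decodes
    (anticipates) their values for the unseen remainder of the activity."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.encoder = GraphConv(in_dim, hid_dim)
        self.decoder = GraphConv(hid_dim, in_dim)

    def forward(self, x, adj):
        z = self.encoder(x, adj)           # relational latent state
        return self.decoder(z, adj), z     # anticipated signal + embedding

class SRAMSketch(nn.Module):
    """Toy stand-in for SRAM: one auto-encoder per modality
    (appearance features, positions) plus a group-level classifier."""
    def __init__(self, feat_dim=1024, pos_dim=2, hid_dim=256, n_classes=8):
        super().__init__()
        self.feat_ae = GraphAutoEncoder(feat_dim, hid_dim)
        self.pos_ae = GraphAutoEncoder(pos_dim, hid_dim)
        self.classifier = nn.Linear(2 * hid_dim, n_classes)

    def forward(self, feats, positions, adj):
        pred_feats, z_f = self.feat_ae(feats, adj)
        pred_pos, z_p = self.pos_ae(positions, adj)
        # Max-pool over people to obtain a single group representation.
        group = torch.cat([z_f, z_p], dim=-1).max(dim=0).values
        return self.classifier(group), pred_feats, pred_pos
```

In a full pipeline, `feats` would come from a person-level CNN over the observed frames and `positions` from tracked bounding boxes; training would pair the classification loss with anticipation losses that compare the predicted features and positions against those extracted from the full observation, roughly as the abstract describes.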

Keywords

Activity prediction · Group activity · Structured prediction · Relational model

Notes

Acknowledgement

We thank Nvidia for the GPU donation. This research is supported in part by ONR Award N00014-18-1-2875.


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

1. Golisano College of Computing and Information Sciences, Rochester Institute of Technology, Rochester, USA
