Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction

  • Conference paper

In: Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13665)

Abstract

Previous works on human motion prediction build a mapping between the observed sequence and the sequence to be predicted. However, due to the inherent complexity of multivariate time-series data, finding this extrapolation relation between motion sequences remains challenging. In this paper, we present a new prediction pattern that introduces previously overlooked human poses and casts the prediction task as interpolation. These poses lie after the predicted sequence and form the privileged sequence. Specifically, we first propose an InTerPolation learning Network (ITP-Network) that encodes both the observed sequence and the privileged sequence to interpolate the in-between predicted sequence; its embedded Privileged-sequence-Encoder (Priv-Encoder) simultaneously learns the privileged knowledge (PK). We then propose a Final Prediction Network (FP-Network), for which the privileged sequence is not observable but which is equipped with a novel PK-Simulator that distills the PK learned by the previous network. The simulator takes only the observed sequence as input yet approximates the behavior of the Priv-Encoder, enabling the FP-Network to imitate the interpolation process. Extensive experiments demonstrate that our prediction pattern achieves state-of-the-art performance on the H3.6M, CMU-Mocap, and 3DPW benchmarks in both short-term and long-term prediction.
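The abstract describes a two-stage, privileged-knowledge-distillation pipeline: a teacher (ITP-Network) that sees both the observed and the privileged poses, and a student (FP-Network) whose PK-Simulator learns to reproduce the teacher's privileged feature from the observed poses alone. The PyTorch sketch below illustrates only this training pattern; the encoder and decoder architectures, feature dimensions, module names, and the L2 distillation loss are illustrative assumptions, not the authors' exact design.

```python
# Minimal sketch of the two-stage privileged-knowledge-distillation pattern.
# All architectural details here are assumptions for illustration.
import torch
import torch.nn as nn


class SeqEncoder(nn.Module):
    """GRU encoder mapping a pose sequence (B, T, pose_dim) to a feature vector."""
    def __init__(self, pose_dim, hidden=256):
        super().__init__()
        self.gru = nn.GRU(pose_dim, hidden, batch_first=True)

    def forward(self, x):
        _, h = self.gru(x)      # h: (1, B, hidden)
        return h.squeeze(0)     # (B, hidden)


class ITPNetwork(nn.Module):
    """Teacher: interpolates the predicted sequence from observed + privileged poses."""
    def __init__(self, pose_dim, pred_len, hidden=256):
        super().__init__()
        self.obs_encoder = SeqEncoder(pose_dim, hidden)
        self.priv_encoder = SeqEncoder(pose_dim, hidden)   # Priv-Encoder: learns PK
        self.decoder = nn.Linear(2 * hidden, pred_len * pose_dim)
        self.pred_len, self.pose_dim = pred_len, pose_dim

    def forward(self, observed, privileged):
        pk = self.priv_encoder(privileged)                  # privileged knowledge
        feat = torch.cat([self.obs_encoder(observed), pk], dim=-1)
        pred = self.decoder(feat).view(-1, self.pred_len, self.pose_dim)
        return pred, pk


class FPNetwork(nn.Module):
    """Student: sees only the observed sequence; PK-Simulator mimics the Priv-Encoder."""
    def __init__(self, pose_dim, pred_len, hidden=256):
        super().__init__()
        self.obs_encoder = SeqEncoder(pose_dim, hidden)
        self.pk_simulator = SeqEncoder(pose_dim, hidden)    # approximates PK from observed poses
        self.decoder = nn.Linear(2 * hidden, pred_len * pose_dim)
        self.pred_len, self.pose_dim = pred_len, pose_dim

    def forward(self, observed):
        pk_hat = self.pk_simulator(observed)
        feat = torch.cat([self.obs_encoder(observed), pk_hat], dim=-1)
        pred = self.decoder(feat).view(-1, self.pred_len, self.pose_dim)
        return pred, pk_hat


def distillation_step(teacher, student, observed, privileged, target, lam=1.0):
    """Stage-2 loss: prediction error plus an (assumed) L2 term that pulls the
    student's simulated PK toward the frozen teacher's privileged feature."""
    with torch.no_grad():
        _, pk = teacher(observed, privileged)
    pred, pk_hat = student(observed)
    return nn.functional.mse_loss(pred, target) + lam * nn.functional.mse_loss(pk_hat, pk)
```

At test time only the student is used, so the privileged sequence is never required outside of training.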



Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (Nos. 62176125 and 61772272).

Author information

Corresponding author

Correspondence to Huaijiang Sun.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Sun, X., Cui, Q., Sun, H., Li, B., Li, W., Lu, J. (2022). Overlooked Poses Actually Make Sense: Distilling Privileged Knowledge for Human Motion Prediction. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13665. Springer, Cham. https://doi.org/10.1007/978-3-031-20065-6_39

  • DOI: https://doi.org/10.1007/978-3-031-20065-6_39

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20064-9

  • Online ISBN: 978-3-031-20065-6

  • eBook Packages: Computer Science, Computer Science (R0)
