Abstract
Graph convolutional network based methods that model body-joint relations have recently shown great promise in 3D skeleton-based human motion prediction. However, these methods have two critical issues: first, deep graph convolutions filter features within only a limited part of the graph spectrum, losing substantial information in the full band; second, using a single graph to model the whole body underestimates the diverse patterns of different body-parts. To address the first issue, we propose adaptive graph scattering, which leverages multiple trainable band-pass graph filters to decompose pose features into richer graph spectrum bands. To address the second issue, body-parts are modeled separately to learn diverse dynamics, which enables finer feature extraction along the spatial dimensions. Integrating the above two designs, we propose a novel skeleton-parted graph scattering network (SPGSN). The core of the model is a cascade of multi-part graph scattering blocks (MPGSBs), which build adaptive graph scattering on diverse body-parts and fuse the decomposed features based on the inferred spectrum importance and body-part interactions. Extensive experiments show that SPGSN outperforms state-of-the-art methods by margins of \(13.8\%\), \(9.3\%\) and \(2.7\%\) in terms of 3D mean per joint position error (MPJPE) on the Human3.6M, CMU Mocap and 3DPW datasets, respectively. The code is available at https://github.com/MediaBrain-SJTU/SPGSN.
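To make the two technical ideas in the abstract concrete, the following is a minimal sketch and not the SPGSN implementation. It assumes fixed diffusion wavelets built from a lazy random walk as a stand-in for the trainable band-pass filters, a hypothetical 5-joint chain as a toy body-part graph, and random joint features. The helper names `diffusion_filter_bank` and `mpjpe` are illustrative and do not come from the released code. The sketch shows how a bank of band-pass graph filters splits pose features into spectrum bands, and how MPJPE is computed as the mean Euclidean distance between predicted and ground-truth joints.

```python
import numpy as np


def diffusion_filter_bank(adj, num_scales=3):
    """Build fixed diffusion-wavelet filters on a graph (illustrative stand-in).

    Returns a list of (n, n) matrices: num_scales + 1 band-pass filters
    followed by one low-pass filter. The filters sum to the identity.
    """
    n = adj.shape[0]
    deg = adj.sum(axis=0)
    # Lazy random walk P = (I + A D^{-1}) / 2; column j of A is divided by deg[j].
    walk = 0.5 * (np.eye(n) + adj / deg)

    filters = [np.eye(n) - walk]          # highest-frequency band: I - P
    power = walk                          # holds P^(2^(j-1)) at step j
    for _ in range(num_scales):
        squared = power @ power           # P^(2^j)
        filters.append(power - squared)   # band-pass: P^(2^(j-1)) - P^(2^j)
        power = squared
    filters.append(power)                 # remaining low-pass component
    return filters


def mpjpe(pred, target):
    """Mean per joint position error: average L2 distance over joints."""
    return np.linalg.norm(pred - target, axis=-1).mean()


# Hypothetical body-part graph: a chain of 5 joints (assumption for illustration).
num_joints = 5
adj = np.zeros((num_joints, num_joints))
for i in range(num_joints - 1):
    adj[i, i + 1] = adj[i + 1, i] = 1.0

# Toy per-joint features, e.g. 3D coordinates at one frame.
rng = np.random.default_rng(0)
X = rng.normal(size=(num_joints, 3))

filters = diffusion_filter_bank(adj, num_scales=3)
bands = [f @ X for f in filters]          # one feature map per spectrum band

# The bands telescope back to the input, so no spectral content is discarded.
reconstruction = sum(bands)
print(len(bands), np.allclose(reconstruction, X))
```

In this fixed-filter setting the bands are lossless by construction; in the paper the filters are trainable and the bands are fused using inferred spectrum importance, which this sketch does not attempt to reproduce.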
Acknowledgements
This work is supported by the National Key Research and Development Program of China (2020YFB1406801), the National Natural Science Foundation of China under Grant 62171276, the 111 plan (BP0719010), STCSM (18DZ2270700, 21511100900), and the State Key Laboratory of UHD Video and Audio Production and Presentation.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Li, M., Chen, S., Zhang, Z., Xie, L., Tian, Q., Zhang, Y. (2022). Skeleton-Parted Graph Scattering Networks for 3D Human Motion Prediction. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13666. Springer, Cham. https://doi.org/10.1007/978-3-031-20068-7_2
DOI: https://doi.org/10.1007/978-3-031-20068-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20067-0
Online ISBN: 978-3-031-20068-7
eBook Packages: Computer Science, Computer Science (R0)