Abstract
Human motion prediction is a necessary component of many applications in robotics and autonomous driving. Recent methods propose sequence-to-sequence deep learning models to tackle this problem. However, they do not focus on exploiting different temporal scales for different input lengths. We argue that diverse temporal scales are important, as they allow us to look at past frames with different receptive fields, which can lead to better predictions. In this paper, we propose a Temporal Inception Module (TIM) to encode human motion. Using TIM, our framework produces input embeddings with convolutional layers, applying different kernel sizes to different input lengths. Experimental results on the standard motion prediction benchmarks, Human3.6M and the CMU motion capture dataset, show that our approach consistently outperforms state-of-the-art methods.
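The core idea, as the abstract describes it, is to embed the motion history at multiple temporal scales: shorter input subsequences are convolved with smaller kernels and longer subsequences with larger ones, and the resulting multi-scale features are concatenated into a single embedding for the predictor. The sketch below illustrates this with plain numpy; the kernel sizes, random stand-in weights, and max-pooling are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def conv1d(x, kernel):
    """Valid-mode 1D convolution of a sequence with a kernel."""
    T, K = len(x), len(kernel)
    return np.array([np.dot(x[t:t + K], kernel) for t in range(T - K + 1)])

def temporal_inception_embed(seq, kernel_sizes, rng):
    """Multi-scale temporal embedding of one joint-coordinate sequence:
    one convolutional branch per kernel size, each pooled to a scalar."""
    feats = []
    for k in kernel_sizes:
        kernel = rng.standard_normal(k) / k      # stand-in for learned weights
        feats.append(conv1d(seq, kernel).max())  # global max-pool per branch
    return np.array(feats)

rng = np.random.default_rng(0)
# Shorter, recent history gets small receptive fields; a longer history
# window gets larger ones (kernel sizes here are hypothetical choices).
short_input = rng.standard_normal(10)
long_input = rng.standard_normal(50)
e_short = temporal_inception_embed(short_input, kernel_sizes=[2, 3, 5], rng=rng)
e_long = temporal_inception_embed(long_input, kernel_sizes=[5, 7, 9], rng=rng)
embedding = np.concatenate([e_short, e_long])  # input to a downstream predictor
print(embedding.shape)  # (6,)
```

In the paper's actual framework the learned embeddings feed a trajectory-space predictor; this sketch only shows how varying the kernel size with input length yields per-scale features of a fixed joint dimension.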
Notes
1. Available at http://mocap.cs.cmu.edu/.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Lebailly, T., Kiciroglu, S., Salzmann, M., Fua, P., Wang, W. (2021). Motion Prediction Using Temporal Inception Module. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science(), vol 12623. Springer, Cham. https://doi.org/10.1007/978-3-030-69532-3_39
Print ISBN: 978-3-030-69531-6
Online ISBN: 978-3-030-69532-3