Abstract
Human motion prediction is a necessary component of many applications in robotics and autonomous driving. Recent methods propose sequence-to-sequence deep learning models to tackle this problem. However, they do not focus on exploiting different temporal scales for different input lengths. We argue that diverse temporal scales are important, as they allow us to look at past frames with different receptive fields, which can lead to better predictions. In this paper, we propose a Temporal Inception Module (TIM) to encode human motion. Using TIM, our framework produces input embeddings with convolutional layers, applying different kernel sizes to different input lengths. Experimental results on the standard motion prediction benchmarks, Human3.6M and the CMU motion capture dataset, show that our approach consistently outperforms state-of-the-art methods.
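The core idea, as the abstract describes it, is to embed the motion history at multiple temporal scales: shorter input subsequences are convolved with smaller kernels and longer subsequences with larger ones, and the resulting multi-scale features are concatenated into a single embedding for the predictor. The sketch below illustrates this with plain numpy; the kernel sizes, random stand-in weights, and max-pooling are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def conv1d(x, kernel):
    """Valid-mode 1D convolution of a sequence with a kernel."""
    T, K = len(x), len(kernel)
    return np.array([np.dot(x[t:t + K], kernel) for t in range(T - K + 1)])

def temporal_inception_embed(seq, kernel_sizes, rng):
    """Multi-scale temporal embedding of one joint-coordinate sequence:
    one convolutional branch per kernel size, each pooled to a scalar."""
    feats = []
    for k in kernel_sizes:
        kernel = rng.standard_normal(k) / k      # stand-in for learned weights
        feats.append(conv1d(seq, kernel).max())  # global max-pool per branch
    return np.array(feats)

rng = np.random.default_rng(0)
# Shorter, recent history gets small receptive fields; a longer history
# window gets larger ones (kernel sizes here are hypothetical choices).
short_input = rng.standard_normal(10)
long_input = rng.standard_normal(50)
e_short = temporal_inception_embed(short_input, kernel_sizes=[2, 3, 5], rng=rng)
e_long = temporal_inception_embed(long_input, kernel_sizes=[5, 7, 9], rng=rng)
embedding = np.concatenate([e_short, e_long])  # input to a downstream predictor
print(embedding.shape)  # (6,)
```

In the paper's actual framework the learned embeddings feed a trajectory-space predictor; this sketch only shows how varying the kernel size with input length yields per-scale features of a fixed joint dimension.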
Notes
1. Available at http://mocap.cs.cmu.edu/.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this paper
Lebailly, T., Kiciroglu, S., Salzmann, M., Fua, P., Wang, W. (2021). Motion Prediction Using Temporal Inception Module. In: Ishikawa, H., Liu, CL., Pajdla, T., Shi, J. (eds) Computer Vision – ACCV 2020. ACCV 2020. Lecture Notes in Computer Science(), vol 12623. Springer, Cham. https://doi.org/10.1007/978-3-030-69532-3_39
Print ISBN: 978-3-030-69531-6
Online ISBN: 978-3-030-69532-3