Development of human motion prediction strategy using inception residual block

Gupta, Shekhar; Yadav, Gaurav Kumar; Nandi, G. C.

doi:10.1007/s11042-023-14440-y


Published: 23 February 2023

Multimedia Tools and Applications, Volume 82, pages 21177–21191 (2023)


Abstract

Human motion prediction is a crucial task in computer vision and robotics. It has many potential applications, such as human-robot interaction, human action tracking for airport security systems, autonomous car navigation, and computer gaming. However, predicting human motion from past actions is extremely challenging because spatial and temporal features are difficult to detect correctly. We propose an Inception Residual Block (IRB) to detect temporal features in human poses, owing to its inherent capability of processing multiple kernels to capture salient features. Specifically, we use multiple 1-D Convolutional Neural Networks (CNNs) with different kernel sizes and input sequence lengths and concatenate their outputs to obtain a suitable embedding. Because the kernels slide over different receptive fields, they detect smaller and larger salient features at multiple temporal scales. Our main contribution is a residual connection between the input and the output of the inception block, which maintains continuity between the previously observed pose and the next predicted pose. With this architecture, the model learns prior knowledge about human poses much better, and we achieve much higher prediction accuracy, as detailed in the paper. We further feed the output of the IRB into a Graph Convolutional Network (GCN) because of its better spatial feature learning capability. We perform a parametric analysis to guide the design of our model. Finally, we evaluate our approach on the Human3.6M and CMU MoCap datasets and compare our short-term and long-term predictions with state-of-the-art methods; our model outperforms them on most of the pose results, for reasons elaborated in the paper.
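The abstract describes the IRB and its hand-off to a GCN only at a high level. The following is a minimal NumPy sketch of that data flow under stated assumptions, not the authors' implementation: the kernel sizes (1, 3, 5), the branch width, the ReLU activations, the 1×1 output projection, and all function and class names are hypothetical; the per-branch input sequence lengths mentioned in the abstract are omitted; and the graph layer uses the Kipf & Welling propagation rule with a toy chain adjacency standing in for a real skeleton graph.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the toy weights are reproducible


def conv1d_same(x, w):
    """Temporal 1-D convolution with zero 'same' padding.

    x: (T, C_in) sequence of flattened poses; w: (k, C_in, C_out), k odd.
    Returns (T, C_out)."""
    k = w.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))  # zero-pad along the time axis only
    return np.stack([np.einsum("kc,kco->o", xp[t:t + k], w)
                     for t in range(x.shape[0])])


class InceptionResidualBlock:
    """Hypothetical IRB sketch: parallel 1-D convolutions with different
    kernel sizes, concatenated, projected back to C_in, plus a residual."""

    def __init__(self, c_in, c_branch=8, kernel_sizes=(1, 3, 5)):
        # Assumed hyperparameters; not the values used in the paper.
        self.ws = [rng.normal(0.0, 0.1, (k, c_in, c_branch)) for k in kernel_sizes]
        self.proj = rng.normal(0.0, 0.1, (c_branch * len(kernel_sizes), c_in))

    def __call__(self, x):
        # Each branch covers a different temporal receptive field (k frames).
        feats = [np.maximum(conv1d_same(x, w), 0.0) for w in self.ws]
        multi_scale = np.concatenate(feats, axis=1)  # (T, c_branch * n_branches)
        # Residual connection: observed input + inception output.
        return x + multi_scale @ self.proj


def normalized_adjacency(a):
    """Kipf & Welling renormalization: D^-1/2 (A + I) D^-1/2."""
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(h, a_norm, w):
    """One graph convolution, ReLU(A_norm H W); h: (N nodes, F_in features)."""
    return np.maximum(a_norm @ h @ w, 0.0)


# Toy usage: 10 observed frames, 6 pose coordinates (illustrative sizes only).
T, C = 10, 6
x = rng.normal(size=(T, C))
y = InceptionResidualBlock(C)(x)  # same shape as the input, thanks to the projection

# Treat each coordinate as a graph node whose features are its trajectory,
# connected by a hypothetical chain graph standing in for the skeleton.
chain = np.diag(np.ones(C - 1), 1)
chain = chain + chain.T
h = gcn_layer(y.T, normalized_adjacency(chain), rng.normal(0.0, 0.1, (T, 16)))
print(y.shape, h.shape)  # (10, 6) (6, 16)
```

The 1×1 projection is what makes the residual addition possible in this sketch: the concatenated branches have `c_branch * 3` channels, so they are mapped back to `C_in` channels before being added to the observed input `x`.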


Data Availability

No associated data available


Acknowledgements

This work is supported by the Center of Intelligent Robotics, IIIT Allahabad.

Author information

Authors and Affiliations

Department of Information Technology, Indian Institute of Information Technology Allahabad, Prayagraj, 211012, Uttar Pradesh, India
Shekhar Gupta, Gaurav Kumar Yadav & G. C. Nandi


Corresponding author

Correspondence to Shekhar Gupta.

Ethics declarations

Conflict of Interests

The authors have no conflicts of interest to declare.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Gupta, S., Yadav, G.K. & Nandi, G.C. Development of human motion prediction strategy using inception residual block. Multimed Tools Appl 82, 21177–21191 (2023). https://doi.org/10.1007/s11042-023-14440-y


Received: 27 May 2021
Revised: 29 June 2022
Accepted: 31 January 2023
Published: 23 February 2023
Issue Date: June 2023
DOI: https://doi.org/10.1007/s11042-023-14440-y
