
Action prediction via deep residual feature learning and weighted loss

Published in Multimedia Tools and Applications

Abstract

Action prediction from partially observed videos is challenging because the information provided by partial videos is not discriminative enough for classification. In this paper, we propose a Deep Residual Feature Learning (DeepRFL) framework that extracts more discriminative information from partial videos, producing representations similar to those of the corresponding complete videos. The framework operates as a teacher-student network, in which the teacher network provides complete-video feature supervision to the student network so that the student captures the salient differences between partial videos and their corresponding complete videos through residual feature learning. The teacher and student networks are trained simultaneously, and a technique called partial feature detach prevents the student network from disturbing the teacher network. We also design a novel weighted loss function that penalizes partial videos with small observation ratios less. Extensive evaluations on the challenging UCF101 and HMDB51 datasets demonstrate that the proposed method outperforms the state of the art without knowing the observation ratios of the test videos. The code will be made publicly available soon.
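The abstract's three ingredients (residual feature learning on top of the partial-video feature, a partial feature detach that shields the teacher from the student's gradients, and an observation-ratio-weighted loss) can be illustrated with a short sketch. The PyTorch-style code below is a minimal illustration under stated assumptions, not the authors' released implementation: the feature dimension, the two-layer residual head, the MSE feature-matching term, the placement of the detach, and the linear weight w(r) = r over the observation ratio r are all assumptions.

```python
import torch.nn as nn
import torch.nn.functional as F


class ResidualStudentHead(nn.Module):
    """Hypothetical student head: adds a learned residual to the
    partial-video feature so it moves toward the complete-video feature."""

    def __init__(self, feat_dim=2048, num_classes=101):
        super().__init__()
        # Residual generator (two-layer MLP is an assumption).
        self.residual = nn.Sequential(
            nn.Linear(feat_dim, feat_dim),
            nn.ReLU(inplace=True),
            nn.Linear(feat_dim, feat_dim),
        )
        self.classifier = nn.Linear(feat_dim, num_classes)

    def forward(self, partial_feat, complete_feat):
        # Residual feature learning: the student only has to model the
        # difference between the partial and complete representations.
        enhanced = partial_feat + self.residual(partial_feat)
        logits = self.classifier(enhanced)
        # Detach the teacher's complete-video feature so the student's
        # feature-matching loss cannot back-propagate into (and disturb)
        # the teacher; the exact placement of the detach in the paper
        # may differ from this sketch.
        feat_loss = F.mse_loss(enhanced, complete_feat.detach())
        return logits, feat_loss


def weighted_prediction_loss(logits, labels, obs_ratio):
    """Weighted loss: partial clips with small observation ratios are
    penalized less. The linear weight w(r) = r is an assumed stand-in
    for the paper's weighting function."""
    per_sample_ce = F.cross_entropy(logits, labels, reduction="none")
    return (obs_ratio * per_sample_ce).mean()
```

At training time the two terms would be combined, e.g. `loss = weighted_prediction_loss(logits, labels, obs_ratio) + lam * feat_loss` for some hypothetical trade-off weight `lam`. At test time only the student branch and the classifier are needed, which is consistent with the claim that no observation ratio is required for test videos.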




Acknowledgements

This research was partially sponsored by the Natural Science Foundation of China (Nos. 61872333, 61472387 and 61650201) and the Beijing Natural Science Foundation (Nos. 4152005 and 4162058).

Author information

Correspondence to Laiyun Qing.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Guo, S., Qing, L., Miao, J. et al. Action prediction via deep residual feature learning and weighted loss. Multimed Tools Appl 79, 4713–4727 (2020). https://doi.org/10.1007/s11042-019-7675-4

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s11042-019-7675-4

Keywords

Navigation