Abstract
Action prediction from partially observed videos is challenging because the information in a partial video is often not discriminative enough for classification. In this paper, we propose a Deep Residual Feature Learning (DeepRFL) framework that extracts more discriminative information from partial videos, yielding representations similar to those of complete videos. The framework operates as a teacher-student network: the teacher network provides complete-video feature supervision to the student network, which uses residual feature learning to capture the salient differences between partial videos and their corresponding complete videos. The teacher and student networks are trained simultaneously, and a technique called partial feature detach is employed to prevent the teacher network from being disturbed by the student network. We also design a novel weighted loss function that gives less penalization to partial videos with small observation ratios. Extensive evaluations on the challenging UCF101 and HMDB51 datasets demonstrate that the proposed method outperforms state-of-the-art methods without knowing the observation ratios of testing videos. The code will be publicly available soon.
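To make the residual-feature and weighted-loss ideas concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It rests on several assumptions that the abstract does not confirm: the student's output is the partial-video feature plus a residual, the feature loss is a mean squared error against the teacher's complete-video feature, and the weight equals the observation ratio. The function names `residual_refine`, `observation_weight`, and `weighted_feature_loss` are hypothetical. Gradient handling such as partial feature detach is outside the scope of this sketch.

```python
import math


def residual_refine(partial_feat, residual):
    """Student output: partial-video feature plus a residual (assumed form)."""
    return [p + r for p, r in zip(partial_feat, residual)]


def observation_weight(ratio):
    """Hypothetical weight: equal to the observation ratio, so videos
    observed for a smaller fraction of their length are penalized less."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("observation ratio must be in (0, 1]")
    return ratio


def weighted_feature_loss(student_feats, teacher_feats, ratios):
    """Average over samples of (weight * mean squared error) between the
    student feature and the teacher's complete-video feature (assumed loss)."""
    total = 0.0
    for s, t, r in zip(student_feats, teacher_feats, ratios):
        mse = sum((a - b) ** 2 for a, b in zip(s, t)) / len(s)
        total += observation_weight(r) * mse
    return total / len(ratios)


# Toy values, chosen only for illustration.
teacher = [1.0, 2.0]                    # complete-video feature
partial = [0.5, 1.0]                    # partial-video feature
student = residual_refine(partial, [0.25, 0.5])

loss_short = weighted_feature_loss([student], [teacher], [0.2])
loss_long = weighted_feature_loss([student], [teacher], [0.8])
```

With the same feature error, the sample observed at ratio 0.2 contributes a smaller loss than the one observed at ratio 0.8, which is the behavior the abstract describes. The ratios are needed only to compute the training loss; `residual_refine` itself takes no ratio, in line with the claim that observation ratios of testing videos need not be known.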
Acknowledgements
This research is partially sponsored by the Natural Science Foundation of China (Nos. 61872333, 61472387 and 61650201) and the Beijing Natural Science Foundation (Nos. 4152005 and 4162058).
About this article
Cite this article
Guo, S., Qing, L., Miao, J. et al. Action prediction via deep residual feature learning and weighted loss. Multimed Tools Appl 79, 4713–4727 (2020). https://doi.org/10.1007/s11042-019-7675-4
DOI: https://doi.org/10.1007/s11042-019-7675-4