Abstract
Action temporal detection is a derivative task of action recognition which needs researchers to predict temporal intervals and specific categories in untrimmed videos. Aiming at the problem of too many proposed segments and insufficient filtering effect in multi-stage networks, we propose an action temporal detection method using confidence curve analysis to generate proposal segments. Fixed step window sliding is adopted to generate candidate segments in a video, and we adjust a training mode in segment network. The proposal segments are generated by analyzing the confidence curve of candidate segments, finally proposal segments are input into localization network to classify and adjust confidence level. Extensive experiments performed on THUMOS2014 benchmark show that the proposed method performs significantly better than the original muti-stage convolutional network that mAP increase from 19.0% to 26.4% with 252% accelerating.
Similar content being viewed by others
References
Chauhan JS, Wang Y (2018) Context-aware action detection in untrimmed videos using bidirectional LSTM[C]. 2018 15th conference on computer and robot vision (CRV), pp 222–229
Cuzzolin F, Singh G (2016) Untrimmed video classification for activity detection: submission to activitynet challenge. CVPR ActivityNet Workshop
Dai X, Singh B, Zhang G, Davis LS, Chen YQ (2017) Temporal context network for activity localization in videos. In: 2017 IEEE international conference on computer vision, pp 5727–5736
Diba A, Fayyaz M, Sharma V et al (2018) Spatio-temporal channel correlation networks for action classification[C]. European conference on computer vision, pp 299-315
Everingham M, Winn J (2006) The pascal visual object classes challenge 2007 (voc2007) development kit[J]. Int J Comput Vis 111(1):98–136
Gao J, Yang Z, Sun C et al (2017) Turn tap: Temporal unit regression network for temporal action propos- als[C]. 2017 IEEE international conference on computer vision, pp 3648–3656
Girshick RB, Donahue J, Darrell T et al (2013) Rich feature hierarchies for accurate object detection and semantic segmentation[J/OL]. CoRR http://arxiv.org/abs/1311.2524
Guo D, Li W, Fang X (2018) Fully convolutional network for multiscale temporal action proposals[J]. IEEE Trans Multimedia 20(12):3428–3438
Heilbron FC, Escorcia V, Ghanem B et al (2015) Activitynet: A large-scale video benchmark for human ac- tivity understanding[C]. 2015 IEEE conference on computer vision and pattern recognition (CVPR), pp 961–970. https://doi.org/10.1109/CVPR.2015.7298698
Jain M, van Gemert JC, Snoek CGM (2015) What do 15,000 object categories tell us about classifying and localizing actions? In: 2015 IEEE conference on computer vision and pattern recognition, pp 46–55
Jain M, van Gemert J, Mensink T, Snoek C (2015) Objects2action: classifying and localizing actions without any video example. CoRR
Jiyang G, Zhenheng Y, Ram N (2017) Cascaded boundary regression for temporal action detection. CoRR
Kläser A, Marszałek M, Schmid C et al (2012) Human focused action localization in video[C]// Kutulakos K N. trends and topics in computer vision. Springer Berlin Heidelberg, Berlin, Heidelberg, pp 219–233
Lu H, Li Y, Mu S, Wang D, Kim H, Serikawa S (2018) Motor anomaly detection for unmanned aerial vehicles using reinforcement learning. IEEE Internet Things J 5(4):2315–2322
Lu H, Li Y, Chen M, Kim H, Serikawa S (2018) Brain intelligence: go beyond artificial intelligence. Mobile Netw Appl 23(2):368–375
Lu H, Li Y, Uemura T, Kim H, Serikawa S (2018) Low illumination underwater light field images reconstruction using deep convolutional neural networks. Future Gener Comput Syst 82:142–148
Oneata D, Verbeek J, Schmid C (2014) The LEAR submission at Thumos 2014[M/OL]. https://hal.inria.fr/hal-01074442
Puscas MM, Sangineto E, Culibrk D, Sebe N (2015) Unsupervised tube extraction using transductive learning and dense trajectories. In 2015 IEEE international conference on computer vision, pp 1653–1661
Qiu Z, Yao T, Mei T (2017) Learning spatio-temporal representation with pseudo-3D residual networks[C]. 2017 IEEE international conference on computer vision, pp 5533–5541
Shou Z, Wang D, Chang S (2016) Temporal action localization in untrimmed videos via multi-stage cnns[C]. 2016 IEEE conference on computer vision and pattern recognition, pp 1049–1058
Shou Z, Chan J, Zareian A et al (2017) Cdc: Convolutional-de-convolutional networks for precise temporal action localization in untrimmed videos[C]. 2017 IEEE conference on computer vision and pattern recognition, pp 1417–1426
Shou Z, Gao H, Zhang L, Miyazawa K, Chang S-F (2018) Autoloc: weakly-supervised temporal action localization in untrimmed videos[C]. European Conference on Computer Vision, pp 162-179
Simonyan K, Zisserman A (2014) Two-stream convolutional networks for action recognition in videos [J/OL]. CoRR. http://arxiv.org/abs/1406.2199
Soomro K, Zamir A R, Shah M (2012) UCF101: a dataset of 101 human actions classes from videos in the wild[J/OL]. CoRR. abs/1212.0402. http://arxiv.org/abs/1212.0402
Tran D, Bourdev L, Fergus R et al (2015) Learning spatiotemporal features with 3d convolutional net- works[C]. 2015 IEEE international conference on computer vision, pp 4489–4497
Wang H, Schmid C (2013) Action recognition with improved trajectories. In 2013 IEEE international conference on computer vision, pp 3551–3558
Wang L, Tang X, Qiao Y (2014) Action recognition and detection by combining motion and appearance features[C]. ECCV THUMOS Workshop
Xu Z, Yang Y, Hauptmann AG (2015) A discriminative cnn video representation for event detection. In 2015 IEEE conference on computer vision and pattern recognition, pp 1798–1807
Yanchun W, Jianqin Y, Lei W et al (2018) Temporal action detection based on action temporal semantic continuity[J]. IEEE Access 6:31677–31684
Yeung S, Russakovsky O, Mori G et al (2016) End-to-end learning of Action detection from frame glimpses in videos[C]. 2016 IEEE conference on computer vision and pattern recognition, pp 2678–2687
Yuan J, Ni B, Yang X et al (2016) Temporal action localization with pyramid of score distribution features[C]. 2016 IEEE conference on computer vision and pattern recognition. IEEE
Yuan Z, Stroud CJ, Lu T, Deng J Temporal action localization by structured maximal sums. pp 3215–3223. https://doi.org/10.1109/CVPR.2017.342,2017.
Zhao Y, Xiong Y, Wang L et al (2017) Temporal action detection with structured segment networks[C]. 2017 IEEE international conference on computer vision, pp 2933–2942
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grant No. 61901356 and the HPC Platform of Xi’an Jiaotong University.
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Song, H., Tian, L. & Li, C. Action temporal detection method based on confidence curve analysis. Multimed Tools Appl 79, 34471–34488 (2020). https://doi.org/10.1007/s11042-020-08771-3
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-020-08771-3