Action temporal detection method based on confidence curve analysis

Multimedia Tools and Applications

Abstract

Action temporal detection is a task derived from action recognition: it requires predicting both the temporal intervals and the categories of actions in untrimmed videos. To address the excessive number of proposal segments and the weak filtering in multi-stage networks, we propose an action temporal detection method that generates proposal segments by analyzing a confidence curve. Candidate segments are generated by sliding a fixed-step window over the video, and we adjust the training mode of the segment network. Proposal segments are then generated by analyzing the confidence curve of the candidate segments and, finally, fed into a localization network that classifies them and adjusts their confidence. Extensive experiments on the THUMOS2014 benchmark show that the proposed method performs significantly better than the original multi-stage convolutional network: mAP increases from 19.0% to 26.4%, with a 252% speed-up.
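The pipeline in the abstract (fixed-step sliding windows, then proposals read off a confidence curve) can be illustrated with a minimal sketch. The abstract does not say how the curve is analyzed, so the rule used here, merging consecutive windows whose confidence reaches a threshold, is an assumption made only for illustration and is not the authors' method. The function names, the `threshold` parameter, and the example scores are likewise hypothetical; in the paper the scores would come from the segment network.

```python
from typing import List, Sequence, Tuple

Window = Tuple[int, int]            # [start, end) in frames
Proposal = Tuple[int, int, float]   # start, end, mean confidence


def sliding_windows(num_frames: int, window: int, step: int) -> List[Window]:
    """Return fixed-length [start, end) windows taken with a fixed step."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [(s, s + window) for s in range(0, num_frames - window + 1, step)]


def proposals_from_curve(
    windows: Sequence[Window],
    scores: Sequence[float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Proposal]:
    """Merge runs of consecutive windows whose confidence >= threshold.

    Assumed analysis rule for illustration only: each run of
    above-threshold windows becomes one proposal segment.
    """
    if len(windows) != len(scores):
        raise ValueError("windows and scores must have the same length")

    proposals: List[Proposal] = []
    run: List[float] = []   # scores of the run currently being built
    run_start = run_end = 0

    for (start, end), score in zip(windows, scores):
        if score >= threshold:
            if not run:
                run_start = start   # a new run begins at this window
            run_end = end
            run.append(score)
        elif run:
            proposals.append((run_start, run_end, sum(run) / len(run)))
            run = []

    if run:  # close a run that reaches the end of the video
        proposals.append((run_start, run_end, sum(run) / len(run)))
    return proposals


if __name__ == "__main__":
    wins = sliding_windows(num_frames=64, window=16, step=8)
    # Made-up confidence curve, one value per candidate window.
    curve = [0.1, 0.7, 0.9, 0.8, 0.2, 0.6, 0.3]
    for start, end, conf in proposals_from_curve(wins, curve):
        print(f"frames {start}-{end}: mean confidence {conf:.2f}")
```

Merging whole runs yields far fewer segments than scoring every window independently, which is the kind of reduction in proposal count the abstract aims for; a real implementation would tune the window length, step, and curve analysis to the data.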

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grant No. 61901356 and the HPC Platform of Xi’an Jiaotong University.

Author information

Corresponding author

Correspondence to Lihua Tian.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Song, H., Tian, L. & Li, C. Action temporal detection method based on confidence curve analysis. Multimed Tools Appl 79, 34471–34488 (2020). https://doi.org/10.1007/s11042-020-08771-3

  • DOI: https://doi.org/10.1007/s11042-020-08771-3
