
Behavior detection and evaluation based on multi-frame MobileNet

Published in: Multimedia Tools and Applications

Abstract

Video-based behavior detection is an important research direction in computer vision, with great application potential in intelligent video surveillance, sports behavior evaluation, gait recognition, and related areas. However, due to the complexity of video content and backgrounds, video behavior detection and evaluation face many challenges and are still at an early stage. This paper proposes a novel multi-frame MobileNet model, which describes the internal differences of similar behaviors by introducing multiple continuous frames of the behavior to be detected, and realizes fine-grained behavior detection and evaluation. First, using energy trend images (ETIs) of behaviors as features, multiple continuous frames of the target video are fed into the proposed network to explore the relationship between adjacent frames. Then, in the weighted point-wise convolution stage, a fade-in factor is applied along the timeline to assign a different weight to each involved frame, which makes better use of the progressive relationship between behavior frames at different times. Finally, the effectiveness of the proposed method is verified by comparative experiments on multiple video datasets, including UCF101, HMDB51, and CASIA-B.
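The fade-in weighting described in the abstract can be illustrated with a minimal sketch. This is not the authors' exact formulation: the exponential form of the weights, the `gamma` parameter, and the function names are illustrative assumptions; a point-wise (1×1) convolution is modeled here as a channel-mixing matrix multiplied at every spatial location, with the frame dimension fused by monotonically increasing, normalized weights.

```python
import numpy as np

def fade_in_weights(num_frames, gamma=0.5):
    # Monotonically increasing weights along the timeline, so later
    # frames contribute more; normalized to sum to 1.
    # (The exponential form and gamma are illustrative assumptions.)
    raw = np.exp(gamma * np.arange(num_frames))
    return raw / raw.sum()

def weighted_pointwise_conv(frames, kernel):
    # frames: (T, H, W, C_in) stack of per-frame feature maps
    #         (e.g. ETI features after a depthwise stage);
    # kernel: (C_in, C_out) set of 1x1 filters.
    w = fade_in_weights(frames.shape[0])
    # A 1x1 convolution is a channel-mixing matmul at each pixel.
    per_frame = np.einsum('thwc,cd->thwd', frames, kernel)
    # Fuse the T frames with the fade-in weights -> (H, W, C_out).
    return np.tensordot(w, per_frame, axes=(0, 0))
```

With, say, T=4 frames, the weights grow along the timeline, so the most recent frame dominates the fused feature map while earlier frames still contribute context.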



Acknowledgements

This research was funded by the Key R&D Project of Zhejiang Province, China (grant no. 2021C03151) and the Natural Science Foundation of Zhejiang Province (grant no. Y20F020113).

Author information


Corresponding author

Correspondence to Xiuhui Wang.

Ethics declarations

Conflicts of interest

The authors declare that there is no conflict of interest regarding the publication of this paper.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Liu, L., Wang, X., Bao, Q. et al. Behavior detection and evaluation based on multi-frame MobileNet. Multimed Tools Appl 83, 15733–15750 (2024). https://doi.org/10.1007/s11042-023-16150-x

