
An Abnormal Behavior Recognition Method Based on Fusion Features

  • Conference paper
  • First Online:
Intelligent Robotics and Applications (ICIRA 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13015)


Abstract

Human action recognition technology has developed rapidly in recent years. RNN-based methods built on posture information and 3D-convolution methods built on video frames have both achieved high accuracy on various datasets, yet each has shortcomings for abnormal behavior recognition. Defining abnormal behavior requires considering not only the action type but also the surrounding environmental information, so RNNs that rely on posture information alone are limited. Conversely, because of its input characteristics, 3D-convolution-based recognition captures environmental and group-behavior information well but cannot localize actions accurately in time. This paper proposes an abnormal behavior recognition framework based on P3D and LSTM. The framework uses a pre-trained P3D network to extract environmental features and a pre-trained LSTM to extract individual action features that support temporal localization; a ranking model then classifies abnormal behaviors from the combined environmental and action features. When training the LSTM, a regression network is added to strengthen its temporal localization ability. Experiments show that the proposed P3D-LSTM framework improves recognition accuracy and temporal localization over using 3D convolution or LSTM alone, and can recognize abnormal behaviors accurately.
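The fusion step described above can be sketched in a few lines. This is a minimal illustration only: the feature dimensions (2048 for the P3D clip feature, 512 for the LSTM action feature), the L2 normalization before concatenation, the function names, and the hinge form of the ranking loss are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def fuse_features(env_feat, action_feat):
    """Concatenate L2-normalised environmental and action feature vectors."""
    env = env_feat / (np.linalg.norm(env_feat) + 1e-8)
    act = action_feat / (np.linalg.norm(action_feat) + 1e-8)
    return np.concatenate([env, act])

def ranking_hinge_loss(score_abnormal, score_normal, margin=1.0):
    """Hinge-style ranking loss: push abnormal scores above normal scores by `margin`."""
    return max(0.0, margin - score_abnormal + score_normal)

# Toy usage with random stand-in features.
env = np.random.rand(2048)   # stand-in for a P3D environmental feature
act = np.random.rand(512)    # stand-in for an LSTM action feature
fused = fuse_features(env, act)
print(fused.shape)  # (2560,)
```

In practice the fused vector would feed a small scoring network, and the ranking loss would be computed between abnormal and normal video segments during training.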



Author information

Correspondence to Gang Yu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Yu, G., Liu, J., Zhang, C. (2021). An Abnormal Behavior Recognition Method Based on Fusion Features. In: Liu, XJ., Nie, Z., Yu, J., Xie, F., Song, R. (eds) Intelligent Robotics and Applications. ICIRA 2021. Lecture Notes in Computer Science, vol. 13015. Springer, Cham. https://doi.org/10.1007/978-3-030-89134-3_21


  • DOI: https://doi.org/10.1007/978-3-030-89134-3_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89133-6

  • Online ISBN: 978-3-030-89134-3

  • eBook Packages: Computer Science (R0)
