
An Abnormal Behavior Recognition Method Based on Fusion Features

  • Conference paper
  • First Online:
Intelligent Robotics and Applications (ICIRA 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13015)


Abstract

Human action recognition technology has developed rapidly in recent years. RNN-based methods built on posture information and 3D-convolution methods built on video frames have both achieved high accuracy on various datasets, yet each has shortcomings for abnormal behavior recognition. Defining abnormal behavior requires considering not only the action type but also the surrounding environmental information, so RNNs that rely on posture information alone are limited. Conversely, because of its input characteristics, 3D-convolution-based recognition captures environmental and group-behavior information well but cannot localize actions accurately in time. This paper proposes an abnormal behavior recognition framework based on P3D and LSTM. The framework uses a pre-trained P3D network to extract environmental features and a pre-trained LSTM to extract individual action features that support temporal localization; a ranking model then classifies abnormal behaviors from the combined environmental and action features. When training the LSTM, a regression network is added to strengthen its temporal localization ability. Experiments show that the proposed P3D-LSTM framework improves recognition accuracy and temporal localization over using 3D convolution or LSTM alone, and can recognize abnormal behaviors accurately.
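The fusion step described above can be sketched in a few lines. This is a minimal illustration only: the feature dimensions (2048 for the P3D clip feature, 512 for the LSTM action feature), the L2 normalization before concatenation, the function names, and the hinge form of the ranking loss are all assumptions for illustration, not details taken from the paper.

```python
import numpy as np

def fuse_features(env_feat, action_feat):
    """Concatenate L2-normalised environmental and action feature vectors."""
    env = env_feat / (np.linalg.norm(env_feat) + 1e-8)
    act = action_feat / (np.linalg.norm(action_feat) + 1e-8)
    return np.concatenate([env, act])

def ranking_hinge_loss(score_abnormal, score_normal, margin=1.0):
    """Hinge-style ranking loss: push abnormal scores above normal scores by `margin`."""
    return max(0.0, margin - score_abnormal + score_normal)

# Toy usage with random stand-in features.
env = np.random.rand(2048)   # stand-in for a P3D environmental feature
act = np.random.rand(512)    # stand-in for an LSTM action feature
fused = fuse_features(env, act)
print(fused.shape)  # (2560,)
```

In practice the fused vector would feed a small scoring network, and the ranking loss would be computed between abnormal and normal video segments during training.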



Author information

Correspondence to Gang Yu.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Yu, G., Liu, J., Zhang, C. (2021). An Abnormal Behavior Recognition Method Based on Fusion Features. In: Liu, XJ., Nie, Z., Yu, J., Xie, F., Song, R. (eds) Intelligent Robotics and Applications. ICIRA 2021. Lecture Notes in Computer Science, vol. 13015. Springer, Cham. https://doi.org/10.1007/978-3-030-89134-3_21


  • DOI: https://doi.org/10.1007/978-3-030-89134-3_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89133-6

  • Online ISBN: 978-3-030-89134-3

  • eBook Packages: Computer Science (R0)
