Surveillance videos classification based on multilayer long short-term memory networks

  • Hong Zhang
  • Liang Zhao
  • Gang Dai


Image classification and video recognition have long been key problems in computer vision. Video recognition still performs poorly in some application fields, such as the recognition of surveillance videos. To improve recognition, this paper proposes a new algorithm that recognizes a video from five consecutive frames. First, features are extracted from each frame with ResNet. The features are then passed to a 2-layer LSTM, and a fully connected layer performs the final classification. We evaluate the proposed model on a dataset built from collected shipping data. The experimental results show that the proposed algorithm outperforms the other methods tested, with overall accuracy rising from 0.967 to 0.981.
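The pipeline in the abstract (per-frame ResNet features, a 2-layer LSTM, then a fully connected classifier over five consecutive frames) can be sketched in plain Python as below. This is a minimal, untrained illustration under stated assumptions, not the authors' implementation: the ResNet feature extractor is replaced by random stand-in vectors, the sizes `FEATURE_SIZE` (16), `HIDDEN_SIZE` (8) and `NUM_CLASSES` (3) and the names `LSTMLayer` and `FrameSequenceClassifier` are hypothetical, and the prediction is read from the second LSTM layer's final hidden state, since the abstract does not say how the fully connected layer aggregates the sequence.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class LSTMLayer:
    """A single LSTM layer: input, forget and output gates plus a candidate cell."""

    GATES = ("input", "forget", "output", "cell")

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
        self.W = {g: rand_matrix(hidden_size, input_size + hidden_size, rng) for g in self.GATES}
        self.b = {g: [0.0] * hidden_size for g in self.GATES}

    def forward(self, sequence):
        """Run the layer over a list of input vectors; return the hidden state at every step."""
        h = [0.0] * self.hidden_size
        c = [0.0] * self.hidden_size
        outputs = []
        for x in sequence:
            z = list(x) + h
            pre = {g: [a + b for a, b in zip(matvec(self.W[g], z), self.b[g])] for g in self.GATES}
            i = [sigmoid(v) for v in pre["input"]]
            f = [sigmoid(v) for v in pre["forget"]]
            o = [sigmoid(v) for v in pre["output"]]
            cand = [math.tanh(v) for v in pre["cell"]]
            # Cell update: keep part of the old memory, add part of the new candidate.
            c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, cand)]
            h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
            outputs.append(h)
        return outputs


class FrameSequenceClassifier:
    """Two stacked LSTM layers followed by a fully connected softmax layer (hypothetical)."""

    def __init__(self, feature_size, hidden_size, num_classes, seed=0):
        rng = random.Random(seed)
        self.lstm1 = LSTMLayer(feature_size, hidden_size, rng)
        self.lstm2 = LSTMLayer(hidden_size, hidden_size, rng)  # consumes layer-1 outputs
        self.W_fc = rand_matrix(num_classes, hidden_size, rng)
        self.b_fc = [0.0] * num_classes

    def predict_proba(self, frame_features):
        h2 = self.lstm2.forward(self.lstm1.forward(frame_features))
        # Assumption: classify from the last time step of the second LSTM layer.
        logits = [a + b for a, b in zip(matvec(self.W_fc, h2[-1]), self.b_fc)]
        m = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(v - m) for v in logits]
        total = sum(exps)
        return [e / total for e in exps]


# Hypothetical random stand-ins for the ResNet features of five consecutive frames.
FEATURE_SIZE, HIDDEN_SIZE, NUM_CLASSES, NUM_FRAMES = 16, 8, 3, 5
rng = random.Random(42)
frames = [[rng.gauss(0.0, 1.0) for _ in range(FEATURE_SIZE)] for _ in range(NUM_FRAMES)]

model = FrameSequenceClassifier(FEATURE_SIZE, HIDDEN_SIZE, NUM_CLASSES)
probs = model.predict_proba(frames)
print("class probabilities:", [round(p, 4) for p in probs])
print("predicted class:", probs.index(max(probs)))
```

In a real system the stand-in vectors would be the pooled outputs of a pretrained ResNet (2048-dimensional for ResNet-50), and the LSTM and fully connected weights would be learned by backpropagation through time rather than drawn at random.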


Keywords: Video recognition · Deep learning · ResNet · LSTM



This research is supported by the National Natural Science Foundation of China (Nos. 61373109 and 61602349) and the Educational Research Project of the Educational Commission of Hubei Province (No. 2016234).


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2020

Authors and Affiliations

  1. College of Computer Science & Technology, Wuhan University of Science & Technology, Wuhan, China
  2. Hubei Province Key Laboratory of Intelligent Information Processing and Real-time Industrial System, Wuhan, China
