Abstract

This paper considers fine-tuning an ensemble of deep neural networks to recognize 60 event types in a set of 60,000 images from the WIDER database. The ensemble consists of two deep convolutional neural networks (CNNs) with the GoogLeNet architecture, each pre-trained on a different image collection: ImageNet and Places. Recognition accuracy was also analyzed separately for a subset of 10 events: “Car Racing”, “Ceremony”, “Concert”, “Demonstration”, “Football”, “Meeting”, “Picnic”, “Swimming”, “Tennis” and “Traffic”. During ensemble training, the output layer of each deep CNN is replaced with a layer of 10 or 60 neurons, respectively, and only the weights connecting the output layer to the preceding layer are tuned. Classification accuracy on the WIDER image database averages 83.22% for the 10 event classes and is 50.4% for the 60 event classes. In addition, automatic feature formation with deep CNNs gave much better recognition quality for social events than manually chosen features (LBP, LDP or HOG) classified by a support vector machine. The testing time of the developed ensemble makes the classifier usable in practical event-recognition applications, with a processing speed of up to 20 frames per second.
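The fine-tuning scheme described in the abstract, in which only the weights of a new output layer are trained on top of frozen pre-trained networks, can be illustrated with a minimal pure-Python sketch. This is not the authors' Caffe implementation: the function names, the toy two-dimensional "features", the learning rate, and in particular the rule for combining the two networks (averaging their class probabilities) are assumptions made for illustration only, since the abstract does not state how the ensemble outputs are merged.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def layer_logits(W, b, x):
    """Output-layer logits for one frozen feature vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]


def train_output_layer(features, labels, n_classes, lr=0.5, epochs=200, seed=0):
    """Train only the new output layer (W, b) by SGD on cross-entropy.

    The feature vectors stand in for activations of the frozen layer
    that precedes the output layer; they are never updated here.
    """
    rng = random.Random(seed)
    dim = len(features[0])
    W = [[rng.uniform(-0.01, 0.01) for _ in range(dim)] for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probs = softmax(layer_logits(W, b, x))
            for k in range(n_classes):
                grad = probs[k] - (1.0 if k == y else 0.0)
                for j in range(dim):
                    W[k][j] -= lr * grad * x[j]
                b[k] -= lr * grad
    return W, b


def ensemble_predict(logits_a, logits_b):
    """Assumed fusion rule: average the two networks' class probabilities."""
    pa, pb = softmax(logits_a), softmax(logits_b)
    avg = [(p + q) / 2.0 for p, q in zip(pa, pb)]
    best = max(range(len(avg)), key=avg.__getitem__)
    return best, avg


# Hypothetical toy data: each list plays the role of frozen features
# produced by one pre-trained network for the same four images.
features_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
features_b = [[0.8, 0.2], [1.0, 0.1], [0.2, 0.7], [0.0, 1.0]]
labels = [0, 0, 1, 1]

Wa, ba = train_output_layer(features_a, labels, n_classes=2)
Wb, bb = train_output_layer(features_b, labels, n_classes=2)

predictions = [
    ensemble_predict(layer_logits(Wa, ba, xa), layer_logits(Wb, bb, xb))[0]
    for xa, xb in zip(features_a, features_b)
]
```

In the setting of the paper, `n_classes` would be 10 or 60 and the feature vectors would come from the layer preceding the output layer of each GoogLeNet; because only that one layer is trained, the procedure reduces to fitting a softmax classifier on fixed features.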

Keywords

Image recognition; Social event; Deep learning; Neural network; Fine-tuning

Notes

Acknowledgment

This article was written under the grant of the President of the Russian Federation for state support of young Russian scientists № MK-3130.2017.9 (contract № 14.Z56.17.3130-MK) on the theme “Recognition of road conditions on images using deep learning”.

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Belgorod State Technological University named after V.G. Shukhov, Belgorod, Russia
  2. ITMO University, St. Petersburg, Russia
