Organizing Multimedia Data in Video Surveillance Systems Based on Face Verification with Convolutional Neural Networks

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10716)


In this paper we propose a two-stage approach to organizing information in video surveillance systems. First, faces are detected in each frame, and the video stream is split into sequences of frames that contain the face region of one person. Second, the sequences (tracks) that contain identical faces are grouped using face verification algorithms and hierarchical agglomerative clustering. Gender and age are estimated for each cluster (person) to make the organized video collection easier to use. Particular attention is paid to the aggregation of the features extracted from each frame with deep convolutional neural networks. Experiments on the YTF and IJB-A datasets show that the most accurate and fastest solution is obtained by matching the normalized averages of the feature vectors of all frames in a track.
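The second stage described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes that per-frame feature vectors have already been extracted by some CNN, and the function names, the Euclidean distance, the average-linkage rule, and the threshold value are all hypothetical choices made for illustration. Each track is represented by the L2-normalized average of its frame features, and tracks are then merged by agglomerative clustering until the closest pair of clusters is farther apart than the threshold.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit Euclidean length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def track_descriptor(frame_features):
    """Aggregate a track: average the per-frame feature vectors, then L2-normalize."""
    n = len(frame_features)
    dim = len(frame_features[0])
    mean = [sum(f[i] for f in frame_features) / n for i in range(dim)]
    return l2_normalize(mean)

def distance(a, b):
    """Euclidean distance between two descriptors (an assumed choice of metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cluster_tracks(descriptors, threshold):
    """Average-linkage agglomerative clustering that stops at a distance threshold.

    Returns a list of clusters, each a list of track indices.
    """
    clusters = [[i] for i in range(len(descriptors))]

    def linkage(c1, c2):
        # Mean pairwise distance between the members of two clusters.
        total = sum(distance(descriptors[i], descriptors[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break  # remaining clusters are treated as different persons
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters

# Toy 2-D "features" standing in for CNN outputs (hypothetical data).
tracks = [
    [[1.0, 0.1], [0.9, 0.0]],  # track 0
    [[0.0, 1.0], [0.1, 0.8]],  # track 1
    [[1.0, 0.0], [0.8, 0.1]],  # track 2, similar to track 0
]
descriptors = [track_descriptor(t) for t in tracks]
clusters = cluster_tracks(descriptors, threshold=0.5)
print(clusters)
```

Averaging before matching means each pair of tracks needs only one distance computation, regardless of how many frames the tracks contain, which is consistent with the speed advantage reported in the abstract.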


Keywords: Organizing video data; Video surveillance system; Deep convolutional neural networks; Clustering; Face verification



The article was prepared within the framework of the Academic Fund Program at the National Research University Higher School of Economics (HSE) in 2017 (grant 17-05-0007) and by the Russian Academic Excellence Project “5–100”. Andrey V. Savchenko is partially supported by Russian Federation President grant no. MD-306.2017.9.



Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. National Research University Higher School of Economics, Nizhny Novgorod, Russian Federation
