Cluster Analysis of Facial Video Data in Video Surveillance Systems Using Deep Learning

  • Anastasiia D. Sokolova
  • Andrey V. Savchenko
Conference paper
Part of the Springer Proceedings in Mathematics & Statistics book series (PROMS, volume 247)


In this paper, we propose an approach to structuring information in video surveillance systems by grouping videos that contain identical faces. First, faces are detected in each frame, and features of each facial region are extracted at the output of pre-trained deep convolutional neural networks. Second, the tracks that contain identical faces are grouped using face verification algorithms and hierarchical agglomerative clustering. In an experimental study with the YouTube Faces (YTF) dataset, we examined several ways to aggregate the features of individual frames in order to obtain a descriptor of the whole video track. It was demonstrated that the most accurate and fastest algorithm is the matching of normalized average feature vectors.
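The grouping step described above can be illustrated with a minimal sketch. It assumes that per-frame feature vectors have already been extracted by a network, that "normalized average" means averaging the frame features and then L2-normalizing the mean, and that clusters are merged by average linkage over Euclidean distance until a distance threshold is exceeded. The function names, the linkage rule, the threshold value, and the toy two-dimensional features are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit Euclidean length (zero vectors are returned as-is)."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm > 0 else list(vector)


def track_descriptor(frame_features):
    """Average the per-frame feature vectors of one track, then L2-normalize."""
    count = len(frame_features)
    dim = len(frame_features[0])
    mean = [sum(f[i] for f in frame_features) / count for i in range(dim)]
    return l2_normalize(mean)


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def agglomerative_clustering(descriptors, threshold):
    """Average-linkage agglomerative clustering (an assumed linkage rule).

    Starts with one cluster per track and repeatedly merges the two closest
    clusters until the smallest inter-cluster distance exceeds `threshold`.
    Returns a list of clusters, each a list of track indices.
    """
    clusters = [[i] for i in range(len(descriptors))]

    def linkage(c1, c2):
        total = sum(euclidean(descriptors[i], descriptors[j])
                    for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None  # (distance, index_a, index_b) of the closest pair
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Toy data: three tracks with two frames each and 2-D features (hypothetical).
tracks = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.95, 0.05], [1.0, 0.1]],
    [[0.0, 1.0], [0.1, 0.9]],
]
descriptors = [track_descriptor(t) for t in tracks]
clusters = agglomerative_clustering(descriptors, threshold=0.5)
print(clusters)
```

In this toy example the first two tracks have nearly identical descriptors and are merged, while the third stays in its own cluster. In practice the threshold would have to be tuned on labeled data, and a library implementation of hierarchical clustering would be preferable to this quadratic pure-Python loop.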


Keywords: Organizing video data; Deep convolutional neural networks; Video surveillance systems



The work was conducted at the Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, and was supported by the Russian Science Foundation (RSF), grant 14-41-00039.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  1. Laboratory of Algorithms and Technologies for Network Analysis, National Research University Higher School of Economics, Nizhny Novgorod, Russia
