Deep Video Code for Efficient Face Video Retrieval

  • Shishi Qiao
  • Ruiping Wang
  • Shiguang Shan
  • Xilin Chen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10113)


In this paper, we address the problem of face video retrieval. Given a face video of a person as the query, we search the database and return the most relevant face videos, i.e., those that have the same class label as the query. This problem is highly challenging. For one thing, faces in videos exhibit large intra-class variations. For another, retrieval places high demands on space and time efficiency. To handle these challenges, this paper proposes a novel Deep Video Code (DVC) method that encodes face videos into compact binary codes. Specifically, we devise a multi-branch CNN architecture that takes face videos as training inputs, models each of them as a unified representation via a temporal feature pooling operation, and finally projects the high-dimensional representations into Hamming space to generate a single binary code for each video as output, where the distance between dissimilar pairs is larger than that between similar pairs by a margin. To this end, a smooth upper bound on the triplet loss function, which can avoid bad local optima, is designed to preserve the relative similarity among face videos in the output space. Extensive experiments with comparisons to state-of-the-art methods verify the effectiveness of our method.
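The pipeline described in the abstract (temporal pooling, projection to a binary code, and a margin-based triplet objective) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the use of mean pooling, the random linear projection standing in for the learned CNN layers, the squared Euclidean distance on real-valued codes, and the softplus surrogate. In particular, softplus is only one well-known smooth upper bound on the hinge; it is not claimed to be the bound designed in the paper.

```python
import math
import random

def temporal_mean_pool(frame_features):
    """Average per-frame feature vectors into one video-level vector (assumed pooling)."""
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[d] for f in frame_features) / n for d in range(dim)]

def project(video_feature, weights):
    """Linear projection to a real-valued code; weights holds one row per bit."""
    return [sum(w * x for w, x in zip(row, video_feature)) for row in weights]

def binarize(real_code):
    """Threshold at zero to obtain a {0, 1} binary code."""
    return [1 if v > 0 else 0 for v in real_code]

def hamming(a, b):
    """Number of positions where two binary codes differ."""
    return sum(x != y for x, y in zip(a, b))

def sq_dist(a, b):
    """Squared Euclidean distance between two real-valued codes."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_hinge(anchor, positive, negative, margin):
    """Zero once the dissimilar pair is farther than the similar pair by the margin."""
    return max(0.0, margin + sq_dist(anchor, positive) - sq_dist(anchor, negative))

def triplet_softplus(anchor, positive, negative, margin):
    """Smooth surrogate: log(1 + exp(x)) >= max(0, x) for every x (illustrative only)."""
    x = margin + sq_dist(anchor, positive) - sq_dist(anchor, negative)
    # Numerically stable softplus: max(x, 0) + log(1 + exp(-|x|)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

# Hypothetical toy data: a 5-frame video with 4-D frame features, encoded into 8 bits.
rng = random.Random(0)
weights = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(8)]
video = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(5)]
code = binarize(project(temporal_mean_pool(video), weights))
```

At retrieval time only the binary codes and `hamming` would be needed, which is where the space and time savings come from; the two loss functions apply to the real-valued codes during training in this sketch.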


Keywords: Binary Code, Video Modeling, Locality Sensitive Hashing, Video Representation, Video Classification



This work is partially supported by the 973 Program under contract No. 2015CB351802, the Natural Science Foundation of China under contract Nos. 61390511, 61379083, and 61272321, and the Youth Innovation Promotion Association CAS under No. 2015085.

Supplementary material

Supplementary material 1: 416261_1_En_20_MOESM1_ESM.pdf (PDF, 180 KB)



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Shishi Qiao (1, 2)
  • Ruiping Wang (1, 2, 3), email author
  • Shiguang Shan (1, 2, 3)
  • Xilin Chen (1, 2, 3)
  1. Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, China
  2. University of Chinese Academy of Sciences, Beijing, China
  3. Cooperative Medianet Innovation Center, Beijing, China
