Abstract
In this paper, we address the problem of face video retrieval. Given one face video of a person as query, we search the database and return the most relevant face videos, i.e., ones have same class label with the query. Such problem is of great challenge. For one thing, faces in videos have large intra-class variations. For another, it is a retrieval task which has high request on efficiency of space and time. To handle such challenges, this paper proposes a novel Deep Video Code (DVC) method which encodes face videos into compact binary codes. Specifically, we devise a multi-branch CNN architecture that takes face videos as training inputs, models each of them as a unified representation by temporal feature pooling operation, and finally projects the high-dimensional representations into Hamming space to generate a single binary code for each video as output, where distance of dissimilar pairs is larger than that of similar pairs by a margin. To this end, a smooth upper bound on triplet loss function which can avoid bad local optimal solution is elaborately designed to preserve relative similarity among face videos in the output space. Extensive experiments with comparison to the state-of-the-arts verify the effectiveness of our method.
This is a preview of subscription content, log in via an institution.
Buying options
Tax calculation will be finalised at checkout
Purchases are for personal use only
Learn about institutional subscriptionsNotes
- 1.
The source code of DVC is available at http://vipl.ict.ac.cn/resources/codes.
References
Shan, C.: Face recognition and retrieval in video. In: Schonfeld, D., Shan, C., Tao, D., Wang, L. (eds.) Video Search and Mining. Springer, Heidelberg (2010)
Sivic, J., Everingham, M., Zisserman, A.: Person spotting: video shot retrieval for face sets. In: Leow, W.-K., Lew, M.S., Chua, T.-S., Ma, W.-Y., Chaisorn, L., Bakker, E.M. (eds.) CIVR 2005. LNCS, vol. 3568, pp. 226–236. Springer, Heidelberg (2005). doi:10.1007/11526346_26
Yamaguchi, O., Fukui, K., Maeda, K.: Face recognition using temporal image sequence. In: FG (1998)
Cevikalp, H., Triggs, B.: Face recognition based on image sets. In: CVPR (2010)
Hu, Y., Mian, A.S., Owens, R.: Sparse approximated nearest points for image set classification. In: CVPR (2011)
Kim, T.K., Kittler, J., Cipolla, R.: Discriminative learning and recognition of image set classes using canonical correlations. IEEE TPAMI 29, 1005–1018 (2007)
Wang, R., Chen, X.: Manifold discriminant analysis. In: CVPR (2009)
Wang, R., Shan, S., Chen, X., Gao, W.: Manifold-manifold distance with application to face recognition based on image set. In: CVPR (2008)
Wang, R., Guo, H., Davis, L.S., Dai, Q.: Covariance discriminative learning: a natural and efficient approach to image set classification. In: CVPR (2012)
Parkhi, O., Simonyan, K., Vedaldi, A., Zisserman, A.: A compact and discriminative face track descriptor. In: CVPR (2014)
Ng, J.Y.H., Hausknecht, M., Vijayanarasimhan, S., Vinyals, O., Monga, R., Toderici, G.: Beyond short snippets: Deep networks for video classification. In: CVPR (2015)
Wang, J., Shen, H.T., Song, J., Ji, J.: Hashing for similarity search: a survey. arXiv preprint arXiv:1408.2927 (2014)
Xu, Z., Yang, Y., Hauptmann, A.G.: A discriminative CNN video representation for event detection. In: CVPR (2015)
Song, H.O., Xiang, Y., Jegelka, S., Savarese, S.: Deep metric learning via lifted structured feature embedding. arXiv preprint arXiv:1511.06452 (2015)
Li, Y., Wang, R., Shan, S., Chen, X.: Hierarchical hybrid statistic based video binary code and its application to face retrieval in TV-series. In: FG (2015)
Arandjelović, O., Zisserman, A.: Automatic face recognition for film character retrieval in feature-length films. In: CVPR (2005)
Arandjelović, O., Zisserman, A.: On film character retrieval in feature-length films. In: Hammoud, R.I. (ed.) Interactive Video. Springer, Heidelberg (2006)
Everingham, M., Sivic, J., Zisserman, A.: Hello! My name is... Buffy-automatic naming of characters in TV video. In: BMVC (2006)
Li, Y., Wang, R., Cui, Z., Shan, S., Chen, X.: Compact video code and its application to robust face retrieval in TV-series. In: BMVC (2014)
Li, Y., Wang, R., Huang, Z., Shan, S., Chen, X.: Face video retrieval with image query via hashing across Euclidean space and Riemannian manifold. In: CVPR (2015)
Dong, Z., Jia, S., Wu, T., Pei, M.: Face video retrieval via deep learning of binary hash representations. In: AAAI (2016)
Ji, S., Xu, W., Yang, M., Yu, K.: 3D convolutional neural networks for human action recognition. IEEE TPAMI 35, 221–231 (2013)
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Li, F.F.: Large-scale video classification with convolutional neural networks. In: CVPR (2014)
Pfister, T., Charles, J., Zisserman, A.: Flowing convnets for human pose estimation in videos. In: ICCV (2015)
Chatfield, K., Arandjelović, R., Parkhi, O., Zisserman, A.: On-the-fly learning for visual search of large-scale image and video datasets. Int. J. Multimedia Inf. Retrieval 4, 75–93 (2015)
Crowley, E.J., Parkhi, O.M., Zisserman, A.: Face painting: querying art with photos. In: BMVC (2015)
Ghaleb, E., Tapaswi, M., Al-Halah, Z., Ekenel, H.K., Stiefelhagen, R.: ACCIO: a data set for face track retrieval in movies across age. In: ACM International Conference on Multimedia Retrieval (2015)
Gionis, A., Indyk, P., Motwani, R.: Similarity search in high dimensions via hashing. In: VLDB (1999)
Weiss, Y., Torralba, A., Fergus, R.: Spectral hashing. In: NIPS (2009)
Liu, W., Wang, J., Kumar, S., Chang, S.F.: Hashing with graphs. In: ICML (2011)
Gong, Y., Lazebnik, S.: Iterative quantization: a procrustean approach to learning binary codes. In: CVPR (2011)
Kulis, B., Darrell, T.: Learning to hash with binary reconstructive embeddings. In: NIPS (2009)
Wang, J., Kumar, S., Chang, S.F.: Semi-supervised hashing for scalable image retrieval. In: CVPR (2010)
Norouzi, M., Fleet, D.J.: Minimal loss hashing for compact binary codes. In: ICML (2011)
Liu, W., Wang, J., Ji, R., Jiang, Y.G., Chang, S.F.: Supervised hashing with kernels. In: CVPR (2012)
Rastegari, M., Farhadi, A., Forsyth, D.: Attribute discovery via predictable discriminative binary codes. In: Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C. (eds.) ECCV 2012. LNCS, vol. 7577, pp. 876–889. Springer, Heidelberg (2012). doi:10.1007/978-3-642-33783-3_63
Wang, J., Liu, W., Sun, A., Jiang, Y.G.: Learning hash codes with listwise supervision. In: ICCV (2013)
Wang, J., Wang, J., Yu, N., Li, S.: Order preserving hashing for approximate nearest neighbor search. In: ACM International Conference on Multimedia (2013)
Xia, R., Pan, Y., Lai, H., Liu, C., Yan, S.: Supervised hashing for image retrieval via image representation learning. In: AAAI (2014)
Zhang, R., Lin, L., Zhang, R., Zuo, W., Zhang, L.: Bit-scalable deep hashing with regularized similarity learning for image retrieval and person re-identification. IEEE TIP 24, 4766–4779 (2015)
Lin, K., Yang, H.F., Hsiao, J.H., Chen, C.S.: Deep learning of binary hash codes for fast image retrieval. In: CVPRW (2015)
Liong, V.E., Lu, J., Wang, G., Moulin, P., Zhou, J.: Deep hashing for compact binary codes learning. In: CVPR (2015)
Lai, H., Pan, Y., Liu, Y., Yan, S.: Simultaneous feature learning and hash coding with deep neural networks. In: CVPR (2015)
Zhao, F., Huang, Y., Wang, L., Tan, T.: Deep semantic ranking based hashing for multi-label image retrieval. In: CVPR (2015)
Norouzi, M., Fleet, D.J., Salakhutdinov, R.R.: Hamming distance metric learning. In: NIPS (2012)
Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: CVPR (2015)
Parkhi, O.M., Vedaldi, A., Zisserman, A.: Deep face recognition. In: BMVC (2015)
Nair, V., Hinton, G.E.: Rectified linear units improve restricted Boltzmann machines. In: ICML (2010)
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: ACM International Conference on Multimedia (2014)
Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: AISTATS (2010)
Acknowledgements
This work is partially supported by 973 Program under contract No. 2015CB351802, Natural Science Foundation of China under contracts Nos. 61390511, 61379083, 61272321, and Youth Innovation Promotion Association CAS No. 2015085.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
1 Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Qiao, S., Wang, R., Shan, S., Chen, X. (2017). Deep Video Code for Efficient Face Video Retrieval. In: Lai, SH., Lepetit, V., Nishino, K., Sato, Y. (eds) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science(), vol 10113. Springer, Cham. https://doi.org/10.1007/978-3-319-54187-7_20
Download citation
DOI: https://doi.org/10.1007/978-3-319-54187-7_20
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54186-0
Online ISBN: 978-3-319-54187-7
eBook Packages: Computer ScienceComputer Science (R0)