Deep Video Code for Efficient Face Video Retrieval

Qiao, Shishi; Wang, Ruiping; Shan, Shiguang; Chen, Xilin

doi:10.1007/978-3-319-54187-7_20

Conference paper
First Online: 11 March 2017

Part of the book series: Lecture Notes in Computer Science (LNIP, volume 10113)

Abstract

In this paper, we address the problem of face video retrieval: given a face video of a person as the query, we search the database and return the most relevant face videos, i.e., those that have the same class label as the query. This problem is highly challenging. First, faces in videos exhibit large intra-class variations. Second, as a retrieval task, it places high demands on both space and time efficiency. To handle these challenges, this paper proposes a novel Deep Video Code (DVC) method that encodes face videos into compact binary codes. Specifically, we devise a multi-branch CNN architecture that takes face videos as training inputs, models each video as a unified representation through a temporal feature pooling operation, and finally projects the high-dimensional representations into Hamming space to generate a single binary code for each video as output, such that the distance between dissimilar pairs exceeds the distance between similar pairs by a margin. To this end, we carefully design a smooth upper bound on the triplet loss function, which helps avoid bad local optima, to preserve the relative similarity among face videos in the output space. Extensive experiments comparing against state-of-the-art methods verify the effectiveness of our method.
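The encoding and ranking pipeline described in the abstract can be illustrated with a small, dependency-free Python sketch. It is a minimal illustration under stated assumptions, not the released DVC code: the multi-branch CNN is replaced by precomputed per-frame feature vectors, temporal pooling is assumed to be a mean over frames, the projection into Hamming space is assumed to be a single linear layer followed by sign thresholding, and the smooth upper bound on the triplet loss is stood in for by a softplus of the margin term (a common smooth bound on the hinge, not necessarily the paper's exact formulation). All function names and the toy weights are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]

# Illustrative sketch only; every function name here is hypothetical.


def temporal_mean_pool(frame_features: Sequence[Vector]) -> Vector:
    """Average per-frame features into one video-level vector (assumed mean pooling)."""
    if not frame_features:
        raise ValueError("a video needs at least one frame")
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(frame[d] for frame in frame_features) / n for d in range(dim)]


def project(x: Vector, weights: Sequence[Vector], bias: Sequence[float]) -> Vector:
    """Assumed linear layer: one real-valued output per code bit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def binarize(h: Vector) -> List[int]:
    """Sign thresholding: bit is 1 where the projection is non-negative, else 0."""
    return [1 if v >= 0 else 0 for v in h]


def video_code(frames: Sequence[Vector], weights: Sequence[Vector],
               bias: Sequence[float]) -> List[int]:
    """Encode a video with any number of frames into a single fixed-length binary code."""
    return binarize(project(temporal_mean_pool(frames), weights, bias))


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of bit positions in which two codes differ."""
    return sum(x != y for x, y in zip(a, b))


def rank_database(query: Sequence[int], db_codes: Sequence[Sequence[int]]) -> List[int]:
    """Database indices ordered from nearest to farthest in Hamming distance."""
    return sorted(range(len(db_codes)), key=lambda i: hamming(query, db_codes[i]))


def relaxed_distance(a: Vector, b: Vector) -> float:
    """Training-time surrogate on tanh-relaxed codes in (-1, 1).

    For exact +/-1 codes, squared Euclidean distance equals 4 x Hamming distance.
    """
    return sum((x - y) ** 2 for x, y in zip(a, b)) / 4.0


def hinge_triplet(d_ap: float, d_an: float, margin: float) -> float:
    """Triplet loss: zero once d(anchor, negative) >= d(anchor, positive) + margin."""
    return max(0.0, margin + d_ap - d_an)


def smooth_triplet(d_ap: float, d_an: float, margin: float) -> float:
    """Softplus stand-in for a smooth upper bound: log(1 + e^z) >= max(0, z) for all z."""
    z = margin + d_ap - d_an
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))  # numerically stable softplus


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # hypothetical 3-bit projection
    b = [0.0, 0.0, 0.0]
    query = video_code([[2.0, -1.0]], W, b)
    database = [
        video_code([[-1.0, 1.0], [-3.0, 1.0]], W, b),
        video_code([[1.0, -2.0], [3.0, 0.0]], W, b),
        video_code([[1.0, 2.0]], W, b),
    ]
    # prints [1, 0, 1] [[0, 1, 0], [1, 0, 1], [1, 1, 1]] [1, 2, 0]
    print(query, database, rank_database(query, database))
```

Because every video, whatever its length, collapses to one fixed-length code, comparing a query against the database reduces to counting differing bits, which is what makes binary codes attractive for storage and search time. In an actual training setup the loss would be minimized over relaxed (e.g. tanh) network outputs with an autodiff framework; `relaxed_distance` only shows why such a relaxation tracks Hamming distance.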

Notes

1. The source code of DVC is available at http://vipl.ict.ac.cn/resources/codes.

Acknowledgements

This work is partially supported by the 973 Program under contract No. 2015CB351802, the Natural Science Foundation of China under contract Nos. 61390511, 61379083, and 61272321, and the Youth Innovation Promotion Association CAS under No. 2015085.

Author information

Authors and Affiliations

Key Laboratory of Intelligent Information Processing of Chinese Academy of Sciences (CAS), Institute of Computing Technology, CAS, Beijing, 100190, China
Shishi Qiao, Ruiping Wang, Shiguang Shan & Xilin Chen
University of Chinese Academy of Sciences, Beijing, 100049, China
Shishi Qiao, Ruiping Wang, Shiguang Shan & Xilin Chen
Cooperative Medianet Innovation Center, Beijing, China
Ruiping Wang, Shiguang Shan & Xilin Chen

Corresponding author

Correspondence to Ruiping Wang.

Editor information

Editors and Affiliations

National Tsing Hua University, Hsinchu, Taiwan
Shang-Hong Lai
Graz University of Technology, Graz, Austria
Vincent Lepetit
Drexel University, Philadelphia, Pennsylvania, USA
Ko Nishino
The University of Tokyo, Tokyo, Japan
Yoichi Sato

Electronic supplementary material

Supplementary material 1 (PDF, 180 KB)

About this paper

Cite this paper

Qiao, S., Wang, R., Shan, S., Chen, X. (2017). Deep Video Code for Efficient Face Video Retrieval. In: Lai, SH., Lepetit, V., Nishino, K., Sato, Y. (eds) Computer Vision – ACCV 2016. ACCV 2016. Lecture Notes in Computer Science, vol 10113. Springer, Cham. https://doi.org/10.1007/978-3-319-54187-7_20

DOI: https://doi.org/10.1007/978-3-319-54187-7_20
Published: 11 March 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-54186-0
Online ISBN: 978-3-319-54187-7
eBook Packages: Computer Science, Computer Science (R0)
