Deep Metric Learning with Improved Triplet Loss for Face Clustering in Videos

Zhang, Shun; Gong, Yihong; Wang, Jinjun

doi:10.1007/978-3-319-48890-5_49

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9916)

Included in the following conference series:

Pacific Rim Conference on Multimedia

Abstract

Face clustering in videos aims to partition a large number of faces into a given number of clusters, such that some measure of distance is minimized within clusters and maximized between clusters. In real-world videos, head pose, facial expression, scale, illumination, occlusion and other uncontrolled factors can dramatically change the appearance of faces. In this paper, we tackle this problem by learning a non-linear metric function with a deep convolutional neural network that maps the input image to a low-dimensional feature embedding, using the visual constraints among face tracks. Our network directly optimizes the embedding space so that Euclidean distances correspond to a measure of semantic face similarity. This is technically realized by minimizing an improved triplet loss function, which pushes the negative face away from the positive pair and requires the distance of the positive pair to be less than a margin. We extensively evaluate the proposed algorithm on a set of challenging videos and demonstrate significant performance improvement over existing techniques.
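
The loss described in the abstract can be illustrated with a minimal sketch. The abstract gives only the two goals (push the negative away from the positive pair, keep the positive pair closer than a margin), so the exact form below is an assumption: the function names, the averaging of the two negative distances, the margin values, and the weighting term `intra_weight` are all hypothetical and are not taken from the text.

```python
import numpy as np


def squared_dist(a, b):
    """Squared Euclidean distance between row-wise embeddings."""
    return np.sum((a - b) ** 2, axis=-1)


def improved_triplet_loss(anchor, positive, negative,
                          inter_margin=1.0, intra_margin=0.5,
                          intra_weight=1.0):
    """Sketch of a triplet loss with an extra positive-pair constraint.

    All margins and the weight are illustrative assumptions.
    """
    d_ap = squared_dist(anchor, positive)    # positive-pair distance
    d_an = squared_dist(anchor, negative)    # anchor to negative
    d_pn = squared_dist(positive, negative)  # positive to negative

    # Assumed inter-class term: the negative should be far from BOTH
    # faces of the positive pair, by at least inter_margin.
    inter = np.maximum(0.0, d_ap - 0.5 * (d_an + d_pn) + inter_margin)

    # Assumed intra-class term: the positive pair itself should be
    # closer than intra_margin.
    intra = np.maximum(0.0, d_ap - intra_margin)

    return float(np.mean(inter + intra_weight * intra))


# Toy 2-D embeddings: one well-separated triplet, one violating triplet.
good = improved_triplet_loss(np.array([[0.0, 0.0]]),
                             np.array([[0.1, 0.0]]),
                             np.array([[3.0, 0.0]]))
bad = improved_triplet_loss(np.array([[0.0, 0.0]]),
                            np.array([[1.0, 0.0]]),
                            np.array([[1.0, 0.0]]))
print(good, bad)
```

In this toy example the well-separated triplet incurs no loss, while the triplet whose negative coincides with the positive is penalized by both terms. In the setting the abstract describes, the triplets would come from the visual constraints among face tracks rather than from hand-picked points.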

Notes

1. http://shunzhang.me.pn/papers/eccv2016/.

Acknowledgement

This work is supported by the National Basic Research Program of China (973 Program) under Grant No. 2015CB351705, and the National Natural Science Foundation of China (NSFC) under Grant No. 61332018.

Author information

Authors and Affiliations

Institute of Artificial Intelligence and Robotics, Xi’an Jiaotong University, Xi’an, 710049, Shaanxi, China
Shun Zhang, Yihong Gong & Jinjun Wang

Corresponding author

Correspondence to Yihong Gong.

Editor information

Editors and Affiliations

Zhengzhou University, Zhengzhou, China
Enqing Chen
Jiaotong University, Xi’an, China
Yihong Gong
Zhengzhou University, Zhengzhou, China
Yun Tie

About this paper

Cite this paper

Zhang, S., Gong, Y., Wang, J. (2016). Deep Metric Learning with Improved Triplet Loss for Face Clustering in Videos. In: Chen, E., Gong, Y., Tie, Y. (eds) Advances in Multimedia Information Processing - PCM 2016. PCM 2016. Lecture Notes in Computer Science, vol 9916. Springer, Cham. https://doi.org/10.1007/978-3-319-48890-5_49

DOI: https://doi.org/10.1007/978-3-319-48890-5_49
Published: 27 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48889-9
Online ISBN: 978-3-319-48890-5
eBook Packages: Computer Science, Computer Science (R0)
