Abstract
Face clustering in videos is to partition a large amount of faces into a given number of clusters, such that some measure of distance is minimized within clusters and maximized between clusters. In real-world videos, head pose, facial expression, scale, illumination, occlusion and some uncontrolled factors may dramatically change the appearance variations of faces. In this paper, we tackle this problem by learning non-linear metric function with a deep convolutional neural network from the input image to a low-dimensional feature embedding with the visual constraints among face tracks. Our network directly optimizes the embedding space so that the Euclidean distances correspond to a measure of semantic face similarity. This is technically realized by minimizing an improved triplet loss function, which pushes the negative face away from the positive pairs, and requires the distance of the positive pair to be less than a margin. We extensively evaluate the proposed algorithm on a set of challenging videos and demonstrate significant performance improvement over existing techniques.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Chopra, S., Hadsell, R., LeCun, Y.: Learning a similarity metric discriminatively, with application to face verification. In: CVPR (2005)
Cinbis, R.G., Verbeek, J., Schmid, C.: Unsupervised metric learning for face identification in TV video. In: ICCV (2011)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: CVPR (2009)
Ding, S., Lin, L., Wang, G., Chao, H.: Deep feature learning with relative distance comparison for person re-identification. PR 48(10), 2993–3003 (2015)
Guillaumin, M., Verbeek, J., Schmid, C.: Is that you? metric learning approaches for face identification. In: CVPR (2009)
Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. arXiv (2014)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS (2012)
Parkhi, O.M., Vedaldi, A., Zisserman, A.: Deep face recognition. In: BMVC (2015)
Roth, M., Bauml, M., Nevatia, R., Stiefelhagen, R.: Robust multi-pose face tracking by multi-stage tracklet association. In: ICPR (2012)
Schroff, F., Kalenichenko, D., Philbin, J.: FaceNet: a unified embedding for face recognition and clustering. In: CVPR (2015)
See, J., Eswaran, C.: Exemplar extraction using spatio-temporal hierarchical agglomerative clustering for face recognition in video. In: ICCV, pp. 1481–1486 (2011)
Tapaswi, M., Parkhi, O.M., Rahtu, E., Sommerlade, E., Stiefelhagen, R., Zisserman, A.: Total cluster: a person agnostic clustering method for broadcast videos. In: ICVGIP (2014)
Wang, J., Song, Y., Leung, T., Rosenberg, C., Wang, J., Philbin, J., Chen, B., Wu, Y.: Learning fine-grained image similarity with deep ranking. In: CVPR, pp. 1386–1393 (2014)
Weinberger, K.Q., Blitzer, J., Saul, L.K.: Distance metric learning for large margin nearest neighbor classification. In: NIPS (2005)
Wu, B., Lyu, S., Hu, B.G., Ji, Q.: Simultaneous clustering and tracklet linking for multi-face tracking in videos. In: ICCV (2013)
Wu, B., Zhang, Y., Hu, B.G., Ji, Q.: Constrained clustering and its application to face clustering in videos. In: CVPR (2013)
Xiao, S., Tan, M., Xu, D.: Weighted block-sparse low rank representation for face clustering in videos. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014, Part VI. LNCS, vol. 8693, pp. 123–138. Springer, Heidelberg (2014). doi:10.1007/978-3-319-10599-4_9
Yi, D., Lei, Z., Liao, S., Li, S.Z.: Learning face representation from scratch. arXiv (2014)
Acknowledgement
This work is supported by National Basic Research Program of China (973 Program) under Grant No. 2015CB351705, and the National Natural Science Foundation of China (NSFC) under Grant No. 61332018.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2016 Springer International Publishing AG
About this paper
Cite this paper
Zhang, S., Gong, Y., Wang, J. (2016). Deep Metric Learning with Improved Triplet Loss for Face Clustering in Videos. In: Chen, E., Gong, Y., Tie, Y. (eds) Advances in Multimedia Information Processing - PCM 2016. PCM 2016. Lecture Notes in Computer Science(), vol 9916. Springer, Cham. https://doi.org/10.1007/978-3-319-48890-5_49
Download citation
DOI: https://doi.org/10.1007/978-3-319-48890-5_49
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-48889-9
Online ISBN: 978-3-319-48890-5
eBook Packages: Computer ScienceComputer Science (R0)