A Comprehensive Study on Center Loss for Deep Face Recognition

  • Yandong Wen
  • Kaipeng Zhang
  • Zhifeng Li
  • Yu Qiao (corresponding author)

Abstract

Deep convolutional neural networks (CNNs) trained with the softmax loss have achieved remarkable success in a number of closed-set recognition problems, such as object recognition and action recognition. Unlike these closed-set tasks, face recognition is an open-set problem in which the testing classes (persons) are usually different from those seen in training. This paper addresses the open-set property of face recognition by developing the center loss. Specifically, the center loss simultaneously learns a center for each class and penalizes the distances between the deep features of the face images and their corresponding class centers. Training with the center loss enables CNNs to extract deep features with two desirable properties: inter-class separability and intra-class compactness. In addition, we extend the center loss in two respects. First, we share parameters between the softmax loss and the center loss to reduce the extra parameters introduced by the centers. Second, we generalize the concept of a center from a single point to a region in the embedding space, which further allows us to account for intra-class variations. The advanced center loss significantly enhances the discriminative power of deep features. Experimental results show that our method achieves high accuracy on several important face recognition benchmarks, including Labeled Faces in the Wild (LFW), YouTube Faces (YTF), IARPA Janus Benchmark A (IJB-A), and MegaFace Challenge 1.
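The core idea described above (one learned center per class, with a penalty on the distance between each deep feature and its class center) can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the squared Euclidean distance, the averaging over the batch, the damped center update, and the names `center_loss`, `update_centers`, and `alpha` are all assumptions made here for illustration, and the sketch covers neither the parameter sharing nor the region-based centers mentioned in the abstract.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def center_loss(features: Sequence[Vector],
                labels: Sequence[int],
                centers: Dict[int, Vector]) -> float:
    """Mean over the batch of 0.5 * ||x_i - c_{y_i}||^2 (assumed form)."""
    total = 0.0
    for x, y in zip(features, labels):
        c = centers[y]
        total += 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return total / len(features)


def update_centers(features: Sequence[Vector],
                   labels: Sequence[int],
                   centers: Dict[int, Vector],
                   alpha: float = 0.5) -> Dict[int, Vector]:
    """Return new centers, each moved toward its class's batch features.

    alpha is a hypothetical center learning rate. The 1 + count
    denominator damps the step for classes with few samples in the batch.
    """
    updated = {cls: list(c) for cls, c in centers.items()}
    for cls in set(labels):
        members = [x for x, y in zip(features, labels) if y == cls]
        old = centers[cls]
        for d in range(len(old)):
            delta = sum(old[d] - x[d] for x in members) / (1 + len(members))
            updated[cls][d] = old[d] - alpha * delta
    return updated


if __name__ == "__main__":
    # Toy batch: two samples of class 0 and one sample of class 1.
    feats = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
    labs = [0, 0, 1]
    cents = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    print("loss before:", center_loss(feats, labs, cents))
    cents = update_centers(feats, labs, cents)
    print("loss after: ", center_loss(feats, labs, cents))
```

In a training setup of this kind, the term would typically be added to the softmax loss with a weighting factor, so that the softmax loss drives inter-class separability while the center term drives intra-class compactness.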

Keywords

Convolutional neural networks · Face recognition · Discriminative feature learning · Center loss

Notes

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China (U1613211, 61633021) and the Shenzhen Research Program (JCYJ20170818164704758, JCYJ20150925163005055, ZDSYS201605101739178).

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. Shenzhen Key Lab on Computer Vision and Pattern Recognition, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
  2. Tencent AI Lab, Shenzhen, China
  3. SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China