
Sample hardness guided softmax loss for face recognition

Applied Intelligence

Abstract

Face recognition (FR) has received remarkable attention, and the development of deep convolutional neural networks (CNNs) has greatly improved feature discrimination. Although existing methods have achieved great success in designing margin-based loss functions with hard-sample mining strategies, they still suffer from two issues: 1) they neglect some training-status and feature-position information, and 2) they assign inaccurate weights to hard samples because of their coarse hardness descriptions. To solve these issues, we develop a novel loss function, namely the Hardness Loss, which adaptively assigns weights to misclassified (hard) samples according to their hardness, computed from multiple kinds of training-status and feature-position information. Specifically, we propose an estimator that provides the real-time training status so that the hardness used for weight assignment can be computed precisely. To the best of our knowledge, this is the first attempt to design a loss function using multiple pieces of information about the training status and feature positions. Extensive experiments on popular face benchmarks demonstrate that the proposed method is superior to state-of-the-art (SOTA) losses in various FR scenarios.



Acknowledgements

This work was supported by Key-Area Research and Development Program of Guangdong Province under Grant 2018B010109001, Grant 2019B020214001 and Grant 2020B1111010002; and Guangdong Marine Economic Development Project under Grant GDNRC[2020]018.

Author information

Correspondence to Lianfang Tian or Qiliang Du.


Appendix A: Gradient formula derivation

We first restate the formulation of the Hardness loss:

$$ \begin{array}{@{}rcl@{}} &\mathcal{L}=-\log \frac{e^{s T\left(\cos \theta_{y_{i}}\right)}}{e^{s T\left(\cos \theta_{y_{i}}\right)}+{\sum}_{j=1, j \neq y_{i}}^{n} e^{s N\left(t, \cos \theta_{j}\right)}} \\ &T\left(\cos \theta_{y_{i}}\right)=\cos \left(\theta_{y_{i}}+m\right) \\ &N\left(t, \cos \theta_{j}\right)=\left\{\begin{array}{ll} \cos \theta_{j} & \text{easy} \\ \cos \theta_{j}\left(\text{dif}_{neg}+t_{p}^{(k)} \cdot \text{dif}_{b}+t_{p}^{(k)}\right) & \text{hard} \end{array}\right. \end{array} $$

Before calculating the gradients w.r.t. \(x_{i}\) and \(W_{j}\), the logits can be summarized in the following three cases:

$$ \begin{array}{@{}rcl@{}} &\mathcal{L}=-\log \frac{e^{f_{y_{i}}}}{e^{f_{y_{i}}}+{\sum}_{j=1, j \neq y_{i}}^{n} e^{f_{j}}} \\ &f_{j}=\left\{\begin{array}{ll} s \cdot \cos \left(\theta_{y_{i}}+m\right) & j=y_{i} \\ s \cdot \cos \theta_{j} & j \neq y_{i}, \text{ easy} \\ s \cdot \cos \theta_{j}\left(\text{dif}_{neg}+t_{p}^{(k)} \cdot \text{dif}_{b}+t_{p}^{(k)}\right) & j \neq y_{i}, \text{ hard} \end{array}\right. \end{array} $$
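The three-case logits can be sketched in code. The following is a minimal NumPy illustration, not the authors' implementation: the values of s, m, and t_p are illustrative, t_p stands in for the training-status parameter \(t_{p}^{(k)}\), and the definitions of \(\text{dif}_{neg}\) and \(\text{dif}_{b}\), as well as the easy/hard criterion (a negative is treated as hard when \(\cos\theta_{j}\) exceeds \(\cos(\theta_{y_{i}}+m)\)), are inferred from the hard-sample derivative later in this appendix.

```python
import numpy as np

def hardness_loss(cos_theta, y, s=64.0, m=0.5, t_p=0.3):
    """Minimal sketch of the Hardness loss for a single sample.

    cos_theta : (n,) cosines between the feature x_i and all n class centers.
    y         : ground-truth class index.
    s, m, t_p : illustrative scale, margin, and training-status values;
                t_p stands in for the paper's t_p^(k).
    """
    theta_y = np.arccos(np.clip(cos_theta[y], -1.0, 1.0))
    target = np.cos(theta_y + m)                  # T(cos θ_{y_i}) = cos(θ_{y_i} + m)

    f = np.empty_like(cos_theta)
    f[y] = s * target
    for j in range(len(cos_theta)):
        if j == y:
            continue
        if cos_theta[j] <= target:                # easy negative: plain logit
            f[j] = s * cos_theta[j]
        else:                                     # hard (misclassified) negative
            dif_neg = cos_theta[j]                # inferred from the appendix expansion
            dif_b = cos_theta[j] - target
            f[j] = s * cos_theta[j] * (dif_neg + t_p * dif_b + t_p)

    f = f - f.max()                               # numerically stable softmax cross-entropy
    return -np.log(np.exp(f[y]) / np.exp(f).sum())

# Example: class 2 is misclassified (hard) for this sample, so it receives
# an up-weighted logit and the loss is driven up accordingly.
loss = hardness_loss(np.array([0.62, 0.05, 0.71]), y=0)
```

The loop form mirrors the per-class case analysis above; a batched implementation would vectorize the easy/hard mask instead.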

By the chain rule, we have:

$$ \begin{array}{@{}rcl@{}} \frac{\partial \mathcal{L}}{\partial x_{i}}=\frac{\partial \mathcal{L}}{\partial f_{j}} \cdot \frac{\partial f_{j}}{\partial \cos \theta_{j}} \cdot \frac{\partial \cos \theta_{j}}{\partial x_{i}} \\ \frac{\partial \mathcal{L}}{\partial W_{j}}=\frac{\partial \mathcal{L}}{\partial f_{j}} \cdot \frac{\partial f_{j}}{\partial \cos \theta_{j}} \cdot \frac{\partial \cos \theta_{j}}{\partial W_{j}} \end{array} $$

The term \(\frac{\partial \mathcal{L}}{\partial f_{j}}\) follows directly from the softmax function:

$$ \begin{array}{@{}rcl@{}} \frac{\partial \mathcal{L}}{\partial f_{j}}=\left\{\begin{array}{ll} a-1 & j=y_{i} \\ p_{j} & j \neq y_{i} \end{array}\right., \quad a=\frac{e^{f_{y_{i}}}}{e^{f_{y_{i}}}+{\sum}_{j=1, j \neq y_{i}}^{n} e^{f_{j}}}, \quad p_{j}=\frac{e^{f_{j}}}{e^{f_{y_{i}}}+{\sum}_{k=1, k \neq y_{i}}^{n} e^{f_{k}}} \end{array} $$
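This softmax gradient is straightforward to verify numerically. The following self-contained finite-difference check (with illustrative logit values) confirms that \(\frac{\partial \mathcal{L}}{\partial f_{j}}\) equals the predicted probability of class \(j\) minus the one-hot indicator of the target class:

```python
import numpy as np

def softmax_xent(f, y):
    """Cross-entropy loss on logits f for target class y; also returns probabilities."""
    f = f - f.max()                               # numerical stability
    p = np.exp(f) / np.exp(f).sum()
    return -np.log(p[y]), p

f = np.array([2.0, -1.0, 0.5])                    # illustrative logits
y = 0
_, p = softmax_xent(f, y)
analytic = p.copy()
analytic[y] -= 1.0                                # dL/df_j = p_j - [j == y_i]

# central finite differences over each logit
eps = 1e-6
numeric = np.zeros_like(f)
for j in range(len(f)):
    fp, fm = f.copy(), f.copy()
    fp[j] += eps
    fm[j] -= eps
    numeric[j] = (softmax_xent(fp, y)[0] - softmax_xent(fm, y)[0]) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```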

For \(\frac{\partial \cos \theta_{j}}{\partial x_{i}}\) and \(\frac{\partial \cos \theta_{j}}{\partial W_{j}}\), since both \(x_{i}\) and \(W_{j}\) are \(\ell_{2}\)-normalized, \(\cos \theta_{j}=\frac{W_{j}^{T} x_{i}}{\left\|W_{j}\right\|\left\|x_{i}\right\|}=W_{j}^{T} x_{i}\), and thus we have:

$$ \begin{array}{@{}rcl@{}} \frac{\partial \cos \theta_{j}}{\partial x_{i}}=W_{j} \\ \frac{\partial \cos \theta_{j}}{\partial W_{j}}=x_{i} \end{array} $$

For \(\frac{\partial f_{j}}{\partial \cos \theta_{j}}\), we distinguish the following three cases:

When \(j=y_{i}\):

$$ \begin{array}{@{}rcl@{}} \frac{\partial f_{y_{i}}}{\partial \cos \theta_{y_{i}}} &=&\frac{\partial s \cdot \cos \left( \theta_{y_{i}}+m\right)}{\partial \cos \theta_{y_{i}}} \\ &=&s \frac{\partial\left( \cos \theta_{y_{i}} \cos m-\sin \theta_{y_{i}} \sin m\right)}{\partial \cos \theta_{y_{i}}} \\ &=&s\left( \cos m+\sin m \cdot \frac{\cos \theta_{y_{i}}}{\sin \theta_{y_{i}}}\right) \\ &=&s \frac{\cos m \sin \theta_{y_{i}}+\sin m \cos \theta_{y_{i}}}{\sin \theta_{y_{i}}} \\ &=&s \frac{\sin \left( \theta_{y_{i}}+m\right)}{\sin \theta_{y_{i}}} \end{array} $$
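The closed form \(s \frac{\sin(\theta_{y_{i}}+m)}{\sin \theta_{y_{i}}}\) can be confirmed by finite differences, treating \(T(\cos\theta)=\cos(\theta+m)\) as a function of \(\cos\theta\) (the margin and cosine values below are illustrative, and the scale \(s\) is omitted since it is a constant factor):

```python
import numpy as np

m = 0.5                                           # illustrative margin
c = 0.6                                           # illustrative cos θ_{y_i}
theta = np.arccos(c)

def T(c):                                         # T(cos θ) = cos(θ + m), as a function of cos θ
    return np.cos(np.arccos(c) + m)

analytic = np.sin(theta + m) / np.sin(theta)      # the closed form derived above (without s)
eps = 1e-6
numeric = (T(c + eps) - T(c - eps)) / (2 * eps)   # central finite difference

assert abs(analytic - numeric) < 1e-5
```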

When \(j \neq y_{i}\), for an easy sample:

$$ \begin{array}{@{}rcl@{}} \frac{\partial f_{j}}{\partial \cos \theta_{j}}=\frac{\partial s \cdot \cos \theta_{j}}{\partial \cos \theta_{j}}=s \end{array} $$

When \(j \neq y_{i}\), for a hard sample:

$$ \begin{array}{@{}rcl@{}} \frac{\partial f_{j}}{\partial \cos \theta_{j}} &=\frac{\partial s \cdot \cos \theta_{j}\left( \text{dif}_{n e g}+t_{p} \cdot \text{dif}_{b}+t_{p}\right)}{\partial \cos \theta_{j}} \\ &=\frac{\partial s \cdot \cos \theta_{j}\left( \cos \theta_{j}+t_{p} \cdot\left( 1+\cos \theta_{j}-\cos \left( \theta_{y_{i}}+m\right)\right)\right)}{\partial \cos \theta_{j}} \\ &=s {\Delta} \end{array} $$

where \({\Delta}=2 \cos \theta_{j}+t_{p}\left[1+2 \cos \theta_{j}-\cos \left(\theta_{y_{i}}+m\right)\right]\) and \(t_{p}=t_{p}^{(k)}\).
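The expression \(s\Delta\) can likewise be checked by finite differences on the hard-sample logit, with illustrative values for \(s\), \(m\), \(t_{p}\), and the cosines (\(\cos\theta_{y_{i}}\) is held fixed, since the derivative is taken w.r.t. \(\cos\theta_{j}\) only):

```python
import numpy as np

s, m, t_p = 64.0, 0.5, 0.3                        # illustrative scale, margin, training status
cos_ty = 0.8                                      # illustrative cos θ_{y_i}, held fixed
target = np.cos(np.arccos(cos_ty) + m)            # cos(θ_{y_i} + m)

def f_hard(c):
    """Hard-negative logit as a function of c = cos θ_j, from the expansion above."""
    return s * c * (c + t_p * (1 + c - target))

c = 0.4                                           # illustrative cos θ_j
analytic = s * (2 * c + t_p * (1 + 2 * c - target))   # s·Δ

eps = 1e-6
numeric = (f_hard(c + eps) - f_hard(c - eps)) / (2 * eps)  # central finite difference

assert abs(analytic - numeric) < 1e-4
```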

About this article

Cite this article

Sun, Z., Tian, L., Du, Q. et al. Sample hardness guided softmax loss for face recognition. Appl Intell 53, 2640–2655 (2023). https://doi.org/10.1007/s10489-022-03504-5
