
Sample hardness guided softmax loss for face recognition

Applied Intelligence

Abstract

Face recognition (FR) has received remarkable attention, and the development of deep convolutional neural networks (CNNs) has greatly improved feature discrimination. Although existing methods have achieved great success in designing margin-based loss functions with hard-sample mining strategies, they still suffer from two issues: 1) they neglect some training-status and feature-position information, and 2) they assign inaccurate weights to hard samples because of their coarse hardness descriptions. To solve these issues, we develop a novel loss function, namely the Hardness Loss, which adaptively assigns weights to misclassified (hard) samples according to their hardness, computed from multiple kinds of training-status and feature-position information. Specifically, we propose an estimator that provides the real-time training status so that the hardness used for weight assignment can be computed precisely. To the best of our knowledge, this is the first attempt to design a loss function using multiple pieces of information about the training status and feature positions. Extensive experiments on popular face benchmarks demonstrate that the proposed method is superior to state-of-the-art (SOTA) losses in various FR scenarios.



Acknowledgements

This work was supported by Key-Area Research and Development Program of Guangdong Province under Grant 2018B010109001, Grant 2019B020214001 and Grant 2020B1111010002; and Guangdong Marine Economic Development Project under Grant GDNRC[2020]018.

Author information

Correspondence to Lianfang Tian or Qiliang Du.


Appendix A: Gradient formula derivation

We first restate the formulation of the Hardness loss:

$$ \begin{array}{@{}rcl@{}} &\mathcal{L}=-\log \frac{e^{s T\left(\cos \theta_{y_{i}}\right)}}{e^{s T\left(\cos \theta_{y_{i}}\right)}+{\sum}_{j=1, j \neq y_{i}}^{n} e^{s N\left(t, \cos \theta_{j}\right)}} \\ &T\left(\cos \theta_{y_{i}}\right)=\cos \left(\theta_{y_{i}}+m\right) \\ &N\left(t, \cos \theta_{j}\right)=\left\{\begin{array}{ll} \cos \theta_{j} & \text{easy} \\ \cos \theta_{j}\left(\text{dif}_{neg}+t_{p}^{(k)} \cdot \text{dif}_{b}+t_{p}^{(k)}\right) & \text{hard} \end{array}\right. \end{array} $$

Before calculating the gradients w.r.t. \(x_{i}\) and \(W_{j}\), the logits can be summarized in the following three cases:

$$ \begin{array}{@{}rcl@{}} &\mathcal{L}=-\log \frac{e^{f_{y_{i}}}}{e^{f_{y_{i}}}+{\sum}_{j=1, j \neq y_{i}}^{n} e^{f_{j}}} \\ &f_{j}=\left\{\begin{array}{ll} s \cdot \cos \left(\theta_{y_{i}}+m\right) & j=y_{i} \\ s \cdot \cos \theta_{j} & j \neq y_{i}, \text{ easy} \\ s \cdot \cos \theta_{j}\left(\text{dif}_{neg}+t_{p}^{(k)} \cdot \text{dif}_{b}+t_{p}^{(k)}\right) & j \neq y_{i}, \text{ hard} \end{array}\right. \end{array} $$
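The three-case logits can be sketched in code. The following is a minimal NumPy illustration, not the authors' implementation: the values of s, m, and t_p are illustrative, t_p stands in for the training-status parameter \(t_{p}^{(k)}\), and the definitions of \(\text{dif}_{neg}\) and \(\text{dif}_{b}\), as well as the easy/hard criterion (a negative is treated as hard when \(\cos\theta_{j}\) exceeds \(\cos(\theta_{y_{i}}+m)\)), are inferred from the hard-sample derivative later in this appendix.

```python
import numpy as np

def hardness_loss(cos_theta, y, s=64.0, m=0.5, t_p=0.3):
    """Minimal sketch of the Hardness loss for a single sample.

    cos_theta : (n,) cosines between the feature x_i and all n class centers.
    y         : ground-truth class index.
    s, m, t_p : illustrative scale, margin, and training-status values;
                t_p stands in for the paper's t_p^(k).
    """
    theta_y = np.arccos(np.clip(cos_theta[y], -1.0, 1.0))
    target = np.cos(theta_y + m)                  # T(cos θ_{y_i}) = cos(θ_{y_i} + m)

    f = np.empty_like(cos_theta)
    f[y] = s * target
    for j in range(len(cos_theta)):
        if j == y:
            continue
        if cos_theta[j] <= target:                # easy negative: plain logit
            f[j] = s * cos_theta[j]
        else:                                     # hard (misclassified) negative
            dif_neg = cos_theta[j]                # inferred from the appendix expansion
            dif_b = cos_theta[j] - target
            f[j] = s * cos_theta[j] * (dif_neg + t_p * dif_b + t_p)

    f = f - f.max()                               # numerically stable softmax cross-entropy
    return -np.log(np.exp(f[y]) / np.exp(f).sum())

# Example: class 2 is misclassified (hard) for this sample, so it receives
# an up-weighted logit and the loss is driven up accordingly.
loss = hardness_loss(np.array([0.62, 0.05, 0.71]), y=0)
```

The loop form mirrors the per-class case analysis above; a batched implementation would vectorize the easy/hard mask instead.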

By the chain rule, we have:

$$ \begin{array}{@{}rcl@{}} \frac{\partial \mathcal{L}}{\partial x_{i}}=\frac{\partial \mathcal{L}}{\partial f_{j}} \cdot \frac{\partial f_{j}}{\partial \cos \theta_{j}} \cdot \frac{\partial \cos \theta_{j}}{\partial x_{i}} \\ \frac{\partial \mathcal{L}}{\partial W_{j}}=\frac{\partial \mathcal{L}}{\partial f_{j}} \cdot \frac{\partial f_{j}}{\partial \cos \theta_{j}} \cdot \frac{\partial \cos \theta_{j}}{\partial W_{j}} \end{array} $$

The term \(\frac{\partial \mathcal{L}}{\partial f_{j}}\) follows directly from the softmax function:

$$ \begin{array}{@{}rcl@{}} \frac{\partial \mathcal{L}}{\partial f_{j}}=\left\{\begin{array}{ll} a-1 & j=y_{i} \\ p_{j} & j \neq y_{i} \end{array}\right., \quad a=\frac{e^{f_{y_{i}}}}{e^{f_{y_{i}}}+{\sum}_{j=1, j \neq y_{i}}^{n} e^{f_{j}}}, \quad p_{j}=\frac{e^{f_{j}}}{e^{f_{y_{i}}}+{\sum}_{k=1, k \neq y_{i}}^{n} e^{f_{k}}} \end{array} $$
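This softmax gradient is straightforward to verify numerically. The following self-contained finite-difference check (with illustrative logit values) confirms that \(\frac{\partial \mathcal{L}}{\partial f_{j}}\) equals the predicted probability of class \(j\) minus the one-hot indicator of the target class:

```python
import numpy as np

def softmax_xent(f, y):
    """Cross-entropy loss on logits f for target class y; also returns probabilities."""
    f = f - f.max()                               # numerical stability
    p = np.exp(f) / np.exp(f).sum()
    return -np.log(p[y]), p

f = np.array([2.0, -1.0, 0.5])                    # illustrative logits
y = 0
_, p = softmax_xent(f, y)
analytic = p.copy()
analytic[y] -= 1.0                                # dL/df_j = p_j - [j == y_i]

# central finite differences over each logit
eps = 1e-6
numeric = np.zeros_like(f)
for j in range(len(f)):
    fp, fm = f.copy(), f.copy()
    fp[j] += eps
    fm[j] -= eps
    numeric[j] = (softmax_xent(fp, y)[0] - softmax_xent(fm, y)[0]) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-5)
```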

For \(\frac{\partial \cos \theta_{j}}{\partial x_{i}}\) and \(\frac{\partial \cos \theta_{j}}{\partial W_{j}}\), since both \(x_{i}\) and \(W_{j}\) are \(\ell_{2}\)-normalized, \(\cos \theta_{j}=\frac{W_{j}^{T} x_{i}}{\left\|W_{j}\right\|\left\|x_{i}\right\|}=W_{j}^{T} x_{i}\), and thus we have:

$$ \begin{array}{@{}rcl@{}} \frac{\partial \cos \theta_{j}}{\partial x_{i}}=W_{j} \\ \frac{\partial \cos \theta_{j}}{\partial W_{j}}=x_{i} \end{array} $$

For \(\frac{\partial f_{j}}{\partial \cos \theta_{j}}\), we distinguish the following three cases:

When \(j=y_{i}\):

$$ \begin{array}{@{}rcl@{}} \frac{\partial f_{y_{i}}}{\partial \cos \theta_{y_{i}}} &=&\frac{\partial s \cdot \cos \left( \theta_{y_{i}}+m\right)}{\partial \cos \theta_{y_{i}}} \\ &=&s \frac{\partial\left( \cos \theta_{y_{i}} \cos m-\sin \theta_{y_{i}} \sin m\right)}{\partial \cos \theta_{y_{i}}} \\ &=&s\left( \cos m+\sin m \cdot \frac{\cos \theta_{y_{i}}}{\sin \theta_{y_{i}}}\right) \\ &=&s \frac{\cos m \sin \theta_{y_{i}}+\sin m \cos \theta_{y_{i}}}{\sin \theta_{y_{i}}} \\ &=&s \frac{\sin \left( \theta_{y_{i}}+m\right)}{\sin \theta_{y_{i}}} \end{array} $$
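The closed form \(s \frac{\sin(\theta_{y_{i}}+m)}{\sin \theta_{y_{i}}}\) can be confirmed by finite differences, treating \(T(\cos\theta)=\cos(\theta+m)\) as a function of \(\cos\theta\) (the margin and cosine values below are illustrative, and the scale \(s\) is omitted since it is a constant factor):

```python
import numpy as np

m = 0.5                                           # illustrative margin
c = 0.6                                           # illustrative cos θ_{y_i}
theta = np.arccos(c)

def T(c):                                         # T(cos θ) = cos(θ + m), as a function of cos θ
    return np.cos(np.arccos(c) + m)

analytic = np.sin(theta + m) / np.sin(theta)      # the closed form derived above (without s)
eps = 1e-6
numeric = (T(c + eps) - T(c - eps)) / (2 * eps)   # central finite difference

assert abs(analytic - numeric) < 1e-5
```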

When \(j \neq y_{i}\), for an easy sample:

$$ \begin{array}{@{}rcl@{}} \frac{\partial f_{j}}{\partial \cos \theta_{j}}=\frac{\partial s \cdot \cos \theta_{j}}{\partial \cos \theta_{j}}=s \end{array} $$

When \(j \neq y_{i}\), for a hard sample:

$$ \begin{array}{@{}rcl@{}} \frac{\partial f_{j}}{\partial \cos \theta_{j}} &=\frac{\partial s \cdot \cos \theta_{j}\left( \text{dif}_{n e g}+t_{p} \cdot \text{dif}_{b}+t_{p}\right)}{\partial \cos \theta_{j}} \\ &=\frac{\partial s \cdot \cos \theta_{j}\left( \cos \theta_{j}+t_{p} \cdot\left( 1+\cos \theta_{j}-\cos \left( \theta_{y_{i}}+m\right)\right)\right)}{\partial \cos \theta_{j}} \\ &=s {\Delta} \end{array} $$

where \({\Delta}=2 \cos \theta_{j}+t_{p}\left[1+2 \cos \theta_{j}-\cos \left(\theta_{y_{i}}+m\right)\right]\) and \(t_{p}=t_{p}^{(k)}\).
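The expression \(s\Delta\) can likewise be checked by finite differences on the hard-sample logit, with illustrative values for \(s\), \(m\), \(t_{p}\), and the cosines (\(\cos\theta_{y_{i}}\) is held fixed, since the derivative is taken w.r.t. \(\cos\theta_{j}\) only):

```python
import numpy as np

s, m, t_p = 64.0, 0.5, 0.3                        # illustrative scale, margin, training status
cos_ty = 0.8                                      # illustrative cos θ_{y_i}, held fixed
target = np.cos(np.arccos(cos_ty) + m)            # cos(θ_{y_i} + m)

def f_hard(c):
    """Hard-negative logit as a function of c = cos θ_j, from the expansion above."""
    return s * c * (c + t_p * (1 + c - target))

c = 0.4                                           # illustrative cos θ_j
analytic = s * (2 * c + t_p * (1 + 2 * c - target))   # s·Δ

eps = 1e-6
numeric = (f_hard(c + eps) - f_hard(c - eps)) / (2 * eps)  # central finite difference

assert abs(analytic - numeric) < 1e-4
```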

About this article

Cite this article

Sun, Z., Tian, L., Du, Q. et al. Sample hardness guided softmax loss for face recognition. Appl Intell 53, 2640–2655 (2023). https://doi.org/10.1007/s10489-022-03504-5
