Abstract
Robust loss functions are crucial for training models that generalize well in the presence of noisy labels. The commonly used Cross Entropy (CE) loss tends to overfit noisy labels, while symmetric losses that are robust to label noise are limited by their symmetry conditions. We analyze the gradient of CE and identify the main difficulty posed by label noise: the imbalance of gradient norm among samples. Inspired by long-tail learning, we propose a gradient prior (GP)-based logit adjustment method to mitigate the impact of label noise. This method uses per-sample gradient information to adjust the logits, enabling DNNs to effectively ignore noisy samples and instead focus on learning hard samples. Experiments on benchmark datasets demonstrate that our method significantly improves the performance of CE and outperforms existing methods, especially under symmetric noise. Experiments on the object detection dataset Pascal VOC further verify that our method is plug-and-play and robust.
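The core idea sketched in the abstract — damping the outsized CE gradients of likely-noisy samples by adjusting their logits — can be illustrated with a minimal NumPy example. This is a hypothetical sketch, not the paper's actual formula: the function name `gp_adjusted_ce`, the temperature `tau`, and the use of `1 - p_y` as a per-sample gradient-norm proxy are all illustrative assumptions.

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the class axis
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def gp_adjusted_ce(logits, labels, tau=1.0):
    """Per-sample CE loss with a hypothetical gradient-based logit adjustment.

    For softmax CE, the gradient w.r.t. the logits is (p - onehot(y)),
    whose norm grows monotonically with (1 - p_y). A confidently
    mislabelled sample therefore contributes an extreme gradient. Here we
    boost the labelled-class logit in proportion to that proxy, which
    caps the sample's effective gradient norm before the loss is taken.
    """
    idx = np.arange(logits.shape[0])
    p = softmax(logits)
    g = 1.0 - p[idx, labels]          # gradient-norm proxy (treated as a constant)
    z_adj = logits.copy()
    z_adj[idx, labels] += tau * g     # large-gradient (likely noisy) samples
                                      # get the strongest damping
    p_adj = softmax(z_adj)
    return -np.log(p_adj[idx, labels] + 1e-12)
```

With `tau = 0` this reduces to plain CE; with `tau > 0`, a sample whose label disagrees sharply with the prediction receives a noticeably smaller loss and gradient, while confident, correctly-labelled samples are left essentially unchanged.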
Data availability
All datasets used in this study are publicly available benchmark datasets.
Acknowledgements
This work was supported by The National Natural Science Foundation of China (No. 61402537), Sichuan Science and Technology Program (Nos. 2019ZDZX0006, 2020YFQ0056), the West Light Foundation of Chinese Academy of Sciences (201899) and the Talents by Sichuan provincial Party Committee Organization Department, and Science and Technology Service Network Initiative (KFJ-STS-QYZD-2021-21-001).
Author information
Contributions
Boyi Fu performed conceptualization, methodology, software, validation, formal analysis, data curation, and visualization, and wrote the manuscript. Yungcong Peng performed formal analysis and investigation and reviewed the manuscript. Xiaolin Qin performed supervision, project administration, and funding acquisition and reviewed the manuscript.
Ethics declarations
Ethical and informed consent for data used
The data used are completely public and freely available, and their use does not require ethical approval or informed consent.
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Fu, B., Peng, Y. & Qin, X. Learning with noisy labels via logit adjustment based on gradient prior method. Appl Intell 53, 24393–24406 (2023). https://doi.org/10.1007/s10489-023-04609-1