Abstract
Recently, intensive efforts have been made to design robust models for fine-grained visual recognition. Most notable are the impressive gains in training with noisy labels obtained by incorporating a reweighting strategy into a meta-learning framework. However, label reweighting approaches are limited to upweighting or downweighting the contribution of an instance during learning. To address this limitation, a novel noise-tolerant method that uses auxiliary web data is proposed. Specifically, the method first measures the associations made from the embeddings of well-labeled data to those of web data and back to the same class. Next, the resulting association probability is fused as a weight into an angular margin-based loss, which makes the trained model robust to noisy datasets. To reduce the influence of the gap between the well-labeled data and the noisy web data, a bridge schema is introduced through a corresponding loss that encourages the learned embeddings to be coherent. Lastly, the formulation is encapsulated in a meta-learning framework, which reduces overfitting and makes the learned network parameters noise-tolerant. Extensive experiments on benchmark datasets show that the proposed method outperforms existing state-of-the-art approaches.
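To make the association step more concrete, the following is a minimal NumPy sketch of a round-trip association probability in the style of learning by association. It is not the authors' implementation: the function names, the dot-product similarity, and the softmax normalization are all assumptions chosen for illustration, and the abstract does not specify how the resulting probability is fused into the angular margin-based loss, so that step is left out.

```python
import numpy as np


def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def round_trip_same_class_prob(clean_emb, web_emb, clean_labels):
    """Probability that a walk clean -> web -> clean ends in the start class.

    clean_emb:    (n_clean, d) embeddings of well-labeled samples
    web_emb:      (n_web, d) embeddings of web samples
    clean_labels: (n_clean,) integer class labels of the well-labeled samples
    All names are hypothetical; dot-product similarity is an assumption.
    """
    sim = clean_emb @ web_emb.T                      # (n_clean, n_web)
    p_clean_to_web = softmax(sim, axis=1)            # each row sums to 1
    p_web_to_clean = softmax(sim.T, axis=1)          # (n_web, n_clean)
    p_round_trip = p_clean_to_web @ p_web_to_clean   # (n_clean, n_clean)
    # Keep only the probability mass that lands on samples of the same class.
    same_class = clean_labels[:, None] == clean_labels[None, :]
    return (p_round_trip * same_class).sum(axis=1)


if __name__ == "__main__":
    # Toy data: two well-separated classes, one web sample near each class.
    clean = np.array([[10.0, 0.0], [10.0, 0.0], [0.0, 10.0], [0.0, 10.0]])
    web = np.array([[10.0, 0.0], [0.0, 10.0]])
    labels = np.array([0, 0, 1, 1])
    print(round_trip_same_class_prob(clean, web, labels))
```

In this sketch, a well-labeled sample whose round trip through the web embeddings tends to return to its own class receives a value near 1, and one whose walk drifts to other classes receives a lower value. Such a value could plausibly serve as a per-sample weight, although the exact weighting scheme used in the paper is not given here.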
Acknowledgements
This work was supported by the National Natural Science Fund for Distinguished Young Scholars of China (Grant No. 62125203), the National Natural Science Foundation of China (Grant Nos. 62302233, 62276143), the Key Program of the National Natural Science Foundation of China (Grant No. 61932013), the Natural Science Foundation of Jiangsu Province of China (Grant Nos. BK20180470, BK20200739), the Postdoctoral Science Foundation of Jiangsu Province of China (Grant No. 2021K172B), and the China Postdoctoral Science Foundation (Grant No. 2021M691655).
Cite this article
Li, Y., Xiao, F., Li, H. et al. Meta label associated loss for fine-grained visual recognition. Sci. China Inf. Sci. 67, 162102 (2024). https://doi.org/10.1007/s11432-023-3922-2
DOI: https://doi.org/10.1007/s11432-023-3922-2