
Meta label associated loss for fine-grained visual recognition

  • Research Paper
  • Published in Science China Information Sciences

Abstract

Recently, intensive attempts have been made to design robust models for fine-grained visual recognition; the most notable are the impressive gains obtained when training with noisy labels by incorporating a reweighting strategy into a meta-learning framework. However, label reweighting approaches are limited to up- or down-weighting the contribution of each instance during learning. To address this issue, a novel noise-tolerant method with auxiliary web data is proposed. Specifically, the associations formed by walking from embeddings of well-labeled data to embeddings of web data and back to the same class are first measured. The resulting association probability is then employed as a weighting fusion strategy in an angular margin-based loss, which makes the trained model robust to noisy datasets. To reduce the influence of the gap between the well-labeled data and the noisy web data, a bridge scheme with a corresponding loss is proposed that encourages the learned embeddings to be coherent. Finally, the formulation is encapsulated in a meta-learning framework, which reduces overfitting and learns network parameters that are noise-tolerant. Extensive experiments on benchmark datasets clearly show the superiority of the proposed method over existing state-of-the-art approaches.
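
As a rough illustration of the two ideas above, the sketch below (not the authors' implementation) computes round-trip association probabilities between well-labeled and web embeddings and uses them to weight a CosFace-style angular-margin loss. All function names, shapes, and hyperparameters (the softmax temperature, scale s, and margin m) are assumptions made for this sketch, and the specific consistency measure is only one plausible reading of the abstract.

```python
# Minimal sketch, assuming PyTorch tensors for embeddings and integer class labels.
import torch
import torch.nn.functional as F


def association_weights(clean_emb, clean_labels, web_emb, temperature=1.0):
    """Per-web-sample weights from clean -> web -> clean round-trip walks.

    clean_emb: (Nc, D) embeddings of well-labeled data
    clean_labels: (Nc,) integer labels of the well-labeled data
    web_emb: (Nw, D) embeddings of noisy web data
    Returns: (Nw,) weights in [0, 1], the probability that a round trip
    passing through a given web sample returns to the class it started from.
    """
    clean_emb = F.normalize(clean_emb, dim=1)
    web_emb = F.normalize(web_emb, dim=1)

    sim = clean_emb @ web_emb.t() / temperature   # (Nc, Nw) cosine similarities
    p_cw = F.softmax(sim, dim=1)                  # walk clean -> web
    p_wc = F.softmax(sim.t(), dim=1)              # walk web -> clean

    num_classes = int(clean_labels.max()) + 1
    onehot = F.one_hot(clean_labels, num_classes).float()   # (Nc, C)
    return_class = p_wc @ onehot                  # (Nw, C): P(return to class c | web j)
    consistency = return_class[:, clean_labels]   # (Nw, Nc): P(return to class of clean i | web j)

    # Expected class-consistency of round trips through each web sample,
    # averaged over starting clean samples weighted by their visit probability.
    weights = (p_cw * consistency.t()).sum(dim=0) / p_cw.sum(dim=0).clamp_min(1e-12)
    return weights


def weighted_angular_margin_loss(cos_logits, labels, weights, s=30.0, m=0.35):
    """CosFace-style additive cosine-margin loss with per-sample weights.

    cos_logits: (N, C) cosine similarities between features and class weights.
    """
    margin = F.one_hot(labels, cos_logits.size(1)).float() * m
    per_sample = F.cross_entropy(s * (cos_logits - margin), labels, reduction="none")
    return (weights * per_sample).mean()
```

In the full method, such weights would additionally interact with the bridge loss and be refined inside the meta-learning loop; the snippet only illustrates an association-weighted margin loss on the web data.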



Acknowledgements

This work was supported by the National Natural Science Fund for Distinguished Young Scholars of China (Grant No. 62125203), the National Natural Science Foundation of China (Grant Nos. 62302233, 62276143), the Key Program of the National Natural Science Foundation of China (Grant No. 61932013), the Natural Science Foundation of Jiangsu Province of China (Grant Nos. BK20180470, BK20200739), the Postdoctoral Science Foundation of Jiangsu Province of China (Grant No. 2021K172B), and the China Postdoctoral Science Foundation (Grant No. 2021M691655).

Author information

Corresponding author

Correspondence to Fu Xiao.


About this article


Cite this article

Li, Y., Xiao, F., Li, H. et al. Meta label associated loss for fine-grained visual recognition. Sci. China Inf. Sci. 67, 162102 (2024). https://doi.org/10.1007/s11432-023-3922-2

