Data reweighting net for web fine-grained image classification

  • Published in: Multimedia Tools and Applications

Abstract

Fine-grained visual classification (FGVC) requires expert annotation, which is expensive and demands a large number of training samples. Consequently, collecting sample data from the web has emerged as a practical way to augment training sets. However, web data often contain noisy samples that mislead deep learning models. This paper presents a meta-learning-based method called Data Reweighting Net (DR-Net), which uses a small, clean meta set as a guide for learning accurately from web image datasets that contain noise. More specifically, DR-Net learns from the small, clean meta set to discard noisy samples, which exhibit low similarity to the meta set, and to retain clean web samples. Through sample weighting, DR-Net enables the classification network to learn adaptively from the training set, mitigating the impact of noisy labels on classification learning. Experiments on the Web-bird, Web-aircraft, Web-car, CIFAR-10, and CIFAR-100 datasets demonstrate the feasibility of the proposed method.
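The abstract's core mechanism, using a small clean meta set to learn per-sample weights for a noisy web training set, follows the general meta-learning-to-reweight recipe of [31, 80]. The sketch below illustrates one training step of that generic recipe; it is a minimal illustration under stated assumptions, not the authors' exact DR-Net. The function name `reweighted_step`, the use of the `higher` library for the differentiable inner update, and the hyperparameters are all illustrative choices.

```python
# Minimal sketch of clean-meta-set-guided sample reweighting, in the spirit of
# learning-to-reweight [80] / Meta-Weight-Net [31]. Illustrative only: the
# function name, the use of the `higher` library, and all hyperparameters are
# assumptions, not the authors' DR-Net.
import torch
import torch.nn.functional as F
import higher  # differentiable inner-loop optimization


def reweighted_step(model, opt, x, y, x_meta, y_meta):
    """One step: weight each (possibly noisy) web sample by how much
    up-weighting it would reduce the loss on the small, clean meta set."""
    # Per-sample weights, initialised to zero and treated as differentiable.
    eps = torch.zeros(x.size(0), requires_grad=True, device=x.device)

    with higher.innerloop_ctx(model, opt, copy_initial_weights=True) as (fmodel, fopt):
        # Virtual (differentiable) update of the model under the weights `eps`.
        per_sample = F.cross_entropy(fmodel(x), y, reduction="none")
        fopt.step((eps * per_sample).sum())

        # Evaluate the virtually updated model on the clean meta set and
        # back-propagate the meta loss to the per-sample weights.
        meta_loss = F.cross_entropy(fmodel(x_meta), y_meta)
        grad_eps, = torch.autograd.grad(meta_loss, eps)

    # Samples whose up-weighting would increase the meta loss get weight 0;
    # the remaining weights are normalised to sum to 1.
    w = torch.clamp(-grad_eps, min=0.0)
    w = w / (w.sum() + 1e-8)

    # Real update of the model with the fixed (detached) weights.
    opt.zero_grad()
    loss = (w.detach() * F.cross_entropy(model(x), y, reduction="none")).sum()
    loss.backward()
    opt.step()
    return loss.item(), w.detach()
```

In this formulation, web samples whose gradients conflict with the clean meta set receive zero weight, which is one concrete way to realise the abstract's idea of discarding noisy samples that show low similarity to the meta set.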


Data Availability

The WebFG-496, CUB-200-2011, FGVC-Aircraft, and Stanford Cars datasets used in this work are listed in Table 1 and are taken from [20, 27, 72, 73].

References

  1. Balaha MM, El-Kady S, Balaha HM, Salama M, Emad E, Hassan M, Saafan MM (2023) A vision-based deep learning approach for independent-users arabic sign language interpretation. Multim Tools Appl 82(5):6807–6826. https://doi.org/10.1007/S11042-022-13423-9

  2. Ahmed U, Lin JC, Srivastava G (2022) Mitigating adversarial evasion attacks by deep active learning for medical image classification. Multim Tools Appl 81(29):41899–41910. https://doi.org/10.1007/S11042-021-11473-Z

  3. Sharma A, Mishra PK (2022) Image enhancement techniques on deep learning approaches for automated diagnosis of COVID-19 features using CXR images. Multim Tools Appl 81(29):42649–42690. https://doi.org/10.1007/S11042-022-13486-8

  4. Raghavan R, Verma DC, Pandey D, Anand R, Pandey BK, Singh H (2022) Optimized building extraction from high-resolution satellite imagery using deep learning. Multim Tools Appl 81(29):42309–42323. https://doi.org/10.1007/S11042-022-13493-9

  5. Yadavendra, Chand S (2022) Semantic segmentation and detection of satellite objects using u-net model of deep learning. Multim Tools Appl 81(30):44291–44310. https://doi.org/10.1007/S11042-022-12892-2

  6. Yao Y, Shen F, Zhang J, Liu L, Tang Z, Shao L (2019) Extracting privileged information for enhancing classifier learning. IEEE Trans Image Process 28(1):436–450. https://doi.org/10.1109/TIP.2018.2869721

  7. Yao Y, Shen F, Zhang J, Liu L, Tang Z, Shao L (2019) Extracting multiple visual senses for web learning. IEEE Trans. Multim. 21(1):184–196. https://doi.org/10.1109/TMM.2018.2847248

  8. Xie G-S, Liu L, Jin X, Zhu F, Zhang Z, Qin J, Yao Y, Shao L (2019) Attentive region embedding network for zero-shot learning. In: 2019 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 9376–9385. https://doi.org/10.1109/CVPR.2019.00961

  9. Luo H, Lin G, Liu Z, Liu F, Tang Z, Yao Y (2019) Segeqa: video segmentation based visual attention for embodied question answering. In: 2019 IEEE/CVF International conference on computer vision (ICCV), pp 9666–9675. https://doi.org/10.1109/ICCV.2019.00976

  10. Xie G-S, Liu L, Zhu F, Zhao F, Zhang Z, Yao Y, Qin J, Shao L (2020) Region graph embedding network for zero-shot learning. In: Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part IV 16, pp 562–580. Springer

  11. Yao Y, Hua X, Gao G, Sun Z, Li Z, Zhang J (2020) Bridging the web data and fine-grained visual recognition via alleviating label noise and domain mismatch. In: Proceedings of the 28th ACM international conference on multimedia. MM ’20, pp 1735–1744. Association for Computing Machinery, New York, USA. https://doi.org/10.1145/3394171.3413851

  12. Sun Z, Shen F, Huang D, Wang Q, Shu X, Yao Y, Tang J (2022) Pnp: robust learning from noisy labels by probabilistic noise prediction. In: 2022 IEEE/CVF Conference on computer vision and pattern recognition (CVPR), pp 5301–5310. https://doi.org/10.1109/CVPR52688.2022.00524

  13. Shu X, Tang J, Li Z, Lai H, Zhang L, Yan S (2018) Personalized age progression with bi-level aging dictionary learning. IEEE Trans Pattern Anal Mach Intell 40(4):905–917. https://doi.org/10.1109/TPAMI.2017.2705122

  14. Shu X, Tang J, Qi G, Liu W, Yang J (2021) Hierarchical long short-term concurrent memory for human interaction recognition. IEEE Trans Pattern Anal Mach Intell 43(3):1110–1118. https://doi.org/10.1109/TPAMI.2019.2942030

  15. Nie L, Yan S, Wang M, Hong R, Chua T-S (2012) Harvesting visual concepts for image search with complex queries. In: Proceedings of the 20th ACM international conference on multimedia. MM ’12, pp 59–68. Association for Computing Machinery, New York, USA. https://doi.org/10.1145/2393347.2393363

  16. Nie L, Wang M, Zhang L, Yan S, Zhang B, Chua T (2015) Disease inference from health-related questions via sparse deep learning. IEEE Trans Knowl Data Eng 27(8):2107–2119. https://doi.org/10.1109/TKDE.2015.2399298

  17. Yao Y, Chen T, Xie G-S, Zhang C, Shen F, Wu Q, Tang Z, Zhang J (2021) Non-salient region object mining for weakly supervised semantic segmentation. In: 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR), pp 2623–2632. https://doi.org/10.1109/CVPR46437.2021.00265

  18. Nie L, Zhao Y, Akbari M, Shen J, Chua T (2015) Bridging the vocabulary gap between health seekers and healthcare knowledge. IEEE Trans Knowl Data Eng 27(2):396–409. https://doi.org/10.1109/TKDE.2014.2330813

  19. Wah C, Branson S, Welinder P, Perona P, Belongie S (2011) The caltech-ucsd birds-200-2011 dataset

  20. Krause J, Stark M, Deng J, Fei-Fei L (2013) 3d object representations for fine-grained categorization. In: 2013 IEEE International conference on computer vision workshops, pp 554–561. https://doi.org/10.1109/ICCVW.2013.77

  21. Maji S, Rahtu E, Kannala J, Blaschko M, Vedaldi A (2013) Fine-grained visual classification of aircraft. arXiv:1306.5151

  22. Yao Y, Zhang J, Shen F, Hua X, Xu J, Tang Z (2017) Exploiting web images for dataset construction: A domain robust approach. IEEE Trans Multim 19(8):1771–1784. https://doi.org/10.1109/TMM.2017.2684626

  23. Yao Y, Zhang J, Shen F, Liu L, Zhu F, Zhang D, Shen HT (2020) Towards automatic construction of diverse, high-quality image datasets. IEEE Trans Knowl Data Eng 32(6):1199–1211. https://doi.org/10.1109/TKDE.2019.2903036

  24. Yao Y, Hua X-s, Shen F, Zhang J, Tang Z (2016) A domain robust approach for image dataset construction. In: Proceedings of the 24th ACM international conference on multimedia. MM ’16, pp 212–216. Association for Computing Machinery, New York, USA. https://doi.org/10.1145/2964284.2967213

  25. Zhang C, Yao Y, Liu H, Xie G-S, Shu X, Zhou T, Zhang Z, Shen F, Tang Z (2020) Web-supervised network with softly update-drop training for fine-grained visual classification. Proceedings of the AAAI Conference on Artificial Intelligence 34(07):12781–12788. https://doi.org/10.1609/aaai.v34i07.6973

  26. Sun Z, Hua X-S, Yao Y, Wei X-S, Hu G, Zhang J (2020) Crssc: salvage reusable samples from noisy data for robust learning. In: Proceedings of the 28th ACM international conference on multimedia. MM ’20, pp 92–101. Association for Computing Machinery, New York, USA. https://doi.org/10.1145/3394171.3413978

  27. Sun Z, Yao Y, Wei X-S, Zhang Y, Shen F, Wu J, Zhang J, Shen HT (2021) Webly supervised fine-grained recognition: benchmark datasets and an approach. In: Proceedings of the IEEE/CVF international conference on computer vision (ICCV), pp 10602–10611

  28. Arpit D, Jastrzębski S, Ballas N, Krueger D, Bengio E, Kanwal MS, Maharaj T, Fischer A, Courville A, Bengio Y, et al. (2017) A closer look at memorization in deep networks. In: International conference on machine learning, pp 233–242. PMLR

  29. Zhang C, Bengio S, Hardt M, Recht B, Vinyals O (2021) Understanding deep learning (still) requires rethinking generalization. Commun ACM 64(3):107–115. https://doi.org/10.1145/3446776

  30. Zhang W, Wang D, Tan X (2019) Robust class-specific autoencoder for data cleaning and classification in the presence of label noise. Neural Process Lett 50(2):1845–1860. https://doi.org/10.1007/s11063-018-9963-9

  31. Shu J, Xie Q, Yi L, Zhao Q, Zhou S, Xu Z, Meng D (2019) Meta-weight-net: learning an explicit mapping for sample weighting. Adv Neural Inform Process Syst 32

  32. Ronneberger O, Fischer P, Brox T (2015) U-net: convolutional networks for biomedical image segmentation. In: Navab N, Hornegger J, Wells WM, Frangi AF (eds) Medical image computing and computer-assisted intervention - MICCAI 2015. Springer, Cham, pp 234–241

  33. Zhang N, Donahue J, Girshick R, Darrell T (2014) Part-based r-cnns for fine-grained category detection. In: Fleet D, Pajdla T, Schiele B, Tuytelaars T (eds) Computer Vision - ECCV 2014. Springer, Cham, pp 834–849

  34. Wei X, Xie C, Wu J (2016) Mask-cnn: localizing parts and selecting descriptors for fine-grained image recognition. arXiv:1605.06878

  35. Lin D, Shen X, Lu C, Jia J (2015) Deep lac: deep localization, alignment and classification for fine-grained recognition. In: 2015 IEEE Conference on computer vision and pattern recognition (CVPR), pp 1666–1674. https://doi.org/10.1109/CVPR.2015.7298775

  36. Nie X, Chai B, Wang L, Liao Q, Xu M (2023) Learning enhanced features and inferring twice for fine-grained image classification. Multim Tools Appl 82(10):14799–14813. https://doi.org/10.1007/s11042-022-13619-z

  37. Huang S, Xu Z, Tao D, Zhang Y (2016) Part-stacked cnn for fine-grained visual categorization. In: 2016 IEEE Conference on computer vision and pattern recognition (CVPR), pp 1173–1182. https://doi.org/10.1109/CVPR.2016.132

  38. Du R, Chang D, Bhunia AK, Xie J, Ma Z, Song Y-Z, Guo J (2020) Fine-grained visual classification via progressive multi-granularity training of jigsaw patches. In: Vedaldi A, Bischof H, Brox T, Frahm J-M (eds) Computer Vision - ECCV 2020. Springer, Cham, pp 153–168

  39. Wu Z, Chen Q, Liu Y, Zhang Y, Zhu C, Yu Y (2021) Progressive multi-stage interactive training in mobile network for fine-grained recognition. arXiv:2112.04223

  40. Yang L, Li X, Song R, Zhao B, Tao J, Zhou S, Liang J, Yang J (2022) Dynamic mlp for fine-grained image classification by leveraging geographical and temporal information. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 10945–10954

  41. Wang Q, Wang J, Quan X, Feng F, Xu Z, Nie S, Wang S, Khabsa M, Firooz H, Liu D (2023) Mustie: multimodal structural transformer for web information extraction. In: Proceedings of the 61st annual meeting of the association for computational linguistics (vol 1: Long Papers), pp 2405–2420

  42. Wang Q, Fang Y, Ravula A, Feng F, Quan X, Liu D (2022) Webformer: the web-page transformer for structure information extraction. In: Proceedings of the ACM Web conference 2022. WWW ’22, pp 3124–3133. Association for Computing Machinery, New York, USA. https://doi.org/10.1145/3485447.3512032

  43. Yang L, Wang Q, Wang J, Quan X, Feng F, Chen Y, Khabsa M, Wang S, Xu Z, Liu D (2023) Mixpave: mix-prompt tuning for few-shot product attribute value extraction. In: Findings of the association for computational linguistics: ACL 2023, pp 9978–9991

  44. Krause J, Sapp B, Howard A, Zhou H, Toshev A, Duerig T, Philbin J, Fei-Fei L (2016) The unreasonable effectiveness of noisy data for fine-grained recognition. In: Leibe B, Matas J, Sebe N, Welling M (eds) Computer Vision - ECCV 2016. Springer, Cham, pp 301–320

  45. Han B, Yao Q, Yu X, Niu G, Xu M, Hu W, Tsang IW, Sugiyama M (2018) Co-teaching: robust training of deep neural networks with extremely noisy labels. In: Bengio S, Wallach HM, Larochelle H, Grauman K, Cesa-Bianchi N, Garnett R (eds) Advances in neural information processing systems 31: annual conference on neural information processing systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada, pp 8536–8546. https://proceedings.neurips.cc/paper/2018/hash/a19744e268754fb0148b017647355b7b-Abstract.html

  46. Yu X, Han B, Yao J, Niu G, Tsang I, Sugiyama M (2019) How does disagreement help generalization against label corruption? In: International conference on machine learning, pp 7164–7173. PMLR

  47. Liu D, Cui Y, Yan L, Mousas C, Yang B, Chen Y (2022) Densernet: weakly supervised visual localization using multi-scale feature aggregation. Proceedings of the AAAI Conference on Artificial Intelligence 35(7):6101–6109. https://doi.org/10.1609/aaai.v35i7.16760

  48. Liu D, Liang J, Geng T, Loui A, Zhou T (2023) Tripartite feature enhanced pyramid network for dense prediction. IEEE Trans Image Process 32:2678–2692. https://doi.org/10.1109/TIP.2023.3272826

  49. Liu D, Cui Y, Tan W, Chen Y (2021) Sg-net: spatial granularity network for one-stage video instance segmentation. In: 2021 IEEE/CVF conference on computer vision and pattern recognition (CVPR). https://doi.org/10.1109/cvpr46437.2021.00969

  50. Cui Y, Yan L, Cao Z, Liu D (2021) Tf-blender: temporal feature blender for video object detection. arXiv preprint

  51. Wang W, Liang J, Liu D (2022) Learning equivariant segmentation with instance-unique querying

  52. Shu J, Yuan X, Meng D, Xu Z (2022) Cmw-net: learning a class-aware sample weighting mapping for robust deep learning. CoRR arXiv:2202.05613

  53. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP (2002) SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 16:321–357. https://doi.org/10.1613/jair.953

  54. Dong Q, Gong S, Zhu X (2017) Class rectification hard mining for imbalanced deep learning. In: Proceedings of the IEEE International conference on computer vision, pp 1851–1860

  55. Zadrozny B (2004) Learning and evaluating classifiers under sample selection bias. In: Proceedings of the twenty-first international conference on machine learning, p 114

  56. He K, Zhang X, Ren S, Sun J (2016) Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 770–778

  57. Yue C, Huang R, Towey D, Xian Z, Wu G (2024) An entropy-based group decision-making approach for software quality evaluation. Expert Syst Appl 238:121979. https://doi.org/10.1016/j.eswa.2023.121979

  58. Dubey A, Gupta O, Guo P, Raskar R, Farrell R, Naik N (2018) Pairwise confusion for fine-grained visual classification. In: Proceedings of the European conference on computer vision (ECCV), pp 70–86

  59. Yang Z, Luo T, Wang D, Hu Z, Gao J, Wang L (2018) Learning to navigate for fine-grained classification. In: Proceedings of the European conference on computer vision (ECCV), pp 420–435

  60. Wang Y, Morariu VI, Davis LS (2018) Learning a discriminative filter bank within a cnn for fine-grained recognition. In: 2018 IEEE/CVF Conference on computer vision and pattern recognition, pp 4148–4157. https://doi.org/10.1109/CVPR.2018.00436

  61. Song K, Wei X, Shu X, Song R, Lu J (2020) Bi-modal progressive mask attention for fine-grained recognition. IEEE Trans Image Process 29:7006–7018. https://doi.org/10.1109/TIP.2020.2996736

  62. Li J, Zhu L, Huang Z, Lu K, Zhao J (2018) I read, i saw, i tell: texts assisted fine-grained visual classification. In: Proceedings of the 26th ACM international conference on multimedia. MM ’18, pp 663–671. Association for Computing Machinery, New York, USA. https://doi.org/10.1145/3240508.3240579

  63. Wang Y, Choi J, Morariu VI, Davis LS (2016) Mining discriminative triplets of patches for fine-grained classification. In: 2016 IEEE Conference on computer vision and pattern recognition (CVPR), pp 1163–1172. https://doi.org/10.1109/CVPR.2016.131

  64. Wei X, Xie C, Wu J, Shen C (2018) Mask-cnn: localizing parts and selecting descriptors for fine-grained bird species categorization. Pattern Recognit 76:704–714. https://doi.org/10.1016/j.patcog.2017.10.002

  65. Zhang C, Lin G, Wang Q, Shen F, Yao Y, Tang Z (2022) Guided by meta-set: a data-driven method for fine-grained visual recognition. IEEE Trans Multim

  66. Zhang Z, Liu Q, Wang Y (2018) Road extraction by deep residual u-net. IEEE Geosci Remote Sensing Lett 15(5):749–753

  67. Fan T, Wang G, Li Y, Wang H (2020) Ma-net: a multi-scale attention network for liver and tumor segmentation. IEEE Access 8:179656–179665

  68. Chaurasia A, Culurciello E (2017) Linknet: exploiting encoder representations for efficient semantic segmentation. In: 2017 IEEE Visual communications and image processing (VCIP), pp 1–4. IEEE

  69. Kirillov A, He K, Girshick R, Dollár P (2017) A unified architecture for instance and semantic segmentation. In: CVPR

  70. Li H, Xiong P, An J, Wang L (2018) Pyramid attention network for semantic segmentation. arXiv:1805.10180

  71. Chen L-C, Zhu Y, Papandreou G, Schroff F, Adam H (2018) Encoder-decoder with atrous separable convolution for semantic image segmentation. In: Proceedings of the European conference on computer vision (ECCV), pp 801–818

  72. Wah C, Branson S, Welinder P, Perona P, Belongie S (2011) The caltech-ucsd birds-200-2011 dataset

  73. Maji S, Rahtu E, Kannala J, Blaschko MB, Vedaldi A (2013) Fine-grained visual classification of aircraft. arXiv:1306.5151

  74. Paszke A, Gross S, Chintala S, Chanan G, Yang E, DeVito Z, Lin Z, Desmaison A, Antiga L, Lerer A (2017) Automatic differentiation in pytorch

  75. Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein MS, Berg AC, Fei-Fei L (2015) Imagenet large scale visual recognition challenge. Int J Comput Vis 115(3):211–252. https://doi.org/10.1007/s11263-015-0816-y

  76. Zhang H, Cisse M, Dauphin YN, Lopez-Paz D (2017) mixup: beyond empirical risk minimization. arXiv:1710.09412

  77. Krizhevsky A, Hinton G et al (2009) Learning multiple layers of features from tiny images

  78. Patrini G, Rozza A, Krishna Menon A, Nock R, Qu L (2017) Making deep neural networks robust to label noise: a loss correction approach. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 1944–1952

  79. Zhang Z, Sabuncu M (2018) Generalized cross entropy loss for training deep neural networks with noisy labels. Adv Neural Inform Process Syst 31

  80. Ren M, Zeng W, Yang B, Urtasun R (2018) Learning to reweight examples for robust deep learning. In: International conference on machine learning, pp 4334–4343. PMLR

Funding

This research was funded by the Macau Science and Technology Development Funds [Grant number 0061/2020/A2].

Author information

Corresponding author

Correspondence to Sio-long Lo.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Liu, Y., Wu, Z., Lo, Sl. et al. Data reweighting net for web fine-grained image classification. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-18598-x

Download citation

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s11042-024-18598-x
