Webly Supervised Image Classification with Self-contained Confidence

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 12353)


This paper focuses on webly supervised learning (WSL), where datasets are built by crawling samples from the Internet and directly using search queries as web labels. Although WSL benefits from fast and low-cost data collection, noise in web labels limits the performance of the image classification model. To alleviate this problem, recent works use a self-label supervised loss \(\mathcal{L}_s\) together with the webly supervised loss \(\mathcal{L}_w\), where \(\mathcal{L}_s\) relies on pseudo labels predicted by the model itself. Since whether the web label or the pseudo label is correct varies from one web sample to another, it is desirable to adjust the balance between \(\mathcal{L}_s\) and \(\mathcal{L}_w\) at the sample level. Inspired by the ability of Deep Neural Networks (DNNs) to predict confidence, we introduce Self-Contained Confidence (SCC) by adapting model uncertainty to the WSL setting, and use it to balance \(\mathcal{L}_s\) and \(\mathcal{L}_w\) sample-wise. This yields a simple yet effective WSL framework. We investigate a series of SCC-friendly regularization approaches, among which the proposed graph-enhanced mixup is the most effective at providing high-quality confidence to enhance our framework. The proposed WSL framework achieves state-of-the-art results on two large-scale WSL datasets, WebVision-1000 and Food101-N. Code is available at
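The sample-level balancing described in the abstract can be illustrated with a minimal sketch. It assumes the combined loss is a per-sample convex combination, with a confidence value in [0, 1] weighting the webly supervised loss and its complement weighting the self-label supervised loss. The abstract does not state the exact form, so the function names (`balanced_loss`, `cross_entropy`), the use of cross-entropy for both terms, and the way `confidence` is supplied are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(probs, targets):
    """Per-sample cross-entropy against one-hot or soft target distributions."""
    return -(targets * np.log(probs + 1e-12)).sum(axis=1)

def balanced_loss(logits, web_labels, pseudo_targets, confidence):
    """Hypothetical sample-wise balance of L_w and L_s.

    logits:         (N, C) model outputs for N web samples
    web_labels:     (N,)   integer web labels (from search queries)
    pseudo_targets: (N, C) pseudo-label distributions predicted by a model
    confidence:     (N,)   assumed per-sample confidence in the web label, in [0, 1]
    """
    probs = softmax(logits)
    web_onehot = np.eye(logits.shape[1])[web_labels]
    loss_w = cross_entropy(probs, web_onehot)      # webly supervised term
    loss_s = cross_entropy(probs, pseudo_targets)  # self-label supervised term
    # Assumed form: trust the web label in proportion to the confidence.
    per_sample = confidence * loss_w + (1.0 - confidence) * loss_s
    return per_sample.mean()
```

Under this assumed form, a confidence of 1 for every sample reduces the objective to \(\mathcal{L}_w\) alone, and a confidence of 0 reduces it to \(\mathcal{L}_s\) alone; intermediate values interpolate between the two for each sample independently.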


Webly supervised learning, Noisy labels, Model uncertainty



The work described in this paper was partially supported by Innovation and Technology Commission of the Hong Kong Special Administrative Region, China (Enterprise Support Scheme under the Innovation and Technology Fund B/E030/18).

Supplementary material

Supplementary material 1 (pdf 134 KB)



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. SenseTime Research, Hong Kong SAR, China
  2. Rice University, Houston, USA
  3. The Chinese University of Hong Kong, Hong Kong SAR, China
  4. The University of Hong Kong, Hong Kong SAR, China
