Deep Neural Networks for Corrupted Labels

  • Ishan JindalEmail author
  • Matthew Nokleby
  • Daniel Pressel
  • Xuewen Chen
  • Harpreet Singh
Part of the Studies in Computational Intelligence book series (SCI, volume 866)


The success of deep convolutional networks on image and text classification and recognition tasks depends on the availability of large, correctly labeled training datasets, but obtaining the correct labels for these gigantic datasets is very difficult task. To deal with this problem, we describe an approach for learning deep networks from datasets corrupted by unknown label noise. We append a nonlinear noise model to a standard deep network, which is learned in tandem with the parameters of the network. Further, we train the network using a loss function that encourages the clustering of training images. We argue that the non-linear noise model, while not rigorous as a probabilistic model, results in a more effective denoising operator during backpropagation. We evaluate the performance of proposed approach on image classification task with artificially injected label noise to MNIST, CIFAR-10, CIFAR-100 and ImageNet datasets and on a large-scale Clothing 1M dataset with inherent label noise. Further, we show that with the different initialization and the regularization of the noise model, we can apply this learning procedure to text classification tasks as well. We evaluate the performance of modified approach on TREC text classification dataset. On all these datasets, the proposed approach provides significantly improved classification performance over the state of the art and is robust to the amount of label noise and the training samples. This approach is computationally fast, completely parallelizable, and easily implemented with existing machine learning libraries.


Label noise Deep learning Image classification Text classification E-M style label denoising Convolutional network 


  1. 1.
    Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: Advances in neural information processing systems, pp. 1097–1105 (2012)Google Scholar
  2. 2.
    Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: common objects in context. In: European Conference on Computer Vision, pp. 740–755. Springer (2014)Google Scholar
  3. 3.
    Torralba, A., Fergus, R., Freeman, W.T.: 80 million tiny images: a large data set for nonparametric object and scene recognition. IEEE Trans. pattern Anal. Mach. Intell. 30(11), 1958–1970 (2008)CrossRefGoogle Scholar
  4. 4.
    Johnson, J., Karpathy, A., Fei-Fei, L.: Densecap: Fully convolutional localization networks for dense captioning. In: IEEE Conference on Proceedings of the Computer Vision and Pattern Recognition, pp. 4565–4574 (2016)Google Scholar
  5. 5.
    LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)CrossRefGoogle Scholar
  6. 6.
    Krizhevsky, A., Hinton, G.: Learning multiple layers of features from tiny images (2009)Google Scholar
  7. 7.
    Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2009, pp. 248–255. IEEE (2009)Google Scholar
  8. 8.
    Frénay, B., Verleysen, M.: Classification in the presence of label noise: a survey. IEEE Trans. Neural Netw. Learn. Syst. 25(5), 845–869 (2014)CrossRefGoogle Scholar
  9. 9.
    Zhu, X., Wu, X.: Class noise vs. attribute noise: a quantitative study. Artif. Intell. Rev. 22(3), 177–210 (2004)CrossRefGoogle Scholar
  10. 10.
    Xiao, T., Xia, T., Yang, Y., Huang, C., Wang, X.: Learning from massive noisy labeled data for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2691–2699 (2015)Google Scholar
  11. 11.
    Kim, Y.: Convolutional neural networks for sentence classification. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1746–1751 (2014)Google Scholar
  12. 12.
    Zhu, X.: Semi-supervised learning literature survey (2005)Google Scholar
  13. 13.
    Aslam, J.A., Decatur, S.E.: On the sample complexity of noise-tolerant learning. Inf. Process. Lett. 57(4), 189–195 (1996)MathSciNetCrossRefGoogle Scholar
  14. 14.
    Natarajan, N., Dhillon, I.S., Ravikumar, P.K., Tewari, A.: Learning with noisy labels. In: Advances in Neural Information Processing Systems, pp. 1196–1204 (2013)Google Scholar
  15. 15.
    Liu, T., Tao, D.: Classification with noisy labels by importance reweighting. IEEE Trans. Pattern Anal. Mach. Intell. 38(3), 447–461 (2016)CrossRefGoogle Scholar
  16. 16.
    Lawrence, N.D., Schölkopf, B.: Estimating a kernel fisher discriminant in the presence of label noise. In: ICML, vol. 1, Citeseer, pp. 306–313 (2001)Google Scholar
  17. 17.
    Rebbapragada, U., Brodley, C.E.: Class noise mitigation through instance weighting. In: European Conference on Machine Learning, pp. 708–715. Springer (2007)Google Scholar
  18. 18.
    Brodley, C.E., Friedl, M.A., et al.: Identifying and eliminating mislabeled training instances. In: AAAI/IAAI, vol. 1, pp. 799–805 (1996)Google Scholar
  19. 19.
    Brodley, C.E., Friedl, M.A.: Identifying mislabeled training data. J. Artif. Intell. Res. 11, 131–167 (1999)CrossRefGoogle Scholar
  20. 20.
    Manwani, N., Sastry, P.: Noise tolerance under risk minimization. IEEE Trans. Cybern. 43(3), 1146–1151 (2013)CrossRefGoogle Scholar
  21. 21.
    Ma, X., Wang, Y., Houle, M.E., Zhou, S., Erfani, S.M., Xia, S.T., Wijewickrema, S., Bailey, J.: Dimensionality-driven learning with noisy labels (2018). arXiv:1806.02612
  22. 22.
    Wang, Y., Liu, W., Ma, X., Bailey, J., Zha, H., Song, L., Xia, S.T.: Iterative learning with open-set noisy labels. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8688–8696 (2018)Google Scholar
  23. 23.
    Sukhbaatar, S., Bruna, J., Paluri, M., Bourdev, L., Fergus, R.: Training convolutional networks with noisy labels (2014). arXiv:1406.2080
  24. 24.
    Jindal, I., Nokleby, M., Chen, X.: Learning deep networks from noisy labels with dropout regularization. In: 2016 IEEE 16th International Conference on Data Mining (ICDM), pp. 967–972. IEEE (2016)Google Scholar
  25. 25.
    Jindal, I., Pressel, D., Lester, B., Nokleby, M.: An effective label noise model for dnn text classification (2019). arXiv:1903.07507
  26. 26.
    Tanaka, D., Ikami, D., Yamasaki, T., Aizawa, K.: Joint optimization framework for learning with noisy labels. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5552–5560 (2018)Google Scholar
  27. 27.
    Li, Y., Yang, J., Song, Y., Cao, L., Luo, J., Li, L.J.: Learning from noisy labels with distillation. In: ICCV, pp. 1928–1936 (2017)Google Scholar
  28. 28.
    Malach, E., Shalev-Shwartz, S.: Decoupling “when to update” from “how to update”. In: Advances in Neural Information Processing Systems, pp. 961–971 (2017)Google Scholar
  29. 29.
    Veit, A., Alldrin, N., Chechik, G., Krasin, I., Gupta, A., Belongie, S.: Learning from noisy large-scale datasets with minimal supervision. In: The Conference on Computer Vision and Pattern Recognition (2017)Google Scholar
  30. 30.
    Vahdat, A.: Toward robustness against label noise in training deep discriminative neural networks. In: Advances in Neural Information Processing Systems, pp. 5596–5605 (2017)Google Scholar
  31. 31.
    Yao, J., Wang, J., Tsang, I.W., Zhang, Y., Sun, J., Zhang, C., Zhang, R.: Deep learning from noisy image labels with quality embedding. IEEE Trans. Image Process. (2018)Google Scholar
  32. 32.
    Reed, S., Lee, H., Anguelov, D., Szegedy, C., Erhan, D., Rabinovich, A.: Training deep neural networks on noisy labels with bootstrapping (2014). arXiv:1412.6596
  33. 33.
    Azadi, S., Feng, J., Jegelka, S., Darrell, T.: Auxiliary image regularization for deep cnns with noisy labels (2015). arXiv:1511.07069
  34. 34.
    Joulin, A., van der Maaten, L., Jabri, A., Vasilache, N.: Learning visual features from large weakly supervised data. In: European Conference on Computer Vision, pp. 67–84. Springer (2016)Google Scholar
  35. 35.
    Jiang, L., Zhou, Z., Leung, T., Li, L.J., Fei-Fei, L.: Mentornet: Regularizing very deep neural networks on corrupted labels (2017). arXiv:1712.05055
  36. 36.
    Ghosh, A., Kumar, H., Sastry, P.: Robust loss functions under label noise for deep neural networks. In: AAAI, pp. 1919–1925 (2017)Google Scholar
  37. 37.
    Mnih, V., Hinton, G.E.: Learning to label aerial images from noisy data. In: Proceedings of the 29th International conference on machine learning (ICML-12), pp. 567–574 (2012)Google Scholar
  38. 38.
    Patrini, G., Rozza, A., Menon, A.K., Nock, R., Qu, L.: Making deep neural networks robust to label noise: a loss correction approach. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition(CVPR), pp. 2233–2241 (2017)Google Scholar
  39. 39.
    Han, B., Yao, J., Niu, G., Zhou, M., Tsang, I., Zhang, Y., Sugiyama, M.: Masking: A new perspective of noisy supervision (2018). arXiv:1805.08193
  40. 40.
    Misra, I., Lawrence Zitnick, C., Mitchell, M., Girshick, R.: Seeing through the human reporting bias: visual classifiers from noisy human-centric labels. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2930–2939 (2016)Google Scholar
  41. 41.
    Goldberger, J., Ben-Reuven, E.: Training deep neural-networks using a noise adaptation layer (2017)Google Scholar
  42. 42.
    Audhkhasi, K., Osoba, O., Kosko, B.: Noise-enhanced convolutional neural networks. Neural Netw. 78, 15–23 (2016)CrossRefGoogle Scholar
  43. 43.
    Vedaldi, A., Lenc, K.: Matconvnet: convolutional neural networks for matlab. In: Proceedings of the 23rd ACM international conference on Multimedia, pp. 689–692. ACM (2015)Google Scholar
  44. 44.
    LeCun, Y., Cortes, C., Burges, C.J.: The MNIST database of handwritten digits (1998)Google Scholar
  45. 45.
    Pressel, D., Ray Choudhury, S., Lester, B., Zhao, Y., Barta, M.: Baseline: a library for rapid modeling, experimentation and development of deep learning algorithms targeting nlp. In: Proceedings of Workshop for NLP Open Source Software (NLP-OSS), Association for Computational Linguistics, pp. 34–40 (2018)Google Scholar
  46. 46.
    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 770–778 (2016)Google Scholar
  47. 47.
    Voorhees, E.M., Tice, D.M.: The TREC-8 question answering track evaluation. In: TREC, vol. 82, (1999)Google Scholar
  48. 48.
    Rolnick, D., Veit, A., Belongie, S., Shavit, N.: Deep learning is robust to massive label noise (2017). arXiv:1705.10694
  49. 49.
    Van Der Maaten, L.: Accelerating t-sne using tree-based algorithms. J. Mach. Learn. Res. 15(1), 3221–3245 (2014)MathSciNetzbMATHGoogle Scholar
  50. 50.
    Zhang, C., Bengio, S., Hardt, M., Recht, B., Vinyals, O.: Understanding deep learning requires rethinking generalization (2016). arXiv:1611.03530

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Ishan Jindal
    • 1
    Email author
  • Matthew Nokleby
    • 1
  • Daniel Pressel
    • 2
  • Xuewen Chen
    • 1
  • Harpreet Singh
    • 1
  1. 1.Wayne State UniversityDetroitUSA
  2. 2.Interactions Digital RootsAnn ArborUSA

Personalised recommendations