Classification with label noise: a Markov chain sampling framework


The effectiveness of classification methods relies largely on the correctness of instance labels. In real applications, however, the labels of instances are often not highly reliable due to the presence of label noise. Training effective classifiers in the presence of label noise is a challenging task that enjoys many real-world applications. In this paper, we propose a Markov chain sampling (MCS) framework that accurately identifies mislabeled instances and robustly learns effective classifiers. MCS builds a Markov chain where each state uniquely represents a set of randomly sampled instances. We show that the Markov chain has a unique stationary distribution, which puts much larger probability weights on the states dominated by correctly labeled instances than the states dominated by mislabeled instances. We propose a Markov Chain Monte Carlo sampling algorithm to approximate the stationary distribution, which is further used to compute the mislabeling probability for each instance, and train noise-resistant classifiers. The MCS framework is highly compatible with a wide spectrum of classifiers that produce probabilistic classification results. Extensive experiments on both real and synthetic data sets demonstrate the superior effectiveness and efficiency of the proposed MCS framework.

This is a preview of subscription content, log in to check access.

Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6
Fig. 7
Fig. 8
Fig. 9


  1. Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6(1):37–66

    Google Scholar 

  2. Angluin D, Laird P (1988) Learning from noisy examples. Mach Learn 2(4):343–370

    Google Scholar 

  3. Bartlett PL, Jordan MI, McAuliffe JD (2006) Convexity, classification, and risk bounds. J Am Stat Assoc 101(473):138–156

    MathSciNet  MATH  Article  Google Scholar 

  4. Belur V (1991) Nearest neighbor (NN) norms: NN pattern classification techniques. IEEE Computer Society Press, Washington DC

    Google Scholar 

  5. Biggio B, Nelson B, Laskov P (2011) Support vector machines under adversarial label noise. Asian Conf Mach Learn 20:97–112

    Google Scholar 

  6. Breiman L (2001) Random forests. Mach Learn 45(1):5–32

    MATH  Article  Google Scholar 

  7. Cawley G, Talbot N (2006)

  8. Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27:1–27:27

    Article  Google Scholar 

  9. Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27

    MATH  Article  Google Scholar 

  10. Delany SJ, Cunningham P (2004) An analysis of case-base editing in a spam filtering system. Eur Conf Case-Based Reason 3115:128–141

    Google Scholar 

  11. Folleco A, Khoshgoftaar TM, Van Hulse J, Bullard L (2008) Identifying learners robust to low quality data. In: International conference on information reuse and integration, pp 190–195

  12. Frénay B, Verleysen M (2014) Classification in the presence of label noise: a survey. IEEE Trans Neural Netw Learn Syst 25(5):845–869

    MATH  Article  Google Scholar 

  13. Gamberger D, Lavrač N (1996) Noise detection and elimination applied to noise handling in a KRK chess endgame. In: International workshop on inductive logic programming, pp 72–88

  14. Gamberger D, Lavrač N, Džeroski S (1996) Noise elimination in inductive concept learning: a case study in medical diagnosis. Int Workshop Algorithm Learn Theory 1160:199–212

    MATH  Google Scholar 

  15. Gamberger D, Lavrac N, Groselj C (1999) Experiments with noise filtering in a medical domain. In: International conference on machine learning, pp 143–151

  16. Ghosh A, Manwani N, Sastry P (2015) Making risk minimization tolerant to label noise. Neurocomputing 160:93–107

    Article  Google Scholar 

  17. Gilks WR, Richardson S, Spiegelhalter D (1995) Markov Chain Monte Carlo in practice. CRC Press, Boca Raton

    Google Scholar 

  18. Grimmett G, Stirzaker D (2001) Probability and random processes. Oxford University Press, Oxford

    Google Scholar 

  19. Guyon I, Matic N, Vapnik V, et al (1994) Discovering informative patterns and data cleaning. In: International conference on knowledge discovery and data mining, pp 145–156

  20. Hosmer DW Jr, Lemeshow S, Sturdivant RX (2013) Applied logistic regression, vol 398. Wiley, Hoboken

    Google Scholar 

  21. John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Uncertainty in artificial intelligence, pp 338–345

  22. Karger DR, Oh S, Shah D (2011) Iterative learning for reliable crowdsourcing systems. In: Advances in neural information processing systems, pp 1953–1961

  23. Khoshgoftaar TM, Rebours P (2004) Generating multiple noise elimination filters with the ensemble-partitioning filter. In: International conference on information reuse and integration, pp 369–375

  24. Lawrence ND, Schölkopf B (2001) Estimating a kernel fisher discriminant in the presence of label noise. Int Conf Mach Learn 1:306–313

    Google Scholar 

  25. Li H (2015) Theoretical analysis and efficient algorithms for crowdsourcing. University of California, Berkeley

    Google Scholar 

  26. Liu Q, Peng J, Ihler AT (2012) Variational inference for crowdsourcing. In: Advances in neural information processing systems, pp 692–700

  27. Liu T, Tao D (2016) Classification with noisy labels by importance reweighting. IEEE Trans Pattern Anal Mach Intell 38(3):447–461

    Article  Google Scholar 

  28. Long PM, Servedio RA (2008) Random classification noise defeats all convex potential boosters. In: International conference on machine learning, pp 608–615

  29. Manwani N, Sastry P (2013) Noise tolerance under risk minimization. IEEE Trans Cybern 43(3):1146–1151

    Article  Google Scholar 

  30. Miranda AL, Garcia LPF, Carvalho AC, Lorena AC (2009) Use of classification algorithms in noise detection and elimination. Int Conf Hybrid Artif Intell Syst 5572:417–424

    Article  Google Scholar 

  31. Natarajan N, Dhillon IS, Ravikumar PK, Tewari A (2013) Learning with noisy labels. In: Advances in neural information processing systems, pp 1196–1204

  32. Nettleton DF, Orriols-Puig A, Fornells A (2010) A study of the effect of different types of noise on the precision of supervised learning techniques. Artif Intell Rev 33(4):275–306

    Article  Google Scholar 

  33. Patrini G, Nielsen F, Nock R, Carioni M (2016) Loss factorization, weakly supervised learning and label noise robustness. In: International conference on machine learning, pp 708–717

  34. Quinlan JR (2014) C4.5: programs for machine learning. Elsevier, Amsterdam

    Google Scholar 

  35. Raykar VC, Yu S, Zhao LH, Valadez GH, Florin C, Bogoni L, Moy L (2010) Learning from crowds. J Mach Learn Res 11(Apr):1297–1322

    MathSciNet  Google Scholar 

  36. Scholkopf B, Smola AJ (2001) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, Cambridge

    Google Scholar 

  37. Scott C, Blanchard G, Handy G (2013) Classification with asymmetric label noise: consistency and maximal denoising. Conf Learn Theory 30:489–511

    MATH  Google Scholar 

  38. Sheng VS, Provost F, Ipeirotis PG (2008) Get another label? Improving data quality and data mining using multiple, noisy labelers. In: International conference on knowledge discovery and data mining, pp 614–622

  39. Stempfel G, Ralaivola L (2009) Learning SVMs from sloppily labeled data. In: International conference on artificial neural networks, pp 884–893

  40. Sun JW, Zhao Wang CI, Chen SF (2007) Identifying and correcting mislabeled training instances. Future Gener Commun Netw 1:244–250

    Article  Google Scholar 

  41. Thongkam J, Xu G, Zhang Y, Huang F (2008) Support vector machine for outlier detection in breast cancer survivability prediction. Asia-Pac Web Conf 4977:99–109

    Google Scholar 

  42. Valiant LG (1984) A theory of the learnable. ACM Commun 27(11):1134–1142

    MATH  Article  Google Scholar 

  43. Vapnik V (2013) The nature of statistical learning theory. Springer, New York

    Google Scholar 

  44. Vaughan JW (2018) Making better use of the crowd: how crowdsourcing can advance machine learning research. J Mach Learn Res 18(1):7026–7071

    MathSciNet  MATH  Google Scholar 

  45. Whitehill J, Wu Tf, Bergsma J, Movellan JR, Ruvolo PL (2009) Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In: Advances in neural information processing systems, pp 2035–2043

  46. Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern 2(3):408–421

    MathSciNet  MATH  Article  Google Scholar 

  47. Wilson DR, Martinez TR (1997) Instance pruning techniques. Int Conf Mach Learn 97:403–411

    Google Scholar 

  48. Wilson DR, Martinez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38(3):257–286

    MATH  Article  Google Scholar 

  49. Xiao H, Biggio B, Nelson B, Xiao H, Eckert C, Roli F (2015) Support vector machines under adversarial label contamination. Neurocomputing 160:53–62

    Article  Google Scholar 

  50. Yang T, Mahdavi M, Jin R, Zhang L, Zhou Y (2012) Multiple kernel learning from noisy labels by stochastic programming. In: International conference on machine learning, pp 1–8

  51. Zhou D, Basu S, Mao Y, Platt JC (2012) Learning from the wisdom of crowds by minimax entropy. In: Advances in neural information processing systems, pp 2195–2203

Download references


This work was supported in part by the NSERC Discovery Grant Program, the Australian Research Council Projects FL-170100117, DP-180103424, and IH180100002. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.

Author information



Corresponding author

Correspondence to Lingyang Chu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by Jesse Davis, Elisa Fromont, Derek Greene, Bjorn Bringmann.

Rights and permissions

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Zhao, Z., Chu, L., Tao, D. et al. Classification with label noise: a Markov chain sampling framework. Data Min Knowl Disc 33, 1468–1504 (2019).

Download citation


  • Label noise
  • Classification
  • Markov Chain Monte Carlo
  • Sampling framework