Data Mining and Knowledge Discovery

, Volume 33, Issue 5, pp 1468–1504 | Cite as

Classification with label noise: a Markov chain sampling framework

  • Zijin Zhao
  • Lingyang ChuEmail author
  • Dacheng Tao
  • Jian Pei
Part of the following topical collections:
  1. Journal Track of ECML PKDD 2019


The effectiveness of classification methods relies largely on the correctness of instance labels. In real applications, however, the labels of instances are often not highly reliable due to the presence of label noise. Training effective classifiers in the presence of label noise is a challenging task that enjoys many real-world applications. In this paper, we propose a Markov chain sampling (MCS) framework that accurately identifies mislabeled instances and robustly learns effective classifiers. MCS builds a Markov chain where each state uniquely represents a set of randomly sampled instances. We show that the Markov chain has a unique stationary distribution, which puts much larger probability weights on the states dominated by correctly labeled instances than the states dominated by mislabeled instances. We propose a Markov Chain Monte Carlo sampling algorithm to approximate the stationary distribution, which is further used to compute the mislabeling probability for each instance, and train noise-resistant classifiers. The MCS framework is highly compatible with a wide spectrum of classifiers that produce probabilistic classification results. Extensive experiments on both real and synthetic data sets demonstrate the superior effectiveness and efficiency of the proposed MCS framework.


Label noise Classification Markov Chain Monte Carlo Sampling framework 



This work was supported in part by the NSERC Discovery Grant Program, the Australian Research Council Projects FL-170100117, DP-180103424, and IH180100002. All opinions, findings, conclusions and recommendations in this paper are those of the authors and do not necessarily reflect the views of the funding agencies.


  1. Aha DW, Kibler D, Albert MK (1991) Instance-based learning algorithms. Mach Learn 6(1):37–66Google Scholar
  2. Angluin D, Laird P (1988) Learning from noisy examples. Mach Learn 2(4):343–370Google Scholar
  3. Bartlett PL, Jordan MI, McAuliffe JD (2006) Convexity, classification, and risk bounds. J Am Stat Assoc 101(473):138–156MathSciNetzbMATHGoogle Scholar
  4. Belur V (1991) Nearest neighbor (NN) norms: NN pattern classification techniques. IEEE Computer Society Press, Washington DCGoogle Scholar
  5. Biggio B, Nelson B, Laskov P (2011) Support vector machines under adversarial label noise. Asian Conf Mach Learn 20:97–112Google Scholar
  6. Breiman L (2001) Random forests. Mach Learn 45(1):5–32zbMATHGoogle Scholar
  7. Cawley G, Talbot N (2006)
  8. Chang CC, Lin CJ (2011) LIBSVM: a library for support vector machines. ACM Trans Intell Syst Technol 2(3):27:1–27:27Google Scholar
  9. Cover T, Hart P (1967) Nearest neighbor pattern classification. IEEE Trans Inf Theory 13(1):21–27zbMATHGoogle Scholar
  10. Delany SJ, Cunningham P (2004) An analysis of case-base editing in a spam filtering system. Eur Conf Case-Based Reason 3115:128–141Google Scholar
  11. Folleco A, Khoshgoftaar TM, Van Hulse J, Bullard L (2008) Identifying learners robust to low quality data. In: International conference on information reuse and integration, pp 190–195Google Scholar
  12. Frénay B, Verleysen M (2014) Classification in the presence of label noise: a survey. IEEE Trans Neural Netw Learn Syst 25(5):845–869zbMATHGoogle Scholar
  13. Gamberger D, Lavrač N (1996) Noise detection and elimination applied to noise handling in a KRK chess endgame. In: International workshop on inductive logic programming, pp 72–88Google Scholar
  14. Gamberger D, Lavrač N, Džeroski S (1996) Noise elimination in inductive concept learning: a case study in medical diagnosis. Int Workshop Algorithm Learn Theory 1160:199–212zbMATHGoogle Scholar
  15. Gamberger D, Lavrac N, Groselj C (1999) Experiments with noise filtering in a medical domain. In: International conference on machine learning, pp 143–151Google Scholar
  16. Ghosh A, Manwani N, Sastry P (2015) Making risk minimization tolerant to label noise. Neurocomputing 160:93–107Google Scholar
  17. Gilks WR, Richardson S, Spiegelhalter D (1995) Markov Chain Monte Carlo in practice. CRC Press, Boca RatonzbMATHGoogle Scholar
  18. Grimmett G, Stirzaker D (2001) Probability and random processes. Oxford University Press, OxfordzbMATHGoogle Scholar
  19. Guyon I, Matic N, Vapnik V, et al (1994) Discovering informative patterns and data cleaning. In: International conference on knowledge discovery and data mining, pp 145–156Google Scholar
  20. Hosmer DW Jr, Lemeshow S, Sturdivant RX (2013) Applied logistic regression, vol 398. Wiley, HobokenzbMATHGoogle Scholar
  21. John GH, Langley P (1995) Estimating continuous distributions in Bayesian classifiers. In: Uncertainty in artificial intelligence, pp 338–345Google Scholar
  22. Karger DR, Oh S, Shah D (2011) Iterative learning for reliable crowdsourcing systems. In: Advances in neural information processing systems, pp 1953–1961Google Scholar
  23. Khoshgoftaar TM, Rebours P (2004) Generating multiple noise elimination filters with the ensemble-partitioning filter. In: International conference on information reuse and integration, pp 369–375Google Scholar
  24. Lawrence ND, Schölkopf B (2001) Estimating a kernel fisher discriminant in the presence of label noise. Int Conf Mach Learn 1:306–313Google Scholar
  25. Li H (2015) Theoretical analysis and efficient algorithms for crowdsourcing. University of California, BerkeleyGoogle Scholar
  26. Liu Q, Peng J, Ihler AT (2012) Variational inference for crowdsourcing. In: Advances in neural information processing systems, pp 692–700Google Scholar
  27. Liu T, Tao D (2016) Classification with noisy labels by importance reweighting. IEEE Trans Pattern Anal Mach Intell 38(3):447–461Google Scholar
  28. Long PM, Servedio RA (2008) Random classification noise defeats all convex potential boosters. In: International conference on machine learning, pp 608–615Google Scholar
  29. Manwani N, Sastry P (2013) Noise tolerance under risk minimization. IEEE Trans Cybern 43(3):1146–1151Google Scholar
  30. Miranda AL, Garcia LPF, Carvalho AC, Lorena AC (2009) Use of classification algorithms in noise detection and elimination. Int Conf Hybrid Artif Intell Syst 5572:417–424Google Scholar
  31. Natarajan N, Dhillon IS, Ravikumar PK, Tewari A (2013) Learning with noisy labels. In: Advances in neural information processing systems, pp 1196–1204Google Scholar
  32. Nettleton DF, Orriols-Puig A, Fornells A (2010) A study of the effect of different types of noise on the precision of supervised learning techniques. Artif Intell Rev 33(4):275–306Google Scholar
  33. Patrini G, Nielsen F, Nock R, Carioni M (2016) Loss factorization, weakly supervised learning and label noise robustness. In: International conference on machine learning, pp 708–717Google Scholar
  34. Quinlan JR (2014) C4.5: programs for machine learning. Elsevier, AmsterdamGoogle Scholar
  35. Raykar VC, Yu S, Zhao LH, Valadez GH, Florin C, Bogoni L, Moy L (2010) Learning from crowds. J Mach Learn Res 11(Apr):1297–1322MathSciNetGoogle Scholar
  36. Scholkopf B, Smola AJ (2001) Learning with kernels: support vector machines, regularization, optimization, and beyond. MIT Press, CambridgeGoogle Scholar
  37. Scott C, Blanchard G, Handy G (2013) Classification with asymmetric label noise: consistency and maximal denoising. Conf Learn Theory 30:489–511zbMATHGoogle Scholar
  38. Sheng VS, Provost F, Ipeirotis PG (2008) Get another label? Improving data quality and data mining using multiple, noisy labelers. In: International conference on knowledge discovery and data mining, pp 614–622Google Scholar
  39. Stempfel G, Ralaivola L (2009) Learning SVMs from sloppily labeled data. In: International conference on artificial neural networks, pp 884–893Google Scholar
  40. Sun JW, Zhao Wang CI, Chen SF (2007) Identifying and correcting mislabeled training instances. Future Gener Commun Netw 1:244–250Google Scholar
  41. Thongkam J, Xu G, Zhang Y, Huang F (2008) Support vector machine for outlier detection in breast cancer survivability prediction. Asia-Pac Web Conf 4977:99–109Google Scholar
  42. Valiant LG (1984) A theory of the learnable. ACM Commun 27(11):1134–1142zbMATHGoogle Scholar
  43. Vapnik V (2013) The nature of statistical learning theory. Springer, New YorkzbMATHGoogle Scholar
  44. Vaughan JW (2018) Making better use of the crowd: how crowdsourcing can advance machine learning research. J Mach Learn Res 18(1):7026–7071MathSciNetzbMATHGoogle Scholar
  45. Whitehill J, Wu Tf, Bergsma J, Movellan JR, Ruvolo PL (2009) Whose vote should count more: optimal integration of labels from labelers of unknown expertise. In: Advances in neural information processing systems, pp 2035–2043Google Scholar
  46. Wilson DL (1972) Asymptotic properties of nearest neighbor rules using edited data. IEEE Trans Syst Man Cybern 2(3):408–421MathSciNetzbMATHGoogle Scholar
  47. Wilson DR, Martinez TR (1997) Instance pruning techniques. Int Conf Mach Learn 97:403–411Google Scholar
  48. Wilson DR, Martinez TR (2000) Reduction techniques for instance-based learning algorithms. Mach Learn 38(3):257–286zbMATHGoogle Scholar
  49. Xiao H, Biggio B, Nelson B, Xiao H, Eckert C, Roli F (2015) Support vector machines under adversarial label contamination. Neurocomputing 160:53–62Google Scholar
  50. Yang T, Mahdavi M, Jin R, Zhang L, Zhou Y (2012) Multiple kernel learning from noisy labels by stochastic programming. In: International conference on machine learning, pp 1–8Google Scholar
  51. Zhou D, Basu S, Mao Y, Platt JC (2012) Learning from the wisdom of crowds by minimax entropy. In: Advances in neural information processing systems, pp 2195–2203Google Scholar

Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. 1.School of Computing ScienceSimon Fraser UniversityBurnabyCanada
  2. 2.The Faculty of Engineering and Information Technologies, The UBTECH Sydney Artificial Intelligence Centre and the School of Information TechnologiesThe University of SydneyDarlingtonAustralia

Personalised recommendations