Emerging topics and challenges of learning from noisy data in nonstandard classification: a survey beyond binary class noise

  • Ronaldo C. Prati
  • Julián Luengo
  • Francisco Herrera
Regular Paper


The problem of noisy class labels is omnipresent in classification. However, most research focuses on noise handling in binary classification problems and its adaptation to multiclass learning. This paper contextualizes label noise in nonstandard classification problems, including multiclass, multilabel, multitask, multi-instance, ordinal and data stream classification. Practical considerations for analyzing noise in these classification problems, as well as trends, open problems and future research directions, are discussed. We believe this paper can help expand research on class noise handling and help practitioners better identify the particular aspects of noise in challenging classification scenarios.
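As a minimal illustration of the uniform (symmetric) class-noise model often assumed in this literature, the sketch below flips each label with probability rho to one of the other classes chosen uniformly at random. The function name and setup are illustrative, not taken from any method surveyed here; only NumPy is assumed.

```python
import numpy as np

def inject_symmetric_noise(y, noise_rate, n_classes, rng):
    """Flip each label with probability noise_rate to a different class,
    chosen uniformly among the remaining n_classes - 1 classes."""
    y_noisy = np.asarray(y).copy()
    flip = rng.random(len(y_noisy)) < noise_rate
    for i in np.where(flip)[0]:
        # exclude the true class so every flip actually corrupts the label
        choices = [c for c in range(n_classes) if c != y_noisy[i]]
        y_noisy[i] = rng.choice(choices)
    return y_noisy

rng = np.random.default_rng(0)
y = rng.integers(0, 3, size=1000)          # synthetic 3-class labels
y_noisy = inject_symmetric_noise(y, 0.2, 3, rng)
observed_rate = np.mean(y != y_noisy)      # should be close to 0.2
```

Because every flip replaces the label with a different class, the observed corruption rate matches the nominal noise rate in expectation; asymmetric (class-conditional) noise would instead use a per-class transition matrix.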


Class noise · Multiclass · Multilabel · Multitask · Multi-instance · Ordinal classification · Data streams



This work has been partially supported by the São Paulo State (Brazil) research council FAPESP under project 2015/20606-6, the Spanish Ministry of Science and Technology under project TIN2014-57251-P and the Andalusian Research Plan under project P12-TIC-2958.



Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. Center of Mathematics, Computer Science and Cognition (CMCC), Federal University of ABC (UFABC), Santo André, Brazil
  2. Department of Computer Science and A.I. (DECSAI), University of Granada (UGR), Granada, Spain
