Data Mining and Knowledge Discovery, Volume 30, Issue 5, pp 1192–1216

Ensembles of label noise filters: a ranking approach

  • Luís P. F. Garcia
  • Ana C. Lorena
  • Stan Matwin
  • André C. P. L. F. de Carvalho


Label noise can be a major problem in classification tasks, since most machine learning algorithms rely on data labels in their inductive process. Consequently, several techniques for label noise identification have been investigated in the literature. The bias of each technique defines how suitable it is for each dataset. Moreover, while some techniques identify a large number of examples as noisy, at the cost of a high false positive rate, others are very restrictive and therefore unable to identify all noisy examples. This paper investigates how label noise detection can be improved by using an ensemble of noise filtering techniques. These filters, individual and ensembles, are experimentally compared. Another concern of this paper is the computational cost of ensembles: for a particular dataset, an individual technique can reach the same predictive performance as an ensemble, in which case the individual technique should be preferred. To deal with this situation, this study also proposes the use of meta-learning to recommend the best filter for a new dataset. An extensive experimental evaluation of individual filters, ensemble filters and meta-learning was performed using public datasets with injected label noise. The results show that ensembles of noise filters can improve noise filtering performance and that a recommendation system based on meta-learning can successfully recommend the best filtering technique for new datasets. A case study using a real dataset from the ecological niche modeling domain is also presented and evaluated, with the results validated by an expert.
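The core idea of an ensemble noise filter can be illustrated with a minimal sketch: several classification-based filters each flag an example as noisy when a model, trained without that example's fold, disagrees with its recorded label, and the ensemble aggregates these flags by majority (or consensus) voting. This is a hypothetical illustration of the general technique only; the paper's actual filters, ranking scheme and meta-learning recommender are more elaborate.

```python
# Sketch of a majority-vote ensemble of label noise filters.
# The choice of base classifiers and the 0.5 threshold are illustrative
# assumptions, not the paper's exact configuration.
import numpy as np
from sklearn.model_selection import cross_val_predict
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB


def classification_filter(clf, X, y, cv=5):
    """Flag an example as noisy when a classifier trained on the
    remaining folds disagrees with the example's recorded label."""
    pred = cross_val_predict(clf, X, y, cv=cv)
    return pred != y  # boolean mask of suspected noisy examples


def ensemble_filter(X, y, threshold=0.5):
    """Combine several base filters by voting: an example is flagged
    when at least `threshold` of the filters consider it noisy."""
    base = [
        DecisionTreeClassifier(random_state=0),
        KNeighborsClassifier(n_neighbors=3),
        GaussianNB(),
    ]
    votes = np.mean([classification_filter(c, X, y) for c in base], axis=0)
    return votes >= threshold  # threshold=1.0 would give a consensus filter
```

Setting the threshold trades off the behaviours the abstract contrasts: a low threshold behaves like the aggressive filters (many examples flagged, more false positives), while a consensus vote behaves like the restrictive ones.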


Keywords: Label noise · Noise filters · Ensemble filters · Noise ranking · Recommendation system



The authors would like to thank FAPESP (processes 2011/14602-7 and 2012/22608-8), CNPq and CAPES for their financial support. The third author’s research was supported by the Natural Sciences and Engineering Research Council of Canada, by the CALDO Programme, and by the National Science Centre of Poland (NCN) Grant DEC-2013/09/B/ST6/01549. We are also very grateful to Dr. Augusto Hashimoto de Mendonça, who works at the Center for Water Resources & Applied Ecology of the School of Engineering of São Carlos at the University of São Paulo, and to Professor Dr. Giselda Durigan, from the Forestry Institute of the State of São Paulo, for their evaluation of the list of potentially noisy examples in the non-native species H. coronarium dataset.



© The Author(s) 2016

Authors and Affiliations

  • Luís P. F. Garcia (1)
  • Ana C. Lorena (2)
  • Stan Matwin (3, 4)
  • André C. P. L. F. de Carvalho (1)

  1. Instituto de Ciências Matemáticas e de Computação, Universidade de São Paulo, São Carlos, Brazil
  2. Instituto de Ciência e Tecnologia, Universidade Federal de São Paulo, São Paulo, Brazil
  3. Institute for Big Data Analytics, Dalhousie University, Halifax, Canada
  4. Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland
