
Ensemble-based noise detection: noise ranking and visual performance evaluation

Published in: Data Mining and Knowledge Discovery

Abstract

Noise filtering is most frequently used in data preprocessing to improve the accuracy of induced classifiers. The focus of this work is different: we aim at detecting noisy instances for improved data understanding, data cleaning and outlier identification. The paper is composed of three parts. The first part presents an ensemble-based noise ranking methodology for explicit noise and outlier identification, named NoiseRank, which was successfully applied to a real-life medical problem, as shown by domain expert evaluation. The second part is concerned with quantitative performance evaluation of noise detection algorithms on data with randomly injected noise. A methodology for visual performance evaluation of noise detection algorithms in the precision-recall space, named Viper, is presented and compared to standard evaluation practice. The third part presents the implementation of the NoiseRank and Viper methodologies in a web-based platform for the composition and execution of data mining workflows. This implementation makes the developed approaches publicly accessible, enables the repeatability and sharing of the presented experiments, and allows web services to be included so that new noise detection algorithms can be incorporated into the proposed noise detection and performance evaluation workflows.



Notes

  1. The latter is referred to as polishing by Teng (1999).

  2. Note that the publicly accessible implementation of the NoiseRank methodology, described in Sect. 5.2, enables developers to easily incorporate their own algorithms into the noise ranking ensemble.

  3. Institute for Cardiovascular Prevention and Rehabilitation, Zagreb, Croatia.

  4. This fact has also been confirmed previously by machine learning approaches on the same CHD domain (Gamberger and Lavrač 2002).

  5. Alternatively, attribute noise could have been inserted, but we were motivated by medical domains where false diagnosis is the main concern.

  6. The original CHD dataset also includes 19 instances with missing values, which were removed in the experiments described in this section in order to be able to apply the two different saturation filters.

  7. Similar figures are available upon request for 2 and 10 % noise levels.

  8. Bayes added to Ens2 on domains TTT, KRKP and NAKE to get Ens3, or NN added to Ens1 on the CHD domain to get Ens3.

  9. An increase or decrease of the \(\varepsilon \) value by a few percent does not dramatically change the selection of the best noise detection algorithm.

  10. Widgets are processing units used as components of a workflow, which—given some input data and/or parameters—perform a certain computation or visualization task.

  11. Note that all the noise detection algorithms described in Sect. 3.2 were used, except for PruneSF which is unable to cope with missing values.

  12. The implementation uses the Highcharts charting library, available at http://www.highcharts.com.

  13. Similar results as in Figs. 15 and 16 are observed on domains with 2 and 10 % noise level.

  14. Tabular results are available by request from the authors.

References

  • Aggarwal CC, Yu PS (2001) Outlier detection for high dimensional data. In: Proceedings of the 2001 ACM SIGMOD international conference on management of data, pp 37–46

  • Brodley CE, Friedl MA (1999) Identifying mislabeled training data. J Artif Intell Res 11:131–167

  • Deanfield J, Shea M, Ribiero P, de Landsheere C, Wilson R, Horlock P, Selwyn A (1984) Transient ST-segment depression as a marker of myocardial ischemia during daily life. Am J Cardiol 54(10):1195–1200

  • Demšar J, Zupan B, Leban G, Curk T (2004) Orange: From experimental machine learning to interactive data mining. In: Boulicaut JF, Esposito F, Giannotti F, Pedreschi D (eds) Knowledge discovery in databases: PKDD 2004, lecture notes in computer science. Springer, vol 3202, pp 537–539

  • Frank A, Asuncion A (2010) UCI machine learning repository. URL http://archive.ics.uci.edu/ml

  • Fürnkranz J (1997) Pruning algorithms for rule learning. Mach Learn 27:139–171

  • Gamberger D, Lavrač N (1997) Conditions for Occam’s razor applicability and noise elimination. In: Lecture notes in artificial intelligence: machine learning: ECML-97, vol 1224, pp 108–123

  • Gamberger D, Lavrač N (2002) Expert-guided subgroup discovery: methodology and application. J Artif Intell Res 17:501–527

  • Gamberger D, Lavrač N, Grošelj C (1999) Experiments with noise filtering in a medical domain. In: Proceedings of 16th international conference on machine learning—ICML, Morgan Kaufmann, pp 143–151

  • Gamberger D, Lavrač N, Džeroski S (2000) Noise detection and elimination in data preprocessing: experiments in medical domains. Appl Artif Intell 14(2):205–223

  • Gamberger D, Lavrač N, Krstačić G (2003) Active subgroup mining: a case study in a coronary heart disease risk group detection. Artif Intell Med 28:27–57

  • Gelfand S, Ravishankar C, Delp E (1991) An iterative growing and pruning algorithm for classification tree design. IEEE Trans Pattern Anal Mach Intell 13:163–174

  • Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH (2009) The WEKA data mining software: an update. SIGKDD Explor Newsl 11(1):10–18

  • Hodge V, Austin J (2004) A survey of outlier detection methodologies. Artif Intell Rev 22(2):85–126

  • Khoshgoftaar TM, Rebours P (2004) Generating multiple noise elimination filters with the ensemble-partitioning filter. In: Proceedings of the 2004 IEEE International Conference on information reuse and integration. IEEE Systems, Man, and Cybernetics Society, pp 369–375.

  • Khoshgoftaar T, Seliya N, Gao K (2004) Rule-based noise detection for software measurement data. In: Proceedings of the 2004 IEEE international conference on information reuse and integration, 2004 (IRI 2004), pp 302–307

  • Khoshgoftaar TM, Zhong S, Joshi V (2005) Enhancing software quality estimation using ensemble-classifier based noise filtering. Intell Data Anal 9(1):3–27

  • Khoshgoftaar TM, Joshi VH, Seliya N (2006) Detecting noisy instances with the ensemble filter: a study in software quality estimation. Int J Softw Eng Knowl Eng 16(1):53–76

  • Kranjc J, Podpečan V, Lavrač N (2012) ClowdFlows: a cloud-based scientific workflow platform. In: Flach P, Bie T, Cristianini N (eds) Machine learning and knowledge discovery in databases, lecture notes in computer science. Springer, Berlin, vol 7524, pp 816–819

  • Libralon GL, Carvalho ACPLF, Lorena AC (2009) Ensembles of pre-processing techniques for noise detection in gene expression data. In: Proceedings of the 15th international conference on advances in neuro-information processing—volume part I, ICONIP’08. Springer, Berlin, pp 486–493

  • Manning CD, Raghavan P, Schütze H (2008) Introduction to information retrieval. Cambridge University Press, New York

  • Maron D, Ridker P, Pearson A (1998) Risk factors and the prevention of coronary heart disease. In: Wayne A, Schlant R, Fuster V (eds) HURST’S: the Heart, pp 1175–1195

  • Mingers J (1989) An empirical comparison of pruning methods for decision tree induction. Mach Learn 4:227–243

  • Miranda A, Garcia L, Carvalho A, Lorena A (2009) Use of classification algorithms in noise detection and elimination. In: Hybrid artificial intelligence systems, lecture notes in computer science. Springer, Berlin, vol 5572, pp 417–424

  • Niblett T, Bratko I (1987) Learning decision rules in noisy domains. In: Bramer M (ed) Research and development in expert systems. Cambridge University Press, Cambridge

  • Pollak S (2009) Text classification of articles on kenyan elections. In: Proceedings of the 4th language & technology conference: human language technologies as a challenge for computer science and linguistics, pp 229–233.

  • Pollak S, Coesemans R, Daelemans W, Lavrač N (2011) Detecting contrast patterns in newspaper articles by combining discourse analysis and text mining. Pragmatics 21(4):674–683

  • Quinlan JR (1987) Simplifying decision trees. Int J Man-Mach Stud 27:221–234

  • Sluban B, Gamberger D, Lavrač N (2010) Advances in class noise detection. In: Coelho H, Studer R, Wooldridge M (eds) Proceedings of the 19th European conference on artificial intelligence (ECAI 2010), pp 1105–1106

  • Sluban B, Gamberger D, Lavrač N (2011) Performance analysis of class noise detection algorithms. In: Ågotnes T (ed) STAIRS 2010—proceedings of the fifth starting AI researchers’ symposium, pp 303–314

  • Teng CM (1999) Correcting noisy data. In: Proceedings of the sixteenth international conference on machine learning, pp 239–248

  • Van Hulse JD, Khoshgoftaar TM (2006) Class noise detection using frequent itemsets. Intell Data Anal 10(6):487–507

  • Van Hulse JD, Khoshgoftaar TM, Huang H (2007) The pairwise attribute noise detection algorithm. Knowl Inf Syst 11(2):171–190

  • Verbaeten S (2002) Identifying mislabeled training examples in ILP classification problems. In: Proceedings of twelfth Belgian-Dutch conference on machine learning, pp 1–8

  • Verbaeten S, Van Assche A (2003) Ensemble methods for noise elimination in classification problems. In: Windeatt T, Roli F (eds) Multiple classifier systems, lecture notes in computer science. Springer, Berlin, vol 2709, pp 317–325

  • Wiswedel B, Berthold MR (2005) Fuzzy clustering in parallel universes with noise detection. In: Proceedings of the ICDM 2005 workshop on computational intelligence in data mining, pp 29–37

  • Yin H, Dong H, Li Y (2009) A cluster-based noise detection algorithm. International workshop on database technology and applications, pp 386–389

  • Zhong S, Tang W, Khoshgoftaar TM (2005) Boosted noise filters for identifying mislabeled data. Technical report, Department of computer science and engineering, Florida Atlantic University

  • Zhu X, Wu X (2004) Class noise vs. attribute noise: a quantitative study of their impacts. Artif Intell Rev 22:177–210

  • Zhu X, Wu X, Chen Q (2003) Eliminating class noise in large datasets. In: Proceedings of the international conference on machine learning, pp 920–927

Acknowledgments

We are grateful to Goran Krstačić from the Institute for Cardiovascular Diseases and Rehabilitation, Zagreb, Croatia, for his evaluation of the given list of potentially noisy patient records in the Coronary Heart Disease medical domain. We also thank two colleagues from the Knowledge Technologies Department at Jožef Stefan Institute, Ljubljana, Slovenia: Janez Kranjc for advice concerning the implementation of the developed approaches in the ClowdFlows platform and Vid Podpečan for the implementation of Weka algorithms as web services. This work was partially funded by the Slovenian Research Agency and by the European Commission in the context of the FP7 project FIRST, Large scale information extraction and integration infrastructure for supporting financial decision making, under the grant agreement n. 257928.

Author information

Corresponding author

Correspondence to Borut Sluban.

Additional information

Responsible editor: Eamonn Keogh

Appendices

Appendix 1: Evaluation of HARF with different agreement levels

Here we present in more detail our High Agreement Random Forest noise detection algorithm (HARF), introduced in Sluban et al. (2010). The HARF noise detection algorithm serves as an example of a new algorithm whose noise detection performance is evaluated and compared to existing approaches with our Viper visual performance evaluation methodology. This appendix briefly presents HARF and experimentally shows that HARF-70 and HARF-80 are the best performing variants of the HARF noise detection algorithm.

Classification filters using Random Forest (RF) classifiers act like an ensemble noise filter, since the randomly generated decision trees vote on the class to be assigned to an instance. However, if the winning class received only a narrow majority of votes (just over a 50–50 % split), the result is not reliable, since the decision trees are generated randomly. We therefore require high agreement among the classifiers (decision trees), i.e., a substantially increased majority of votes: for the two-class domains used in our experiments, this means that well over 50 % (e.g., 60, 70, 80 or 90 %) of the randomly generated decision trees must classify an instance into the opposite class for it to be accepted as noisy.

We implemented and tested this high-agreement approach in the classification filtering manner with an RF classifier of 500 decision trees and four different agreement levels: 60, 70, 80 and 90 %, referred to as HARF-60, HARF-70, HARF-80 and HARF-90, respectively. The obtained noise detection results are presented in part C of Table 3.

The results in part C of Table 3 show that requiring high agreement of decision trees in the Random Forest leads to excellent performance of the HARF noise detection algorithm. Figure 15 clearly indicates the increase in noise detection precision with increasing agreement level, for all experimental domains with 5 % injected noise.Footnote 13 However, the best results of the proposed HARF noise detection algorithm in terms of \(F_{0.5}\)-scores are achieved at the 70 and 80 % agreement levels, as shown in Fig. 16 and printed in bold in part C of Table 3. Therefore, only the HARF noise detection algorithms using RF500 with 70 and 80 % agreement levels, referred to as HARF-70 and HARF-80, respectively, are used in further evaluation and comparisons of noise detection algorithms in this paper.

Fig. 15: Precision of HARF noise detection as a function of the agreement level for RF on all domains with 5 % injected noise

Fig. 16: \(F_{0.5}\)-scores of HARF noise detection as a function of the agreement level for RF on all domains with 5 % injected noise

Appendix 2: Discussion of tabular noise detection results

This appendix presents and discusses the performance results of different noise detection algorithms and their ensembles evaluated on four domains with three different levels of randomly injected class noise. The precision and \(F_{0.5}\)-score results in Tables 3 and 4 present averages and standard deviations over ten repetitions of noise detection on a given domain with a given level of randomly injected class noise. The best precision and the best \(F_{0.5}\)-score results in each table are printed in bold.
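The reported measures can be computed directly from the set of instances a detector flags and the set of instances whose labels were actually corrupted. The following sketch (our illustration; the function name and set-based interface are assumptions) shows the precision, recall and \(F_{0.5}\)-score computation, where \(\beta = 0.5\) weighs precision twice as heavily as recall:

```python
def noise_detection_scores(detected, injected, beta=0.5):
    """Precision, recall and F_beta of a noise detector.

    detected: indices flagged as noisy by the algorithm.
    injected: indices whose class labels were actually corrupted.
    """
    detected, injected = set(detected), set(injected)
    tp = len(detected & injected)  # correctly identified noisy instances
    precision = tp / len(detected) if detected else 0.0
    recall = tp / len(injected) if injected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f
```

For example, a detector that flags four instances of which three were truly corrupted, out of five corrupted in total, has precision 0.75, recall 0.6 and \(F_{0.5} \approx 0.71\).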

Observing the results in part (A) of Table 3 we can see that among the classification filters the RF and SVM filters generally perform best, RF on the KRKP and CHD datasets, SVM on the TTT dataset and both RF and SVM on the NAKE dataset. The NN filter is quite good on NAKE at lower noise levels, but otherwise achieves only average performance. Noise detection with the Bayes classification filter shows overall the worst performance results, except for the CHD domain where it outperforms NN and is comparable to SVM.

The performance of the saturation filters, presented in part (B) of Table 3, lies between that of the worst and the best classification filters. Except at lower noise levels on the KRKP dataset, where SF even outperforms the best classification filter (RF), the saturation filters are not comparable to the best classification filtering results. Among the saturation filters themselves, SF achieves higher precision than PruneSF (comparable on the NAKE dataset), whereas the \(F_{0.5}\)-scores of PruneSF are not much lower, except on the KRKP dataset, which indicates that PruneSF has higher recall of injected class noise than SF.

Ensemble filtering using a majority voting scheme (a simple majority of the votes of the elementary noise filters in the given ensemble) did not outperform the best elementary filtering algorithms in terms of precision of injected noise detection.Footnote 14 With consensus voting, on the other hand, we significantly improved both the precision of noise detection and the \(F_{0.5}\)-scores. Therefore, whenever we refer to ensemble noise filters in the remainder of this paper, we mean ensemble noise filters using the consensus voting scheme.
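The two voting schemes can be sketched as follows (our illustration; the function name and the representation of each elementary filter's output as a set of flagged indices are assumptions). Consensus keeps an instance only if every filter flagged it, majority if more than half did:

```python
from collections import Counter

def ensemble_filter(detections, scheme="consensus"):
    """Combine per-filter noise detections into an ensemble prediction.

    detections: list of sets, one per elementary noise filter, each
    holding the instance indices that filter flagged as noisy.
    """
    votes = Counter(i for d in detections for i in set(d))
    n = len(detections)
    if scheme == "consensus":
        needed = n            # all filters must agree
    elif scheme == "majority":
        needed = n // 2 + 1   # strictly more than half
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return {i for i, v in votes.items() if v >= needed}
```

Consensus voting is stricter, which explains its higher precision: an instance survives only if no elementary filter disagrees.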

Performance results of the ten ensemble noise filters constructed from different elementary noise detection algorithms can be found in Table 4. The results show that, in terms of precision, using many diverse elementary noise detection algorithms in the ensemble proves beneficial. Even including the worst performing noise filter of a different type, Bayes, does not reduce the ensemble's precision but actually improves it, as can be seen by comparing the precision results of Ens2 and Ens3, Ens5 and Ens6, as well as Ens8 and Ens9. Not surprisingly, the highest precision among noise detection ensembles was therefore achieved by Ens10, which uses all elementary noise filters examined in this work. However, considering the \(F_{0.5}\)-scores, it is interesting to note that ensembles without SF perform best: Ens1, Ens2 and Ens3 perform better on all datasets at higher noise levels (5–10 %), whereas Ens7, Ens8 and Ens9 are better at lower noise levels (2–5 %).

Lastly, we discuss the results presented in part (C) of Table 3 for our HARF noise detection algorithm (described in more detail in Appendix 1) with four different agreement levels: 60, 70, 80 and 90 %, referred to as HARF-60, HARF-70, HARF-80 and HARF-90, respectively. The results in part (C) of Table 3, as well as Figs. 15 and 16 in Appendix 1, show that HARF-80 and HARF-90 are the most precise noise detection algorithms presented in Table 3 and achieve comparable or higher precision than the most precise ensemble filters in Table 4. Furthermore, in terms of the \(F_{0.5}\)-scores, the best results are achieved by HARF-70 and HARF-80, which are also significantly better than the results achieved by the elementary noise detection algorithms, except in one case where SVM outperforms the HARF algorithms on the TTT dataset with a 10 % noise level. Compared to ensemble noise filters, HARF-70 and HARF-80 achieve lower \(F_{0.5}\)-scores only on the TTT dataset, but are slightly better on the KRKP and CHD datasets and clearly better on the NAKE dataset.

At this point we can observe another interesting phenomenon, indicating that classification noise filters are robust noise detection algorithms. Analyzing the results in Tables 3 and 4, we see that the proportion of detected noise (i.e., the precision of noise detection) increases with the abundance of noise (increasing levels of artificially inserted noise: 2, 5, 10 %). Clearly, with increased noise levels the classifiers become less accurate, leading to a larger number of misclassified instances, i.e., more instances are considered potentially noisy. As an observed side effect, the proportion of detected artificially corrupted instances increases as well.


Sluban, B., Gamberger, D. & Lavrač, N. Ensemble-based noise detection: noise ranking and visual performance evaluation. Data Min Knowl Disc 28, 265–303 (2014). https://doi.org/10.1007/s10618-012-0299-1

