Knowledge and Information Systems, Volume 18, Issue 1, pp 83–108

A framework for monitoring classifiers’ performance: when and why failure occurs?

Regular Paper

Abstract

Classifier error is the product of model bias and data variance. While it is important to understand the bias introduced by the choice of learning algorithm, it is equally important to understand the variability in data over time, since even the One True Model might perform poorly when training and evaluation samples diverge. The ability to identify distributional divergence is therefore critical for pinpointing when fracture points in classifier performance will occur, particularly since conventional evaluation methods such as tenfold cross-validation and hold-out are poor predictors under divergent conditions. This article presents a comprehensive evaluation framework to proactively detect breakpoints in classifiers’ predictions and shifts in data distributions through a series of statistical tests. We outline and study three scenarios under which data changes: sample selection bias, covariate shift, and shifting class priors. We evaluate the framework with a variety of classifiers and datasets.
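As a concrete illustration of the kind of statistical test such a framework can apply, the sketch below flags potential covariate shift by comparing each feature's training and evaluation distributions with a two-sample Kolmogorov–Smirnov test. This is a minimal sketch, not the paper's specific procedure; the array names, significance level, and Bonferroni correction are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the authors' implementation):
# detect per-feature covariate shift between a training sample and an
# evaluation sample using a two-sample Kolmogorov-Smirnov test.
import numpy as np
from scipy.stats import ks_2samp


def detect_covariate_shift(X_train: np.ndarray, X_eval: np.ndarray, alpha: float = 0.01):
    """Return (feature index, KS statistic, p-value) for every feature whose
    train/eval distributions differ significantly at level alpha, using a
    Bonferroni correction across features."""
    n_features = X_train.shape[1]
    threshold = alpha / n_features  # Bonferroni correction for multiple tests
    shifted = []
    for j in range(n_features):
        stat, p_value = ks_2samp(X_train[:, j], X_eval[:, j])
        if p_value < threshold:
            shifted.append((j, stat, p_value))
    return shifted


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_train = rng.normal(0.0, 1.0, size=(1000, 3))
    # Synthetic evaluation data: only the third feature is shifted.
    X_eval = X_train + np.array([0.0, 0.0, 1.5])
    for j, stat, p in detect_covariate_shift(X_train, X_eval):
        print(f"feature {j}: KS statistic={stat:.3f}, p={p:.2e}")
```

In this toy example only the deliberately shifted feature is reported, showing how a monitoring step of this kind can point to where training and evaluation samples have diverged before classifier performance is affected.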



Copyright information

© Springer-Verlag London Limited 2008

Authors and Affiliations

Department of Computer Science, University of Notre Dame, Notre Dame, USA
