A framework for monitoring classifiers’ performance: when and why failure occurs?
Classifier error is the product of model bias and data variance. While it is important to understand the bias introduced by selecting a given learning algorithm, it is equally important to understand the variability in data over time, since even the One True Model might perform poorly when training and evaluation samples diverge. The ability to identify distributional divergence is therefore critical to pinpointing when fracture points in classifier performance will occur, particularly since contemporary evaluation methods such as ten-fold cross-validation and hold-out are poor predictors under divergent circumstances. This article implements a comprehensive evaluation framework that proactively detects breakpoints in classifiers’ predictions and shifts in data distributions through a series of statistical tests. We outline and use three scenarios under which data changes: sample selection bias, covariate shift, and shifting class priors. We evaluate the framework with a variety of classifiers and datasets.
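The abstract does not name the statistical tests it uses. As an illustration only, the sketch below assumes two common choices for comparing a training sample with an evaluation sample of a single numeric feature: the two-sample Kolmogorov-Smirnov statistic and the Hellinger distance between histograms. The function names, the number of bins, and the significance level are hypothetical and are not taken from the article.

```python
import math
import random


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every value equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_shift_detected(a, b, alpha=0.05):
    """Flag a shift when the statistic exceeds the asymptotic critical value."""
    n, m = len(a), len(b)
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2))
    threshold = c_alpha * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) > threshold


def hellinger_distance(a, b, bins=10):
    """Hellinger distance between equal-width histograms of two samples, in [0, 1]."""
    lo, hi = min(min(a), min(b)), max(max(a), max(b))
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 when all values are equal

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            k = min(int((x - lo) / width), bins - 1)
            counts[k] += 1
        return [c / len(xs) for c in counts]

    p, q = hist(a), hist(b)
    s = sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q))
    return math.sqrt(s / 2)


if __name__ == "__main__":
    random.seed(0)
    train = [random.gauss(0.0, 1.0) for _ in range(500)]
    shifted = [random.gauss(1.0, 1.0) for _ in range(500)]  # simulated covariate shift
    print("KS statistic:", ks_statistic(train, shifted))
    print("Shift flagged:", ks_shift_detected(train, shifted))
    print("Hellinger distance:", hellinger_distance(train, shifted))
```

In a monitoring setting, checks of this kind would be run per feature on each new batch of evaluation data. A flagged shift would then suggest that cross-validation estimates obtained on the training sample may no longer reflect performance on the new data.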