Machine Learning, Volume 42, Issue 3, pp 203–231

Robust Classification for Imprecise Environments

  • Foster Provost
  • Tom Fawcett

Abstract

In real-world environments it usually is difficult to specify target operating conditions precisely, for example, target misclassification costs. This uncertainty makes building robust classification systems problematic. We show that it is possible to build a hybrid classifier that will perform at least as well as the best available classifier for any target conditions. In some cases, the performance of the hybrid actually can surpass that of the best known classifier. This robust performance extends across a wide variety of comparison frameworks, including the optimization of metrics such as accuracy, expected cost, lift, precision, recall, and workforce utilization. The hybrid also is efficient to build, to store, and to update. The hybrid is based on a method for the comparison of classifier performance that is robust to imprecise class distributions and misclassification costs. The ROC convex hull (ROCCH) method combines techniques from ROC analysis, decision analysis and computational geometry, and adapts them to the particulars of analyzing learned classifiers. The method is efficient and incremental, minimizes the management of classifier performance data, and allows for clear visual comparisons and sensitivity analyses. Finally, we point to empirical evidence that a robust hybrid classifier indeed is needed for many real-world problems.
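The sketch below (in Python, not from the paper) illustrates the two ingredients the abstract describes: building the ROC convex hull (ROCCH) from a set of classifiers' (false positive rate, true positive rate) points, and then, for hypothetical target conditions given by a class prior and misclassification costs, reading off the hull vertex that minimizes expected cost. The classifier names, ROC points, and cost figures are invented for illustration; the paper's hybrid additionally covers operating points between hull vertices, which this sketch omits.

```python
# Illustrative sketch of the ROCCH idea (hypothetical data, not the authors' code).
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (false positive rate, true positive rate)


def roc_convex_hull(points: Dict[str, Point]) -> List[Tuple[str, Point]]:
    """Return the upper convex hull of the given ROC points.

    The trivial classifiers "always negative" (0, 0) and "always positive"
    (1, 1) are added so the hull spans the full false-positive-rate range.
    """
    pts = dict(points)
    pts.setdefault("always-negative", (0.0, 0.0))
    pts.setdefault("always-positive", (1.0, 1.0))

    # Sort by FPR (ties by TPR) and build the upper hull with the
    # monotone-chain method, using a cross-product turn test.
    ordered = sorted(pts.items(), key=lambda kv: (kv[1][0], kv[1][1]))
    hull: List[Tuple[str, Point]] = []
    for name, (x, y) in ordered:
        while len(hull) >= 2:
            (_, (x1, y1)), (_, (x2, y2)) = hull[-2], hull[-1]
            # Pop the last hull point if it lies on or below the chord
            # from hull[-2] to the new point (it is dominated).
            if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append((name, (x, y)))
    return hull


def best_for_conditions(hull: List[Tuple[str, Point]],
                        p_pos: float, c_fp: float, c_fn: float):
    """Pick the hull vertex minimizing expected cost for the given conditions.

    Expected cost = p_pos * (1 - TPR) * c_fn + (1 - p_pos) * FPR * c_fp.
    """
    def expected_cost(pt: Point) -> float:
        fpr, tpr = pt
        return p_pos * (1.0 - tpr) * c_fn + (1.0 - p_pos) * fpr * c_fp

    return min(hull, key=lambda kv: expected_cost(kv[1]))


if __name__ == "__main__":
    # Hypothetical ROC points for three learned classifiers.
    classifiers = {"A": (0.1, 0.6), "B": (0.3, 0.85), "C": (0.6, 0.9)}
    hull = roc_convex_hull(classifiers)
    print("ROCCH vertices:", [name for name, _ in hull])  # C falls off the hull

    # The best classifier changes with the assumed class prior and costs:
    # rare, costly-to-miss positives select A; more balanced conditions select B.
    print(best_for_conditions(hull, p_pos=0.01, c_fp=1.0, c_fn=50.0))
    print(best_for_conditions(hull, p_pos=0.5, c_fp=1.0, c_fn=4.0))
```

Classifiers that fall below the hull (such as the hypothetical C above) are never optimal under any class distribution or cost assignment, so only hull vertices need to be retained and compared, which is what makes the method cheap to store and update.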

Keywords: classification, learning, uncertainty, evaluation, comparison, multiple models, cost-sensitive learning, skewed distributions

Copyright information

© Kluwer Academic Publishers 2001

Authors and Affiliations

  • Foster Provost (1)
  • Tom Fawcett (2)
  1. New York University, New York, USA
  2. Hewlett-Packard Laboratories, Palo Alto, USA
