Apte, C., Grossman, E., Pednault, E., Rosen, B., Tipu, F., & White, B. (1999). Probabilistic estimation-based data mining for discovering insurance risks. *IEEE Intelligent Systems*, *14*, 49–58.

Bahl, L. R., Brown, P. F., de Souza, P. V., & Mercer, R. L. (1989). A tree-based statistical language model for natural language speech recognition. *IEEE Transactions on Acoustics, Speech, and Signal Processing*, *37:7*, 1001–1008.

Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting and variants. *Machine Learning*, *36*, 105–142.

Bennett, P. (2002). Using asymmetric distributions to improve classifier probabilities: A comparison of new and standard parametric methods. Technical report CMU-CS-02-126, School of Computer Science, Carnegie Mellon University.

Blake, C., & Merz, C. J. (2000). UCI repository of machine learning databases. Machine-readable data repository, Department of Information and Computer Science, University of California at Irvine, Irvine, CA. Available at http://www.ics.uci.edu/~mlearn/MLRepository.html.

Bradford, J. P., Kunz, C., Kohavi, R., Brunk, C., & Brodley, C. E. (1998). Pruning decision trees with misclassification costs. *Proceedings of the Tenth European Conference on Machine Learning* (pp. 131–136). Berlin: Springer Verlag.

Bradley, A. P. (1997). The use of the area under the ROC curve in the evaluation of machine learning algorithms. *Pattern Recognition*, *30:7*, 1145–1159.

Breiman, L. (1996). Bagging predictors. *Machine Learning*, *24*, 123–140.

Breiman, L. (1998). Out-of-bag estimation. Unpublished manuscript.

Breiman, L. (2000). Private communication.

Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). *Classification and Regression Trees*. Wadsworth International Group.

Buntine, W. (1991). *A theory of learning classification rules*. Ph.D. thesis, School of Computer Science, University of Technology, Sydney, Australia.

Cestnik, B. (1990). Estimating probabilities: A crucial task in machine learning. *Proceedings of the Ninth European Conference on Artificial Intelligence* (pp. 147–149). Pitman.

Clark, P., & Boswell, R. (1991). Rule induction with CN2: Some recent improvements. *Proceedings of the Sixth European Working Session on Learning* (pp. 151–163). Berlin: Springer.

Danyluk, A., & Provost, F. (2002). Telecommunications network diagnosis. In W. Kloesgen, & J. Zytkow (Eds.), *Handbook of Knowledge Discovery and Data Mining* (pp. 897–902).

Domingos, P. (1997). Why does bagging work? A Bayesian account and its implications. *Proceedings of the Third International Conference on Knowledge Discovery and Data Mining* (pp. 155–158). Menlo Park, CA: AAAI Press.

Domingos, P. (1999). MetaCost: A general method for making classifiers cost-sensitive. *Proceedings of the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining* (pp. 155–164). New York: ACM Press.

Domingos, P. (1997). Knowledge acquisition from examples via multiple models. In D. H. Fisher (Ed.), *Proceedings of the Fourteenth International Conference on Machine Learning (ICML-97)* (pp. 98–106). San Francisco, CA: Morgan Kaufmann.

Drummond, C., & Holte, R. (2000). Exploiting the cost (in)sensitivity of decision tree splitting criteria. *Proceedings of the Seventeenth International Conference on Machine Learning* (pp. 239–246). San Francisco: Morgan Kaufmann.

Dzeroski, S., Cestnik, B., & Petrovski, I. (1993). Using the *m*-estimate in rule induction. *Journal of Computing and Information Technology*, *1*, 37–46.

Friedman, N., & Goldszmidt, M. (1996). Learning Bayesian networks with local structure. *Proceedings of the Twelfth Conference on Uncertainty in Artificial Intelligence* (pp. 252–262). San Francisco: Morgan Kaufmann.

Good, I. J. (1965). *The Estimation of Probabilities: An Essay on Modern Bayesian Methods*. Cambridge, MA: MIT Press.

Gordon, L., & Olshen, R. A. (1984). Almost sure consistent nonparametric regression from recursive partitioning schemes. *Journal of Multivariate Analysis*, *15*, 147–163.

Hand, D. J. (1997). *Construction and Assessment of Classification Rules*. Chichester: John Wiley and Sons.

Hand, D. J., & Till, R. J. (2001). A simple generalization of the area under the ROC curve for multiple class classification problems. *Machine Learning*, *45:2*, 171–186.

Hanley, J. A., & McNeil, B. J. (1982). The meaning and use of the area under a receiver operating characteristic (ROC) curve. *Radiology*, *143*, 29–36.

Hastie, T. J., & Pregibon, D. (1990). Shrinking trees. Technical report, AT&T Laboratories.

Heckerman, D., Chickering, M., Meek, C., Rounthwaite, R., & Kadie, C. (2000). Dependency networks for density estimation, collaborative filtering, and data visualization. *Proceedings of the Sixteenth Conference on Uncertainty in Artificial Intelligence*. San Francisco: Morgan Kaufmann.

Holte, R., Acker, L., & Porter, B. (1989). Concept learning and the problem of small disjuncts. *Proceedings of the Eleventh International Joint Conference on Artificial Intelligence* (pp. 813–818). San Francisco: Morgan Kaufmann.

Jelinek, F. (1997). *Statistical Methods for Speech Recognition*. Cambridge, MA: MIT Press.

Kohavi, R., Becker, B., & Sommerfield, D. (1997). Improving simple Bayes. *The Ninth European Conference on Machine Learning* (pp. 78–87).

Lim, T.-J., Loh, W.-Y., & Shih, Y.-S. (2000). A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. *Machine Learning*, *40:3*, 203–228.

Margineantu, D. D., & Dietterich, T. G. (2001). Improved class probability estimates from decision tree models. In C. Holmes (Ed.), *Nonlinear Estimation and Classification*. The Mathematical Sciences Research Institute, University of California, Berkeley.

McCallum, A., Rosenfeld, R., Mitchell, T., & Ng, A. Y. (1998). Improving text classification by shrinkage in a hierarchy of classes. *Proceedings of the Fifteenth International Conference on Machine Learning* (pp. 359–367). San Francisco: Morgan Kaufmann.

Niblett, T. (1987). Constructing decision trees in noisy domains. *Proceedings of the Second European Working Session on Learning* (pp. 67–78). Wilmslow, England: Sigma Press.

Pazzani, M., Merz, C., Murphy, P., Ali, K., Hume, T., & Brunk, C. (1994). Reducing misclassification costs. *Proceedings of the Eleventh International Conference on Machine Learning* (pp. 217–225). San Francisco: Morgan Kaufmann.

Perlich, C., Provost, F., & Simonoff, J. S. (2003). Tree induction versus logistic regression: A learning-curve analysis. *Journal of Machine Learning Research*. (In press).

Provost, F., & Domingos, P. (2000). Well-trained PETs: Improving probability estimation trees. CeDER Working Paper #IS-00-04, Stern School of Business, New York University, NY 10012.

Provost, F., & Fawcett, T. (1997). Analysis and visualization of classifier performance: Comparison under imprecise class and cost distributions. *Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (KDD-97)* (pp. 43–48). Menlo Park, CA: AAAI Press.

Provost, F., & Fawcett, T. (2001). Robust classification for imprecise environments. *Machine Learning*, *42*, 203–231.

Provost, F., Fawcett, T., & Kohavi, R. (1998). The case against accuracy estimation for comparing induction algorithms. *Proceedings of the Fifteenth International Conference on Machine Learning* (pp. 445–453). San Francisco: Morgan Kaufmann.

Provost, F., & Kolluri, V. (1999). A survey of methods for scaling up inductive algorithms. *Data Mining and Knowledge Discovery*, *3:2*, 131–169.

Quinlan, J. R. (1993). *C4.5: Programs for Machine Learning*. San Francisco: Morgan Kaufmann.

Simonoff, J. S. (1995). Smoothing categorical data. *Journal of Statistical Planning and Inference*, *47*, 41–69.

Smyth, P., Gray, A., & Fayyad, U. (1995). Retrofitting decision tree classifiers using kernel density estimation. *Proceedings of the Twelfth International Conference on Machine Learning* (pp. 506–514). San Francisco: Morgan Kaufmann.

Sobehart, J. R., Stein, R. M., Mikityanskaya, V., & Li, L. (2000). Moody's public firm risk model: A hybrid approach to modeling short term default risk. Technical report, Moody's Investors Service, Global Credit Research. Available at http://www.moodysqra.com/research/crm/53853.asp.

Swets, J. (1988). Measuring the accuracy of diagnostic systems. *Science*, *240*, 1285–1293.

Zadrozny, B., & Elkan, C. (2001). Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers. In C. Brodley, & A. Danyluk (Eds.), *Proceedings of the Eighteenth International Conference on Machine Learning* (pp. 609–616). San Francisco: Morgan Kaufmann.
