Ali, K.M. (1996). *Learning probabilistic relational concept descriptions*. Ph.D. thesis, University of California, Irvine. http://www.ics.uci.edu/~ali.

Becker, B., Kohavi, R., & Sommerfield, D. (1997). Visualizing the simple Bayesian classifier. *KDD Workshop on Issues in the Integration of Data Mining and Data Visualization*.

Bernardo, J.M., & Smith, A.F. (1993). *Bayesian theory*. John Wiley & Sons.

Blake, C., Keogh, E., & Merz, C.J. (1998). UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.

Breiman, L. (1994). *Heuristics of instability in model selection* (Technical Report). Berkeley: Statistics Department, University of California.

Breiman, L. (1996a). *Arcing classifiers* (Technical Report). Berkeley: Statistics Department, University of California. http://www.stat.Berkeley.EDU/users/breiman/.

Breiman, L. (1996b). Bagging predictors. *Machine Learning*, *24*, 123–140.

Breiman, L. (1997). *Arcing the edge* (Technical Report 486). Berkeley: Statistics Department, University of California. http://www.stat.Berkeley.EDU/users/breiman/.

Buntine, W. (1992a). Learning classification trees. *Statistics and Computing*, *2*(2), 63–73.

Buntine, W. (1992b). *A theory of learning classification rules*. Ph.D. thesis, University of Technology, Sydney, School of Computing Science.

Cestnik, B. (1990). Estimating probabilities: A crucial task in machine learning. In L.C. Aiello (Ed.), *Proceedings of the Ninth European Conference on Artificial Intelligence* (pp. 147–149).

Chan, P., Stolfo, S., & Wolpert, D. (1996). *Integrating multiple learned models for improving and scaling machine learning algorithms*. AAAI Workshop.

Craven, M.W., & Shavlik, J.W. (1993). Learning symbolic rules using artificial neural networks. *Proceedings of the Tenth International Conference on Machine Learning* (pp. 73–80). Morgan Kaufmann.

Dietterich, T.G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. *Neural Computation*, *10*(7).

Dietterich, T.G., & Bakiri, G. (1991). Error-correcting output codes: A general method for improving multiclass inductive learning programs. *Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI-91)* (pp. 572–577).

Domingos, P. (1997). Why does bagging work? A Bayesian account and its implications. In D. Heckerman, H. Mannila, D. Pregibon, & R. Uthurusamy (Eds.), *Proceedings of the Third International Conference on Knowledge Discovery and Data Mining* (pp. 155–158). AAAI Press.

Domingos, P., & Pazzani, M. (1997). Beyond independence: Conditions for the optimality of the simple Bayesian classifier. *Machine Learning*, *29*(2/3), 103–130.

Drucker, H., & Cortes, C. (1996). Boosting decision trees. *Advances in neural information processing systems 8* (pp. 479–485).

Duda, R., & Hart, P. (1973). *Pattern classification and scene analysis*. Wiley.

Efron, B., & Tibshirani, R. (1993). *An introduction to the bootstrap*. Chapman & Hall.

Elkan, C. (1997). *Boosting and naive Bayesian learning* (Technical Report). San Diego: Department of Computer Science and Engineering, University of California.

Fayyad, U.M., & Irani, K.B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. *Proceedings of the 13th International Joint Conference on Artificial Intelligence* (pp. 1022–1027). Morgan Kaufmann Publishers.

Freund, Y. (1990). Boosting a weak learning algorithm by majority. *Proceedings of the Third Annual Workshop on Computational Learning Theory* (pp. 202–216).

Freund, Y. (1996). Boosting a weak learning algorithm by majority. *Information and Computation*, *121*(2), 256–285.

Freund, Y., & Schapire, R.E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. *Proceedings of the Second European Conference on Computational Learning Theory* (pp. 23–37). Springer-Verlag. To appear in *Journal of Computer and System Sciences*.

Freund, Y., & Schapire, R.E. (1996). Experiments with a new boosting algorithm. In L. Saitta (Ed.), *Machine Learning: Proceedings of the Thirteenth International Conference* (pp. 148–156). Morgan Kaufmann.

Friedman, J.H. (1997). On bias, variance, 0/1-loss, and the curse of dimensionality. *Data Mining and Knowledge Discovery*, *1*(1), 55–77. ftp://playfair.stanford.edu/pub/friedman/curse.ps.Z.

Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. *Neural Computation*, *4*, 1–48.

Good, I.J. (1965). *The estimation of probabilities: An essay on modern Bayesian methods*. M.I.T. Press.

Holte, R.C. (1993). Very simple classification rules perform well on most commonly used datasets. *Machine Learning*, *11*, 63–90.

Iba, W., & Langley, P. (1992). Induction of one-level decision trees. *Proceedings of the Ninth International Conference on Machine Learning* (pp. 233–240). Morgan Kaufmann Publishers.

Kohavi, R. (1995a). A study of cross-validation and bootstrap for accuracy estimation and model selection. In C.S. Mellish (Ed.), *Proceedings of the 14th International Joint Conference on Artificial Intelligence* (pp. 1137–1143). Morgan Kaufmann. http://robotics.stanford.edu/~ronnyk.

Kohavi, R. (1995b). *Wrappers for performance enhancement and oblivious decision graphs*. Ph.D. thesis, Stanford University, Computer Science department. STAN-CS-TR-95-1560. http://robotics.Stanford.EDU/~ronnyk/teza.ps.Z.

Kohavi, R., Becker, B., & Sommerfield, D. (1997). Improving simple Bayes. *The Ninth European Conference on Machine Learning*, Poster Papers (pp. 78–87). Available at http://robotics.stanford.edu/users/ronnyk.

Kohavi, R., & Kunz, C. (1997). Option decision trees with majority votes. In D. Fisher (Ed.), *Machine Learning: Proceedings of the Fourteenth International Conference* (pp. 161–169). Morgan Kaufmann Publishers. Available at http://robotics.stanford.edu/users/ronnyk.

Kohavi, R., & Sahami, M. (1996). Error-based and entropy-based discretization of continuous features. *Proceedings of the Second International Conference on Knowledge Discovery and Data Mining* (pp. 114–119).

Kohavi, R., & Sommerfield, D. (1995). Feature subset selection using the wrapper model: Overfitting and dynamic search space topology. *The First International Conference on Knowledge Discovery and Data Mining* (pp. 192–197).

Kohavi, R., Sommerfield, D., & Dougherty, J. (1997). Data mining using \({\mathcal{M}}{\mathcal{L}}{\mathcal{C}}\)++: A machine learning library in C++. *International Journal on Artificial Intelligence Tools*, *6*(4), 537–566. http://www.sgi.com/Technology/mlc.

Kohavi, R., & Wolpert, D.H. (1996). Bias plus variance decomposition for zero-one loss functions. In L. Saitta (Ed.), *Machine Learning: Proceedings of the Thirteenth International Conference* (pp. 275–283). Morgan Kaufmann. Available at http://robotics.stanford.edu/users/ronnyk.

Kong, E.B., & Dietterich, T.G. (1995). Error-correcting output coding corrects bias and variance. In A. Prieditis & S. Russell (Eds.), *Machine Learning: Proceedings of the Twelfth International Conference* (pp. 313–321). Morgan Kaufmann.

Kwok, S.W., & Carter, C. (1990). Multiple decision trees. In R.D. Schachter, T.S. Levitt, L.N. Kanal, & J.F. Lemmer (Eds.), *Uncertainty in Artificial Intelligence* (pp. 327–335). Elsevier Science Publishers.

Langley, P., Iba, W., & Thompson, K. (1992). An analysis of Bayesian classifiers. *Proceedings of the Tenth National Conference on Artificial Intelligence* (pp. 223–228). AAAI Press and MIT Press.

Langley, P., & Sage, S. (1997). Scaling to domains with many irrelevant features. In R. Greiner (Ed.), *Computational learning theory and natural learning systems* (Vol. 4). MIT Press.

Oates, T., & Jensen, D. (1997). The effects of training set size on decision tree complexity. In D. Fisher (Ed.), *Machine Learning: Proceedings of the Fourteenth International Conference* (pp. 254–262). Morgan Kaufmann.

Oliver, J., & Hand, D. (1995). On pruning and averaging decision trees. In A. Prieditis & S. Russell (Eds.), *Machine Learning: Proceedings of the Twelfth International Conference* (pp. 430–437). Morgan Kaufmann.

Pazzani, M., Merz, C., Murphy, P., Ali, K., Hume, T., & Brunk, C. (1994). Reducing misclassification costs. *Machine Learning: Proceedings of the Eleventh International Conference*. Morgan Kaufmann.

Quinlan, J.R. (1993). *C4.5: Programs for machine learning*. San Mateo, California: Morgan Kaufmann.

Quinlan, J.R. (1994). Comparing connectionist and symbolic learning methods. In S.J. Hanson, G.A. Drastal, & R.L. Rivest (Eds.), *Computational learning theory and natural learning systems* (Vol. I: Constraints and prospects, chap. 15, pp. 445–456). MIT Press.

Quinlan, J.R. (1996). Bagging, boosting, and C4.5. *Proceedings of the Thirteenth National Conference on Artificial Intelligence* (pp. 725–730). AAAI Press and the MIT Press.

Ridgeway, G., Madigan, D., & Richardson, T. (1998). Interpretable boosted naive Bayes classification. *Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining*.

Schaffer, C. (1994). A conservation law for generalization performance. *Machine Learning: Proceedings of the Eleventh International Conference* (pp. 259–265). Morgan Kaufmann.

Schapire, R.E. (1990). The strength of weak learnability. *Machine Learning*, *5*(2), 197–227.

Schapire, R.E., Freund, Y., Bartlett, P., & Lee, W.S. (1997). Boosting the margin: A new explanation for the effectiveness of voting methods. In D. Fisher (Ed.), *Machine Learning: Proceedings of the Fourteenth International Conference* (pp. 322–330). Morgan Kaufmann.

Wolpert, D.H. (1992). Stacked generalization. *Neural Networks*, *5*, 241–259.

Wolpert, D.H. (1994). The relationship between PAC, the statistical physics framework, the Bayesian framework, and the VC framework. In D.H. Wolpert (Ed.), *The mathematics of generalization*. Addison Wesley.