Ali, K.M. (1996). Learning probabilistic relational concept descriptions. Ph.D. thesis, University of California, Irvine. http://www.ics.uci.edu/~ali.
Becker, B., Kohavi, R., & Sommerfield, D. (1997). Visualizing the simple Bayesian classifier. KDD Workshop on Issues in the Integration of Data Mining and Data Visualization.
Bernardo, J.M., & Smith, A.F. (1993). Bayesian theory. John Wiley & Sons.
Breiman, L. (1994). Heuristics of instability in model selection (Technical Report). Berkeley: Statistics Department, University of California.
Breiman, L. (1996a). Arcing classifiers (Technical Report). Berkeley: Statistics Department, University of California. http://www.stat.Berkeley.EDU/users/breiman/.
Breiman, L. (1996b). Bagging predictors. Machine Learning, 24, 123–140.
Breiman, L. (1997). Arcing the edge (Technical Report 486). Berkeley: Statistics Department, University of California. http://www.stat.Berkeley.EDU/users/breiman/.
Buntine, W. (1992a). Learning classification trees. Statistics and Computing, 2(2), 63–73.
Buntine, W. (1992b). A theory of learning classification rules. Ph.D. thesis, University of Technology, Sydney, School of Computing Science.
Blake, C., Keogh, E., & Merz, C.J. (1998). UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html.
Cestnik, B. (1990). Estimating probabilities: A crucial task in machine learning. In L.C. Aiello (Ed.), Proceedings of the Ninth European Conference on Artificial Intelligence (pp. 147–149).
Chan, P., Stolfo, S., & Wolpert, D. (1996). Integrating multiple learned models for improving and scaling machine learning algorithms. AAAI Workshop.
Craven, M.W., & Shavlik, J.W. (1993). Learning symbolic rules using artificial neural networks. Proceedings of the Tenth International Conference on Machine Learning (pp. 73–80). Morgan Kaufmann.
Dietterich, T.G. (1998). Approximate statistical tests for comparing supervised classification learning algorithms. Neural Computation, 10(7).
Dietterich, T.G., & Bakiri, G. (1991). Error-correcting output codes: A general method for improving multiclass inductive learning programs. Proceedings of the Ninth National Conference on Artificial Intelligence (AAAI-91) (pp. 572–577).
Domingos, P. (1997). Why does bagging work? A Bayesian account and its implications. In D. Heckerman, H. Mannila, D. Pregibon, & R. Uthurusamy (Eds.), Proceedings of the Third International Conference on Knowledge Discovery and Data Mining (pp. 155–158). AAAI Press.
Domingos, P., & Pazzani, M. (1997). Beyond independence: Conditions for the optimality of the simple Bayesian classifier. Machine Learning, 29(2/3), 103–130.
Drucker, H., & Cortes, C. (1996). Boosting decision trees. Advances in Neural Information Processing Systems 8 (pp. 479–485).
Duda, R., & Hart, P. (1973). Pattern classification and scene analysis. Wiley.
Efron, B., & Tibshirani, R. (1993). An introduction to the bootstrap. Chapman & Hall.
Elkan, C. (1997). Boosting and naive Bayesian learning (Technical Report). San Diego: Department of Computer Science and Engineering, University of California.
Fayyad, U.M., & Irani, K.B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. Proceedings of the 13th International Joint Conference on Artificial Intelligence (pp. 1022–1027). Morgan Kaufmann Publishers.
Freund, Y. (1990). Boosting a weak learning algorithm by majority. Proceedings of the Third Annual Workshop on Computational Learning Theory (pp. 202–216).
Freund, Y. (1996). Boosting a weak learning algorithm by majority. Information and Computation, 121(2), 256–285.
Freund, Y., & Schapire, R.E. (1995). A decision-theoretic generalization of on-line learning and an application to boosting. Proceedings of the Second European Conference on Computational Learning Theory (pp. 23–37). Springer-Verlag. To appear in Journal of Computer and System Sciences.
Freund, Y., & Schapire, R.E. (1996). Experiments with a new boosting algorithm. In L. Saitta (Ed.), Machine Learning: Proceedings of the Thirteenth International Conference (pp. 148–156). Morgan Kaufmann.
Friedman, J.H. (1997). On bias, variance, 0/1-loss, and the curse of dimensionality. Data Mining and Knowledge Discovery, 1(1), 55–77. ftp://playfair.stanford.edu/pub/friedman/curse.ps.Z.
Geman, S., Bienenstock, E., & Doursat, R. (1992). Neural networks and the bias/variance dilemma. Neural Computation, 4, 1–48.
Good, I.J. (1965). The estimation of probabilities: An essay on modern Bayesian methods. M.I.T. Press.
Holte, R.C. (1993). Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11, 63–90.
Iba, W., & Langley, P. (1992). Induction of one-level decision trees. Proceedings of the Ninth International Conference on Machine Learning (pp. 233–240). Morgan Kaufmann Publishers.
Kohavi, R. (1995a). A study of cross-validation and bootstrap for accuracy estimation and model selection. In C.S. Mellish (Ed.), Proceedings of the 14th International Joint Conference on Artificial Intelligence (pp. 1137–1143). Morgan Kaufmann. http://robotics.stanford.edu/~ronnyk.
Kohavi, R. (1995b). Wrappers for performance enhancement and oblivious decision graphs. Ph.D. thesis, Stanford University, Computer Science department. STAN-CS-TR–95–1560. http://robotics.Stanford.EDU/~ronnyk/teza.ps.Z.
Kohavi, R., Becker, B., & Sommerfield, D. (1997). Improving simple Bayes. The Ninth European Conference on Machine Learning, Poster Papers (pp. 78–87). Available at http://robotics.stanford.edu/users/ronnyk.
Kohavi, R., & Kunz, C. (1997). Option decision trees with majority votes. In D. Fisher (Ed.), Machine Learning: Proceedings of the Fourteenth International Conference (pp. 161–169). Morgan Kaufmann Publishers. Available at http://robotics.stanford.edu/users/ronnyk.
Kohavi, R., & Sahami, M. (1996). Error-based and entropy-based discretization of continuous features. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (pp. 114–119).
Kohavi, R., & Sommerfield, D. (1995). Feature subset selection using the wrapper model: Overfitting and dynamic search space topology. The First International Conference on Knowledge Discovery and Data Mining (pp. 192–197).
Kohavi, R., Sommerfield, D., & Dougherty, J. (1997). Data mining using MLC++: A machine learning library in C++. International Journal on Artificial Intelligence Tools, 6(4), 537–566. http://www.sgi.com/Technology/mlc.
Kohavi, R., & Wolpert, D.H. (1996). Bias plus variance decomposition for zero-one loss functions. In L. Saitta (Ed.), Machine Learning: Proceedings of the Thirteenth International Conference (pp. 275–283). Morgan Kaufmann. Available at http://robotics.stanford.edu/users/ronnyk.
Kong, E.B., & Dietterich, T.G. (1995). Error-correcting output coding corrects bias and variance. In A. Prieditis & S. Russell (Eds.), Machine Learning: Proceedings of the Twelfth International Conference (pp. 313–321). Morgan Kaufmann.
Kwok, S.W., & Carter, C. (1990). Multiple decision trees. In R.D. Schachter, T.S. Levitt, L.N. Kanal, & J.F. Lemmer (Eds.), Uncertainty in Artificial Intelligence (pp. 327–335). Elsevier Science Publishers.
Langley, P., Iba, W., & Thompson, K. (1992). An analysis of Bayesian classifiers. Proceedings of the Tenth National Conference on Artificial Intelligence (pp. 223–228). AAAI Press and MIT Press.
Langley, P., & Sage, S. (1997). Scaling to domains with many irrelevant features. In R. Greiner (Ed.), Computational learning theory and natural learning systems (Vol. 4). MIT Press.
Oates, T., & Jensen, D. (1997). The effects of training set size on decision tree complexity. In D. Fisher (Ed.), Machine Learning: Proceedings of the Fourteenth International Conference (pp. 254–262). Morgan Kaufmann.
Oliver, J., & Hand, D. (1995). On pruning and averaging decision trees. In A. Prieditis & S. Russell (Eds.), Machine Learning: Proceedings of the Twelfth International Conference (pp. 430–437). Morgan Kaufmann.
Pazzani, M., Merz, C., Murphy, P., Ali, K., Hume, T., & Brunk, C. (1994). Reducing misclassification costs. Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann.
Quinlan, J.R. (1993). C4.5: Programs for machine learning. San Mateo, California: Morgan Kaufmann.
Quinlan, J.R. (1994). Comparing connectionist and symbolic learning methods. In S.J. Hanson, G.A. Drastal, & R.L. Rivest (Eds.), Computational learning theory and natural learning systems (Vol. I: Constraints and prospects, chap. 15, pp. 445–456). MIT Press.
Quinlan, J.R. (1996). Bagging, boosting, and C4.5. Proceedings of the Thirteenth National Conference on Artificial Intelligence (pp. 725–730). AAAI Press and the MIT Press.
Ridgeway, G., Madigan, D., & Richardson, T. (1998). Interpretable boosted naive Bayes classification. Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining.
Schaffer, C. (1994). A conservation law for generalization performance. Machine Learning: Proceedings of the Eleventh International Conference (pp. 259–265). Morgan Kaufmann.
Schapire, R.E. (1990). The strength of weak learnability. Machine Learning, 5(2), 197–227.
Schapire, R.E., Freund, Y., Bartlett, P., & Lee, W.S. (1997). Boosting the margin: A new explanation for the effectiveness of voting methods. In D. Fisher (Ed.), Machine Learning: Proceedings of the Fourteenth International Conference (pp. 322–330). Morgan Kaufmann.
Wolpert, D.H. (1992). Stacked generalization. Neural Networks, 5, 241–259.
Wolpert, D.H. (1994). The relationship between PAC, the statistical physics framework, the Bayesian framework, and the VC framework. In D.H. Wolpert (Ed.), The mathematics of generalization. Addison Wesley.