Bailey, T.L. and Elkan, C. 1993. Estimating the accuracy of learned concepts. In Proceedings of International Joint Conference on Artificial Intelligence. Morgan Kaufmann Publishers, pp. 95–112.
Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J. 1984. Classification and Regression Trees. Wadsworth International Group.
Breiman, L. and Spector, P. 1992. Submodel selection and evaluation in regression: The X-random case. International Statistical Review, 60(3):291–319.
Catlett, J. 1991. On changing continuous attributes into ordered discrete attributes. In Proc. Fifth European Working Session on Learning. Berlin: Springer-Verlag, pp. 164–177.
Chan, C.-C., Batur, C., and Srinivasan, A. 1991. Determination of quantization intervals in rule based model for dynamic systems. In Proceedings of the IEEE Conference on Systems, Man, and Cybernetics. Charlottesville, Virginia, pp. 1719–1723.
Chiu, D.K.Y., Cheung, B., and Wong, A.K.C. 1990. Information synthesis based on hierarchical maximum entropy discretization. Journal of Experimental and Theoretical Artificial Intelligence, 2:117–129.
Chmielewski, M.R. and Grzymala-Busse, J.W. 1994. Global discretization of continuous attributes as preprocessing for machine learning. In Third International Workshop on Rough Sets and Soft Computing, pp. 294–301.
Chou, P. 1991. Optimal partitioning for classification and regression trees. IEEE Trans. Pattern Anal. Mach. Intell., 13(4):340–354.
Cerquides, J. and Mantaras, R.L. 1997. Proposal and empirical comparison of a parallelizable distance-based discretization method. In KDD97: Third International Conference on Knowledge Discovery and Data Mining, pp. 139–142.
Dougherty, J., Kohavi, R., and Sahami, M. 1995. Supervised and unsupervised discretization of continuous features. In Proc. Twelfth International Conference on Machine Learning. Los Altos, CA: Morgan Kaufmann, pp. 194–202.
Domingos, P. and Pazzani, M. 1996. Beyond independence: Conditions for the optimality of the simple Bayesian classifier. In Machine Learning: Proceedings of the Thirteenth International Conference, L. Saitta (Ed.). Morgan Kaufmann, pp. 105–112.
Efron, B. 1983. Estimating the error rate of a prediction rule: Improvement on cross-validation. Journal of the American Statistical Association, 78(382):316–330.
Fayyad, U. and Irani, K. 1992. On the handling of continuous-valued attributes in decision tree generation. Machine Learning, 8:87–102.
Fayyad, U. and Irani, K. 1993. Multi-interval discretization of continuous-valued attributes for classification learning. In Proc. Thirteenth International Joint Conference on Artificial Intelligence. San Mateo, CA: Morgan Kaufmann, pp. 1022–1027.
Fayyad, U. and Irani, K. 1996. Discretizing continuous attributes while learning Bayesian networks. In Proc. Thirteenth International Conference on Machine Learning. Morgan Kaufmann, pp. 157–165.
Fulton, T., Kasif, S., and Salzberg, S. 1995. Efficient algorithms for finding multi-way splits for decision trees. In Proc. Twelfth International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann, pp. 244–251.
Holte, R.C., Acker, L., and Porter, B.W. 1989. Concept learning and the problem of small disjuncts. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence. San Mateo, CA: Morgan Kaufmann, pp. 813–818.
Holte, R.C. 1993. Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11:63–90.
Ho, K.M. and Scott, P.D. 1997. Zeta: A global method for discretization of continuous variables. In KDD97: Third International Conference on Knowledge Discovery and Data Mining. Newport Beach, CA, pp. 191–194.
John, G., Kohavi, R., and Pfleger, K. 1994. Irrelevant features and the subset selection problem. In Proceedings of the Eleventh International Machine Learning Conference. New Brunswick, NJ: Morgan Kaufmann, pp. 121–129.
Kerber, R. 1992. ChiMerge: Discretization of numeric attributes. In Proc. AAAI-92, Ninth National Conference on Artificial Intelligence. AAAI Press/The MIT Press, pp. 123–128.
Kontkanen, P., Myllymaki, P., Silander, T., and Tirri, H. 1998. BAYDA: Software for Bayesian classification and feature selection. In 4th International Conference on Knowledge Discovery and Data Mining, pp. 254–258.
Langley, P., Iba, W., and Thompson, K. 1992. An analysis of Bayesian classifiers. In Proceedings of the Tenth National Conference on Artificial Intelligence. AAAI Press and MIT Press, pp. 223–228.
Langley, P. and Sage, S. 1994. Induction of selective Bayesian classifiers. In Proceedings of the Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann, pp. 255–261.
Liu, H. and Setiono, R. 1995. Chi2: Feature selection and discretization of numeric attributes. In Proceedings of the Seventh IEEE International Conference on Tools with Artificial Intelligence, November 5-8, 1995, J.F. Vassilopoulos (Ed.). Herndon, Virginia, IEEE Computer Society, pp. 388–391.
Liu, H. and Setiono, R. 1997. Feature selection and discretization. IEEE Transactions on Knowledge and Data Engineering, 9:1–4.
Maass, W. 1994. Efficient agnostic PAC-learning with simple hypotheses. In Proc. Seventh Annual ACM Conference on Computational Learning Theory. New York, NY: ACM Press, pp. 67–75.
Mantaras, R.L. 1991. A distance-based attribute selection measure for decision tree induction. Machine Learning, 6:81–92.
Merz, C.J. and Murphy, P.M. 1996. UCI repository of machine learning databases. http://www.ics.uci.edu/~mlearn/MLRepository.html. Irvine, CA: University of California, Department of Information and Computer Science.
Oates, T. and Jensen, D. 1998. Large datasets lead to overly complex models: An explanation and a solution. In Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining (KDD-98). AAAI Press/The MIT Press, pp. 294–298.
Pfahringer, B. 1995a. Compression-based discretization of continuous attributes. In Proc. Twelfth International Conference on Machine Learning. San Francisco, CA: Morgan Kaufmann, pp. 456–463.
Pfahringer, B. 1995b. A new MDL measure for robust rule induction. In ECML95: European Conference on Machine Learning (Extended abstract), pp. 331–334.
Quinlan, J.R. 1986. Induction of decision trees. Machine Learning, 1:81–106.
Quinlan, J.R. 1988. Decision trees and multi-valued attributes. Machine Intelligence 11: Logic and the Acquisition of Knowledge, pp. 305–318.
Quinlan, J.R. 1993. C4.5: Programs for Machine Learning. San Mateo, CA: Morgan Kaufmann.
Quinlan, J.R. 1996. Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4:77–90.
Richeldi, M. and Rossotto, M. 1995. Class-driven statistical discretization of continuous attributes. In Proc. of European Conference on Machine Learning. Springer Verlag, pp. 335–338.
Schaffer, C. 1994. A conservation law for generalization performance. In Machine Learning: Proceedings of the Eleventh International Conference. Morgan Kaufmann, pp. 259–265.
Simon, H.A. 1981. The Sciences of the Artificial, 2nd edn. Cambridge, MA: MIT Press.
Shannon, C. and Weaver, W. 1949. The Mathematical Theory of Communication. Urbana: University of Illinois Press.
Thornton, C.J. 1992. Techniques of Computational Learning: An Introduction. Chapman and Hall.
Utgoff, P. 1989. Incremental induction of decision trees. Machine Learning, 4:161–186.
Van de Merckt, T. 1993. Decision trees in numerical attribute spaces. In Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence, pp. 1016–1021.
Weiss, S.M. and Indurkhya, N. 1994. Decision tree pruning: Biased or optimal? In Proceedings of the Twelfth National Conference on Artificial Intelligence. AAAI Press and MIT Press, pp. 626–632.
Wang, K. and Liu, B. 1998. Concurrent discretization of multiple attributes. In Pacific-Rim International Conference on AI, pp. 250–259.