Abstract
Numerical attributes often require special treatment in supervised learning and do not fit the learning scheme as well as one could hope. Nevertheless, they are common in practical tasks and, therefore, need to be taken into account. We characterize the well-behavedness of an evaluation function, a property that guarantees the optimal multi-partition of an arbitrary numerical domain to be defined on boundary points. Well-behavedness reduces the number of candidate cut points that need to be examined in multisplitting numerical attributes. Many commonly used attribute evaluation functions possess this property; we demonstrate that the cumulative functions Information Gain and Training Set Error as well as the non-cumulative functions Gain Ratio and Normalized Distance Measure are all well-behaved. We also devise a method of finding optimal multisplits efficiently by examining the minimum number of boundary-point combinations required to produce partitions that are optimal with respect to a cumulative and well-behaved evaluation function. Our empirical experiments validate the utility of optimal multisplitting: it consistently produces better partitions than alternative approaches do, and it requires only comparable time. In top-down induction of decision trees, the choice of evaluation function has a more decisive effect on the result than the choice of partitioning strategy; optimizing the value of most common attribute evaluation functions does not raise the accuracy of the produced decision trees. In our tests the construction time using optimal multisplitting was, on average, twice that required by greedy multisplitting, which in turn required on average twice the time of binary splitting.
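The two algorithmic ideas in the abstract can be illustrated concretely. The sketch below is ours, not the authors' implementation, and all function names are hypothetical. It first reduces a sorted numerical attribute to the blocks delimited by boundary points, discarding a cut between adjacent value groups only when both sides are pure and of the same class, and then runs a simple dynamic program over those blocks to minimize Training Set Error (a cumulative function, so interval scores simply add up) over all partitions into at most k intervals. It mirrors the flavor of the paper's quadratic-time scheme rather than reproducing it exactly.

```python
# Sketch only: hypothetical helpers, assuming (value, label) example pairs.
from collections import Counter

def blocks(examples):
    """Reduce sorted (value, label) pairs to maximal blocks delimited by
    boundary points. A cut between two adjacent value groups is kept as a
    candidate only if the groups are not both pure with the same class."""
    groups = []                                 # per-value class counts
    for v, c in sorted(examples):
        if groups and groups[-1][0] == v:
            groups[-1][1][c] += 1
        else:
            groups.append((v, Counter([c])))
    blks, cuts, prev_v = [groups[0][1]], [], groups[0][0]
    for v, dist in groups[1:]:
        if len(blks[-1]) == 1 == len(dist) and set(blks[-1]) == set(dist):
            blks[-1] += dist                    # same pure class: no boundary
        else:
            cuts.append((prev_v + v) / 2)       # boundary-point candidate
            blks.append(dist.copy())
        prev_v = v
    return blks, cuts

def optimal_multisplit(blks, k):
    """Minimum Training Set Error over all partitions of the blocks into
    at most k intervals, by dynamic programming on block prefixes."""
    m, INF = len(blks), float("inf")

    def err(i, j):                              # error of one interval
        d = sum((blks[t] for t in range(i, j + 1)), Counter())
        return sum(d.values()) - max(d.values())

    E = [[INF] * m for _ in range(k + 1)]
    E[1] = [err(0, j) for j in range(m)]
    for arity in range(2, k + 1):
        for j in range(m):
            best = min((E[arity - 1][i] + err(i + 1, j) for i in range(j)),
                       default=INF)
            E[arity][j] = min(best, E[arity - 1][j])  # fewer intervals allowed
    return E[k][m - 1]

data = [(1.0, "a"), (1.5, "a"), (2.0, "a"), (2.0, "b"), (3.0, "b"), (4.0, "b")]
blks, cuts = blocks(data)
print(cuts)                          # only boundary points survive: [1.75, 2.5]
print(optimal_multisplit(blks, 3))   # mixed middle block forces 1 error
```

The same dynamic program works for any cumulative function by replacing err; non-cumulative functions such as Gain Ratio require the additional machinery developed in the paper.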
Cite this article
Elomaa, T., Rousu, J. General and Efficient Multisplitting of Numerical Attributes. Machine Learning 36, 201–244 (1999). https://doi.org/10.1023/A:1007674919412