# General and Efficient Multisplitting of Numerical Attributes


## Abstract

Numerical attributes often require special treatment in supervised learning and do not fit the learning scheme as well as one could hope. Nevertheless, they are common in practical tasks and therefore need to be taken into account. We characterize the well-behavedness of an evaluation function: a property guaranteeing that the optimal multi-partition of an arbitrary numerical domain is defined on boundary points. Well-behavedness reduces the number of candidate cut points that need to be examined when multisplitting numerical attributes. Many commonly used attribute evaluation functions possess this property; we demonstrate that the cumulative functions Information Gain and Training Set Error as well as the non-cumulative functions Gain Ratio and Normalized Distance Measure are all well-behaved. We also devise a method of finding optimal multisplits efficiently by examining the minimum number of boundary point combinations required to produce partitions that are optimal with respect to a cumulative and well-behaved evaluation function. Our empirical experiments validate the utility of optimal multisplitting: it consistently produces better partitions than alternative approaches do, and it only requires comparable time. In top-down induction of decision trees, the choice of evaluation function has a more decisive effect on the result than the choice of partitioning strategy; optimizing the value of most common attribute evaluation functions does not raise the accuracy of the produced decision trees. In our tests the construction time using optimal multisplitting was, on average, twice that required by greedy multisplitting, which in turn required on average twice the time of binary splitting.
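The idea in the abstract can be illustrated with a small sketch. This is not the paper's algorithm, only a minimal dynamic-programming illustration under two assumptions: Training Set Error is used as the (cumulative) evaluation function, and candidate cuts are restricted to boundary points in the Fayyad–Irani sense (a cut between two adjacent value groups is skipped only when both groups are pure and of the same class). The function name `optimal_multisplit` and its interface are hypothetical.

```python
from collections import Counter
from itertools import groupby

def optimal_multisplit(values, labels, k):
    """Minimal Training Set Error of any partition of a numeric attribute
    into at most k intervals, cutting only at boundary points.
    Hypothetical sketch; runs in O(k * B^2) for B boundary points."""
    pairs = sorted(zip(values, labels))
    # Group examples that share the same attribute value.
    groups = [[lab for _, lab in g]
              for _, g in groupby(pairs, key=lambda p: p[0])]
    # A cut between adjacent groups is a boundary point unless both
    # groups are pure and belong to the same class.
    cuts = [i for i in range(1, len(groups))
            if not (len(set(groups[i - 1])) == 1 == len(set(groups[i]))
                    and groups[i - 1][0] == groups[i][0])]
    bounds = [0] + cuts + [len(groups)]

    def seg_err(a, b):
        # Training Set Error of one interval covering groups[a:b]:
        # examples not in the interval's majority class.
        flat = [lab for g in groups[a:b] for lab in g]
        return len(flat) - max(Counter(flat).values())

    # best[j][m]: min error of splitting groups[:bounds[j]] into m intervals.
    INF = float("inf")
    n = len(bounds)
    best = [[INF] * (k + 1) for _ in range(n)]
    best[0][0] = 0
    for j in range(1, n):
        for m in range(1, k + 1):
            best[j][m] = min(best[i][m - 1] + seg_err(bounds[i], bounds[j])
                             for i in range(j))
    return min(best[n - 1][m] for m in range(1, k + 1))
```

For instance, with values `[1, 2, 3, 4, 5, 6]` and labels `a, a, b, b, a, a`, only the cuts between values 2/3 and 4/5 are boundary points, and a three-interval partition at exactly those cuts reaches zero training-set error. Because the evaluation function is cumulative, the error of a prefix plus the error of the last interval composes additively, which is what makes the dynamic program valid.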

## References

- Auer, P. (1997). *Optimal splits of single attributes*. Unpublished manuscript, Institute for Theoretical Computer Science, Graz University of Technology.
- Auer, P., Holte, R. C., & Maass, W. (1995). Theory and application of agnostic PAC-learning with small decision trees. In A. Prieditis & S. Russell (Eds.), *Machine Learning: Proceedings of the Twelfth International Conference* (pp. 21–29). San Francisco, CA: Morgan Kaufmann.
- Birkendorf, A. (1997). On fast and simple algorithms for finding maximal subarrays and applications in learning theory. In S. Ben-David (Ed.), *Proceedings of the Third European Conference on Computational Learning Theory* (pp. 198–209). Lecture Notes in Artificial Intelligence (Vol. 1208). Heidelberg: Springer-Verlag.
- Breiman, L. (1996). Some properties of splitting criteria. *Machine Learning*, *24*, 41–47.
- Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). *Classification and regression trees*. Pacific Grove, CA: Wadsworth.
- Brodley, C. (1995). Automatic selection of split criterion during tree growing based on node location. In A. Prieditis & S. Russell (Eds.), *Machine Learning: Proceedings of the Twelfth International Conference* (pp. 73–80). San Francisco, CA: Morgan Kaufmann.
- Buntine, W., & Niblett, T. (1992). A further comparison of splitting rules for decision-tree induction. *Machine Learning*, *8*, 75–85.
- Catlett, J. (1991). On changing continuous attributes into ordered discrete attributes. In Y. Kodratoff (Ed.), *Proceedings of the Fifth European Working Session on Learning* (pp. 164–178). Lecture Notes in Computer Science (Vol. 482). Heidelberg: Springer-Verlag.
- Cestnik, B., Kononenko, I., & Bratko, I. (1987). Assistant 86: A knowledge-elicitation tool for sophisticated users. In I. Bratko & N. Lavrač (Eds.), *Progress in Machine Learning, Proceedings of the Second European Working Session on Learning* (pp. 31–45). Wilmslow: Sigma Press.
- Codrington, C. W., & Brodley, C. E. (1999). On the qualitative behavior of impurity-based splitting rules I: The minima-free property. *Machine Learning*, to appear.
- Dietterich, T., Kearns, M., & Mansour, Y. (1996). Applying the weak learning framework to understand and improve C4.5. In L. Saitta (Ed.), *Machine Learning: Proceedings of the Thirteenth International Conference* (pp. 96–104). San Francisco, CA: Morgan Kaufmann.
- Dougherty, J., Kohavi, R., & Sahami, M. (1995). Supervised and unsupervised discretization of continuous features. In A. Prieditis & S. Russell (Eds.), *Machine Learning: Proceedings of the Twelfth International Conference* (pp. 194–202). San Francisco, CA: Morgan Kaufmann.
- Elomaa, T. (1994). In defense of C4.5: Notes on learning one-level decision trees. In W. W. Cohen & H. Hirsh (Eds.), *Machine Learning: Proceedings of the Eleventh International Conference* (pp. 62–69). San Francisco, CA: Morgan Kaufmann.
- Elomaa, T., & Rousu, J. (1997). On the well-behavedness of important attribute evaluation functions. In G. Grahne (Ed.), *Proceedings of the Sixth Scandinavian Conference on Artificial Intelligence* (pp. 95–106). Frontiers in Artificial Intelligence and Applications (Vol. 40). Amsterdam: IOS Press; Tokyo: Ohmsha Ltd.
- Fayyad, U. M., & Irani, K. B. (1992a). The attribute selection problem in decision tree generation. *Proceedings of the Tenth National Conference on Artificial Intelligence* (pp. 104–110). Menlo Park, CA: AAAI Press.
- Fayyad, U. M., & Irani, K. B. (1992b). On the handling of continuous-valued attributes in decision tree generation. *Machine Learning*, *8*, 87–102.
- Fayyad, U. M., & Irani, K. B. (1993). Multi-interval discretization of continuous-valued attributes for classification learning. *Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence* (pp. 1022–1027). San Francisco, CA: Morgan Kaufmann.
- Fulton, T., Kasif, S., & Salzberg, S. (1995). Efficient algorithms for finding multi-way splits for decision trees. In A. Prieditis & S. Russell (Eds.), *Machine Learning: Proceedings of the Twelfth International Conference* (pp. 244–251). San Francisco, CA: Morgan Kaufmann.
- Holte, R. C. (1993). Very simple classification rules perform well on most commonly used data sets. *Machine Learning*, *11*, 63–90.
- Howard, P. G., & Vitter, J. S. (1992). Analysis of arithmetic coding for data compression. *Information Processing and Management*, *28*, 749–763.
- Iba, W. F., & Langley, P. (1992). Induction of one-level decision trees. In D. Sleeman & P. Edwards (Eds.), *Machine Learning: Proceedings of the Ninth International Workshop* (pp. 233–240). San Francisco, CA: Morgan Kaufmann.
- Kononenko, I. (1995). On biases in estimating multi-valued attributes. *Proceedings of the Fourteenth International Joint Conference on Artificial Intelligence* (pp. 1034–1040). San Francisco, CA: Morgan Kaufmann.
- Kononenko, I., Bratko, I., & Roškar, E. (1984). *Experiments in automatic learning of medical diagnostic rules* (Technical Report). Ljubljana: Josef Stefan Institute, Faculty of Electrical Engineering and Computer Science.
- Landeweerd, G. H., Timmers, T., Gelsema, E. S., Bins, M., & Halie, M. R. (1983). Binary tree versus single level tree classification of white blood cells. *Pattern Recognition*, *16*, 571–577.
- López de Màntaras, R. (1991). A distance-based attribute selection measure for decision tree induction. *Machine Learning*, *6*, 81–92.
- Lubinsky, D. J. (1995). Increasing the performance and consistency of classification trees by using the accuracy criterion at the leaves. In A. Prieditis & S. Russell (Eds.), *Machine Learning: Proceedings of the Twelfth International Conference* (pp. 371–377). San Francisco, CA: Morgan Kaufmann.
- Maass, W. (1994). Efficient agnostic PAC-learning with simple hypotheses. *Proceedings of the Seventh Annual ACM Conference on Computational Learning Theory* (pp. 67–75). New York: ACM Press.
- Merz, C. J., & Murphy, P. M. (1996). *UCI repository of machine learning databases* (http://www.ics.uci.edu/~mlearn/MLRepository.html). Irvine, CA: University of California, Department of Information and Computer Science.
- Mingers, J. (1989). An empirical comparison of selection measures for decision-tree induction. *Machine Learning*, *3*, 319–342.
- Quinlan, J. R. (1983). Learning efficient classification procedures and their application to chess end-games. In R. S. Michalski, J. G. Carbonell, & T. M. Mitchell (Eds.), *Machine learning: An artificial intelligence approach* (pp. 391–411). Palo Alto, CA: Tioga.
- Quinlan, J. R. (1986). Induction of decision trees. *Machine Learning*, *1*, 81–106.
- Quinlan, J. R. (1988). Decision trees and multi-valued attributes. In J. E. Hayes, D. Michie, & J. Richards (Eds.), *Machine intelligence (Vol. 11): Logic and the acquisition of knowledge* (pp. 305–318). Oxford: Oxford University Press.
- Quinlan, J. R. (1993). *C4.5: Programs for machine learning*. San Francisco, CA: Morgan Kaufmann.
- Quinlan, J. R. (1996). Improved use of continuous attributes in C4.5. *Journal of Artificial Intelligence Research*, *4*, 77–90.
- Quinlan, J. R., & Rivest, R. L. (1989). Inferring decision trees using the minimum description length principle. *Information and Computation*, *80*, 227–248.
- Rissanen, J. (1989). *Stochastic complexity in statistical inquiry*. River Edge, NJ: World Scientific.
- Rissanen, J. (1995). Stochastic complexity in learning. In P. Vitányi (Ed.), *Proceedings of the Second European Conference on Computational Learning Theory* (pp. 196–210). Lecture Notes in Computer Science (Vol. 904). Heidelberg: Springer-Verlag.
- Rousu, J. (1996). *Constructing decision trees and lists using the MDL principle* (in Finnish). Master's thesis, Department of Computer Science, University of Helsinki, Finland.
- Van de Merckt, T. (1993). Decision trees in numerical attribute spaces. *Proceedings of the Thirteenth International Joint Conference on Artificial Intelligence* (pp. 1016–1021). San Francisco, CA: Morgan Kaufmann.
- Wallace, C. S., & Freeman, P. R. (1987). Estimation and inference by compact coding. *Journal of the Royal Statistical Society (B)*, *49*, 240–265.
- Wallace, C. S., & Patrick, J. D. (1993). Coding decision trees. *Machine Learning*, *11*, 7–22.
- White, A. P., & Liu, W. Z. (1994). Bias in information-based measures in decision tree induction. *Machine Learning*, *15*, 321–329.
- Witten, I. H., Neal, R. M., & Cleary, J. G. (1987). Arithmetic coding for data compression. *Communications of the ACM*, *30*, 520–540.