
Machine Learning, Volume 36, Issue 3, pp. 201–244

General and Efficient Multisplitting of Numerical Attributes

  • Tapio Elomaa
  • Juho Rousu
Article

Abstract

Often in supervised learning numerical attributes require special treatment and do not fit the learning scheme as well as one could hope. Nevertheless, they are common in practical tasks and, therefore, need to be taken into account. We characterize the well-behavedness of an evaluation function, a property that guarantees that the optimal multiway partition of an arbitrary numerical domain is defined on boundary points. Well-behavedness reduces the number of candidate cut points that need to be examined when multisplitting numerical attributes. Many commonly used attribute evaluation functions possess this property; we demonstrate that the cumulative functions Information Gain and Training Set Error as well as the non-cumulative functions Gain Ratio and Normalized Distance Measure are all well-behaved. We also devise a method for finding optimal multisplits efficiently by examining the minimum number of boundary point combinations required to produce partitions that are optimal with respect to a cumulative and well-behaved evaluation function. Our empirical experiments validate the utility of optimal multisplitting: it consistently produces better partitions than alternative approaches do, and it requires only comparable time. In top-down induction of decision trees the choice of evaluation function has a more decisive effect on the result than the choice of partitioning strategy; optimizing the value of most common attribute evaluation functions does not raise the accuracy of the produced decision trees. In our tests the construction time using optimal multisplitting was, on average, twice that required by greedy multisplitting, which in turn required on average twice the time of binary splitting.

Keywords: supervised learning, numerical attributes, optimal partitions, evaluation functions
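
The abstract rests on two algorithmic ideas: restricting the candidate cut points of a numerical attribute to its boundary points, and then searching combinations of boundary points for a partition that is optimal under a cumulative evaluation function. The Python sketch below is an editor's illustration of these two ideas only, not the authors' algorithm from the article; it uses Training Set Error as the cumulative function, and the names boundary_cuts and optimal_k_split are hypothetical.

    from collections import Counter
    from itertools import groupby

    def boundary_cuts(values, labels):
        """Indices (into the sorted example sequence) of the boundary cut
        points of one numerical attribute.  A cut between two adjacent
        value groups is skipped only when both groups contain examples of
        one and the same class; for a well-behaved evaluation function
        these are the only cuts that need to be examined."""
        data = sorted(zip(values, labels))
        groups = [[l for _, l in g] for _, g in groupby(data, key=lambda x: x[0])]
        cuts, pos = [], 0
        for left, right in zip(groups, groups[1:]):
            pos += len(left)
            if not (len(set(left)) == 1 and set(left) == set(right)):
                cuts.append(pos)
        return data, cuts

    def optimal_k_split(values, labels, k):
        """Best partition of the sorted examples into at most k intervals,
        scored by Training Set Error (a cumulative function), found by
        dynamic programming over the boundary cut points only."""
        data, cuts = boundary_cuts(values, labels)
        labs = [l for _, l in data]
        ends = cuts + [len(labs)]            # every interval ends at a boundary or at n

        def interval_error(lo, hi):          # misclassifications when the interval
            c = Counter(labs[lo:hi])         # predicts its majority class
            return (hi - lo) - max(c.values())

        INF = float("inf")
        # best[j][e]: minimal error of splitting labs[0:ends[e]] into <= j intervals
        best = [[INF] * len(ends) for _ in range(k + 1)]
        back = [[None] * len(ends) for _ in range(k + 1)]
        for e, hi in enumerate(ends):
            best[1][e] = interval_error(0, hi)
        for j in range(2, k + 1):
            for e, hi in enumerate(ends):
                best[j][e] = best[j - 1][e]  # using fewer intervals is allowed
                for p in range(e):
                    cand = best[j - 1][p] + interval_error(ends[p], hi)
                    if cand < best[j][e]:
                        best[j][e], back[j][e] = cand, p
        chosen, j, e = [], k, len(ends) - 1  # recover the chosen cut positions
        while j > 1:
            p = back[j][e]
            if p is None:
                j -= 1
            else:
                chosen.append(ends[p])
                e, j = p, j - 1
        return best[k][-1], sorted(chosen)

For instance, optimal_k_split([1, 2, 3, 4, 5, 6], list("aabbaa"), 3) returns (0, [2, 4]): three intervals split the sorted class sequence a a | b b | a a with no training-set error, and both chosen cuts are boundary points. This quadratic-time search is cruder than the pruning developed in the article, but it depends on the same two properties of the evaluation function, well-behavedness and cumulativity.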


Copyright information

© Kluwer Academic Publishers 1999

Authors and Affiliations

  • Tapio Elomaa (1)
  • Juho Rousu (2)
  1. Department of Computer Science, University of Helsinki, Finland
  2. VTT Biotechnology and Food Research, VTT, Finland
