Machine Learning, Volume 61, Issue 1–3, pp. 5–48

Incremental Learning of Linear Model Trees

Abstract

A linear model tree is a decision tree with a linear functional model in each leaf. Previous model tree induction algorithms have been batch techniques that operate on the entire training set; however, there are many situations in which an incremental learner is advantageous. In this article a new batch model tree learner is described, with two alternative splitting rules and a stopping rule. An incremental algorithm is then developed that has many similarities with the batch version but is able to process examples one at a time. An online pruning rule is also developed. The incremental training time for an example is shown to depend only on the height of the tree induced so far, and not on the number of previous examples. The algorithms are evaluated empirically on a number of standard datasets, a simple test function and three dynamic domains ranging from a simple pendulum to a complex 13-dimensional flight simulator. The new batch algorithm is compared with the most recent batch model tree algorithms and is seen to perform favourably overall. The new incremental model tree learner compares well with an alternative online function approximator. In addition, it can sometimes perform almost as well as the batch model tree algorithms, highlighting the effectiveness of the incremental implementation.
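The structure the abstract describes — a decision tree whose leaves each hold a linear model, trained one example at a time — can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's algorithm: the split structure here is fixed by hand rather than grown and pruned online, and each leaf is updated by recursive least squares, one standard way to get a per-example cost that depends on the model dimension and tree height but not on the number of previous examples. All class and method names (`Leaf`, `Split`, `update`, `predict`) are hypothetical.

```python
import numpy as np

class Leaf:
    """Leaf holding a linear model, fitted incrementally by recursive least squares."""
    def __init__(self, dim, prior=1e3):
        self.w = np.zeros(dim + 1)        # weights, last entry is the intercept
        self.P = np.eye(dim + 1) * prior  # running inverse-covariance estimate

    def update(self, x, y):
        """Fold in one (x, y) example in O(dim^2), independent of examples seen."""
        z = np.append(x, 1.0)             # augment with constant feature
        Pz = self.P @ z
        gain = Pz / (1.0 + z @ Pz)
        self.w += gain * (y - z @ self.w)
        self.P -= np.outer(gain, Pz)

    def predict(self, x):
        return np.append(x, 1.0) @ self.w

class Split:
    """Internal node: route an example by a single-feature threshold test."""
    def __init__(self, feature, threshold, left, right):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right

    def _child(self, x):
        return self.left if x[self.feature] <= self.threshold else self.right

    def update(self, x, y):
        # One comparison per level: cost grows with tree height only.
        self._child(x).update(x, y)

    def predict(self, x):
        return self._child(x).predict(x)

# Example: y = |x| is linear on each side of x = 0, so a single split suffices.
rng = np.random.default_rng(0)
tree = Split(0, 0.0, Leaf(1), Leaf(1))
for _ in range(500):
    x = rng.uniform(-1.0, 1.0, size=1)
    tree.update(x, abs(x[0]))
# Each leaf's linear model should now approximate y = -x or y = x.
```

The point of the sketch is the complexity claim: updating touches one root-to-leaf path and one fixed-size leaf model, so the per-example cost never grows with the size of the training history.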

Keywords

model trees, linear regression trees, online learning, incremental learning

Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

School of Computer Science and Engineering and ARC Centre of Excellence for Autonomous Systems, University of New South Wales, Sydney, Australia
