Abstract
A linear model tree is a decision tree with a linear functional model in each leaf. Previous model tree induction algorithms have been batch techniques that operate on the entire training set, yet there are many situations in which an incremental learner is advantageous. In this article, a new batch model tree learner is described, with two alternative splitting rules and a stopping rule. An incremental algorithm is then developed that shares much of its structure with the batch version but is able to process examples one at a time. An online pruning rule is also developed. The incremental training time for an example is shown to depend only on the height of the tree induced so far, and not on the number of previous examples. The algorithms are evaluated empirically on a number of standard datasets, a simple test function and three dynamic domains ranging from a simple pendulum to a complex 13-dimensional flight simulator. The new batch algorithm is compared with the most recent batch model tree algorithms and is seen to perform favourably overall. The new incremental model tree learner compares well with an alternative online function approximator. In addition, it can sometimes perform almost as well as the batch model tree algorithms, highlighting the effectiveness of the incremental implementation.
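To make the key claim concrete — that each training example can be absorbed in time proportional only to the height of the tree — the skeleton of such an incremental learner can be sketched as follows. This is an illustrative simplification, not the authors' algorithm: the `Node`, `train_one` and `predict` names are hypothetical, the leaf models are updated with standard recursive least squares (RLS), and the split-selection and pruning machinery the article develops is omitted.

```python
import numpy as np

class Node:
    """One tree node: a linear model updated online via recursive
    least squares (RLS), plus an optional axis-aligned split."""
    def __init__(self, dim, lam=1.0):
        self.w = np.zeros(dim + 1)        # linear weights incl. bias term
        self.P = np.eye(dim + 1) / lam    # inverse-covariance estimate
        self.split_dim = None             # None => this node is a leaf
        self.split_val = 0.0
        self.left = self.right = None

    def rls_update(self, x, y):
        # Standard RLS update: O(dim^2) work, independent of how
        # many examples have been seen so far.
        z = np.append(x, 1.0)
        Pz = self.P @ z
        k = Pz / (1.0 + z @ Pz)           # gain vector
        self.w += k * (y - z @ self.w)
        self.P -= np.outer(k, Pz)

def train_one(root, x, y):
    """Process one example: update the model at every node on the
    path from root to leaf, so the per-example cost is proportional
    to the tree height, not the number of previous examples."""
    node = root
    while node is not None:
        node.rls_update(x, y)
        if node.split_dim is None:
            break
        node = node.left if x[node.split_dim] <= node.split_val else node.right

def predict(root, x):
    node = root
    while node.split_dim is not None:
        node = node.left if x[node.split_dim] <= node.split_val else node.right
    return np.append(x, 1.0) @ node.w
```

A full incremental model tree learner would, in addition, accumulate candidate-split statistics at each node and apply the article's splitting, stopping and online pruning rules; the sketch shows only the constant-time-per-level update that underlies the complexity result.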
Editor: Johannes Fürnkranz
Potts, D., Sammut, C. Incremental Learning of Linear Model Trees. Mach Learn 61, 5–48 (2005). https://doi.org/10.1007/s10994-005-1121-8