Abstract
This paper presents a new method for the supervised learning task commonly known as multiple regression. The main distinguishing feature of our technique is its multistrategy approach to this learning task. Before the actual regression modeling takes place, we use a clustering method to form sub-sets of the training data. This pre-clustering stage creates several training sub-samples containing cases that are "nearby" to each other from the perspective of the multidimensional input space. As our experiments show, supervised learning within each of these sub-samples is easier and more accurate. We call the resulting method clustered partial linear regression. Prediction with these models is preceded by a cluster membership query for each test case. The cluster membership probabilities of a test case are used as weights in an averaging process that computes the final prediction. This averaging process combines the predictions of the regression models associated with the clusters to which the test case may belong. We have tested this general multistrategy approach with several regression techniques and have observed significant accuracy gains on several data sets. We have also compared our method with bagging, which likewise obtains predictions through an averaging process; this experiment showed that the two methods are significantly different. Finally, we present a comparison of our method with several state-of-the-art regression methods, showing its competitiveness.
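The procedure described in the abstract lends itself to a compact illustration. The following Python sketch is not the authors' implementation: it assumes a Gaussian mixture model for the pre-clustering stage (so that cluster membership probabilities are available at prediction time) and plain least-squares regression within each cluster, rather than the partial linear models used in the paper; the class name ClusteredRegression and its parameters are hypothetical.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.mixture import GaussianMixture


class ClusteredRegression:
    """Minimal sketch: cluster the input space, fit one regression
    model per cluster, predict with a membership-weighted average."""

    def __init__(self, n_clusters=3, random_state=0):
        self.gmm = GaussianMixture(n_components=n_clusters,
                                   random_state=random_state)
        self.models = []

    def fit(self, X, y):
        # Pre-clustering stage: form training sub-samples of cases
        # that are "nearby" in the multidimensional input space.
        labels = self.gmm.fit_predict(X)
        self.models = [
            LinearRegression().fit(X[labels == k], y[labels == k])
            for k in range(self.gmm.n_components)
        ]
        return self

    def predict(self, X):
        # Cluster membership query: P(cluster k | x) for each test case.
        weights = self.gmm.predict_proba(X)                            # (n, K)
        preds = np.column_stack([m.predict(X) for m in self.models])   # (n, K)
        # Final prediction: average of the cluster models' predictions,
        # weighted by the membership probabilities.
        return (weights * preds).sum(axis=1)


# Example usage on synthetic data with two locally linear regimes:
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
y = np.where(X[:, 0] > 0, 3.0, -2.0) * X[:, 1] + rng.normal(scale=0.1, size=300)
model = ClusteredRegression(n_clusters=2).fit(X, y)
print(model.predict(X[:5]))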
Cite this article
Torgo, L., da Costa, J.P. Clustered Partial Linear Regression. Machine Learning 50, 303–319 (2003). https://doi.org/10.1023/A:1021770020534