Abstract
This paper studies model diagnostics for linear regression models. We propose two tree-based procedures, one to check the adequacy of the linear functional form and one to check the appropriateness of the homoscedasticity assumption. The proposed tree methods not only facilitate a natural assessment of the linear model, but also automatically provide clues for amending deficiencies. We explore and illustrate their use via both Monte Carlo studies and real data examples.
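The general idea of a tree-based residual check can be illustrated with a minimal sketch. This is not the procedure proposed in the paper, whose details are not given in the abstract; it is a simplified stand-in that assumes a single predictor, fits a least-squares line, and then searches for the one split of the predictor that best explains the residuals (as a rough check of functional form) or the squared residuals (as a rough check of constant variance). All function names, the `min_leaf` parameter, and the simulated data are hypothetical.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def sse(values):
    """Sum of squared deviations from the mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, target, min_leaf=5):
    """Best single split of `target` on x.

    Returns (cutpoint, fraction of the total SSE removed by the split).
    The cutpoint is None when no split reduces the SSE.
    """
    pairs = sorted(zip(x, target))
    t = [p[1] for p in pairs]
    total = sse(t)
    best_cut, best_gain = None, 0.0
    for i in range(min_leaf, len(pairs) - min_leaf + 1):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between tied predictor values
        gain = total - sse(t[:i]) - sse(t[i:])
        if gain > best_gain:
            best_cut = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_gain = gain
    return best_cut, (best_gain / total if total > 0 else 0.0)


def diagnose(x, y):
    """Split the residuals and the squared residuals of a fitted line."""
    a, b = fit_line(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    return {
        "functional_form": best_split(x, resid),
        "variance": best_split(x, [r * r for r in resid]),
    }


# Simulated data (an assumption for illustration): one truly linear
# response and one whose slope changes at x = 1.
random.seed(0)
x = [i / 100 for i in range(200)]
y_linear = [1 + 2 * xi + random.gauss(0, 0.1) for xi in x]
y_kinked = [
    1 + 2 * xi + (3 * (xi - 1) if xi > 1 else 0) + random.gauss(0, 0.1)
    for xi in x
]

print(diagnose(x, y_linear)["functional_form"])
print(diagnose(x, y_kinked)["functional_form"])
```

When the linear form is adequate, a split of the residuals should remove only a small share of their variation; a large share, as for the kinked response here, suggests structure the line has missed, and the cutpoint hints at where to amend the model. A full tree would recurse on each side of the split and would need a principled rule for deciding when a split is significant, neither of which this sketch attempts.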
Additional information
Editor: David Page.
About this article
Cite this article
Su, X., Tsai, CL. & Wang, M.C. Tree-structured model diagnostics for linear regression. Mach Learn 74, 111–131 (2009). https://doi.org/10.1007/s10994-008-5080-8
DOI: https://doi.org/10.1007/s10994-008-5080-8