Abstract
Let (X, Y) denote a random vector with decomposition Y = f(X) + ε, where f(x) = E[Y | X = x] is the regression of Y on X. In this paper we propose a test for the hypothesis that f is a linear combination of given linearly independent regression functions g_1, ..., g_d. The test is based on an estimator of the minimal L2-distance between f and the subspace spanned by the regression functions. More precisely, the method is based on the estimation of certain integrals of the regression function and therefore does not require an explicit estimation of the regression. For this reason the test proposed in this paper does not depend on the subjective choice of a smoothing parameter. Differences between the problem of regression diagnostics in the nonrandom and random design case are also discussed.
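To make the idea of estimating the minimal L2-distance without smoothing more concrete, the following is a minimal sketch and not necessarily the estimator used in the paper. It assumes the L2-distance is taken with respect to the distribution of X, so that the squared distance equals E[f(X)^2] - b' A^{-1} b, with A the Gram matrix of g_1, ..., g_d and b_j = E[f(X) g_j(X)] = E[Y g_j(X)]. It further assumes independent errors and a continuous f, so that E[f(X)^2] can be approximated by averaging products of responses whose X values are neighbours in the ordered sample. The function name `estimate_m2` and its arguments are hypothetical.

```python
import numpy as np

def estimate_m2(x, y, basis):
    """Sketch: estimate the minimal squared L2-distance between the
    regression function f and span{g_1, ..., g_d}, without smoothing.

    Assumptions (illustrative, not taken from the paper): scalar X,
    independent errors, continuous f, distance measured with respect
    to the distribution of X.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    order = np.argsort(x)              # order the sample by X
    x, y = x[order], y[order]
    n = y.size

    # n x d matrix with entries g_j(X_i)
    G = np.column_stack([g(x) for g in basis])

    A = G.T @ G / n                    # estimates E[g_j(X) g_k(X)]
    b = G.T @ y / n                    # estimates E[Y g_j(X)] = E[f(X) g_j(X)]

    # Products of neighbouring responses estimate E[f(X)^2]; the mean of
    # Y^2 would additionally pick up the error variance.
    ff = np.mean(y[:-1] * y[1:])

    # Squared norm of f minus squared norm of its projection on the span.
    return ff - b @ np.linalg.solve(A, b)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.uniform(0.0, 1.0, 2000)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 0.1, x.size)
    basis = [lambda t: np.ones_like(t), lambda t: t]   # straight-line model
    print(estimate_m2(x, y, basis))
```

Under the hypothesis the returned value should be close to zero, while a regression function outside the span gives a value bounded away from zero. Turning this quantity into a test would additionally require its limiting distribution, which the sketch does not attempt.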
Cite this article
Dette, H., Munk, A. A Simple Goodness-of-fit Test for Linear Models Under a Random Design Assumption. Annals of the Institute of Statistical Mathematics 50, 253–275 (1998). https://doi.org/10.1023/A:1003439114929