Abstract
Building a multiple linear regression (MLR) model from data is one of the most challenging regression problems. The "final full model" has response variable Y = t(Z), a constant x_1, and predictor variables x_2 = t_2(w_2, …, w_r), …, x_p = t_p(w_2, …, w_r), where the initial data consist of Z, w_2, …, w_r. Choosing the transformations t, t_2, …, t_p so that the final full model is a useful MLR approximation to the data can be difficult.
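As a minimal sketch of the idea, suppose the initial data are Z, w_2, w_3 and that the useful approximation happens to use t(Z) = log(Z), t_2(w_2) = log(w_2), and t_3(w_3) = sqrt(w_3). These particular transformation choices are hypothetical, picked only for illustration; fitting the resulting final full model by ordinary least squares might look like this:

```python
import numpy as np

# Simulate initial data Z, w2, w3 so that log(Z) really is linear
# in log(w2) and sqrt(w3) plus a small error (assumed true model).
rng = np.random.default_rng(0)
n = 200
w2 = rng.uniform(1.0, 10.0, n)
w3 = rng.uniform(1.0, 10.0, n)
Z = np.exp(1.0 + 0.5 * np.log(w2) + 0.3 * np.sqrt(w3)
           + rng.normal(0.0, 0.1, n))

# Final full model: Y = t(Z), x1 = constant, x2 = t2(w2), x3 = t3(w3).
Y = np.log(Z)
X = np.column_stack([np.ones(n), np.log(w2), np.sqrt(w3)])

# Ordinary least squares fit of Y on (x1, x2, x3).
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta)  # estimates should lie near the generating values (1.0, 0.5, 0.3)
```

In practice the transformations are unknown, and graphical tools such as response plots and transformation plots (discussed in the chapter) are used to choose them; the sketch only shows what fitting the model looks like once the choices are made.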
Copyright information
© Springer International Publishing AG 2017