Building an MLR Model

David J. Olive

Abstract

Building a multiple linear regression (MLR) model from data is one of the most challenging regression problems. The "final full model" will have response variable Y = t(Z), a constant x1, and predictor variables x2 = t2(w2, ..., wr), ..., xp = tp(w2, ..., wr), where the initial data consist of Z, w2, ..., wr. Choosing the transformations t, t2, ..., tp so that the final full model is a useful MLR approximation to the data can be difficult.
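As a minimal sketch of the idea, suppose the response transformation is t = log and the predictor transformations are the identity, so that Y = log(Z) follows an additive MLR model in w2 and w3. The data-generating process and all numeric values below are invented for illustration; only the roles of Z, Y, w2, ..., wr come from the abstract.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Initial predictors w2, w3 (hypothetical values for the sketch).
w2 = rng.uniform(1.0, 5.0, n)
w3 = rng.uniform(1.0, 5.0, n)

# Suppose the raw response Z is multiplicative in the predictors, so the
# transformation t(Z) = log(Z) produces an additive MLR model:
#   Y = 1 + 0.5*w2 + 0.25*w3 + error.
Z = np.exp(1.0 + 0.5 * w2 + 0.25 * w3 + rng.normal(0.0, 0.1, n))

Y = np.log(Z)                                 # Y = t(Z)
X = np.column_stack([np.ones(n), w2, w3])     # constant x1, x2 = w2, x3 = w3

# Ordinary least squares fit of the final full model.
beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta)   # estimates should be near (1, 0.5, 0.25)
```

Fitting the same model to Z instead of Y = log(Z) would leave curved residual patterns, which is one way the need for a response transformation shows up in practice.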


Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

David J. Olive, Department of Mathematics, Southern Illinois University, Carbondale, USA
