# Chapter 1: Statistical Models

• Peter K. Dunn
• Gordon K. Smyth
Part of the Springer Texts in Statistics book series (STS)

## Abstract

This chapter introduces the concept of a statistical model. One particular type of statistical model, the generalized linear model, is the focus of this book, so we begin with an introduction to statistical models in general. This allows us to introduce the necessary language, notation, and other important issues. We first discuss conventions for describing data mathematically (Sect. 1.2). We then highlight the importance of plotting data (Sect. 1.3) and explain how to numerically code non-numerical variables (Sect. 1.4) so that they can be used in mathematical models. Next, we introduce the two components of a statistical model used for understanding data (Sect. 1.5): the systematic and random components. The class of regression models, which includes all models in this book, is introduced in Sect. 1.6. Model interpretation is considered (Sect. 1.7), followed by a comparison of physical and statistical models (Sect. 1.8) that highlights their similarities and differences. The purpose of a statistical model is given (Sect. 1.9), followed by a description of the two criteria for evaluating statistical models: accuracy and parsimony (Sect. 1.10). The importance of understanding the limitations of statistical models is addressed (Sect. 1.11), including the differences between observational and experimental data. The generalizability of models is discussed (Sect. 1.12). Finally, we make some introductory comments about using R for statistical modelling (Sect. 1.13).
