Abstract
This chapter introduces regression analysis, the cornerstone of hypothesis-driven inquiry about health care outcomes. Regression analysis is the quantitative framework that is most commonly used to establish whether outcomes are associated with individual, community, or environmental characteristics. It quantifies the strength of relationships in conceptual models of health care utilization and costs. It provides a framework for explaining why some people incur extremely high health care expenses and why others barely cost anything. It estimates effects of health interventions. And it enables prediction of future costs and outcomes. This chapter presumes a basic knowledge of the concepts of linear regression (also known as ordinary least squares regression). We do not focus on mathematical details; rather, we present the critical ideas that form a practical foundation for regression analysis using observational health care databases.
Notes
1. Many textbooks present this test in terms of the residual sum of squares (RSS) instead of the variance as used here. These are equivalent approaches since \(\mathrm{RSS} = (n - 1) \times \mathrm{Var}_{\mathrm{res}}\), where \(n\) is the number of observations in the data.
2. Here and everywhere else in this text, the “\(\log \)” function refers to the natural logarithm, which is sometimes denoted “\(\ln \).”
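The equivalence stated in the first note can be checked numerically. The following is a minimal sketch, not taken from the chapter: it assumes that the residual variance is the usual sample variance of the residuals (divisor n − 1) and that the fitted model includes an intercept, so that the residuals average to zero. The toy data, seed, and variable names are hypothetical.

```python
import random
import statistics

# Hypothetical toy data (not from the chapter): y is linear in x plus noise.
random.seed(0)
n = 50
x = [random.uniform(0, 10) for _ in range(n)]
y = [2.0 + 0.5 * xi + random.gauss(0, 1) for xi in x]

# Ordinary least squares with an intercept (closed form for one predictor).
x_bar = statistics.fmean(x)
y_bar = statistics.fmean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residuals from the fitted line.
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Residual sum of squares, and sample variance of residuals (divisor n - 1).
rss = sum(e ** 2 for e in residuals)
var_res = statistics.variance(residuals)

# The two quantities should agree up to floating-point error.
print(rss, (n - 1) * var_res)
```

Because an intercept forces the residuals to sum to zero, the sample variance of the residuals is their sum of squares divided by n − 1, which is why the two printed values match. Without an intercept the residual mean need not be zero and the identity would not hold in this form.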
Copyright information
© 2020 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Etzioni, R., Mandel, M., Gulati, R. (2020). Regression Analysis. In: Statistics for Health Data Science. Springer Texts in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-59889-1_3
DOI: https://doi.org/10.1007/978-3-030-59889-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59888-4
Online ISBN: 978-3-030-59889-1
eBook Packages: Mathematics and Statistics (R0)