Regression Analysis

Chapter in Statistics for Health Data Science, part of the book series Springer Texts in Statistics (STS)

Abstract

This chapter introduces regression analysis, the cornerstone of hypothesis-driven inquiry about health care outcomes. Regression analysis is the quantitative framework that is most commonly used to establish whether outcomes are associated with individual, community, or environmental characteristics. It quantifies the strength of relationships in conceptual models of health care utilization and costs. It provides a framework for explaining why some people incur extremely high health care expenses and why others barely cost anything. It estimates effects of health interventions. And it enables prediction of future costs and outcomes. This chapter presumes a basic knowledge of the concepts of linear regression (also known as ordinary least squares regression). We do not focus on mathematical details; rather, we present the critical ideas that form a practical foundation for regression analysis using observational health care databases.

Notes

  1. Many textbooks present this test in terms of the residual sum of squares (RSS) instead of the variance used here. The two approaches are equivalent because \(\mathrm{RSS} = (n - 1) \times \mathrm{Var}_{\mathrm{res}}\), where \(n\) is the number of observations in the data and \(\mathrm{Var}_{\mathrm{res}}\) is the variance of the residuals.

  2. Here and everywhere else in this text, the “\(\log\)” function refers to the natural logarithm, which is sometimes denoted “\(\ln\).”
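The equivalence in note 1 can be checked numerically. The following is a minimal sketch, not the chapter's own code: it simulates hypothetical data, fits an ordinary least squares regression with NumPy, and compares RSS with \((n - 1) \times \mathrm{Var}_{\mathrm{res}}\). It assumes that \(\mathrm{Var}_{\mathrm{res}}\) is the sample variance of the residuals with divisor \(n - 1\) (as computed by R's `var()` or NumPy's `np.var(..., ddof=1)`) and that the model includes an intercept. The sample size, coefficients, and noise level are arbitrary illustrative choices.

```python
import numpy as np

# Hypothetical simulated data (illustration only): one predictor plus noise.
rng = np.random.default_rng(seed=0)
n = 200
x = rng.normal(size=n)
y = 2.0 + 0.5 * x + rng.normal(scale=1.5, size=n)

# Design matrix with an intercept column; with an intercept,
# the OLS residuals sum to zero.
X = np.column_stack([np.ones(n), x])

# Least squares fit; lstsq returns (solution, residuals, rank, singular values).
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta

# Residual sum of squares.
rss = np.sum(residuals ** 2)

# Sample variance of the residuals with the n - 1 divisor (ddof=1).
var_res = np.var(residuals, ddof=1)

# The two quantities agree up to floating-point error.
print(rss, (n - 1) * var_res)
```

The intercept matters here: the sample variance subtracts the residual mean, so \((n - 1) \times \mathrm{Var}_{\mathrm{res}} = \mathrm{RSS} - n\bar{e}^2\), where \(\bar{e}\) is the mean residual. With an intercept, \(\bar{e} = 0\) and the identity holds exactly; in a model without an intercept the residual mean need not be zero and the identity generally fails.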

Copyright information

© 2020 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Etzioni, R., Mandel, M., Gulati, R. (2020). Regression Analysis. In: Statistics for Health Data Science. Springer Texts in Statistics. Springer, Cham. https://doi.org/10.1007/978-3-030-59889-1_3
