Introduction to the Use of Regression Models in Epidemiology

Part of the Methods in Molecular Biology book series (MIMB, volume 471)


Regression modeling is one of the most important statistical techniques in analytical epidemiology. Regression models are used to investigate the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer. Multiple regression models yield adjusted effect estimates that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs, which makes them a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed, depending on the measurement scale of the response variable and the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for frequencies and rates. This chapter provides a nontechnical introduction to these regression models, with illustrative examples from cancer research.
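
To make the idea of an adjusted effect estimate concrete, the following is a minimal sketch, not taken from the chapter. It fits a logistic regression to grouped binary data with a plain Newton-Raphson loop in NumPy, first with the exposure alone and then with a confounder added. The counts, the variable names (`E`, `C`) and the helper `fit_logistic` are all hypothetical and were chosen so that the confounder alone explains the crude association.

```python
import numpy as np

# Hypothetical grouped data, invented for illustration (not from the chapter).
# Columns: exposure E, confounder C, number of subjects n, number of cases.
groups = np.array([
    # E, C,   n, cases
    [1, 0, 100,  10],
    [0, 0, 400,  40],
    [1, 1, 400, 200],
    [0, 1, 100,  50],
], dtype=float)

E, C, n, cases = groups.T


def fit_logistic(X, cases, n, iterations=25):
    """Fit a logistic model to grouped binomial data by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-X @ beta))             # fitted risks per group
        score = X.T @ (cases - n * p)                   # gradient of the log-likelihood
        info = X.T @ (X * (n * p * (1.0 - p))[:, None])  # Fisher information matrix
        beta = beta + np.linalg.solve(info, score)      # Newton-Raphson update
    return beta


intercept = np.ones_like(E)

# Crude model: intercept and exposure only.
beta_crude = fit_logistic(np.column_stack([intercept, E]), cases, n)
crude_or = float(np.exp(beta_crude[1]))

# Adjusted model: intercept, exposure, and the confounder.
beta_adj = fit_logistic(np.column_stack([intercept, E, C]), cases, n)
adjusted_or = float(np.exp(beta_adj[1]))

print(f"crude odds ratio:    {crude_or:.2f}")
print(f"adjusted odds ratio: {adjusted_or:.2f}")
```

In this constructed example the crude odds ratio is about 3.3, while the odds ratio adjusted for `C` is 1.0, because within each level of `C` the exposed and unexposed groups have the same risk. Real analyses would normally rely on an established statistics package rather than a hand-written fitting loop.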

Key words

Regression; linear regression; logistic regression; Poisson regression; Cox regression



Copyright information

© Humana Press, a part of Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Institute for Quality and Efficiency in Health Care, Cologne, Germany
