Introduction to the Use of Regression Models in Epidemiology

Bender, Ralf

doi:10.1007/978-1-59745-416-2_9

Part of the book series: Methods in Molecular Biology (MIMB, volume 471)

Summary

Regression modeling is one of the most important statistical techniques used in analytical epidemiology. By means of regression models, the effect of one or several explanatory variables (e.g., exposures, subject characteristics, risk factors) on a response variable such as mortality or cancer can be investigated. Multiple regression models yield adjusted effect estimates that take the effect of potential confounders into account. Regression methods can be applied in all epidemiologic study designs, which makes them a universal tool for data analysis in epidemiology. Different kinds of regression models have been developed depending on the measurement scale of the response variable and on the study design. The most important methods are linear regression for continuous outcomes, logistic regression for binary outcomes, Cox regression for time-to-event data, and Poisson regression for counts and rates. This chapter provides a nontechnical introduction to these regression models, with illustrative examples from cancer research.
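As a minimal illustration of how the effect estimates from these four model types are commonly reported, the sketch below converts a fitted exposure coefficient and its standard error into the corresponding effect measure with a Wald-type confidence interval. It assumes the usual links (identity for linear regression, logit for logistic regression, log hazard for Cox regression, log for Poisson regression); the function name `effect_estimate`, the `EFFECT_MEASURE` table, and the example numbers are hypothetical and are not taken from the chapter.

```python
import math

# Effect measure reported for one unit of exposure, by model type.
# Assumes the usual links: identity (linear), logit (logistic),
# log hazard (Cox), and log (Poisson). Hypothetical helper, not from the chapter.
EFFECT_MEASURE = {
    "linear": "mean difference",
    "logistic": "odds ratio",
    "cox": "hazard ratio",
    "poisson": "rate ratio",
}


def effect_estimate(model, beta, se, z=1.96):
    """Return (measure, estimate, lower, upper) for one regression coefficient.

    model: one of the keys of EFFECT_MEASURE
    beta:  estimated exposure coefficient (an adjusted estimate if
           potential confounders were included in the model)
    se:    standard error of beta
    z:     normal quantile for the Wald interval (1.96 gives roughly 95%)
    """
    if model not in EFFECT_MEASURE:
        raise ValueError(f"unknown model type: {model!r}")
    # Wald interval on the scale of the linear predictor.
    lower, upper = beta - z * se, beta + z * se
    if model == "linear":
        # Identity link: the coefficient is itself the change in the mean.
        return EFFECT_MEASURE[model], beta, lower, upper
    # Log-type links: exponentiate the estimate and both interval limits.
    return (EFFECT_MEASURE[model], math.exp(beta),
            math.exp(lower), math.exp(upper))


if __name__ == "__main__":
    # Hypothetical coefficient and standard error, for illustration only.
    for m in ("linear", "logistic", "cox", "poisson"):
        name, est, lo, hi = effect_estimate(m, 0.5, 0.2)
        print(f"{m:9s} {name:16s} {est:.3f} ({lo:.3f} to {hi:.3f})")
```

Exponentiating the interval limits, rather than building a symmetric interval around the ratio itself, keeps the interval consistent with the log scale on which the model is fitted, which is why ratio intervals are usually asymmetric.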

Author information

Authors and Affiliations

Institute for Quality and Efficiency in Health Care, Cologne, Germany
Ralf Bender

Editor information

Editors and Affiliations

Division of Cancer Control and Population Sciences, Bethesda, Maryland 20892, USA
Mukesh Verma, PhD

About this protocol

Cite this protocol

Bender, R. (2009). Introduction to the Use of Regression Models in Epidemiology. In: Verma, M. (ed.) Cancer Epidemiology. Methods in Molecular Biology, vol 471. Humana Press. https://doi.org/10.1007/978-1-59745-416-2_9

DOI: https://doi.org/10.1007/978-1-59745-416-2_9
Publisher Name: Humana Press
Print ISBN: 978-1-58829-987-1
Online ISBN: 978-1-59745-416-2
eBook Packages: Springer Protocols