Point and interval estimates of partial population attributable risks in cohort studies: examples and software

  • Original Paper
Cancer Causes & Control

Abstract

The concept of the population attributable risk (PAR) percent has found widespread application in public health research. This quantity describes the proportion of a disease which could be prevented if a specific exposure were to be eliminated from a target population. We present methods for obtaining point and interval estimates of partial PARs, where the impact on disease burden for some presumably modifiable determinants is estimated in, and applied to, a cohort study. When the disease is multifactorial, the partial PAR must, in general, be used to quantify the proportion of disease which can be prevented if a specific exposure or group of exposures is eliminated from a target population, while the distribution of other modifiable and non-modifiable risk factors is unchanged. The methods are illustrated in a study of risk factors for bladder cancer incidence (Michaud DS et al., New England J Med 340 (1999) 1390). A user-friendly SAS macro implementing the methods described in this paper is available via the worldwide web.


References

  1. Levin M (1953) The occurrence of lung cancer in man. Acta Unio Inter Contra Cancrum 9:531–541
  2. Walter SD (1975) The distribution of Levin’s measure of attributable risk. Biometrika 62:371–374
  3. Uter W, Pfahlberg A (1999) The concept of attributable risk in epidemiological practice. Biom J 41(8):985–993
  4. Miettinen O (1974) Proportion of disease caused or prevented by a given exposure, trait or intervention. Am J Epidemiol 99(5):325–332
  5. Cole P, MacMahon B (1971) Attributable risk percent in case-control studies. Br J Prevent Social Med 25(4):242
  6. Last JM (1983) A dictionary of epidemiology. Oxford University Press, New York
  7. Bruzzi P, Green SB, Byar DP, Brinton LA, Schairer C (1985) Estimating the population attributable risk for multiple risk-factors using case-control data. Am J Epidemiol 122(5):904–913
  8. Benichou J (2001) A review of adjusted estimators of attributable risks. Stat Methods Med Res 10:195–216
  9. D’Agostino RB, Lee M-L, Belanger AJ, Cupples LA, Anderson K, Kannel WB (1990) Relation of pooled logistic regression to time dependent Cox regression analysis: the Framingham heart study. Stat Med 9:1501–1515
  10. Walter SD (1980) Prevention for multifactorial diseases. Am J Epidemiol 112(3):409–416
  11. Gefeller O (1992) Comparison of adjusted attributable risk estimators. Stat Med 11(16):2083–2091
  12. Morgenstern H (1983) Morgenstern corrects a conceptual error (Letter). Am J Publ Health 73(6):703
  13. Korn EL, Graubard BI, Midthune D (1997) Time-to-event analysis of longitudinal follow-up of a survey: choice of the time-scale. Am J Epidemiol 145(1):72–80
  14. Basu S, Landis JR (1995) Model-based estimation of population attributable risk under cross-sectional sampling. Am J Epidemiol 142(12):1338–1343
  15. Greenland S, Drescher K (1993) Maximum-likelihood-estimation of the attributable fraction from logistic-models. Biometrics 49(3):865–872
  16. Benichou J, Chow WH, McLaughlin JK, Mandel JS, Fraumeni JF (1998) Population attributable risk of renal cell cancer in Minnesota. Am J Epidemiol 148(5):424–430
  17. Wilson PD, Loffredo CA, Correa-Villasenor A, Ferencz C (1998) Attributable fraction for cardiac malformations. Am J Epidemiol 148(5):414–423
  18. Wacholder S, Benichou J, Heineman E (1994) Attributable risk—advantages of a broad definition of exposure (vol 140, pg 303, 1994). Am J Epidemiol 140(7):668
  19. Benichou J (1991) Methods of adjustment for estimating the attributable risk in case-control studies; a review. Stat Med 10:1753–1773
  20. Michaud DS, Spiegelman D, Clinton SK, Willett WC, Giovannucci EL (1999) Total fluid intake, specific beverages and bladder cancer risk in the health professional follow-up study. New England J Med 340:1390–1397
  21. Therneau TM, Grambsch PM (2000) Modeling survival data: extending the Cox model. Springer-Verlag, New York, New York
  22. Mezzetti M, Ferraroni M, Decarli A, LaVecchia C, Benichou J (1996) Software for attributable risk and confidence interval estimation in case-control studies. Comput Biomed Res 29(1):63–75
  23. Benichou J, Gail MH (1990) Variance calculations and confidence-intervals for estimates of the attributable risk based on logistic-models. Biometrics 46(4):991–1003
  24. Miettinen OS (1974) Proportion of disease caused or prevented by a given exposure, trait or intervention. Am J Epidemiol 99(5):325–332
  25. Third National Health and Nutrition Examination Survey 1989–1994 (1996) NHANES II laboratory data file (CD-ROM). US Department of Health and Human Services National Center for Health Statistics, Hyattsville, MD
  26. National Center for Health Statistics (1993) 1987 National health interview survey. Government Printing Office
  27. Graubard BI, Fears TR (2005) Standard errors for attributable risk for simple and complex sample designs. Biometrics 61(3):847–855
  28. Rich-Edwards JW, Spiegelman D, Garland M, Hertzmark E, Hunter DJ, Colditz GA et al (2002) Physical activity, body mass index and ovulatory disorder infertility. Epidemiology 13:184–190
  29. Morgenstern H, Bursic E (1982) A method for using epidemiologic data to estimate the potential impact of an intervention on the health status of a target population. J Community Health 7:292–309
  30. Drescher K, Becher H (1997) Estimating the generalized impact fraction from case-control data. Biometrics 53(3):1170–1176
  31. Leung HM, Kupper LL (1981) Comparisons of confidence-intervals for attributable risk. Biometrics 37(2):293–302

Acknowledgments

Supported by a grant from the National Institutes of Health (CA55075).

Author information

Corresponding author

Correspondence to D. Spiegelman.

Appendices

Appendix 1: Derivation of the \({Var\left( {\widehat{PAR}_p } \right)}\)

$$ Var\left(\widehat{PAR}_p\right) = Var\left(\frac{\sum\nolimits_{t=1}^{T} \hat{p}_{.t} \widehat{RR}_{2t}}{\sum\nolimits_{s=1}^{S}\sum\nolimits_{t=1}^{T} \hat{p}_{st} \widehat{RR}_{1s} \widehat{RR}_{2t}}\right) = Var\left(f(\varvec{\hat{p}},\varvec{\widehat{RR}}_1,\varvec{\widehat{RR}}_2)\right) $$
$$ \approx \left.\left[\frac{\partial f\left(\varvec{p},\varvec{RR}_1,\varvec{RR}_2\right)}{\partial \varvec{p}}\right]^{\prime}\right|_{\varvec{\hat{p}},\varvec{\widehat{RR}}} Var\left(\varvec{\hat{p}}\right) \left.\left[\frac{\partial f\left(\varvec{p},\varvec{RR}_1,\varvec{RR}_2\right)}{\partial \varvec{p}}\right]\right|_{\varvec{\hat{p}},\varvec{\widehat{RR}}} + \left.\left[\frac{\partial f\left(\varvec{p},\varvec{RR}_1,\varvec{RR}_2\right)}{\partial \left(\varvec{RR}_1^{\prime},\varvec{RR}_2^{\prime}\right)^{\prime}}\right]^{\prime}\right|_{\varvec{\hat{p}},\varvec{\widehat{RR}}} Var\left[\left(\varvec{\widehat{RR}}_1^{\prime},\varvec{\widehat{RR}}_2^{\prime}\right)^{\prime}\right] \left.\left[\frac{\partial f\left(\varvec{p},\varvec{RR}_1,\varvec{RR}_2\right)}{\partial \left(\varvec{RR}_1^{\prime},\varvec{RR}_2^{\prime}\right)^{\prime}}\right]\right|_{\varvec{\hat{p}},\varvec{\widehat{RR}}} $$
(6)

where

$$ \frac{\partial f\left(\varvec{p,RR}_1, \varvec{RR}_2\right)}{\partial p_{st} }=\frac{b\,RR_{2t} -a\,RR_{2t} RR_{1s} }{b^2}, \quad \frac{\partial f\left(\varvec{p,RR}_1, \varvec{RR}_2\right)}{\partial RR_{1s} }=-\frac{a\sum\nolimits_{t=1}^T {p_{st} RR_{2t} } } {b^2}, $$
$$ \frac{\partial f\left(\varvec{p,RR}_1, \varvec{RR}_2\right)}{\partial RR_{2t} }=\frac{bp_{.t} -a\sum\nolimits_{s=1}^S {p_{st} RR_{1s} } }{b^2}, \quad a=\sum\limits_{t=1}^T {p_{.t} RR_{2t} } , \quad b=\sum\limits_{s=1}^S {\sum\limits_{t=1}^T {p_{st} RR_{1s} RR_{2t} } } , $$

where \(\sum\nolimits_{s} p_{st} = p_{.t}\), and \({\varvec{RR}_1 =\left( RR_{1,1}, RR_{1,2}, \ldots, RR_{1,S} \right)^\prime }\) and \({\varvec{RR}_2 =\left( RR_{2,1}, RR_{2,2}, \ldots, RR_{2,T} \right)^\prime }\) are the vectors of the relative risks corresponding to the modifiable and unmodifiable risk factors, respectively.
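The partial derivatives above can be checked numerically. The following is a minimal Python sketch, not the authors' SAS macro: it assumes the partial PAR has the form \(1 - a/b\), which is consistent with the ratio \(f = a/b\) whose variance is taken in Eq. 6, and it uses a hypothetical 2 × 2 table of joint prevalences and hypothetical relative risks. The names `par_ratio` and `gradients` are illustrative only.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def par_ratio(p: Matrix, rr1: List[float], rr2: List[float]) -> Tuple[float, float, float]:
    """Return (a, b, f) with f = a / b, following the definitions of a and b above."""
    S, T = len(rr1), len(rr2)
    # a = sum_t p_.t RR_2t, written as a double sum because p_.t = sum_s p_st
    a = sum(p[s][t] * rr2[t] for s in range(S) for t in range(T))
    b = sum(p[s][t] * rr1[s] * rr2[t] for s in range(S) for t in range(T))
    return a, b, a / b


def gradients(p: Matrix, rr1: List[float], rr2: List[float]):
    """Analytic partial derivatives of f with respect to p_st, RR_1s and RR_2t."""
    S, T = len(rr1), len(rr2)
    a, b, _ = par_ratio(p, rr1, rr2)
    p_dot = [sum(p[s][t] for s in range(S)) for t in range(T)]
    d_p = [[(b * rr2[t] - a * rr2[t] * rr1[s]) / b ** 2 for t in range(T)]
           for s in range(S)]
    d_rr1 = [-a * sum(p[s][t] * rr2[t] for t in range(T)) / b ** 2
             for s in range(S)]
    d_rr2 = [(b * p_dot[t] - a * sum(p[s][t] * rr1[s] for s in range(S))) / b ** 2
             for t in range(T)]
    return d_p, d_rr1, d_rr2


# Hypothetical inputs: S = 2 modifiable strata, T = 2 background strata.
p = [[0.40, 0.20],
     [0.25, 0.15]]          # joint prevalences p_st, summing to 1
rr1 = [1.0, 2.0]            # RR_1s, reference stratum first
rr2 = [1.0, 1.5]            # RR_2t, reference stratum first

a, b, f = par_ratio(p, rr1, rr2)
par_p = 1.0 - f             # assumed form of the partial PAR
d_p, d_rr1, d_rr2 = gradients(p, rr1, rr2)
print(f"a={a:.4f}  b={b:.4f}  PAR_p={par_p:.4f}")
```

Comparing each analytic derivative with a central finite difference of `par_ratio` is a cheap way to catch transcription errors before the gradients are used in Eq. 6.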

Under the proportional hazards model, \({RR_{1s} =e^{\varvec {\beta}_1^\prime \varvec {e}_s}}\), where \({\varvec e}_s\) is the vector of values of the binary indicators corresponding to the sth combination of modifiable exposure variables, of which there are S combinations, and \({RR_{2t} =e^{\varvec {\beta}_2^\prime \varvec {c}_t}}\), where \({\varvec c}_t\) is the vector of values of the tth combination of unmodifiable background risk factors, of which there are T combinations. Then, \({Var\left[ \left( \varvec{\widehat{RR}}^{\prime}_1, \varvec{\widehat{RR}}^{\prime}_2\right)^\prime \right]=D\Sigma {D}^\prime}\), where \({\Sigma =Var\left[ \left( \varvec{\hat{\beta}}_1^\prime, \varvec{\hat{\beta}}_2^\prime \right)^\prime \right]}\), and \(D = [D_{uv}]\), \(u = 1,\ldots, S + T\), \(v = 1,\ldots, p_1 + p_2\), with \(p_1\) and \(p_2\) the lengths of \({\varvec \beta}_1\) and \({\varvec \beta}_2\), where

$$ D_{uv} =\left\{ {\begin{array}{l} \frac{\partial RR_{1,u} }{\partial \beta _{1,v} }\;\quad if\;u\leqslant S\;and\;v\leqslant p_1 \\ \frac{\partial RR_{2,u-S} }{\partial \beta _{2,v-p_1 } }\;\quad if\;u > S\;and\;v> p_1 \\ 0\;\quad \quad if\;u\leqslant S\;and\;v> p_1 \\ 0\;\quad \quad if\;u > S\;and\;v\leqslant p_1 \\ \end{array}} \right. $$

Under the proportional hazards model, \({\frac{\partial RR_{1,u} }{\partial \beta_{1,v} }=e_{uv} e^{\varvec {\beta}_1^\prime \varvec {e}_u}}\), where \(e_{uv}\) is the vth element of the vector \({\varvec e}_u\), and \({\frac{\partial RR_{2,u-S} }{\partial \beta_{2,v-p_1}}=c_{u-S,v-p_1 } e^{\varvec {\beta}_2^\prime \varvec {c}_{u-S}}}\), where \(c_{u-S,v-p_1}\) is the \((v-p_1)\)th element of the vector \({\varvec c}_{u-S}\).
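The block structure of D and the product \(D\Sigma D^\prime\) can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the implementation used in the SAS macro: the coefficients, design rows and covariance matrix are hypothetical, and the helper names `rr_jacobian` and `sandwich` are invented for this example.

```python
import math
from typing import List

Matrix = List[List[float]]


def rr_jacobian(beta1: List[float], beta2: List[float],
                e_rows: Matrix, c_rows: Matrix) -> Matrix:
    """Build the (S+T) x (p1+p2) matrix D of derivatives of RR with respect to beta.

    Rows 1..S hold e_uv * exp(beta1' e_u) in the first p1 columns and zeros
    elsewhere; rows S+1..S+T hold c * exp(beta2' c) in the last p2 columns.
    """
    p1, p2 = len(beta1), len(beta2)
    D: Matrix = []
    for e in e_rows:
        rr = math.exp(sum(bv * ev for bv, ev in zip(beta1, e)))
        D.append([ev * rr for ev in e] + [0.0] * p2)
    for c in c_rows:
        rr = math.exp(sum(bv * cv for bv, cv in zip(beta2, c)))
        D.append([0.0] * p1 + [cv * rr for cv in c])
    return D


def sandwich(D: Matrix, sigma: Matrix) -> Matrix:
    """Return D * sigma * D', the delta-method covariance of the RR estimates."""
    rows, cols = len(D), len(sigma)
    DS = [[sum(D[i][k] * sigma[k][j] for k in range(cols)) for j in range(cols)]
          for i in range(rows)]
    return [[sum(DS[i][k] * D[j][k] for k in range(cols)) for j in range(rows)]
            for i in range(rows)]


# Hypothetical inputs: one binary modifiable exposure (S = 2 combinations)
# and one binary background factor (T = 2 combinations).
beta1 = [math.log(2.0)]
beta2 = [math.log(1.5)]
e_rows = [[0.0], [1.0]]
c_rows = [[0.0], [1.0]]
sigma = [[0.04, 0.01],
         [0.01, 0.09]]      # hypothetical Var of (beta1_hat, beta2_hat)

D = rr_jacobian(beta1, beta2, e_rows, c_rows)
V = sandwich(D, sigma)
```

The rows of D for the reference combinations are zero, so the corresponding relative risks (fixed at 1) contribute no variance, while the off-diagonal block of V carries the covariance between the modifiable and background coefficients.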

The variance of \({\widehat{PAR}_p }\) is estimated by replacing, in Eq. 6, \({\left( \varvec{p},\varvec{RR} \right)}\) with \({\left( \varvec{\hat {p}},\varvec{\widehat {RR}} \right)}\) and Σ with the estimated variance-covariance matrix of \({\left( \varvec{\hat{\beta}}_1^\prime, \varvec{\hat{\beta}}_2^\prime \right)^\prime}\) obtained from the pooled logistic regression model or Poisson regression model used to fit Eq. 5. In a cohort study, the multinomial distribution is used to estimate the variance-covariance matrix of \({\varvec{\hat {p}}}\), where \(\varvec{p} = (p_{1,1}, p_{1,2}, \ldots, p_{S,T})^\prime\), with

$$ Cov(\hat {p}_{st} ,\hat {p}_{uv} )=\left\{ {\begin{array}{ll} \hat {p}_{st} (1-\hat {p}_{st} )/n & if\;s=u\;\& \;t=v \\ -\hat {p}_{st} \hat {p}_{uv} /n & if\;s\ne u\;or\;t\ne v \\ \end{array}} \right. $$

where n is the total number of units of person-time of follow-up observed.
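As an illustration of the multinomial covariance and of the first (prevalence) term of Eq. 6, the short Python sketch below builds the covariance matrix of the flattened prevalence vector and evaluates a quadratic form with a gradient vector. The prevalences, the person-time total `n` and the gradient `g` are hypothetical values chosen for the example, and the function names are illustrative.

```python
from typing import List

Matrix = List[List[float]]


def multinomial_cov(p_flat: List[float], n: float) -> Matrix:
    """Covariance of multinomial proportions: p(1-p)/n on the diagonal, -p_i p_j/n off it."""
    k = len(p_flat)
    return [[(p_flat[i] * (1.0 - p_flat[i]) if i == j else -p_flat[i] * p_flat[j]) / n
             for j in range(k)]
            for i in range(k)]


def quadratic_form(g: List[float], V: Matrix) -> float:
    """Return g' V g, the contribution of the prevalences to the delta-method variance."""
    k = len(g)
    return sum(g[i] * V[i][j] * g[j] for i in range(k) for j in range(k))


# Hypothetical inputs: S*T = 4 cells of joint prevalence and n units of person-time.
p_flat = [0.40, 0.20, 0.25, 0.15]
n = 10000.0
g = [0.2, 0.3, -0.25, -0.4]     # hypothetical gradient of f with respect to p

V_p = multinomial_cov(p_flat, n)
var_p_term = quadratic_form(g, V_p)
```

Because the prevalences sum to one, every row of `V_p` sums to zero, so a constant gradient vector contributes no variance; only contrasts between cells matter.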

In the spirit of the transformation suggested by Leung and Kupper [31], to improve the asymptotic behavior of the 95% confidence intervals for \({\widehat{PAR}_p }\) and to ensure that the confidence intervals remain within the range of −100% to 100%, it is useful to calculate the confidence intervals using Fisher’s Z transformation, that is,

$$ \widehat{Var}\left[ {Fisherz \left( {\widehat{PAR}_p } \right)} \right]\approx \frac{1}{\left[ \left( 1 + \widehat{PAR}_p \right) \left( {1-\widehat{PAR}_p } \right)\right]^2}\widehat{Var}\left( {\widehat{PAR}_p } \right) $$

Then the 95% confidence interval for \({\widehat{PAR}_p }\) is estimated as

$$ \frac{e^{2\left[Fisherz(\widehat{PAR}_p) \pm 1.96\sqrt {\widehat {Var}[Fisherz(\widehat{PAR}_p )]}\right]}-1}{e^{2\left[Fisherz(\widehat{PAR}_p) \pm 1.96\sqrt {\widehat {Var}[Fisherz(\widehat{PAR}_p)]}\right] }+1}, $$

where \(Fisherz({\widehat{PAR}_p })= \log \left[\sqrt{\frac{1+{\widehat{PAR}_p }}{1-{\widehat{PAR}_p }}}\right]\).
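The interval calculation can be sketched in a few lines of Python. This is an illustrative sketch rather than the code of the SAS macro: the interval is formed on the Fisher z scale, centered at the transformed estimate, and mapped back with the inverse transformation. It uses the identities \(\log\sqrt{(1+x)/(1-x)} = \operatorname{atanh}(x)\) and \((e^{2w}-1)/(e^{2w}+1) = \tanh(w)\). The function name `fisher_z_ci` and the numerical inputs of the first two calls are hypothetical.

```python
import math
from typing import Tuple


def fisher_z_ci(par_hat: float, var_par: float, z_crit: float = 1.96) -> Tuple[float, float]:
    """Confidence interval for a partial PAR via Fisher's z transformation."""
    z = math.atanh(par_hat)                       # log sqrt((1 + x) / (1 - x))
    # Delta method: d/dx atanh(x) = 1 / ((1 + x)(1 - x))
    se_z = math.sqrt(var_par) / ((1.0 + par_hat) * (1.0 - par_hat))
    # tanh(w) = (exp(2w) - 1) / (exp(2w) + 1) maps the limits back into (-1, 1)
    return math.tanh(z - z_crit * se_z), math.tanh(z + z_crit * se_z)


# Hypothetical estimate and variance.
lo, hi = fisher_z_ci(par_hat=0.5, var_par=0.01)

# Near the boundary a Wald interval (0.95 +/- 0.196) would exceed 1;
# the back-transformed interval cannot.
lo2, hi2 = fisher_z_ci(par_hat=0.95, var_par=0.01)

# Sanity check against the interval reported in Appendix 2, 0.692 (0.366, 0.869):
# on the z scale the limits should sit roughly symmetrically about the estimate.
mid_z = 0.5 * (math.atanh(0.366) + math.atanh(0.869))
center_z = math.atanh(0.692)
```

The resulting interval is asymmetric about the point estimate on the original scale but symmetric on the z scale, which is why `mid_z` and `center_z` agree up to the rounding of the published limits.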

In a cohort study, it can be shown that \({Cov \left( {\varvec{\hat {{p}}, \hat {{\beta}}}} \right)\approx 0}\) by a double expectation argument. The estimators \({\varvec {\hat { \beta}}}\) and \({\varvec {\hat {p}}}\) are the solutions of the following estimating equations,

$$ {\varvec U}_\beta \left( {\varvec{ \beta, p}} \right)=\sum\limits_{i=1}^n {\frac{\partial g({\varvec e}_i , {\varvec c}_i ;{\varvec \beta})}{\partial({\varvec\beta^{\prime}},{\varvec p^{\prime}})^{\prime}}} \left[ {Y_i -\hbox{E} ( Y_i \vert g({{{\varvec e}_i , {\varvec c}_i ; {\varvec \beta}}}))} \right ]={\varvec 0} $$
$$ {\varvec U}_p \left( {\varvec{ \beta, p}} \right) = \left( \begin{array}{c} {\varvec 0}_{(S+T-p_1-p_2)\times 1}\\ \sum \limits_{i=1}^n \left[ {\varvec I}({\varvec e}_i ={\varvec E}_s \;\& \; {\varvec c}_i ={\varvec C}_t )-\hbox{E}({\varvec I}({\varvec e}_i ={\varvec E}_s \;\& \; {\varvec c}_i ={\varvec C}_t ))\right] \end{array}\right)={\varvec 0}, $$

where \(Y_i\) is 1 if the unit of person-time is a case and 0 otherwise, \({g({\varvec e}_i , {\varvec c}_i ;{\varvec \beta})}\) will typically be the expit or exponential function, depending on whether pooled logistic regression or Poisson regression is used to estimate \({{\varvec \beta }}\), E(·) is the expectation operator, and \({{\varvec I}(\cdot )}\) is an S + T vector of indicator functions which take the value 1 when the condition inside the parentheses is true and 0 otherwise. Because they are unbiased score functions, \({\hbox{E}\left[ {{\varvec U}_{\beta i} \left( {\varvec{{\hat {\beta}},{\hat{p}}} }\right)} \right]={\varvec 0}}\) and \({\hbox{E}\left[ {{\varvec U}_{pi} \left( {\varvec{{\hat {\beta}},{\hat{p}}} }\right)} \right]={\varvec 0}}\), \(i = 1,\ldots,n\). This implies that

$$ Cov\left[ {{\varvec U}_{\beta i} \left( {\varvec{{\hat {\beta}},{\hat{p}}} }\right),{\varvec U}_{pi} \left( {\varvec{{\hat {\beta}},{\hat{p}}} }\right)} \right]=\hbox{E}_{Y_i ,{\varvec {c}}_i ,{\varvec {e}}_i } \left[ {{\varvec U}_{\beta i} \left( {\varvec{{\hat {\beta}},{\hat{p}}} }\right){\varvec U}_{pi}^{\prime}\left( {\varvec{{\hat {\beta}},{\hat{p}}} }\right)} \right]=\hbox{E}_{c,e} \hbox{E}_{Y\vert {\varvec {c}},{\varvec {e}}} \left[ {{\varvec U}_{\beta i} \left( {\varvec{{\hat {\beta}},{\hat{p}}} }\right){\varvec U}_{pi}^{\prime} \left( {\varvec{{\hat {\beta}},{\hat{p}}} }\right)} \right] $$
$$ =\hbox{E}_{{\varvec {c}}, {\varvec {e}}} \hbox{E}_{Y\vert {\varvec {c}}, {\varvec {e}}} \left[ {\frac{\partial g({ {\varvec e}_i , {\varvec c}_i ;{\varvec \beta} })}{\partial( {\varvec \beta^{\prime} },{\varvec p^{\prime}})^{\prime}}[Y_i -\hbox{E}(Y_i \vert g({ {\varvec e}_i , {\varvec c}_i ;{\varvec \beta} }))] \left( \begin{array}{c} {\varvec 0}_{(S+T-p_1-p_2)\times 1}\\ \sum \limits_{i=1}^n \left[ {\varvec I}({\varvec e}_i ={\varvec E}_s \;\& \; {\varvec c}_i ={\varvec C}_t )-\hbox{E}({\varvec I}({\varvec e}_i ={\varvec E}_s \;\& \; {\varvec c}_i ={\varvec C}_t ))\right] \end{array}\right)^{\prime}} \right] $$
$$ =\hbox{E}_{\varvec {c},\varvec {e}} \left[ {\frac{\partial g({ {\varvec e}_i , {\varvec c}_i ; {\varvec \beta} })}{\partial( {\varvec \beta^{\prime} },{\varvec p^{\prime}})^{\prime}}\left( \begin{array}{c} {\varvec 0}_{(S+T-p_1-p_2)\times 1}\\ \sum \limits_{i=1}^n \left[ {\varvec I}({\varvec e}_i ={\varvec E}_s \;\& \; {\varvec c}_i ={\varvec C}_t )-\hbox{E}({\varvec I}({\varvec e}_i ={\varvec E}_s \;\& \; {\varvec c}_i ={\varvec C}_t ))\right] \end{array}\right)^{\prime}} \right]\hbox{E}_{Y\vert {\varvec {c}},{\varvec {e}}} \left[ {Y_i -\hbox{E}(Y_i \vert g({{\varvec e}_i , {\varvec c}_i ; {\varvec \beta}}))} \right]={\varvec 0}. $$

Appendix 2: Sample SAS code for calculating the \({\widehat{PAR}_p}\)

Program:

title 'make variance-covariance matrix of beta coefficients';
proc logistic descending data=all covout outest=betas;
model bladder=
   volrnk0 volrnk1 volrnk2 volrnk3   /* lowest 4 quintiles of fluid intake */
   region1 region2 region3 region4   /* geographic regions */
   agegrp2 - agegrp8                 /* 5-year age groups */
   smkc                              /* current smoking */
   packyr2-packyr6                   /* categories of pack-years */
   period1 period2 period3 period4   /* calendar time periods */
   calor2-calor5                     /* highest 4 quintiles of caloric intake */
   fruv1-fruv3;                      /* lowest 3 categories of fruit-and-vegetable intake */
run;

title 'make dataset of joint prevalences of modifiable and unmodifiable risk factors';
proc sort data=all; by
   volrnk0 volrnk1 volrnk2 volrnk3
   region1 region2 region3 region4
   agegrp2 - agegrp8
   smkc
   packyr2-packyr6
   period1 period2 period3 period4
   calor2-calor5
   fruv1-fruv3;
run;

/* count the units of person-time (fq) in each combination of risk factors */
proc means noprint data=all; var bladder;
output out=phats n=fq;
by
   volrnk0 volrnk1 volrnk2 volrnk3
   region1 region2 region3 region4
   agegrp2 - agegrp8
   smkc
   packyr2-packyr6
   period1 period2 period3 period4
   calor2-calor5
   fruv1-fruv3;
run;

%par(bdata=betas, pdata=phats, n_or_p=n, n_or_pname=fq,
fixedvar=agegrp2 agegrp3 agegrp4 agegrp5 agegrp6 agegrp7 agegrp8 period1
period2 period3 period4
region2 region3 region4 region5 calor2 calor3 calor4 calor5
fruv862 fruv863 fruv861,
modvar=smkc packyr2 packyr3 packyr4 packyr5 packyr6
volrnk0 volrnk1 volrnk2 volrnk3);

Output:

option for the variance-covariance matrix of the prevalences is FIXED .

Partial PAR (95% CI) for

modifiable vbls : VOLRNK0 VOLRNK1 VOLRNK2 VOLRNK3 SMKC PACKYR2

PACKYR3 PACKYR4 PACKYR5 PACKYR6

fixed vbls : AGEGRP2 AGEGRP3 AGEGRP4 AGEGRP5 AGEGRP6 AGEGRP7 AGEGRP8

PERIOD1 PERIOD2 PERIOD3 PERIOD4 REGION2 REGION3 REGION4 REGION5 CALOR2

CALOR3 CALOR4 CALOR5 FRUV862 FRUV863 FRUV861

      0.692 (0.366, 0.869)

About this article

Cite this article

Spiegelman, D., Hertzmark, E. & Wand, H.C. Point and interval estimates of partial population attributable risks in cohort studies: examples and software. Cancer Causes Control 18, 571–579 (2007). https://doi.org/10.1007/s10552-006-0090-y
