Abstract
The concept of the population attributable risk (PAR) percent has found widespread application in public health research. This quantity describes the proportion of a disease that could be prevented if a specific exposure were eliminated from a target population. We present methods for obtaining point and interval estimates of partial PARs, where the impact on disease burden for some presumably modifiable determinants is estimated in, and applied to, a cohort study. When the disease is multifactorial, the partial PAR must, in general, be used to quantify the proportion of disease that can be prevented if a specific exposure or group of exposures is eliminated from a target population while the distribution of other modifiable and non-modifiable risk factors is unchanged. The methods are illustrated in a study of risk factors for bladder cancer incidence (Michaud DS et al., New England J Med 340 (1999) 1390). A user-friendly SAS macro implementing the methods described in this paper is available via the World Wide Web.
References
Levin M (1953) The occurrence of lung cancer in man. Acta Unio Inter Contra Cancrum 9:531–541
Walter SD (1975) The distribution of Levin’s measure of attributable risk. Biometrika 62:371–374
Uter W, Pfahlberg A (1999) The concept of attributable risk in epidemiological practice. Biom J 41(8):985–993
Miettinen O (1974) Proportion of disease caused or prevented by a given exposure, trait or intervention. Am J Epidemiol 99(5):325–332
Cole P, MacMahon B (1971) Attributable risk percent in case-control studies. Br J Prevent Social Med 25(4):242
Last JM (1983) A dictionary of epidemiology. Oxford University Press, New York
Bruzzi P, Green SB, Byar DP, Brinton LA, Schairer C (1985) Estimating the population attributable risk for multiple risk factors using case-control data. Am J Epidemiol 122(5):904–913
Benichou J (2001) A review of adjusted estimators of attributable risks. Stat Methods Med Res 10:195–216
D’Agostino RB, Lee M-L, Belanger AJ, Cupples LA, Anderson K, Kannel WB (1990) Relation of pooled logistic regression to time dependent Cox regression analysis: the Framingham heart study. Stat Med 9:1501–1515
Walter SD (1980) Prevention for multifactorial diseases. Am J Epidemiol 112(3):409–416
Gefeller O (1992) Comparison of adjusted attributable risk estimators. Stat Med 11(16):2083–2091
Morgenstern H (1983) Morgenstern corrects a conceptual error (Letter). Am J Publ Health 73(6):703
Korn EL, Graubard BI, Midthune D (1997) Time-to-event analysis of longitudinal follow-up of a survey: choice of the time-scale. Am J Epidemiol 145(1):72–80
Basu S, Landis JR (1995) Model-based estimation of population attributable risk under cross-sectional sampling. Am J Epidemiol 142(12):1338–1343
Greenland S, Drescher K (1993) Maximum likelihood estimation of the attributable fraction from logistic models. Biometrics 49(3):865–872
Benichou J, Chow WH, McLaughlin JK, Mandel JS, Fraumeni JF (1998) Population attributable risk of renal cell cancer in Minnesota. Am J Epidemiol 148(5):424–430
Wilson PD, Loffredo CA, Correa-Villasenor A, Ferencz C (1998) Attributable fraction for cardiac malformations. Am J Epidemiol 148(5):414–423
Wacholder S, Benichou J, Heineman E (1994) Attributable risk—advantages of a broad definition of exposure (vol 140, pg 303, 1994). Am J Epidemiol 140(7):668
Benichou J (1991) Methods of adjustment for estimating the attributable risk in case-control studies; a review. Stat Med 10:1753–1773
Michaud DS, Spiegelman D, Clinton SK, Willett WC, Giovannucci EL (1999) Total fluid intake, specific beverages and bladder cancer risk in the health professional follow-up study. New England J Med 340:1390–1397
Therneau TM, Grambsch PM (2000) Modeling survival data: extending the Cox model. Springer-Verlag, New York
Mezzetti M, Ferraroni M, Decarli A, LaVecchia C, Benichou J (1996) Software for attributable risk and confidence interval estimation in case-control studies. Comput Biomed Res 29(1):63–75
Benichou J, Gail MH (1990) Variance calculations and confidence intervals for estimates of the attributable risk based on logistic models. Biometrics 46(4):991–1003
Miettinen OS (1974) Proportion of disease caused or prevented by a given exposure, trait or intervention. Am J Epidemiol 99(5):325–332
Third National Health and Nutrition Examination Survey 1989–1994 (1996) NHANES II laboratory data file (CD-ROM). US Department of Health and Human Services National Center for Health Statistics, Hyattsville, MD
National Center for Health Statistics (1993) 1987 National health interview survey. Government Printing Office
Graubard BI, Fears TR (2005) Standard errors for attributable risk for simple and complex sample designs. Biometrics 61(3):847–855
Rich-Edwards JW, Spiegelman D, Garland M, Hertzmark E, Hunter DJ, Colditz GA et al (2002) Physical activity, body mass index and ovulatory disorder infertility. Epidemiology 13:184–190
Morgenstern H, Bursic E (1982) A method for using epidemiologic data to estimate the potential impact of an intervention on the health status of a target population. J Community Health 7:292–309
Drescher K, Becher H (1997) Estimating the generalized impact fraction from case-control data. Biometrics 53(3):1170–1176
Leung HM, Kupper LL (1981) Comparisons of confidence intervals for attributable risk. Biometrics 37(2):293–302
Acknowledgments
Supported by a grant from the National Institutes of Health (CA55075).
Appendices
Appendix 1: Derivation of \(Var\left(\widehat{PAR}_p\right)\)
where
where \(\sum_s p_{st} = p_{\cdot t}\), and \(\varvec{RR}_1 = \left(RR_{1,1}, RR_{1,2}, \ldots, RR_{1,S}\right)^\prime\) and \(\varvec{RR}_2 = \left(RR_{2,1}, RR_{2,2}, \ldots, RR_{2,T}\right)^\prime\) are the vectors of relative risks corresponding to the modifiable and unmodifiable risk factors, respectively.
Under the proportional hazards model, \(RR_{1,s} = e^{\varvec{\beta}_1^\prime \varvec{e}_s}\), where \(\varvec{e}_s\) is the vector of values of the binary indicators corresponding to the sth combination of modifiable exposure variables, of which there are S combinations, and \(RR_{2,t} = e^{\varvec{\beta}_2^\prime \varvec{c}_t}\), where \(\varvec{c}_t\) is the vector of values of the tth combination of unmodifiable background risk factors, of which there are T combinations. Then, \(Var\left[\left(\widehat{\varvec{RR}}_1^\prime, \widehat{\varvec{RR}}_2^\prime\right)^\prime\right] = D \Sigma D^\prime\), where \(\Sigma = Var\left[\left(\hat{\varvec{\beta}}_1^\prime, \hat{\varvec{\beta}}_2^\prime\right)^\prime\right]\), and \(D = [(D_{uv})]\), \(u = 1,\ldots,S+T\), \(v = 1,\ldots,p_1+p_2\), where
Under the proportional hazards model, \(\frac{\partial RR_{1,u}}{\partial \beta_{1,v}} = e_{uv}\, e^{\varvec{\beta}_1^\prime \varvec{e}_u}\), where \(e_{uv}\) is the vth element of the vector \(\varvec{e}_u\), and \(\frac{\partial RR_{2,u-S}}{\partial \beta_{2,v-p_1}} = c_{u-S,\,v-p_1}\, e^{\varvec{\beta}_2^\prime \varvec{c}_{u-S}}\), where \(c_{u-S,\,v-p_1}\) is the \((v-p_1)\)th element of the vector \(\varvec{c}_{u-S}\).
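Taken together, the two derivatives above suggest the layout of the Jacobian \(D\). The display below is a sketch rather than a reproduction of the original equation; it assumes that \(p_1\) and \(p_2\) are the lengths of \(\varvec{\beta}_1\) and \(\varvec{\beta}_2\), and that the off-diagonal blocks vanish because \(RR_{1,u}\) depends only on \(\varvec{\beta}_1\) and \(RR_{2,u-S}\) only on \(\varvec{\beta}_2\):

\[
D_{uv} =
\begin{cases}
e_{uv}\, e^{\varvec{\beta}_1^\prime \varvec{e}_u}, & 1 \le u \le S,\; 1 \le v \le p_1, \\
c_{u-S,\,v-p_1}\, e^{\varvec{\beta}_2^\prime \varvec{c}_{u-S}}, & S < u \le S+T,\; p_1 < v \le p_1+p_2, \\
0, & \text{otherwise.}
\end{cases}
\]

Under this layout, a reference combination with \(\varvec{e}_u = \varvec{0}\) has \(RR_{1,u} = 1\) and an all-zero row in \(D\), so it contributes no sampling variability through \(\hat{\varvec{\beta}}_1\).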
The variance of \(\widehat{PAR}_p\) is estimated by replacing, in Eq. 6, \(\left(p, RR\right)\) with \(\left(\hat{p}, \widehat{RR}\right)\) and \(\Sigma\) with the estimated variance-covariance matrix of \(\left(\hat{\varvec{\beta}}_1^\prime, \hat{\varvec{\beta}}_2^\prime\right)^\prime\) obtained from the pooled logistic regression model or Poisson regression model used to fit (Eq. 5). In a cohort study, the multinomial distribution is used to estimate the variance-covariance matrix of \(\hat{\varvec{p}}\), where \(\varvec{p} = \left(p_{1,1}, p_{1,2}, \ldots, p_{S,T}\right)\), and \(Cov(\hat{p}_{st}, \hat{p}_{uv}) = \begin{cases} \hat{p}_{st}(1-\hat{p}_{st})/n & \text{if } s = u \text{ and } t = v, \\ -\hat{p}_{st}\hat{p}_{uv}/n & \text{if } s \ne u \text{ or } t \ne v, \end{cases}\) and n is the total number of units of person-time of follow-up observed.
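How these pieces combine can be sketched with a first-order delta method. The display below is an illustration, not a term-by-term restatement of Eq. 6: it assumes the usual full-data expression for the partial PAR, introduces the shorthand \(A\) and \(B\) (not the paper's notation), and relies on the approximate independence of \(\hat{\varvec{p}}\) and \(\hat{\varvec{\beta}}\) argued later in this appendix.

\[
PAR_p = 1 - \frac{A}{B}, \qquad
A = \sum_{t=1}^{T} p_{\cdot t}\, RR_{2,t}, \qquad
B = \sum_{s=1}^{S}\sum_{t=1}^{T} p_{st}\, RR_{1,s}\, RR_{2,t},
\]
\[
\frac{\partial PAR_p}{\partial p_{st}} = \frac{RR_{2,t}\left(A\, RR_{1,s} - B\right)}{B^{2}}, \qquad
\frac{\partial PAR_p}{\partial RR_{1,s}} = \frac{A \sum_{t} p_{st}\, RR_{2,t}}{B^{2}}, \qquad
\frac{\partial PAR_p}{\partial RR_{2,t}} = \frac{A \sum_{s} p_{st}\, RR_{1,s} - B\, p_{\cdot t}}{B^{2}},
\]
\[
Var\left(\widehat{PAR}_p\right) \approx \varvec{g}_p^\prime\, V_p\, \varvec{g}_p + \varvec{g}_{RR}^\prime\, D \Sigma D^\prime\, \varvec{g}_{RR}.
\]

Here \(\varvec{g}_p\) and \(\varvec{g}_{RR}\) (hypothetical names) stack the derivatives with respect to the \(p_{st}\) and to \(\left(\varvec{RR}_1^\prime, \varvec{RR}_2^\prime\right)^\prime\), and \(V_p\) is the multinomial variance-covariance matrix of \(\hat{\varvec{p}}\). The cross term is dropped because \(Cov\left(\hat{\varvec{p}}, \hat{\varvec{\beta}}\right) \approx 0\).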
In the spirit of the transformation suggested by Leung and Kupper [31], to improve the asymptotic behavior of the 95% confidence intervals of \(\widehat{PAR}_p\) and to ensure that the confidence intervals remain within the range of −100% to 100%, it is useful to calculate the confidence intervals using Fisher’s Z transformation, that is
Then the 95% confidence interval for \(PAR_p\) is estimated as
where \(\mathrm{Fisherz}\left(\widehat{PAR}_p\right) = \log\left[\sqrt{\frac{1+\widehat{PAR}_p}{1-\widehat{PAR}_p}}\right]\).
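The transformation can be made concrete with one delta-method step. The display below is a sketch under that assumption, not a reproduction of the original equations; 1.96 is the 97.5th percentile of the standard normal distribution, and \(\tanh\) is the inverse of the Fisherz function defined above.

\[
\frac{d}{dx}\,\mathrm{Fisherz}(x) = \frac{1}{1-x^{2}}
\quad\Longrightarrow\quad
Var\left[\mathrm{Fisherz}\left(\widehat{PAR}_p\right)\right] \approx \frac{Var\left(\widehat{PAR}_p\right)}{\left(1-\widehat{PAR}_p^{\,2}\right)^{2}},
\]
\[
\tanh\left( \mathrm{Fisherz}\left(\widehat{PAR}_p\right) \pm 1.96 \sqrt{Var\left[\mathrm{Fisherz}\left(\widehat{PAR}_p\right)\right]} \right).
\]

Because \(\tanh\) maps the real line into \((-1, 1)\), both limits stay strictly between −100% and 100% however large the estimated variance is.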
In a cohort study, it can be shown that \(Cov\left(\hat{\varvec{p}}, \hat{\varvec{\beta}}\right) \approx 0\) by a double expectation argument. The estimators \(\hat{\varvec{\beta}}\) and \(\hat{\varvec{p}}\) are the solutions of the following estimating equations,
where \(Y_i\) is 1 if the unit of person-time is a case and 0 otherwise; \(g(\varvec{e}_i, \varvec{c}_i; \varvec{\beta})\) will typically be the expit or exponential function, depending on whether pooled logistic regression or Poisson regression is used to estimate \(\varvec{\beta}\); E(·) is the expectation operator; and \(\varvec{I}(\cdot)\) is an S + T vector of indicator functions that take the value 1 when the condition inside the parentheses is true and 0 otherwise. Because they are unbiased score functions, \(\hbox{E}\left[\varvec{U}_{\beta i}\left(\hat{\varvec{\beta}}, \hat{\varvec{p}}\right)\right] = \varvec{0}\) and \(\hbox{E}\left[\varvec{U}_{pi}\left(\hat{\varvec{\beta}}, \hat{\varvec{p}}\right)\right] = \varvec{0}\), \(i = 1,\ldots,n\). This implies that
Appendix 2: Sample SAS code for calculating \(\widehat{PAR}_p\)
Program: | |
---|---|
title 'make variance-covariance matrix of beta coefficients'; | |
proc logistic descending data=all covout outest=betas; | |
model bladder= | |
volrnk0 volrnk1 volrnk2 volrnk3 | /* lowest 4 quintiles of fluid intake */ |
region1 region2 region3 region4 | /* geographic regions */ |
agegrp2 - agegrp8 | /* 5-year age groups */ |
smkc | /* current smoking */ |
packyr2-packyr6 | /* categories of pack-years */ |
period1 period2 period3 period4 | /* calendar time periods */ |
calor2-calor5 | /* highest 4 quintiles of caloric intake */ |
fruv1-fruv3; | /* lowest 3 categories of fruit-and-vegetable intake */ |
run; | |
title 'make dataset of joint prevalences of modifiable and unmodifiable risk factors'; | |
proc sort data=all; by | |
volrnk0 volrnk1 volrnk2 volrnk3 | |
region1 region2 region3 region4 | |
agegrp2 - agegrp8 | |
smkc | |
packyr2-packyr6 | |
period1 period2 period3 period4 | |
calor2-calor5 | |
fruv1-fruv3; | |
run; | |
proc means noprint data=all; var bladder; | |
output out=phats n=fq; | |
by | |
volrnk0 volrnk1 volrnk2 volrnk3 | |
region1 region2 region3 region4 | |
agegrp2 - agegrp8 | |
smkc | |
packyr2-packyr6 | |
period1 period2 period3 period4 | |
calor2-calor5 | |
fruv1-fruv3; | |
run; | |
%par(bdata=betas, pdata=phats, n_or_p=n, n_or_pname=fq, | |
fixedvar=agegrp2 agegrp3 agegrp4 agegrp5 agegrp6 agegrp7 agegrp8 period1 | |
period2 period3 period4 | |
region2 region3 region4 region5 calor2 calor3 calor4 calor5 | |
fruv862 fruv863 fruv861, | |
modvar=smkc packyr2 packyr3 packyr4 packyr5 packyr6 | |
volrnk0 volrnk1 volrnk2 volrnk3); | |
Output: | |
option for the variance-covariance matrix of the prevalences is FIXED . | |
Partial PAR (95% CI) for | |
modifiable vbls : VOLRNK0 VOLRNK1 VOLRNK2 VOLRNK3 SMKC PACKYR2 | |
PACKYR3 PACKYR4 PACKYR5 PACKYR6 | |
fixed vbls : AGEGRP2 AGEGRP3 AGEGRP4 AGEGRP5 AGEGRP6 AGEGRP7 AGEGRP8 | |
PERIOD1 PERIOD2 PERIOD3 PERIOD4 REGION2 REGION3 REGION4 REGION5 CALOR2 | |
CALOR3 CALOR4 CALOR5 FRUV862 FRUV863 FRUV861 | |
0.692 (0.366, 0.869) |
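
The reported interval can be checked informally against the Fisher's Z construction described in Appendix 1. The Python sketch below is an illustration only: it is not part of the SAS macro, the function names are hypothetical, and it assumes the interval was formed symmetrically on the Z scale with a critical value of 1.96. Under that assumption, the Z-scale width of the printed interval implies a standard error, from which the limits can be rebuilt.

```python
import math

def fisher_z(par):
    """Fisher's Z transform, log(sqrt((1 + par) / (1 - par)))."""
    return 0.5 * math.log((1.0 + par) / (1.0 - par))

def fisher_z_ci(par, se_z, z_crit=1.96):
    """Back-transform a symmetric Z-scale interval with tanh."""
    z = fisher_z(par)
    return math.tanh(z - z_crit * se_z), math.tanh(z + z_crit * se_z)

# Point estimate and limits printed in the sample output above.
par_hat, reported_lower, reported_upper = 0.692, 0.366, 0.869

# Assumption: the interval was built on the Z scale, so its Z-scale
# width divided by 2 * 1.96 gives an implied standard error.
se_z_implied = (fisher_z(reported_upper) - fisher_z(reported_lower)) / (2 * 1.96)

lower, upper = fisher_z_ci(par_hat, se_z_implied)
print(f"implied SE on the Z scale: {se_z_implied:.3f}")
print(f"reconstructed 95% CI: ({lower:.3f}, {upper:.3f})")
```

The rebuilt limits should land close to the printed ones; exact agreement is not expected because the output is rounded to three decimals. Agreement of this kind is consistent with a Z-scale interval but does not by itself establish how the macro computes it.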
Cite this article
Spiegelman, D., Hertzmark, E. & Wand, H.C. Point and interval estimates of partial population attributable risks in cohort studies: examples and software. Cancer Causes Control 18, 571–579 (2007). https://doi.org/10.1007/s10552-006-0090-y
DOI: https://doi.org/10.1007/s10552-006-0090-y