Abstract
Absolute risk is the probability that a cause-specific event occurs in a given time interval in the presence of competing events. We present methods to estimate population-based absolute risk from a complex survey cohort that can accommodate multiple exposure-specific competing risks. The hazard function for each event type consists of an individualized relative risk multiplied by a baseline hazard function, which is modeled nonparametrically or parametrically with a piecewise exponential model. An influence method is used to derive a Taylor-linearized variance estimate for the absolute risk estimates. We introduce novel measures of the cause-specific influences that can guide modeling choices for the competing event components of the model. To illustrate our methodology, we build and validate cause-specific absolute risk models for cardiovascular and cancer deaths using data from the National Health and Nutrition Examination Survey. Our applications demonstrate the usefulness of survey-based risk prediction models for predicting health outcomes and quantifying the potential impact of disease prevention programs at the population level.
Acknowledgments
We thank the reviewers for their helpful comments. We are grateful to Dr. Barry Graubard for suggestions he provided to us during the writing of this paper. This research was supported by the intramural research program of the National Cancer Institute.
Appendix
Derivatives and Taylor deviates for the piecewise exponential hazard function
To simplify the notation, we express the absolute risk estimate of (4) in a more compact form,
where
and \(\hat{S}(\tau _q)\) is as defined in Eq. (5). Then the deviates for \(\hat{\pi }(\tau _{0{n_0}},\tau _{1n_1}; \varvec{x})\) are
Taking each component in turn, the deviates for \(A_q\) are
where \(T^{(m)}_{q} = \hat{\lambda }_0^{(m)} (\tau _q) \exp ( \hat{\varvec{\beta }}^{(m)^{\prime }}\varvec{x}^{(m)})\) and \(T_{q} = \sum _{m=1}^M T^{(m)}_{q}\), with deviates
The deviates for \(B_q\) are
For \(q>n_0\), we note that \(\hat{S}(\tau _q) = \prod _{l=n_0}^{q-1} B_l\) so that
and \(\Delta _{ijk} \lbrace \hat{S}(\tau _q) \rbrace \) is zero when \(q=n_0\).
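The step from the product form of \(\hat{S}(\tau _q)\) to its deviates is an application of the product rule. A sketch of that step in the notation above, assuming each \(B_l\) is strictly positive so that the ratio is well defined, is:

```latex
% First-order (Taylor) deviate of a product of the B_l terms, for q > n_0.
\Delta_{ijk}\{\hat{S}(\tau_q)\}
  = \sum_{l=n_0}^{q-1} \Bigl( \prod_{l' \ne l} B_{l'} \Bigr) \Delta_{ijk}\{B_l\}
  = \hat{S}(\tau_q) \sum_{l=n_0}^{q-1} \frac{\Delta_{ijk}\{B_l\}}{B_l}
```

When \(q=n_0\) the product is empty, so \(\hat{S}(\tau _q)=1\) does not depend on the data and its deviate vanishes.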
The Taylor deviates for \(A_q\), \(B_q\) and \(\hat{S}(\tau _q)\) are each functions of \(\hat{\lambda }_0^{(m)}\) and \(\hat{\varvec{\beta }}^{(m)}\). For \(\hat{\lambda }_0^{(m)}\), we have
where
with \(\mathcal A _{ijk}(\tau _q)\) defined in Eq. (6). The Taylor deviates for each \(\hat{\varvec{\beta }}^{(m)}\) are
where \(\mathcal H (\hat{\varvec{\beta }}^{(m)})\) is the second partial derivative of the pseudo-likelihood,
and \(\bar{\varvec{H}}(\hat{\varvec{\beta }}^{(m)},t_{ijk})\) is defined in Eq. (3). Thus, the deviates for \(\hat{\varvec{\beta }}^{(m)}\) are equivalent to the per-observation update in a Newton–Raphson optimization algorithm where the objective function is the weighted pseudo-likelihood of the Cox regression model.
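To make the idea of a Taylor deviate concrete, the following is a minimal Python sketch for a deliberately simplified case: a single interval of the piecewise exponential model with no covariates, so that the hazard estimate is a weighted ratio of events to person-time. It is an illustration under stated assumptions, not the estimator of this paper: it omits the contribution of the \(\hat{\varvec{\beta }}^{(m)}\) deviates, the function names `hazard_and_deviates` and `linearized_variance` are hypothetical, the toy data are made up, and the variance step assumes that the subscripts index stratum, primary sampling unit (PSU) and individual, with PSUs treated as sampled with replacement.

```python
import numpy as np

def hazard_and_deviates(w, d, y):
    """Weighted hazard (events per unit person-time) and its Taylor deviates.

    w: sampling weights, d: event indicators, y: person-time in the interval.
    The deviate of observation i is w_i * d(lambda)/d(w_i).
    """
    w, d, y = (np.asarray(a, dtype=float) for a in (w, d, y))
    denom = np.sum(w * y)              # weighted person-time
    lam = np.sum(w * d) / denom        # ratio estimate of the hazard
    z = w * (d - lam * y) / denom      # linearization of the ratio
    return lam, z

def linearized_variance(z, stratum, psu):
    """Between-PSU, within-stratum variance of the summed deviates."""
    z = np.asarray(z, dtype=float)
    stratum, psu = np.asarray(stratum), np.asarray(psu)
    var = 0.0
    for h in np.unique(stratum):
        in_h = stratum == h
        # Total deviate for each PSU in stratum h
        totals = np.array([z[in_h & (psu == j)].sum()
                           for j in np.unique(psu[in_h])])
        n_h = totals.size
        if n_h > 1:                    # single-PSU strata contribute nothing here
            var += n_h / (n_h - 1) * np.sum((totals - totals.mean()) ** 2)
    return var

# Toy data (illustrative only): two strata with two PSUs each
w = [2, 2, 3, 3, 1, 1, 4, 4]
d = [1, 0, 0, 1, 1, 0, 0, 1]
y = [1.5, 2, 2, 0.5, 1, 2, 2, 1]
stratum = [1, 1, 1, 1, 2, 2, 2, 2]
psu = [1, 1, 2, 2, 1, 1, 2, 2]

lam, z = hazard_and_deviates(w, d, y)
var_lam = linearized_variance(z, stratum, psu)
print(lam, var_lam)
```

The deviates sum to zero by construction, which is a convenient check when extending the sketch to include covariates.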
Derivatives and Taylor deviates for the semiparametric hazard function
Denote the \(N^{(m)}\) ordered observed event times occurring within \([t_0,t_1)\) for the \(m\)th cause as \(u^{(m)}_1 < u^{(m)}_2 < \ldots < u^{(m)}_{N^{(m)}}\). In terms of these event times, Eq. (1) becomes
As with the piecewise model, we determine the derivative and deviates for each component of (14). For \(\hat{\varvec{\beta }}^{(m)}\), the derivative is
when \(m=1\) and
for competing causes. The Taylor deviates for each \(\hat{\varvec{\beta }}^{(m)}\) are the same as those given by Eq. (13) of the piecewise model.
The derivatives for the baseline hazard components are
The Taylor deviates for the baseline hazard of cause \(m\) at observed event time \(t\) are
where
and
In terms of these quantities, the Taylor deviates are
with
We note that the hazard deviates for the piecewise and semiparametric models in Eq. (12) are equivalent when each interval of the piecewise model contains exactly one observed event time.
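For orientation, the quantity whose deviates are discussed above can be sketched in Python as a weighted Breslow-type estimate of the baseline hazard increments for one cause. This assumes the semiparametric estimator has the usual form (weighted events at an event time divided by the weighted sum of relative risks among those still at risk); that form is an assumption here. The function name `weighted_breslow` is hypothetical, the data are made up, and the sketch ignores delayed entry and the restriction to \([t_0,t_1)\).

```python
import numpy as np

def weighted_breslow(time, event, w, lin_pred):
    """Weighted Breslow-type baseline hazard increments at distinct event times.

    time: follow-up times, event: 1 if the cause-m event occurred, else 0,
    w: sampling weights, lin_pred: linear predictor beta'x for each subject.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    w = np.asarray(w, dtype=float)
    risk = w * np.exp(np.asarray(lin_pred, dtype=float))
    event_times = np.unique(time[event == 1])     # sorted distinct event times
    increments = np.empty(event_times.size)
    for idx, u in enumerate(event_times):
        num = np.sum(w[(time == u) & (event == 1)])   # weighted events at u
        den = np.sum(risk[time >= u])                 # weighted risk set at u
        increments[idx] = num / den
    return event_times, increments

# Toy data (illustrative only): unit weights and no covariate effect,
# so the increments reduce to the Nelson-Aalen values d / n.
time = [2, 3, 3, 5, 7]
event = [1, 0, 1, 1, 0]
event_times, increments = weighted_breslow(time, event, np.ones(5), np.zeros(5))
print(event_times, increments)
```

With unit weights and a zero linear predictor the increments equal the number of events divided by the number at risk at each event time, which gives a simple sanity check before adding weights and covariates.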
The final components are the survival functions. The derivatives for each \(\hat{S}_0^{(m)}(u^{(1)}_j)\) are
From the semiparametric estimate of Eq. (7), the Taylor deviates for the baseline survival up to time \(u^{(1)}_j\) for the \(m\)th risk type are
Combining these results, the expression for the Taylor deviates of \(\hat{\pi }(t_0,t_1;\varvec{x})\) is
Cite this article
Kovalchik, S.A., Pfeiffer, R.M. Population-based absolute risk estimation with survey data. Lifetime Data Anal 20, 252–275 (2014). https://doi.org/10.1007/s10985-013-9258-4
DOI: https://doi.org/10.1007/s10985-013-9258-4