Abstract
With the explosive growth of data, it is challenging to infer a quantity of interest by combining data from different existing studies on the same topic. In the case-cohort setting, our aim is to improve the efficiency of parameter estimation for the Cox model by using subgroup information in aggregate data. We therefore propose a generalized method of moments (GMM) approach that uses auxiliary survival information at some critical time points. Because the auxiliary information is likely to be obtained from other studies or populations, two extended GMM estimators are proposed to account for multiplicative and additive inconsistencies. We establish the consistency and asymptotic normality of the proposed estimators. In addition, the uniform consistency and asymptotic normality of the Breslow estimator are presented. From the asymptotic normality, we show that the proposed approaches are more efficient than the traditional weighted estimating equation method. In particular, if the number of subgroups is equal to one, the asymptotic variance-covariance matrices of the GMM estimators are identical to that of the weighted score estimator. Simulation studies and a real data study illustrate the proposed methods and theory. In the numerical studies, our approaches even outperform the full cohort estimator, and the extended GMM methods are robust.
References
Breslow, N. E. (1972). Discussion of the paper by D. R. Cox. Journal of the Royal Statistical Society, Series B, 34, 216–217.
Breslow, N. E., & Chatterjee, N. (1999). Design and analysis of two-phase studies with binary outcome applied to Wilms tumour prognosis. Journal of the Royal Statistical Society, Series C, 48, 457–468.
Chen, K., & Lo, S. H. (1999). Case-cohort and case-control analysis with Cox’s model. Biometrika, 86, 755–764.
Cox, D. R. (1972). Regression models and life tables. Journal of the Royal Statistical Society, Series B, 34, 187–220.
Cox, D. R. (1975). Partial likelihood. Biometrika, 62, 269–276.
D’Angio, G. J., Breslow, N., Beckwith, J. B., Evans, A., Baum, E., Delorimier, A., Fernbach, D., Hrabovsky, E., Jones, B., Kelalis, P., Othersen, H. B., Tefft, M., & Thomas, P. R. M. (1989). Treatment of Wilms’ tumor. Results of the Third National Wilms’ Tumor Study. Cancer, 64, 349–360.
Fleming, T. R., & Harrington, D. P. (1991). Counting processes and survival analysis. New York: Wiley-Interscience.
Green, D. M., Breslow, N. E., Beckwith, J. B., Finklestein, J. Z., Grundy, P. E., Thomas, P. R., Kim, T., Shochat, S. J., Haase, G. M., Ritchey, M. L., Kelalis, P. P., & D’Angio, G. J. (1998). Comparison between single-dose and divided-dose administration of dactinomycin and doxorubicin for patients with Wilms’ tumor: a report from the National Wilms Tumor Study Group. Journal of Clinical Oncology, 16, 237–245.
Huang, C. Y., Qin, J., & Tsai, H. T. (2015). Efficient estimation of the Cox model with auxiliary subgroup survival information. Journal of the American Statistical Association, 111, 787–799.
Kulich, M., & Lin, D. Y. (2004). Improving the efficiency of relative-risk estimation in case-cohort studies. Journal of the American Statistical Association, 99, 832–844.
Noma, H., & Tanaka, S. (2017). Analysis of case-cohort designs with binary outcomes: improving efficiency using whole-cohort auxiliary information. Statistical Methods in Medical Research, 26, 691–706.
Pan, Q., & Schaubel, D. E. (2008). Proportional hazards models based on biased samples and estimated selection probabilities. The Canadian Journal of Statistics, 36, 111–127.
Prentice, R. L. (1986). A case-cohort design for epidemiologic cohort studies and disease prevention trials. Biometrika, 73, 1–11.
Qi, L., Wang, C. Y., & Prentice, R. L. (2005). Weighted estimators for proportional hazards regression with missing covariates. Journal of the American Statistical Association, 100, 1250–1263.
Scheike, T., & Martinussen, T. (2004). Maximum likelihood estimation in Cox’s regression model under case-cohort sampling. Scandinavian Journal of Statistics, 31, 283–293.
Schoenfeld, D. (1982). Partial residuals for the proportional hazards model. Biometrika, 69, 51–55.
Shang, W. P., & Wang, X. (2017). The generalized moment estimation of the additive-multiplicative hazard model with auxiliary survival information. Computational Statistics and Data Analysis, 112, 154–159.
Zeng, D., & Lin, D. Y. (2014). Efficient estimation of semiparametric transformation models for two-phase cohort studies. Journal of the American Statistical Association, 109, 371–383.
Acknowledgements
We thank the referees for their many constructive and insightful comments that have led to significant improvements in this article. This work is supported by the Natural Science Foundation of Henan Province of China (Grant no. 222300420126).
Appendix
For convenience, we use \(\mathbf {0}\) to denote a zero matrix of appropriate dimension.
The proofs of the three lemmas are similar to those in Shang and Wang (2017).
Proof of Theorem 1
Define \(U_w(\varvec{\beta })=(U_w^{(1)}(\varvec{\beta }),\ldots ,U_w^{(p)}(\varvec{\beta }))^\mathrm{T}\) and \({\varvec{Z}}_i=(Z_i^{(1)},\ldots ,Z_i^{(p)})^\mathrm{T},\ i=1,\ldots ,n\), where \(U_w^{(l)}(\varvec{\beta })\) and \(Z_i^{(l)}\) are the \(l\)th elements of \(U_w(\varvec{\beta })\) and \({\varvec{Z}}_i\), respectively. Let \({\varvec{S}}_w^{(m,l)}(t,\varvec{\beta })=\frac{1}{n}\sum _{i=1}^n w_i Y_i(t) Z_i^{(l)}{\varvec{Z}}_i^{\otimes m}\exp (\varvec{\beta }^\mathrm{T}{\varvec{Z}}_i)\) with \(m=0,1,2\), \(l=1,\dots ,p\), \({\varvec{a}}^{\otimes 0}=1\) and \({\varvec{a}}^{\otimes 1}={\varvec{a}}\). The second-order partial derivatives of \(\Psi _n(\varvec{\theta })\) and \(\phi ({\varvec{Z}},\varvec{\theta })\) are established as follows. For \(k=1,\ldots ,K\) and \(l=1,\ldots ,p\),
(I) Applying the strong law of large numbers, the law of large numbers and the law of iterated expectations, we obtain that
From conditions C1 and C2, we know \(E\left\{ \frac{\partial ^2 \phi ^{(k)}({\varvec{Z}},\varvec{\theta })}{\partial \varvec{\theta }\partial \varvec{\theta }^\mathrm{T}}|_{\varvec{\theta }=\varvec{\theta }_0}\right\} <\infty\) and \(\frac{\partial ^2 \Phi _{ w}^{(k)}(\varvec{\theta })}{\partial \varvec{\theta }\partial \varvec{\theta }^\mathrm{T}}|_{\varvec{\theta }=\varvec{\theta }_0}<\infty , k=1,\ldots ,K\), for sufficiently large \(n\). Since \(Y(t)\) is at most one,
By condition C3, we get \(E\{S_w^{(m,l)}(t,\varvec{\beta }_0)\}<\infty\). Thus \(E\left\{ \frac{\partial ^2 U_w^{(l)}(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^\mathrm{T}}|_{\varvec{\beta }=\varvec{\beta }_0}\right\} <\infty\) and \(\frac{\partial ^2 U_w^{(l)}(\varvec{\beta })}{\partial \varvec{\beta }\partial \varvec{\beta }^\mathrm{T}}|_{\varvec{\beta }=\varvec{\beta }_0}<\infty ,l=1,\ldots ,p\) in probability, for sufficiently large \(n\). From the continuity of \(\frac{\partial ^2 U_w^{(l)}(\varvec{\beta })}{\partial \varvec{\theta }\partial \varvec{\theta }^\mathrm{T}}\) and \(\frac{\partial ^2 \Phi _{ w}^{(k)}(\varvec{\theta })}{\partial \varvec{\theta }\partial \varvec{\theta }^\mathrm{T}}\), we conclude that \(E\left\{ \frac{\partial ^2 U_w^{(l)}(\varvec{\beta })}{\partial \varvec{\theta }\partial \varvec{\theta }^\mathrm{T}}\right\}\) and \(E\left\{ \frac{\partial ^2 \phi ^{(k)}({\varvec{Z}},\varvec{\theta })}{\partial \varvec{\theta }\partial \varvec{\theta }^\mathrm{T}}\right\}\) are finite, and that \(\frac{\partial ^2 \Phi _{ w}^{(k)}(\varvec{\theta })}{\partial \varvec{\theta }\partial \varvec{\theta }^\mathrm{T}}\) and \(\frac{\partial ^2 U_w^{(l)}(\varvec{\beta })}{\partial \varvec{\theta }\partial \varvec{\theta }^\mathrm{T}}\) are finite in probability, for sufficiently large \(n\) and \(\varvec{\theta }\) in a neighborhood of \(\varvec{\theta }_0\).
Define the overall weighted loss function \(l(\varvec{\theta })= E\{\Psi _n(\varvec{\theta })^\mathrm{T}\} \varvec{\Sigma }^{-1} E\{\Psi _n(\varvec{\theta })\}/2\). Thus
where \(\{\widehat{\varvec{\Sigma }}(\widehat{\varvec{\theta }}_n^1)^{-1} \Psi _n(\varvec{\theta })\}^{(j)}\) and \([\varvec{\Sigma }^{-1} E\{\Psi _n(\varvec{\theta })\}]^{(j)}\) are the jth elements of \(\widehat{\varvec{\Sigma }}(\widehat{\varvec{\theta }}_n^1)^{-1} \Psi _n(\varvec{\theta })\) and \(\varvec{\Sigma }^{-1} E\{\Psi _n(\varvec{\theta })\}\), respectively.
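As a reading aid, the derivatives of the quadratic form \(l(\varvec{\theta })\) can be sketched with the product rule. This is only a sketch under stated assumptions: \(\varvec{\Sigma }\) is taken to be symmetric and free of \(\varvec{\theta }\), differentiation and expectation are assumed interchangeable, and \({\varvec{D}}(\varvec{\theta })=\partial E\{\Psi _n(\varvec{\theta })\}/\partial \varvec{\theta }^\mathrm{T}\) is shorthand introduced here, not notation from the article.

\[
\frac{\partial l(\varvec{\theta })}{\partial \varvec{\theta }}={\varvec{D}}(\varvec{\theta })^\mathrm{T}\varvec{\Sigma }^{-1}E\{\Psi _n(\varvec{\theta })\},\qquad
\frac{\partial ^2 l(\varvec{\theta })}{\partial \varvec{\theta }\partial \varvec{\theta }^\mathrm{T}}={\varvec{D}}(\varvec{\theta })^\mathrm{T}\varvec{\Sigma }^{-1}{\varvec{D}}(\varvec{\theta })+\sum _{j}[\varvec{\Sigma }^{-1}E\{\Psi _n(\varvec{\theta })\}]^{(j)}\frac{\partial ^2 E\{\Psi _n^{(j)}(\varvec{\theta })\}}{\partial \varvec{\theta }\partial \varvec{\theta }^\mathrm{T}}.
\]

Under these assumptions, wherever \(E\{\Psi _n(\varvec{\theta })\}=\mathbf {0}\) the second term vanishes and the Hessian reduces to \({\varvec{D}}^\mathrm{T}\varvec{\Sigma }^{-1}{\varvec{D}}\), which is positive definite when \(\varvec{\Sigma }\) is positive definite and \({\varvec{D}}\) has full column rank.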
Let \(\mathscr {F}_t=\sigma \{N_i(s),Y_i(s+),{\varvec{Z}}_i;0\le s\le t,i=1,2,\ldots ,n\}\) denote a filtration. Define \(M_i(t,\varvec{\beta })=N_i(t)-\int _0^t Y_i(u)\lambda _0(u)e^{\varvec{\beta }^\mathrm T{\varvec{Z}}_i}\mathrm{d} u,\ i=1,\ldots ,n\). Then \(M_i(t,\varvec{\beta }_0),\ i=1,\ldots ,n\) are martingales on the time interval \([0,\tau ]\). Therefore,
Thus \(E\{\Psi _n(\varvec{\theta }_0)\}=\mathbf {0}\). Since \(\Psi _n(\varvec{\theta })\) is continuous, \(|\Psi _n(\varvec{\theta })|\) is arbitrarily small in probability when \(\varvec{\theta }\) lies in a sufficiently small neighborhood of \(\varvec{\theta }_0\). From Lemma 2, we get that \(\frac{\partial ^2 l_n(\varvec{\theta })}{\partial \varvec{\theta }\partial \varvec{\theta }^\mathrm{T}}\) is a positive definite matrix in probability. Then \(l_n(\varvec{\theta })\) is a convex function and \(\widehat{\varvec{\theta }}_n\) is the unique minimum point of \(l_n(\varvec{\theta })\) in the neighborhood of \(\varvec{\theta }_0\). Since \(E\left\{ \frac{\partial ^2 \Psi _n^{(j)}(\varvec{\theta })}{\partial \varvec{\theta }\partial \varvec{\theta }^\mathrm{T}}\right\}\) is bounded and \(E\{\Psi _n(\varvec{\theta }_0)\}=0\), we show that
By Lemma 2 and the positive definiteness of \(\varvec{\Sigma }\), we know that \(\frac{\partial ^2 l(\varvec{\theta })}{\partial \varvec{\theta }\partial \varvec{\theta }^\mathrm{T}}|_{\varvec{\theta }=\varvec{\theta }_0}\) is a positive definite matrix. Therefore, \(l(\varvec{\theta })\) is a convex function in the neighborhood of \(\varvec{\theta }_0\). Since \(l(\varvec{\theta }_0)=0\) and \(l(\varvec{\theta })\ge 0\), \(\varvec{\theta }_0\) is the unique minimum point of \(l(\varvec{\theta })\) in the neighborhood of \(\varvec{\theta }_0\). From Lemma 8.3.2 of Fleming and Harrington (1991), we conclude that \(\widehat{\varvec{\theta }}_n\) converges to \(\varvec{\theta }_0\) in probability.
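Part (I) relies on \(M_i(t,\varvec{\beta }_0)\) being a mean-zero martingale. The following minimal simulation sketch checks the mean-zero property numerically at \(t=\tau\). Everything specific in it is an assumption made for illustration and is not taken from the article: a constant baseline hazard `lam0`, a single binary covariate, independent exponential censoring truncated at `tau`, and the function name `simulate_martingale_mean`.

```python
import math
import random


def simulate_martingale_mean(n=100_000, beta=0.5, lam0=1.0, tau=2.0, seed=0):
    """Monte Carlo average of M_i(tau, beta_0) = N_i(tau) - int_0^tau Y_i(u) lam0 exp(beta z_i) du.

    Illustrative assumptions: constant baseline hazard lam0, one Bernoulli(0.5)
    covariate, censoring time min(Exp(rate 0.5), tau), independent of failure.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = 1.0 if rng.random() < 0.5 else 0.0
        rate = lam0 * math.exp(beta * z)          # hazard of subject i
        t = rng.expovariate(rate)                 # latent failure time
        c = min(rng.expovariate(0.5), tau)        # censoring time, at most tau
        x = min(t, c)                             # observed follow-up time
        n_tau = 1.0 if t <= c else 0.0            # N_i(tau): failure observed
        compensator = rate * x                    # integral of Y_i(u) * hazard
        total += n_tau - compensator
    return total / n


if __name__ == "__main__":
    print(simulate_martingale_mean())
```

The printed average should be close to zero up to Monte Carlo error; the check says nothing about the case-cohort weights \(w_i\), which are not simulated here.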
(II) By the Newton-Raphson method, we obtain that
where \(\tilde{\varvec{\theta }}_n\) is a point on the line segment between \(\widehat{\varvec{\theta }}_n\) and \(\varvec{\theta }_0\). From the law of large numbers and the continuous mapping theorem, we show that
By Lemma 1, the law of large numbers and the Slutsky Lemma,
The weighted auxiliary estimating function can be rewritten as follows:
where \(\Phi _{F}(\varvec{\theta })=(\Phi _{F}^{(1)}(\varvec{\theta }),\ldots ,\Phi _{F}^{(K)}(\varvec{\theta }))^\mathrm{T}.\)
For \(i=1,\ldots ,n,\)
Then \(\text{ cov }\{\sqrt{n}\Phi _{ w}(\varvec{\theta }_0)\}=E\{\phi ({\varvec{Z}},\varvec{\theta }_0)\phi ({\varvec{Z}},\varvec{\theta }_0)^\mathrm{T}\}+(1-\alpha _0)\rho _0/\alpha _0E\{\phi ({\varvec{Z}},\varvec{\theta }_0)\phi ({\varvec{Z}},\varvec{\theta }_0)^\mathrm{T}\} =\phi _\phi .\)
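Reading the coefficient \((1-\alpha _0)\rho _0/\alpha _0\) as a scalar multiplying the second expectation, the two terms share the factor \(E\{\phi ({\varvec{Z}},\varvec{\theta }_0)\phi ({\varvec{Z}},\varvec{\theta }_0)^\mathrm{T}\}\), so the covariance can be restated compactly. This is a purely algebraic rearrangement under that reading and adds no new assumption:

\[
\text{cov}\{\sqrt{n}\Phi _{w}(\varvec{\theta }_0)\}=\left\{ 1+\frac{(1-\alpha _0)\rho _0}{\alpha _0}\right\} E\{\phi ({\varvec{Z}},\varvec{\theta }_0)\phi ({\varvec{Z}},\varvec{\theta }_0)^\mathrm{T}\}=\frac{\alpha _0+(1-\alpha _0)\rho _0}{\alpha _0}\,E\{\phi ({\varvec{Z}},\varvec{\theta }_0)\phi ({\varvec{Z}},\varvec{\theta }_0)^\mathrm{T}\}.
\]

In particular, the second term vanishes when \(\alpha _0=1\), and the covariance then equals \(E\{\phi ({\varvec{Z}},\varvec{\theta }_0)\phi ({\varvec{Z}},\varvec{\theta }_0)^\mathrm{T}\}\).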
Because \(\text{ cov }\{ U_w({\varvec{\beta }}_0),\Phi _{ w}({\varvec{\theta }}_0)\} =E[E\{ U_w({\varvec{\beta }}_0)|\mathscr {F}_0\}\Phi _{ w}({\varvec{\theta }}_0)^\mathrm T]=\mathbf {0},\) following the central limit theorem and the martingale central limit theorem, we get that
By the Slutsky Lemma, we show that
where
From the proof of Lemma 2, \(\phi _\gamma ^\mathrm{T}\phi _\phi ^{-1}\phi _\gamma >0\), then
By Lemma 3, we obtain that \(\varvec{\Sigma }_3\) is a positive semi-definite matrix and equals zero when \(K=1\).
The proofs of the asymptotic properties of \({\widehat{\Lambda }}_w(t,{\widehat{{\varvec{\beta }}}}_n)\), \({\widehat{\Lambda }}_w(t,{\widehat{{\varvec{\beta }}}}_n^w)\), \({\widehat{\Lambda }}_w(t,{\widehat{{\varvec{\beta }}}}_M)\) and \({\widehat{\Lambda }}_w(t,{\widehat{{\varvec{\beta }}}}_A)\) can be found in Kulich and Lin (2004). The asymptotic properties of \({\widehat{{\varvec{\beta }}}}_n^w\), \({\widehat{{\varvec{\beta }}}}_M\) and \({\widehat{{\varvec{\beta }}}}_A\) are similar to those of \({\widehat{{\varvec{\beta }}}}_n\). The consistency of \({\widehat{\eta }}_M\) and \({\widehat{\eta }}_A\) is established in the same way as that of \({\widehat{{\varvec{\theta }}}}_n\). Hence we only prove the asymptotic normality of \({\widehat{\eta }}_M\) and \({\widehat{\eta }}_A\).
Proof of asymptotic normality for \({\widehat{\eta }}_M\): Following the same argument as for the asymptotic normality of \({\widehat{{\varvec{\theta }}}}_M\),
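To make the object \({\widehat{\Lambda }}_w(t,\varvec{\beta })\) concrete, here is a minimal sketch of a weighted Breslow-type estimator. The appendix does not restate the estimator's definition, so the form used below is an assumption: \({\widehat{\Lambda }}_w(t,\varvec{\beta })=\sum _{i}\int _0^t w_i\,\mathrm{d}N_i(u)/\{nS_w^{(0)}(u,\varvec{\beta })\}\) with \(S_w^{(0)}(u,\varvec{\beta })=n^{-1}\sum _j w_jY_j(u)\exp (\varvec{\beta }^\mathrm{T}{\varvec{Z}}_j)\) and \(Y_j(u)=1\{X_j\ge u\}\). The function name and argument layout are hypothetical.

```python
import math


def weighted_breslow(t, times, events, covariates, weights, beta):
    """Weighted Breslow-type estimate of the cumulative baseline hazard at time t.

    Assumed form (not copied from the article):
        sum over observed failures i with X_i <= t of
            w_i / sum_j w_j * 1{X_j >= X_i} * exp(beta' Z_j).
    times      : observed follow-up times X_i
    events     : 1 if a failure was observed, 0 if censored
    covariates : list of covariate vectors Z_i
    weights    : sampling weights w_i (0 for subjects without covariate data)
    """
    n = len(times)
    # Weighted relative risk w_j * exp(beta' Z_j) for every subject.
    risk = [
        w * math.exp(sum(b * z for b, z in zip(beta, zs)))
        for w, zs in zip(weights, covariates)
    ]
    total = 0.0
    for i in range(n):
        if events[i] and times[i] <= t:
            # Sum over the risk set at the failure time X_i.
            denom = sum(risk[j] for j in range(n) if times[j] >= times[i])
            total += weights[i] / denom
    return total


if __name__ == "__main__":
    # Unit weights and beta = 0 reduce to the Nelson-Aalen estimator.
    print(weighted_breslow(3.0, [1.0, 2.0, 3.0], [1, 1, 1],
                           [[0.0], [0.0], [0.0]], [1.0, 1.0, 1.0], [0.0]))
```

With unit weights and \(\varvec{\beta }=\mathbf {0}\), the sketch reduces to the Nelson-Aalen estimator, which gives a simple sanity check. The double loop is quadratic in \(n\) and is kept for readability rather than speed.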
Because
then under the regularity conditions and C1-C4,
Since \(\text{ cov }\left\{ \sqrt{n}\Psi _n(\varvec{\theta }_0),\frac{1}{\sqrt{n}}\sum _{i=1}^n w_i\int _0^{t_*} \frac{1}{S_w^{(0)}(t,\varvec{\beta }_0)}\mathrm{d}M_i(t,{\varvec{\beta }}_0)\right\} =\mathbf {0}\), then
From \(\sqrt{n}({\widehat{\eta }}_M-\eta _0)=\frac{{\widehat{\varphi }}_M\sqrt{n} \{{\widehat{\Lambda }}_w(t_*,{\widehat{{\varvec{\beta }}}}_M)-\Lambda _0(t_*)\}}{{\widehat{\Lambda }}_w(t_*,{\widehat{{\varvec{\beta }}}}_M)\Lambda _0(t_*)} +\frac{\sqrt{n}({\widehat{\varphi }}_M-\varphi _0)}{\Lambda _0(t_*)}\) and the Slutsky Lemma,
Proof of asymptotic normality for \({\widehat{\eta }}_A\): Following the same argument as in the proof of the asymptotic normality of \({\widehat{\eta }}_M\),
From \(\sqrt{n}({\widehat{\eta }}_A-\eta _0)=\sqrt{n} \{{\widehat{\Lambda }}_w(t_*,{\widehat{{\varvec{\beta }}}}_A)-\Lambda _0(t_*)\}/t_* +\sqrt{n}({\widehat{\varphi }}_A-\varphi _0)/t_*\) and the Slutsky Lemma,
Cite this article
Shang, W. Statistical inference for Cox model under case-cohort design with subgroup survival information. J. Korean Stat. Soc. 51, 884–926 (2022). https://doi.org/10.1007/s42952-022-00166-4