Abstract
This paper studies a novel model averaging estimation problem for linear regression models in which the responses are right censored and the covariates are measured with error. By introducing multiple candidate models, a novel weighted Mallows-type criterion is proposed, and the model-averaging weight vector is selected by minimizing this criterion. Under some regularity conditions, the selected weight vector is shown to be asymptotically optimal in the sense that it achieves the lowest possible squared loss asymptotically. Simulation results show that the proposed method outperforms existing related methods, and a real data example illustrates its practical performance.
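As a rough illustration of the core idea only (not the paper's exact estimator, which additionally corrects for censoring via synthetic responses and for covariate measurement error), the following sketch selects model-averaging weights by minimizing a Mallows-type criterion over the weight simplex for fully observed, error-free data; all variable names are our own:

```python
# Illustrative sketch of Mallows-type model averaging (Hansen 2007 style):
# choose weights w on the simplex minimizing
#   C(w) = ||y - sum_m w_m mu_hat_m||^2 + 2 sigma^2 sum_m w_m k_m.
# This omits the paper's censoring and measurement-error corrections.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.normal(size=(n, p))
beta = np.array([1.0, 0.8, 0.5, 0.2, 0.0, 0.0])
y = X @ beta + rng.normal(size=n)

# Nested candidate models: model m uses the first m covariates.
M = p
fits, ks = [], []
for m in range(1, M + 1):
    Xm = X[:, :m]
    H = Xm @ np.linalg.solve(Xm.T @ Xm, Xm.T)   # hat matrix of model m
    fits.append(H @ y)
    ks.append(m)
fits = np.array(fits)              # (M, n) array of candidate fitted values
ks = np.array(ks, dtype=float)     # numbers of regressors

# Error-variance estimate from the largest candidate model.
sigma2 = np.sum((y - fits[-1]) ** 2) / (n - M)

def mallows(w):
    resid = y - w @ fits
    return resid @ resid + 2.0 * sigma2 * (w @ ks)

# Minimize over the simplex {w : w_m >= 0, sum_m w_m = 1}.
cons = {"type": "eq", "fun": lambda w: np.sum(w) - 1.0}
res = minimize(mallows, np.full(M, 1.0 / M),
               bounds=[(0.0, 1.0)] * M, constraints=cons)
w_hat = res.x
mu_hat = w_hat @ fits              # model-averaged fitted values
```

Since the criterion is quadratic in `w`, a quadratic-programming solver would also work; `scipy.optimize.minimize` (SLSQP) is used here only to keep the sketch short.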
References
Ando T, Li KC (2014) A model-averaging approach for high-dimensional regression. J Am Stat Assoc 109(505):254–265
Ando T, Li KC (2017) A weight-relaxed model averaging approach for high-dimensional generalized linear models. Ann Stat 45(6):2654–2679
Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006) Measurement error in nonlinear models: a modern perspective. Chapman and Hall/CRC, Boca Raton
Chen L, Yi GY (2020) Model selection and model averaging for analysis of truncated and censored data with measurement error. Electron J Stat 14(2):4054–4109
Claeskens G, Hjort NL (2003) The focused information criterion. J Am Stat Assoc 98(464):900–916
Claeskens G, Hjort NL (2008) Model selection and model averaging. Cambridge University Press, Cambridge
Dong Q, Liu B, Zhao H (2023) Weighted least squares model averaging for accelerated failure time models. Comput Stat Data Anal 184:107743
Du J, Zhang Z, Xie T (2017) Focused information criterion and model averaging in censored quantile regression. Metrika 80(5):547–570
Han P, Kong L, Zhao J, Zhou X (2019) A general framework for quantile estimation with incomplete data. J R Stat Soc Ser B Stat Methodol 81(2):305–333
Hansen BE (2007) Least squares model averaging. Econometrica 75(4):1175–1189
Hansen BE, Racine JS (2012) Jackknife model averaging. J Econom 167(1):38–46
Hoeting J, Madigan D, Raftery A, Volinsky C (1999) Bayesian model averaging: a tutorial. Stat Sci 14(4):382–417
Kaplan EL, Meier P (1958) Nonparametric estimation from incomplete observations. J Am Stat Assoc 53(282):457–481
Li KC (1987) Asymptotic optimality for \(C_p, C_L\), cross-validation and generalized cross-validation: discrete index set. Ann Stat 15(3):958–975
Li G, Wang Q (2003) Empirical likelihood regression analysis for right censored data. Stat Sin 13(1):51–68
Li M, Wang X (2023) Semiparametric model averaging method for survival probability predictions of patients. Comput Stat Data Anal 185:107759
Li J, Yu T, Lv J, Lee M-LT (2021) Semiparametric model averaging prediction for lifetime data via hazards regression. J R Stat Soc Ser C Appl Stat 70(5):1187–1209
Liang H, Li R (2009) Variable selection for partially linear models with measurement errors. J Am Stat Assoc 104(485):234–248
Liang H, Wang S, Carroll RJ (2007) Partially linear models with missing response variables and error-prone covariates. Biometrika 94(1):185–198
Liang Z, Chen X, Zhou Y (2022) Mallows model averaging estimation for linear regression model with right censored data. Acta Math Appl Sin Engl Ser 38(1):5–23
Liao J, Zou G (2020) Corrected Mallows criterion for model averaging. Comput Stat Data Anal 144:106902
Liao J, Zong X, Zhang X, Zou G (2019) Model averaging based on leave-subject-out cross-validation for vector autoregressions. J Econom 209(1):35–60
Liu Q, Okui R (2013) Heteroscedasticity-robust \(C_p\) model averaging. Econom J 16(3):463–472
Longford NT (2005) Model selection and efficiency: is ‘Which model...?’ the right question? J R Stat Soc Ser A Stat Soc 168(3):469–472
Raftery AE, Zheng Y (2003) Discussion: performance of Bayesian model averaging. J Am Stat Assoc 98(464):931–938
Su M, Wang R, Wang Q (2022) A two-stage optimal subsampling estimation for missing data problems with large-scale data. Comput Stat Data Anal 173:107505
Sun Z, Sun L, Lu X, Zhu J, Li Y (2017) Frequentist model averaging estimation for the censored partial linear quantile regression model. J Stat Plan Inference 189:1–15
Tang ML, Tang NS, Zhao PY, Zhu H (2018) Efficient robust estimation for linear models with missing response at random. Scand J Stat 45(2):366–381
Wan ATK, Zhang X, Zou G (2010) Least squares model averaging by Mallows criterion. J Econom 156(2):277–283
Wang H, Zou G, Wan ATK (2012) Model averaging for varying-coefficient partially linear measurement error models. Electron J Stat 6:1017–1039
Wen C (2012) Cox regression for mixed case interval-censored data with covariate errors. Lifetime Data Anal 18(3):321–338
Yan X, Wang H, Wang W, Xie J, Ren Y, Wang X (2021) Optimal model averaging forecasting in high-dimensional survival analysis. Int J Forecast 37(3):1147–1155
Zhang X, Liu C-A (2023) Model averaging prediction by K-fold cross-validation. J Econom 235(1):280–301
Zhang T, Wang L (2020) Smoothed empirical likelihood inference and variable selection for quantile regression with nonignorable missing response. Comput Stat Data Anal 144:106888
Zhang X, Zou G, Carroll RJ (2015) Model averaging based on Kullback–Leibler distance. Stat Sin 25:1583–1598
Zhang X, Yu D, Zou G, Liang H (2016) Optimal model averaging estimation for generalized linear models and generalized linear mixed-effects models. J Am Stat Assoc 111(516):1775–1790
Zhang X, Wang H, Ma Y, Carroll RJ (2017) Linear model selection when covariates contain errors. J Am Stat Assoc 112(520):1553–1561
Zhang X, Ma Y, Carroll RJ (2019) MALMEM: model averaging in linear measurement error models. J R Stat Soc Ser B Stat Methodol 81(4):763–779
Zhou M (1992) Asymptotic normality of the ‘synthetic data’ regression estimator for censored survival data. Ann Stat 20:1002–1021
Zhu R, Wan ATK, Zhang X, Zou G (2019) A Mallows-type model averaging estimator for the varying-coefficient partially linear model. J Am Stat Assoc 114(526):882–892
Acknowledgements
The authors are grateful to the editor, associate editor and three referees for their comments and suggestions, which greatly improved the manuscript. This work was supported by the Institute of Digital Finance, Hangzhou City University.
Ethics declarations
Conflict of interest
The authors report there are no competing interests to declare.
Appendices
Appendix 1. Regularity conditions
In this appendix, some additional notation and the regularity conditions are listed. Let \(J(t)=1-\{1-F(t)\}\{1-G(t)\}\), where \(F(\cdot )\) is the cumulative distribution function of \(\mathbf{{Y}}\). Let \(\mathbf{{Q}}(\mathbf{{w}})\) and \(\hat{\mathbf{{Q}}}(\mathbf{{w}})\) be two diagonal matrices of order n whose diagonal elements are \(p_{ii}(\mathbf{{w}})\) and \({\hat{p}}_{ii}(\mathbf{{w}})\), respectively, for \(i=1,\ldots ,n\). Let \(\lambda _{\max }(\cdot )\) and \(\lambda _{\min }(\cdot )\) denote the largest and smallest singular values of a matrix, respectively, and let \(\xi _G=\inf _{\mathbf{{w}}\in W_n}R_G(\mathbf{{w}})\).
(C.1) \(1-G(\tau _J-)>0\) with \(\tau _J=\inf \{t:J(t)=1\}\).

(C.2) \(\max _{1\le i\le n}E(\epsilon ^{4}_i|\mathbf{{X}}_i)<\infty\) a.s., and \(\Vert \varvec{ {\upmu }}\Vert ^2=O(n)\).

(C.3) \(c_1\le n^{-1}\lambda _{\min }(\mathbf{{X}}^\top _{(m)}\mathbf{{X}}_{(m)})\le n^{-1}\lambda _{\max }(\mathbf{{X}}^\top _{(m)}\mathbf{{X}}_{(m)})<c_2\) uniformly in m, a.s., and \(\lambda _{\max }(\varvec{ {\Sigma }}_T)<c_2\), where \(c_1\) and \(c_2\) are two positive constants.

(C.4) \(n^{1/2}k_M/\xi _G=o(1)\) and \(k^2_M/n=o(1)\), a.s., where \(k_M<p_n\) is the number of regressors in the largest candidate model.

(C.5) \(\max _{1\le m\le M}\max _{1\le i\le n}p_{(m),ii}=O(n^{-1/2})\), a.s., where \(p_{(m),ii}\) is the ith diagonal element of \(\mathbf{{P}}_{(m)}\).

(C.6) \(\Vert \hat{\varvec{ {\Sigma }}}_{T,(m)}-{\varvec{ {\Sigma }}}_{T,(m)}\Vert =O_p(n^{-1/2}k_M)\) uniformly in m.
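As a quick numerical check of the objects appearing in Condition (C.1) (with our own illustrative choice of uniform distributions, not taken from the paper): under independent censoring, \(J\) is the distribution function of the observed time \(\min (T,C)\), and (C.1) asks the censoring distribution to retain mass at \(\tau _J\).

```python
# Illustration of Condition (C.1) with T ~ U(0, 1) and C ~ U(0, 2) (our own
# choice): the censoring support strictly covers the failure support, so
# tau_J = 1 and 1 - G(tau_J-) = 1/2 > 0, i.e. (C.1) holds.
import numpy as np

t = np.linspace(0.0, 1.0, 101)
F = np.clip(t, 0.0, 1.0)           # CDF of the failure time T ~ U(0, 1)
G = np.clip(t / 2.0, 0.0, 1.0)     # CDF of the censoring time C ~ U(0, 2)
J = 1.0 - (1.0 - F) * (1.0 - G)    # CDF of the observed time min(T, C)

tau_J = t[np.argmax(J >= 1.0)]     # inf{t : J(t) = 1}; here tau_J = 1
cond_C1 = 1.0 - tau_J / 2.0        # 1 - G(tau_J-) = 1/2 > 0
```

By contrast, with exponential censoring of unbounded support one gets \(\tau _J=\infty\) and \(1-G(\tau _J-)=0\), so (C.1) would fail; this is why bounded-support-type examples are the natural ones here.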
Remark 1
Condition (C.1) is commonly used in right censored data analysis; similar conditions can be found in Sun et al. (2017), Li and Wang (2003) and Liang et al. (2022). Condition (C.2) is a standard condition for linear regression models with measurement error, similar to Condition 1 of Zhang et al. (2019). Condition (C.3) is a standard condition on the covariates of the model and the covariance of the measurement error; it is the same as Condition (C.3) in Zhang et al. (2017), and a similar condition appears as Condition (C.5) of Liao and Zou (2020). Condition (C.4) places a constraint on the number of regressors in the largest candidate model: it allows \(k_M\) to increase with n, but only at a restricted rate. Condition (C.4) also requires that \(\xi _G\) increase faster than \(n^{1/2}k_M\), which implies that no candidate model with finitely many regressors has zero bias. This condition is weaker than Condition (C.1) of Zhang et al. (2017), which requires the minimum of the risk to tend to infinity faster than \(n^{1/2}p^2_n\). Other similar conditions appear as Condition (C.6) of Zhang et al. (2016) and Condition (C.9) in Liao and Zou (2020). Condition (C.6) requires that the estimator \(\hat{\varvec{ {\Sigma }}}_{T,(m)}\) be consistent; a similar condition is Condition (C.4) in Zhang et al. (2017).
Appendix 2. Proofs of main results
Appendix 2 gives the detailed proofs of the theorems in this paper. We first give a short proof of Lemma 1, and then present some additional lemmas to assist with the proofs of the theorems. In what follows, C denotes a generic positive constant that may vary from place to place. The matrix norm used is the spectral norm, that is, the largest singular value of the matrix.
Proof of Lemma 1
By the definition of \(C_{G}(\mathbf{{w}})\) in (6), we have
where \(\mathbf{{I}}_n\) is the identity matrix of order n. \(\square\)
Lemma 2
Provided that Conditions (C.1)–(C.2) in Appendix 1 hold, as \(n\rightarrow \infty\), we have
Proof of Lemma 2
According to the definitions of \(\mathbf{{Z}}_{{\hat{G}}_n}\) and \(\mathbf{{Z}}_G\), by Condition (C.2), we have
Through straightforward but tedious calculations, we can obtain that
Then, combining Condition (C.1) with the results in Zhou (1992),
we have
By (14) and (15), it can be obtained that \(\Vert \mathbf{{Z}}_{{\hat{G}}_n}-\mathbf{{Z}}_G\Vert ^2=O_p(1)\). Thus, we complete the proof of Lemma 2. \(\square\)
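The quantity \(\mathbf{{Z}}_{{\hat{G}}_n}\) compared with \(\mathbf{{Z}}_G\) above replaces the censoring distribution \(G\) by its Kaplan–Meier estimate. A minimal sketch of a synthetic-data transformation of this type (the Koul–Susarla–Van Ryzin form studied in Zhou 1992; the variable names are our own, and the paper's exact definition of \(\mathbf{{Z}}_{{\hat{G}}_n}\) is given in the main text):

```python
# Synthetic responses for right censored data:
#   Z_i = delta_i * Y_i / {1 - G_hat(Y_i-)},
# where G_hat is the Kaplan-Meier estimate of the censoring distribution,
# built by flipping the censoring indicator (1 - delta).
import numpy as np

def km_censoring_survival_left(y, delta):
    """Return 1 - G_hat(y_i-) for each observation: the Kaplan-Meier survival
    of the censoring time, evaluated just before y_i (left limit)."""
    n = len(y)
    out = np.empty(n)
    times = np.unique(y)                     # sorted distinct observed times
    for i in range(n):
        surv = 1.0
        for t in times:
            if t >= y[i]:                    # left limit: jumps strictly before y_i
                break
            at_risk = np.sum(y >= t)
            d = np.sum((y == t) & (delta == 0))   # censoring "events" at t
            surv *= 1.0 - d / at_risk
        out[i] = surv
    return out

def synthetic_response(y, delta):
    return delta * y / km_censoring_survival_left(y, delta)

rng = np.random.default_rng(1)
T = rng.exponential(1.0, size=100)           # latent failure times
C = rng.uniform(0.0, 4.0, size=100)          # censoring times
y, delta = np.minimum(T, C), (T <= C).astype(float)
Z = synthetic_response(y, delta)
```

With no censoring (all \(\delta _i=1\)) the estimator \({\hat{G}}_n\) has no jumps and \(Z_i=Y_i\); otherwise the uncensored responses are inflated to compensate, on average, for the censored ones.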
Lemma 3
Provided that Conditions (C.1)–(C.5) in Appendix 1 hold, as \(n\rightarrow \infty\), we have
Proof of Lemma 3
By the definition of \(R_G(\mathbf{{w}})\) in (7), it follows that
where \(\varvec{ {\Omega }}_G=diag(\sigma ^2_{G,1},\ldots ,\sigma ^2_{G,n})\). Then, we have
Thus, proving (16) is equivalent to showing
Similar to Zhang et al. (2017), under Conditions (C.2) and (C.3), by the Markov inequality, we have
uniformly in m, as \(C_n\rightarrow \infty\), where \(\mathbf{{X}}_{(m),m}\) is the mth column of \(\mathbf{{X}}_{(m)}\). This implies that \(\Vert \mathbf{{T}}^\top _{(m)}\mathbf{{X}}_{(m),m}\Vert =O_p(n^{1/2}k^{1/2}_M)\) uniformly in m. Similarly, we can show that \(\Vert \mathbf{{T}}^\top _{(m)}\mathbf{{X}}_{(m)}\Vert =O_p(n^{1/2}k_M)\) and \(\Vert \mathbf{{T}}^\top _{(m)}\mathbf{{T}}_{(m)}-n{\varvec{ {\Sigma }}}_{T,(m)}\Vert =O_p(n^{1/2}k_M)\) uniformly in m. Then, under Conditions (C.2) and (C.3), following steps similar to those used to prove (21) in Zhang et al. (2017), we obtain that, uniformly in m,
and
By (19) and (20), under Conditions (C.3) and (C.4), we can derive that
which further implies that
Similar to (19), we can obtain that \(\lambda _{\max }(\tilde{\mathbf{{X}}}^\top _{(m)}\tilde{\mathbf{{X}}}_{(m)})\le 2nc_2+O_p(n^{1/2}k_M)\). This, together with (21) and Condition (C.4), yields
By (22), it can be obtained that \(\lambda _{\max }\{\mathbf{{P}}(\mathbf{{w}})\}=O_p(1)\). Then, we have
and
In addition, by Conditions (C.1) and (C.2), using the \(C_r\) inequality, we can prove
Combining (23)–(25) with Condition (C.4), and using the same techniques as in the proof of Theorem 1 of Wan et al. (2010), we can prove (17) and (18). Therefore, Lemma 3 is valid. \(\square\)
Lemma 4
Provided that Conditions (C.2), (C.3), (C.4) and (C.6) in Appendix 1 hold, as \(n\rightarrow \infty\), we have
Proof of Lemma 4
Let \(\hat{\mathbf{{A}}}_{(m)}=n^{-1}(\tilde{\mathbf{{X}}}^\top _{(m)}\tilde{\mathbf{{X}}}_{(m)}-n\hat{\varvec{ {\Sigma }}}_{T,(m)})\) and \({\mathbf{{A}}}_{(m)}=n^{-1}(\tilde{\mathbf{{X}}}^\top _{(m)}\tilde{\mathbf{{X}}}_{(m)}-n{\varvec{ {\Sigma }}_{T,(m)}})\). Note that
Then, we have
By (21), we can easily verify that
that is, \(\sup _m\Vert \mathbf{{A}}^{-1}_{(m)}\Vert \le C<\infty\). By Condition (C.6), we have \(\Vert {\mathbf{{A}}}_{(m)}-\hat{\mathbf{{A}}}_{(m)}\Vert =n^{-1}\Vert {n(\varvec{ {\Sigma }}_{T,(m)}}-\hat{\varvec{ {\Sigma }}}_{T,(m)})\Vert =O_p(n^{-1/2}k_M)\) uniformly in m. Further, by Condition (C.4), \(\sup _m\Vert \mathbf{{A}}_{(m)}-\hat{\mathbf{{A}}}_{(m)}\Vert \rightarrow 0\) in probability, which implies that \(C\Vert \mathbf{{A}}_{(m)}-\hat{\mathbf{{A}}}_{(m)}\Vert < 1\) in probability. Then, by (27), we have
By (19) and (28), under Condition (C.4), we then have
Thus, we complete the proof of Lemma 4. \(\square\)
Proof of Theorem 1
Under Conditions (C.1)–(C.5) listed in Appendix 1, the idea of proving Theorem 1 is similar to that of the proof of Theorem 2, and the steps are relatively simpler. Thus we omit the details here. \(\square\)
Proof of Theorem 2
The proposed criterion in (9) can be rewritten as
where \(\hat{\mathbf{{Q}}}(\mathbf{{w}})\) is defined in Appendix 1. Obviously, the last term in (29) does not depend on \(\mathbf{{w}}\). Accordingly, following Li (1987), Theorem 2 is valid if the following three results hold:
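For orientation, sufficient conditions of this Li (1987) type typically take the following shape in the present notation (a sketch only; the paper's exact displays (30)–(32) should be consulted, and may differ in detail):

```latex
% Typical shape of Li (1987)-type sufficient conditions for asymptotic
% optimality (a sketch; not the paper's exact displays (30)--(32)):
\begin{align*}
&\sup_{\mathbf{w}\in W_n}
  \frac{\bigl|\langle \hat{\boldsymbol{\epsilon}}_{\hat G_n},\,
  \hat{\boldsymbol{\mu}}_{\hat G_n}(\mathbf{w})-\boldsymbol{\mu}\rangle\bigr|}
  {R_G(\mathbf{w})} = o_p(1),
\qquad
\sup_{\mathbf{w}\in W_n}
  \Bigl|\frac{L_{\hat G_n}(\mathbf{w})}{R_G(\mathbf{w})}-1\Bigr| = o_p(1),
\end{align*}
```

together with a third requirement controlling the penalty-type quadratic form \(\hat{\varvec{ {\upepsilon }}}^\top _{{\hat{G}}_n}\hat{\mathbf{{Q}}}(\mathbf{{w}})\hat{\varvec{ {\upepsilon }}}_{{\hat{G}}_n}\) uniformly over \(W_n\); the three parts correspond to the treatments of (30), (31) and (32) below.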
where \(L_{{\hat{G}}_n}(\mathbf{{w}})=\Vert \varvec{ {\upmu }}-\hat{\varvec{ {\upmu }}}_{{\hat{G}}_n}(\mathbf{{w}})\Vert ^2\). We first consider (30). By the Cauchy–Schwarz inequality, we have
By Lemma 3, we know that \(J_1\) is \(o_p(1)\). Note that
By Conditions (C.1) and (C.2), it is clear that \(\Vert \mathbf{{Z}}_G\Vert ^2=O_p(n)\). Then, by Lemmas 2 and 4, under Condition (C.4), we have
As for \(J_3\), from the results for \(J_1\) and \(J_2\) and the Cauchy–Schwarz inequality, we have
The above indicates that (30) is true.
Next, we turn to (31). By Lemma 2 and (30), applying the Cauchy–Schwarz inequality once again, we have
Finally, we consider (32). According to the definition of \(\hat{\varvec{ {\upepsilon }}}_{{\hat{G}}_n}\) in (10), by Lemmas 2 and 4, it can be obtained that
Through some direct calculations, we can obtain that
where \(\varvec{ {\Omega }}_G=diag(\sigma ^2_{G,1},\ldots ,\sigma ^2_{G,n})\), and \(\mathbf{{Q}}(\mathbf{{w}})\) and \(\hat{\mathbf{{Q}}}(\mathbf{{w}})\) are given in Appendix 1. Thus, proving (32) is equivalent to proving
By (25) and the Markov inequality, it is easy to show that
By (35), under Conditions (C.2) and (C.5), using proof techniques similar to those of Theorem 2.2 in Liu and Okui (2013), we know that \(\sup _{\mathbf{{w}} \in W_n}\Lambda _i/R_G(\mathbf{{w}})=o_p(1)\) holds for \(i=1,3,9\). By (22), (35), Lemmas 2 and 4 and the result for \(J_2\), under Condition (C.4), we have
Let \(\tilde{p}=\sup _{\mathbf{{w}} \in W_n}\max _{1\le i\le n}p_{ii}(\mathbf{{w}})\). By Condition (C.5), it is easy to verify that \(\tilde{p}=O_p(n^{-1/2})\). Similar to Lemma 4, we can verify that
Then, for \(\Lambda _4\), combining (33) and (36) under Condition (C.4), we have
For \(\Lambda _5\), by (33) and (36), under Conditions (C.4) and (C.5), we have
As for \(\Lambda _6\), similarly, we have
Similar to the proofs for \(\Lambda _4\), \(\Lambda _5\) and \(\Lambda _6\), we can show that \(\sup _{\mathbf{{w}} \in W_n}\Lambda _i/R_G(\mathbf{{w}})=o_p(1)\) for \(i=7,8\). We have now shown that (32) is true.
Thus, we complete the proof of Theorem 2. \(\square\)
Cite this article
Liang, Z., Zhang, C. & Xu, L. Model averaging for right censored data with measurement error. Lifetime Data Anal 30, 501–527 (2024). https://doi.org/10.1007/s10985-024-09620-3