Abstract
Variable selection in the Cox proportional hazards model (the Cox model) has proved important in many microarray genetic studies. However, theoretical results on variable-selection procedures for the Cox model with a high-dimensional feature space are rare because of its complicated data structure. In this paper, we consider the extended Bayesian information criterion (EBIC) for variable selection in the Cox model and establish its selection consistency in the situation of a high-dimensional feature space. The EBIC is adopted to select the best model from a model sequence generated by the SIS-ALasso procedure. Simulation studies and real data analysis are carried out to demonstrate the merits of the EBIC.
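To make the selection rule concrete, here is a minimal sketch of how the EBIC can be evaluated along a model sequence. It assumes the standard EBIC form of Chen and Chen (2008), \(\text{EBIC}_{\gamma}(s) = -2 l_n(\hat{\varvec{\beta}}(s)) + |s|\ln n + 2\gamma \ln \binom{p}{|s|}\); the function name, the candidate log-likelihood values, and the \((n, p, \gamma)\) settings below are illustrative, not taken from the paper.

```python
import math

def ebic(loglik, model_size, n, p, gamma):
    """EBIC_gamma(s) = -2 * l_n(beta_hat(s)) + |s| * ln(n) + 2 * gamma * ln(C(p, |s|))."""
    penalty = model_size * math.log(n) + 2.0 * gamma * math.log(math.comb(p, model_size))
    return -2.0 * loglik + penalty

# Select the best model from a sequence (e.g. one produced by SIS-ALasso);
# each candidate is (maximized partial log-likelihood, model size) -- toy numbers.
candidates = [(-520.3, 2), (-515.1, 4), (-514.8, 9)]
n, p, gamma = 200, 1000, 1.0
scores = [ebic(ll, k, n, p, gamma) for ll, k in candidates]
best = scores.index(min(scores))  # the 2-variable model wins here
```

Note that under \(p_n = n^{\kappa}\), \(\ln \binom{p}{|s|} \approx |s|\kappa \ln n\), which is the approximation used in the proofs below.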
References
Andersen, P., Gill, R. (1982). Cox’s regression model for counting processes: a large sample study. The Annals of Statistics, 10(4), 1100–1120.
Barabási, A., Gulbahce, N., Loscalzo, J. (2011). Network medicine: a network-based approach to human disease. Nature Reviews Genetics, 12(1), 56–68.
Bogdan, M., Ghosh, J. K., Doerge, R. (2004). Modifying the Schwarz Bayesian information criterion to locate multiple interacting quantitative trait loci. Genetics, 167(2), 989–999.
Broman, K. W., Speed, T. P. (2002). A model selection approach for the identification of quantitative trait loci in experimental crosses. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4), 641–656.
Chen, J., Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3), 759–771.
Chen, J., Chen, Z. (2012). Extended BIC for small-n-large-P sparse GLM. Statistica Sinica, 22(2), 555.
Cookson, W., Liang, L., Abecasis, G., Moffatt, M., Lathrop, M. (2009). Mapping complex disease traits with global gene expression. Nature Reviews Genetics, 10(3), 184–194.
Du, P., Ma, S., Liang, H. (2010). Penalized variable selection procedure for Cox models with semiparametric relative risk. The Annals of Statistics, 38(4), 2092.
Fan, J., Li, R. (2002). Variable selection for Cox's proportional hazards model and frailty model. The Annals of Statistics, 30(1), 74–99.
Fan, J., Li, G., Li, R. (2005). An overview on variable selection for survival analysis. Contemporary multivariate analysis and design of experiments (p. 315). New Jersey: World Scientific.
Fan, J., Feng, Y., Wu, Y. (2010). High-dimensional variable selection for Cox's proportional hazards model. Borrowing strength: theory powering applications—a Festschrift for Lawrence D. Brown, vol. 6 (pp. 70–86). Beachwood: IMS Collections.
Fill, J. (1983). Convergence rates related to the strong law of large numbers. The Annals of Probability, 11(1), 123–142.
Fleming, T., Harrington, D. (1991). Counting processes and survival analysis, vol 8. Wiley Online Library.
Gui, J., Li, H. (2005). Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics, 21(13), 3001–3008.
Luo, S., Chen, Z. (2013a). Extended BIC for linear regression models with diverging number of relevant features and high or ultra-high feature spaces. Journal of Statistical Planning and Inference, 143, 494–504.
Luo, S., Chen, Z. (2013b). Selection consistency of EBIC for GLIM with non-canonical links and diverging number of parameters. Statistics and Its Interface, 6, 275–284.
Rosenwald, A., Wright, G., Chan, W., Connors, J., Campo, E., Fisher, R., et al. (2002). The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. New England Journal of Medicine, 346(25), 1937–1947.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.
Sha, N., Tadesse, M., Vannucci, M. (2006). Bayesian variable selection for the analysis of microarray data with censored outcomes. Bioinformatics, 22(18), 2262–2268.
Siegmund, D. (2004). Model selection in irregular problems: Application to mapping quantitative trait loci. Biometrika, 91, 785–800.
Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine, 16(4), 385–395.
Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., et al. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6), 520–525.
Van de Geer, S. (1995). Exponential inequalities for martingales, with application to maximum likelihood estimation for counting processes. The Annals of Statistics, 23(5), 1779–1801.
Zhang, H., Lu, W. (2007). Adaptive lasso for cox’s proportional hazards model. Biometrika, 94(3), 691–703.
Zou, H. (2008). A note on path-based variable selection in the penalized proportional hazards model. Biometrika, 95(1), 241–247.
Appendices
Appendix A: remarks on the assumptions
1.1 Remark on assumption A1
Note that \(S_n, S_n^{(1)}\) and \(S_n^{(2)}\) are summations of i.i.d. random variables; it is verified in Fill (1983) that, when the associated random variable satisfies A1.1 (for instance, when the components in \({\varvec{Z}}\) are bounded or Gaussian random variables), there exist positive constants \(C_0, C_1\) such that
hold for any positive \(u_n\) such that \(u_n\rightarrow +\infty \) and \(n^{-1/6}u_n\rightarrow 0\) as \(n\rightarrow +\infty \). These inequalities and A1.2 are similar to Conditions (2.2) and (2.5) in Section 8.2 of Fleming and Harrington (1991). However, it is worth noting that they assume the convergence of \(S_n, S_n^{(l)}\) to \(s, s^{(l)}\) holds over a neighborhood \({\fancyscript{B}}\) of \({\varvec{\beta }}_0\). That is, \(\sup _{t\in [0,1],{\varvec{\beta }}\in {\fancyscript{B}}}\Vert S_n({\varvec{\beta }}, t)-s({\varvec{\beta }}, t)\Vert \rightarrow 0,\;\sup _{t\in [0,1],{\varvec{\beta }}\in {\fancyscript{B}}}\Vert S_n^{(l)}({\varvec{\beta }}, t)-s^{(l)}({\varvec{\beta }}, t)\Vert \rightarrow 0\) for \(l= 1, 2\) in probability, and similarly for the boundedness of \(s, s^{(l)}\). Our assumptions, in contrast, are made only at the true value \({\varvec{\beta }}_0\). Moreover, with condition A1.2, it can be deduced that
and
The detailed proofs of inequalities (5) and (6) are provided in Appendix B. A1.3 is Condition (2.6) in Section 8.2 of Fleming and Harrington (1991), and A1.4 is assumed in Theorem 4.1 of Andersen and Gill (1982); these are regularity conditions in counting process theory.
1.2 Remark on assumption A2
Under Assumption A2, for any positive \(u_n\) such that \(u_n\rightarrow +\infty \) and \(n^{-1/6}u_n\rightarrow 0\) as \(n\rightarrow +\infty \), there exists a positive constant \(C_0\) such that
Without loss of generality, we assume all the diagonal elements of \(\varSigma \left( {\varvec{\beta }}_0,1\right) \) are 1. Then, when \({\varvec{a}}_j=1\) for some fixed \(j\) and all other components are 0, (7) reduces to
Now let us see how A2 is related to A1.1. Denote \(\xi _{ij}(t)=\int _0^t \left( Z_{ij}(u)-e_j({\varvec{\beta }}_0,u)\right) \mathrm{d}M_i(u)\); it can be shown that \(Cov(\xi _{ij}(t), \xi _{ik}(t) )= \small \left[ \varSigma \left( {\varvec{\beta }}_0,t\right) \right] _{jk} \) as follows:
For any fixed set \(s\), denote \(\xi _i(s)=(\xi _{ij})_{j\in s}\) and note that \({\varvec{var}}(\sum \nolimits _{i=1}^n\sum _{j\in s}{\varvec{a}}_j\xi _{ij}/\sqrt{n})=1\) implies \({\varvec{a}}^{\tau }\varSigma ({\varvec{\beta }}_0(s),1){\varvec{a}}=1\). Let \(\lambda _{\min }\) denote the smallest eigenvalue. Since, for \(u>0\), we have
Therefore, when \(\lambda _{\min }(\varSigma ({\varvec{\beta }}_0(s),1))\) is bounded away from zero, \(|s|\) is bounded from above, and \(E \exp ( |u\xi _{ij}|)<+\infty \) for all \(j\), inequality (1) holds.
1.3 Remark on assumption A3
A stricter counterpart of A3.1 in linear regression models is the sparse Riesz condition. Similar conditions were also assumed in Chen and Chen (2012) for generalized linear models. Just as it has been relaxed technically in linear regression models, a weaker version of A3.1 can be expected in the Cox model.
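Eigenvalue conditions of this kind can be probed numerically for a given design. The sketch below is a hypothetical illustration rather than the authors' condition A3.1: it checks a sparse-Riesz-style lower bound by enumerating all small column subsets of a simulated design matrix and computing the smallest eigenvalue of each normalized Gram submatrix. All names and settings here are ours.

```python
import itertools

import numpy as np

def min_sparse_eigenvalue(X, max_size):
    """Smallest eigenvalue of (1/n) X_s^T X_s over all column subsets with |s| <= max_size.

    A sparse-Riesz-style condition asks this value to stay bounded away
    from zero; for small max_size we can simply enumerate the subsets.
    """
    n, p = X.shape
    lam = np.inf
    for size in range(1, max_size + 1):
        for s in itertools.combinations(range(p), size):
            cols = list(s)
            gram = X[:, cols].T @ X[:, cols] / n
            lam = min(lam, np.linalg.eigvalsh(gram)[0])  # eigvalsh: ascending order
    return lam

# A well-conditioned Gaussian design should pass comfortably:
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
lam_min = min_sparse_eigenvalue(X, max_size=3)
```

Exhaustive enumeration is feasible only for toy dimensions; it is meant to build intuition for why such conditions are plausible for well-behaved designs, not as a practical verification in high dimensions.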
Appendix B: proofs of the main results
Proof of inequality (5)
By definition, for a fixed \(j\),
Assumption A1.2 implies \(\sup _{t\in [0,1]}\small \left| \dfrac{s^{(1)}_{j}({\varvec{\beta }}_0,t)}{s({\varvec{\beta }}_0,t)}\right| \) and \(\sup _{t\in [0,1]}\small \left| \dfrac{1}{s({\varvec{\beta }}_0,t)}\right| \) are bounded from above.
Note that \(\sup _{t\in [0,1]}\small \left| \dfrac{1}{S_{n}({\varvec{\beta }}_0,t)}\right| \) is bounded from above when
and \(n\) is sufficiently large. That is, under this condition, there exist constants \(c_1> 0,c_2>0\) such that
Hence,
\(\square \)
Proof of inequality (6)
By definition, for fixed \(i,j\),
By following the steps in the proof of inequality (5), we can obtain inequality (6). \(\square \)
Proof of Theorem 1
Here we decompose the \(j\)th component of the score function \(U({\varvec{\beta }}_0,t)\) defined in Sect. 2 as
To avoid confusion, let \(\xi _j=\xi _j(1),\;\xi _{1j}=\xi _{1j}(1),\;\xi _{2j}=\xi _{2j}(1)\). For any fixed \(s\in {\fancyscript{A}}_0\), note that for any \(j\in s\), \(E_{nj}\left( {\varvec{\beta }}_0,u\right) =E_{nj}\left( {\varvec{\beta }}_0(s),u\right) \) and \(e_j\left( {\varvec{\beta }}_0,u\right) =e_j\left( {\varvec{\beta }}_0(s),u\right) \). For any unit vector \({\varvec{u}}\), let \({\varvec{a}}={\varvec{u}}^{\tau }\varSigma ^{-1/2}({\varvec{\beta }}_0(s),1)\). Then
Also, from the remark on Assumption A2, we have \({\varvec{var}}(\sum \limits _{j\in s}{\varvec{a}}_j\xi _{1j}/\sqrt{n})=1\) and \(\Vert {\varvec{a}}\Vert _2^2\le \lambda ^{-1}_{\min }\left( \varSigma ({\varvec{\beta }}_0(s),1)\right) \). Let \(u_n\) satisfy \(n^{-1/6}u_n\rightarrow 0, u_n(\ln n)^{-1/2}\rightarrow +\infty \) as \(n\rightarrow +\infty \); note that for any positive constant \(c\in (0,1)\) independent of \(n\),
the large deviation result of \(\sum _{j\in s}{\varvec{a}}_j\xi _{1j}\) is already given in the remark on Assumption A2, that is, there exists a constant \(C_0\) such that
Now it suffices to show the large deviation of \(\sum _{j\in s}{\varvec{a}}_j\xi _{2j}\). Let \(C_1\) be a positive constant and denote
then
Inequality (5) and the remark on Assumption A1 demonstrate that there exists a positive constant \(C_0\) such that
In the following, we verify that, conditional on \({\fancyscript{C}}\), the new martingale \(\sum _{j\in s}{\varvec{a}}_j\xi _{2j}(t)\) has bounded jumps, following the steps in the proof of Theorem 3.1 in ?. Let \(\bar{M}(t)=\sum \limits _{i=1}^nM_i(t),\bar{N}(t)=\sum \limits _{i=1}^nN_i(t)\); then \(|\triangle (\bar{M}(t))|=|\triangle (\bar{N}(t))|\le 1\).
First,
therefore,
Second, the predictable quadratic variation of \(n^{-1/2}\xi _{2j}(t)\), denoted by \(\left<n^{-1/2}\xi _{2j}(t)\right>\) is bilinear and for all \(j\in \{1,2,\ldots ,p_n\}\),
Obviously, \(b_n^2(t)\le b_n^2(1)\le c_n^2\int _0^1S_n({\varvec{\beta }}_0,u)h_0(u)\mathrm{d}u\). Note that
Assumption A1.2 and Eq. (10) imply that
That is, when \(|s|=O(1)\), conditional on \({\fancyscript{C}}\), there exist constants \(b^2=O(\frac{u_n^2}{n}),\;K=O(\frac{u_n}{n})\) such that
According to Lemma 2.1 in Van de Geer (1995), we have
since \(u_n^2/n\rightarrow 0\), for any arbitrarily large positive constant \(M\), when \(n\) is sufficiently large,
Hence, together with (8) and (9), because of the arbitrariness of \(c\), we know that there exist a positive constant \(c_0\) independent of \(j\) and an arbitrarily small positive \(\varepsilon \) such that
When \({\varvec{a}}_j=1\) and all other components are 0, we have
over \(j\in \{1,2,\ldots ,p_n\}\). \(\square \)
Proof of Theorem 2
For any unit vector \({\varvec{w}}(s)\), let \({\varvec{\beta }}(s)={\varvec{\beta }}_0(s)+\psi _n{\varvec{w}}(s)\) where \(\psi _n\) satisfies (4). Under Assumption A3, for all \(s\in {\fancyscript{A}}_0\), the mean value theorem implies that there exists \(\tilde{{\varvec{\beta }}}(s)\) satisfying \(\Vert \tilde{{\varvec{\beta }}}(s)-{\varvec{\beta }}_0(s)\Vert _2\le \Vert \psi _n{\varvec{w}}(s)\Vert _2\) such that
Hence, we have
By noting that \(k_n=O(1),p_n=O(n^{\kappa })\) and letting \(u_n=\frac{1-\varepsilon }{2\sqrt{nk_n}}\lambda _{1,n}\psi _n\), we have \(n^{-1/6}u_n\rightarrow 0\) and \(u_n(\ln n)^{-1/2}\rightarrow +\infty \). According to (2), it follows that
for some positive constants \(C_0,C_1,C_2,\tilde{C}_0\); this bound converges to 0 as \(n\) goes to infinity. Because \(l_n\left( {\varvec{\beta }}(s)\right) \) is concave in \({\varvec{\beta }}(s)\), we get the desired result. \(\square \)
Proof of Theorem 3
Note that \(\{s:s\ne s_0, |s|\le Cp_0\}={\fancyscript{A}}_1 \cup {\fancyscript{A}}_0\). If we can prove that, when \(\gamma > 1- \frac{1}{2\kappa }\), as \(n\rightarrow +\infty \),
and
then the proof is complete. Since, asymptotically, \(\ln \tau ({\fancyscript{S}}_j)=j\kappa \ln n(1+o(1))\), the inequality \(\text{ EBIC }_{\gamma }(s)\le \text{ EBIC }_{\gamma }(s_{0n})\) implies
(1)
When \(s\in {\fancyscript{A}}_1\), note that
$$\begin{aligned} -\dfrac{1+2\gamma \kappa }{2}\left( |s_{0n}|-|s|\right) \ln n\ge -\dfrac{1+2\gamma \kappa }{2}|s_{0n}|\ln n\ge -C\ln n \end{aligned}$$for some positive constant \(C\) when \(-\dfrac{1}{2\kappa }<\gamma \le 1 \) and \(\kappa \) is a positive constant. Therefore, if we can show that
$$\begin{aligned} P(\sup \{l_n (\hat{{\varvec{\beta }}}(s))-l_n (\hat{{\varvec{\beta }}}(s_{0n})): s\in {\fancyscript{A}}_1\}\ge -C\ln n)\rightarrow 0, \end{aligned}$$(13)then we will have (11). Now, consider \(\tilde{s}=s\cup s_{0n}\) and \({\varvec{\beta }}(\tilde{s})\) near \({\varvec{\beta }}_0(\tilde{s})\). Taylor expansion shows that
$$\begin{aligned} l_n\left( {\varvec{\beta }}(\tilde{s})\right) -l_n\left( {\varvec{\beta }}_0(\tilde{s})\right)&\le \left( {\varvec{\beta }}(\tilde{s})-{\varvec{\beta }}_0(\tilde{s}) \right) ^{\tau }U({\varvec{\beta }}_0(s))\nonumber \\&-\dfrac{(1-\varepsilon )\lambda _{1,n}}{2} \left\| {\varvec{\beta }}(\tilde{s})-{\varvec{\beta }}_0(\tilde{s})\right\| _2^2. \end{aligned}$$Let \(\breve{{\varvec{\beta }}}(\tilde{s})\) be \(\hat{{\varvec{\beta }}}(s)\) augmented with zeros for the components in \(\tilde{s}\cap s^c\); then \(l_n\left( \hat{{\varvec{\beta }}}(s)\right) =l_n\left( \breve{{\varvec{\beta }}}(\tilde{s})\right) \) and \(\Vert \breve{{\varvec{\beta }}}(\tilde{s})-{\varvec{\beta }}_0(\tilde{s})\Vert _2\ge |{\varvec{\beta }}_{0,\min }|\), where \(|{\varvec{\beta }}_{0,\min }|=\min \left\{ |{\varvec{\beta }}_{0,j}|:j\in s_{0n}\right\} \). The concavity of \(l_n\left( {\varvec{\beta }}(s)\right) \) implies
$$\begin{aligned} {\fancyscript{M}}_n&= \sup \left\{ l_n\left( {\varvec{\beta }}(\tilde{s})\right) -l_n \left( {\varvec{\beta }}_0(\tilde{s})\right) :s\in {\fancyscript{A}}_1, \Vert {\varvec{\beta }}(\tilde{s})- {\varvec{\beta }}_0(\tilde{s})\Vert _2\ge |{\varvec{\beta }}_{0,\min }|\right\} \\&\le \sup \left\{ l_n\left( {\varvec{\beta }}(\tilde{s})\right) -l_n \left( {\varvec{\beta }}_0(\tilde{s})\right) :s\in {\fancyscript{A}}_1, \Vert {\varvec{\beta }}(\tilde{s})-{\varvec{\beta }}_0(\tilde{s})\Vert _2=|{\varvec{\beta }}_{0,\min }|\right\} . \end{aligned}$$Since for any fixed \(\tilde{s}\), when \(\Vert {\varvec{\beta }}(\tilde{s})-{\varvec{\beta }}_0(\tilde{s})\Vert _2=|{\varvec{\beta }}_{0,\min }|\),
$$\begin{aligned} l_n\left( {\varvec{\beta }}(\tilde{s})\right) -l_n\left( {\varvec{\beta }}_0(\tilde{s})\right) \le |{\varvec{\beta }}_{0,\min }|\Vert U_j({\varvec{\beta }}_0(\tilde{s}))\Vert _{+\infty }-{\varvec{\beta }}_{0,\min }^2\dfrac{(1-\varepsilon )\lambda _{1,n}}{2}. \end{aligned}$$Therefore,
$$\begin{aligned} P\left( {\fancyscript{M}}_n\ge -{\varvec{\beta }}_{0,\min }^2\dfrac{(1-\varepsilon )\lambda _{1,n}}{4}\right)&\le k_np_n^{k_n}P(\Vert U_j({\varvec{\beta }}_0(\tilde{s}))\Vert _{+\infty }\nonumber \\&\ge \dfrac{|{\varvec{\beta }}_{0,\min }|(1-\varepsilon )\lambda _{1,n}}{4}). \end{aligned}$$When \(n^{1/6-\delta }=O(\lambda _{1,n}/\sqrt{n})\) for some \(0<\delta <1/6\),
$$\begin{aligned}&P(\sup \{l_n (\hat{{\varvec{\beta }}}(s))-l_n (\hat{{\varvec{\beta }}}(s_{0n})): s\in {\fancyscript{A}}_1\}\ge -C\ln n)\\&\quad \le P\left( {\fancyscript{M}}_n\ge -C\ln n\right) \le P\left( {\fancyscript{M}}_n\ge -{\varvec{\beta }}_{0,\min }^2\dfrac{(1-\varepsilon )\lambda _{1,n}}{4}\right) \\&\quad \le k_np_n^{k_n}P(\Vert U_j({\varvec{\beta }}_0(\tilde{s}))\Vert _{+\infty }\ge \sqrt{n}n^{1/6-\delta })\le c_0\exp \left( -c_1n^{1/3-2\delta }+\kappa \ln n\right) . \end{aligned}$$It converges to 0 when \(n\) goes to \(\infty \); inequality (13) is thus obtained.
(2)
When \(s\in {\fancyscript{A}}_0\) and \(s\ne s_{0n}\), let \(m=|s|-|s_{0n}|\). Then \(\text{ EBIC }_{\gamma }(s)\le \text{ EBIC }_{\gamma }(s_{0n})\) if and only if
$$\begin{aligned} l_n(\hat{{\varvec{\beta }}}(s))-l_n(\hat{{\varvec{\beta }}}(s_{0n}))\ge m[0.5\ln n+\gamma \ln p_n] \approx \dfrac{m(1+2\gamma \kappa )\ln n}{2}. \end{aligned}$$From the assumptions, we can see that
$$\begin{aligned} l_n(\hat{{\varvec{\beta }}}(s))-l_n(\hat{{\varvec{\beta }}}(s_{0n}))\!&\le \! l_n(\hat{{\varvec{\beta }}}(s))-l_n({\varvec{\beta }}(s_{0n}))=l_n\left( \hat{{\varvec{\beta }}}(s)\right) -l_n\left( {\varvec{\beta }}_0(s)\right) \\ \!&\le \!\left( \hat{{\varvec{\beta }}}(s)-{\varvec{\beta }}(s_{0n})\right) ^{\tau }U\left( {\varvec{\beta }}_0(s),1\right) \\&-\dfrac{1}{2}\left( \hat{{\varvec{\beta }}}(s)-{\varvec{\beta }}(s_{0n})\right) ^{\tau }I\left( \widetilde{{\varvec{\beta }}}(s),1\right) \left( \hat{{\varvec{\beta }}}(s)-{\varvec{\beta }}(s_{0n})\right) \\ \!&\le \!\left( \hat{{\varvec{\beta }}}(s)-{\varvec{\beta }}(s_{0n})\right) ^{\tau }U\left( {\varvec{\beta }}_0(s),1\right) \\&-\dfrac{1-\varepsilon }{2}\left( \hat{{\varvec{\beta }}}(s)-{\varvec{\beta }}(s_{0n})\right) ^{\tau }I\left( {\varvec{\beta }}_0(s),1\right) \left( \hat{{\varvec{\beta }}}(s)-{\varvec{\beta }}(s_{0n})\right) \\ \!&\le \! \max _{{\varvec{\beta }}}\left[ {\varvec{\beta }}^{\tau }U\left( {\varvec{\beta }}_0(s),1\right) -\dfrac{1-\varepsilon }{2}{\varvec{\beta }}^{\tau }I\left( {\varvec{\beta }}_0(s),1\right) {\varvec{\beta }}\right] \\ \!&\le \! \left[ {\varvec{\beta }}^{\tau }U\left( {\varvec{\beta }}_0(s),1\right) \right] |_{{\varvec{\beta }}=[(1-\varepsilon )I\left( {\varvec{\beta }}_0(s),1\right) ]^{-1}U\left( {\varvec{\beta }}_0(s),1\right) }\\ \!&= \! \frac{1}{2n(1-\varepsilon )}U^{\tau }\left( {\varvec{\beta }}_0(s),1\right) \left[ \frac{I\left( {\varvec{\beta }}_0(s),1\right) }{n}\right] ^{-1}U\left( {\varvec{\beta }}_0(s),1\right) , \end{aligned}$$where \(\varepsilon \) is an arbitrary positive value. Note that \(m\) is finite; therefore, if we can show that for any fixed positive integer \(m\), when \(\gamma >1-\frac{1}{2\kappa }\),
$$\begin{aligned}&P\left( \max \limits _{s\in {\fancyscript{A}}_0, |s|=m+|s_{0n}|}\frac{1}{2n(1-\varepsilon )}U^{\tau }\left( {\varvec{\beta }}_0(s),1 \right) \left[ \frac{I\left( {\varvec{\beta }}_0(s),1 \right) }{n}\right] ^{-1}U\left( {\varvec{\beta }}_0(s),1\right) \right. \nonumber \\&\left. \quad \ge \dfrac{m(1+2\gamma \kappa )\ln n}{2}\right) \rightarrow 0, \end{aligned}$$(14)then we will have (12). Denote
$$\begin{aligned} {\fancyscript{T}}_1&= \left\{ \max _{s\in {\fancyscript{A}}_0}\Vert [\frac{I\left( {\varvec{\beta }}_0(s),1\right) }{n}]^{-1}- \varSigma ^{-1}\left( {\varvec{\beta }}_0(s),1\right) \Vert _{+\infty }\le \dfrac{C_1u_n}{\sqrt{n}}\right\} \\ {\fancyscript{T}}_2&= \left\{ \max _{s\in {\fancyscript{A}}_0}\frac{U^{\tau }\left( {\varvec{\beta }}_0(s),1\right) U \left( {\varvec{\beta }}_0(s),1\right) }{|s|}\le nu_n^2\right\} . \end{aligned}$$Inequalities (6) and (2) show that
$$\begin{aligned} P\left( {\fancyscript{T}}_1^c\right)&\le \frac{C_0}{u_n}\exp \left( -\frac{u_n^2}{2}+2\kappa \ln n\right) ;\;P\left( {\fancyscript{T}}_2^c\right) \nonumber \\&\le c_0\exp \left( -\frac{(1-\varepsilon )u_n^2}{2}+\kappa \ln n\right) . \end{aligned}$$(15)Therefore, we have
$$\begin{aligned}&P\left( \max \limits _{s\in {\fancyscript{A}}_0,|s|=m+|s_{0n}|}\right. \left. \frac{1}{2n(1-\varepsilon )}U^{\tau }\left( {\varvec{\beta }}_0(s),1\right) \left[ \frac{I\left( {\varvec{\beta }}_0(s),1\right) }{n}\right] ^{-1}U \left( {\varvec{\beta }}_0(s),1\right) \right. \\&\quad \left. \ge \dfrac{m(1+2\gamma \kappa )\ln n}{2}\right) \\&\quad \le P\left( \max _{s\in {\fancyscript{A}}_0,|s|=m+|s_{0n}|}U^{\tau }\left( {\varvec{\beta }}_0(s),1 \right) \left[ \frac{I\left( {\varvec{\beta }}_0(s),1\right) }{n}\right] ^{-1}U \left( {\varvec{\beta }}_0(s),1\right) \right. \\&\quad \left. \ge mn(1-\varepsilon )(1+2\gamma \kappa )\ln n \mid {\fancyscript{T}}_1,{\fancyscript{T}}_2\right) \\&\quad \quad + P\left( {\fancyscript{T}}_1^c\right) + P\left( {\fancyscript{T}}_2^c\right) . \end{aligned}$$Since under \({\fancyscript{T}}_1,{\fancyscript{T}}_2\),
$$\begin{aligned}&\max _{s\in {\fancyscript{A}}_0,|s|=m+|s_{0n}|}\left[ U^{\tau }\left( {\varvec{\beta }}_0(s),1\right) \left| [\frac{I\left( {\varvec{\beta }}_0(s),1\right) }{n}]^{-1}-\varSigma ^{-1}\left( {\varvec{\beta }}_0(s),1\right) \right| U\left( {\varvec{\beta }}_0(s),1\right) \right] \\&\quad \le C\sqrt{n}u_n^3=C\frac{(n^{-1/6}u_n)^3}{\ln n}(n\ln n)=o(n\ln n), \end{aligned}$$the two terms in (15) both converge to 0 as \(n\) goes to \(+\infty \) and
$$\begin{aligned}&P\left( \max _{s\in {\fancyscript{A}}_0,|s|=m+|s_{0n}|}U^{\tau }\left( {\varvec{\beta }}_0(s),1 \right) \varSigma ^{-1}\left( {\varvec{\beta }}_0(s),1\right) U\left( {\varvec{\beta }}_0(s),1 \right) \right. \\&\quad \left. \ge mn(1-\varepsilon )(1+2\gamma \kappa )\ln n \mid {\fancyscript{T}}_1,{\fancyscript{T}}_2\right) \\&\quad \le CP\left( \max _{s\in {\fancyscript{A}}_0,|s|=m+|s_{0n}|}{\varvec{u}}^{\tau }\varSigma ^{-1/2}\left( {\varvec{\beta }}_0(s),1 \right) U\left( {\varvec{\beta }}_0(s),1\right) \right. \\&\quad \left. \ge (1-\delta )\sqrt{mn(1-\varepsilon )(1+2\gamma \kappa )\ln n} \mid {\fancyscript{T}}_1,{\fancyscript{T}}_2\right) , \end{aligned}$$where \(\Vert {{\varvec{u}}}\Vert _2=1,\;\delta \) is an arbitrary positive value. According to (3), it can be further bounded by \(c_0^{\star }\exp \left[ -\dfrac{1-\varepsilon ^{\star }}{2}(1+2\gamma \kappa )m\ln n+m\kappa \ln n\right] \) where \(c_0^{\star }\) is a positive constant. It converges to 0 when \(\gamma >\frac{1}{1-\varepsilon ^{\star }}-\frac{1}{2\kappa }\), where \(\varepsilon ^{\star }\) is an arbitrary positive value; inequality (14) is thus obtained.
\(\square \)
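The proof above requires \(\gamma > 1-\frac{1}{2\kappa }\) when \(p_n = n^{\kappa }\). As a quick arithmetic check of this threshold (purely illustrative; the function name is ours):

```python
def gamma_threshold(kappa):
    """Selection-consistency threshold from Theorem 3: gamma > 1 - 1/(2*kappa)."""
    return 1.0 - 1.0 / (2.0 * kappa)

# The faster the feature space grows (larger kappa), the larger gamma must be:
thresholds = [gamma_threshold(k) for k in (0.5, 1.0, 2.0)]  # [0.0, 0.5, 0.75]
```

In particular, for \(\kappa \le 1/2\) the threshold is at most 0, so the ordinary BIC (\(\gamma =0\)) already suffices in that regime.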
Luo, S., Xu, J. & Chen, Z. Extended Bayesian information criterion in the Cox model with a high-dimensional feature space. Ann Inst Stat Math 67, 287–311 (2015). https://doi.org/10.1007/s10463-014-0448-y