Abstract
Variable selection in the Cox proportional hazards model (the Cox model) has proved important in many microarray genetic studies. However, theoretical results on variable-selection procedures for the Cox model with a high-dimensional feature space are rare because of its complicated data structure. In this paper, we consider the extended Bayesian information criterion (EBIC) for variable selection in the Cox model and establish its selection consistency in the situation of a high-dimensional feature space. The EBIC is adopted to select the best model from a model sequence generated by the SIS-ALasso procedure. Simulation studies and real data analysis are carried out to demonstrate the merits of the EBIC.
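To make the selection rule concrete, here is a minimal sketch of how the EBIC can be evaluated along a model sequence. It assumes the standard EBIC form of Chen and Chen (2008), \(\text{EBIC}_{\gamma}(s) = -2 l_n(\hat{\varvec{\beta}}(s)) + |s|\ln n + 2\gamma \ln \binom{p}{|s|}\); the function name, the candidate log-likelihood values, and the \((n, p, \gamma)\) settings below are illustrative, not taken from the paper.

```python
import math

def ebic(loglik, model_size, n, p, gamma):
    """EBIC_gamma(s) = -2 * l_n(beta_hat(s)) + |s| * ln(n) + 2 * gamma * ln(C(p, |s|))."""
    penalty = model_size * math.log(n) + 2.0 * gamma * math.log(math.comb(p, model_size))
    return -2.0 * loglik + penalty

# Select the best model from a sequence (e.g. one produced by SIS-ALasso);
# each candidate is (maximized partial log-likelihood, model size) -- toy numbers.
candidates = [(-520.3, 2), (-515.1, 4), (-514.8, 9)]
n, p, gamma = 200, 1000, 1.0
scores = [ebic(ll, k, n, p, gamma) for ll, k in candidates]
best = scores.index(min(scores))  # the 2-variable model wins here
```

Note that under \(p_n = n^{\kappa}\), \(\ln \binom{p}{|s|} \approx |s|\kappa \ln n\), which is the approximation used in the proofs below.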
References
Andersen, P., Gill, R. (1982). Cox’s regression model for counting processes: a large sample study. The Annals of Statistics, 10(4), 1100–1120.
Barabási, A., Gulbahce, N., Loscalzo, J. (2011). Network medicine: a network-based approach to human disease. Nature Reviews Genetics, 12(1), 56–68.
Bogdan, M., Ghosh, J. K., Doerge, R. (2004). Modifying the Schwarz Bayesian information criterion to locate multiple interacting quantitative trait loci. Genetics, 167(2), 989–999.
Broman, K. W., Speed, T. P. (2002). A model selection approach for the identification of quantitative trait loci in experimental crosses. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4), 641–656.
Chen, J., Chen, Z. (2008). Extended Bayesian information criteria for model selection with large model spaces. Biometrika, 95(3), 759–771.
Chen, J., Chen, Z. (2012). Extended BIC for small-n-large-P sparse GLM. Statistica Sinica, 22(2), 555.
Cookson, W., Liang, L., Abecasis, G., Moffatt, M., Lathrop, M. (2009). Mapping complex disease traits with global gene expression. Nature Reviews Genetics, 10(3), 184–194.
Du, P., Ma, S., Liang, H. (2010). Penalized variable selection procedure for Cox models with semiparametric relative risk. The Annals of Statistics, 38(4), 2092.
Fan, J., Li, R. (2002). Variable selection for Cox's proportional hazards model and frailty model. The Annals of Statistics, 30(1), 74–99.
Fan, J., Li, G., Li, R. (2005). An overview on variable selection for survival analysis. Contemporary multivariate analysis and design of experiments (p. 315). New Jersey: World Scientific.
Fan, J., Feng, Y., Wu, Y. (2010). High-dimensional variable selection for Cox's proportional hazards model. Borrowing strength: theory powering applications—a Festschrift for Lawrence D. Brown, vol. 6 (pp. 70–86). Beachwood: IMS Collections.
Fill, J. (1983). Convergence rates related to the strong law of large numbers. The Annals of Probability, 11(1), 123–142.
Fleming, T., Harrington, D. (1991). Counting processes and survival analysis, vol 8. Wiley Online Library.
Gui, J., Li, H. (2005). Penalized Cox regression analysis in the high-dimensional and low-sample size settings, with applications to microarray gene expression data. Bioinformatics, 21(13), 3001–3008.
Luo, S., Chen, Z. (2013a). Extended BIC for linear regression models with diverging number of relevant features and high or ultra-high feature spaces. Journal of Statistical Planning and Inference, 143, 494–504.
Luo, S., Chen, Z. (2013b). Selection consistency of EBIC for GLIM with non-canonical links and diverging number of parameters. Statistics and Its Interface, 6, 275–284.
Rosenwald, A., Wright, G., Chan, W., Connors, J., Campo, E., Fisher, R., et al. (2002). The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. New England Journal of Medicine, 346(25), 1937–1947.
Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2), 461–464.
Sha, N., Tadesse, M., Vannucci, M. (2006). Bayesian variable selection for the analysis of microarray data with censored outcomes. Bioinformatics, 22(18), 2262–2268.
Siegmund, D. (2004). Model selection in irregular problems: Application to mapping quantitative trait loci. Biometrika, 91, 785–800.
Tibshirani, R. (1997). The lasso method for variable selection in the Cox model. Statistics in Medicine, 16(4), 385–395.
Troyanskaya, O., Cantor, M., Sherlock, G., Brown, P., Hastie, T., Tibshirani, R., et al. (2001). Missing value estimation methods for DNA microarrays. Bioinformatics, 17(6), 520–525.
Van de Geer, S. (1995). Exponential inequalities for martingales, with application to maximum likelihood estimation for counting processes. The Annals of Statistics, 23(5), 1779–1801.
Zhang, H., Lu, W. (2007). Adaptive lasso for cox’s proportional hazards model. Biometrika, 94(3), 691–703.
Zou, H. (2008). A note on path-based variable selection in the penalized proportional hazards model. Biometrika, 95(1), 241–247.
Appendices
Appendix A: remarks on the assumptions
1.1 Remark on assumption A1
Note that \(S_n, S_n^{(1)}\) and \(S_n^{(2)}\) are summations of i.i.d. random variables; it is verified in Fill (1983) that, when the associated random variable satisfies A1.1 (for instance, when the components in \({\varvec{Z}}\) are bounded or Gaussian random variables), there exist positive constants \(C_0, C_1\) such that
hold for any positive \(u_n\) such that \(u_n\rightarrow +\infty \) and \(n^{-1/6}u_n\rightarrow 0\) as \(n\rightarrow +\infty \). These inequalities and A1.2 are similar to Conditions (2.2) and (2.5) in Section 8.2 of Fleming and Harrington (1991). However, it is worth noting that they assume the convergence of \(S_n, S_n^{(l)}\) to \(s, s^{(l)}\) holds over a neighborhood \({\fancyscript{B}}\) of \({\varvec{\beta }}_0\). That is, \(\sup _{t\in [0,1],{\varvec{\beta }}\in {\fancyscript{B}}}\Vert S_n({\varvec{\beta }}, t)-s({\varvec{\beta }}, t)\Vert \rightarrow 0,\;\sup _{t\in [0,1],{\varvec{\beta }}\in {\fancyscript{B}}}\Vert S_n^{(l)}({\varvec{\beta }}, t)-s^{(l)}({\varvec{\beta }}, t)\Vert \rightarrow 0\) for \(l= 1, 2\) in probability, and similarly for the boundedness of \(s, s^{(l)}\). Our assumptions, in contrast, are made only at the true value \({\varvec{\beta }}_0\). Moreover, with condition A1.2, it can be deduced that
and
The detailed proofs of inequalities (5) and (6) are provided in Appendix B. A1.3 is Condition (2.6) in Section 8.2 of Fleming and Harrington (1991), and A1.4 is assumed in Theorem 4.1 of Andersen and Gill (1982); these are regularity conditions in counting process theory.
1.2 Remark on assumption A2
Under Assumption A2, for any positive \(u_n\) such that \(u_n\rightarrow +\infty \) and \(n^{-1/6}u_n\rightarrow 0\) as \(n\rightarrow +\infty \), there exists a positive constant \(C_0\) such that
Without loss of generality, we assume all the diagonal elements of \(\varSigma \left( {\varvec{\beta }}_0,1\right) \) are 1. Then, when \({\varvec{a}}_j=1\) for some fixed \(j\) and all other components are 0, (7) reduces to
Now let us see how A2 is related to A1.1. Denote \(\xi _{ij}(t)=\int _0^t \left( Z_{ij}(u)-e_j({\varvec{\beta }}_0,u)\right) \mathrm{d}M_i(u)\); it can be shown that \(Cov(\xi _{ij}(t), \xi _{ik}(t) )= \small \left[ \varSigma \left( {\varvec{\beta }}_0,t\right) \right] _{jk} \) as follows:
For any fixed set \(s\), denote \(\xi _i(s)=(\xi _{ij})_{j\in s}\) and note that \({\varvec{var}}(\sum \nolimits _{i=1}^n\sum _{j\in s}{\varvec{a}}_j\xi _{ij}/\sqrt{n})=1\) implies \({\varvec{a}}^{\tau }\varSigma ({\varvec{\beta }}_0(s),1){\varvec{a}}=1\). Let \(\lambda _{\min }\) denote the smallest eigenvalue. Since, for \(u>0\), we have
Therefore, when \(\lambda _{\min }(\varSigma ({\varvec{\beta }}_0(s),1))\) is bounded away from zero, \(|s|\) is bounded from above, and \(E \exp ( |u\xi _{ij}|)<+\infty \) for all \(j\), inequality (1) holds.
1.3 Remark on assumption A3
A stricter counterpart of A3.1 in linear regression models is the sparse Riesz condition. Similar conditions were also assumed in Chen and Chen (2012) for generalized linear models. Just as it has been relaxed technically in linear regression models, a weaker version of A3.1 can be expected in the Cox model.
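Eigenvalue conditions of this kind can be probed numerically for a given design. The sketch below is a hypothetical illustration rather than the authors' condition A3.1: it checks a sparse-Riesz-style lower bound by enumerating all small column subsets of a simulated design matrix and computing the smallest eigenvalue of each normalized Gram submatrix. All names and settings here are ours.

```python
import itertools

import numpy as np

def min_sparse_eigenvalue(X, max_size):
    """Smallest eigenvalue of (1/n) X_s^T X_s over all column subsets with |s| <= max_size.

    A sparse-Riesz-style condition asks this value to stay bounded away
    from zero; for small max_size we can simply enumerate the subsets.
    """
    n, p = X.shape
    lam = np.inf
    for size in range(1, max_size + 1):
        for s in itertools.combinations(range(p), size):
            cols = list(s)
            gram = X[:, cols].T @ X[:, cols] / n
            lam = min(lam, np.linalg.eigvalsh(gram)[0])  # eigvalsh: ascending order
    return lam

# A well-conditioned Gaussian design should pass comfortably:
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
lam_min = min_sparse_eigenvalue(X, max_size=3)
```

Exhaustive enumeration is feasible only for toy dimensions; it is meant to build intuition for why such conditions are plausible for well-behaved designs, not as a practical verification in high dimensions.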
Appendix B: proofs of the main results
Proof of inequality (5)
By definition, for a fixed \(j\),
Assumption A1.2 implies \(\sup _{t\in [0,1]}\small \left| \dfrac{s^{(1)}_{j}({\varvec{\beta }}_0,t)}{s({\varvec{\beta }}_0,t)}\right| \) and \(\sup _{t\in [0,1]}\small \left| \dfrac{1}{s({\varvec{\beta }}_0,t)}\right| \) are bounded from above.
Note that \(\sup _{t\in [0,1]}\small \left| \dfrac{1}{S_{n}({\varvec{\beta }}_0,t)}\right| \) is bounded from above when
and \(n\) is sufficiently large. That is, under this condition, there exist constants \(c_1> 0,c_2>0\) such that
Hence,
\(\square \)
Proof of inequality (6)
By definition, for fixed \(i,j\),
By following the steps in the proof of inequality (5), we can obtain inequality (6). \(\square \)
Proof of Theorem 1
Here we decompose the \(j\)th component of the score function \(U({\varvec{\beta }}_0,t)\) defined in Sect. 2 as
To avoid confusion, let \(\xi _j=\xi _j(1),\;\xi _{1j}=\xi _{1j}(1),\;\xi _{2j}=\xi _{2j}(1)\). For any fixed \(s\in {\fancyscript{A}}_0\), note that for any \(j\in s\), \(E_{nj}\left( {\varvec{\beta }}_0,u\right) =E_{nj}\left( {\varvec{\beta }}_0(s),u\right) \) and \(e_j\left( {\varvec{\beta }}_0,u\right) =e_j\left( {\varvec{\beta }}_0(s),u\right) \). For any unit vector \({\varvec{u}}\), let \({\varvec{a}}={\varvec{u}}^{\tau }\varSigma ^{-1/2}({\varvec{\beta }}_0(s),1)\). Then
Also, from the remark on Assumption A2, we have \({\varvec{var}}(\sum \limits _{j\in s}{\varvec{a}}_j\xi _{1j}/\sqrt{n})=1\) and \(\Vert {\varvec{a}}\Vert _2^2\le \lambda ^{-1}_{\min }\left( \varSigma ({\varvec{\beta }}_0(s),1)\right) \). Let \(u_n\) satisfy \(n^{-1/6}u_n\rightarrow 0, u_n(\ln n)^{-1/2}\rightarrow +\infty \) as \(n\rightarrow +\infty \); note that for any positive constant \(c\in (0,1)\) independent of \(n\),
the large deviation result of \(\sum _{j\in s}{\varvec{a}}_j\xi _{1j}\) is already given in the remark on Assumption A2, that is, there exists a constant \(C_0\) such that
Now it suffices to show the large deviation of \(\sum _{j\in s}{\varvec{a}}_j\xi _{2j}\). Let \(C_1\) be a positive constant and denote
then
Inequality (5) and the remark on Assumption A1 demonstrate that there exists a positive constant \(C_0\) such that
In the following, we verify that, conditional on \({\fancyscript{C}}\), the new martingale \(\sum _{j\in s}{\varvec{a}}_j\xi _{2j}(t)\) has bounded jumps, following the steps in the proof of Theorem 3.1 in ?. Let \(\bar{M}(t)=\sum \limits _{i=1}^nM_i(t),\bar{N}(t)=\sum \limits _{i=1}^nN_i(t)\); then \(|\triangle (\bar{M}(t))|=|\triangle (\bar{N}(t))|\le 1\).
First,
therefore,
Second, the predictable quadratic variation of \(n^{-1/2}\xi _{2j}(t)\), denoted by \(\left<n^{-1/2}\xi _{2j}(t)\right>\) is bilinear and for all \(j\in \{1,2,\ldots ,p_n\}\),
Obviously, \(b_n^2(t)\le b_n^2(1)\le c_n^2\int _0^1S_n({\varvec{\beta }}_0,u)h_0(u)\mathrm{d}u\). Note that
Assumption A1.2 and Eq. (10) imply that
That is, when \(|s|=O(1)\), conditional on \({\fancyscript{C}}\), there exist constants \(b^2=O(\frac{u_n^2}{n}),\;K=O(\frac{u_n}{n})\) such that
According to Lemma 2.1 in Van de Geer (1995), we have
since \(u_n^2/n\rightarrow 0\), for any arbitrarily large positive constant \(M\), when \(n\) is sufficiently large,
Hence, together with (8) and (9), because of the arbitrariness of \(c\), we know that there exist a positive constant \(c_0\) independent of \(j\) and an arbitrarily small positive \(\varepsilon \) such that
When \({\varvec{a}}_j=1\) and all other components are 0, we have
over \(j\in \{1,2,\ldots ,p_n\}\). \(\square \)
Proof of Theorem 2
For any unit vector \({\varvec{w}}(s)\), let \({\varvec{\beta }}(s)={\varvec{\beta }}_0(s)+\psi _n{\varvec{w}}(s)\) where \(\psi _n\) satisfies (4). Under Assumption A3, for all \(s\in {\fancyscript{A}}_0\), the mean value theorem implies that there exists \(\tilde{{\varvec{\beta }}}(s)\) satisfying \(\Vert \tilde{{\varvec{\beta }}}(s)-{\varvec{\beta }}_0(s)\Vert _2\le \Vert \psi _n{\varvec{w}}(s)\Vert _2\) such that
Hence, we have
By noting that \(k_n=O(1),p_n=O(n^{\kappa })\) and letting \(u_n=\frac{1-\varepsilon }{2\sqrt{nk_n}}\lambda _{1,n}\psi _n\), we have \(n^{-1/6}u_n\rightarrow 0\) and \(u_n(\ln n)^{-1/2}\rightarrow +\infty \). According to (2), it follows that
for some positive constants \(C_0,C_1,C_2,\tilde{C}_0\); this bound converges to 0 as \(n\) goes to infinity. Because \(l_n\left( {\varvec{\beta }}(s)\right) \) is concave in \({\varvec{\beta }}(s)\), we get the desired result. \(\square \)
Proof of Theorem 3
Note that \(\{s:s\ne s_0, |s|\le Cp_0\}={\fancyscript{A}}_1 \cup {\fancyscript{A}}_0\). If we can prove that, when \(\gamma > 1- \frac{1}{2\kappa }\), as \(n\rightarrow +\infty \),
and
then the proof is complete. Since, asymptotically, \(\ln \tau ({\fancyscript{S}}_j)=j\kappa \ln n(1+o(1))\), the inequality \(\text{ EBIC }_{\gamma }(s)\le \text{ EBIC }_{\gamma }(s_{0n})\) implies
(1)
When \(s\in {\fancyscript{A}}_1\), note that
$$\begin{aligned} -\dfrac{1+2\gamma \kappa }{2}\left( |s_{0n}|-|s|\right) \ln n\ge -\dfrac{1+2\gamma \kappa }{2}|s_{0n}|\ln n\ge -C\ln n \end{aligned}$$for some positive constant \(C\) when \(-\dfrac{1}{2\kappa }<\gamma \le 1 \) and \(\kappa \) is a positive constant. Therefore, if we can show that
$$\begin{aligned} P(\sup \{l_n (\hat{{\varvec{\beta }}}(s))-l_n (\hat{{\varvec{\beta }}}(s_{0n})): s\in {\fancyscript{A}}_1\}\ge -C\ln n)\rightarrow 0, \end{aligned}$$(13)then we will have (11). Now, consider \(\tilde{s}=s\cup s_{0n}\) and \({\varvec{\beta }}(\tilde{s})\) near \({\varvec{\beta }}_0(\tilde{s})\). Taylor expansion shows that
$$\begin{aligned} l_n\left( {\varvec{\beta }}(\tilde{s})\right) -l_n\left( {\varvec{\beta }}_0(\tilde{s})\right)&\le \left( {\varvec{\beta }}(\tilde{s})-{\varvec{\beta }}_0(\tilde{s}) \right) ^{\tau }U({\varvec{\beta }}_0(s))\nonumber \\&-\dfrac{(1-\varepsilon )\lambda _{1,n}}{2} \left\| {\varvec{\beta }}(\tilde{s})-{\varvec{\beta }}_0(\tilde{s})\right\| _2^2. \end{aligned}$$Let \(\breve{{\varvec{\beta }}}(\tilde{s})\) be \(\hat{{\varvec{\beta }}}(s)\) augmented with zeros for the components in \(\tilde{s}\cap s^c\); then \(l_n\left( \hat{{\varvec{\beta }}}(s)\right) =l_n\left( \breve{{\varvec{\beta }}}(\tilde{s})\right) \) and \(\Vert \breve{{\varvec{\beta }}}(\tilde{s})-{\varvec{\beta }}_0(\tilde{s})\Vert _2\ge |{\varvec{\beta }}_{0,\min }|\), where \(|{\varvec{\beta }}_{0,\min }|=\min \left\{ |{\varvec{\beta }}_{0,j}|:j\in s_{0n}\right\} \). The concavity of \(l_n\left( {\varvec{\beta }}(s)\right) \) implies
$$\begin{aligned} {\fancyscript{M}}_n&= \sup \left\{ l_n\left( {\varvec{\beta }}(\tilde{s})\right) -l_n \left( {\varvec{\beta }}_0(\tilde{s})\right) :s\in {\fancyscript{A}}_1, \Vert {\varvec{\beta }}(\tilde{s})- {\varvec{\beta }}_0(\tilde{s})\Vert _2\ge |{\varvec{\beta }}_{0,\min }|\right\} \\&\le \sup \left\{ l_n\left( {\varvec{\beta }}(\tilde{s})\right) -l_n \left( {\varvec{\beta }}_0(\tilde{s})\right) :s\in {\fancyscript{A}}_1, \Vert {\varvec{\beta }}(\tilde{s})-{\varvec{\beta }}_0(\tilde{s})\Vert _2=|{\varvec{\beta }}_{0,\min }|\right\} . \end{aligned}$$Since for any fixed \(\tilde{s}\), when \(\Vert {\varvec{\beta }}(\tilde{s})-{\varvec{\beta }}_0(\tilde{s})\Vert _2=|{\varvec{\beta }}_{0,\min }|\),
$$\begin{aligned} l_n\left( {\varvec{\beta }}(\tilde{s})\right) -l_n\left( {\varvec{\beta }}_0(\tilde{s})\right) \le |{\varvec{\beta }}_{0,\min }|\Vert U_j({\varvec{\beta }}_0(\tilde{s}))\Vert _{+\infty }-{\varvec{\beta }}_{0,\min }^2\dfrac{(1-\varepsilon )\lambda _{1,n}}{2}. \end{aligned}$$Therefore,
$$\begin{aligned} P\left( {\fancyscript{M}}_n\ge -{\varvec{\beta }}_{0,\min }^2\dfrac{(1-\varepsilon )\lambda _{1,n}}{4}\right)&\le k_np_n^{k_n}P(\Vert U_j({\varvec{\beta }}_0(\tilde{s}))\Vert _{+\infty }\nonumber \\&\ge \dfrac{|{\varvec{\beta }}_{0,\min }|(1-\varepsilon )\lambda _{1,n}}{4}). \end{aligned}$$When \(n^{1/6-\delta }=O(\lambda _{1,n}/\sqrt{n})\) for some \(0<\delta <1/6\),
$$\begin{aligned}&P(\sup \{l_n (\hat{{\varvec{\beta }}}(s))-l_n (\hat{{\varvec{\beta }}}(s_{0n})): s\in {\fancyscript{A}}_1\}\ge -C\ln n)\\&\quad \le P\left( {\fancyscript{M}}_n\ge -C\ln n\right) \le P\left( {\fancyscript{M}}_n\ge -{\varvec{\beta }}_{0,\min }^2\dfrac{(1-\varepsilon )\lambda _{1,n}}{4}\right) \\&\quad \le k_np_n^{k_n}P(\Vert U_j({\varvec{\beta }}_0(\tilde{s}))\Vert _{+\infty }\ge \sqrt{n}n^{1/6-\delta })\le c_0\exp \left( -c_1n^{1/3-2\delta }+\kappa \ln n\right) . \end{aligned}$$It converges to 0 when \(n\) goes to \(\infty \); inequality (13) is thus obtained.
(2)
When \(s\in {\fancyscript{A}}_0\) and \(s\ne s_{0n}\), let \(m=|s|-|s_{0n}|\). Then \(\text{ EBIC }_{\gamma }(s)\le \text{ EBIC }_{\gamma }(s_{0n})\) if and only if
$$\begin{aligned} l_n(\hat{{\varvec{\beta }}}(s))-l_n(\hat{{\varvec{\beta }}}(s_{0n}))\ge m[0.5\ln n+\gamma \ln p_n] \approx \dfrac{m(1+2\gamma \kappa )\ln n}{2}. \end{aligned}$$From the assumptions, we can see that
$$\begin{aligned} l_n(\hat{{\varvec{\beta }}}(s))-l_n(\hat{{\varvec{\beta }}}(s_{0n}))\!&\le \! l_n(\hat{{\varvec{\beta }}}(s))-l_n({\varvec{\beta }}(s_{0n}))=l_n\left( \hat{{\varvec{\beta }}}(s)\right) -l_n\left( {\varvec{\beta }}_0(s)\right) \\ \!&\le \!\left( \hat{{\varvec{\beta }}}(s)-{\varvec{\beta }}(s_{0n})\right) ^{\tau }U\left( {\varvec{\beta }}_0(s),1\right) \\&-\dfrac{1}{2}\left( \hat{{\varvec{\beta }}}(s)-{\varvec{\beta }}(s_{0n})\right) ^{\tau }I\left( \widetilde{{\varvec{\beta }}}(s),1\right) \left( \hat{{\varvec{\beta }}}(s)-{\varvec{\beta }}(s_{0n})\right) \\ \!&\le \!\left( \hat{{\varvec{\beta }}}(s)-{\varvec{\beta }}(s_{0n})\right) ^{\tau }U\left( {\varvec{\beta }}_0(s),1\right) \\&-\dfrac{1-\varepsilon }{2}\left( \hat{{\varvec{\beta }}}(s)-{\varvec{\beta }}(s_{0n})\right) ^{\tau }I\left( {\varvec{\beta }}_0(s),1\right) \left( \hat{{\varvec{\beta }}}(s)-{\varvec{\beta }}(s_{0n})\right) \\ \!&\le \! \max _{{\varvec{\beta }}}\left[ {\varvec{\beta }}^{\tau }U\left( {\varvec{\beta }}_0(s),1\right) -\dfrac{1-\varepsilon }{2}{\varvec{\beta }}^{\tau }I\left( {\varvec{\beta }}_0(s),1\right) {\varvec{\beta }}\right] \\ \!&\le \! \left[ {\varvec{\beta }}^{\tau }U\left( {\varvec{\beta }}_0(s),1\right) \right] |_{{\varvec{\beta }}=[(1-\varepsilon )I\left( {\varvec{\beta }}_0(s),1\right) ]^{-1}U\left( {\varvec{\beta }}_0(s),1\right) }\\ \!&= \! \frac{1}{2n(1-\varepsilon )}U^{\tau }\left( {\varvec{\beta }}_0(s),1\right) \left[ \frac{I\left( {\varvec{\beta }}_0(s),1\right) }{n}\right] ^{-1}U\left( {\varvec{\beta }}_0(s),1\right) , \end{aligned}$$where \(\varepsilon \) is an arbitrary positive value. Note that \(m\) is finite; therefore, if we can show that for any fixed positive integer \(m\), when \(\gamma >1-\frac{1}{2\kappa }\),
$$\begin{aligned}&P\left( \max \limits _{s\in {\fancyscript{A}}_0, |s|=m+|s_{0n}|}\frac{1}{2n(1-\varepsilon )}U^{\tau }\left( {\varvec{\beta }}_0(s),1 \right) \left[ \frac{I\left( {\varvec{\beta }}_0(s),1 \right) }{n}\right] ^{-1}U\left( {\varvec{\beta }}_0(s),1\right) \right. \nonumber \\&\left. \quad \ge \dfrac{m(1+2\gamma \kappa )\ln n}{2}\right) \rightarrow 0, \end{aligned}$$(14)then we will have (12). Denote
$$\begin{aligned} {\fancyscript{T}}_1&= \left\{ \max _{s\in {\fancyscript{A}}_0}\Vert [\frac{I\left( {\varvec{\beta }}_0(s),1\right) }{n}]^{-1}- \varSigma ^{-1}\left( {\varvec{\beta }}_0(s),1\right) \Vert _{+\infty }\le \dfrac{C_1u_n}{\sqrt{n}}\right\} \\ {\fancyscript{T}}_2&= \left\{ \max _{s\in {\fancyscript{A}}_0}\frac{U^{\tau }\left( {\varvec{\beta }}_0(s),1\right) U \left( {\varvec{\beta }}_0(s),1\right) }{|s|}\le nu_n^2\right\} . \end{aligned}$$Inequalities (6) and (2) show that
$$\begin{aligned} P\left( {\fancyscript{T}}_1^c\right)&\le \frac{C_0}{u_n}\exp \left( -\frac{u_n^2}{2}+2\kappa \ln n\right) ;\;P\left( {\fancyscript{T}}_2^c\right) \nonumber \\&\le c_0\exp \left( -\frac{(1-\varepsilon )u_n^2}{2}+\kappa \ln n\right) . \end{aligned}$$(15)Therefore, we have
$$\begin{aligned}&P\left( \max \limits _{s\in {\fancyscript{A}}_0,|s|=m+|s_{0n}|}\right. \left. \frac{1}{2n(1-\varepsilon )}U^{\tau }\left( {\varvec{\beta }}_0(s),1\right) \left[ \frac{I\left( {\varvec{\beta }}_0(s),1\right) }{n}\right] ^{-1}U \left( {\varvec{\beta }}_0(s),1\right) \right. \\&\quad \left. \ge \dfrac{m(1+2\gamma \kappa )\ln n}{2}\right) \\&\quad \le P\left( \max _{s\in {\fancyscript{A}}_0,|s|=m+|s_{0n}|}U^{\tau }\left( {\varvec{\beta }}_0(s),1 \right) \left[ \frac{I\left( {\varvec{\beta }}_0(s),1\right) }{n}\right] ^{-1}U \left( {\varvec{\beta }}_0(s),1\right) \right. \\&\quad \left. \ge mn(1-\varepsilon )(1+2\gamma \kappa )\ln n \mid {\fancyscript{T}}_1,{\fancyscript{T}}_2\right) \\&\quad \quad + P\left( {\fancyscript{T}}_1^c\right) + P\left( {\fancyscript{T}}_2^c\right) . \end{aligned}$$Since under \({\fancyscript{T}}_1,{\fancyscript{T}}_2\),
$$\begin{aligned}&\max _{s\in {\fancyscript{A}}_0,|s|=m+|s_{0n}|}\left[ U^{\tau }\left( {\varvec{\beta }}_0(s),1\right) \left| [\frac{I\left( {\varvec{\beta }}_0(s),1\right) }{n}]^{-1}-\varSigma ^{-1}\left( {\varvec{\beta }}_0(s),1\right) \right| U\left( {\varvec{\beta }}_0(s),1\right) \right] \\&\quad \le C\sqrt{n}u_n^3=C\frac{(n^{-1/6}u_n)^3}{\ln n}(n\ln n)=o(n\ln n), \end{aligned}$$the two terms in (15) both converge to 0 as \(n\) goes to \(+\infty \) and
$$\begin{aligned}&P\left( \max _{s\in {\fancyscript{A}}_0,|s|=m+|s_{0n}|}U^{\tau }\left( {\varvec{\beta }}_0(s),1 \right) \varSigma ^{-1}\left( {\varvec{\beta }}_0(s),1\right) U\left( {\varvec{\beta }}_0(s),1 \right) \right. \\&\quad \left. \ge mn(1-\varepsilon )(1+2\gamma \kappa )\ln n \mid {\fancyscript{T}}_1,{\fancyscript{T}}_2\right) \\&\quad \le CP\left( \max _{s\in {\fancyscript{A}}_0,|s|=m+|s_{0n}|}{\varvec{u}}^{\tau }\varSigma ^{-1/2}\left( {\varvec{\beta }}_0(s),1 \right) U\left( {\varvec{\beta }}_0(s),1\right) \right. \\&\quad \left. \ge (1-\delta )\sqrt{mn(1-\varepsilon )(1+2\gamma \kappa )\ln n} \mid {\fancyscript{T}}_1,{\fancyscript{T}}_2\right) , \end{aligned}$$where \(\Vert {{\varvec{u}}}\Vert _2=1,\;\delta \) is an arbitrary positive value. According to (3), it can be further bounded by \(c_0^{\star }\exp \left[ -\dfrac{1-\varepsilon ^{\star }}{2}(1+2\gamma \kappa )m\ln n+m\kappa \ln n\right] \) where \(c_0^{\star }\) is a positive constant. It converges to 0 when \(\gamma >\frac{1}{1-\varepsilon ^{\star }}-\frac{1}{2\kappa }\), where \(\varepsilon ^{\star }\) is an arbitrary positive value; inequality (14) is thus obtained.
\(\square \)
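The proof above requires \(\gamma > 1-\frac{1}{2\kappa }\) when \(p_n = n^{\kappa }\). As a quick arithmetic check of this threshold (purely illustrative; the function name is ours):

```python
def gamma_threshold(kappa):
    """Selection-consistency threshold from Theorem 3: gamma > 1 - 1/(2*kappa)."""
    return 1.0 - 1.0 / (2.0 * kappa)

# The faster the feature space grows (larger kappa), the larger gamma must be:
thresholds = [gamma_threshold(k) for k in (0.5, 1.0, 2.0)]  # [0.0, 0.5, 0.75]
```

In particular, for \(\kappa \le 1/2\) the threshold is at most 0, so the ordinary BIC (\(\gamma =0\)) already suffices in that regime.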
Luo, S., Xu, J. & Chen, Z. Extended Bayesian information criterion in the Cox model with a high-dimensional feature space. Ann Inst Stat Math 67, 287–311 (2015). https://doi.org/10.1007/s10463-014-0448-y