Abstract
In this paper, we propose a new semiparametric method that simultaneously selects important variables, identifies the model structure, and estimates covariate effects in the additive accelerated failure time (AFT) model, where the dimension of the covariates is allowed to increase with the sample size. Instead of directly approximating the nonparametric effects as in most existing studies, we separate out a linear component from each effect to weaken the condition required for model identifiability. To compute the proposed estimates, we use an alternating direction method of multipliers (ADMM) algorithm, which is easy to implement and converges quickly. The proposed method is shown to be selection consistent and to possess an asymptotic oracle property. Its performance is illustrated through simulation studies and a real data analysis.
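The ADMM algorithm itself is not reproduced in this back matter. As a rough illustration of the general scheme, the following sketch applies ADMM to a lasso-type penalized least squares problem, a simplified stand-in for the paper's doubly penalized objective; the function `admm_lasso`, the parameter `rho`, and the numbers below are illustrative choices, not the authors' implementation.

```python
import numpy as np

def soft_threshold(v, k):
    # elementwise soft-thresholding operator S_k(v) = sign(v) * max(|v| - k, 0)
    return np.sign(v) * np.maximum(np.abs(v) - k, 0.0)

def admm_lasso(X, y, lam, rho=1.0, n_iter=500):
    """ADMM for min_b 0.5*||X b - y||^2 + lam*||b||_1, using the standard
    splitting b = z with augmented-Lagrangian parameter rho."""
    n, p = X.shape
    Xty = X.T @ y
    # factor (X'X + rho*I) once; the factor is reused at every iteration
    L = np.linalg.cholesky(X.T @ X + rho * np.eye(p))
    z = np.zeros(p)
    u = np.zeros(p)  # scaled dual variable
    for _ in range(n_iter):
        b = np.linalg.solve(L.T, np.linalg.solve(L, Xty + rho * (z - u)))
        z = soft_threshold(b + u, lam / rho)
        u += b - z
    return z
```

With an orthonormal design the lasso solution is the soft-thresholded response, which gives a quick sanity check on the iteration.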
References
Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Second international symposium on information theory, pp 267–281
Antoniadis A, Gijbels I, Lambert-Lacroix S (2014) Penalized estimation in additive varying coefficient models using grouped regularization. Stat Pap 55:727–750
Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3:1–122
Buckley J, James I (1979) Linear regression with censored data. Biometrika 66:429–436
Candes E, Tao T (2007) The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). Ann Stat 35:2313–2351
Cao Y, Huang J, Liu Y, Zhao X (2016) Sieve estimation of Cox models with latent structures. Biometrics 72:1086–1097
Chen K, Shen J, Ying Z (2005) Rank estimation in partial linear model with censored data. Stat Sin 15(3):767–779
Chen S, Zhou Y, Ji Y (2018) Nonparametric identification and estimation of sample selection models under symmetry. J Econom 202(2):148–160
Craven P, Wahba G (1979) Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer Math 31:377–403
de Boor C (1978) A practical guide to splines. Applied Mathematical Sciences, vol 27. Springer, New York
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan J, Li R (2002) Variable selection for Cox’s proportional hazards model and frailty model. Ann Stat 30:74–99
Fleming TR, Harrington DP (1991) Counting processes and survival analysis. Wiley, New York
Huang J (1999) Efficient estimation of the partly linear additive Cox model. Ann Stat 27:1536–1563
Huang J, Ma S (2010) Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Anal 16:176–195
Huang J, Horowitz JL, Wei F (2010) Variable selection in nonparametric additive models. Ann Stat 38:2282–2313
Huang J, Wei F, Ma S (2012) Semiparametric regression pursuit. Stat Sin 22:1403–1426
Joseph A (2013) Variable selection in high-dimension with random designs and orthogonal matching pursuit. J Mach Learn Res 14:1771–1800
Kim J, Pollard DB (1990) Cube root asymptotics. Ann Stat 18:191–219
Lam C, Fan J (2009) Sparsistency and rates of convergence in large covariance matrix estimation. Ann Stat 37:4254–4278
Leng C, Ma S (2007) Accelerated failure time models with nonlinear covariates effects. Aust N Z J Stat 49:155–172
Lian H, Lai P, Liang H (2013) Partially linear structure selection in Cox models with varying coefficients. Biometrics 69:348–357
Liu Y, Zhang J, Zhao X (2018) A new nonparametric screening method for ultrahigh-dimensional survival data. Comput Stat Data Anal 119:74–85
Ma S, Du P (2012) Variable selection in partly linear regression model with diverging dimensions for right censored data. Stat Sin 22:1003–1020
Ma S, Kosorok MR, Fine JP (2006) Additive risk models for survival data with high-dimensional covariates. Biometrics 62:202–210
Newey WK (1994) The asymptotic variance of semiparametric estimators. Econometrica 62:1349–1382
Neykov NM, Filzmoser P, Neytchev PN (2014) Ultrahigh dimensional variable selection through the penalized maximum trimmed likelihood estimator. Stat Pap 55:187–207
Gray RJ (1992) Flexible methods for analyzing survival data using splines with applications to breast cancer prognosis. J Am Stat Assoc 87:942–951
Rosenwald A et al (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med 346:1937–1947
Schumaker L (1981) Spline functions: basic theory. Wiley, New York
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Stone C (1986) The dimensionality reduction principle for generalized additive models. Ann Stat 14:590–606
Stute W (1993) Consistent estimation under random censorship when covariables are available. J Multivar Anal 45:89–103
Stute W (1996) Distributional convergence under random censorship when covariables are present. Scand J Stat 23:461–471
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc B 58:267–288
Tibshirani R (1997) The lasso method for variable selection in the Cox model. Stat Med 16:385–395
van der Vaart A, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York
Wang K, Lin L (2019) Robust and efficient estimator for simultaneous model structure identification and variable selection in generalized partial linear varying coefficient models with longitudinal data. Stat Pap 60:1649–1676
Wang S, Nan B, Zhu J, Beer DG (2008) Doubly penalized Buckley-James method for survival data with high-dimensional covariates. Biometrics 64:132–140
Wei LJ (1992) The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Stat Med 11:1871–1879
Wu Y, Stefanski LA (2015) Automatic structure recovery for additive models. Biometrika 102:381–395
Zeng D, Lin D (2007) Efficient estimation for the accelerated failure time model. J Am Stat Assoc 102:1387–1396
Zhang C (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38:894–942
Zhang HH, Lu W (2007) Adaptive Lasso for Cox’s proportional hazards model. Biometrika 94:691–703
Zhang HH, Cheng G, Liu Y (2011) Linear or nonlinear? Automatic structure discovery for partially linear models. J Am Stat Assoc 106:1099–1112
Zhang J, Yin G, Liu Y, Wu Y (2018) Censored cumulative residual independent screening for ultrahigh-dimensional survival data. Lifetime Data Anal 24:273–292
Zou H (2006) The adaptive Lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Acknowledgements
The authors would like to thank the referees, the associate editor and the editor for their constructive and insightful comments and suggestions that greatly improved the paper. This research was partially supported by the National Natural Science Foundation of China (Nos. 11971362, 11571263 and 11771366). The work of J. Huang is supported in part by the NSF grant DMS-1916199.
Appendix: Proofs
Proof of Proposition 1
First, the fact that \(\phi _j(x)=\beta _j+\tilde{\phi }_j(x)\) with \(\beta _j=\int _{\alpha _1}^{\alpha _2} \phi _j(x)dx\) and \(\tilde{\phi }_j(x)=\phi _j(x)-\beta _j\) for \(j=1,\ldots , p\) implies that the decomposition (4) holds. To show the uniqueness of the decomposition, we assume that there exist \((\beta _1^{(l)},\ldots ,\beta _{d_n}^{(l)})'\in {\mathbb {R}}^{d_n}\) and \((\tilde{\phi }_1^{(l)},\ldots ,\tilde{\phi }_{d_n}^{(l)})'\in \tilde{{\mathcal {H}}}^{d_n}\), \(l=1,2\) such that
It suffices to prove that \(\beta _j^{(1)}=\beta _j^{(2)}\) and \(\tilde{\phi }_j^{(1)}(x)\equiv \tilde{\phi }_j^{(2)}(x)\) for each \(j=1,\ldots , d_n\). To this end, we note that (10) implies that
When the covariates are not linearly dependent, by Fubini’s theorem, there exists \((x_1^0,\ldots ,x_{j-1}^0,x_{j+1}^0,\ldots ,x_{d_n}^0)\in [\alpha _1,\alpha _2]^{d_n-1}\) such that
Writing \(-\sum \nolimits _{i\ne j}x_i^0\Big ([\beta _i^{(1)}-\beta _i^{(2)}]+[\tilde{\phi }_i^{(1)}(x_i^0)-\tilde{\phi }_i^{(2)}(x_i^0)]\Big )\) as \(C_j\) and using the condition that \(E(\beta _j^{(l)} X_j+X_j\tilde{\phi }_j^{(l)}(X_j))=E(X_j\phi (X_j))\) for \(l=1,2\), we have
for each \(j=1,\ldots , d_n\). Noting that \(\tilde{\phi }_j^{(l)}(x)\in \tilde{{\mathcal {H}}}\), integrating both sides of (11) with respect to \(x\) from \(\alpha _1\) to \(\alpha _2\) gives that
Combining this with (11), we get that
\(\square \)
Let \({\mathbb {P}}_n\) be the empirical measure of \(\{(Y_i,\delta _i, \varvec{X}_i):i=1,2,\ldots ,n\}\), and \({\mathbb {P}}\) be the probability measure of \((Y,\delta ,\varvec{X})\). Define \(g_{nj}^*(X_j)=g_{j}^*(\phi _{nj},X_j)\) and \(g_{0j}^*(X_j)=g_{j}^*(\phi _{0j},X_j)\) for \(\phi _{nj}\in {\Omega _n}\). Then denote \(g_n(\varvec{X})=\sum _{j=1}^{d_n} X_j\phi _{nj}(X_j)\), \(g_n^*(\varvec{X})=\sum _{j=1}^{d_n} g_{nj}^*(X_j)\) and \(g_0^*(\varvec{X})=\sum _{j=1}^{d_n} g_{0j}^*(X_j)\). Define
for \(u_0=\cdots =u_m=\xi _0,\ u_{m+1}=\xi _1,\ldots , u_{q_n-1}=\xi _{K_n-1},\ u_{q_n}=\cdots =u_{q_n+m}=\xi _{K_n}\). Let \(\xrightarrow {P}\) and \(\xrightarrow {d}\) represent convergence in probability and in distribution, respectively, as \(n\rightarrow \infty \) unless otherwise stated. Similar to Lemma A5 in Huang (1999), the following lemma can be established first.
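The knot sequence just defined, with \((m+1)\)-fold boundary knots \(\xi _0\) and \(\xi _{K_n}\), is the standard construction for a B-spline basis of degree \(m\) with interior knots \(\xi _1,\ldots ,\xi _{K_n-1}\), giving \(q_n=K_n+m\) basis functions. A minimal numerical sketch via the Cox-de Boor recursion (the function name `bspline_basis` is ours and purely illustrative, not the authors' code):

```python
import numpy as np

def bspline_basis(x, xi, m):
    """Evaluate the q_n = K_n + m B-spline basis functions of degree m on
    [xi[0], xi[-1]], using the knot sequence of the appendix:
    u_0 = ... = u_m = xi[0], interior knots xi[1], ..., xi[K_n - 1],
    u_{q_n} = ... = u_{q_n + m} = xi[-1]."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    xi = np.asarray(xi, dtype=float)            # xi = (xi_0, ..., xi_{K_n})
    t = np.concatenate([np.repeat(xi[0], m), xi, np.repeat(xi[-1], m)])
    # degree-0 indicator splines, closing the last interval on the right
    B = np.array([(t[i] <= x) & (x < t[i + 1])
                  for i in range(len(t) - 1)], dtype=float).T
    last = np.nonzero(np.diff(t) > 0)[0][-1]
    B[x == t[-1], last] = 1.0
    for k in range(1, m + 1):                   # raise the degree step by step
        Bn = np.zeros((len(x), B.shape[1] - 1))
        for i in range(B.shape[1] - 1):
            d1, d2 = t[i + k] - t[i], t[i + k + 1] - t[i + 1]
            left = (x - t[i]) / d1 * B[:, i] if d1 > 0 else 0.0
            right = (t[i + k + 1] - x) / d2 * B[:, i + 1] if d2 > 0 else 0.0
            Bn[:, i] = left + right
        B = Bn
    return B                                    # shape (len(x), K_n + m)
```

The repeated boundary knots make the basis sum to one on the whole interval, which is a convenient check on the construction.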
Lemma 1
Assume that Conditions (C1)–(C4) hold for any \(1\le j\le d_n\). Then there exists a function \(\phi _{nj} \in {\Omega _n}\) such that
with \({\mathbb {P}}_n\delta g_{nj}=0\).
Proof
According to Corollary 6.21 of Schumaker (1981), for any \(1\le j\le d_n\), there exists \(\phi _{nj}\in \Omega _n\) such that \(\Vert \phi _{nj}-\phi _{0j}\Vert _\infty =O(n^{-\nu p})\). We define \(\widetilde{g}_{nj}(X_j)=X_j\phi _{nj}(X_j)\) and
where \(n_{\delta }=\sum _{i=1}^n\delta _i/n\). Then it is easy to see that \({\mathbb {P}}_n\delta g_{nj}=0\) for any \(1\le j\le d_n\). Furthermore, we note that
where
with c being a constant independent of n. By Lemma 3.4.2 in van der Vaart and Wellner (1996), we have \(({\mathbb {P}}_n-{\mathbb {P}})\delta \widetilde{g}_{nj}=O_p(n^{-1/2}n^{\nu /2})\). And the definition of \(\phi _{nj}\) shows that \(\Vert {\mathbb {P}}(\delta \widetilde{g}_{nj}-\delta g_{0j})\Vert _\infty \le E(\delta )\Vert \widetilde{g}_{nj}-g_{0j}\Vert _\infty =O(n^{-\nu p})\). Hence we have
In addition,
Plugging (13) and (14) into (12), we can get \(\Vert g_{nj}-g_{0j}\Vert _\infty =O_p(n^{-\nu p}+n^{-(1-\nu )/2})\). By using the property of Kaplan–Meier weights (Stute 1993) and Lemma 3.4.2 in van der Vaart and Wellner (1996), we have
where the \(c_i\), \(i=1,\ldots , 4\), are finite constants. Thus, we have
\(\square \)
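The Kaplan-Meier weights invoked in the proof above (Stute 1993) have a simple product form: with the sample ordered as \(Y_{(1)}\le \cdots \le Y_{(n)}\), \(\omega _i=\frac{\delta _{(i)}}{n-i+1}\prod _{j=1}^{i-1}\big (\frac{n-j}{n-j+1}\big )^{\delta _{(j)}}\), the jump of the Kaplan-Meier estimator at \(Y_{(i)}\). A hedged sketch of this formula (function name ours, ties broken in input order for illustration):

```python
import numpy as np

def km_weights(y, delta):
    """Kaplan-Meier (Stute 1993) weights attached to the ordered sample
    Y_(1) <= ... <= Y_(n):
    omega_i = delta_(i)/(n-i+1) * prod_{j<i} ((n-j)/(n-j+1))^{delta_(j)}.
    They reduce to 1/n for every observation when nothing is censored."""
    y = np.asarray(y, dtype=float)
    delta = np.asarray(delta, dtype=int)
    n = len(y)
    order = np.argsort(y, kind="stable")        # indices sorted by Y
    d = delta[order]
    w = np.zeros(n)
    surv = 1.0  # running product over the smaller observations
    for i in range(n):                          # i + 1 is the rank
        w[i] = d[i] / (n - i) * surv
        surv *= ((n - i - 1) / (n - i)) ** d[i]
    return w, order
```

Censored observations receive zero weight, and the mass they would have carried is redistributed to the larger uncensored observations, which is what drives the weighted least squares criteria in the appendix.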
Define \(\widehat{g}_{nj}^*(X_j)=g_j^*(\widehat{\phi }_{nj},X_j)\) and \(\widehat{g}_n^*(\varvec{X})=\sum _{j=1}^{d_n}\widehat{g}_{nj}^*(X_j)\), then we have the following lemma.
Lemma 2
Assume that Conditions (C1)–(C7) hold. If \(0.25/p<\nu <0.5\), then \(\Vert \widehat{g}_n^*-g_n^*\Vert ^2=o_p(d_n^2 q_n^{-1})\) and \(\displaystyle \left\| \frac{1}{d_n}(\widehat{g}_n^*-g_n^*)\right\| _{\infty }=o_p(1)\).
Proof
Let \(\eta _{nj}\in \Omega _n\) be such that \(\eta _{nj}(x)=\varvec{\theta }_{nj}^{*T}\varvec{\psi }_{q_n,m}(x)\) and \(\Vert \eta _{nj}(x)\Vert ^2=O(q_n^{-1})\). Denote \(h_n^*(\varvec{X})=\sum _{j=1}^{d_n} g_j^*(\eta _{nj},X_j)\); then \(\displaystyle \Vert \frac{1}{d_n}h_n^*(\varvec{X})\Vert ^2=O_p(q_n^{-1})\). Define \(H_n(\alpha )=Q_n(\varvec{\theta }_n+\alpha \varvec{\theta }_n^{*})\). To prove the lemma, it suffices to show that for any \(\alpha _0>0\), \(H'_n(\alpha _0)>0\) and \(H'_n(-\alpha _0)<0\) with probability tending to one.
Note that
Then
We consider the first part
where \(c_0>0\) is a constant and the first term
In \(J_{1n}\), \(\Vert Y^*-g_n^*\Vert _\infty =\Vert Y^*-g_0^*+g_0^*-g_n^*\Vert _\infty \le O_p(1)+O_p(d_n (n^{-\nu p}+n^{-(1-\nu )/2}))\). Since \(d_n^4/n\rightarrow 0\) and \(0.25/p<\nu <0.5\), we have \(\Vert \displaystyle \frac{1}{d_n^2}h_n^*(Y^*-g_n^*)\Vert _\infty \le M_0\) for some constant \(M_0\). Let
Then similar to Lemma A2 and Corollary A1 in Huang (1999), we have
for any \(\varepsilon <\eta \) with a constant \(c_0\) and
Here we can take \(\eta =q_n^{-1/2}\). Combining the results of Lemma 3.4.2 in van der Vaart and Wellner (1996) and Lemma A1 in Huang (1999), we get
We then consider \(J_{2n}\) as
which gives that
Therefore,
Next we focus on \(H_2\) and \(H_3\). Let \(\varvec{B}_j(X_j)=(\varvec{\psi }_{q_n,m}(X_{1j}),\ldots ,\varvec{\psi }_{q_n,m}(X_{nj}))^T\). By Lemma 3 of Huang and Ma (2010), it follows that there are constants \(0<c_3<c_4<\infty \) such that
with probability tending to one. Then we have \(\Vert \varvec{\theta }_{nj}^{*}\Vert =O_p(1)\) and \(\Vert \varvec{{C}_{\xi }}\varvec{\theta }_{nj}^{*}\Vert =O_p(1)\) using the fact that \(\Vert \varvec{\theta }_{nj}^{*T} \varvec{\psi }_{q_n,m}(X_j)\Vert =O(q_n^{-1/2})\). Observing that
and using Condition (C9) and \(\lambda _{1}=o(d_n n^{-\nu })\), we have
The same arguments as above give that \(|H_3|\le o_p(d_n^2 n^{-\nu })\) if \(\lambda _{2}=o(d_n n^{-\nu })\).
Consequently, \(H'_n(\alpha _0)\ge c_0\alpha _0 d_n^2 n^{-\nu }+o_p(d_n^2 n^{-\nu })>0\) with probability tending to one. Similarly, we can prove that \(H'_n(-\alpha _0)<0\) with probability tending to one. Therefore, the boundedness of the covariates \(\varvec{X}\) in Condition (C2) ensures that
Subsequently, Lemma 7 of Stone (1986) yields that \(\displaystyle \Vert \frac{1}{d_n}(\widehat{g}_n^*-g_n^*)\Vert _{\infty }=o_p(1)\). \(\square \)
To verify the consistency of parameter estimation, we need the following lemma.
Lemma 3
Define \(m_0(x,y^*;g^*)=(y^*-g^*(x))^2/d_n^2\). Denote \(M_0={\mathbb {P}} m_0\) and \(\displaystyle M_n={\mathbb {P}}_n m_0=\frac{1}{n}\Vert Y^*-g^*(\varvec{X})\Vert ^2/d_n^2\). Under the conditions of Lemma 1, for any function \(g(\cdot )\) satisfying \(E[\delta g(\varvec{X})]=0\), there exists a constant \(c>0\) such that
Proof
Let \(h^*=g^*-g_0^*\) and
Since \(L'(0)=0\) and \(L''(0)=2{\mathbb {P}}(h^{*2})/d_n^2\), there exists a constant \(c>0\) such that \(\displaystyle {\mathbb {P}} m_0(\cdot ;g^*)-{\mathbb {P}} m_0(\cdot ;g_0^*)=c\left\| \frac{1}{d_n}(g^*-g_0^*)\right\| ^2\). Similarly, we have
By Lemma 1, \({\mathbb {P}} m_0(\cdot ;g_n^*)-{\mathbb {P}} m_0(\cdot ;g_0^*)=O_p(n^{-2\nu p}+n^{-(1-\nu )})\). Combining the following equality
with the triangle inequality
we have
where \(c>0\) is a finite constant. \(\square \)
Proof of Theorem 1
Let
By Lemma 3.4.2 of van der Vaart and Wellner (1996),
Then by Theorem 3.4.1 of van der Vaart and Wellner (1996), choosing the distance \(d(\widehat{g}_n^*,g_n^*)=-[{\mathbb {P}} m_0(\cdot ;\widehat{g}_n^*)-{\mathbb {P}} m_0(\cdot ;g_n^*)]\) there, we have
where \(r_{1n}=O(n^{1/2}q_n^{-1/2})=O(n^{(1-\nu )/2})\). Therefore, \({\mathbb {P}} m_0(\cdot ;\widehat{g}_n^*)-{\mathbb {P}} m_0(\cdot ;g_n^*)=O_p(n^{-(1-\nu )})\). Thus Lemma 3 gives that \(\Vert \frac{1}{d_n}(\widehat{g}_n^*-g_n^*)\Vert ^2=O_p(n^{-2\nu p}+n^{-(1-\nu )})\). Combining the result in Lemma 1 that \(\Vert g_n^*-g_0^*\Vert _\infty ^2=O_p(d_n^2(n^{-2\nu p}+n^{-(1-\nu )}))\), we have
By Conditions (C2)–(C4), it follows that
Denoting the projection of \(\varvec{X}_{M_2}\) on \(\varvec{X}_{M_1}\) as \(W\), we have
By Condition (C6), we obtain
This in turn implies \(E\delta \big \Vert \varvec{X}_{M_1} (\widehat{\varvec{\phi }}_n-\tilde{\varvec{\phi }}_0)\big \Vert ^2=O_p(d_n^2 (n^{-(1-\nu )}+n^{-2\nu p}))\). Therefore,
This completes the proof of Theorem 1. \(\square \)
Proof of Theorem 2
(i)
First, we prove the selection consistency of the variables. Let \(\widetilde{\varvec{\theta }}_n=(\widetilde{\varvec{\theta }}_{n1}^T,\ldots ,\widetilde{\varvec{\theta }}_{nd_n}^T)^T\) with
$$\begin{aligned} \widetilde{\varvec{\theta }}_{nj}= \left\{ \begin{array}{ll} \widehat{\varvec{\theta }}_{nj},&{}\text{ if } j\notin M_3\text{, }\\ 0,&{}\text{ if } j\in M_3\text{. } \end{array}\right. \end{aligned}$$Note that \(\widehat{\varvec{\theta }}_n\) satisfies \(\displaystyle \frac{\partial Q_n(\widehat{\varvec{\theta }}_n)}{\partial \varvec{\theta }}=\varvec{0}\). By the definition of \(\widehat{\varvec{\theta }}_n\) and \(\widetilde{\varvec{\theta }}_n\), we have
$$\begin{aligned}&Q_n(\widehat{\varvec{\theta }}_n)-Q_n(\widetilde{\varvec{\theta }}_n) \\= & {} \frac{\partial Q_n(\widehat{\varvec{\theta }}_{n})^T}{\partial \varvec{\theta }}(\widehat{\varvec{\theta }}_{n}-\widetilde{\varvec{\theta }}_n)-\frac{1}{2}(\widehat{\varvec{\theta }}_n-\widetilde{\varvec{\theta }}_n)^T\frac{\partial ^2 Q_n(\varvec{\theta }_n^*)}{\partial \varvec{\theta }\partial \varvec{\theta }^T}(\widehat{\varvec{\theta }}_n-\widetilde{\varvec{\theta }}_n)\\= & {} -\frac{1}{2}(\widehat{\varvec{\theta }}_n-\widetilde{\varvec{\theta }}_n)^T\frac{\partial ^2 Q_n(\varvec{\theta }_n^*)}{\partial \varvec{\theta }\partial \varvec{\theta }^T}(\widehat{\varvec{\theta }}_n-\widetilde{\varvec{\theta }}_n)\\= & {} -\frac{1}{2}\widehat{\varvec{\theta }}_{nM_3}^T \frac{\partial ^2 \ell _n(\varvec{\theta }_n^*)}{\partial \varvec{\theta }_{M_3}\partial \varvec{\theta }_{M_3}^T}\widehat{\varvec{\theta }}_{nM_3}-\frac{1}{2}\sum \limits _{j\in M_3}(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj})^T\big (P''_1(\Vert \varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{1})+o_p(1)\big )(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj}) \\&-\frac{1}{2}\sum \limits _{j\in M_3}\widehat{\varvec{\theta }}_{nj}^T\big (P''_2(\Vert \widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{2})+o_p(1)\big )\widehat{\varvec{\theta }}_{nj}, \end{aligned}$$where \(\varvec{\theta }_n^*\) is between \(\widehat{\varvec{\theta }}_n\) and \(\widetilde{\varvec{\theta }}_n\).
Since \(\widehat{\varvec{\theta }}_n\) is the minimizer of \(Q_n(\varvec{\theta })\), we have \(Q_n(\widehat{\varvec{\theta }}_n)\le Q_n(\widetilde{\varvec{\theta }}_n)\), which implies that
$$\begin{aligned} \frac{1}{2}\widehat{\varvec{\theta }}_{nM_3}^T \frac{\partial ^2 \ell _n(\varvec{\theta }_n^*)}{\partial \varvec{\theta }_{M_3}\partial \varvec{\theta }_{M_3}^T}\widehat{\varvec{\theta }}_{nM_3}\ge & {} -\frac{1}{2}\sum \limits _{j\in M_3}(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj})^T\big (P''_1(\Vert \varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{1})+o_p(1)\big )(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj})\nonumber \\&-\frac{1}{2}\sum \limits _{j\in M_3}\widehat{\varvec{\theta }}_{nj}^T\big (P''_2(\Vert \widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{2})+o_p(1)\big )\widehat{\varvec{\theta }}_{nj}. \end{aligned}$$(15)
Note that the left-hand side of (15) satisfies
$$\begin{aligned} I_1\le c \widehat{\varvec{\theta }}_{nM_3}^T E(X_{M_3}^TX_{M_3})\widehat{\varvec{\theta }}_{nM_3}\le c\rho _n^*\Vert \widehat{\varvec{\theta }}_{nM_3}\Vert ^2 \end{aligned}$$for some constant \(c\), by the continuity of the B-spline functions and the definition of \(\rho _n^*\). Using Condition (C9), there exist constants \(a\), \(b\) and \(c\) such that the right-hand side of (15) satisfies
$$\begin{aligned} I_2\ge c (\lambda _{1}^a+\lambda _{2}^a) \Vert \widehat{\varvec{\theta }}_{nM_3}\Vert ^{2-b}. \end{aligned}$$Thus, by the results of Theorem 1, we obtain that
$$\begin{aligned} O_p(1)(d_n^2(n^{-(1-\nu )}+n^{-2\nu p}))^{b/2}\ge \Vert \widehat{\varvec{\theta }}_{nM_3}\Vert ^{b}\ge O_p(1)\frac{\lambda _{1}^a+\lambda _{2}^a}{\rho _n^*}. \end{aligned}$$This shows that, provided \(\displaystyle \frac{\lambda _{1}^a}{\rho _n^*(d_n^2(n^{-(1-\nu )}+n^{-2\nu p}))^{b/2}}\) and \(\displaystyle \frac{\lambda _{2}^a}{\rho _n^*(d_n^2(n^{-(1-\nu )}+n^{-2\nu p}))^{b/2}}\) both go to infinity,
$$\begin{aligned} P(\Vert \widehat{\varvec{\theta }}_{nM_3}\Vert >0)\le P\Big (\frac{\lambda _{1}^a+\lambda _{2}^a}{\rho _n^*(d_n^2(n^{-(1-\nu )}+n^{-2\nu p}))^{b/2}}\le O_p(1)\Big )\rightarrow 0. \end{aligned}$$Next, we prove the structure selection consistency. Assume that \({\mathop {\varvec{\theta }}\limits _{\sim }}_{n}=({\mathop {\varvec{\theta }}\limits _{\sim }}_{n1}^{T},\ldots ,{\mathop {\varvec{\theta }}\limits _{\sim }}_{nd_{n}}^T)^T\) with
$$\begin{aligned} {\mathop {\varvec{\theta }}\limits _{\sim }}_{nM_{2}}= \left\{ \begin{array}{ll} \widehat{\varvec{\theta }}_{nj},&{}\text{ if } j\notin M_2\text{, }\\ {\mathop {\varvec{\theta }}\limits _{\sim }}_{nj}, s.t. \ \varvec{{C}_{\xi }}{\mathop {\varvec{\theta }}\limits _{\sim }}_{nj}=0,&{}\text{ if } j\in M_2\text{. } \end{array}\right. \end{aligned}$$Then we have
$$\begin{aligned}&\frac{1}{2}\Big (\widehat{\varvec{\theta }}_{nM_2}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nM_2}\Big )^T \frac{\partial ^2 \ell _n(\varvec{\theta }_n^0)}{\partial \varvec{\theta }_{M_2}\partial \varvec{\theta }_{M_2}^T}\Big (\widehat{\varvec{\theta }}_{nM_2}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nM_2}\Big )\nonumber \\\ge & {} -\frac{1}{2}\sum \limits _{j\in M_2}(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj})^T\big (P''_1(\Vert \varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{1})+o_p(1)\big )(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj})\nonumber \\&-\frac{1}{2}\sum \limits _{j\in M_2}\Big (\widehat{\varvec{\theta }}_{nj}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nj}\Big )^T \big (P''_2(\Vert \widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{2})+o_p(1)\big ) \Big (\widehat{\varvec{\theta }}_{nj}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nj}\Big ), \end{aligned}$$(16)
where \(\varvec{\theta }_n^0\) is between \(\widehat{\varvec{\theta }}_n\) and \({\mathop {\varvec{\theta }}\limits _{\sim }}_n\). The left-hand side of (16) satisfies
$$\begin{aligned} II_1\le & {} O_p(1)\Big (\varvec{C}_\xi (\widehat{\varvec{\theta }}_{nM_2}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nM_2})\Big )^T \cdot \frac{\partial ^2 \ell _n(\varvec{\theta }_n^0)}{\partial \varvec{\theta }_{M_2}\partial \varvec{\theta }_{M_2}^T}\cdot \Big (\varvec{C}_\xi (\widehat{\varvec{\theta }}_{nM_2}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nM_2})\Big )\\= & {} O_p(1)(\varvec{C}_\xi \widehat{\varvec{\theta }}_{nM_2})^T \cdot \frac{\partial ^2 \ell _n(\varvec{\theta }_n^0)}{\partial \varvec{\theta }_{M_2}\partial \varvec{\theta }_{M_2}^T}\cdot (\varvec{C}_\xi \widehat{\varvec{\theta }}_{nM_2})\\\le & {} O_p(1)\rho _n^*\Vert \widehat{\varvec{\theta }}_{nM_2}\Vert ^2. \end{aligned}$$Similarly, the right-hand side of (16) satisfies
$$\begin{aligned} II_2\ge c (\lambda _{1}^a+\lambda _{2}^a) \Vert \widehat{\varvec{\theta }}_{nM_2}\Vert ^{2-b}. \end{aligned}$$Therefore,
$$\begin{aligned} P(\Vert \widehat{\varvec{\theta }}_{nM_2}\Vert >0)\le P\Big (\frac{\lambda _{1}^a+\lambda _{2}^a}{\rho _n^*(d_n^2(n^{-(1-\nu )}+n^{-2\nu p}))^{b/2}}\le O_p(1)\Big )\rightarrow 0. \end{aligned}$$This establishes the selection consistency of both the variables and the model structure.
(ii)
Let the column and row vectors of the covariate matrix \(\varvec{X}^*\) be \(X_1^*,\ldots , X_{d_n}^*\) and \(X_{(1)}^*,\ldots , X_{(n)}^*\), respectively. Define
$$\begin{aligned} \overline{X}_w=\frac{\sum \nolimits _{i=1}^n \omega _i X_{(i)}}{\sum \nolimits _{i=1}^n \omega _i}, \quad X_{(i)}^*=(n\omega _i)^{1/2}(X_{(i)}-\overline{X}_w), \end{aligned}$$$$\begin{aligned} U(\varvec{W};\varvec{\beta },\widehat{\varvec{\phi }}_n)\triangleq (-\varvec{X}_{M_2}^{*})\Big (Y^*-\sum \limits _{j\in M_1}\hat{g}_{nj}^*(X_{j})-\varvec{X}_{M_2}^* \varvec{\beta }\Big ), \end{aligned}$$$$\begin{aligned} \widehat{U}_n(\varvec{\beta })\triangleq \frac{1}{n}\sum \limits _{i=1}^{n}U(\varvec{W}_i;\varvec{\beta },\widehat{\varvec{\phi }}_n), \end{aligned}$$with \(\varvec{W}\triangleq (\omega ,\varvec{X},Y)\). Then \(\widehat{\varvec{\beta }}_n\) satisfies the estimating equation \(\widehat{U}_n(\widehat{\varvec{\beta }})=0\) by the definition of \(\widehat{\varvec{\beta }}_n\) and \(\widehat{\varvec{\phi }}_n\).
Let \(\displaystyle U_n(\varvec{\beta })\triangleq \frac{1}{n}\sum \nolimits _{i=1}^{n}U(\varvec{W}_i;\varvec{\beta },\tilde{\varvec{\phi }}_0)\) and \(\widetilde{\varvec{\beta }}_n\) be the root of \(U_n(\varvec{\beta })=0\). We then show that \(\widehat{\varvec{\beta }}_n\) has the same distribution as \(\widetilde{\varvec{\beta }}_n\). The Fréchet derivative of \(U(\varvec{W};\varvec{\beta }_0,\varvec{\phi })\) at \(\tilde{\varvec{\phi }}_0\) in the direction \(\varvec{h}\) is given by
$$\begin{aligned} D(\varvec{W},\varvec{h})= & {} \lim \limits _{\alpha \rightarrow 0}\frac{U(\varvec{W};\varvec{\beta }_0,\tilde{\varvec{\phi }}_0+\alpha \varvec{h})-U(\varvec{W};\varvec{\beta }_0,\tilde{\varvec{\phi }}_0)}{\alpha } \\= & {} \varvec{X}_{M_2}^{*T}\varvec{X}_{M_1}^{*} \varvec{h}, \end{aligned}$$with \(\varvec{h}\in \{h_1+\ldots +h_{|M_1|},h_j\in \tilde{{\mathcal {H}}},\ j\in M_1\}\).
The relation \(\Vert (\widehat{\varvec{\phi }}_n-\tilde{\varvec{\phi }}_0)/d_n\Vert =O_p(n^{-(1-\nu )/2}+n^{-\nu p})=o_p(n^{-1/4})\) ensures that the linear assumption 5.1 in Newey (1994) is satisfied. Then by Lemma 3.4.2 of van der Vaart and Wellner (1996), we have
$$\begin{aligned} \sqrt{n}({\mathbb {P}}_n-{\mathbb {P}})\{D(\varvec{W};\widehat{\varvec{\phi }}_n-\tilde{\varvec{\phi }}_0)\}\xrightarrow {P}0. \end{aligned}$$It follows that the stochastic equicontinuity assumption 5.2 holds. For \(\varvec{\phi }\) close enough to \(\tilde{\varvec{\phi }}_0\), a straightforward calculation yields that \(ED(\varvec{W};\varvec{\phi }-\tilde{\varvec{\phi }}_0)=0\) by using Condition (C8). Then the mean square continuity assumption 5.3 holds with \(\alpha (\varvec{W})=0\). By Lemma 5.1 of Newey (1994), \(\widehat{\varvec{\beta }}_n\) and \(\widetilde{\varvec{\beta }}_n\) have the same distribution.
Next, we derive the asymptotic distribution of \(\widetilde{\varvec{\beta }}_{nM_2}\). Let \(\iota _n=n^{-1/2},\ V_{1n}(\varvec{a})=Q_n(\varvec{\beta }_0+\iota _n(\varvec{a}^T,\varvec{0}^T)^T,\tilde{\varvec{\phi }}_0)-Q_n(\varvec{\beta }_0,\tilde{\varvec{\phi }}_0)\), where \(\varvec{a}=(a_1,\ldots ,a_{|M_2|})^T\) is a \(|M_2|\)-dimensional constant vector and \(\varvec{0}\) is a \(|M_3|\)-dimensional zero vector. By part (i) of Theorem 2, \(\widetilde{\varvec{\beta }}_n-\varvec{\beta }_0=\iota _n(\widehat{\varvec{a}}_n^T,\varvec{0}^T)^T\) with probability converging to one, where \(\widehat{\varvec{a}}_n=\text {argmin}\{V_{1n}(\varvec{a}): \varvec{a}\in {\mathbb {R}}^{|M_2|}\}\). Let \(\widetilde{\varvec{\theta }}_n\) be the estimator corresponding to \(\widetilde{\varvec{\beta }}_n\); then, as in part (i), we also have \(\varvec{{C}_{\xi }}\widetilde{\varvec{\theta }}_{nj}=0\ (j\in M_2)\) with probability converging to one.
Note that
$$\begin{aligned} V_{1n}(\varvec{a})= & {} Q_n(\varvec{\beta }_0+\iota _n(\varvec{a}^T,\varvec{0}^T)^T,\tilde{\varvec{\phi }}_0)-Q_n(\varvec{\beta }_0,\tilde{\varvec{\phi }}_0) \\= & {} \Big (\iota _n \varvec{a}^T U_n(\varvec{\beta }_0)+ \frac{\iota _n^2}{2} \varvec{a}^T U_n'(\varvec{\beta }_0)\varvec{a}\Big ) \\&+\left( \sum _{j\in M_2}P_1(\Vert \varvec{{C}_{\xi }}\widetilde{\varvec{\theta }}_{nj}\Vert ;\lambda _{1})-\sum _{j\in M_2}P_1(\Vert \varvec{{C}_{\xi }}\varvec{\theta }_{0j}\Vert ;\lambda _{1})\right) \\&+\left( \sum _{j\in M_2}P_2(\Vert \widetilde{\varvec{\theta }}_{nj}\Vert ;\lambda _{2n}) -\sum _{j\in M_2}P_2(\Vert \varvec{\theta }_{0j}\Vert ;\lambda _{2n})\right) \\\triangleq & {} A_{1n}(\varvec{a})+A_{2n}(\varvec{a})+A_{3n}(\varvec{a}). \end{aligned}$$Since \(\widetilde{\varvec{\beta }}_{n M_2}-\varvec{\beta }_{0}=\iota _n\varvec{a}=\varvec{\psi }_{q_n,m}(\varvec{X}_{M_2})^T(\widetilde{\varvec{\theta }}_{n M_2}-\varvec{\theta }_{0 M_2})\), we have
$$\begin{aligned} \widetilde{\varvec{\theta }}_{nj}-\varvec{\theta }_{0j}=(\varvec{\psi }_{q_n,m}(X_j)\varvec{\psi }_{q_n,m}(X_j)^T)^{-1}\varvec{\psi }_{q_n,m}(X_j)\iota _na_j,\ j=s_1+1,\ldots ,s_2. \end{aligned}$$It follows that
$$\begin{aligned} A_{3n}(\varvec{a})= & {} \sum _{j\in M_2}P_2(\Vert \widetilde{\varvec{\theta }}_{nj}\Vert ;\lambda _{2n})-\sum _{j\in M_2}P_2(\Vert \varvec{\theta }_{0j}\Vert ;\lambda _{2n}) \\= & {} \sum _{j\in M_2}\left[ P_2'(\Vert \varvec{\theta }_{0j}\Vert ;\lambda _{2n})\frac{\varvec{\theta }_{0j}^T}{\Vert \varvec{\theta }_{0j}\Vert }+o_p(1)\right] \\&\big [(\varvec{\psi }_{q_n,m}(X_j)\varvec{\psi }_{q_n,m}(X_j)^T)^{-1}\varvec{\psi }_{q_n,m}(X_j)\iota _na_j\big ]. \end{aligned}$$By Condition (C7), we have
$$\begin{aligned} |A_{3n}(\varvec{a})|\le & {} d_n P_2'(0+;\lambda _{2n}) \sqrt{\Vert ({\varvec{\psi }}_{q_{n,m}}(X_j){\varvec{\psi }}_{q_{n,m}}(X_j)^T)^{-1}\Vert }\iota _n a_j\\= & {} O_p(d_n^2 n^{-\nu }) O_p(n^{-(1-\nu )/2})=o_p(1). \end{aligned}$$Similarly, we can get that \(A_{2n}(\varvec{a})\xrightarrow {p}0\).
Hence, \(\widehat{\varvec{a}}_n=\text {argmin}\{V_{1n}(\varvec{a}): \varvec{a}\in {\mathbb {R}}^{|M_2|}\}=\text {argmin}\{A_{1n}(\varvec{a}): \varvec{a}\in {\mathbb {R}}^{|M_2|}\}\), so it suffices to study the minimizer of \(nA_{1n}(\varvec{a})\). Similarly to Huang et al. (2010), we have
$$\begin{aligned} nA_{1n}(\varvec{a})= & {} \varvec{a}^T \big (\sqrt{n}U_n(\varvec{\beta }_0)\big )+ \frac{1}{2}\varvec{a}^T U_n'(\varvec{\beta }_0)\varvec{a}\\\triangleq & {} \varvec{a}^T T_1+ \varvec{a}^T T_2 \varvec{a}. \end{aligned}$$It can be seen that \(T_2 \xrightarrow {p} \Sigma _2\) and that \(\varvec{u}\Sigma _3^{-1/2} T_1\) is asymptotically distributed as \(N(0,1)\) for any \(\varvec{u}\in {\mathbb {R}}^{|M_2|}\) with \(\Vert \varvec{u}\Vert =1\), where \(\Sigma _3=Var(\delta \gamma _0(Y)(Y-g_0(\varvec{X}))\varvec{X}_{M_2}+(1-\delta )\gamma _1(Y)-\gamma _2(Y))\) with the following notation:
$$\begin{aligned}&\tilde{H}^{11}(\varvec{x},y)=P(\varvec{X}\le \varvec{x}, Y\le y, \delta =1), \quad \tilde{H}^0(y)=P(Y\le y,\delta =0),\\&\gamma _0(y)=\exp \Bigg (\int _0^{y-}\frac{\tilde{H}^0(dw)}{1-H(w)}\Bigg ),\\&\gamma _{1,j}(y)=\frac{1}{1-H(y)}\int I(z>y)e(z,\varvec{x})x_j\gamma _0(z)\tilde{H}^{11}(d\varvec{x},dz),\\&\gamma _{2,j}(y)=\iint \frac{I(v<y,v<z)e(z,\varvec{x})x_j\gamma _0(z)}{[1-H(v)]^2}\tilde{H}^0(dv)\tilde{H}^{11}(d\varvec{x},dz),\\&\gamma _l(y)=(\gamma _{l,j}; j\in M_2), ~~ l=1,2. \end{aligned}$$Let \(\hat{\varvec{a}}=\text {argmin}\{V_1(\varvec{a})=\varvec{a}^T T_1+\frac{1}{2}\varvec{a}^T \Sigma _2 \varvec{a}: \varvec{a}\in {\mathbb {R}}^{|M_2|}\}\). According to the continuous mapping theorem of Kim and Pollard (1990), \(\sqrt{n} \varvec{u}\Sigma ^{-1/2}(\widehat{\varvec{\beta }}_{nM_2}-\varvec{\beta }_{0})\) has the same asymptotic distribution as \(\varvec{u}\Sigma ^{-1/2}\hat{\varvec{a}}\xrightarrow {d} N(0,1)\) for any \(\varvec{u}\in {\mathbb {R}}^{|M_2|}\) with \(\Vert \varvec{u}\Vert =1\), where \(\Sigma =\Sigma _2^{-1}\Sigma _3\Sigma _2^{-1}\). This completes the proof of Theorem 2.
\(\square \)
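The sandwich form \(\Sigma =\Sigma _2^{-1}\Sigma _3\Sigma _2^{-1}\) obtained above translates directly into a plug-in variance estimate for Wald-type inference on the linear component. A minimal numerical sketch, where `S2_hat` and `S3_hat` stand for consistent estimates of \(\Sigma _2\) and \(\Sigma _3\) and the numbers are purely hypothetical:

```python
import numpy as np

def sandwich_cov(S2_hat, S3_hat):
    """Plug-in sandwich covariance S2^{-1} S3 S2^{-1} for the asymptotic
    variance of the parametric component."""
    S2_inv = np.linalg.inv(S2_hat)
    return S2_inv @ S3_hat @ S2_inv

# illustrative 2x2 estimates (hypothetical values, not from the paper)
S2_hat = np.array([[2.0, 0.5], [0.5, 1.0]])
S3_hat = np.array([[1.0, 0.2], [0.2, 0.8]])
Sigma = sandwich_cov(S2_hat, S3_hat)

# Wald-type 95% confidence half-widths for the components of beta
n = 200
half_widths = 1.96 * np.sqrt(np.diag(Sigma) / n)
```

Since both factor matrices are symmetric positive definite here, the resulting \(\Sigma \) is symmetric with positive diagonal, as an asymptotic covariance must be.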
Cite this article
Liu, L., Wang, H., Liu, Y. et al. Model pursuit and variable selection in the additive accelerated failure time model. Stat Papers 62, 2627–2659 (2021). https://doi.org/10.1007/s00362-020-01205-0