
Model pursuit and variable selection in the additive accelerated failure time model


Abstract

In this paper, we propose a new semiparametric method to simultaneously select important variables, identify the model structure and estimate covariate effects in the additive AFT model, in which the dimension of the covariates is allowed to increase with the sample size. Instead of directly approximating the nonparametric effects as in most existing studies, we extract a linear effect from each of them, which weakens the condition required for model identifiability. To compute the proposed estimates numerically, we use an alternating direction method of multipliers (ADMM) algorithm, which is easy to implement and achieves a fast convergence rate. Our method is proved to be selection consistent and to possess an asymptotic oracle property. The performance of the proposed methods is illustrated through simulations and a real data analysis.
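The ADMM algorithm referred to above is developed in the paper's computational section, which is not reproduced on this page. Purely as orientation, the following minimal Python sketch applies the generic ADMM of Boyd et al. (2011) to a single group-penalized spline least-squares criterion; the one-penalty objective, the function names and the fixed step size rho are illustrative assumptions of ours, not the authors' algorithm, which involves two penalties and the constraint matrix described in the appendix.

```python
import numpy as np

def group_soft_threshold(v, t):
    """Blockwise soft-thresholding: shrink the group vector v toward zero by t."""
    norm = np.linalg.norm(v)
    return np.zeros_like(v) if norm <= t else (1.0 - t / norm) * v

def admm_group_lasso(B, y, groups, lam, rho=1.0, n_iter=200):
    """ADMM sketch for min_theta (1/2n)||y - B theta||^2 + lam * sum_g ||theta_g||.

    B      : (n, q) stacked spline design matrix
    groups : list of index arrays, one block of coefficients per covariate
    """
    n, q = B.shape
    z, u = np.zeros(q), np.zeros(q)
    G = B.T @ B / n + rho * np.eye(q)   # matrix reused in every theta-update
    Bty = B.T @ y / n
    for _ in range(n_iter):
        theta = np.linalg.solve(G, Bty + rho * (z - u))   # quadratic subproblem
        for g in groups:                                  # blockwise proximal step
            z[g] = group_soft_threshold(theta[g] + u[g], lam / rho)
        u += theta - z                                    # dual update
    return z
```

Each outer iteration alternates a ridge-type solve for the spline coefficients with a closed-form blockwise shrinkage, which is what makes ADMM attractive for penalties that are separable across covariate blocks.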


References

  • Akaike H (1973) Information theory and an extension of the maximum likelihood principle. In: Second international symposium on information theory, pp 267–281

  • Antoniadis A, Gijbels I, Lambert-Lacroix S (2014) Penalized estimation in additive varying coefficient models using grouped regularization. Stat Pap 55:727–750

  • Boyd S, Parikh N, Chu E, Peleato B, Eckstein J (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found Trends Mach Learn 3:1–122

  • Buckley J, James I (1979) Linear regression with censored data. Biometrika 66:429–436

  • Candes E, Tao T (2007) The Dantzig selector: statistical estimation when \(p\) is much larger than \(n\). Ann Stat 35:2313–2351

  • Cao Y, Huang J, Liu Y, Zhao X (2016) Sieve estimation of Cox models with latent structures. Biometrics 72:1086–1097

  • Chen K, Shen J, Ying Z (2005) Rank estimation in partial linear model with censored data. Stat Sin 15(3):767–779

  • Chen S, Zhou Y, Ji Y (2018) Nonparametric identification and estimation of sample selection models under symmetry. J Econom 202(2):148–160

  • Craven P, Wahba G (1979) Smoothing noisy data with spline functions: estimating the correct degree of smoothing by the method of generalized cross-validation. Numer Math 31:377–403

  • de Boor C (1978) A practical guide to splines. Applied Mathematical Sciences, vol 27. Springer, New York

  • Fleming TR, Harrington DP (1991) Counting processes and survival analysis. Wiley, New York

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360

  • Fan J, Li R (2002) Variable selection for Cox’s proportional hazards model and frailty model. Ann Stat 30:74–99

  • Huang J (1999) Efficient estimation of the partly linear additive Cox model. Ann Stat 27:1536–1563

  • Huang J, Ma S (2010) Variable selection in the accelerated failure time model via the bridge method. Lifetime Data Anal 16:176–195

  • Huang J, Horowitz JL, Wei F (2010) Variable selection in nonparametric additive models. Ann Stat 38:2282–2313

  • Huang J, Wei F, Ma S (2012) Semiparametric regression pursuit. Stat Sin 22:1403–1426

  • Joseph A (2013) Variable selection in high-dimension with random designs and orthogonal matching pursuit. J Mach Learn Res 14:1771–1800

  • Kim J, Pollard D (1990) Cube root asymptotics. Ann Stat 18:191–219

  • Lam C, Fan J (2009) Sparsistency and rates of convergence in large covariance matrix estimation. Ann Stat 37:4254–4278

  • Leng C, Ma S (2007) Accelerated failure time models with nonlinear covariates effects. Aust N Z J Stat 49:155–172

  • Lian H, Lai P, Liang H (2013) Partially linear structure selection in Cox models with varying coefficients. Biometrics 69:348–357

  • Liu Y, Zhang J, Zhao X (2018) A new nonparametric screening method for ultrahigh-dimensional survival data. Comput Stat Data Anal 119:74–85

  • Ma S, Du P (2012) Variable selection in partly linear regression model with diverging dimensions for right censored data. Stat Sin 22:1003–1020

  • Ma S, Kosorok MR, Fine JP (2006) Additive risk models for survival data with high-dimensional covariates. Biometrics 62:202–210

  • Newey WK (1994) The asymptotic variance of semiparametric estimators. Econometrica 62:1349–1382

  • Neykov NM, Filzmoser P, Neytchev PN (2014) Ultrahigh dimensional variable selection through the penalized maximum trimmed likelihood estimator. Stat Pap 55:187–207

  • Gray RJ (1992) Flexible methods for analyzing survival data using splines with applications to breast cancer prognosis. J Am Stat Assoc 87:942–951

  • Rosenwald A et al (2002) The use of molecular profiling to predict survival after chemotherapy for diffuse large-B-cell lymphoma. N Engl J Med 346:1937–1947

  • Schumaker L (1981) Spline functions: basic theory. Wiley, New York

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464

  • Stone C (1986) The dimensionality reduction principle for generalized additive models. Ann Stat 14:590–606

  • Stute W (1993) Consistent estimation under random censorship when covariables are available. J Multivar Anal 45:89–103

  • Stute W (1996) Distributional convergence under random censorship when covariables are present. Scand J Stat 23:461–471

  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc B 58:267–288

  • Tibshirani R (1997) The lasso method for variable selection in the Cox model. Stat Med 16:385–395

  • van der Vaart A, Wellner JA (1996) Weak convergence and empirical processes. Springer, New York

  • Wang K, Lin L (2019) Robust and efficient estimator for simultaneous model structure identification and variable selection in generalized partial linear varying coefficient models with longitudinal data. Stat Pap 60:1649–1676

  • Wang S, Nan B, Zhu J, Beer DG (2008) Doubly penalized Buckley–James method for survival data with high-dimensional covariates. Biometrics 64:132–140

  • Wei LJ (1992) The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Stat Med 11:1871–1879

  • Wu Y, Stefanski LA (2015) Automatic structure recovery for additive models. Biometrika 102:381–395

  • Zeng D, Lin D (2007) Efficient estimation for the accelerated failure time model. J Am Stat Assoc 102:1387–1396

  • Zhang C (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38:894–942

  • Zhang HH, Lu W (2007) Adaptive Lasso for Cox’s proportional hazards model. Biometrika 94:691–703

  • Zhang HH, Cheng G, Liu Y (2011) Linear or nonlinear? Automatic structure discovery for partially linear models. J Am Stat Assoc 106:1099–1112

  • Zhang J, Yin G, Liu Y, Wu Y (2018) Censored cumulative residual independent screening for ultrahigh-dimensional survival data. Lifetime Data Anal 24:273–292

  • Zou H (2006) The adaptive Lasso and its oracle properties. J Am Stat Assoc 101:1418–1429


Acknowledgements

The authors would like to thank the referees, the associate editor and the editor for their constructive and insightful comments and suggestions, which greatly improved the paper. This research was partially supported by the National Natural Science Foundation of China (Nos. 11971362, 11571263 and 11771366). The work of J. Huang is supported in part by NSF grant DMS-1916199.

Author information


Corresponding author

Correspondence to Yanyan Liu.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Proofs

Proof of Proposition 1

First, the fact that \(\phi _j(x)=\beta _j+\tilde{\phi }_j(x)\) with \(\beta _j=\int _{\alpha _1}^{\alpha _2} \phi _j(x)dx\) and \(\tilde{\phi }_j(x)=\phi _j(x)-\beta _j\) for \(j=1,\ldots , p\) implies that the decomposition (4) holds. To show the uniqueness of the decomposition, we assume that there exist \((\beta _1^{(l)},\ldots ,\beta _{d_n}^{(l)})'\in {\mathbb {R}}^{d_n}\) and \((\tilde{\phi }_1^{(l)},\ldots ,\tilde{\phi }_{d_n}^{(l)})'\in \tilde{{\mathcal {H}}}^{d_n}\), \(l=1,2\) such that

$$\begin{aligned} \sum \limits _{j=1}^{d_n} x_j[\beta _j^{(1)}+\tilde{\phi }_j^{(1)}(x_j)]\equiv \sum \limits _{j=1}^{d_n} x_j[\beta _j^{(2)}+\tilde{\phi }_j^{(2)}(x_j)]. \end{aligned}$$
(10)

It suffices to prove that \(\beta _j^{(1)}=\beta _j^{(2)}\) and \(\tilde{\phi }_j^{(1)}(x)\equiv \tilde{\phi }_j^{(2)}(x)\) for each \(j=1,\ldots , d_n\). To this end, we note that (10) implies that

$$\begin{aligned} \sum \limits _{j=1}^{d_n} x_j\Big (\left[ \beta _j^{(1)}-\beta _j^{(2)}\right] +\left[ \tilde{\phi }_j^{(1)}(x_j)-\tilde{\phi }_j^{(2)}(x_j)\right] \Big )\equiv 0. \end{aligned}$$

Since the covariates are not linearly dependent, by Fubini’s theorem there exists \((x_1^0,\ldots ,x_{j-1}^0,x_{j+1}^0,\ldots ,x_{d_n}^0)\in [\alpha _1,\alpha _2]^{d_n-1}\) such that

$$\begin{aligned} x_j\Big ([\beta _j^{(1)}-\beta _j^{(2)}]+[\tilde{\phi }_j^{(1)}(x_j)-\tilde{\phi }_j^{(2)}(x_j)]\Big )\equiv -\sum \limits _{i\ne j}x_i^0\Big ([\beta _i^{(1)}-\beta _i^{(2)}]+[\tilde{\phi }_i^{(1)}(x_i^0)-\tilde{\phi }_i^{(2)}(x_i^0)]\Big ). \end{aligned}$$

Writing \(-\sum \nolimits _{i\ne j}x_i^0\Big ([\beta _i^{(1)}-\beta _i^{(2)}]+[\tilde{\phi }_i^{(1)}(x_i^0)-\tilde{\phi }_i^{(2)}(x_i^0)]\Big )\) as \(C_j\) and using the condition that \(E(\beta _j^{(l)} X_j+X_j\tilde{\phi }_j^{(l)}(X_j))=E(X_j\phi _j(X_j))\) for \(l=1,2\), we have

$$\begin{aligned} (\beta _j^{(1)}-\beta _j^{(2)})+[\tilde{\phi }_j^{(1)}(x)-\tilde{\phi }_j^{(2)}(x)]\equiv 0 \end{aligned}$$
(11)

for each \(j=1,\ldots , d_n\). Noting that \(\tilde{\phi }_j^{(l)}(x)\in \tilde{{\mathcal {H}}}\), integrating both sides of (11) with respect to x from \(\alpha _1\) to \(\alpha _2\) gives

$$\begin{aligned} \beta _j^{(1)}=\beta _j^{(2)}. \end{aligned}$$

Combined with (11), this yields

$$\begin{aligned} \tilde{\phi }_j^{(1)}(x)\equiv \tilde{\phi }_j^{(2)}(x). \end{aligned}$$

\(\square \)
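As a quick numerical illustration of the decomposition just proved (ours, not part of the paper), the sketch below takes a hypothetical coefficient function \(\phi _j\) on \([\alpha _1,\alpha _2]=[0,1]\), extracts the linear effect \(\beta _j=\int _0^1\phi _j(x)dx\), and checks that the centered remainder integrates to zero:

```python
import numpy as np

# Hypothetical varying coefficient on [alpha_1, alpha_2] = [0, 1].
def phi(x):
    return 1.5 + np.sin(2 * np.pi * x)

x = np.linspace(0.0, 1.0, 10001)
beta = np.trapz(phi(x), x)        # beta_j = int phi_j(x) dx: the linear effect
phi_tilde = phi(x) - beta         # centered remainder, an element of H-tilde

print(round(beta, 6))                      # approx 1.5
print(round(np.trapz(phi_tilde, x), 10))   # approx 0
```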

Let \({\mathbb {P}}_n\) be the empirical measure of \(\{(Y_i,\delta _i, \varvec{X}_i):i=1,2,\ldots ,n\}\), and \({\mathbb {P}}\) be the probability measure of \((Y,\delta ,\varvec{X})\). Define \(g_{nj}^*(X_j)=g_{j}^*(\phi _{nj},X_j)\) and \(g_{0j}^*(X_j)=g_{j}^*(\phi _{0j},X_j)\) for \(\phi _{nj}\in {\Omega _n}\). Then denote \(g_n(\varvec{X})=\sum _{j=1}^{d_n} X_j\phi _{nj}(X_j)\), \(g_n^*(\varvec{X})=\sum _{j=1}^{d_n} g_{nj}^*(X_j)\) and \(g_0^*(\varvec{X})=\sum _{j=1}^{d_n} g_{0j}^*(X_j)\). Define

$$\begin{aligned} \varvec{{C}_{\xi }}=\left( \begin{array}{lllll} -\frac{m}{u_{m+1}-u_1} &{}\quad \frac{m}{u_{m+1}-u_1} &{}\quad 0 &{}\quad \cdots &{}\quad 0 \\ 0 &{}\quad -\frac{m}{u_{m+2}-u_2} &{}\quad \frac{m}{u_{m+2}-u_2} &{}\quad \cdots &{}\quad 0 \\ \vdots &{}\quad \vdots &{}\quad \ddots &{}\quad \ddots &{}\quad \vdots \\ 0 &{}\quad 0 &{}\quad \cdots &{}\quad -\frac{m}{u_{q_n+m-1}-u_{q_n-1}} &{}\quad \frac{m}{u_{q_n+m-1}-u_{q_n-1}} \end{array} \right) _{(q_n-1)\times q_n}, \end{aligned}$$

for \(u_0=\cdots =u_m=\xi _0,\ u_{m+1}=\xi _1,\ldots , u_{q_n-1}=\xi _{K_n-1},\ u_{q_n}=\cdots =u_{q_n+m}=\xi _{K_n}\). Let \(\xrightarrow {P}\) and \(\xrightarrow {d}\) denote convergence in probability and in distribution, respectively, as \(n\rightarrow \infty \), unless otherwise stated. In analogy with Lemma A5 of Huang (1999), we first establish the following lemma.
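For concreteness, here is a small sketch (ours; the cubic-spline configuration is illustrative) of assembling \(\varvec{{C}_{\xi }}\) from the extended knot sequence \(u_0,\ldots ,u_{q_n+m}\):

```python
import numpy as np

def build_C_xi(u, m, q_n):
    """(q_n - 1) x q_n matrix from the display above: row j (1-based)
    carries -m/(u_{m+j} - u_j) and +m/(u_{m+j} - u_j) in columns j, j+1."""
    C = np.zeros((q_n - 1, q_n))
    for i in range(q_n - 1):                 # i = j - 1 (0-based row index)
        h = m / (u[m + i + 1] - u[i + 1])
        C[i, i], C[i, i + 1] = -h, h
    return C

# Illustrative setup: cubic splines (m = 3) with K_n = 5 knot intervals.
m, K = 3, 5
xi = np.linspace(0.0, 1.0, K + 1)
u = np.concatenate([np.repeat(xi[0], m + 1), xi[1:-1], np.repeat(xi[-1], m + 1)])
q_n = len(u) - m - 1                         # = K + m basis functions
C_xi = build_C_xi(u, m, q_n)
```

Because the rows of \(\varvec{{C}_{\xi }}\) are the B-spline differencing coefficients that appear in the derivative of a spline, \(\varvec{{C}_{\xi }}\varvec{\theta }_{nj}=0\) forces the fitted \(\phi _{nj}\) to be constant, which is how the penalty on \(\Vert \varvec{{C}_{\xi }}\varvec{\theta }_{nj}\Vert \) identifies purely linear effects.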

Lemma 1

Assume that Conditions (C1)–(C4) hold for any \(1\le j\le d_n\). Then there exists a function \(\phi _{nj} \in {\Omega _n}\) such that

$$\begin{aligned} \Vert g_n^*-g_0^*\Vert _\infty =\Vert g_n-g_0\Vert _\infty =O_p(d_n(n^{-\nu p}+n^{-(1-\nu )/2})) \end{aligned}$$

with \({\mathbb {P}}_n\delta g_{nj}=0\).

Proof

According to Corollary 6.21 of Schumaker (1981), for any \(1\le j\le d_n\), there exists \(\phi _{nj}\in \Omega _n\) such that \(\Vert \phi _{nj}-\phi _{0j}\Vert _\infty =O(n^{-\nu p})\). We define \(\widetilde{g}_{nj}(X_j)=X_j\phi _{nj}(X_j)\) and

$$\begin{aligned} g_{nj}=\widetilde{g}_{nj}-n_{\delta }^{-1}{\mathbb {P}}_n\delta \widetilde{g}_{nj}, \end{aligned}$$

where \(n_{\delta }=\sum _{i=1}^n\delta _i/n\). Then it is easy to see that \({\mathbb {P}}_n\delta g_{nj}=0\) for any \(1\le j\le d_n\). Furthermore, we note that

$$\begin{aligned} \Vert g_{nj}-g_{0j}\Vert _{\infty }\le \Vert g_{nj}-\widetilde{g}_{nj}\Vert _\infty +\Vert \widetilde{g}_{nj}-g_{0j}\Vert _\infty \triangleq I_{1n}+I_{2n}, \end{aligned}$$
(12)

where

$$\begin{aligned} I_{1n}=\Vert g_{nj}-\widetilde{g}_{nj}\Vert _\infty \le c\Vert {\mathbb {P}}_n\delta \widetilde{g}_{nj}\Vert _{\infty }\le c(\Vert ({\mathbb {P}}_n-{\mathbb {P}})\delta \widetilde{g}_{nj}\Vert _\infty +\Vert {\mathbb {P}}(\delta \widetilde{g}_{nj}-\delta g_{0j})\Vert _\infty ), \end{aligned}$$

with c being a constant independent of n. By Lemma 3.4.2 in van der Vaart and Wellner (1996), we have \(({\mathbb {P}}_n-{\mathbb {P}})\delta \widetilde{g}_{nj}=O_p(n^{-(1-\nu )/2})\). Moreover, the definition of \(\phi _{nj}\) shows that \(\Vert {\mathbb {P}}(\delta \widetilde{g}_{nj}-\delta g_{0j})\Vert _\infty \le E(\delta )\Vert \widetilde{g}_{nj}-g_{0j}\Vert _\infty =O(n^{-\nu p})\). Hence we have

$$\begin{aligned} I_{1n}=O_p(n^{-\nu p}+n^{-(1-\nu )/2}). \end{aligned}$$
(13)

In addition,

$$\begin{aligned} I_{2n}=\Vert X_j\phi _{nj}-X_j\phi _{0j}\Vert _{\infty }=O_p(n^{-\nu p}). \end{aligned}$$
(14)

Plugging (13) and (14) into (12), we can get \(\Vert g_{nj}-g_{0j}\Vert _\infty =O_p(n^{-\nu p}+n^{-(1-\nu )/2})\). By using the property of Kaplan–Meier weights (Stute 1993) and Lemma 3.4.2 in van der Vaart and Wellner (1996), we have

$$\begin{aligned} \Vert g_{nj}^*-g_{0j}^*\Vert _\infty\le & {} c_1\Vert (X_j \phi _{nj}-X_j \phi _{0j})-(\overline{g}_{jw}(\phi _{nj},X_j)-\overline{g}_{jw}(\phi _{0j},X_j))\Vert _\infty \\\le & {} c_1\Vert \widetilde{g}_{nj}-g_{0j}\Vert _\infty +c_2\Big \Vert \sum _{i=1}^n \omega _i X_{(i)j}{\phi }_{nj}(X_{(i)j})-\delta \widetilde{g}_{nj}\Big \Vert _\infty \\&\quad +c_3\Big \Vert \sum _{i=1}^n \omega _i X_{(i)j}{\phi }_{0j}(X_{(i)j})-\delta g_{0j}\Big \Vert _\infty +c_4\Vert \delta \widetilde{g}_{nj}-\delta g_{0j}\Vert _\infty \\= & {} O_p(n^{-\nu p})+O_p(n^{-(1-\nu )/2})+O_p(n^{-1/2})+O_p(n^{-\nu p})\\= & {} O_p(n^{-\nu p}+n^{-(1-\nu )/2}), \end{aligned}$$

where \(c_i\), \(i=1,\ldots , 4\), are finite constants. Thus, we have

$$\begin{aligned} \Vert g_n^*-g_0^*\Vert _\infty =\Vert g_n-g_0\Vert _\infty =O_p(d_n(n^{-\nu p}+n^{-(1-\nu )/2})). \end{aligned}$$

\(\square \)
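The Kaplan–Meier weights \(\omega _i\) appearing in the proof above are Stute's (1993) weights attached to the order statistics of Y. A minimal sketch of their computation (the function name is ours, and ties are ignored for simplicity):

```python
import numpy as np

def stute_weights(y, delta):
    """Stute (1993) Kaplan-Meier weights for the order statistics of y.
    delta[i] = 1 if observation i is uncensored, 0 otherwise."""
    n = len(y)
    order = np.argsort(y)
    d = np.asarray(delta, dtype=float)[order]
    w = np.zeros(n)
    prod = 1.0                  # running product over j < i of ((n-j)/(n-j+1))^d_j
    for i in range(n):          # 0-based index; the rank is i + 1
        w[i] = d[i] / (n - i) * prod
        prod *= ((n - i - 1) / (n - i)) ** d[i]
    return w, order             # the weights sum to at most 1
```

Censored observations receive zero weight, and their mass is pushed onto larger uncensored failure times; this is what makes the weighted least-squares criterion behind \(Y^*\) and \(g^*\) consistent under random censoring.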

Define \(\widehat{g}_{nj}^*(X_j)=g_j^*(\widehat{\phi }_{nj},X_j)\) and \(\widehat{g}_n^*(\varvec{X})=\sum _{j=1}^{d_n}\widehat{g}_{nj}^*(X_j)\), then we have the following lemma.

Lemma 2

Assume that Conditions (C1)–(C7) hold. If \(0.25/p<\nu <0.5\), then \(\Vert \widehat{g}_n^*-g_n^*\Vert ^2=o_p(d_n^2 q_n^{-1})\) and \(\displaystyle \left\| \frac{1}{d_n}(\widehat{g}_n^*-g_n^*)\right\| _{\infty }=o_p(1)\).

Proof

Let \(\eta _{nj}\in \Omega _n\) be such that \(\eta _{nj}(x)=\varvec{\theta }_{nj}^{*T}\varvec{\psi }_{q_n,m}(x)\) and \(\Vert \eta _{nj}(x)\Vert ^2=O(q_n^{-1})\). Denote \(h_n^*(\varvec{X})=\sum _{j=1}^{d_n} g_j^*(\eta _{nj},X_j)\); then \(\displaystyle \Vert \frac{1}{d_n}h_n^*(\varvec{X})\Vert ^2=O_p(q_n^{-1})\). Define \(H_n(\alpha )=Q_n(\varvec{\theta }_n+\alpha \varvec{\theta }_n^{*})\). To prove this lemma, it suffices to show that for any \(\alpha _0>0\), \(H'_n(\alpha _0)>0\) and \(H'_n(-\alpha _0)<0\) with probability tending to one.

Note that

$$\begin{aligned} H_n(\alpha _0)= & {} \frac{1}{2n}\Vert Y^*-(g_n^*+\alpha _0 h_n^*)(\varvec{X})\Vert ^2 +\sum _{j=1}^{d_n}P_1(\Vert \varvec{{C}_{\xi }}(\varvec{\theta }_{nj}+\alpha _0 \varvec{\theta }_{nj}^{*})\Vert ;\lambda _{1})\\&+\sum _{j=1}^{d_n}P_2(\Vert \varvec{\theta }_{nj} +\alpha _0 \varvec{\theta }_{nj}^{*}\Vert ;\lambda _{2}). \end{aligned}$$

Then

$$\begin{aligned} H'_n(\alpha _0)= & {} -{\mathbb {P}}_n\Big [h_n^*\big (Y^{*}-g_n^*-\alpha _0 h_n^*\big )\Big ] \\&+\sum _{j=1}^{d_n}P'_1(\Vert \varvec{{C}_{\xi }}(\varvec{\theta }_{nj}+\alpha _0 \varvec{\theta }_{nj}^{*})\Vert ;\lambda _{1}) \frac{(\varvec{{C}_{\xi }}\varvec{\theta }_{nj}^{*})^T \varvec{{C}_{\xi }}(\varvec{\theta }_{nj}+\alpha _0 \varvec{\theta }_{nj}^{*})}{\Vert \varvec{{C}_{\xi }}(\varvec{\theta }_{nj}+\alpha _0 \varvec{\theta }_{nj}^{*})\Vert } \\&+\sum _{j=1}^{d_n}P'_2(\Vert \varvec{\theta }_{nj}+\alpha _0 \varvec{\theta }_{nj}^{*}\Vert ;\lambda _{2})\frac{\varvec{\theta }_{nj}^{*T}(\varvec{\theta }_{nj}+\alpha _0 \varvec{\theta }_{nj}^{*})}{\Vert \varvec{\theta }_{nj}+\alpha _0 \varvec{\theta }_{nj}^{*}\Vert } \\\triangleq & {} H_1+H_2+H_3. \end{aligned}$$

We consider the first part

$$\begin{aligned} H_1= & {} -{\mathbb {P}}_n\big [h_n^*(Y^*-g_n^*)\big ]+\alpha _0{\mathbb {P}}_n(h_n^*\cdot h_n^*) \\= & {} -{\mathbb {P}}_n\big [h_n^*(Y^*-g_n^*)\big ]+\alpha _0{\mathbb {P}}\Vert h_n^*(\varvec{X})\Vert ^2+\alpha _0({\mathbb {P}}_n-{\mathbb {P}})(h_n^*\cdot h_n^*) \\= & {} -{\mathbb {P}}_n\big [h_n^*(Y^*-g_n^*)\big ]+c_0 \alpha _0 d_n^2 n^{-\nu }+O_p(n^{-1/2}d_n^2), \end{aligned}$$

where \(c_0>0\) is a constant and the first term

$$\begin{aligned} {\mathbb {P}}_n\big [h_n^*(Y^*-g_n^*)\big ]= & {} ({\mathbb {P}}_n-{\mathbb {P}})\big [h_n^*(Y^*-g_n^*)\big ]+{\mathbb {P}}\big [h_n^*(Y^*-g_n^*)\big ] \\\triangleq & {} J_{1n}+J_{2n}. \end{aligned}$$

In \(J_{1n}\), \(\Vert Y^*-g_n^*\Vert _\infty =\Vert Y^*-g_0^*+g_0^*-g_n^*\Vert _\infty \le O_p(1)+O_p(d_n (n^{-\nu p}+n^{-(1-\nu )/2}))\). Since \(d_n^4/n\rightarrow 0\) and \(0.25/p<\nu <0.5\), we have \(\Vert \displaystyle \frac{1}{d_n^2}h_n^*(Y^*-g_n^*)\Vert _\infty \le M_0\) for some constant \(M_0\). Let

$$\begin{aligned} \mu _0(\eta )=\left\{ \frac{1}{d_n^2}h_n^*\cdot (Y^*-g_n^*):\ \left\| \frac{1}{d_n}h_n^*\right\| \le \eta ,\left\| \frac{1}{d_n}(g_n^*-g_0^*)\right\| \le \eta \right\} . \end{aligned}$$

Then, arguing as in Lemma A2 and Corollary A1 of Huang (1999), we have

$$\begin{aligned} \log N_\square (\varepsilon ,\mu _0(\eta ),L_2({\mathbb {P}}))\le c_0q_n\log (\eta /\varepsilon ), \end{aligned}$$

for any \(\varepsilon <\eta \) with a constant \(c_0\) and

$$\begin{aligned} J_\square (\eta ,\mu _0,L_2({\mathbb {P}}))\le c_0q_n^{1/2}\eta . \end{aligned}$$

Here we can take \(\eta =q_n^{-1/2}\). Combining the results of Lemma 3.4.2 in van der Vaart and Wellner (1996) and Lemma A1 in Huang (1999), we get

$$\begin{aligned} J_{1n}=O_p(1)\cdot d_n^2\cdot n^{-1/2}\Big (q_n^{1/2}\eta +\frac{q_n}{\sqrt{n}}M_0\Big )=O_p\big (n^{-1/2}d_n^2\big ). \end{aligned}$$

We then consider \(J_{2n}\) as

$$\begin{aligned} J_{2n}={\mathbb {P}}\big [h_n^*(g_n^*-g_0^*)\big ]=d_n^2\cdot {\mathbb {P}}\Big [\frac{h_n^*}{d_n}\cdot \frac{g_0^*-g_n^*}{d_n}\Big ], \end{aligned}$$

which gives that

$$\begin{aligned} |J_{2n}|\le O_p(1)\cdot d_n^2\cdot \big \Vert \frac{h_n^*}{d_n}\big \Vert \cdot \big \Vert \frac{g_0^*-g_n^*}{d_n}\big \Vert =O_p\big (d_n^2(n^{-(1/2+p)\nu }+n^{-1/2})\big ). \end{aligned}$$

Therefore,

$$\begin{aligned} H_1\ge c_0\alpha _0 d_n^2 n^{-\nu }+O_p(d_n^2 n^{-1/2})+O_p(d_n^2(n^{-(1/2+p)\nu }+n^{-1/2})). \end{aligned}$$

Next we focus on \(H_2\) and \(H_3\). Let \(\varvec{B}_j(X_j)=(\varvec{\psi }_{q_n,m}(X_{1j}),\ldots ,\varvec{\psi }_{q_n,m}(X_{nj}))^T\). By Lemma 3 of Huang and Ma (2010), it follows that there are constants \(0<c_3<c_4<\infty \) such that

$$\begin{aligned} c_3 q_n^{-1}\le \Lambda _{\min }\Big (\frac{\varvec{B}_j(X_j)^T \varvec{B}_j(X_j)}{n}\Big )\le \Lambda _{\max }\Big (\frac{\varvec{B}_j(X_j)^T \varvec{B}_j(X_j)}{n}\Big )\le c_4 q_n^{-1} \end{aligned}$$

with probability tending to one. Then we have \(\Vert \varvec{\theta }_{nj}^{*}\Vert =O_p(1)\) and \(\Vert \varvec{{C}_{\xi }}\varvec{\theta }_{nj}^{*}\Vert =O_p(1)\), using the fact that \(\Vert \varvec{\theta }_{nj}^{*T} \varvec{\psi }_{q_n,m}(X_j)\Vert =O(q_n^{-1/2})\). Observing that

$$\begin{aligned} |H_2|= & {} \left| \sum _{j=1}^{d_n}P'_1(\Vert \varvec{{C}_{\xi }}(\varvec{\theta }_{nj}+\alpha _0 \varvec{\theta }_{nj}^{*})\Vert ;\lambda _{1}) \frac{(\varvec{{C}_{\xi }}\varvec{\theta }_{nj}^{*})^T \varvec{{C}_{\xi }}(\varvec{\theta }_{nj}+\alpha _0 \varvec{\theta }_{nj}^{*})}{\Vert \varvec{{C}_{\xi }}(\varvec{\theta }_{nj}+\alpha _0 \varvec{\theta }_{nj}^{*})\Vert }\right| \\\le & {} \sum _{j=1}^{d_n}P'_1(\Vert \varvec{{C}_{\xi }}(\varvec{\theta }_{nj}+\alpha _0 \varvec{\theta }_{nj}^{*})\Vert ;\lambda _{1})\frac{\big |(\varvec{{C}_{\xi }}\varvec{\theta }_{nj}^{*})^T \varvec{{C}_{\xi }}(\varvec{\theta }_{nj}+\alpha _0 \varvec{\theta }_{nj}^{*})\big |}{\Vert \varvec{{C}_{\xi }}(\varvec{\theta }_{nj}+\alpha _0 \varvec{\theta }_{nj}^{*})\Vert }, \end{aligned}$$

using Condition (C9) and \(\lambda _{1}=o(d_n n^{-\nu })\), we have

$$\begin{aligned} |H_2| \le P'_1(0+;\lambda _{1})\sum _{j=1}^{d_n}\Vert \varvec{{C}_{\xi }}\varvec{\theta }_{nj}^{*}\Vert \le O(\lambda _{1})O_p(d_n)=o_p(d_n^2 n^{-\nu }). \end{aligned}$$

The same arguments as above give that \(|H_3|\le o_p(d_n^2 n^{-\nu })\) if \(\lambda _{2}=o(d_n n^{-\nu })\).

Consequently, \(H'_n(\alpha _0)\ge c_0\alpha _0 d_n^2 n^{-\nu }+o_p(d_n^2 n^{-\nu })>0\) with probability tending to one. Similarly, we can prove that \(H'_n(-\alpha _0)<0\) with probability tending to one. Therefore, the boundedness of the covariates \(\varvec{X}\) in Condition (C2) ensures that

$$\begin{aligned} \Vert \widehat{g}_n^*-g_n^*\Vert ^2=o_p(d_n^2 q_n^{-1})=o_p(d_n^2 n^{-\nu }). \end{aligned}$$

Subsequently, Lemma 7 of Stone (1986) yields that \(\displaystyle \Vert \frac{1}{d_n}(\widehat{g}_n^*-g_n^*)\Vert _{\infty }=o_p(1)\). \(\square \)
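The eigenvalue bounds borrowed from Lemma 3 of Huang and Ma (2010) in the proof above are easy to check numerically. A rough sketch, assuming SciPy ≥ 1.8 for BSpline.design_matrix, with all constants illustrative:

```python
import numpy as np
from scipy.interpolate import BSpline

rng = np.random.default_rng(0)
m, K, n = 3, 8, 2000                         # cubic splines, 8 intervals, n points
xi = np.linspace(0.0, 1.0, K + 1)
u = np.concatenate([np.repeat(xi[0], m + 1), xi[1:-1], np.repeat(xi[-1], m + 1)])
q_n = len(u) - m - 1

x = np.sort(rng.uniform(0.0, 1.0, n))
B = BSpline.design_matrix(x, u, m).toarray() # n x q_n B-spline design matrix
eig = np.linalg.eigvalsh(B.T @ B / n)
print(q_n * eig.min(), q_n * eig.max())      # both of order one, as the lemma asserts
```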

To verify the consistency of parameter estimation, we need the following lemma.

Lemma 3

Define \(m_0(x,y^*;g^*)=(y^*-g^*(x))^2/d_n^2\). Denote \(M_0={\mathbb {P}} m_0\) and \(\displaystyle M_n={\mathbb {P}}_n m_0=\frac{1}{n}\Vert Y^*-g^*(\varvec{X})\Vert ^2/d_n^2\). Under the conditions of Lemma 1, for any function \(g(\cdot )\) satisfying \(E[\delta g(\varvec{X})]=0\), there exists a constant \(c>0\) such that

$$\begin{aligned} {\mathbb {P}} m_0(\cdot ;g^*)-{\mathbb {P}} m_0(\cdot ;g_n^*)=c\left\| \frac{1}{d_n}(g^*-g_n^*)\right\| ^2+O_p(n^{-2\nu p}+n^{-(1-\nu )}). \end{aligned}$$

Proof

Let \(h^*=g^*-g_0^*\) and

$$\begin{aligned} L(s)= & {} {\mathbb {P}} m_0(\cdot ;g_0^*+sh^*)-{\mathbb {P}} m_0(\cdot ;g_0^*) \\= & {} \frac{1}{d_n^2}\big [{\mathbb {P}}(Y^*-(g_0^*+sh^*))^2-{\mathbb {P}}(Y^*-g_0^*)^2\big ] \\= & {} \frac{1}{d_n^2}{\mathbb {P}}(-2sY^*h^*+2sg_0^* h^*+s^2 h^{*2}). \end{aligned}$$

Since \(L'(0)=0\) and \(L''(0)=2{\mathbb {P}}(h^{*2})/d_n^2\), there exists a constant \(c>0\) such that \(\displaystyle {\mathbb {P}} m_0(\cdot ;g^*)-{\mathbb {P}} m_0(\cdot ;g_0^*)=c\left\| \frac{1}{d_n}(g^*-g_0^*)\right\| ^2\). Similarly, we have

$$\begin{aligned} {\mathbb {P}} m_0(\cdot ;g_n^*)-{\mathbb {P}} m_0(\cdot ;g_0^*)=O_p(1)\left\| \frac{1}{d_n}(g_n^*-g_0^*)\right\| ^2. \end{aligned}$$

By Lemma 1, \({\mathbb {P}} m_0(\cdot ;g_n^*)-{\mathbb {P}} m_0(\cdot ;g_0^*)=O_p(n^{-2\nu p}+n^{-(1-\nu )})\). Combining the following equality

$$\begin{aligned} {\mathbb {P}} m_0(\cdot ;g^*)-{\mathbb {P}} m_0(\cdot ;g_n^*)=\Big ({\mathbb {P}} m_0(\cdot ;g^*)-{\mathbb {P}} m_0(\cdot ;g_0^*)\Big )+\Big ({\mathbb {P}} m_0(\cdot ;g_0^*)-{\mathbb {P}} m_0(\cdot ;g_n^*)\Big ) \end{aligned}$$

with the triangle inequality

$$\begin{aligned} \Vert g^*-g_n^*\Vert ^2-\Vert g_n^*-g_0^*\Vert ^2\le \Vert g^*-g_0^*\Vert ^2\le \Vert g^*-g_n^*\Vert ^2+\Vert g_n^*-g_0^*\Vert ^2, \end{aligned}$$

we have

$$\begin{aligned} {\mathbb {P}} m_0(\cdot ;g^*)-{\mathbb {P}} m_0(\cdot ;g_n^*)=c\left\| \frac{1}{d_n}(g^*-g_n^*)\right\| ^2+O_p(n^{-2\nu p}+n^{-(1-\nu )}), \end{aligned}$$

where \(c>0\) is a finite constant. \(\square \)

Proof of Theorem 1

Let

$$\begin{aligned} V= & {} M_n(g^*)-M_n(g_n^*)-(M_0(g^*)-M_0(g_n^*)) \\= & {} {\mathbb {P}}_n m_0(\cdot ;g^*)-{\mathbb {P}}_n m_0(\cdot ;g_n^*)-({\mathbb {P}} m_0(\cdot ;g^*)-{\mathbb {P}} m_0(\cdot ;g_n^*)) \\= & {} ({\mathbb {P}}_n-{\mathbb {P}})(m_0(\cdot ;g^*)-m_0(\cdot ;g_n^*)). \end{aligned}$$

By Lemma 3.4.2 of van der Vaart and Wellner (1996),

$$\begin{aligned} E\sup \limits _{\left\| \frac{1}{d_n}(g^*-g_n^*)\right\| \le \eta }|V|=O\big (n^{-1/2}\eta q_n^{1/2}\big ). \end{aligned}$$

Then, by Theorem 3.4.1 of van der Vaart and Wellner (1996), with the distance there taken to be \(d(\widehat{g}_n^*,g_n^*)=-[{\mathbb {P}} m_0(\cdot ;\widehat{g}_n^*)-{\mathbb {P}} m_0(\cdot ;g_n^*)]\), we have

$$\begin{aligned} -r_{1n}^2[{\mathbb {P}} m_0(\cdot ;\widehat{g}_n^*)-{\mathbb {P}} m_0(\cdot ;g_n^*)]=O_p(1), \end{aligned}$$

where \(r_{1n}=O(n^{1/2}q_n^{-1/2})=O(n^{(1-\nu )/2})\). Therefore, \({\mathbb {P}} m_0(\cdot ;\widehat{g}_n^*)-{\mathbb {P}} m_0(\cdot ;g_n^*)=O_p(n^{-(1-\nu )})\). Thus Lemma 3 gives that \(\Vert \frac{1}{d_n}(\widehat{g}_n^*-g_n^*)\Vert ^2=O_p(n^{-2\nu p}+n^{-(1-\nu )})\). Combining this with the result in Lemma 1 that \(\Vert g_n^*-g_0^*\Vert _\infty ^2=O_p(d_n^2(n^{-2\nu p}+n^{-(1-\nu )}))\), we have

$$\begin{aligned} \Vert \widehat{g}_n^*-g_0^*\Vert ^2=O_p(d_n^2(n^{-2\nu p}+n^{-(1-\nu )})). \end{aligned}$$

By Conditions (C2)–(C4), it follows that

$$\begin{aligned} E\delta \big \Vert \varvec{\varvec{X}}_{M_1} (\widehat{\varvec{\phi }}_n(\varvec{\varvec{X}}_{M_1})-\tilde{\varvec{\phi }}_0(\varvec{\varvec{X}}_{M_1}))+\varvec{\varvec{X}}_{M_2} (\widehat{\varvec{\beta }}_n-\varvec{\beta }_0)\big \Vert ^2=O(d_n^2(n^{-2\nu p}+n^{-(1-\nu )})). \end{aligned}$$

Denoting the projection of \(\varvec{\varvec{X}}_{M_2}\) on \(\varvec{\varvec{X}}_{M_1}\) by W, we have

$$\begin{aligned}&E\delta \big \Vert (\varvec{\varvec{X}}_{M_2}-W)(\widehat{\varvec{\beta }}_n-\varvec{\beta }_0)+W(\widehat{\varvec{\beta }}_n-\varvec{\beta }_0)+\varvec{\varvec{X}}_{M_1} (\widehat{\varvec{\phi }}_n-\tilde{\varvec{\phi }}_0)\big \Vert ^2 \\&\quad =E\delta \big \Vert (\varvec{\varvec{X}}_{M_2}-W)(\widehat{\varvec{\beta }}_n-\varvec{\beta }_0)\big \Vert ^2+E\delta \big \Vert W(\widehat{\varvec{\beta }}_n-\varvec{\beta }_0)+\varvec{\varvec{X}}_{M_1} (\widehat{\varvec{\phi }}_n-\tilde{\varvec{\phi }}_0)\big \Vert ^2 \\&\quad =O(d_n^2(n^{-2\nu p}+n^{-(1-\nu )})). \end{aligned}$$

By Condition (C6), we obtain

$$\begin{aligned} \Vert \widehat{\varvec{\beta }}_n-\varvec{\beta }_0\Vert ^2=O_p(d_n^2 (n^{-(1-\nu )}+n^{-2\nu p})). \end{aligned}$$

This in turn implies \(E\delta \big \Vert \varvec{\varvec{X}}_{M_1} (\widehat{\varvec{\phi }}_n-\tilde{\varvec{\phi }}_0)\big \Vert ^2=O_p(d_n^2 (n^{-(1-\nu )}+n^{-2\nu p}))\). Therefore,

$$\begin{aligned} \Vert \widehat{\varvec{\phi }}_n-\tilde{\varvec{\phi }}_0\Vert ^2=O_p(d_n^2 (n^{-(1-\nu )}+n^{-2\nu p})). \end{aligned}$$

This completes the proof of Theorem 1. \(\square \)

Proof of Theorem 2

  1. (i)

    First, we prove the selection consistency of the variables. Let \(\widetilde{\varvec{\theta }}_n=(\widetilde{\varvec{\theta }}_{n1}^T,\ldots ,\widetilde{\varvec{\theta }}_{nd_n}^T)^T\) with

    $$\begin{aligned} \widetilde{\varvec{\theta }}_{nj}= \left\{ \begin{array}{ll} \widehat{\varvec{\theta }}_{nj},&{}\text{ if } j\notin M_3\text{, }\\ 0,&{}\text{ if } j\in M_3\text{. } \end{array}\right. \end{aligned}$$

    Note that \(\widehat{\varvec{\theta }}_n\) satisfies \(\displaystyle \frac{\partial Q_n(\widehat{\varvec{\theta }}_n)}{\partial \varvec{\theta }}=\varvec{0}\). By the definition of \(\widehat{\varvec{\theta }}_n\) and \(\widetilde{\varvec{\theta }}_n\), we have

    $$\begin{aligned}&Q_n(\widehat{\varvec{\theta }}_n)-Q_n(\widetilde{\varvec{\theta }}_n) \\= & {} \frac{\partial Q_n(\widehat{\varvec{\theta }}_{n})^T}{\partial \varvec{\theta }}(\widehat{\varvec{\theta }}_{n}-\widetilde{\varvec{\theta }}_n)-\frac{1}{2}(\widehat{\varvec{\theta }}_n-\widetilde{\varvec{\theta }}_n)^T\frac{\partial ^2 Q_n(\varvec{\theta }_n^*)}{\partial \varvec{\theta }\partial \varvec{\theta }^T}(\widehat{\varvec{\theta }}_n-\widetilde{\varvec{\theta }}_n)\\= & {} -\frac{1}{2}(\widehat{\varvec{\theta }}_n-\widetilde{\varvec{\theta }}_n)^T\frac{\partial ^2 Q_n(\varvec{\theta }_n^*)}{\partial \varvec{\theta }\partial \varvec{\theta }^T}(\widehat{\varvec{\theta }}_n-\widetilde{\varvec{\theta }}_n)\\= & {} -\frac{1}{2}\widehat{\varvec{\theta }}_{nM_3}^T \frac{\partial ^2 \ell _n(\varvec{\theta }_n^*)}{\partial \varvec{\theta }_{M_3}\partial \varvec{\theta }_{M_3}^T}\widehat{\varvec{\theta }}_{nM_3}-\frac{1}{2}\sum \limits _{j\in M_3}(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj})^T\big (P''_1(\Vert \varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{1})+o_p(1)\big )(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj}) \\&-\frac{1}{2}\sum \limits _{j\in M_3}\widehat{\varvec{\theta }}_{nj}^T\big (P''_2(\Vert \widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{2})+o_p(1)\big )\widehat{\varvec{\theta }}_{nj}, \end{aligned}$$

    where \(\varvec{\theta }_n^*\) is between \(\widehat{\varvec{\theta }}_n\) and \(\widetilde{\varvec{\theta }}_n\).

    Since \(\widehat{\varvec{\theta }}_n\) is the minimizer of \(Q_n(\varvec{\theta })\), we have \(Q_n(\widehat{\varvec{\theta }}_n)\le Q_n(\widetilde{\varvec{\theta }}_n)\), which implies that

    $$\begin{aligned} \frac{1}{2}\widehat{\varvec{\theta }}_{nM_3}^T \frac{\partial ^2 \ell _n(\varvec{\theta }_n^*)}{\partial \varvec{\theta }_{M_3}\partial \varvec{\theta }_{M_3}^T}\widehat{\varvec{\theta }}_{nM_3}\ge & {} -\frac{1}{2}\sum \limits _{j\in M_3}(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj})^T\big (P''_1(\Vert \varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{1})+o_p(1)\big )(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj})\nonumber \\&-\frac{1}{2}\sum \limits _{j\in M_3}\widehat{\varvec{\theta }}_{nj}^T\big (P''_2(\Vert \widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{2})+o_p(1)\big )\widehat{\varvec{\theta }}_{nj}. \end{aligned}$$
    (15)

    Note that the left-hand side of (15) satisfies

    $$\begin{aligned} I_1\le c \widehat{\varvec{\theta }}_{nM_3}^T E(X_{M_3}^TX_{M_3})\widehat{\varvec{\theta }}_{nM_3}\le c\rho _n^*\Vert \widehat{\varvec{\theta }}_{nM_3}\Vert ^2 \end{aligned}$$

    for some constant c, by the continuity of the B-spline functions and the definition of \(\rho _n^*\). Using Condition (C9), there exist constants a, b and c such that the right-hand side of (15) satisfies

    $$\begin{aligned} I_2\ge c (\lambda _{1}^a+\lambda _{2}^a) \Vert \widehat{\varvec{\theta }}_{nM_3}\Vert ^{2-b}. \end{aligned}$$

    Thus, by the results of Theorem 1, we obtain that

    $$\begin{aligned} O_p(1)(d_n^2(n^{-(1-\nu )}+n^{-2\nu p}))^{b/2}\ge \Vert \widehat{\varvec{\theta }}_{nM_3}\Vert ^{b}\ge O_p(1)\frac{\lambda _{1}^a+\lambda _{2}^a}{\rho _n^*}. \end{aligned}$$

    This shows that under the condition that \(\displaystyle \frac{\lambda _{1}^a}{\rho _n^*(d_n^2(n^{-(1-\nu )}+n^{-2\nu p}))^{b/2}}\) and \(\displaystyle \frac{\lambda _{2}^a}{\rho _n^*(d_n^2(n^{-(1-\nu )}+n^{-2\nu p}))^{b/2}} \) both go to infinity,

    $$\begin{aligned} P(\Vert \widehat{\varvec{\theta }}_{nM_3}\Vert >0)\le P\Big (\frac{\lambda _{1}^a+\lambda _{2}^a}{\rho _n^*(d_n^2(n^{-(1-\nu )}+n^{-2\nu p}))^{b/2}}\le O_p(1)\Big )\rightarrow 0. \end{aligned}$$

    Next, we prove the structure selection consistency. Assume that \({\mathop {\varvec{\theta }}\limits _{\sim }}_{n}=({\mathop {\varvec{\theta }}\limits _{\sim }}_{n1}^{T},\ldots ,{\mathop {\varvec{\theta }}\limits _{\sim }}_{nd_{n}}^T)^T\) with

    $$\begin{aligned} {\mathop {\varvec{\theta }}\limits _{\sim }}_{nj}= \left\{ \begin{array}{ll} \widehat{\varvec{\theta }}_{nj},&{}\text{ if } j\notin M_2\text{, }\\ {\mathop {\varvec{\theta }}\limits _{\sim }}_{nj}\ \text{ with } \varvec{{C}_{\xi }}{\mathop {\varvec{\theta }}\limits _{\sim }}_{nj}=0,&{}\text{ if } j\in M_2\text{. } \end{array}\right. \end{aligned}$$

    Then we have

    $$\begin{aligned}&\frac{1}{2}\Big (\widehat{\varvec{\theta }}_{nM_2}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nM_2}\Big )^T \frac{\partial ^2 \ell _n(\varvec{\theta }_n^0)}{\partial \varvec{\theta }_{M_2}\partial \varvec{\theta }_{M_2}^T}\Big (\widehat{\varvec{\theta }}_{nM_2}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nM_2}\Big )\nonumber \\\ge & {} -\frac{1}{2}\sum \limits _{j\in M_2}(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj})^T\big (P''_1(\Vert \varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{1})+o_p(1)\big )(\varvec{{C}_{\xi }}\widehat{\varvec{\theta }}_{nj})\nonumber \\&-\frac{1}{2}\sum \limits _{j\in M_2}\Big (\widehat{\varvec{\theta }}_{nj}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nj}\Big )^T \big (P''_2(\Vert \widehat{\varvec{\theta }}_{nj}\Vert ;\lambda _{2})+o_p(1)\big ) \Big (\widehat{\varvec{\theta }}_{nj}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nj}\Big ), \end{aligned}$$
    (16)

    where \(\varvec{\theta }_n^0\) is between \(\widehat{\varvec{\theta }}_n\) and \({\mathop {\varvec{\theta }}\limits _{\sim }}_n\). The left-hand side of (16) satisfies

    $$\begin{aligned} II_1\le & {} O_p(1)\Big (\varvec{C}_\xi (\widehat{\varvec{\theta }}_{nM_2}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nM_2})\Big )^T \cdot \frac{\partial ^2 \ell _n(\varvec{\theta }_n^0)}{\partial \varvec{\theta }_{M_2}\partial \varvec{\theta }_{M_2}^T}\cdot \Big (\varvec{C}_\xi (\widehat{\varvec{\theta }}_{nM_2}-{\mathop {\varvec{\theta }}\limits _{\sim }}_{nM_2})\Big )\\= & {} O_p(1)(\varvec{C}_\xi \widehat{\varvec{\theta }}_{nM_2})^T \cdot \frac{\partial ^2 \ell _n(\varvec{\theta }_n^0)}{\partial \varvec{\theta }_{M_2}\partial \varvec{\theta }_{M_2}^T}\cdot (\varvec{C}_\xi \widehat{\varvec{\theta }}_{nM_2})\\\le & {} O_p(1)\rho _n^*\Vert \widehat{\varvec{\theta }}_{nM_2}\Vert ^2. \end{aligned}$$

    Similarly, the right-hand side of (16) satisfies

    $$\begin{aligned} II_2\ge c (\lambda _{1}^a+\lambda _{2}^a) \Vert \widehat{\varvec{\theta }}_{nM_2}\Vert ^{2-b}. \end{aligned}$$

    Therefore,

    $$\begin{aligned} P(\Vert \widehat{\varvec{\theta }}_{nM_2}\Vert >0)\le P\Big (\frac{\lambda _{1}^a+\lambda _{2}^a}{\rho _n^*(d_n^2(n^{-(1-\nu )}+n^{-2\nu p}))^{b/2}}\le O_p(1)\Big )\rightarrow 0. \end{aligned}$$

    This establishes the selection consistency for both the variables and the model structure.

  2. (ii)

    Let the column and row vectors of the covariate matrix \(\varvec{X}^*\) be \(X_1^*,\ldots , X_{d_n}^*\) and \(X_{(1)}^*,\ldots , X_{(n)}^*\), respectively. Define

    $$\begin{aligned} \overline{X}_w=\frac{\sum \nolimits _{i=1}^n \omega _i X_{(i)}}{\sum \nolimits _{i=1}^n \omega _i}, \quad X_{(i)}^*=(n\omega _i)^{1/2}(X_{(i)}-\overline{X}_w), \end{aligned}$$
    $$\begin{aligned} U(\varvec{W};\varvec{\beta },\widehat{\varvec{\phi }}_n)\triangleq (-\varvec{X}_{M_2}^{*})\Big (Y^*-\sum \limits _{j\in M_1}\hat{g}_{nj}^*(X_{j})-\varvec{X}_{M_2}^* \varvec{\beta }\Big ), \end{aligned}$$
    $$\begin{aligned} \widehat{U}_n(\varvec{\beta })\triangleq \frac{1}{n}\sum \limits _{i=1}^{n}U(\varvec{W}_i;\varvec{\beta },\widehat{\varvec{\phi }}_n), \end{aligned}$$

    with \(\varvec{W}\triangleq (\omega ,\varvec{X},Y)\). Then \(\widehat{\varvec{\beta }}_n\) satisfies the estimating equation \(\widehat{U}_n(\widehat{\varvec{\beta }}_n)=0\) by the definition of \(\widehat{\varvec{\beta }}_n\) and \(\widehat{\varvec{\phi }}_n\).

    Let \(\displaystyle U_n(\varvec{\beta })\triangleq \frac{1}{n}\sum \nolimits _{i=1}^{n}U(\varvec{W}_i;\varvec{\beta },\tilde{\varvec{\phi }}_0)\) and let \(\widetilde{\varvec{\beta }}_n\) be the root of \(U_n(\varvec{\beta })=0\). We then show that \(\widehat{\varvec{\beta }}_n\) has the same asymptotic distribution as \(\widetilde{\varvec{\beta }}_n\). The Fréchet derivative of \(U(\varvec{W};\varvec{\beta }_0,\varvec{\phi })\) at \(\tilde{\varvec{\phi }}_0\) in the direction \(\varvec{h}\) is given by

    $$\begin{aligned} D(\varvec{W},\varvec{h})= & {} \lim \limits _{\alpha \rightarrow 0}\frac{U(\varvec{W};\varvec{\beta }_0,\tilde{\varvec{\phi }}_0+\alpha \varvec{h})-U(\varvec{W};\varvec{\beta }_0,\tilde{\varvec{\phi }}_0)}{\alpha } \\= & {} \varvec{X}_{M_2}^{*T}\varvec{X}_{M_1}^{*} \varvec{h}, \end{aligned}$$

    with \(\varvec{h}\in \{h_1+\cdots +h_{|M_1|}: h_j\in \tilde{{\mathcal {H}}},\ j\in M_1\}\).

    The relation \(\Vert (\widehat{\varvec{\phi }}_n-\tilde{\varvec{\phi }}_0)/d_n\Vert =O_p(n^{-(1-\nu )/2}+n^{-\nu p})=o_p(n^{-1/4})\) ensures that the linearization Assumption 5.1 in Newey (1994) is satisfied. Then by Lemma 3.4.2 of van der Vaart and Wellner (1996), we have

    $$\begin{aligned} \sqrt{n}({\mathbb {P}}_n-{\mathbb {P}})\{D(\varvec{W};\widehat{\varvec{\phi }}_n-\tilde{\varvec{\phi }}_0)\}\xrightarrow {P}0. \end{aligned}$$

    It follows that the stochastic equicontinuity Assumption 5.2 holds. For \(\varvec{\phi }\) close enough to \(\tilde{\varvec{\phi }}_0\), a straightforward calculation using Condition (C8) yields \(ED(\varvec{W};\varvec{\phi }-\tilde{\varvec{\phi }}_0)=0\), so the mean-square continuity Assumption 5.3 holds with \(\alpha (\varvec{W})=0\). By Lemma 5.1 of Newey (1994), \(\widehat{\varvec{\beta }}_n\) and \(\widetilde{\varvec{\beta }}_n\) have the same asymptotic distribution.

    Next, we derive the asymptotic distribution of \(\widetilde{\varvec{\beta }}_{nM_2}\). Let \(\iota _n=n^{-1/2},\ V_{1n}(\varvec{a})=Q_n(\varvec{\beta }_0+\iota _n(\varvec{a}^T,\varvec{0}^T)^T,\tilde{\varvec{\phi }}_0)-Q_n(\varvec{\beta }_0,\tilde{\varvec{\phi }}_0)\), where \(\varvec{a}=(a_1,\ldots ,a_{|M_2|})^T\) is a \(|M_2|\)-dimensional constant vector and \(\varvec{0}\) is a \(|M_3|\)-dimensional zero vector. By part (i) of Theorem 2, \(\widetilde{\varvec{\beta }}_n-\varvec{\beta }_0=\iota _n(\widehat{\varvec{a}}_n^T,\varvec{0}^T)^T\) with probability converging to one, where \(\widehat{\varvec{a}}_n=\text {argmin}\{V_{1n}(\varvec{a}): \varvec{a}\in {\mathbb {R}}^{|M_2|}\}\). Let \(\widetilde{\varvec{\theta }}_n\) be the estimator corresponding to \(\widetilde{\varvec{\beta }}_n\); then, as in part (i), we also have \(\varvec{{C}_{\xi }}\widetilde{\varvec{\theta }}_{nj}=0\ (j\in M_2)\) with probability converging to one.

    Note that

    $$\begin{aligned} V_{1n}(\varvec{a})= & {} Q_n(\varvec{\beta }_0+\iota _n(\varvec{a}^T,\varvec{0}^T)^T,\tilde{\varvec{\phi }}_0)-Q_n(\varvec{\beta }_0,\tilde{\varvec{\phi }}_0) \\= & {} \Big (\iota _n \varvec{a}^T U_n(\varvec{\beta }_0)+ \frac{\iota _n^2}{2} \varvec{a}^T U_n'(\varvec{\beta }_0)\varvec{a}\Big ) \\&+\left( \sum _{j\in M_2}P_1(\Vert \varvec{{C}_{\xi }}\widetilde{\varvec{\theta }}_{nj}\Vert ;\lambda _{1})-\sum _{j\in M_2}P_1(\Vert \varvec{{C}_{\xi }}\varvec{\theta }_{0j}\Vert ;\lambda _{1})\right) \\&+\left( \sum _{j\in M_2}P_2(\Vert \widetilde{\varvec{\theta }}_{nj}\Vert ;\lambda _{2n}) -\sum _{j\in M_2}P_2(\Vert \varvec{\theta }_{0j}\Vert ;\lambda _{2n})\right) \\\triangleq & {} A_{1n}(\varvec{a})+A_{2n}(\varvec{a})+A_{3n}(\varvec{a}). \end{aligned}$$

    Since \(\widetilde{\varvec{\beta }}_{n M_2}-\varvec{\beta }_{0}=\iota _n\varvec{a}=\varvec{\psi }_{q_n,m}(\varvec{X}_{M_2})^T(\widetilde{\varvec{\theta }}_{n M_2}-\varvec{\theta }_{0 M_2})\), we have

    $$\begin{aligned} \widetilde{\varvec{\theta }}_{nj}-\varvec{\theta }_{0j}=(\varvec{\psi }_{q_n,m}(X_j)\varvec{\psi }_{q_n,m}(X_j)^T)^{-1}\varvec{\psi }_{q_n,m}(X_j)\iota _na_j,\ j=s_1+1,\ldots ,s_2. \end{aligned}$$

    It follows that

    $$\begin{aligned} A_{3n}(\varvec{a})= & {} \sum _{j\in M_2}P_2(\Vert \widetilde{\varvec{\theta }}_{nj}\Vert ;\lambda _{2n})-\sum _{j\in M_2}P_2(\Vert \varvec{\theta }_{0j}\Vert ;\lambda _{2n}) \\= & {} \sum _{j\in M_2}\left[ P_2'(\Vert \varvec{\theta }_{0j}\Vert ;\lambda _{2n})\frac{\varvec{\theta }_{0j}^T}{\Vert \varvec{\theta }_{0j}\Vert }+o_p(1)\right] \\&\big [(\varvec{\psi }_{q_n,m}(X_j)\varvec{\psi }_{q_n,m}(X_j)^T)^{-1}\varvec{\psi }_{q_n,m}(X_j)\iota _na_j\big ]. \end{aligned}$$

    By Condition (C7), we have

    $$\begin{aligned} |A_{3n}(\varvec{a})|\le & {} d_n P_2'(0+;\lambda _{2n}) \sqrt{\Vert (\varvec{\psi }_{q_n,m}(X_j)\varvec{\psi }_{q_n,m}(X_j)^T)^{-1}\Vert }\iota _n a_j\\= & {} O_p(d_n^2 n^{-\nu }) O_p(n^{-(1-\nu )/2})=o_p(1). \end{aligned}$$

    Similarly, we can show that \(A_{2n}(\varvec{a})\xrightarrow {p}0\).

    Hence, \(\widehat{\varvec{a}}_n=\text {argmin}\{V_{1n}(\varvec{a}): \varvec{a}\in {\mathbb {R}}^{|M_2|}\}=\text {argmin}\{A_{1n}(\varvec{a}): \varvec{a}\in {\mathbb {R}}^{|M_2|}\}\), so it suffices to study the minimizer of \(nA_{1n}(\varvec{a})\). Similarly to Huang et al. (2010), we have

    $$\begin{aligned} nA_{1n}(\varvec{a})= & {} \varvec{a}^T \big (\sqrt{n}U_n(\varvec{\beta }_0)\big )+ \frac{1}{2}\varvec{a}^T U_n'(\varvec{\beta }_0)\varvec{a}\\\triangleq & {} \varvec{a}^T T_1+ \varvec{a}^T T_2 \varvec{a}. \end{aligned}$$

    It can be seen that \(T_2 \xrightarrow {p} \Sigma _2\) and that \(\varvec{u}\Sigma _3^{-1/2} T_1\) is asymptotically distributed as N(0, 1) for any \(\varvec{u}\in {\mathbb {R}}^{|M_2|}\) with \(\Vert \varvec{u}\Vert =1\), where \(\Sigma _3=Var(\delta \gamma _0(Y)(Y-g_0(\varvec{X}))\varvec{X}_{M_2}+(1-\delta )\gamma _1(Y)-\gamma _2(Y))\), with the notation

    $$\begin{aligned}&\tilde{H}^{11}(\varvec{x},y)=P(\varvec{X}\le \varvec{x}, Y\le y, \delta =1), \quad \tilde{H}^0=P(Y\le y,\delta =0),\\&\gamma _0(y)=\exp \Bigg (\int _0^{y-}\frac{\tilde{H}^0(dw)}{1-H(w)}\Bigg ),\\&\gamma _{1,j}(y)=\frac{1}{1-H(y)}\int I(z>y)e(z,\varvec{x})x_j\gamma _0(z)\tilde{H}^{11}(d\varvec{x},dz),\\&\gamma _{2,j}(y)=\iint \frac{I(v<y,v<z)e(z,\varvec{x})x_j\gamma _0(z)}{[1-H(v)]^2}\tilde{H}^0(dv)\tilde{H}^{11}(d\varvec{x},dz),\\&\gamma _l(y)=(\gamma _{l,j}; j\in M_2), ~~ l=1,2. \end{aligned}$$

    Let \(\hat{\varvec{a}}=\text {argmin}\{V_1(\varvec{a})=\varvec{a}^T T_1+\frac{1}{2}\varvec{a}^T \Sigma _2 \varvec{a}: \varvec{a}\in {\mathbb {R}}^{|M_2|}\}\). According to the continuous mapping theorem of Kim and Pollard (1990), \(\sqrt{n} \varvec{u}\Sigma ^{-1/2}(\widehat{\varvec{\beta }}_{nM_2}-\varvec{\beta }_{0})\) has the same asymptotic distribution as \(\varvec{u}\Sigma ^{-1/2}\hat{\varvec{a}}\xrightarrow {d} N(0,1)\) for any \(\varvec{u}\in {\mathbb {R}}^{|M_2|}\) with \(\Vert \varvec{u}\Vert =1\), where \(\Sigma =\Sigma _2^{-1}\Sigma _3\Sigma _2^{-1}\). This completes the proof of Theorem 2.

\(\square \)
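The limiting covariance has the sandwich form \(\Sigma =\Sigma _2^{-1}\Sigma _3\Sigma _2^{-1}\). Purely as orientation, and not as the authors' variance estimator, the sketch below turns plug-in estimates of \(\Sigma _2\) and \(\Sigma _3\) into standard errors; the inputs are hypothetical, and a genuine estimate of \(\Sigma _3\) must include the censoring corrections \(\gamma _1\) and \(\gamma _2\) defined above.

```python
import numpy as np

def sandwich_cov(Sigma2_hat, Sigma3_hat, n):
    """Covariance of beta_hat implied by the sandwich Sigma2^{-1} Sigma3 Sigma2^{-1};
    the 1/n factor rescales the limit of sqrt(n)(beta_hat - beta0)."""
    S2inv = np.linalg.inv(Sigma2_hat)
    return S2inv @ Sigma3_hat @ S2inv / n

# Hypothetical 2 x 2 plug-in estimates for two selected linear effects.
Sigma2 = np.array([[1.0, 0.2], [0.2, 0.8]])
Sigma3 = np.array([[0.5, 0.1], [0.1, 0.4]])
se = np.sqrt(np.diag(sandwich_cov(Sigma2, Sigma3, n=200)))  # standard errors
```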


About this article


Cite this article

Liu, L., Wang, H., Liu, Y. et al. Model pursuit and variable selection in the additive accelerated failure time model. Stat Papers 62, 2627–2659 (2021). https://doi.org/10.1007/s00362-020-01205-0
