Abstract
Recently, statistical inference for the varying index coefficient model has attracted considerable attention. However, to the best of our knowledge, no existing variable selection method for the varying index coefficient model is robust to outliers in both the response and the covariates. To address this gap, we develop a robust variable selection method for the varying index coefficient model based on the exponential squared loss (ESL) function. We first approximate the nonparametric functions by B-spline basis functions, and then compute the proposed estimators with the minorization-maximization (MM) algorithm combined with the Fisher scoring algorithm. Under mild conditions, we establish the theoretical properties of the proposed estimators. Furthermore, we propose a data-driven procedure for selecting the tuning parameters. Numerical simulations illustrate the finite-sample performance of the proposed method, and an analysis of New Zealand workforce data demonstrates its practical merit.
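To make the robustness mechanism concrete, the following is a minimal numerical sketch, assuming the ESL takes the form \(\phi _\tau (t)=1-\exp (-t^2/\tau )\) used in Wang et al. (2013); the function names are illustrative, not from the paper.

```python
import numpy as np

def esl_loss(t, tau=1.0):
    """Exponential squared loss: phi_tau(t) = 1 - exp(-t^2 / tau).
    The loss is bounded above by 1, so a single large residual
    contributes at most 1 to the objective, which limits the
    influence of outliers on the fitted coefficients."""
    return 1.0 - np.exp(-t ** 2 / tau)

residuals = np.array([0.1, 1.0, 10.0, 100.0])
print(esl_loss(residuals))   # saturates near 1 as |t| grows
print(residuals ** 2)        # squared loss grows without bound
```

Minimizing this loss is equivalent to maximizing \(\sum _i \exp (-t_i^2/\tau )\), which is the objective the MM algorithm targets. A small \(\tau\) yields greater robustness at some efficiency cost, while for large \(\tau\) the loss behaves like \(t^2/\tau\) near zero and the fit approaches least squares; the data-driven procedure mentioned above selects \(\tau\) along with the penalty parameter.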
Data availability
The data that support the findings of this study are openly available in R package VGAMdata.
References
Cai, Z., Fan, J., & Li, R. (2000). Efficient estimation and inferences for varying-coefficient models. Journal of the American Statistical Association, 95(451), 888–902.
Carroll, R. J., Fan, J., Gijbels, I., & Wand, M. P. (1997). Generalized partially linear single-index models. Journal of the American Statistical Association, 92(438), 477–489.
De Boor, C. (2001). A practical guide to splines. Springer.
Doukhan, P., Massart, P., & Rio, E. (1995). Invariance principles for absolutely regular empirical processes. Annales de l'IHP Probabilités et statistiques, 31(2), 393–427.
Fan, J., & Jiang, J. (2005). Nonparametric inferences for additive models. Journal of the American Statistical Association, 100(471), 890–907.
Fan, J., & Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics, 32(3), 928–961.
Fan, J., & Zhang, W. (2008). Statistical methods with varying coefficient models. Statistics and Its Interface, 1, 179–195.
Hastie, T., & Tibshirani, R. (1986). Generalized additive models. Statistical Science, 1(3), 297–318.
Hastie, T., & Tibshirani, R. (1993). Varying-coefficient models. Journal of the Royal Statistical Society: Series B (Methodological), 55(4), 757–779.
Huang, J., Horowitz, J. L., & Wei, F. (2010). Variable selection in nonparametric additive models. The Annals of Statistics, 38(4), 2282–2313.
Hunter, D. R., & Lange, K. (2004). A tutorial on MM algorithms. The American Statistician, 58(1), 30–37.
Jiang, Y., Tian, G.-L., & Fei, Y. (2019). A robust and efficient estimation method for partially nonlinear models via a new MM algorithm. Statistical Papers, 60(6), 2063–2085.
Liang, H., Liu, X., Li, R., & Tsai, C.-L. (2010). Estimation and testing for partially linear single-index models. The Annals of Statistics, 38(6), 3811–3836.
Lv, J., & Li, J. (2022). High-dimensional varying index coefficient quantile regression model. Statistica Sinica, 32(2), 673–694.
Lv, J., Yang, H., & Guo, C. (2016). Robust estimation for varying index coefficient models. Computational Statistics, 31(3), 1131–1167.
Ma, S., & Song, P.X.-K. (2015). Varying index coefficient models. Journal of the American Statistical Association, 110(509), 341–356.
Ma, S., & Xu, S. (2015). Semiparametric nonlinear regression for detecting gene and environment interactions. Journal of Statistical Planning and Inference, 156, 31–47.
Ma, S., & Yang, L. (2011). Spline-backfitted kernel smoothing of partially linear additive model. Journal of Statistical Planning and Inference, 141, 204–219.
Na, S., Yang, Z., Wang, Z., & Kolar, M. (2019). High-dimensional varying index coefficient models via Stein's identity. Journal of Machine Learning Research, 20(152), 1–44.
Neykov, N., Čížek, P., Filzmoser, P., & Neytchev, P. (2012). The least trimmed quantile regression. Computational Statistics & Data Analysis, 56(6), 1757–1770.
Schumaker, L. (1981). Spline functions: Basic theory. Wiley.
Song, Y., Jian, L., & Lin, L. (2016). Robust exponential squared loss-based variable selection for high-dimensional single-index varying-coefficient model. Journal of Computational and Applied Mathematics, 308, 330–345.
Wang, X., Jiang, Y., Huang, M., & Zhang, H. (2013). Robust variable selection with exponential squared loss. Journal of the American Statistical Association, 108(502), 632–643.
Wang, H., Li, B., & Leng, C. (2009). Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(3), 671–683.
Wang, K., & Lin, L. (2016). Robust structure identification and variable selection in partial linear varying coefficient models. Journal of Statistical Planning and Inference, 174, 153–168.
Wang, L., Liu, X., Liang, H., & Carroll, R. J. (2011). Estimation and variable selection for generalized additive partial linear models. The Annals of Statistics, 39(4), 1827–1851.
Whang, Y. J. (2006). Smoothed empirical likelihood methods for quantile regression models. Econometric Theory, 22(2), 173–205.
Xia, Y., & Härdle, W. (2006). Semi-parametric estimation of partially linear single-index models. Journal of Multivariate Analysis, 97(5), 1162–1184.
Yao, W., & Li, L. (2014). A new regression model: modal linear regression. Scandinavian Journal of Statistics, 41(3), 656–671.
Yu, Y., & Ruppert, D. (2002). Penalized spline estimation for partially linear single-index models. Journal of the American Statistical Association, 97(460), 1042–1054.
Zhao, W., Zhang, R., Liu, J., & Lv, Y. (2014). Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression. Annals of the Institute of Statistical Mathematics, 66(1), 165–191.
Funding
Jiang’s research is partially supported by NSFC (12171203), the Fundamental Research Funds for the Central Universities (23JNQMX21) and the Natural Science Foundation of Guangdong (2022A1515010045).
Appendix
Lemma 1
Under conditions (C1)–(C4), if \(N_{n} \rightarrow \infty\) and \(n N_{n}^{-1} \rightarrow \infty\) as \(n \rightarrow \infty\), then (i) \(|{\hat{m}}_{l}\left( u_{l}, \varvec{\beta }^{0}\right) -m_{l}\left( u_{l}\right) |=O_{p}\left( \sqrt{N_{n} / n}+N_{n}^{-r}\right)\) uniformly for any \(u_{l} \in [0,1]\); and (ii) if in addition \(n N_{n}^{-3} \rightarrow \infty\) as \(n \rightarrow \infty\), then \(|\hat{{\dot{m}}}_{l}\left( u_{l}, \varvec{\beta }^{0}\right) -{\dot{m}}_{l}\left( u_{l}\right) |= O_{p}\left( \sqrt{N_{n}^{3} / n}+N_{n}^{-r+1}\right)\) uniformly for any \(u_{l} \in [0,1]\).
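The rate in Lemma 1 combines two pieces: an estimation error of order \(\sqrt{N_n/n}\) from fitting \(N_n\)-many spline coefficients with \(n\) observations, and an approximation bias of order \(N_n^{-r}\) from replacing the \(r\)-smooth function \(m_l\) by a spline. The bias piece can be checked numerically; the sketch below is illustrative only (a plain Cox-de Boor recursion and a least-squares projection onto a cubic B-spline basis with equally spaced interior knots, all names our own), showing that the sup-norm error of the spline fit shrinks as the basis grows.

```python
import numpy as np

def bspline_basis(u, knots, degree):
    """All B-spline basis functions of `degree` at the points `u`,
    built with the Cox-de Boor recursion.  `knots` must be a
    non-decreasing knot vector with repeated boundary knots."""
    u = np.asarray(u, dtype=float)
    # degree-0 basis: indicator of each half-open knot span
    B = np.zeros((u.size, knots.size - 1))
    for j in range(knots.size - 1):
        B[:, j] = (knots[j] <= u) & (u < knots[j + 1])
    for d in range(1, degree + 1):
        Bn = np.zeros((u.size, knots.size - d - 1))
        for j in range(knots.size - d - 1):
            left = knots[j + d] - knots[j]
            right = knots[j + d + 1] - knots[j + 1]
            if left > 0:
                Bn[:, j] += (u - knots[j]) / left * B[:, j]
            if right > 0:
                Bn[:, j] += (knots[j + d + 1] - u) / right * B[:, j + 1]
        B = Bn
    return B  # shape: (len(u), len(knots) - degree - 1)

def sup_error(f, n_interior, degree=3, n_grid=500):
    """Sup-norm error of the least-squares spline fit to f on [0, 1)."""
    knots = np.concatenate([np.zeros(degree + 1),
                            np.linspace(0, 1, n_interior + 2)[1:-1],
                            np.ones(degree + 1)])
    u = np.linspace(0.0, 1.0, n_grid, endpoint=False)
    B = bspline_basis(u, knots, degree)
    coef, *_ = np.linalg.lstsq(B, f(u), rcond=None)
    return np.max(np.abs(B @ coef - f(u)))

f = lambda u: np.sin(2 * np.pi * u)
errors = [sup_error(f, N) for N in (2, 8, 32)]
print(errors)  # decreasing as the number of interior knots grows
```

In the lemma, the \(N_n^{-r}\) term reflects exactly this decay for an \(r\)-smooth target, so the two terms in the rate trade off: more knots reduce the bias but inflate the \(\sqrt{N_n/n}\) estimation error.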
Proof of Lemma 1
Denote \({\textbf{m}}=\left\{ m\left( {\textbf{Z}}_1, {\textbf{X}}_1, \varvec{\beta }^0\right), \ldots , m\left( {\textbf{Z}}_n, {\textbf{X}}_n, \varvec{\beta }^0\right) \right\} ^{\textrm{T}}\). By (5), \({\widehat{\lambda }}(\varvec{\beta })\) can be decomposed as \({\widehat{\lambda }}(\varvec{\beta })={\widehat{\lambda }}_m(\varvec{\beta }) +{\widehat{\lambda }}_{\epsilon }(\varvec{\beta })\), where
Let \({\widehat{\lambda }}_\epsilon (\varvec{\beta })=\left\{ {\widehat{\lambda }}_{1, \epsilon }(\varvec{\beta })^{\textrm{T}}, \ldots , \hat{\lambda }_{d, \epsilon }(\varvec{\beta })^{\textrm{T}}\right\} ^{\textrm{T}}\), where \(\hat{\lambda }_{l, \epsilon }(\varvec{\beta })=\) \(\left\{ {\widehat{\lambda }}_{s, l, \epsilon }(\varvec{\beta }): 1 \le s \le J_n\right\} ^{\textrm{T}}\) and \({\widehat{\lambda }}_m(\varvec{\beta })=\left\{ {\widehat{\lambda }}_{1, m}(\varvec{\beta })^{\textrm{T}}, \ldots , {\widehat{\lambda }}_{d, m}(\varvec{\beta })^{\textrm{T}}\right\} ^{\textrm{T}}\), where \({\widehat{\lambda }}_{l, m}(\varvec{\beta })=\left\{ {\widehat{\lambda }}_{s, l, m}(\varvec{\beta }): 1 \le s \le J_n\right\} ^{\textrm{T}}\). Thus,
where
For \(m_l\) satisfying Condition (C2), there is a function \(m_l^0\left( u_l\right) ={\textbf{B}}_r\left( u_l\right) ^{\textrm{T}} \varvec{\lambda }_l\), such that
From the similar arguments as Lemma A.1 of Ma and Xu (2015), we have
By Taylor expansion, for \(1 \le l \le d\),
where \({\varvec{1}}_l\) is the \(d \times 1\) vector with the l-th element as “1" and other elements as “0", \({\varvec{B}}({\varvec{u}})=diag\left( {\varvec{B}}_q\left( u_1\right) ^T, {\varvec{B}}_q\left( u_2\right) ^T,\ldots ,{\varvec{B}}_q\left( u_d\right) ^T\right) _{d \times d J_n}\) with \({\varvec{u}}=\left( u_1, \ldots , u_d\right) ^T\), \(\varvec{\Omega }_n\left( \varvec{\beta }^0\right) =n^{-1} \sum _{i=1}^n \ddot{\phi }_{\tau }\left\{ Y_i-\sum _{l=1}^d m_l\left( {\varvec{Z}}_i^T \varvec{\beta }_l^0\right) X_{i l}\right\} {\varvec{D}}_i\left( \varvec{\beta }^0\right) {\varvec{D}}_i\left( \varvec{\beta }^0\right) ^T\),
\({\varvec{D}}_i\left( \varvec{\beta }^0\right) =\left( D_{i, s l}\left( \varvec{\beta }_l^0\right) , 1 \le s \le J_n, 1 \le l \le d\right) ^T\) and \(D_{i, s l}\left( \varvec{\beta }_l^0\right) =\) \(B_{s, q}\left( U_{i l}\left( \varvec{\beta }_l^0\right) \right) X_{i l}\).
Combining (17) and (18), we have \(|{\hat{m}}_l\left( u_l, \varvec{\beta }^0\right) -m_l\left( u_l\right) |=O_p\left( \sqrt{N_n / n}+N_n^{-r}\right)\) uniformly for any \(u_l \in [0,1]\). This proves (i).
Following similar reasoning as in the proof for \({\hat{m}}_l\left( u_l, \varvec{\beta }^0\right)\), one can prove that \(|\hat{{\dot{m}}}_l\left( u_l, \varvec{\beta }^0\right) -{\dot{m}}_l\left( u_l\right) |=\) \(O_p\left( \sqrt{N_n^3 / n}+N_n^{-r+1}\right)\) uniformly for any \(u_l \in [0,1]\). This completes the proof of (ii). \(\square\)
Let
for some positive constant C
Proof of Theorem 1
Let \(\alpha _{n}=O\left( n^{-1 / 2}+a_{n}\right)\). It suffices to show that for any given \(\delta >0\), there exists a large constant C such that for all n sufficiently large,
Note that
As for \(Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right)\), by some calculations, we have
Noting that \(Q\left( {\varvec{m}}, \varvec{\beta }_{-1}^{0}\right) =0\), we have
By verifying the conditions of Theorem 1 in Doukhan et al. (1995), we can show that the empirical process \(\left\{ r_n\left( {\varvec{m}}, \varvec{\beta }_{-1}\right) : {\varvec{m}} \in {\mathcal {G}}_1, \varvec{\beta }_{-1} \in \Theta _{1,-1}\right\}\) is stochastically equicontinuous, where \({\mathcal {G}}_1\) is defined in (20) with \(\delta =1\) and \(\Theta _{1,-1}\) is defined in (19) with \(n=1\). Therefore, \(r_n\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) -r_n\left( {\varvec{m}}, \varvec{\beta }_{-1}^0\right) = o_p(1)\) uniformly for \(\hat{{\varvec{m}}} \in {\mathcal {G}}_1\) and \(\varvec{\beta }_{-1} \in \Theta _{1,-1}\). It follows that
Taking a Taylor expansion of \(I_{12}\), we have
By condition (C4) and Lemma 1, we can prove
Similarly, by condition (C2) and Lemma 1, we obtain
As for \(I_{14}\), by Lemma 1, we have
For any vector \(\zeta\) whose components are not all zero, conditional on \(\left\{ {\varvec{X}}_i, {\varvec{Z}}_i\right\}\) the \(\xi _i\) are independent with mean zero and variance one, so we have
where \(a_i^2=\frac{1}{n} G\left( {\varvec{X}}_i, {\varvec{Z}}_i, h_2\right) \zeta ^T\left\{ \left[ {\dot{m}}_l\left( {\varvec{Z}}_i^T \varvec{\beta }_l^0\right) X_{i l} {\varvec{J}}_l^T \tilde{{\varvec{Z}}}_i\right] _{l=1}^d\right\} ^{\otimes 2} \zeta\). By the Slutsky theorem and the Lindeberg–Feller central limit theorem, we have
where \(\varvec{\Psi }=E\left( G\left( {\varvec{X}}, {\varvec{Z}}, h_2\right) \left\{ \left[ {\dot{m}}_l\left( {\varvec{Z}}^T \varvec{\beta }_l^0\right) X_l {\varvec{J}}_l^T \tilde{{\varvec{Z}}}\right] _{l=1}^d\right\} ^{\otimes 2}\right) .\)
Taking a Taylor expansion of \(I_{15}\), we have
uniformly for \(\varvec{\beta }_{-1} \in \Theta _{n,-1}\), where
and
The last equality in (27) follows from the law of large numbers.
Combining (22), (23), (24), (25), (26) and (27), we have
By choosing a sufficiently large C, the second term in (28) uniformly dominates the other terms.
Next, we consider \(I_{2}=\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^{0}\right) ^{T} {\varvec{b}}_{\lambda }\left( \varvec{\beta }_{-1}\right)\). Define
and
Taking a Taylor expansion of \({\varvec{b}}_{\lambda }\left( \varvec{\beta }_{-1}\right)\) at \(\varvec{\beta }_{-1}^{0}\), we have
This term is also dominated by the second term of (28). Hence, by choosing a sufficiently large C, we have \(\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^0\right) ^T\left[ Q_n\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) +n {\varvec{b}}_\lambda \left( \varvec{\beta }_{-1}\right) \right] >0\), implying that (17) holds with probability at least \(1-\delta\). This completes the proof of Theorem 1. \(\square\)
Proof of Theorem 2
Theorem 2 follows from Theorem 1 and Lemma 1.
Proof of Theorem 3
It suffices to show that, with probability tending to 1 as \(n \rightarrow \infty\), for any \(\hat{\varvec{\beta }}_{1}\) satisfying \(\varvec{\beta }_{-1}^{(1)}-\varvec{\beta }_{-1}^{0(1)}=O_{P}\left( n^{-1 / 2}+a_{n}\right)\), for some small \(\varepsilon _{n}= C n^{-1 / 2}\), and for \(l=1, \ldots , d\), \(j=s_{l}+1, \ldots , p\), we have
From the proof of Theorem 1, we have \(Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{lj,-1}\right) =O_{p}\left( 1/\sqrt{n}\right)\). Using \(\sqrt{1 / n} / \lambda \rightarrow 0\) and condition (C8), for \(l=1, \ldots , d, j=s_{l}+1, \ldots , p\), we have
Obviously, the sign of \(\beta _{l j,-1}\) determines the sign of \(Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{lj,-1}\right) +b_{\lambda }\left( \beta _{l j,-1}\right)\). This implies that \(\hat{\beta }_{ l j,-1}=0\) with probability tending to 1 for \(l=1, \ldots , d\), \(j=s_{l}+1, \ldots , p\). This completes the proof of Theorem 3 (i).
Next, we prove (ii). By Theorem 1 and Theorem 3 (i), there exists \(\hat{\varvec{\beta }}_{-1}^{(1)}\) satisfying the following equations:
Then, by a Taylor expansion, we have
where \(l=1, \ldots , d\), \(j=s_{l}+1, \ldots , p\). It follows from the Slutsky theorem and the central limit theorem that
This completes the proof of Theorem 3.
Cite this article
Zou, H., Jiang, Y. Robust variable selection for the varying index coefficient models. J. Korean Stat. Soc. 52, 767–793 (2023). https://doi.org/10.1007/s42952-023-00221-8