Robust variable selection for the varying index coefficient models

  • Research Article
  • Published in: Journal of the Korean Statistical Society

Abstract

Recently, statistical inference for the varying index coefficient model has received considerable attention. However, to the best of our knowledge, there is no existing robust variable selection method for the varying index coefficient model in the presence of outliers in the response and covariates. To overcome this difficulty, we develop a robust variable selection method for the varying index coefficient model via the exponential squared loss (ESL) function. We first approximate the nonparametric functions by B-spline basis functions and then apply the minorization-maximization (MM) algorithm and the Fisher scoring algorithm to compute the proposed estimators. Under some mild conditions, the theoretical properties of the proposed estimators are established. Furthermore, we propose a data-driven procedure to select the tuning parameters. Numerical simulations illustrate the finite-sample performance of the proposed method. Finally, an analysis of New Zealand workforce data reveals the merit of the proposed method.
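For intuition on why the ESL function confers robustness: the loss is bounded, so a single large residual contributes at most a constant to the objective. A minimal sketch (assuming the common parameterization \(\phi_{\tau}(t)=1-\exp(-t^{2}/\tau)\) of Wang et al. (2013); the tuning constant `tau` here is illustrative) contrasts it with the unbounded squared loss:

```python
import numpy as np

def esl(t, tau=1.0):
    # Exponential squared loss: bounded, so a huge residual
    # (an outlier) can contribute at most 1 to the objective.
    return 1.0 - np.exp(-t ** 2 / tau)

residuals = np.array([0.1, 0.5, 2.0, 50.0])  # 50.0 plays the outlier
print(esl(residuals))   # the outlier's loss saturates near 1
print(residuals ** 2)   # the squared loss lets the outlier dominate
```

Smaller values of `tau` downweight outliers more aggressively, which is why the loss is paired with a data-driven choice of the tuning parameters.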

Fig. 1


Data availability

The data that support the findings of this study are openly available in R package VGAMdata.

References

  • Cai, Z., Fan, J., & Li, R. (2000). Efficient estimation and inferences for varying-coefficient models. Journal of the American Statistical Association, 95(451), 888–902.

  • Carroll, R. J., Fan, J., Gijbels, I., & Wand, M. P. (1997). Generalized partially linear single-index models. Journal of the American Statistical Association, 92(438), 477–489.

  • De Boor, C. (2001). A practical guide to splines. Springer.

  • Doukhan, P., Massart, P., & Rio, E. (1995). Invariance principles for absolutely regular empirical processes. Annales de l'IHP Probabilités et statistiques, 31(2), 393–427.

  • Fan, J., & Jiang, J. (2005). Nonparametric inferences for additive models. Journal of the American Statistical Association, 100(471), 890–907.

  • Fan, J., & Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics, 32(3), 928–961.

  • Fan, J., & Zhang, W. (2008). Statistical methods with varying coefficient models. Statistics and Its Interface, 1, 179–195.

  • Hastie, T., & Tibshirani, R. (1986). Generalized additive models. Statistical Science, 1(3), 297–318.

  • Hastie, T., & Tibshirani, R. (1993). Varying-coefficient models. Journal of the Royal Statistical Society: Series B (Methodological), 55(4), 757–779.

  • Huang, J., Horowitz, J. L., & Wei, F. (2010). Variable selection in nonparametric additive models. The Annals of Statistics, 38(4), 2282–2313.

  • Hunter, D. R., & Lange, K. (2004). A tutorial on MM algorithms. The American Statistician, 58(1), 30–37.

  • Jiang, Y., Tian, G.-L., & Fei, Y. (2019). A robust and efficient estimation method for partially nonlinear models via a new MM algorithm. Statistical Papers, 60(6), 2063–2085.

  • Liang, H., Liu, X., Li, R., & Tsai, C.-L. (2010). Estimation and testing for partially linear single-index models. The Annals of Statistics, 38(6), 3811–3836.

  • Lv, J., & Li, J. (2022). High-dimensional varying index coefficient quantile regression model. Statistica Sinica, 32(2), 673–694.

  • Lv, J., Yang, H., & Guo, C. (2016). Robust estimation for varying index coefficient models. Computational Statistics, 31(3), 1131–1167.

  • Ma, S., & Song, P. X.-K. (2015). Varying index coefficient models. Journal of the American Statistical Association, 110(509), 341–356.

  • Ma, S., & Xu, S. (2015). Semiparametric nonlinear regression for detecting gene and environment interactions. Journal of Statistical Planning and Inference, 156, 31–47.

  • Ma, S., & Yang, L. (2011). Spline-backfitted kernel smoothing of partially linear additive model. Journal of Statistical Planning and Inference, 141, 204–219.

  • Na, S., Yang, Z., Wang, Z., & Kolar, M. (2019). High-dimensional varying index coefficient models via Stein's identity. Journal of Machine Learning Research, 20(152), 1–44.

  • Neykov, N., Čížek, P., Filzmoser, P., & Neytchev, P. (2012). The least trimmed quantile regression. Computational Statistics & Data Analysis, 56(6), 1757–1770.

  • Schumaker, L. (1981). Spline functions: Basic theory. Wiley.

  • Song, Y., Jian, L., & Lin, L. (2016). Robust exponential squared loss-based variable selection for high-dimensional single-index varying-coefficient model. Journal of Computational and Applied Mathematics, 308, 330–345.

  • Wang, X., Jiang, Y., Huang, M., & Zhang, H. (2013). Robust variable selection with exponential squared loss. Journal of the American Statistical Association, 108(502), 632–643.

  • Wang, H., Li, B., & Leng, C. (2009). Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(3), 671–683.

  • Wang, K., & Lin, L. (2016). Robust structure identification and variable selection in partial linear varying coefficient models. Journal of Statistical Planning and Inference, 174, 153–168.

  • Wang, L., Liu, X., Liang, H., & Carroll, R. J. (2011). Estimation and variable selection for generalized additive partial linear models. The Annals of Statistics, 39(4), 1827–1851.

  • Whang, Y. J. (2006). Smoothed empirical likelihood methods for quantile regression models. Econometric Theory, 22(2), 173–205.

  • Xia, Y., & Härdle, W. (2006). Semi-parametric estimation of partially linear single-index models. Journal of Multivariate Analysis, 97(5), 1162–1184.

  • Yao, W., & Li, L. (2014). A new regression model: Modal linear regression. Scandinavian Journal of Statistics, 41(3), 656–671.

  • Yu, Y., & Ruppert, D. (2002). Penalized spline estimation for partially linear single-index models. Journal of the American Statistical Association, 97(460), 1042–1054.

  • Zhao, W., Zhang, R., Liu, J., & Lv, Y. (2014). Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression. Annals of the Institute of Statistical Mathematics, 66(1), 165–191.


Funding

Jiang’s research is partially supported by NSFC (12171203), the Fundamental Research Funds for the Central Universities (23JNQMX21) and the Natural Science Foundation of Guangdong (2022A1515010045).

Author information


Corresponding author

Correspondence to Yunlu Jiang.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Lemma 1

Under conditions (C1)–(C4), \(N_{n} \rightarrow \infty\) and \(n N_{n}^{-1} \rightarrow \infty\) as \(n \rightarrow \infty\), we have (i) \(|{\hat{m}}_{l}\left( u_{l}, \varvec{\beta }^{0}\right) -m_{l}\left( u_{l}\right) |=O_{p}\left( \sqrt{N_{n} / n}+N_{n}^{-r}\right)\) uniformly for any \(u_{l} \in [0,1]\); and (ii) if in addition \(N_{n} \rightarrow \infty\) and \(n N_{n}^{-3} \rightarrow \infty\) as \(n \rightarrow \infty\), then \(|\hat{{\dot{m}}}_{l}\left( u_{l}, \varvec{\beta }^{0}\right) -{\dot{m}}_{l}\left( u_{l}\right) |= O_{p}\left( \sqrt{N_{n}^{3} / n}+N_{n}^{-r+1}\right)\) uniformly for any \(u_{l} \in [0,1]\).

Proof of Lemma 1

Denote \({\textbf{m}}=\left\{ m\left( {\textbf{Z}}_1, {\textbf{X}}_1, \varvec{\beta }^0\right), \ldots , m\left( {\textbf{Z}}_n, {\textbf{X}}_n, \varvec{\beta }^0\right) \right\} ^{\textrm{T}}\). By (5), \({\widehat{\lambda }}(\varvec{\beta })\) can be decomposed as \({\widehat{\lambda }}(\varvec{\beta })={\widehat{\lambda }}_m(\varvec{\beta }) +{\widehat{\lambda }}_{\epsilon }(\varvec{\beta })\), where

$$\begin{aligned} \begin{aligned} {\widehat{\lambda }}_m(\varvec{\beta })&=\left\{ {\textbf{D}}(\varvec{\beta })^{\textrm{T}} {\textbf{D}}(\varvec{\beta })\right\} ^{-1} {\textbf{D}}(\varvec{\beta })^{\textrm{T}} {\textbf{m}}, \\ {\widehat{\lambda }}_\epsilon (\varvec{\beta })&=\left\{ {\textbf{D}}(\varvec{\beta })^{\textrm{T}} {\textbf{D}}(\varvec{\beta })\right\} ^{-1} {\textbf{D}}(\varvec{\beta })^{\textrm{T}}({\textbf{Y}}-{\textbf{m}}). \end{aligned} \end{aligned}$$

Let \({\widehat{\lambda }}_\epsilon (\varvec{\beta })=\left\{ {\widehat{\lambda }}_{1, \epsilon }(\varvec{\beta })^{\textrm{T}}, \ldots , \hat{\lambda }_{d, \epsilon }(\varvec{\beta })^{\textrm{T}}\right\} ^{\textrm{T}}\), where \(\hat{\lambda }_{l, \epsilon }(\varvec{\beta })=\) \(\left\{ {\widehat{\lambda }}_{s, l, \epsilon }(\varvec{\beta }): 1 \le s \le J_n\right\} ^{\textrm{T}}\) and \({\widehat{\lambda }}_m(\varvec{\beta })=\left\{ {\widehat{\lambda }}_{1, m}(\varvec{\beta })^{\textrm{T}}, \ldots , {\widehat{\lambda }}_{d, m}(\varvec{\beta })^{\textrm{T}}\right\} ^{\textrm{T}}\), where \({\widehat{\lambda }}_{l, m}(\varvec{\beta })=\left\{ {\widehat{\lambda }}_{s, l, m}(\varvec{\beta }): 1 \le s \le J_n\right\} ^{\textrm{T}}\). Thus,

$$\begin{aligned} {\widehat{m}}_l\left( u_l, \varvec{\beta }\right) ={\widehat{m}}_{l, \epsilon }\left( u_l, \varvec{\beta }\right) +{\widehat{m}}_{l, m}\left( u_l, \varvec{\beta }\right) , \end{aligned}$$

where

$$\begin{aligned} {\widehat{m}}_{l, \epsilon }\left( u_l, \varvec{\beta }\right) ={\textbf{B}}_r\left( u_l\right) ^{\textrm{T}} {\widehat{\lambda }}_{l, \epsilon }(\varvec{\beta }) \text{ and } {\widehat{m}}_{l, m}\left( u_l, \varvec{\beta }\right) ={\textbf{B}}_r\left( u_l\right) ^{\textrm{T}} {\widehat{\lambda }}_{l, m}(\varvec{\beta }) \text{. } \end{aligned}$$

For \(m_l\) satisfying Condition (C2), there is a function \(m_l^0\left( u_l\right) ={\textbf{B}}_r\left( u_l\right) ^{\textrm{T}} \varvec{\lambda }_l\), such that

$$\begin{aligned} \sup _{u_l \in [0,1]}\left|m_l^0\left( u_l\right) -m_l\left( u_l\right) \right|=O_{p}\left( J_n^{-r}\right) . \end{aligned}$$
(17)

By arguments similar to those in Lemma A.1 of Ma and Xu (2015), we have

$$\begin{aligned} \sup _{u_l \in [0,1]}\left|{\hat{m}}_{l, \epsilon }\left( u_l, \varvec{\beta }^0\right) \right|=O_p\left( \sqrt{N_n / n}\right) , \quad \sup _{u_l \in [0,1]}\left|{\hat{m}}_{l, m}\left( u_l, \varvec{\beta }^0\right) \right|=O_p\left( N_n^{-r}\right) . \end{aligned}$$

By Taylor expansion, for \(1 \le l \le d\),

$$\begin{aligned} \begin{aligned}&{\hat{m}}_l\left( u_l, \varvec{\beta }^0\right) -m_l^0\left( u_l\right) \\&\quad ={\varvec{1}}_l^T {\varvec{B}}({\varvec{u}})\left\{ \hat{\lambda }\left( \varvec{\beta }^0\right) -\lambda ^0\right\} \\&\quad ={\varvec{1}}_l^T {\varvec{B}}({\varvec{u}}) \varvec{\Omega }_n\left( \varvec{\beta }^0\right) ^{-1}\left[ \frac{1}{n} \sum _{i=1}^n {\dot{\phi }}_{\tau }\left\{ Y_i-\sum _{l=1}^d {\varvec{B}}_q\left( U_{i l}\left( \varvec{\beta }_l^0\right) \right) ^T \lambda _l^0 X_{i l}\right\} \right. \\&\quad \left. {\varvec{D}}_i\left( \varvec{\beta }^0\right) \right] \left[ 1+o_p(1)\right] \\&\quad = {\varvec{1}}_l^T {\varvec{B}}({\varvec{u}}) \varvec{\Omega }_n\left( \varvec{\beta }^0\right) ^{-1}\left[ \frac{1}{n} \sum _{i=1}^n {\dot{\phi }}_{\tau }\left\{ Y_i-\sum _{l=1}^d m_l\left( {\varvec{Z}}_i^T \varvec{\beta }_l^0\right) X_{i l}\right\} \right. \\&\quad \left. {\varvec{D}}_i\left( \varvec{\beta }^0\right) \right] \left[ 1+o_p(1)\right] \\&\quad +{\varvec{1}}_l^T {\varvec{B}}({\varvec{u}}) \varvec{\Omega }_n\left( \varvec{\beta }^0\right) ^{-1} \frac{1}{n} \sum _{i=1}^n \ddot{\phi }_{\tau }\left\{ Y_i-\sum _{l=1}^d m_l\left( {\varvec{Z}}_i^T \varvec{\beta }_l^0\right) X_{i l}\right\} \\&\quad \times \left\{ \sum _{l=1}^d\left( m_l\left( {\varvec{Z}}_i^T \varvec{\beta }_l^0\right) -{\varvec{B}}_q\left( U_{i l}\left( \varvec{\beta }_l^0\right) \right) ^T \lambda _l^0\right) X_{i l}\right\} {\varvec{D}}_i\left( \varvec{\beta }^0\right) \left[ 1+o_p(1)\right] \\&\quad = \left\{ {\hat{m}}_{l, \epsilon }\left( u_l, \varvec{\beta }^0\right) +{\hat{m}}_{l, m}\left( u_l, \varvec{\beta }^0\right) \right\} \left\{ 1+o_p(1)\right\} \\&\quad = O_p\left( \sqrt{N_n / n}+N_n^{-r}\right) , \end{aligned} \end{aligned}$$
(18)

where \({\varvec{1}}_l\) is the \(d \times 1\) vector with the l-th element equal to 1 and all other elements equal to 0, \({\varvec{B}}({\varvec{u}})=\textrm{diag}\left( {\varvec{B}}_q\left( u_1\right) ^T, {\varvec{B}}_q\left( u_2\right) ^T,\ldots ,{\varvec{B}}_q\left( u_d\right) ^T\right) _{d \times d J_n}\) with \({\varvec{u}}=\left( u_1, \ldots , u_d\right) ^T\), \(\varvec{\Omega }_n\left( \varvec{\beta }^0\right) =n^{-1} \sum _{i=1}^n \ddot{\phi }_{\tau }\left\{ Y_i-\sum _{l=1}^d m_l\left( {\varvec{Z}}_i^T \varvec{\beta }_l^0\right) X_{i l}\right\} {\varvec{D}}_i\left( \varvec{\beta }^0\right) {\varvec{D}}_i\left( \varvec{\beta }^0\right) ^T\),

\({\varvec{D}}_i\left( \varvec{\beta }^0\right) =\left( D_{i, s l}\left( \varvec{\beta }_l^0\right) , 1 \le s \le J_n, 1 \le l \le d\right) ^T\) and \(D_{i, s l}\left( \varvec{\beta }_l^0\right) =\) \(B_{s, q}\left( U_{i l}\left( \varvec{\beta }_l^0\right) \right) X_{i l}\).

Combining (17) and (18), we have \(|{\hat{m}}_l\left( u_l, \varvec{\beta }^0\right) -m_l\left( u_l\right) |=O_p\left( \sqrt{N_n / n}+N_n^{-r}\right)\) uniformly for any \(u_l \in [0,1]\). This proves (i).

Following reasoning similar to that for \({\hat{m}}_l\left( u_l, \varvec{\beta }^0\right)\), one can prove that \(|\hat{{\dot{m}}}_l\left( u_l, \varvec{\beta }^0\right) -{\dot{m}}_l\left( u_l\right) |= O_p\left( \sqrt{N_n^3 / n}+N_n^{-r+1}\right)\) uniformly for any \(u_l \in [0,1]\). This completes the proof of (ii). \(\square\)
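The spline approximation property in (17) can be observed numerically. The sketch below (the target function, knot counts, and the use of SciPy's `make_lsq_spline` are illustrative choices, not part of the paper) fits cubic B-splines with increasingly many knots to a smooth function and tracks the sup-norm error, which shrinks as the basis dimension grows:

```python
import numpy as np
from scipy.interpolate import make_lsq_spline

# Smooth target on [0, 1]; per (17), the best spline approximant's
# sup-norm error shrinks as the spline dimension grows.
m = lambda u: np.sin(2 * np.pi * u)
u = np.linspace(0.0, 1.0, 2000)

k = 3  # cubic B-splines
errors = {}
for n_interior in (4, 8, 16):
    # Full knot vector: (k+1)-fold boundary knots plus interior knots,
    # following the construction in De Boor (2001).
    t = np.concatenate(([0.0] * (k + 1),
                        np.linspace(0.0, 1.0, n_interior + 2)[1:-1],
                        [1.0] * (k + 1)))
    spl = make_lsq_spline(u, m(u), t, k=k)
    errors[n_interior] = np.max(np.abs(spl(u) - m(u)))

print(errors)  # sup-norm error decreases as the number of knots grows
```

Doubling the number of interior knots shrinks the mesh size \(h\), and for a cubic spline the sup-norm error scales like \(h^{4}\), consistent with the \(O(J_n^{-r})\) rate in (17).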

Let

$$\begin{aligned} m_{l}\left( {\varvec{Z}}^{T} \varvec{\beta }_{l}; \varvec{\beta }\right)= & {} E\left\{ m_{l}\left( {\varvec{Z}}^{T} \varvec{\beta }_{l}^{0}\right) \mid {\varvec{Z}}^{T} \varvec{\beta }_{l}\right\} , \\ {\dot{m}}_{l}\left( {\varvec{Z}}^{T} \varvec{\beta }_{l}; \varvec{\beta }\right)= & {} E\left\{ {\dot{m}}_{l}\left( {\varvec{Z}}^{T} \varvec{\beta }_{l}^{0}\right) \mid {\varvec{Z}}^{T} \varvec{\beta }_{l}\right\} , \\ Q\left( {\varvec{m}}, \varvec{\beta }_{-1}\right)= & {} E\left[ {\dot{\phi }}_{\tau }\left\{ Y-\sum _{l=1}^{d} m_{l}\left( {\varvec{Z}}^{T} \varvec{\beta }_{l}; \varvec{\beta }\right) X_{l}\right\} \left[ {\dot{m}}_{l}\left( {\varvec{Z}}^{T} \varvec{\beta }_{l}, \varvec{\beta }\right) X_{l} {\varvec{J}}_{l}^{T} \tilde{{\textbf{Z}}}\right] _{l=1}^{d}\right] , \\ Q_{n}\left( {\varvec{m}}, \varvec{\beta }_{-1}\right)= & {} \frac{1}{n} \sum _{i=1}^{n} {\dot{\phi }}_{\tau }\left\{ Y_{i}-\sum _{l=1}^{d} m_{l}\left( {\varvec{Z}}_{i}^{T} \varvec{\beta }_{l}; \varvec{\beta }\right) X_{i l}\right\} \left[ {\dot{m}}_{l}\left( {\textbf{Z}}_{i}^{T} \varvec{\beta }_{l}, \varvec{\beta }\right) X_{i l} {\varvec{J}}_{l}^{T} \tilde{{\textbf{Z}}}_{i}\right] _{l=1}^{d}, \\ \eta \left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right)= & {} -E\left[ \ddot{\phi }_{\tau }\left( Y-{\varvec{m}}^{T} {\varvec{X}}\right) (\hat{{\varvec{m}}}-{\varvec{m}})^{T} {\varvec{X}}\left[ {\dot{m}}_{l}\left( {\varvec{Z}}^{T} \varvec{\beta }_{l}, \varvec{\beta }\right) X_{l} {\varvec{J}}_{l}^{T} \tilde{{\varvec{Z}}}\right] _{l=1}^{d}\right] , \end{aligned}$$
$$\begin{aligned} \Theta _{n,-1}=\left\{ \varvec{\beta }_{-1}=\left( \varvec{\beta }_{l,-1}^T: 1 \le l \le d\right) ^T:\left\| \varvec{\beta }_{l,-1}-\varvec{\beta }_{l,-1}^0\right\| \le C n^{-1 / 2}\right\} . \end{aligned}$$
(19)

for some positive constant C, and

$$\begin{aligned} {\mathcal {G}}_\delta =\left\{ \hat{{\varvec{m}}} \in {\mathcal {G}}:\Vert \hat{{\varvec{m}}}-{\varvec{m}}\Vert _{{\mathcal {G}}} \le \delta ,\Vert \dot{\hat{{\varvec{m}}}}-\dot{{\varvec{m}}}\Vert _{{\mathcal {G}}} \le \delta \right\} . \end{aligned}$$
(20)

Proof of Theorem 1

Let \(\alpha _{n}=O\left( n^{-1 / 2}+a_{n}\right)\). It suffices to show that for any given \(\delta >0\), there exists a large constant C such that for all n sufficiently large,

$$\begin{aligned} P\left\{ \inf _{\left\| \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^{0}\right\| _{2}=C \alpha _{n}}\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^{0}\right) ^{T}\left[ Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) +n {\varvec{b}}_{\lambda }\left( \varvec{\beta }_{-1}\right) \right] >0\right\} \ge 1-\delta . \end{aligned}$$
(21)

Note that

$$\begin{aligned} \begin{aligned}&\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^{0}\right) ^{T}\left[ Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) + {\varvec{b}}_{\lambda }\left( \varvec{\beta }_{-1}\right) \right] \\&\quad =\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^{0}\right) ^{T}Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) + \left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^{0}\right) ^{T}{\varvec{b}}_{\lambda }\left( \varvec{\beta }_{-1}\right) \\&\quad =I_{1}+I_{2}. \end{aligned} \end{aligned}$$

As for \(Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right)\), by some calculations, we have

$$\begin{aligned} \begin{aligned} Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right)&=\left[ Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) -Q\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) -Q_{n}\left( {\varvec{m}}, \varvec{\beta }_{-1}^{0}\right) + Q\left( {\varvec{m}}, \varvec{\beta }_{-1}^{0}\right) \right] \\&\quad +\left[ Q\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) -Q\left( {\varvec{m}}, \varvec{\beta }_{-1}\right) -\eta \left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) \right] \\&\quad +\left[ \eta \left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) -\eta \left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}^{0}\right) \right] \\&\quad +\left[ Q_{n}\left( {\varvec{m}}, \varvec{\beta }_{-1}^{0}\right) +\eta \left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}^{0}\right) \right] +Q\left( {\varvec{m}}, \varvec{\beta }_{-1}\right) \\&\quad =I_{11}+I_{12}+I_{13}+I_{14}+I_{15}. \end{aligned} \end{aligned}$$

Noting that \(Q\left( {\varvec{m}}, \varvec{\beta }_{-1}^{0}\right) =0\), we have

$$\begin{aligned} \begin{aligned} I_{11}&=Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) -Q\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) -Q_{n}\left( {\varvec{m}}, \varvec{\beta }_{-1}^{0}\right) + Q\left( {\varvec{m}}, \varvec{\beta }_{-1}^{0}\right) \\&\quad =n^{-1/2}\left\{ n^{1/2}\left[ Q_{n}\left( \hat{{\varvec{m}}},\varvec{\beta }_{-1}\right) -Q\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) \right] \right. \\&\left. \quad -n^{1/2}\left[ Q_{n}\left( {\varvec{m}},\varvec{\beta }_{-1}^{0}\right) -Q\left( {\varvec{m}}, \varvec{\beta }_{-1}^{0}\right) \right] \right\} \\&\quad =n^{-1/2}\left\{ r_n\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) -r_n\left( {\varvec{m}}, \varvec{\beta }_{-1}^0\right) \right\} . \end{aligned} \end{aligned}$$

By checking the conditions of Theorem 1 in Doukhan et al. (1995), we can show that the empirical process \(\left\{ r_n\left( {\varvec{m}}, \varvec{\beta }_{-1}\right) : {\varvec{m}} \in {\mathcal {G}}_1, \varvec{\beta }_{-1} \in \Theta _{1,-1}\right\}\) is stochastically equicontinuous, where \({\mathcal {G}}_1\) is defined in (20) with \(\delta =1\) and \(\Theta _{1,-1}\) is defined in (19) with \(n=1\). Therefore, \(r_n\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) -r_n\left( {\varvec{m}}, \varvec{\beta }_{-1}^0\right) = o_p(1)\) uniformly for \(\hat{{\varvec{m}}} \in {\mathcal {G}}_1, \varvec{\beta }_{-1} \in \Theta _{1,-1}\). It follows that

$$\begin{aligned} \sup _{\varvec{\beta }_{-1} \in \Theta _{n,-1}}\left\| I_{11}\right\| =o_p\left( n^{-1 / 2}\right) . \end{aligned}$$
(22)

Taking a Taylor expansion of \(I_{12}\), we have

$$\begin{aligned} \begin{aligned} I_{12}&= Q\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) -Q\left( {\varvec{m}}, \varvec{\beta }_{-1}\right) -\eta \left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) \\&\quad =-E\left[ \ddot{\phi }\left( Y-{\varvec{m}}^T {\varvec{X}}\right) (\hat{{\varvec{m}}}-{\varvec{m}})^T {\varvec{X}}\left[ \left( \hat{{\dot{m}}}_l-{\dot{m}}_l\right) X_l {\varvec{J}}_l^T \tilde{{\varvec{Z}}}\right] _{l=1}^d\right] . \end{aligned} \end{aligned}$$

By condition (C4) and Lemma 1, we can prove

$$\begin{aligned} \sup _{\varvec{\beta }_{-1} \in \Theta _{n,-1}}\left\| I_{12}\right\| =o_p\left( n^{-1 / 2}\right) . \end{aligned}$$
(23)

Similarly, by condition (C2) and Lemma 1, we can obtain

$$\begin{aligned} \sup _{\varvec{\beta }_{-1} \in \Theta _{n,-1}}\left\| I_{13}\right\| =o_p\left( n^{-1 / 2}\right) . \end{aligned}$$
(24)

As for \(I_{14}\), by Lemma 1, we have

$$\begin{aligned} I_{14}&=Q_{n}\left( {\varvec{m}}, \varvec{\beta }_{-1}^{0}\right) +\eta \left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}^{0}\right) \nonumber \\&\quad =Q_n\left( {\varvec{m}}, \varvec{\beta }_{-1}^0\right) +o_p\left( n^{-1 / 2}\right) \nonumber \\&\quad =\frac{1}{n} \sum _{i=1}^n {\dot{\phi }}_{\tau }\left\{ \varepsilon _i\right\} \left[ {\dot{m}}_l\left( {\varvec{Z}}_i^T \varvec{\beta }_l^0\right) X_{i l} {\varvec{J}}_l^T \tilde{{\varvec{Z}}}_i\right] _{l=1}^d+o_p\left( n^{-1 / 2}\right) \nonumber \\&\quad =\varvec{\Psi }_n+o_p\left( n^{-1 / 2}\right) . \end{aligned}$$
(25)

For any vector \(\zeta\) whose components are not all zero, note that, conditional on \(\left\{ {\varvec{X}}_i, {\varvec{Z}}_i\right\}\), the \(\xi _i\) are independent with mean zero and variance one. Then we have

$$\begin{aligned} \sqrt{n} \zeta ^T \varvec{\Psi }_n=\sum _{i=1}^n \frac{1}{\sqrt{n}} \zeta ^T {\dot{\phi }}_{h_2}\left\{ \varepsilon _i\right\} \left[ {\dot{m}}_l\left( {\varvec{Z}}_i^T \varvec{\beta }_l^0\right) X_{i l} {\varvec{J}}_l^T \tilde{{\varvec{Z}}}_i\right] _{l=1}^d=\sum _{i=1}^n a_i \xi _i, \end{aligned}$$

where \(a_i^2=\frac{1}{n} G\left( {\varvec{X}}_i, {\varvec{Z}}_i, h_2\right) \zeta ^T\left\{ \left[ {\dot{m}}_l\left( {\varvec{Z}}_i^T \varvec{\beta }_l^0\right) X_{i l} {\varvec{J}}_l^T \tilde{{\varvec{Z}}}_i\right] _{l=1}^d\right\} ^{\otimes 2} \zeta\). By applying the Slutsky theorem and the Lindeberg-Feller central limit theorem, we have

$$\begin{aligned} \sqrt{n} \varvec{\Psi }_n {\mathop {\rightarrow }\limits ^{d}} N({\textbf{0}}, \varvec{\Psi }), \end{aligned}$$
(26)

where \(\varvec{\Psi }=E\left( G\left( {\varvec{X}}, {\varvec{Z}}, h_2\right) \left\{ \left[ {\dot{m}}_l\left( {\varvec{Z}}^T \varvec{\beta }_l^0\right) X_l {\varvec{J}}_l^T \tilde{{\varvec{Z}}}\right] _{l=1}^d\right\} ^{\otimes 2}\right) .\)

Taking a Taylor expansion of \(I_{15}\), we have

$$\begin{aligned} I_{15}=Q\left( {\varvec{m}}, \varvec{\beta }_{-1}\right)&=Q\left( {\varvec{m}}, \varvec{\beta }_{-1}^{0}\right) -\varvec{\Phi }_{n}\left( \varvec{\beta }_{-1} -\varvec{\beta }_{-1}^{0}\right) +o_{p}\left( n^{-1 / 2}\right) \nonumber \\&\quad =-\varvec{\Phi }_{n}\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^{0}\right) +o_{p}\left( n^{-1 / 2}\right) \nonumber \\&\quad =-\varvec{\Phi }\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^{0}\right) +o_{p}\left( n^{-1 / 2}\right) \end{aligned}$$
(27)

uniformly for \(\varvec{\beta }_{-1} \in \Theta _{n,-1}\), where

$$\begin{aligned} \varvec{\Phi }_{n}=\frac{1}{n} \sum _{i=1}^{n} \ddot{\phi }\left\{ \varepsilon _{i}\right\} \left\{ \left[ {\dot{m}}_{l}\left( {\varvec{Z}}_{i}^{T} \varvec{\beta }_{l}^{0}\right) X_{i l} {\varvec{J}}_{l}^{T} \tilde{{\varvec{Z}}}_{i}\right] _{l=1}^{d}\right\} ^{\otimes 2}, \end{aligned}$$

and

$$\begin{aligned} \varvec{\Phi }=E\left( F\left( {\varvec{X}}, {\varvec{Z}}, h_{2}\right) \left\{ \left[ {\dot{m}}_{l}\left( {\varvec{Z}}^{T} \varvec{\beta }_{l}^{0}\right) X_{l} {\varvec{J}}_{l}^{T} \tilde{{\varvec{Z}}}\right] _{l=1}^{d}\right\} ^{\otimes 2}\right) . \end{aligned}$$

The last equality in (27) follows from the law of large numbers.

Combining (22), (23), (24), (25), (26) and (27), we have

$$\begin{aligned} \begin{aligned}&\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^0\right) ^T Q_n\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) \\&\quad =\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^0\right) ^T I_{14}-\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^0\right) ^T \varvec{\Phi }\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^0\right) +o_p\left( n^{-1}\right) . \end{aligned} \end{aligned}$$
(28)

By choosing a sufficiently large C, the second term in (28) dominates other terms uniformly.

Next, we consider \(I_{2}=\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^{0}\right) ^{T} {\varvec{b}}_{\lambda }\left( \varvec{\beta }_{-1}\right)\). Define

$$\begin{aligned} {\varvec{b}}_{\lambda }\left( \varvec{\beta }_{-1}^{0}\right)= & {} \left[ {\dot{p}}_{\lambda }\left( |\beta _{12}^{0}|\right) {\text {sgn}}\left( \beta _{12}^{0}\right) , \ldots , {\dot{p}}_{\lambda }\left( |\beta _{1 p}^{0}|\right) {\text {sgn}}\left( \beta _{1 p}^{0}\right) , \ldots ,\right. \\{} & {} \left. {\dot{p}}_{\lambda }\left( |\beta _{d p}^{0}|\right) {\text {sgn}}\left( \beta _{d p}^{0}\right) \right] , \end{aligned}$$

and

$$\begin{aligned} \varvec{\Sigma }_{\lambda }\left( \varvec{\beta }_{-1}^{0}\right) ={\text {diag}}\left\{ \ddot{p}_{\lambda }\left( |\beta _{12}^{0}|\right) , \ldots , \ddot{p}_{\lambda }\left( |\beta _{1 p}^{0}|\right) , \ldots , \ddot{p}_{\lambda }\left( |\beta _{d p}^{0}|\right) \right\} . \end{aligned}$$

Taking a Taylor expansion of \({\varvec{b}}_{\lambda }\left( \varvec{\beta }_{-1}\right)\) at \(\varvec{\beta }_{-1}^{0}\), we have

$$\begin{aligned} \begin{aligned}&|\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^{0}\right) ^{T} {\varvec{b}}_{\lambda }\left( \varvec{\beta }_{-1}\right) |\\&\quad \le |\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^{0}\right) ^{T} {\varvec{b}}_{\lambda }\left( \varvec{\beta }_{-1}^{0}\right) |+|\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^{0}\right) ^{T} \varvec{\Sigma }_{\lambda }\left( \varvec{\beta }_{-1}^{0}\right) \\&\quad \{1+o(1)\} \left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^{0}\right) |\\&\quad \le a_{n} \sqrt{d\left( p-1\right) }\left\| \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^{0}\right\| _{2}+ b_{n}\left\| \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^{0}\right\| _{2}^{2} \\&\quad =O_{p}\left( \alpha _{n}^{2} C\right) +o_{p}\left( \alpha _{n}^{2} C^{2}\right) . \end{aligned} \end{aligned}$$

This is also dominated by the second term of (28). Hence, by choosing a sufficiently large C, we have \(\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^0\right) ^T\left[ Q_n\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) +n {\varvec{b}}_\lambda \left( \varvec{\beta }_{-1}\right) \right] >0\), implying that (21) holds with probability at least \(1-\delta\). This completes the proof of Theorem 1.
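The role of \({\varvec{b}}_{\lambda}\) and of \(a_n\) in the proof can be made concrete with the SCAD penalty derivative of Fan and Li (2001), used here as a hypothetical stand-in for \(p_{\lambda}\) (the paper's actual penalty may differ): for true coefficients bounded away from zero, \({\dot{p}}_{\lambda}(|\beta_{lj}^{0}|)\) vanishes once \(|\beta_{lj}^{0}| > a\lambda\), so \(a_n\) does not inflate the \(\alpha_n = O(n^{-1/2}+a_n)\) rate, while coefficients near zero still receive a derivative of size \(\lambda\), which drives the sparsity argument in Theorem 3.

```python
import numpy as np

def scad_deriv(theta, lam, a=3.7):
    # Derivative of the SCAD penalty (Fan & Li, 2001); a hypothetical
    # choice of p_lambda. It equals lam on [0, lam], decays linearly
    # on (lam, a*lam], and vanishes for |theta| > a*lam.
    theta = np.abs(theta)
    small = theta <= lam
    mid = np.maximum(a * lam - theta, 0.0) / ((a - 1.0) * lam)
    return lam * np.where(small, 1.0, mid)

beta0 = np.array([1.5, -0.8, 0.0])  # true coefficients; last one is zero
lam = 0.1
print(scad_deriv(beta0, lam))  # zero for the large coefficients, lam for the zero one
```

Here the large coefficients incur no penalty derivative, so \(a_n = \max_j {\dot{p}}_{\lambda}(|\beta_{lj}^{0}|)\) over the nonzero coefficients is 0, matching the unbiasedness heuristic behind the proof.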

Proof of Theorem 2

Theorem 2 follows from Theorem 1 and Lemma 1.

Proof of Theorem 3

It is sufficient to show that, with probability tending to 1 as \(n \rightarrow \infty\), for any \(\hat{\varvec{\beta }}_{-1}^{(1)}\) satisfying \(\hat{\varvec{\beta }}_{-1}^{(1)}-\varvec{\beta }_{-1}^{0(1)}=O_{P}\left( n^{-1 / 2}+a_{n}\right)\), for some small \(\varepsilon _{n}= C n^{-1 / 2}\) and for \(l=1, \ldots , d, j=s_{l}+1, \ldots , p\), we have

$$\begin{aligned} Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) + b_{\lambda }\left( \beta _{l j,-1}\right) =\left\{ \begin{array}{l}<0,\quad 0<\beta _{l j,-1}<\varepsilon _{n}, \\ >0,\quad -\varepsilon _{n}<\beta _{l j,-1}<0. \end{array}\right. \end{aligned}$$

From the proof of Theorem 1, we have \(Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{lj,-1}\right) =O_{p}\left( 1/\sqrt{n}\right)\). Using \(\sqrt{1 / n} / \lambda \rightarrow 0\) and condition (C8), for \(l=1, \ldots , d, j=s_{l}+1, \ldots , p\), we have

$$\begin{aligned} \begin{aligned}&Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{lj,-1}\right) +b_{\lambda }\left( \beta _{l j,-1}\right) \\&\quad = \lambda \left\{ O_{p}\left( \frac{\sqrt{1 / n}}{\lambda }\right) +\frac{{\dot{p}}_{\lambda }\left( |\beta _{l j,-1}|\right) }{\lambda } {\text {sgn}}\left( \beta _{l j,-1}\right) \right\} . \end{aligned} \end{aligned}$$

Obviously, the sign of \(\beta _{l j,-1}\) determines the sign of \(Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{lj,-1}\right) +b_{\lambda }\left( \beta _{l j,-1}\right)\). This implies that \(\hat{\beta }_{ l j,-1}=0\) with probability converging to 1 for \(l=1, \ldots , d, j=s_{l}+1, \ldots , p\). This completes the proof of Theorem 3 (i).

Next, we prove (ii). By Theorem 1 and Theorem 3 (i), there exists \(\hat{\varvec{\beta }}_{-1}^{(1)}\) satisfying the following equations:

$$\begin{aligned} Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{lj,-1}^{(1)}\right) +b_{\lambda }\left( \beta _{l j,-1}^{(1)}\right) =0,\quad l=1, \ldots , d,\ j=s_{l}+1, \ldots , p. \end{aligned}$$

Then, by a Taylor expansion, we have

$$\begin{aligned} \begin{aligned}&Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{lj,-1}^{(1)}\right) +b_{\lambda }\left( \beta _{l j,-1}^{(1)}\right) \\&\quad =Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{lj,-1}^{0(1)}\right) +\frac{\partial Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{lj,-1}^{0(1)}\right) }{\partial \beta _{l j,-1}^{(1)}}\left( \beta _{l j}^{(1)}-\beta _{l j}^{0(1)}\right) \{1+o(1)\} \\&\quad +{\dot{p}}_{\lambda }\left( \left|\beta _{l j}^{0(1)}\right|\right) {\text {sgn}}\left( \beta _{l j}^{0(1)}\right) +\ddot{p}_{\lambda }\left( \left|\beta _{l j}^{0(1)}\right|\right) \{1+o(1)\}\left( \beta _{l j}^{(1)}-\beta _{l j}^{0(1)}\right) , \end{aligned} \end{aligned}$$

where \(l=1, \ldots , d, j=s_{l}+1, \ldots , p\). It follows from the Slutsky theorem and the central limit theorem that

$$\begin{aligned} \sqrt{n}\left( \varvec{\Phi }+\varvec{\Sigma }\right) \left\{ \hat{\varvec{\beta }}_{-1}^{(1)}-\varvec{\beta }_{-1}^{0(1)}+ (\varvec{\Phi }+\varvec{\Sigma })^{-1} {\textbf{b}}\right\} \rightarrow N \left\{ {\textbf{0}}, \varvec{\Psi } \right\} . \end{aligned}$$

This completes the proof of Theorem 3.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zou, H., Jiang, Y. Robust variable selection for the varying index coefficient models. J. Korean Stat. Soc. 52, 767–793 (2023). https://doi.org/10.1007/s42952-023-00221-8

