Abstract
Recently, statistical inference for the varying index coefficient model has attracted considerable attention. However, to the best of our knowledge, no existing variable selection method for the varying index coefficient model is robust to outliers in both the response and the covariates. To address this gap, we develop a robust variable selection method for the varying index coefficient model based on the exponential squared loss (ESL) function. We first approximate the nonparametric functions by B-spline basis functions, and then compute the proposed estimators with the minorization-maximization (MM) algorithm combined with the Fisher scoring algorithm. Under mild conditions, we establish the theoretical properties of the proposed estimators. Furthermore, we propose a data-driven procedure for selecting the tuning parameters. Numerical simulations illustrate the finite-sample performance of the proposed method, and an analysis of New Zealand workforce data demonstrates its practical merit.
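To make the robustness mechanism concrete, the following is a minimal numerical sketch, assuming the ESL takes the form \(\phi _\tau (t)=1-\exp (-t^2/\tau )\) used in Wang et al. (2013); the function names are illustrative, not from the paper.

```python
import numpy as np

def esl_loss(t, tau=1.0):
    """Exponential squared loss: phi_tau(t) = 1 - exp(-t^2 / tau).
    The loss is bounded above by 1, so a single large residual
    contributes at most 1 to the objective, which limits the
    influence of outliers on the fitted coefficients."""
    return 1.0 - np.exp(-t ** 2 / tau)

residuals = np.array([0.1, 1.0, 10.0, 100.0])
print(esl_loss(residuals))   # saturates near 1 as |t| grows
print(residuals ** 2)        # squared loss grows without bound
```

Minimizing this loss is equivalent to maximizing \(\sum _i \exp (-t_i^2/\tau )\), which is the objective the MM algorithm targets. A small \(\tau\) yields greater robustness at some efficiency cost, while for large \(\tau\) the loss behaves like \(t^2/\tau\) near zero and the fit approaches least squares; the data-driven procedure mentioned above selects \(\tau\) along with the penalty parameter.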
Data availability
The data that support the findings of this study are openly available in R package VGAMdata.
References
Cai, Z., Fan, J., & Li, R. (2000). Efficient estimation and inferences for varying-coefficient models. Journal of the American Statistical Association, 95(451), 888–902.
Carroll, R. J., Fan, J., Gijbels, I., & Wand, M. P. (1997). Generalized partially linear single-index models. Journal of the American Statistical Association, 92(438), 477–489.
De Boor, C. (2001). A practical guide to splines. Springer.
Doukhan, P., Massart, P., & Rio, E. (1995). Invariance principles for absolutely regular empirical processes. Annales de l'IHP Probabilités et statistiques, 31(2), 393–427.
Fan, J., & Jiang, J. (2005). Nonparametric inferences for additive models. Journal of the American Statistical Association, 100(471), 890–907.
Fan, J., & Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. The Annals of Statistics, 32(3), 928–961.
Fan, J., & Zhang, W. (2008). Statistical methods with varying coefficient models. Statistics and Its Interface, 1, 179–195.
Hastie, T., & Tibshirani, R. (1986). Generalized additive models. Statistical Science, 1(3), 297–318.
Hastie, T., & Tibshirani, R. (1993). Varying-coefficient models. Journal of the Royal Statistical Society: Series B (Methodological), 55(4), 757–779.
Huang, J., Horowitz, J. L., & Wei, F. (2010). Variable selection in nonparametric additive models. The Annals of Statistics, 38(4), 2282–2313.
Hunter, D. R., & Lange, K. (2004). A tutorial on MM algorithms. The American Statistician, 58(1), 30–37.
Jiang, Y., Tian, G.-L., & Fei, Y. (2019). A robust and efficient estimation method for partially nonlinear models via a new MM algorithm. Statistical Papers, 60(6), 2063–2085.
Liang, H., Liu, X., Li, R., & Tsai, C.-L. (2010). Estimation and testing for partially linear single-index models. The Annals of Statistics, 38(6), 3811–3836.
Lv, J., & Li, J. (2022). High-dimensional varying index coefficient quantile regression model. Statistica Sinica, 32(2), 673–694.
Lv, J., Yang, H., & Guo, C. (2016). Robust estimation for varying index coefficient models. Computational Statistics, 31(3), 1131–1167.
Ma, S., & Song, P.X.-K. (2015). Varying index coefficient models. Journal of the American Statistical Association, 110(509), 341–356.
Ma, S., & Xu, S. (2015). Semiparametric nonlinear regression for detecting gene and environment interactions. Journal of Statistical Planning and Inference, 156, 31–47.
Ma, S., & Yang, L. (2011). Spline-backfitted kernel smoothing of partially linear additive model. Journal of Statistical Planning and Inference, 141, 204–219.
Na, S., Yang, Z., Wang, Z., & Kolar, M. (2019). High-dimensional varying index coefficient models via Stein's identity. Journal of Machine Learning Research, 20(152), 1–44.
Neykov, N., Čížek, P., Filzmoser, P., & Neytchev, P. (2012). The least trimmed quantile regression. Computational Statistics & Data Analysis, 56(6), 1757–1770.
Schumaker, L. (1981). Spline functions: Basic theory. Wiley.
Song, Y., Jian, L., & Lin, L. (2016). Robust exponential squared loss-based variable selection for high-dimensional single-index varying-coefficient model. Journal of Computational and Applied Mathematics, 308, 330–345.
Wang, X., Jiang, Y., Huang, M., & Zhang, H. (2013). Robust variable selection with exponential squared loss. Journal of the American Statistical Association, 108(502), 632–643.
Wang, H., Li, B., & Leng, C. (2009). Shrinkage tuning parameter selection with a diverging number of parameters. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 71(3), 671–683.
Wang, K., & Lin, L. (2016). Robust structure identification and variable selection in partial linear varying coefficient models. Journal of Statistical Planning and Inference, 174, 153–168.
Wang, L., Liu, X., Liang, H., & Carroll, R. J. (2011). Estimation and variable selection for generalized additive partial linear models. The Annals of Statistics, 39(4), 1827–1851.
Whang, Y. J. (2006). Smoothed empirical likelihood methods for quantile regression models. Econometric Theory, 22(2), 173–205.
Xia, Y., & Härdle, W. (2006). Semi-parametric estimation of partially linear single-index models. Journal of Multivariate Analysis, 97(5), 1162–1184.
Yao, W., & Li, L. (2014). A new regression model: modal linear regression. Scandinavian Journal of Statistics, 41(3), 656–671.
Yu, Y., & Ruppert, D. (2002). Penalized spline estimation for partially linear single-index models. Journal of the American Statistical Association, 97(460), 1042–1054.
Zhao, W., Zhang, R., Liu, J., & Lv, Y. (2014). Robust and efficient variable selection for semiparametric partially linear varying coefficient model based on modal regression. Annals of the Institute of Statistical Mathematics, 66(1), 165–191.
Funding
Jiang’s research is partially supported by NSFC (12171203), the Fundamental Research Funds for the Central Universities (23JNQMX21) and the Natural Science Foundation of Guangdong (2022A1515010045).
Appendix
Lemma 1
Under conditions (C1)–(C4), if \(N_{n} \rightarrow \infty\) and \(n N_{n}^{-1} \rightarrow \infty\) as \(n \rightarrow \infty\), then (i) \(|{\hat{m}}_{l}\left( u_{l}, \varvec{\beta }^{0}\right) -m_{l}\left( u_{l}\right) |=O_{p}\left( \sqrt{N_{n} / n}+N_{n}^{-r}\right)\) uniformly for any \(u_{l} \in [0,1]\); and (ii) if in addition \(n N_{n}^{-3} \rightarrow \infty\) as \(n \rightarrow \infty\), then \(|\hat{{\dot{m}}}_{l}\left( u_{l}, \varvec{\beta }^{0}\right) -{\dot{m}}_{l}\left( u_{l}\right) |= O_{p}\left( \sqrt{N_{n}^{3} / n}+N_{n}^{-r+1}\right)\) uniformly for any \(u_{l} \in [0,1]\).
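The rate in Lemma 1 combines two pieces: an estimation error of order \(\sqrt{N_n/n}\) from fitting \(N_n\)-many spline coefficients with \(n\) observations, and an approximation bias of order \(N_n^{-r}\) from replacing the \(r\)-smooth function \(m_l\) by a spline. The bias piece can be checked numerically; the sketch below is illustrative only (a plain Cox-de Boor recursion and a least-squares projection onto a cubic B-spline basis with equally spaced interior knots, all names our own), showing that the sup-norm error of the spline fit shrinks as the basis grows.

```python
import numpy as np

def bspline_basis(u, knots, degree):
    """All B-spline basis functions of `degree` at the points `u`,
    built with the Cox-de Boor recursion.  `knots` must be a
    non-decreasing knot vector with repeated boundary knots."""
    u = np.asarray(u, dtype=float)
    # degree-0 basis: indicator of each half-open knot span
    B = np.zeros((u.size, knots.size - 1))
    for j in range(knots.size - 1):
        B[:, j] = (knots[j] <= u) & (u < knots[j + 1])
    for d in range(1, degree + 1):
        Bn = np.zeros((u.size, knots.size - d - 1))
        for j in range(knots.size - d - 1):
            left = knots[j + d] - knots[j]
            right = knots[j + d + 1] - knots[j + 1]
            if left > 0:
                Bn[:, j] += (u - knots[j]) / left * B[:, j]
            if right > 0:
                Bn[:, j] += (knots[j + d + 1] - u) / right * B[:, j + 1]
        B = Bn
    return B  # shape: (len(u), len(knots) - degree - 1)

def sup_error(f, n_interior, degree=3, n_grid=500):
    """Sup-norm error of the least-squares spline fit to f on [0, 1)."""
    knots = np.concatenate([np.zeros(degree + 1),
                            np.linspace(0, 1, n_interior + 2)[1:-1],
                            np.ones(degree + 1)])
    u = np.linspace(0.0, 1.0, n_grid, endpoint=False)
    B = bspline_basis(u, knots, degree)
    coef, *_ = np.linalg.lstsq(B, f(u), rcond=None)
    return np.max(np.abs(B @ coef - f(u)))

f = lambda u: np.sin(2 * np.pi * u)
errors = [sup_error(f, N) for N in (2, 8, 32)]
print(errors)  # decreasing as the number of interior knots grows
```

In the lemma, the \(N_n^{-r}\) term reflects exactly this decay for an \(r\)-smooth target, so the two terms in the rate trade off: more knots reduce the bias but inflate the \(\sqrt{N_n/n}\) estimation error.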
Proof of Lemma 1
Denote \({\textbf{m}}=\left\{ m\left( {\textbf{Z}}_1, {\textbf{X}}_1, \varvec{\beta }^0\right), \ldots , m\left( {\textbf{Z}}_n, {\textbf{X}}_n, \varvec{\beta }^0\right) \right\} ^{\textrm{T}}\). By (5), \({\widehat{\lambda }}(\varvec{\beta })\) can be decomposed as \({\widehat{\lambda }}(\varvec{\beta })={\widehat{\lambda }}_m(\varvec{\beta }) +{\widehat{\lambda }}_{\epsilon }(\varvec{\beta })\), where
Let \({\widehat{\lambda }}_\epsilon (\varvec{\beta })=\left\{ {\widehat{\lambda }}_{1, \epsilon }(\varvec{\beta })^{\textrm{T}}, \ldots , \hat{\lambda }_{d, \epsilon }(\varvec{\beta })^{\textrm{T}}\right\} ^{\textrm{T}}\), where \(\hat{\lambda }_{l, \epsilon }(\varvec{\beta })=\) \(\left\{ {\widehat{\lambda }}_{s, l, \epsilon }(\varvec{\beta }): 1 \le s \le J_n\right\} ^{\textrm{T}}\) and \({\widehat{\lambda }}_m(\varvec{\beta })=\left\{ {\widehat{\lambda }}_{1, m}(\varvec{\beta })^{\textrm{T}}, \ldots , {\widehat{\lambda }}_{d, m}(\varvec{\beta })^{\textrm{T}}\right\} ^{\textrm{T}}\), where \({\widehat{\lambda }}_{l, m}(\varvec{\beta })=\left\{ {\widehat{\lambda }}_{s, l, m}(\varvec{\beta }): 1 \le s \le J_n\right\} ^{\textrm{T}}\). Thus,
where
For \(m_l\) satisfying Condition (C2), there is a function \(m_l^0\left( u_l\right) ={\textbf{B}}_r\left( u_l\right) ^{\textrm{T}} \varvec{\lambda }_l\), such that
From the similar arguments as Lemma A.1 of Ma and Xu (2015), we have
By Taylor expansion, for \(1 \le l \le d\),
where \({\varvec{1}}_l\) is the \(d \times 1\) vector with the l-th element as “1" and other elements as “0", \({\varvec{B}}({\varvec{u}})=diag\left( {\varvec{B}}_q\left( u_1\right) ^T, {\varvec{B}}_q\left( u_2\right) ^T,\ldots ,{\varvec{B}}_q\left( u_d\right) ^T\right) _{d \times d J_n}\) with \({\varvec{u}}=\left( u_1, \ldots , u_d\right) ^T\), \(\varvec{\Omega }_n\left( \varvec{\beta }^0\right) =n^{-1} \sum _{i=1}^n \ddot{\phi }_{\tau }\left\{ Y_i-\sum _{l=1}^d m_l\left( {\varvec{Z}}_i^T \varvec{\beta }_l^0\right) X_{i l}\right\} {\varvec{D}}_i\left( \varvec{\beta }^0\right) {\varvec{D}}_i\left( \varvec{\beta }^0\right) ^T\),
\({\varvec{D}}_i\left( \varvec{\beta }^0\right) =\left( D_{i, s l}\left( \varvec{\beta }_l^0\right) , 1 \le s \le J_n, 1 \le l \le d\right) ^T\) and \(D_{i, s l}\left( \varvec{\beta }_l^0\right) =\) \(B_{s, q}\left( U_{i l}\left( \varvec{\beta }_l^0\right) \right) X_{i l}\).
Combining (17) and (18), we have \(|{\hat{m}}_l\left( u_l, \varvec{\beta }^0\right) -m_l\left( u_l\right) |=O_p\left( \sqrt{N_n / n}+N_n^{-r}\right)\) uniformly for any \(u_l \in [0,1]\). This proves (i).
Following similar reasoning as in the proof for \({\hat{m}}_l\left( u_l, \varvec{\beta }^0\right)\), one can prove that \(|\hat{{\dot{m}}}_l\left( u_l, \varvec{\beta }^0\right) -{\dot{m}}_l\left( u_l\right) |=\) \(O_p\left( \sqrt{N_n^3 / n}+N_n^{-r+1}\right)\) uniformly for any \(u_l \in [0,1]\). This completes the proof of (ii). \(\square\)
Let
for some positive constant C
Proof of Theorem 1
Let \(\alpha _{n}=O\left( n^{-1 / 2}+a_{n}\right)\). It suffices to show that for any given \(\delta >0\), there exists a large constant C such that for all n sufficiently large,
Note that
As for \(Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right)\), by some calculations, we have
Noting that \(Q\left( {\varvec{m}}, \varvec{\beta }_{-1}^{0}\right) =0\), we have
By verifying the conditions of Theorem 1 in Doukhan et al. (1995), we can show that the empirical process \(\left\{ r_n\left( {\varvec{m}}, \varvec{\beta }_{-1}\right) : {\varvec{m}} \in {\mathcal {G}}_1, \varvec{\beta }_{-1} \in \Theta _{1,-1}\right\}\) is stochastically equicontinuous, where \({\mathcal {G}}_1\) is defined in (20) with \(\delta =1\) and \(\Theta _{1,-1}\) is defined in (19) with \(n=1\). Therefore, \(r_n\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) -r_n\left( {\varvec{m}}, \varvec{\beta }_{-1}^0\right) = o_p(1)\) uniformly for \(\hat{{\varvec{m}}} \in {\mathcal {G}}_1\) and \(\varvec{\beta }_{-1} \in \Theta _{1,-1}\). It follows that
Taking a Taylor expansion of \(I_{12}\), we have
By condition (C4) and Lemma 1, we can prove
Similarly, by condition (C2) and Lemma 1, we obtain
As for \(I_{14}\), by Lemma 1, we have
For any vector \(\zeta\) whose components are not all zero, conditional on \(\left\{ {\varvec{X}}_i, {\varvec{Z}}_i\right\}\) the \(\xi _i\) are independent with mean zero and variance one, so we have
where \(a_i^2=\frac{1}{n} G\left( {\varvec{X}}_i, {\varvec{Z}}_i, h_2\right) \zeta ^T\left\{ \left[ {\dot{m}}_l\left( {\varvec{Z}}_i^T \varvec{\beta }_l^0\right) X_{i l} {\varvec{J}}_l^T \tilde{{\varvec{Z}}}_i\right] _{l=1}^d\right\} ^{\otimes 2} \zeta\). By the Slutsky theorem and the Lindeberg–Feller central limit theorem, we have
where \(\varvec{\Psi }=E\left( G\left( {\varvec{X}}, {\varvec{Z}}, h_2\right) \left\{ \left[ {\dot{m}}_l\left( {\varvec{Z}}^T \varvec{\beta }_l^0\right) X_l {\varvec{J}}_l^T \tilde{{\varvec{Z}}}\right] _{l=1}^d\right\} ^{\otimes 2}\right) .\)
Taking a Taylor expansion of \(I_{15}\), we have
uniformly for \(\varvec{\beta }_{-1} \in \Theta _{n,-1}\), where
and
The last equality in (27) follows from the law of large numbers.
Combining (22), (23), (24), (25), (26) and (27), we have
By choosing a sufficiently large C, the second term in (28) uniformly dominates the other terms.
Next, we consider \(I_{2}=\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^{0}\right) ^{T} {\varvec{b}}_{\lambda }\left( \varvec{\beta }_{-1}\right)\). Define
and
Taking a Taylor expansion of \({\varvec{b}}_{\lambda }\left( \varvec{\beta }_{-1}\right)\) at \(\varvec{\beta }_{-1}^{0}\), we have
This term is also dominated by the second term of (28). Hence, by choosing a sufficiently large C, we have \(\left( \varvec{\beta }_{-1}-\varvec{\beta }_{-1}^0\right) ^T\left[ Q_n\left( \hat{{\varvec{m}}}, \varvec{\beta }_{-1}\right) +n {\varvec{b}}_\lambda \left( \varvec{\beta }_{-1}\right) \right] >0\), implying that (17) holds with probability at least \(1-\delta\). This completes the proof of Theorem 1. \(\square\)
Proof of Theorem 2
Theorem 2 follows from Theorem 1 and Lemma 1.
Proof of Theorem 3
It suffices to show that, with probability tending to 1 as \(n \rightarrow \infty\), for any \(\hat{\varvec{\beta }}_{1}\) satisfying \(\varvec{\beta }_{-1}^{(1)}-\varvec{\beta }_{-1}^{0(1)}=O_{P}\left( n^{-1 / 2}+a_{n}\right)\), for some small \(\varepsilon _{n}= C n^{-1 / 2}\), and for \(l=1, \ldots , d\), \(j=s_{l}+1, \ldots , p\), we have
From the proof of Theorem 1, we have \(Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{lj,-1}\right) =O_{p}\left( 1/\sqrt{n}\right)\). Using \(\sqrt{1 / n} / \lambda \rightarrow 0\) and condition (C8), for \(l=1, \ldots , d, j=s_{l}+1, \ldots , p\), we have
Obviously, the sign of \(\beta _{l j,-1}\) determines the sign of \(Q_{n}\left( \hat{{\varvec{m}}}, \varvec{\beta }_{lj,-1}\right) +b_{\lambda }\left( \beta _{l j,-1}\right)\). This implies that \(\hat{\beta }_{ l j,-1}=0\) with probability tending to 1 for \(l=1, \ldots , d\), \(j=s_{l}+1, \ldots , p\). This completes the proof of Theorem 3 (i).
Next, we prove (ii). By Theorem 1 and Theorem 3 (i), there exists \(\hat{\varvec{\beta }}_{-1}^{(1)}\) satisfying the following equations:
Then, by a Taylor expansion, we have
where \(l=1, \ldots , d\), \(j=s_{l}+1, \ldots , p\). It follows from the Slutsky theorem and the central limit theorem that
This completes the proof of Theorem 3.
Cite this article
Zou, H., Jiang, Y. Robust variable selection for the varying index coefficient models. J. Korean Stat. Soc. 52, 767–793 (2023). https://doi.org/10.1007/s42952-023-00221-8