Variable selection of higher-order partially linear spatial autoregressive model with a diverging number of parameters

Abstract

In this paper, we consider the problem of variable selection in the higher-order partially linear spatial autoregressive model with a diverging number of parameters. By combining the series approximation method, the two-stage least squares method and a class of non-convex penalty functions, we propose a variable selection method that simultaneously selects the significant spatial lags of the response variable and the significant explanatory variables in the parametric component, and estimates the corresponding nonzero parameters. Unlike existing variable selection methods for spatial autoregressive models, the proposed method can select significant explanatory variables and spatial lags of the response variable at the same time. Under appropriate conditions, we establish the rate of convergence of the penalized estimator of the parameter vector in the parametric component and the uniform rate of convergence of the series estimator of the nonparametric component, and show that the proposed variable selection method enjoys the oracle property. That is, it estimates the zero parameters as exactly zero with probability approaching one, and estimates the nonzero parameters as efficiently as if the true model were known in advance. Simulation studies show that the proposed variable selection method has satisfactory finite-sample properties. In particular, when the sample size is moderate, the method works well even when the correlation among the explanatory variables in the parametric component is strong. An application of the proposed variable selection method to the Boston house price data serves as a practical illustration.



Acknowledgements

The authors are grateful to the editor Christine H. Müller and the reviewers for their constructive comments and suggestions, which led to an improved version of this paper. This research was supported by the Natural Science Foundation of Shaanxi Province [Grant Number 2021JM349], the National Statistical Science Project [Grant Number 2019LY36] and the National Natural Science Foundation of China [Grant Number 11972273].

Author information

Corresponding author

Correspondence to Tizheng Li.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.


Appendix: Proofs


In this appendix, we give the technical proofs of Theorems 1–3. In our proofs, we will frequently use the following three facts.

Fact 1. If the row and column sums of \(n{\times }n\) matrices \({\mathbf{A}}_{n1}\) and \({\mathbf{A}}_{n2}\) are uniformly bounded in absolute value, then the row and column sums of \({\mathbf{A}}_{n1}{\mathbf{A}}_{n2}\) and \({\mathbf{A}}_{n2}{\mathbf{A}}_{n1}\) are also uniformly bounded in absolute value.

Fact 2. The largest eigenvalue of an idempotent matrix is at most one.

Fact 3. For any \(n{\times }n\) matrix \({\mathbf{A}}_{n}\), its spectral radius is bounded by \(\mathrm{{max}}_{1{\le }i{\le }n} \sum _{j=1}^{n}|a_{n,ij}|\), where \(a_{n,ij}\) is the \((i,j)\)th element of \({\mathbf{A}}_{n}\).
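Facts 1–3 are standard matrix-analysis results. The following numerical sketch (illustrative only; the random matrices and variable names are ours, not part of the proofs) checks Facts 2 and 3:

```python
import numpy as np

rng = np.random.default_rng(0)

# Fact 2: an idempotent matrix, e.g. the projection H(H^T H)^{-1} H^T,
# has eigenvalues in {0, 1}, so its largest eigenvalue is at most one.
H = rng.standard_normal((50, 5))
M = H @ np.linalg.solve(H.T @ H, H.T)
assert np.allclose(M @ M, M)                      # idempotency
eigvals = np.linalg.eigvalsh(M)
assert eigvals.max() <= 1.0 + 1e-10

# Fact 3: the spectral radius of any square matrix A is bounded by
# the maximum absolute row sum max_i sum_j |a_ij|.
A = rng.standard_normal((50, 50))
spectral_radius = np.abs(np.linalg.eigvals(A)).max()
row_sum_bound = np.abs(A).sum(axis=1).max()
assert spectral_radius <= row_sum_bound
```

Fact 1 can be checked in the same way, since the row (column) sums of a product are bounded by products of the row (column) sum bounds of the factors.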

Proof of Theorem 1

Let \({\alpha }_{n}={\sqrt{p_{n}/n}}+K^{-{\delta }}\) and \({\varvec{\theta }}={\varvec{\theta }}_{0}+{\alpha }_{n}{\mathbf{u}}\). It suffices to prove that for any given \({\eta }>0\), there exists a sufficiently large positive constant C such that

$$\begin{aligned} P\left( \inf _{\Vert {\mathbf{u}}\Vert =C}Q({\varvec{\theta }}_{0}+{\alpha }_{n}{\mathbf{u}})> Q({\varvec{\theta }}_{0})\right) {\ge }1-{\eta }. \end{aligned}$$
(14)

This implies that with probability at least \(1-{\eta }\), there exists a local minimizer \(\widehat{\varvec{\theta }}\) in the ball \(\{{\varvec{\theta }}_{0}+{\alpha }_{n}{\mathbf{u}}:{\Vert {\mathbf{u}}\Vert {\le }C}\}\) such that \(\Vert \widehat{\varvec{\theta }}-{\varvec{\theta }}_{0}\Vert = \mathrm{{O}}_{p}({\alpha }_{n})\).

Let

$$\begin{aligned} {C}_{n1}=\Vert \widetilde{\mathbf{Y}}_{n}-{\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n}({\varvec{\theta }}_{0}+{\alpha }_{n}{\mathbf{u}})\Vert ^{2} -\Vert \widetilde{\mathbf{Y}}_{n}-{\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n}{\varvec{\theta }}_{0}\Vert ^{2} \end{aligned}$$

and

$$\begin{aligned} {C}_{n2}=n\sum _{j=1}^{t+l}\left[ p_{{\lambda }_{j}}(|{\theta }_{j0}+{\alpha }_{n}u_{j}|)- p_{{\lambda }_{j}}(|{\theta }_{j0}|)\right] . \end{aligned}$$

It follows from the assumptions about the penalty function that \(p_{{\lambda }_{j}}(0)=0\) and \(p_{{\lambda }_{j}}(\cdot )\) is increasing on \([0,{\infty })\). Thus, we have

$$\begin{aligned}&Q({\varvec{\theta }}_{0}+{\alpha }_{n}{\mathbf{u}})-Q({\varvec{\theta }}_{0})\\&\quad ={\frac{1}{2}} {C}_{n1}+n\sum _{j=1}^{p_{n}+r}\left[ p_{{\lambda }_{j}}(|{\theta }_{j0}+{\alpha }_{n}u_{j}|)- p_{{\lambda }_{j}}(|{\theta }_{j0}|)\right] \\&\quad ={\frac{1}{2}} {C}_{n1}+n\sum _{j=1}^{t+l}\left[ p_{{\lambda }_{j}}(|{\theta }_{j0}+{\alpha }_{n}u_{j}|)- p_{{\lambda }_{j}}(|{\theta }_{j0}|)\right] +n\sum _{j=t+l+1}^{p_{n}+r}p_{{\lambda }_{j}}({\alpha }_{n}|u_{j}|)\\&\quad {\ge }\,{\frac{1}{2}} {C}_{n1}+n\sum _{j=1}^{t+l}\left[ p_{{\lambda }_{j}}(|{\theta }_{j0}+{\alpha }_{n}u_{j}|)- p_{{\lambda }_{j}}(|{\theta }_{j0}|)\right] \\&\quad ={\frac{1}{2}}{C}_{n1}+{C}_{n2}. \end{aligned}$$

For \({C}_{n1}\), we have

$$\begin{aligned} {C}_{n1}= & {} \Vert \widetilde{\mathbf{Y}}_{n}-{\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n}({\varvec{\theta }}_{0}+{\alpha }_{n}{\mathbf{u}})\Vert ^{2} -\Vert \widetilde{\mathbf{Y}}_{n}-{\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n}{\varvec{\theta }}_{0}\Vert ^{2}\\= & {} \Vert (\widetilde{\mathbf{Y}}_{n}-{\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n}{\varvec{\theta }}_{0})-{\alpha }_{n}{\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n}{\mathbf{u}}\Vert ^{2} -\Vert \widetilde{\mathbf{Y}}_{n}-{\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n}{\varvec{\theta }}_{0}\Vert ^{2}\\= & {} -2{\alpha }_{n}(\widetilde{\mathbf{Y}}_{n}- {\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n}{\varvec{\theta }}_{0})^{{\mathrm{T}}}{\mathbf{M}}_{n} \widetilde{\mathbf{D}}_{n}{\mathbf{u}}+{\alpha }_{n}^{2}{\mathbf{u}}^{{\mathrm{T}}}\widetilde{\mathbf{D}}_{n}^{{\mathrm{T}}}{\mathbf{M}}_{n} \widetilde{\mathbf{D}}_{n}{\mathbf{u}}\\&{\overset{\triangle }{=}}&-2{\alpha }_{n}{D}_{n1}+{\alpha }_{n}^{2}{D}_{n2}. \end{aligned}$$

Let \({\mathbf{V}}_{n}={\mathbf{m}}_{0}({\mathbf{Z}}_{n})-{\mathbf{P}}_{n}{\varvec{\nu }}_{0}\) and \({\overline{\varvec{\varepsilon }}}_{n}=({\mathbf{G}}_{n1}{\varvec{\varepsilon }}_{n}, \ldots ,{\mathbf{G}}_{nr}{\varvec{\varepsilon }}_{n},{\mathbf{0}}_{n{\times }p_{n}})\). Then it is easy to show that \({\mathbf{D}}_{n}={\overline{\mathbf{D}}}_{n}+{\overline{\varvec{\varepsilon }}}_{n}\). Thus, \({D}_{n1}\) can be decomposed as

$$\begin{aligned} {D}_{n1}= & {} (\widetilde{\mathbf{Y}}_{n}- {\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n}{\varvec{\theta }}_{0})^{{\mathrm{T}}}{\mathbf{M}}_{n} \widetilde{\mathbf{D}}_{n}{\mathbf{u}}\\= & {} [\widetilde{\varvec{\varepsilon }}_{n}+({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{V}}_{n} +({\mathbf{I}}_{n}-{\mathbf{M}}_{n})\widetilde{\mathbf{D}}_{n}{\varvec{\theta }}_{0}]^{{\mathrm{T}}}{\mathbf{M}}_{n} \widetilde{\mathbf{D}}_{n}{\mathbf{u}}\\= & {} \widetilde{\varvec{\varepsilon }}_{n}^{{\mathrm{T}}}{\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n}{\mathbf{u}} +{\mathbf{V}}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n}{\mathbf{u}}\\= & {} {\varvec{\varepsilon }}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{D}}_{n}{\mathbf{u}} +{\mathbf{V}}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{D}}_{n}{\mathbf{u}}\\= & {} {D}_{n11}+{D}_{n12}+{D}_{n13}+{D}_{n14}, \end{aligned}$$

where \({D}_{n11}={\varvec{\varepsilon }}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n})\overline{\mathbf{D}}_{n}{\mathbf{u}}\), \({D}_{n12}={\varvec{\varepsilon }}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n})\overline{\varvec{\varepsilon }}_{n}{\mathbf{u}}\), \({D}_{n13}={\mathbf{V}}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n})\overline{\mathbf{D}}_{n}{\mathbf{u}}\) and \({D}_{n14}={\mathbf{V}}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n})\overline{\varvec{\varepsilon }}_{n}{\mathbf{u}}\).

By Assumption 2 and Fact 2, we have

$$\begin{aligned} \mathrm{{E}}(\Vert {\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\varvec{\varepsilon }}_{n}\Vert ^{2})&=\mathrm{{E}}({\varvec{\varepsilon }}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n} ({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\varvec{\varepsilon }}_{n})\\&={\sigma }_{0}^{2}\mathrm{{tr}}(({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n} ({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}))\\&{\le }\,{\sigma }_{0}^{2}\mathrm{{tr}}({\mathbf{M}}_{n})\\&={\sigma }_{0}^{2}\mathrm{{tr}}({\mathbf{H}}_{n}({\mathbf{H}}_{n}^{{\mathrm{T}}}{\mathbf{H}}_{n})^{-1}{\mathbf{H}}_{n}^{{\mathrm{T}}})\\&={\sigma }_{0}^{2}\mathrm{{tr}}({\mathbf{I}}_{p_{n}+r})\\&=\mathrm{{O}}(p_{n}). \end{aligned}$$

This together with Markov’s inequality implies that

$$\begin{aligned} \Vert {\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\varvec{\varepsilon }}_{n}\Vert ^{2} =\mathrm{{O}}_{P}(p_{n}). \end{aligned}$$
(15)
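The trace calculation behind (15) uses only two facts: \(\mathrm{{E}}({\varvec{\varepsilon }}^{{\mathrm{T}}}{\mathbf{A}}{\varvec{\varepsilon }})={\sigma }^{2}\mathrm{{tr}}({\mathbf{A}})\) for i.i.d. errors, and that the trace of a projection matrix equals its rank. The following Monte Carlo sketch (illustrative only; the dimensions and names are ours) checks this identity numerically:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, sigma2 = 200, 6, 2.0

# Projection onto the column space of a random n x p matrix H;
# its trace equals its rank p, mirroring tr(M_n) = p_n + r in (15).
H = rng.standard_normal((n, p))
M = H @ np.linalg.solve(H.T @ H, H.T)
assert abs(np.trace(M) - p) < 1e-8

# Monte Carlo estimate of E ||M eps||^2 for eps ~ N(0, sigma2 I_n).
reps = 2000
vals = [np.sum((M @ (np.sqrt(sigma2) * rng.standard_normal(n))) ** 2)
        for _ in range(reps)]
mc_mean = np.mean(vals)

# Theory: E ||M eps||^2 = sigma2 * tr(M) = sigma2 * p.
theory = sigma2 * np.trace(M)
assert abs(mc_mean - theory) / theory < 0.1
```

Markov's inequality then converts this \(\mathrm{{O}}(p_{n})\) bound on the expectation into the \(\mathrm{{O}}_{P}(p_{n})\) bound in (15).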

It follows from Assumption 3.4 and Fact 2 that

$$\begin{aligned} \Vert {\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n})\overline{\mathbf{D}}_{n}{\mathbf{u}}\Vert ^{2}&={\mathbf{u}}^{{\mathrm{T}}}\overline{\mathbf{D}}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}) {\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n})\overline{\mathbf{D}}_{n}{\mathbf{u}}\nonumber \\&{\le }\,\,{\mathbf{u}}^{{\mathrm{T}}}\overline{\mathbf{D}}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}) \overline{\mathbf{D}}_{n}{\mathbf{u}}\nonumber \\&{\le }\,\,{\mathbf{u}}^{{\mathrm{T}}}\overline{\mathbf{D}}_{n}^{{\mathrm{T}}} \overline{\mathbf{D}}_{n}{\mathbf{u}}\nonumber \\&{\le }\,\,n{\cdot }{\eta }_{\max }(n^{-1}\overline{\mathbf{D}}_{n}^{{\mathrm{T}}} \overline{\mathbf{D}}_{n}){\mathbf{u}}^{{\mathrm{T}}}{\mathbf{u}}\nonumber \\&=\mathrm{{O}}(n\Vert {\mathbf{u}}\Vert ^{2}). \end{aligned}$$
(16)

Combining (15), (16) and the Cauchy–Schwarz inequality, we obtain \({D}_{n11}=\mathrm{{O}}_{P}({\sqrt{np_{n}}}\Vert {\mathbf{u}}\Vert )\).

For \(j=1,\ldots ,r\), it follows from Assumption 1.3 and Fact 1 that the row sums of \({\mathbf{G}}_{nj}{\mathbf{G}}_{nj}^{{\mathrm{T}}}\) are uniformly bounded in absolute value. Hence, we obtain \({\eta }_{\max }({\mathbf{G}}_{nj}{\mathbf{G}}_{nj}^{{\mathrm{T}}})=\mathrm{{O}}(1)\) by Fact 3. Using this result and a similar proof to that of (15), we have

$$\begin{aligned} \Vert {\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{G}}_{nj}{\varvec{\varepsilon }}_{n}\Vert ^{2} =\mathrm{{O}}_{P}(p_{n}),\,j=1,\ldots ,r. \end{aligned}$$

This together with (15) and the Cauchy–Schwarz inequality yields

$$\begin{aligned} {\varvec{\varepsilon }}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}) {\mathbf{G}}_{nj}{\varvec{\varepsilon }}_{n}=\mathrm{{O}}_{P}(p_{n}),\,j=1,\ldots ,r. \end{aligned}$$
(17)

It follows from (17) and the Cauchy–Schwarz inequality that

$$\begin{aligned} D_{n12}{\le }\Vert {\mathbf{u}}\Vert \left\{ \sum _{j=1}^{r}[{\varvec{\varepsilon }}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}) {\mathbf{G}}_{nj}{\varvec{\varepsilon }}_{n}]^{2}\right\} ^{1/2} =\mathrm{{O}}_{P}(p_{n}\Vert {\mathbf{u}}\Vert ). \end{aligned}$$

By Fact 2 and Assumption 4.1, we obtain

$$\begin{aligned} \Vert {\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{V}}_{n}\Vert ^{2}= & {} {\mathbf{V}}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n} ({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{V}}_{n}\nonumber \\\le & {} {\mathbf{V}}_{n}^{{\mathrm{T}}}{\mathbf{V}}_{n}\nonumber \\= & {} \sum _{i=1}^{n}|m_{0}({\mathbf{z}}_{n,i})-{\mathbf{p}}^{K}({\mathbf{z}}_{n,i}) ^{{\mathrm{T}}}{\varvec{\nu }}_{0}|^{2}\nonumber \\\le & {} n{\cdot }\max _{1{\le }i{\le }n}|m_{0}({\mathbf{z}}_{n,i})-{\mathbf{p}}^{K}({\mathbf{z}}_{n,i}) ^{{\mathrm{T}}}{\varvec{\nu }}_{0}|^{2}\nonumber \\\le & {} n{\cdot }\left( \max _{1{\le }i{\le }n}|m_{0}({\mathbf{z}}_{n,i})-{\mathbf{p}}^{K}({\mathbf{z}}_{n,i}) ^{{\mathrm{T}}}{\varvec{\nu }}_{0}|\right) ^{2}\nonumber \\\le & {} n{\cdot }\left( {\sup }_{{\mathbf{z}}{\in }{{\mathcal {Z}}}}|m_{0}({\mathbf{z}})-{\mathbf{p}}^{K}({\mathbf{z}}) ^{{\mathrm{T}}}{\varvec{\nu }}_{0}|\right) ^{2}\nonumber \\= & {} \mathrm{{O}}(nK^{-2{\delta }}). \end{aligned}$$
(18)

Using (16), (18) and the Cauchy–Schwarz inequality, we get

$$\begin{aligned} D_{n13}= & {} {\mathbf{V}}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}) {\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n})\overline{\mathbf{D}}_{n}{\mathbf{u}}\\&{\le }&\left\{ \Vert {\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{V}}_{n}\Vert ^{2} {\cdot }\Vert {\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n})\overline{\mathbf{D}}_{n}{\mathbf{u}}\Vert ^{2}{\cdot } \Vert {\mathbf{u}}\Vert ^{2}\right\} ^{\frac{1}{2}}\\= & {} \mathrm{{O}}_{P}(nK^{-{\delta }}\Vert {\mathbf{u}}\Vert ). \end{aligned}$$

Similarly, we can prove \(D_{n14}=\mathrm{{O}}_{P}({\sqrt{np_{n}}}K^{-{\delta }}\Vert {\mathbf{u}}\Vert )\) by using (17), (18) and the Cauchy–Schwarz inequality. Combining the orders of \(D_{n11}\), \(D_{n12}\), \(D_{n13}\) and \(D_{n14}\), we obtain

$$\begin{aligned} D_{n1}=\mathrm{{O}}_{P}(({\sqrt{np_{n}}}+nK^{-{\delta }})\Vert {\mathbf{u}}\Vert ). \end{aligned}$$

For \(D_{n2}\), we have

$$\begin{aligned} D_{n2}= & {} {\mathbf{u}}^{{\mathrm{T}}}\widetilde{\mathbf{D}}_{n}^{{\mathrm{T}}}{\mathbf{M}}_{n} \widetilde{\mathbf{D}}_{n}{\mathbf{u}}\\= & {} {\mathbf{u}}^{{\mathrm{T}}}(\overline{\mathbf{D}}_{n}+\overline{\varvec{\varepsilon }}_{n})^{{\mathrm{T}}} ({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}) (\overline{\mathbf{D}}_{n}+\overline{\varvec{\varepsilon }}_{n}){\mathbf{u}}\\= & {} D_{n21}+D_{n22}+2D_{n23}, \end{aligned}$$

where \(D_{n21}={\mathbf{u}}^{{\mathrm{T}}}\overline{\mathbf{D}}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}) {\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n})\overline{\mathbf{D}}_{n}{\mathbf{u}}\), \(D_{n22}={\mathbf{u}}^{{\mathrm{T}}}\overline{\varvec{\varepsilon }}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}) {\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n})\overline{\varvec{\varepsilon }}_{n}{\mathbf{u}}\) and \(D_{n23}={\mathbf{u}}^{{\mathrm{T}}}\overline{\mathbf{D}}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}) {\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n})\overline{\varvec{\varepsilon }}_{n}{\mathbf{u}}\). Similar to the proof of \(D_{n1}\), we can obtain \(D_{n21}=\mathrm{{O}}(n\Vert {\mathbf{u}}\Vert ^{2})\), \(D_{n22}=\mathrm{{O}}_{P}(p_{n}\Vert {\mathbf{u}}\Vert ^{2})\) and \(D_{n23}=\mathrm{{O}}_{P}({\sqrt{np_{n}}}\Vert {\mathbf{u}}\Vert ^{2})\). Thus, we have \(D_{n2}=\mathrm{{O}}_{P}(n\Vert {\mathbf{u}}\Vert ^{2})\).

Next, we consider \({C}_{n2}\). Let \(a_{n}=\max \{|p_{{\lambda }_{j}}^{\prime }(|{\theta }_{j0}|)|: {\theta }_{j0}{\ne }0\}\) and \(b_{n}=\max \{|p_{{\lambda }_{j}}^{\prime \prime }(|{\theta }_{j0}|)|: {\theta }_{j0}{\ne }0\}\). It then follows from \({\lambda }_{{\mathrm{max}}}{\rightarrow }0\) as \(n{\rightarrow }{\infty }\) and condition (6) on the penalty function that \(a_{n}=\mathrm{{o}}(1)\) and \(b_{n}=\mathrm{{o}}(1)\). By a Taylor expansion of the penalty function and the Cauchy–Schwarz inequality, we have

$$\begin{aligned} {C}_{n2}= & {} n\sum _{j=1}^{t+l}\left\{ p_{{\lambda }_{j}}(|{\theta }_{j0}+{\alpha }_{n}u_{j}|)- p_{{\lambda }_{j}}(|{\theta }_{j0}|)\right\} \\= & {} n{\alpha }_{n}{\sum _{j=1}^{t+l}p_{{\lambda }_{j}}^{\prime }(|{\theta }_{j0}|)\mathrm{{sgn}}({\theta }_{j0})u_{j}}+ {\frac{n{\alpha }_{n}^{2}}{2}}{\sum _{j=1}^{t+l}p_{{\lambda }_{j}}^{\prime \prime }(|{\theta }_{j0}|)u_{j}^{2}[1+\mathrm{{o}}(1)]}\\&{\ge }&-\,n{\alpha }_{n}{\sum _{j=1}^{t+l}|p_{{\lambda }_{j}}^{\prime }(|{\theta }_{j0}|)||u_{j}|} -{\frac{n{\alpha }_{n}^{2}}{2}}{\sum _{j=1}^{t+l}|p_{{\lambda }_{j}}^{\prime \prime }(|{\theta }_{j0}|)|u_{j}^{2}[1+\mathrm{{o}}(1)]}\\&{\ge }&-\,n{\alpha }_{n}{a_{n}}{\sum _{j=1}^{t+l}|u_{j}|} -{\frac{n{\alpha }_{n}^{2}}{2}}{b_{n}}{\sum _{j=1}^{t+l}u_{j}^{2}[1+\mathrm{{o}}(1)]}\\&{\ge }&-\,n{\alpha }_{n}{a_{n}}{\sqrt{t+l}}\Vert {\mathbf{u}}\Vert -n{\alpha }_{n}^{2}{\Vert {\mathbf{u}}\Vert }^{2}{b_{n}} \end{aligned}$$

By comparing the orders of \({\alpha }_{n}{D}_{n1}\), \({\alpha }_{n}^{2}{D}_{n2}\) and \({C}_{n2}\), and noting that \({a_{n}}=\mathrm{{o}}(1)\) and \({b_{n}}=\mathrm{{o}}(1)\), we conclude that \({\alpha }_{n}^{2}{D}_{n2}\) dominates both \({\alpha }_{n}{D}_{n1}\) and \({C}_{n2}\) provided that C is sufficiently large. Thus, (14) holds for sufficiently large C. This completes the proof of Theorem 1. \(\square \)

Proof of Theorem 2

We first prove part (a). It suffices to show that, for any \({\varvec{\theta }}\) satisfying \(\Vert {\varvec{\theta }}-{\varvec{\theta }}_{0}\Vert = \mathrm{{O}}_{P}({\sqrt{p_{n}/n}}+K^{-{\delta }})\) and some small \({{\delta }}_{n}=C({\sqrt{p_{n}/n}}+K^{-{\delta }})\), with probability tending to 1 as \(n \rightarrow \infty \),

$$\begin{aligned} \frac{\partial {Q({\varvec{\theta }})}}{\partial {{\theta }_{j}}}<0,\,\,\mathrm{{for}} \,\,{-{{\delta }}_{n}<{\theta }_{j}<0},\,\,j=t+l+1,\ldots ,p_{n}+r \end{aligned}$$
(19)

and

$$\begin{aligned} \frac{\partial {Q({\varvec{\theta }})}}{\partial {{\theta }_{j}}}>0,\,\,\mathrm{{for}} \,\,{0<{\theta }_{j}<{{\delta }}_{n}},\,\,j=t+l+1,\ldots ,p_{n}+r. \end{aligned}$$
(20)

Hence, (19) and (20) imply that the minimizer of \(Q({\varvec{\theta }})\) is attained at \({\theta }_{j}=0\), \(j=t+l+1,\ldots ,p_{n}+r\).

For \(j=t+l+1,\ldots ,p_{n}+r\), we have

$$\begin{aligned} \frac{\partial {Q({\varvec{\theta }})}}{\partial {{\theta }_{j}}}= & {} -\sum _{i=1}^{n}({\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n})_{n,ij} [\widetilde{{Y}}_{n,i}-({\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n})_{n,i{\cdot }}{\varvec{\theta }}] +np_{{\lambda }_{j}}^{\prime }(|{\theta }_{j}|)\mathrm{{sgn}}({\theta }_{j})\\= & {} -\sum _{i=1}^{n}({\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n})_{n,ij} [\widetilde{\mathbf{D}}_{n,i{\cdot }}{\varvec{\theta }}_{0}+ ({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n})_{n,i{\cdot }}{\mathbf{V}}_{n} +\widetilde{\varvec{\varepsilon }}_{n,i}-({\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n})_{n,i{\cdot }} {\varvec{\theta }}_{0}]\\&-\sum _{i=1}^{n}({\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n})_{n,ij}({\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n})_{n,i{\cdot }} ({\varvec{\theta }}_{0}-{\varvec{\theta }}) +np_{{\lambda }_{j}}^{\prime }(|{\theta }_{j}|)\mathrm{{sgn}}({\theta }_{j})\\= & {} -\,(\widetilde{\mathbf{D}}_{n}^{{\mathrm{T}}})_{n,j{\cdot }}{\mathbf{M}}_{n}\widetilde{\varvec{\varepsilon }}_{n} -(\widetilde{\mathbf{D}}_{n}^{{\mathrm{T}}})_{n,j{\cdot }}{\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{V}}_{n} +(\widetilde{\mathbf{D}}_{n}^{{\mathrm{T}}})_{n,j{\cdot }}{\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n} ({\varvec{\theta }}-{\varvec{\theta }}_{0})\\&+\; np_{{\lambda }_{j}}^{\prime }(|{\theta }_{j}|)\mathrm{{sgn}}({\theta }_{j}). \end{aligned}$$

By using the same arguments as those used in the proof of Theorem 1 and noting that \({\varvec{\theta }}-{\varvec{\theta }}_{0}= \mathrm{{O}}_{P}({\sqrt{p_{n}/n}}+K^{-{\delta }})\), we can conclude that

$$\begin{aligned} \widetilde{\mathbf{D}}_{n}^{{\mathrm{T}}}{\mathbf{M}}_{n}\widetilde{\varvec{\varepsilon }}_{n}= & {} {\mathbf{D}}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\varvec{\varepsilon }}_{n}\\= & {} \overline{\mathbf{D}}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\varvec{\varepsilon }}_{n}\\&+\;\overline{\varvec{\varepsilon }}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\varvec{\varepsilon }}_{n}\\= & {} \mathrm{{O}}_{P}({\sqrt{np_{n}}})+\mathrm{{O}}_{P}(p_{n})\\= & {} \mathrm{{O}}_{P}({\sqrt{np_{n}}}),\\ \widetilde{\mathbf{D}}_{n}^{{\mathrm{T}}}{\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{V}}_{n}= & {} {\mathbf{D}}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{V}}_{n}\\= & {} \overline{\mathbf{D}}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{V}}_{n}\\&+\;\overline{\varvec{\varepsilon }}_{n}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{V}}_{n}\\= & {} \mathrm{{O}}_{P}(nK^{-{\delta }})+\mathrm{{O}}_{P}({\sqrt{np_{n}}}K^{-{\delta }})\\= & {} \mathrm{{O}}_{P}(nK^{-{\delta }}) \end{aligned}$$

and

$$\begin{aligned} \widetilde{\mathbf{D}}_{n}^{{\mathrm{T}}}{\mathbf{M}}_{n}\widetilde{\mathbf{D}}_{n} ({\varvec{\theta }}-{\varvec{\theta }}_{0}) =\mathrm{{O}}_{P}(n({\sqrt{p_{n}/n}}+K^{-{\delta }})). \end{aligned}$$

Combining the above results, we get

$$\begin{aligned} \frac{\partial {Q({\varvec{\theta }})}}{\partial {{\theta }_{j}}} =n{\lambda }_{j} \left\{ {\lambda }_{j}^{-1}p_{{\lambda }_{j}}^{\prime }(|{\theta }_{j}|)\mathrm{{sgn}}({\theta }_{j}) +\mathrm{{O}}_{P}({\lambda }_{j}^{-1}({\sqrt{p_{n}/n}}+K^{-{\delta }}))\right\} . \end{aligned}$$

Since \(p_{{\lambda }_{j}}^{\prime }(0+)={\lambda }_{j}\), we have \({\lambda }_{j}^{-1}p_{{\lambda }_{j}}^{\prime }(|{\theta }_{j}|) {\ge }\mathop {\lim \inf }_{n \rightarrow \infty }\mathop {\lim \inf }_{{\theta }_{j} \rightarrow 0}{\lambda }_{j}^{-1}p_{{\lambda }_{j}}^{\prime }(|{\theta }_{j}|)=1\). This together with \({\lambda }_{j}\left( {\sqrt{p_{n}/n}}+K^{-{\delta }}\right) ^{-1} >{\lambda }_{{\mathrm{min}}}\left( {\sqrt{p_{n}/n}}+K^{-{\delta }}\right) ^{-1}{\rightarrow }{\infty }\) as \(n{\rightarrow }{\infty }\) implies that the sign of \(\frac{\partial {Q({\varvec{\theta }})}}{\partial {{\theta }_{j}}}\) is completely determined by the sign of \({\theta }_{j}\). As a result, (19) and (20) hold.
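The two properties of the penalty used in this argument, the singularity \(p_{{\lambda }}^{\prime }(0+)={\lambda }\) that shrinks small coefficients exactly to zero and the flatness of the penalty at large arguments that leaves nonzero coefficients unpenalized, can be illustrated with the SCAD derivative of Fan and Li (2001), \(p_{\lambda }^{\prime }(t)={\lambda }\{I(t{\le }{\lambda })+(a{\lambda }-t)_{+}/((a-1){\lambda })I(t>{\lambda })\}\) with \(a=3.7\). The sketch below is our own illustration, not code from the paper:

```python
import numpy as np

def scad_deriv(t, lam, a=3.7):
    """Derivative of the SCAD penalty p_lambda(t) for t >= 0."""
    t = np.asarray(t, dtype=float)
    return lam * (t <= lam) + np.maximum(a * lam - t, 0.0) / (a - 1.0) * (t > lam)

lam = 0.5

# Singularity at the origin: p'_lambda(0+) = lambda, which is what
# forces small estimated coefficients exactly to zero.
assert np.isclose(scad_deriv(1e-12, lam), lam)

# Flatness at large arguments: p'_lambda(t) = 0 once t >= a * lambda,
# so large (truly nonzero) coefficients are estimated without shrinkage bias.
assert scad_deriv(3.7 * lam + 0.1, lam) == 0.0
assert scad_deriv(3.0, lam) == 0.0
```

The MCP derivative \(({\lambda }-t/a)_{+}\) behaves the same way at both ends, which is why the proof of part (b) below can set \(p_{{\lambda }_{j}}^{\prime }(|{\theta }_{j0}^{*}|)\) and \(p_{{\lambda }_{j}}^{\prime \prime }(|{\theta }_{j0}^{*}|)\) to zero once \({\lambda }_{\max }\) is small.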

Next, we prove part (b). Since \({\widehat{\varvec{\theta }}}=(({\widehat{\varvec{\theta }}}^{*})^{{\mathrm{T}}},{\mathbf{0}})^{{\mathrm{T}}}\) minimizes \(Q({\varvec{\theta }})\), it must satisfy the following system of equations:

$$\begin{aligned} \frac{\partial {Q({\widehat{\varvec{\theta }}})}}{\partial {{\theta }_{j}}}=0,\,\,j=1,\ldots ,t+l. \end{aligned}$$

That is,

$$\begin{aligned} -\sum _{i=1}^{n}({\mathbf{M}}_{n}^{*}\widetilde{\mathbf{D}}_{n}^{*})_{n,ij} [\widetilde{{Y}}_{n,i}-({\mathbf{M}}_{n}^{*}\widetilde{\mathbf{D}}_{n}^{*})_{n,i{\cdot }} {\widehat{\varvec{\theta }}}^{*}] +np_{{\lambda }_{j}}^{\prime }(|{\widehat{\theta }}_{j}^{*}|)\mathrm{{sgn}} ({\widehat{\theta }}_{j}^{*})=0,\,\,j=1,\ldots ,t+l. \qquad \end{aligned}$$
(21)

By straightforward derivation, we have

$$\begin{aligned}&\sum _{i=1}^{n}({\mathbf{M}}_{n}^{*}\widetilde{\mathbf{D}}_{n}^{*})_{n,ij} [\widetilde{{Y}}_{n,i}-({\mathbf{M}}_{n}^{*}\widetilde{\mathbf{D}}_{n}^{*})_{n,i{\cdot }} {\widehat{\varvec{\theta }}}^{*}]\nonumber \\&\quad =\quad ((\widetilde{\mathbf{D}}_{n}^{*})^{{\mathrm{T}}})_{n,j{\cdot }}{\mathbf{M}}_{n}^{*}\widetilde{\varvec{\varepsilon }}_{n} +((\widetilde{\mathbf{D}}_{n}^{*})^{{\mathrm{T}}})_{n,j{\cdot }}{\mathbf{M}}_{n}^{*}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{V}}_{n}\nonumber \\&\qquad +((\widetilde{\mathbf{D}}_{n}^{*})^{{\mathrm{T}}})_{n,j{\cdot }}{\mathbf{M}}_{n}^{*}\widetilde{\mathbf{D}}_{n}^{*} ({\varvec{\theta }}_{0}^{*}-\widehat{\varvec{\theta }}^{*}),\,\,j=1,\ldots ,t+l. \end{aligned}$$
(22)

where \(\widetilde{\mathbf{D}}_{n}^{*}=({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{D}}_{n}^{*}\). By applying a Taylor expansion to \(p_{{\lambda }_{j}}^{\prime }(|{\widehat{\theta }}_{j}^{*}|)\), we obtain

$$\begin{aligned} p_{{\lambda }_{j}}^{\prime }(|{\widehat{\theta }}_{j}^{*}|) =p_{{\lambda }_{j}}^{\prime }(|{\theta }_{j0}^{*}|)+ [p_{{\lambda }_{j}}^{\prime \prime }(|{\theta }_{j0}^{*}|)+\mathrm{{o}}_{P}(1)] ({\widehat{\theta }}_{j}^{*}-{\theta }_{j0}^{*}),\,\,\quad j=1,\ldots ,t+l. \end{aligned}$$

Since \({\lambda }_{\max }{\rightarrow }0\) as \(n{\rightarrow }{\infty }\), for both the SCAD and MCP penalty functions we eventually have \(p_{{\lambda }_{j}}^{\prime }(|{\theta }_{j0}^{*}|)=0\) and \(p_{{\lambda }_{j}}^{\prime \prime }(|{\theta }_{j0}^{*}|)=0\). Thus, we have

$$\begin{aligned} p_{{\lambda }_{j}}^{\prime }(|{\widehat{\theta }}_{j}^{*}|) =\mathrm{{o}}_{P}(1)({\widehat{\theta }}_{j}^{*}-{\theta }_{j0}^{*}),\,\,j=1,\ldots ,t+l. \end{aligned}$$

This together with (21) and (22) yields

$$\begin{aligned}&(\widetilde{\mathbf{D}}_{n}^{*})^{{\mathrm{T}}}{\mathbf{M}}_{n}^{*}\widetilde{\mathbf{D}}_{n}^{*} (\widehat{\varvec{\theta }}^{*}-{\varvec{\theta }}_{0}^{*}) -(\widetilde{\mathbf{D}}_{n}^{*})^{{\mathrm{T}}}{\mathbf{M}}_{n}^{*}\widetilde{\varvec{\varepsilon }}_{n}\nonumber \\&\quad -(\widetilde{\mathbf{D}}_{n}^{*})^{{\mathrm{T}}}{\mathbf{M}}_{n}^{*}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{V}}_{n} +\mathrm{{o}}_{P}(n)(\widehat{\varvec{\theta }}^{*}-{\varvec{\theta }}_{0}^{*}) ={\mathbf{0}}. \end{aligned}$$
(23)
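The vanishing of the penalty terms above rests on the flat tails of SCAD and MCP: both derivatives are exactly zero beyond \(a\lambda \), so for a fixed nonzero \({\theta }_{j0}^{*}\) they vanish once \({\lambda }_{\max }\) is small. As a numerical illustration (not part of the formal argument), the standard SCAD (Fan and Li 2001) and MCP (Zhang 2010) derivative formulas can be checked directly; the tuning values below are illustrative:

```python
import numpy as np

def scad_deriv(theta, lam, a=3.7):
    """Derivative of the SCAD penalty (Fan and Li, 2001), evaluated at |theta|."""
    theta = np.abs(theta)
    return lam * (
        (theta <= lam)
        + np.maximum(a * lam - theta, 0.0) / ((a - 1.0) * lam) * (theta > lam)
    )

def mcp_deriv(theta, lam, a=3.0):
    """Derivative of the MCP penalty (Zhang, 2010), evaluated at |theta|."""
    theta = np.abs(theta)
    return np.maximum(lam - theta / a, 0.0)

lam = 0.1
# Beyond a*lam both derivatives vanish, so p'(|theta_{j0}|) = p''(|theta_{j0}|) = 0
# for any fixed nonzero theta_{j0} once lam is small enough.
for theta0 in (0.5, 1.0, 2.0):
    assert scad_deriv(theta0, lam) == 0.0
    assert mcp_deriv(theta0, lam) == 0.0
# Near the origin both penalties behave like the lasso: derivative close to lam.
assert np.isclose(scad_deriv(0.05, lam), lam)
assert np.isclose(mcp_deriv(0.05, lam), lam - 0.05 / 3.0)
```

This flat-tail behaviour is exactly what distinguishes these non-convex penalties from the lasso, whose derivative is \(\lambda \) everywhere and therefore never drops out of the expansion.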

It follows from (23) that

$$\begin{aligned} {\mathbf{0}}= & {} \left[ n^{-1}(\widetilde{\mathbf{D}}_{n}^{*})^{{\mathrm{T}}}{\mathbf{M}}_{n}^{*}\widetilde{\mathbf{D}}_{n}^{*} +\mathrm{{o}}_{P}(1)\right] {\sqrt{n}}(\widehat{\varvec{\theta }}^{*}-{\varvec{\theta }}_{0}^{*}) -\frac{1}{\sqrt{n}}(\widetilde{\mathbf{D}}_{n}^{*})^{{\mathrm{T}}}{\mathbf{M}}_{n}^{*}\widetilde{\varvec{\varepsilon }}_{n}\nonumber \\&-\frac{1}{\sqrt{n}}(\widetilde{\mathbf{D}}_{n}^{*})^{{\mathrm{T}}}{\mathbf{M}}_{n}^{*}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{V}}_{n}\nonumber \\&{\overset{\triangle }{=}}&\left[ C_{n3}+\mathrm{{o}}_{P}(1)\right] {\sqrt{n}}(\widehat{\varvec{\theta }}^{*}-{\varvec{\theta }}_{0}^{*}) -C_{n4}-C_{n5}. \end{aligned}$$
(24)

Note that \({\mathbf{M}}_{n}^{*}\) is an \(n{\times }n\) idempotent matrix of rank \(t+l\). Thus, by using the same arguments as those in the proof of Theorem 1, we can obtain

$$\begin{aligned}&\Vert {\mathbf{M}}_{n}^{*}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\varvec{\varepsilon }}_{n}\Vert ^{2} =\mathrm{{O}}_{P}(t+l)=\mathrm{{O}}_{P}(1), \\&{\varvec{\varepsilon }}_{n}^{{\mathrm{T}}}{\mathbf{G}}_{ni}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}^{*} ({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{G}}_{nj}{\varvec{\varepsilon }}_{n} =\mathrm{{O}}_{P}(t+l)=\mathrm{{O}}_{P}(1),\,i,j=1,\ldots ,t, \\&\Vert {\mathbf{M}}_{n}^{*}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n})\overline{\mathbf{D}}_{n}^{*}\Vert ^{2} =\mathrm{{O}}(n) \end{aligned}$$

and

$$\begin{aligned} \Vert {\mathbf{M}}_{n}^{*}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{V}}_{n}\Vert ^{2} =\mathrm{{O}}(nK^{-2{\delta }}). \end{aligned}$$
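The first two bounds come from a quadratic form in an idempotent matrix: \(\mathrm{{E}}\Vert {\mathbf{M}}{\varvec{\varepsilon }}\Vert ^{2}={\sigma }^{2}\,\mathrm{{tr}}({\mathbf{M}})={\sigma }^{2}\,\mathrm{{rank}}({\mathbf{M}})\), which does not grow with \(n\). A minimal Monte Carlo sketch (illustrative \({\mathbf{H}}\) and \(r\), with \(r\) playing the role of \(t+l\)):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 500, 5                            # r stands in for t + l
H = rng.standard_normal((n, r))          # illustrative full-rank design
M = H @ np.linalg.inv(H.T @ H) @ H.T     # projection onto col(H): idempotent

assert np.allclose(M @ M, M)             # idempotence
assert np.isclose(np.trace(M), r)        # trace = rank = r

# E||M eps||^2 = sigma^2 * tr(M) = sigma^2 * r, independent of n --
# the O_P(t + l) = O_P(1) order used in the proof.
sigma2, reps = 1.0, 2000
vals = np.empty(reps)
for b in range(reps):
    eps = rng.standard_normal(n)
    vals[b] = eps @ M @ eps              # equals ||M eps||^2 for a projection
assert abs(vals.mean() - sigma2 * r) < 0.3
```

The same mechanism gives the second display: sandwiching \({\mathbf{M}}_{n}^{*}\) between bounded matrices leaves the order of the quadratic form at \(\mathrm{{O}}_{P}(t+l)\).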

Let \({\overline{\varvec{\varepsilon }}}_{n}^{*}=({\mathbf{G}}_{n1}{\varvec{\varepsilon }}_{n}, \ldots ,{\mathbf{G}}_{nt}{\varvec{\varepsilon }}_{n},{\mathbf{0}}_{n{\times }l})\); then \({\mathbf{D}}_{n}^{*}=\overline{\mathbf{D}}_{n}^{*}+\overline{\varvec{\varepsilon }}_{n}^{*}\). Combining this with the Cauchy–Schwarz inequality and the above four results, we have

$$\begin{aligned} C_{n3}= & {} n^{-1}(\widetilde{\mathbf{D}}_{n}^{*})^{{\mathrm{T}}}{\mathbf{M}}_{n}^{*}\widetilde{\mathbf{D}}_{n}^{*}\nonumber \\= & {} n^{-1}{\overline{\mathbf{D}}_{n}^{*}}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}^{*}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n})\overline{\mathbf{D}}_{n}^{*} +n^{-1}{\overline{\varvec{\varepsilon }}_{n}^{*}}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}^{*}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n})\overline{\varvec{\varepsilon }}_{n}^{*}\nonumber \\&+\,n^{-1}{\overline{\mathbf{D}}_{n}^{*}}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}^{*}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n})\overline{\varvec{\varepsilon }}_{n}^{*} +n^{-1}{\overline{\varvec{\varepsilon }}_{n}^{*}}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}^{*}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n})\overline{\mathbf{D}}_{n}^{*}\nonumber \\= & {} {\varvec{\varSigma }}_{n,1}+\mathrm{{O}}_{P}(n^{-1})+\mathrm{{O}}_{P}(n^{-1/2})\nonumber \\= & {} {\varvec{\varSigma }}_{n,1}+\mathrm{{o}}_{P}(1), \end{aligned}$$
(25)
$$\begin{aligned} C_{n4}= & {} \frac{1}{\sqrt{n}}(\widetilde{\mathbf{D}}_{n}^{*})^{{\mathrm{T}}}{\mathbf{M}}_{n}^{*}\widetilde{\varvec{\varepsilon }}_{n}\nonumber \\= & {} \frac{1}{\sqrt{n}}(\overline{\mathbf{D}}_{n}^{*})^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}^{*} ({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\varvec{\varepsilon }}_{n} +\frac{1}{\sqrt{n}}{\overline{\varvec{\varepsilon }}_{n}^{*}}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}^{*} ({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\varvec{\varepsilon }}_{n}\nonumber \\= & {} \frac{1}{\sqrt{n}}(\overline{\mathbf{D}}_{n}^{*})^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}^{*} ({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\varvec{\varepsilon }}_{n}+\mathrm{{O}}_{P}(n^{-1/2})\nonumber \\= & {} \frac{1}{\sqrt{n}}(\overline{\mathbf{D}}_{n}^{*})^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}^{*} ({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\varvec{\varepsilon }}_{n}+\mathrm{{o}}_{P}(1), \end{aligned}$$
(26)

and

$$\begin{aligned} C_{n5}= & {} \frac{1}{\sqrt{n}}(\widetilde{\mathbf{D}}_{n}^{*})^{{\mathrm{T}}}{\mathbf{M}}_{n}^{*}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{V}}_{n}\nonumber \\= & {} \frac{1}{\sqrt{n}}(\overline{\mathbf{D}}_{n}^{*})^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}^{*} ({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{V}}_{n}+\frac{1}{\sqrt{n}}{\overline{\varvec{\varepsilon }}_{n}^{*}}^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}^{*} ({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{V}}_{n}\nonumber \\= & {} \mathrm{{O}}_{P}({\sqrt{n}}K^{-\delta })+\mathrm{{O}}_{P}(K^{-\delta })\nonumber \\= & {} \mathrm{{O}}_{P}({\sqrt{n}}K^{-\delta })\nonumber \\= & {} \mathrm{{o}}_{P}(1). \end{aligned}$$
(27)

Combining (24)–(27), we obtain

$$\begin{aligned} \left[ {\varvec{\varSigma }}_{n,1}+\mathrm{{o}}_{P}(1)\right] {\sqrt{n}}(\widehat{\varvec{\theta }}^{*}-{\varvec{\theta }}_{0}^{*}) =\frac{1}{\sqrt{n}}(\overline{\mathbf{D}}_{n}^{*})^{{\mathrm{T}}}({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\mathbf{M}}_{n}^{*} ({\mathbf{I}}_{n}-{\varvec{\varPi }}_{n}){\varvec{\varepsilon }}_{n}+\mathrm{{o}}_{P}(1). \end{aligned}$$

By using the central limit theorem and Slutsky's lemma, we have

$$\begin{aligned} {\sqrt{n}}(\widehat{\varvec{\theta }}^{*}-{\varvec{\theta }}_{0}^{*}) \overset{\text {D}}{\longrightarrow }N({\mathbf{0}},{\varvec{\varSigma }}). \end{aligned}$$

\(\square \)
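As a numerical illustration of this final step (not part of the formal argument), once the \(\mathrm{{o}}_{P}(1)\) terms are absorbed, \({\sqrt{n}}\) times the centered estimator is asymptotically normal. Plain OLS with one regressor serves here as a hypothetical stand-in for the penalized two-stage least squares estimator:

```python
import numpy as np

# Monte Carlo check: sqrt(n) * (estimator - truth) settles into a normal limit.
rng = np.random.default_rng(2)
n, reps, beta0 = 200, 3000, 1.5
zvals = np.empty(reps)
for b in range(reps):
    x = rng.standard_normal(n)
    y = beta0 * x + rng.standard_normal(n)       # error variance sigma_0^2 = 1
    beta_hat = (x @ y) / (x @ x)                 # OLS estimate
    zvals[b] = np.sqrt(n) * (beta_hat - beta0)   # limit N(0, 1/E[x^2]) = N(0, 1)

# Empirical mean near 0 and standard deviation near 1, as the CLT predicts.
assert abs(zvals.mean()) < 0.1
assert abs(zvals.std() - 1.0) < 0.1
```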

Proof of Theorem 3

First, we prove part (a). By the definition of \({\widehat{\varvec{\nu }}}\), we have

$$\begin{aligned} {\widehat{\varvec{\nu }}}-{\varvec{\nu }}_{0}= & {} ({\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n})^{-1} {\mathbf{P}}_{n}^{{\mathrm{T}}}({\mathbf{Y}}_{n}-{\mathbf{D}}_{n}\widehat{\varvec{\theta }}) -{\varvec{\nu }}_{0}\\= & {} ({\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n})^{-1} {\mathbf{P}}_{n}^{{\mathrm{T}}}({\mathbf{D}}_{n}{\varvec{\theta }}_{0} +{\mathbf{P}}_{n}{\varvec{\nu }}_{0}+{\mathbf{V}}_{n}+{\varvec{\varepsilon }}_{n}-{\mathbf{D}}_{n}\widehat{\varvec{\theta }}) -{\varvec{\nu }}_{0}\\= & {} C_{n6}+C_{n7}+C_{n8}, \end{aligned}$$

where \(C_{n6}=({\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n})^{-1}{\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{D}}_{n} ({\varvec{\theta }}_{0}-\widehat{\varvec{\theta }})\), \(C_{n7}=({\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n})^{-1}{\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{V}}_{n}\) and \(C_{n8}=({\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n})^{-1}{\mathbf{P}}_{n}^{{\mathrm{T}}}{\varvec{\varepsilon }}_{n}\).

By Assumption 3.4, Fact 2 and \({\mathbf{D}}_{n}=\overline{\mathbf{D}}_{n}+\overline{\varvec{\varepsilon }}_{n}\), we obtain

$$\begin{aligned} \Vert C_{n6}\Vert ^{2}= & {} (\widehat{\varvec{\theta }}-{\varvec{\theta }}_{0})^{{\mathrm{T}}} {\mathbf{D}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n}({\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n})^{-1} ({\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n})^{-1}{\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{D}}_{n} (\widehat{\varvec{\theta }}-{\varvec{\theta }}_{0})\\&{\le }&n^{-1}{\eta }_{\min }^{-1}(n^{-1}{\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n}) (\widehat{\varvec{\theta }}-{\varvec{\theta }}_{0})^{{\mathrm{T}}} {\mathbf{D}}_{n}^{{\mathrm{T}}}{\varvec{\varPi }}_{n}{\mathbf{D}}_{n} (\widehat{\varvec{\theta }}-{\varvec{\theta }}_{0})\\&{\le }&{{\underline{c}}}_{P}^{-1}(\widehat{\varvec{\theta }}-{\varvec{\theta }}_{0})^{{\mathrm{T}}} (n^{-1}{\mathbf{D}}_{n}^{{\mathrm{T}}}{\mathbf{D}}_{n}) (\widehat{\varvec{\theta }}-{\varvec{\theta }}_{0})\\= & {} C_{n61}+C_{n62}+C_{n63}, \end{aligned}$$

where \(C_{n61}={{\underline{c}}}_{P}^{-1}(\widehat{\varvec{\theta }}-{\varvec{\theta }}_{0})^{{\mathrm{T}}} (n^{-1}\overline{\mathbf{D}}_{n}^{{\mathrm{T}}}\overline{\mathbf{D}}_{n}) (\widehat{\varvec{\theta }}-{\varvec{\theta }}_{0})\), \(C_{n62}={{\underline{c}}}_{P}^{-1}(\widehat{\varvec{\theta }}-{\varvec{\theta }}_{0})^{{\mathrm{T}}} (n^{-1}\overline{\varvec{\varepsilon }}_{n}^{{\mathrm{T}}}\overline{\varvec{\varepsilon }}_{n}) (\widehat{\varvec{\theta }}-{\varvec{\theta }}_{0})\) and \(C_{n63}=2{{\underline{c}}}_{P}^{-1}(\widehat{\varvec{\theta }}-{\varvec{\theta }}_{0})^{{\mathrm{T}}} (n^{-1}\overline{\mathbf{D}}_{n}^{{\mathrm{T}}}\overline{\varvec{\varepsilon }}_{n}) (\widehat{\varvec{\theta }}-{\varvec{\theta }}_{0})\).

It follows from Theorem 1 that \(\Vert \widehat{\varvec{\theta }}-{\varvec{\theta }}_{0}\Vert ^{2}= \mathrm{{O}}_{P}({{p_{n}/n}}+K^{-2{\delta }})\). This together with Assumption 3.4 yields

$$\begin{aligned} C_{n61}{\le }{{\underline{c}}}_{P}^{-1}{\eta }_{\max }(n^{-1}\overline{\mathbf{D}}_{n}^{{\mathrm{T}}} \overline{\mathbf{D}}_{n})\Vert \widehat{\varvec{\theta }}-{\varvec{\theta }}_{0}\Vert ^{2}=\mathrm{{O}}_{P}({{p_{n}/n}}+K^{-2{\delta }}). \end{aligned}$$

For \(i,j=1,\ldots ,r\), by Assumption 1.3 and Facts 1 and 3, we have

$$\begin{aligned} \mathrm{{E}}(n^{-1}{\varvec{\varepsilon }}_{n}^{{\mathrm{T}}}{\mathbf{G}}_{ni}^{{\mathrm{T}}}{\mathbf{G}}_{nj}{\varvec{\varepsilon }}_{n}) =n^{-1}{\sigma }_{0}^{2}\mathrm{{tr}}({\mathbf{G}}_{ni}^{{\mathrm{T}}}{\mathbf{G}}_{nj}) {\le }{\sigma }_{0}^{2}{\eta }_{\max }({\mathbf{G}}_{ni}^{{\mathrm{T}}}{\mathbf{G}}_{nj}) =\mathrm{{O}}(1). \end{aligned}$$

This implies that \(n^{-1}{\varvec{\varepsilon }}_{n}^{{\mathrm{T}}}{\mathbf{G}}_{ni}^{{\mathrm{T}}}{\mathbf{G}}_{nj}{\varvec{\varepsilon }}_{n} =\mathrm{{O}}_{P}(1)\) (\(i,j=1,\ldots ,r\)). Thus, we get

$$\begin{aligned} C_{n62}={{\underline{c}}}_{P}^{-1}({\varvec{\theta }}_{0}-\widehat{\varvec{\theta }})^{{\mathrm{T}}}(n^{-1}\overline{\varvec{\varepsilon }}_{n}^{{\mathrm{T}}}\overline{\varvec{\varepsilon }}_{n})({\varvec{\theta }}_{0}-\widehat{\varvec{\theta }})=\mathrm{{O}}_{P}({{p_{n}/n}}+K^{-2{\delta }}). \end{aligned}$$
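The expectation computation above uses the quadratic-form identity \(\mathrm{{E}}({\varvec{\varepsilon }}^{{\mathrm{T}}}{\mathbf{A}}{\varvec{\varepsilon }})={\sigma }^{2}\mathrm{{tr}}({\mathbf{A}})\) for \({\varvec{\varepsilon }}\) with mean zero and covariance \({\sigma }^{2}{\mathbf{I}}\). A quick Monte Carlo check with an illustrative matrix \({\mathbf{A}}\) standing in for \({\mathbf{G}}_{ni}^{{\mathrm{T}}}{\mathbf{G}}_{nj}\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2 = 300, 2.0
A = rng.standard_normal((n, n)) / n    # illustrative stand-in for G_ni^T G_nj

# E[eps' A eps] = sigma^2 * tr(A) when eps has mean 0 and covariance sigma^2 I.
reps = 5000
vals = np.empty(reps)
for b in range(reps):
    eps = np.sqrt(sigma2) * rng.standard_normal(n)
    vals[b] = eps @ A @ eps
assert abs(vals.mean() - sigma2 * np.trace(A)) < 0.2
```

With \(n^{-1}\mathrm{{tr}}({\mathbf{G}}_{ni}^{{\mathrm{T}}}{\mathbf{G}}_{nj})\) bounded by the largest eigenvalue, the displayed \(\mathrm{{O}}(1)\) bound follows, and Markov's inequality converts it into the stated \(\mathrm{{O}}_{P}(1)\) rate.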

By using Cauchy–Schwarz inequality and the orders of \(C_{n61}\) and \(C_{n62}\), we can show that

$$\begin{aligned} C_{n63}= & {} 2{{\underline{c}}}_{P}^{-1}({\varvec{\theta }}_{0}-\widehat{\varvec{\theta }})^{{\mathrm{T}}} (n^{-1}\overline{\mathbf{D}}_{n}^{{\mathrm{T}}}\overline{\varvec{\varepsilon }}_{n}) ({\varvec{\theta }}_{0}-\widehat{\varvec{\theta }})\\&{\le }&2\left| \left[ {{\underline{c}}}_{P}^{-1/2}n^{-1/2}({\varvec{\theta }}_{0}-\widehat{\varvec{\theta }})^{{\mathrm{T}}} \overline{\mathbf{D}}_{n}^{{\mathrm{T}}}\right] {\cdot }\left[ {{\underline{c}}}_{P}^{-1/2}n^{-1/2}\overline{\varvec{\varepsilon }}_{n} ({\varvec{\theta }}_{0}-\widehat{\varvec{\theta }})\right] \right| \\\le & {} 2(C_{n61}C_{n62})^{1/2}\\= & {} \mathrm{{O}}_{P}({{p_{n}/n}}+K^{-2{\delta }}). \end{aligned}$$

Combining the orders of \(C_{n61}\), \(C_{n62}\) and \(C_{n63}\), we obtain

$$\begin{aligned} \Vert C_{n6}\Vert =\mathrm{{O}}_{P}({\sqrt{p_{n}/n}}+K^{-{\delta }}). \end{aligned}$$

From the proof of (18), we have \(\Vert {\mathbf{V}}_{n}\Vert ^{2}=\mathrm{{O}}(nK^{-2{\delta }})\). This together with Assumption 4.4 and Fact 2 yields

$$\begin{aligned} \Vert C_{n7}\Vert ^{2}= & {} {\mathbf{V}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n}({\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n})^{-1} ({\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n})^{-1}{\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{V}}_{n}\\\le & {} n^{-1}{\eta }_{\min }^{-1}(n^{-1}{\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n}){\mathbf{V}}_{n}^{{\mathrm{T}}} {\varvec{\varPi }}_{n}{\mathbf{V}}_{n}\\\le & {} n^{-1}{{\underline{c}}}_{P}^{-1}\Vert {\mathbf{V}}_{n}\Vert ^{2}\\= & {} \mathrm{{O}}(K^{-2{\delta }}). \end{aligned}$$

This means that \(\Vert C_{n7}\Vert =\mathrm{{O}}(K^{-{\delta }})\).

Under Assumption 4.4, we have

$$\begin{aligned} \mathrm{{E}}(\Vert C_{n8}\Vert ^{2})= & {} \mathrm{{E}}({\varvec{\varepsilon }}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n}({\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n})^{-1} ({\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n})^{-1}{\mathbf{P}}_{n}^{{\mathrm{T}}}{\varvec{\varepsilon }}_{n})\\\le & {} n^{-1}{\eta }_{\min }^{-1}(n^{-1}{\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n}) \mathrm{{E}}({\varvec{\varepsilon }}_{n}^{{\mathrm{T}}} {\mathbf{P}}_{n}({\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n})^{-1}{\mathbf{P}}_{n}^{{\mathrm{T}}}{\varvec{\varepsilon }}_{n})\\= & {} n^{-1}{{\underline{c}}}_{P}^{-1}{\sigma }_{0}^{2} \mathrm{{tr}}({\mathbf{P}}_{n}({\mathbf{P}}_{n}^{{\mathrm{T}}}{\mathbf{P}}_{n})^{-1}{\mathbf{P}}_{n}^{{\mathrm{T}}})\\= & {} \mathrm{{O}}(K/n). \end{aligned}$$

This implies that \(\Vert C_{n8}\Vert =\mathrm{{O}}_{P}({\sqrt{K/n}})\).

Combining the orders of \(C_{n6}\), \(C_{n7}\) and \(C_{n8}\) with triangle inequality, we obtain

$$\begin{aligned} \Vert {\widehat{\varvec{\nu }}}-{\varvec{\nu }}_{0}\Vert {\le }\Vert C_{n6}\Vert +\Vert C_{n7}\Vert +\Vert C_{n8}\Vert =\mathrm{{O}}_{P}({\sqrt{p_{n}/n}}+{\sqrt{K/n}}+K^{-{\delta }}). \end{aligned}$$

This completes the proof of part (a). Combining this bound with Assumptions 4.1 and 4.3 yields

$$\begin{aligned} {\sup }_{{\mathbf{z}}{\in }{{\mathcal {Z}}}}|{{\widehat{m}}}({\mathbf{z}})-{{m}}_{0}({\mathbf{z}})|= & {} |{\mathbf{p}}^{K}({\mathbf{z}})^{{\mathrm{T}}}{\widehat{\varvec{\nu }}} -{{m}}_{0}({\mathbf{z}})|\\= & {} |{\mathbf{p}}^{K}({\mathbf{z}})^{{\mathrm{T}}}({\widehat{\varvec{\nu }}}- {{\varvec{\nu }}}_{0})+{\mathbf{p}}^{K}({\mathbf{z}})^{{\mathrm{T}}}{{\varvec{\nu }}}_{0} -{{m}}_{0}({\mathbf{z}})|\\\le & {} |{\mathbf{p}}^{K}({\mathbf{z}})^{{\mathrm{T}}}({\widehat{\varvec{\nu }}}- {{\varvec{\nu }}}_{0})| +|{{m}}_{0}({\mathbf{z}})-{\mathbf{p}}^{K}({\mathbf{z}})^{{\mathrm{T}}}{{\varvec{\nu }}}_{0}|\\\le & {} \Vert {\widehat{\varvec{\nu }}}- {{\varvec{\nu }}}_{0}\Vert {\cdot }{\sup }_{{\mathbf{z}}{\in }{{\mathcal {Z}}}}\Vert {\mathbf{p}}^{K}({\mathbf{z}})\Vert + {\sup }_{{\mathbf{z}}{\in }{{\mathcal {Z}}}}|m_{0}({\mathbf{z}})-{\mathbf{p}}^{K}({\mathbf{z}}) ^{{\mathrm{T}}}{\varvec{\nu }}_{0}|\\\le & {} {\zeta }(K)\Vert {\widehat{\varvec{\nu }}}- {{\varvec{\nu }}}_{0}\Vert +\mathrm{{O}}(K^{-{\delta }})\\= & {} \mathrm{{O}}_{P}({\zeta }(K)({\sqrt{p_{n}/n}}+{\sqrt{K/n}}+K^{-{\delta }})). \end{aligned}$$
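The \(K^{-{\delta }}\) term in the final rate is the series approximation bias: a smooth function is approximated by a \(K\)-term basis expansion with sup-norm error shrinking in \(K\). A small sketch with a polynomial basis (illustrative target and degrees; the paper's basis and \({\delta }\) depend on the smoothness assumptions):

```python
import numpy as np

# Least-squares series approximation of a smooth function: the sup-norm
# error decreases as the number of basis terms K grows.
z = np.linspace(0.0, 1.0, 400)
m0 = np.sin(2.0 * np.pi * z)          # smooth target, stand-in for m_0

def sup_error(K):
    coef = np.polyfit(z, m0, deg=K)   # least-squares polynomial fit
    return np.max(np.abs(np.polyval(coef, z) - m0))

errs = {K: sup_error(K) for K in (3, 6, 9)}
assert errs[9] < errs[6] < errs[3]    # sup error is decreasing in K
assert errs[9] < 1e-3                 # already very accurate at K = 9
```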

Thus, we complete the proof of part (b). \(\square \)


Cite this article

Li, T., Kang, X. Variable selection of higher-order partially linear spatial autoregressive model with a diverging number of parameters. Stat Papers (2021). https://doi.org/10.1007/s00362-021-01241-4


Keywords

  • Higher-order spatial dependence
  • Partially linear model
  • Series estimation
  • Two-stage least squares
  • Non-convex penalty