Variable selection for spatial autoregressive models with a diverging number of parameters


Variable selection plays a fundamental role in regression analysis. The spatial autoregressive model is a useful tool in econometrics and statistics, yet variable selection in this context, although necessary, has not been adequately investigated. In this paper, we consider variable selection in spatial autoregressive models with a diverging number of parameters. The smoothly clipped absolute deviation (SCAD) penalty is employed to obtain the estimators, and the dimension of the covariates is allowed to grow with the sample size. To attenuate the bias caused by endogeneity, instrumental variables are adopted in the estimation procedure. The proposed method performs parameter estimation and variable selection simultaneously. Under mild conditions, we establish the asymptotic and oracle properties of the proposed estimators. Finally, the performance of the proposed estimation procedure is examined via Monte Carlo simulation studies, and the Boston housing price data set is analyzed as an illustrative example.
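For concreteness, the SCAD penalty mentioned above has a simple closed form. The sketch below is an illustrative NumPy implementation (not code from the paper); `lam` stands for the tuning parameter \(\lambda \) and `a = 3.7` is the auxiliary constant recommended by Fan and Li (2001).

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(|theta|) of Fan and Li (2001), elementwise."""
    t = np.abs(theta)
    # Three regimes: linear near zero, quadratic in between, constant beyond a*lam.
    linear = lam * t
    quad = -(t**2 - 2 * a * lam * t + lam**2) / (2 * (a - 1))
    const = (a + 1) * lam**2 / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quad, const))

def scad_derivative(theta, lam, a=3.7):
    """p'_lambda(|theta|): equals lam for small |theta|, decays to 0 beyond a*lam."""
    t = np.abs(theta)
    return lam * ((t <= lam) + np.maximum(a * lam - t, 0.0)
                  / ((a - 1) * lam) * (t > lam))
```

Because the derivative vanishes for large \(|\theta |\), large coefficients are left unpenalized, which is the source of the oracle property studied below.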


  1. Ai CR, Zhang YQ (2017) Estimation of partially specified spatial panel data models with fixed-effects. Econ Rev 36(1–3):6–22

  2. Anderson TW, Hsiao C (1981) Estimation of dynamic models with error components. J Am Stat Assoc 76:598–606

  3. Anderson TW, Hsiao C (1982) Formulation and estimation of dynamic models using panel data. J Econ 18:47–82

  4. Anselin L (1988) Spatial econometrics: methods and models. Kluwer Academic Publishers, The Netherlands

  5. Anselin L, Bera AK (1998) Spatial dependence in linear regression models with an introduction to spatial econometrics. In: Ullah A (ed) Handbook of applied economic statistics. CRC Press, Marcel Dekker, New York, pp 237–290

  6. Breiman L (1995) Better subset regression using the nonnegative garrote. Technometrics 37(4):373–384

  7. Cliff A, Ord JK (1973) Spatial autocorrelation. Pion, London

  8. Dai XW, Jin LB, Shi L, Yang CP, Liu SZ (2015) Local influence analysis for general spatial models. Adv Stat Anal (Online)

  9. Dai XW, Jin LB, Shi AQ, Shi L (2016) Outlier detection and accommodations in general spatial models. Stat Methods Appl (Online)

  10. Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360

  11. Fan J, Li R (2004) New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J Am Stat Assoc 99:710–723

  12. Fan J, Peng H (2004) On nonconcave penalized likelihood with diverging number of parameters. Ann Stat 32(3):928–961

  13. Frank IE, Friedman JH (1993) A statistical view of some chemometrics regression tools. Technometrics 35:109–148

  14. Koenker R, Ng P, Portnoy S (1994) Quantile smoothing splines. Biometrika 81:673–680

  15. Lee LF (2003) Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econ Rev 22:307–335

  16. Lee LF (2004) Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72:1899–1925

  17. Lee LF (2007) GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. J Econ 137:489–514

  18. Lee LF, Yu JH (2010) Estimation of spatial autoregressive panel data models with fixed effects. J Econ 154:165–185

  19. Leng C (2010) Variable selection and coefficient estimation via regularized rank regression. Stat Sinica 20(1):167–181

  20. Li R, Liang H (2008) Variable selection in semiparametric regression model. Ann Stat 36:261–286

  21. Pace RK, Gilley OW (1997) Using the spatial configuration of the data to improve estimation. J Real Estate Financ Econ 14:333–340

  22. Su L, Yang Z (2009) Instrumental variable quantile estimation of spatial autoregressive models. Working paper. Singapore Management University.

  23. Tibshirani RJ (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc: Ser. B 58:267–288

  24. Wang H, Xia Y (2009) Shrinkage estimation of the varying coefficient model. J Am Stat Assoc 104:747–757

  25. Wang H, Li G, Jiang G (2007) Robust regression shrinkage and consistent variable selection through the LAD-Lasso. J Bus Econ Stat 25:347–355

  26. Wang XQ, Jiang Y, Huang M, Zhang H (2013) Robust variable selection with exponential squared loss. J Am Stat Assoc 108(502):632–643

  27. Xiong S (2010) Some notes on the nonnegative garrote. Technometrics 52(3):349–361

  28. Zhang YQ, Shen DM (2015) Estimation of semi-parametric varying-coefficient spatial panel data models with random-effects. J Stat Plan Inference 159:64–80

  29. Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429

Author information



Corresponding author

Correspondence to Ruiyuan Cao.

Additional information

Xie’s work is supported by the National Natural Science Foundation of China (No. 11571340) and the Science and Technology Project of Beijing Municipal Education Commission (KM201710005032). Cao’s work is supported by China Postdoctoral Science Foundation (No. 2016M591030) and the National Natural Science Foundation of China (No. 11701020). Du’s work is supported by the National Natural Science Foundation of China (Nos. 11501018, 11771032, 11571340), and Program for Rixin Talents in Beijing University of Technology (No. 006000514116003).

Appendix: Proofs

In this part of the paper, we denote by C a generic positive constant, which may take different values at different places.

Proof of Theorem 1

Let \(\gamma _n=\sqrt{p_n}(n^{-1/2}+a_n)\). Following Fan and Li (2001), we first show that \(\Vert \hat{\varvec{\theta }}-\varvec{\theta }_0\Vert =O_p(\gamma _n).\) It suffices to show that, for any given \(\eta >0\), there is a large enough constant C such that, for large n,

$$\begin{aligned} P\left\{ \inf _{||\varvec{u}||=C}Q_n(\varvec{\theta }_0+\gamma _n {\varvec{u}})>Q_n(\varvec{\theta }_0)\right\} \ge 1-\eta . \end{aligned}$$


Define

$$\begin{aligned} S_n(\varvec{u})=\Vert {\varvec{Y}}_n-\varvec{Z}(\varvec{\theta }_0+\gamma _n {\varvec{u}})\Vert ^2-\Vert {\varvec{Y}}_n-\varvec{Z} \varvec{\theta }_0 \Vert ^2. \end{aligned}$$

Then, \(S_n(\varvec{u})\) can be written as

$$\begin{aligned} S_n(\varvec{u})= & {} -2\gamma _n \left( {\varvec{Y}}_n-\varvec{Z}\varvec{\theta }_0\right) ^T\varvec{Z} \varvec{u}+\gamma _n^2 \varvec{u}^T \varvec{Z} ^T\varvec{Z} \varvec{u}\\= & {} -\gamma _n R_1+\gamma _n^2R_2, \end{aligned}$$

where \(R_1=\left( {\varvec{Y}}_n-\varvec{Z}\varvec{\theta }_0\right) ^T\varvec{Z} \varvec{u}\) and \(R_2= \varvec{u}^T \varvec{Z} ^T\varvec{Z} \varvec{u}.\)
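The decomposition of \(S_n(\varvec{u})\) into \(-\gamma _n R_1+\gamma _n^2R_2\) is a purely algebraic identity, which can be verified numerically. The following NumPy check (with arbitrary simulated \({\varvec{Y}}_n\), \(\varvec{Z}\), \(\varvec{\theta }_0\), and \(\varvec{u}\); an illustration only, not part of the proof) confirms that the direct and expanded forms agree.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 4
Y = rng.standard_normal(n)
Z = rng.standard_normal((n, p))
theta0 = rng.standard_normal(p)
u = rng.standard_normal(p)
gamma = 0.3

# Direct definition of S_n(u).
S_direct = (np.sum((Y - Z @ (theta0 + gamma * u))**2)
            - np.sum((Y - Z @ theta0)**2))

# Expanded form: -2*gamma*(Y - Z theta0)^T Z u + gamma^2 * u^T Z^T Z u.
S_expanded = (-2 * gamma * (Y - Z @ theta0) @ Z @ u
              + gamma**2 * u @ Z.T @ Z @ u)

assert np.isclose(S_direct, S_expanded)
```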

First, we analyze \(R_1\). Note that

$$\begin{aligned} R_1= & {} \left( {\varvec{Y}}_n-\varvec{Z}\varvec{\theta }_0\right) ^T\varvec{Z} \varvec{u}\\= & {} \left( {\varvec{B}}\varvec{\theta }_0+{\varvec{\varepsilon }}_n-\varvec{Z}\varvec{\theta }_0\right) ^T\varvec{Z} \varvec{u}\\= & {} {\varvec{\varepsilon }}_n^T\varvec{Z} \varvec{u}+\varvec{\theta }_0^T \left( {\varvec{B}} -\varvec{Z} \right) ^T\varvec{Z} \varvec{u}\\= & {} {\varvec{\varepsilon }}_n^T\varvec{Z} \varvec{u}+\varvec{\theta }_0^T \left( {\varvec{B}} -{\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T{\varvec{B}} \right) ^T{\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T{\varvec{B}} \varvec{u}\\= & {} {\varvec{\varepsilon }}_n^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T{\varvec{B}} \varvec{u}\\= & {} {\varvec{\varepsilon }}_n^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T({\varvec{B}}^*+\varvec{e}) \varvec{u}\\= & {} {\varvec{\varepsilon }}_n^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T {\varvec{B}}^*\varvec{u}+ {\varvec{\varepsilon }}_n^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T \varvec{e} \varvec{u}\\= & {} R_{11}+R_{12}, \end{aligned}$$

where \({\varvec{B}}=(\varvec{D}_n,{\varvec{X}}_n)\), \({\varvec{B}}^*=({\varvec{G}}_n{\varvec{X}}_n{\varvec{\beta }}_0,{\varvec{X}}_n)\), \(\varvec{e}=({\varvec{G}}_n{\varvec{\varepsilon }}_n,0)=({\varvec{G}}_n,0) {\varvec{\varepsilon }}_n\), \({\varvec{G}}_n={\varvec{W}}_n(\varvec{I}_n-\rho _0{\varvec{W}}_n)^{-1}\) and \(R_{11},R_{12}\) are as follows:

$$\begin{aligned}&R_{11}= {\varvec{\varepsilon }}_n^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T {\varvec{B}}^*\varvec{u},\\&R_{12}= {\varvec{\varepsilon }}_n^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T \varvec{e} \varvec{u}. \end{aligned}$$

Consider \(R_{11}\). Obviously, \(E(R_{11})=0 \) and

$$\begin{aligned} E(\Vert {\varvec{\varepsilon }}_n^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1/2}\Vert ^2)= & {} E [ {\varvec{\varepsilon }}_n^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T{\varvec{\varepsilon }}_n ]\\= & {} E [\text {trace }\{{\varvec{\varepsilon }}_n^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T{\varvec{\varepsilon }}_n \}]\\= & {} E [\text {trace }\{{\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T{\varvec{\varepsilon }}_n {\varvec{\varepsilon }}_n^T \}]\\= & {} \text {trace }\{E [{\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T{\varvec{\varepsilon }}_n {\varvec{\varepsilon }}_n^T ]\}\\= & {} \sigma ^2\text {trace }\{E [{\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T ]\}\\= & {} \sigma ^2E [\text {trace }\{ {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T \}]\\= & {} O(p_n). \end{aligned}$$

Thus, one has \(\Vert {\varvec{\varepsilon }}_n^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1/2}\Vert ^2=O_p( p_n ).\) By Assumption 2 and the condition \(\Vert \varvec{u}\Vert =C\), we have \( \varvec{u}^T{{\varvec{B}}^*}^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T {\varvec{B}}^*\varvec{u}=O_p(n\Vert \varvec{u}\Vert ^2).\) Invoking the Cauchy–Schwarz inequality, we have \(R_{11}=O_p(\sqrt{np_n}\Vert \varvec{u}\Vert )\). Similarly, by Assumption 3, one has

$$\begin{aligned} E(\Vert {\varvec{\varepsilon }}_n^T{\varvec{G}}_n^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1/2}\Vert ^2)= & {} E [ {\varvec{\varepsilon }}_n^T {\varvec{G}}_n^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T{\varvec{G}}_n{\varvec{\varepsilon }}_n ]\\= & {} E [\text {trace }\{{\varvec{\varepsilon }}_n^T {\varvec{G}}_n^T{\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T {\varvec{G}}_n\varvec{\varepsilon }_n \}]\\= & {} \sigma ^2\text {trace }\{E [{\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T {\varvec{G}}_n {\varvec{G}}_n^T]\}\\\le & {} \tilde{\lambda }_c \sigma ^2E [\text {trace }\{ {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T \}]\\= & {} O(p_n). \end{aligned}$$

Therefore, by the Cauchy–Schwarz inequality, we have

$$\begin{aligned} R_{12}= & {} {\varvec{\varepsilon }}_n^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T \varvec{e} \varvec{u}\\\le & {} \left\{ {\varvec{\varepsilon }}_n^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T {\varvec{\varepsilon }}_n \cdot \varvec{u}^T{\varvec{\varepsilon }}_n^T ({\varvec{G}}_n,0)^T{\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T ({\varvec{G}}_n,0) {\varvec{\varepsilon }}_n \varvec{u}\right\} ^{{1/2}}\\= & {} O_p(p_n\Vert \varvec{u}\Vert ). \end{aligned}$$

Combining the convergence rates of \(R_{11}\) and \(R_{12}\), one has \(R_{1}=O_p(\sqrt{np_n}\Vert \varvec{u}\Vert ).\)
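The key fact driving the \(O(p_n)\) rates above is that the trace of the projection matrix \({\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T\) equals the number of columns of \({\varvec{H}}\). A quick numerical sanity check (simulated data, illustrative only; the Cholesky factor below is one valid choice of the inverse square root \(({\varvec{H}}^T{\varvec{H}})^{-1/2}\)):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 6
H = rng.standard_normal((n, p))

# Projection matrix onto the column space of H.
P = H @ np.linalg.inv(H.T @ H) @ H.T

# trace(P_H) equals the rank of H, i.e. p_n, which is why the
# expectation in the display above is sigma^2 * p_n = O(p_n).
assert np.isclose(np.trace(P), p)

# Quadratic-form identity: eps^T P_H eps = ||(H^T H)^{-1/2} H^T eps||^2.
eps = rng.standard_normal(n)
lhs = eps @ P @ eps
M = np.linalg.cholesky(np.linalg.inv(H.T @ H))  # M @ M.T = (H^T H)^{-1}
rhs = np.sum((M.T @ (H.T @ eps))**2)
assert np.isclose(lhs, rhs)
```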

Next, we consider \(R_2.\) By the fact \({\varvec{B}}^*=({\varvec{G}}_n{\varvec{X}}_n{\varvec{\beta }}_0,{\varvec{X}}_n) \) and the definition of \(\varvec{Z}\), one has

$$\begin{aligned} \frac{1}{n}R_2= & {} \frac{1}{n}\varvec{u}^T\varvec{Z}^T\varvec{Z}\varvec{u}\\= & {} \frac{1}{n}\varvec{u}^T {\varvec{B}}^T {\varvec{H}} ({\varvec{H}}^T{\varvec{H}})^{-1} {\varvec{H}}^T {\varvec{B}}\varvec{u} \\= & {} \frac{1}{n}\varvec{u}^T( {\varvec{B}}^*+\varvec{e})^T {\varvec{H}} ({\varvec{H}}^T{\varvec{H}})^{-1} {\varvec{H}}^T ( {\varvec{B}}^*+\varvec{e})\varvec{u}\\= & {} R_{21}+R_{22}+R_{23}, \end{aligned}$$


where

$$\begin{aligned}&R_{21}=\frac{1}{n} \varvec{u}^T { {\varvec{B}}^*}^T {\varvec{H}} ({\varvec{H}}^T{\varvec{H}})^{-1} {\varvec{H}}^T {\varvec{B}}^*\varvec{u} ,\\&R_{22}= \frac{1}{n}\varvec{u}^T { \varvec{e}}^T {\varvec{H}} ({\varvec{H}}^T{\varvec{H}})^{-1} {\varvec{H}}^T \varvec{e} \varvec{u},\\&R_{23}=\frac{2}{n}\varvec{u}^T { {\varvec{B}}^*}^T {\varvec{H}} ({\varvec{H}}^T{\varvec{H}})^{-1} {\varvec{H}}^T \varvec{e} \varvec{u} . \end{aligned}$$

By Assumption 2, we have \( R_{21} \asymp \Vert \varvec{u}\Vert ^2 \). Similarly, \( R_{22} \asymp \Vert \varvec{u}\Vert ^2,\) and \( R_{23} \asymp \Vert \varvec{u}\Vert ^2.\) Thus, \( R_{2} \asymp n \Vert \varvec{u}\Vert ^2.\) Consequently, \(\gamma _n R_1=O_p(\gamma _n\sqrt{np_n}\Vert \varvec{u}\Vert )\) and \(\gamma _n^2 R_2=O_p(\gamma _n^2n {\Vert \varvec{u}\Vert ^2})\).

Summarizing the above results, \(\gamma _n^2 R_2\) dominates \(\gamma _n R_1\) uniformly in \(\Vert \varvec{u}\Vert =C\) for a large enough C. Note that

$$\begin{aligned} Q_n(\varvec{\theta }_0+\gamma _n {\varvec{u}})-Q_n(\varvec{\theta }_0)= & {} \frac{1}{2}S_n({\varvec{u}})+n\sum _{j=0}^{p_n}\left\{ p_{\lambda _n}(|\beta _{j0}+\gamma _nu_j|)-p_{\lambda _n}(|\beta _{j0}|)\right\} \\\ge & {} \frac{1}{2}S_n({\varvec{u}})+n\sum _{j=0}^s\left\{ p_{\lambda _n}(|\beta _{j0}+\gamma _nu_j|)-p_{\lambda _n}(|\beta _{j0}|)\right\} \\\ge & {} \frac{1}{2}S_n({\varvec{u}})+K_1({\varvec{u}}), \end{aligned}$$

where s is the dimension of \({{\varvec{\beta }}}_{10}\) and \(K_1({\varvec{u}})=n\sum \nolimits _{j=0}^s\{ p_{\lambda _{j}}(|\beta _{j0}+\gamma _nu_j|)-p_{\lambda _{j}}(|\beta _{j0}|)\}\). Then, by Taylor's expansion, we obtain

$$\begin{aligned} K_1({\varvec{u}})= & {} n\sum \limits _{j=0}^s\{ p_{\lambda _{j}}(|\beta _{j0}+\gamma _nu_j|)-p_{\lambda _{j}}(|\beta _{j0}|)\}\\= & {} n \gamma _n \sum \limits _{j=0}^s p^\prime _ {\lambda _{j}} (|\beta _{j0}|)\text {sgn}(\beta _{j0})u_j+n \gamma _n^2\sum \limits _{j=0}^sp^{\prime \prime } _ {\lambda _{j}} (|\beta _{j0}|) u_j^2\{1+o(1)\}\\\ge & {} -a_n n \gamma _n \sum \limits _{j=0}^s |u_j|-b_n n \gamma _n^2 \sum \limits _{j=0}^s u_j^2\{1+o(1)\}\\\ge & {} -a_n n \gamma _n \sum \limits _{j=0}^s |u_j|-2b_n n \gamma _n^2 \Vert \varvec{u}\Vert ^2 \\\ge & {} {-\sqrt{s}\,a_n n \gamma _n \Vert \varvec{u}\Vert -2b_n n \gamma _n^2 \Vert \varvec{u}\Vert ^2} \\\ge & {} {-\sqrt{p_n}a_n n \gamma _n \Vert \varvec{u}\Vert -2b_n n \gamma _n^2 \Vert \varvec{u}\Vert ^2} \\\ge & {} {- n \gamma _n^2 \Vert \varvec{u}\Vert -2b_n n \gamma _n^2 \Vert \varvec{u}\Vert ^2.} \end{aligned}$$

Thus, by taking C large enough, \(\gamma _n^2R_2\) dominates both \(\gamma _nR_1\) and \(K_1({\varvec{u}})\), and \(\gamma _n^2R_2\) is positive. This proves Theorem 1. \(\square \)

Proof of Theorem 2

We now show the sparsity. It is sufficient to show that, with probability tending to one as \(n\rightarrow \infty \), for any \({{\varvec{\beta }}}_1\) satisfying \({{\varvec{\beta }}}_1-{{\varvec{\beta }}}_{10}=O_p(\sqrt{{p_n}/{n}})\), some small \(\delta _n=C\sqrt{ {p_n}/{n}}\), and \(j=s+1,\ldots ,p_n\),

$$\begin{aligned} \frac{\partial Q_n({{\varvec{\beta }}})}{\partial \beta _{j}}=\left\{ \begin{array}{ccc} >0, &{}\quad \mathrm{for} &{}0<\beta _{j}<\delta _n\\<0, &{}\quad \mathrm{for} &{} -\delta _n<\beta _{j}<0. \end{array} \right. \end{aligned}$$

In fact, for any \(\beta _{j},\ j=s+1,\ldots ,p_n\), using Taylor’s expansion we obtain

$$\begin{aligned} \frac{\partial Q_n({{\varvec{\beta }}})}{\partial \beta _{j}}= & {} -\sum _{i=1}^n(Y_i-Z_i{\varvec{\theta }})Z_{ij}+np'_{\lambda _{j}}(|\beta _{j}|)\mathrm{sgn}(\beta _{j}), \end{aligned}$$

where \(Y_{i}\) is the ith component of \({\varvec{Y}}\) and \(Z_{ij}\) is the jth component of \(\varvec{Z}_i\). By the regularity conditions, we conclude that

$$\begin{aligned} \frac{\partial Q_n({{\varvec{\beta }}})}{\partial \beta _{j}}= & {} -\sum _{i=1}^n(Y_i-\varvec{Z}_i{\varvec{\theta }})Z_{ij}+np'_{\lambda _{j}}(|\beta _{j}|)\mathrm{sgn}(\beta _{j})\\= & {} -\sum _{i=1}^n(Y_i-\varvec{Z}_i{\varvec{\theta }}_0)Z_{ij} -\sum _{i=1}^n \varvec{Z}_i({\varvec{\theta }}_0-{\varvec{\theta }})Z_{ij}+np'_{\lambda _{j}}(|\beta _{j}|)\mathrm{sgn}(\beta _{j})\\= & {} -\sum _{i=1}^n \varepsilon _i Z_{ij} -\sum _{i=1}^n \sum _{k=1}^s Z_{ik}( \theta _{k0}- \theta _k)Z_{ij}\\&- \sum _{i=1}^n \sum _{k=1+s}^{p_n} Z_{ik}( \theta _{k0}- \theta _k)Z_{ij}+np'_{\lambda _{j}}(|\beta _{j}|)\mathrm{sgn}(\beta _{j})\\= & {} R_{31}+R_{32}+R_{33} +np'_{\lambda _{j}}(|\beta _{j}|)\mathrm{sgn}(\beta _{j}). \end{aligned}$$

By the proof of Theorem 1, we can conclude that \(R_{31}=O_p(\sqrt{np_n})\), \(R_{32}=O_p(n\sqrt{\frac{p_n}{n}})\) and \(R_{33}=O_p(n\sqrt{\frac{p_n}{n}}).\) Then, we have

$$\begin{aligned} \frac{\partial Q_n({{\varvec{\beta }}})}{\partial \beta _{j}}=n\lambda _{j}\left\{ \lambda ^{-1}_{j}p'_{\lambda _{j}}(|\beta _{j}|) \mathrm{sgn}(\beta _{j})+O_{p}\left( \lambda _{j}^{-1}\sqrt{\frac{p_n}{n}} \right) \right\} . \end{aligned}$$

According to Assumption 4, as \( n\rightarrow \infty \),

$$\begin{aligned} \liminf \limits _{n\rightarrow \infty }\liminf \limits _{t\rightarrow \ 0^{+}}\frac{p'_{\lambda _{j}}(t)}{\lambda _{j}}>0 \end{aligned}$$


and

$$\begin{aligned} \sqrt{\frac{p_n}{n}}\lambda _{j}^{-1}\rightarrow 0. \end{aligned}$$

Hence the sign of \(\beta _j\) completely determines the sign of \( \frac{\partial Q_n({{\varvec{\beta }}})}{\partial \beta _{j}}\); in other words, for \(j=s+1,\ldots ,p_n\),

$$\begin{aligned} \frac{\partial Q_n({{\varvec{\beta }}})}{\partial \beta _{j}}=\left\{ \begin{array}{ccc} >0, &{}\quad \mathrm{for} &{}0<\beta _{j}<\delta _n\\<0, &{}\quad \mathrm{for} &{} -\delta _n<\beta _{j}<0. \end{array} \right. \end{aligned}$$

It follows that \(\hat{{{\varvec{\beta }}}}_2={\varvec{0}}\).

We now prove (II), namely the asymptotic normality of \((\hat{\rho },\hat{{\varvec{\beta }}}_1^T)^T\). For ease of presentation, let \(\beta _{10}^*=\rho \) and \(\beta _{1j}^*=\beta _{1j},\ j=1,\ldots ,s\), and denote \({\varvec{\beta }}_1^*=(\rho ,\beta _{11},\ldots ,\beta _{1s})^T\) and \({\varvec{\beta }}_{0}^*=(\rho _0,\beta _{01},\ldots ,\beta _{0s})^T\). Since \(\hat{\varvec{\theta }}\) minimizes \(Q_n({\varvec{\theta }})\), it satisfies the stationarity equations in \({{\varvec{\beta }}^*}\):

$$\begin{aligned} -\sum _{i=1}^n(Y_i-Z_i\hat{\varvec{\theta }})Z_{ij}+np'_{\lambda _{j}}(|\hat{\beta }_{1j}^*|)\mathrm{sgn}(\hat{\beta }_{1j}^*)=0, ~~~~~~j=0,1,\ldots ,s. \end{aligned}$$

Invoking the local quadratic approximation proposed by Fan and Li (2001), for \(j=0,1,\ldots ,s,\)

$$\begin{aligned}&-\sum _{i=1}^n[{Y}_i-{Z}_i {\varvec{\theta }}_0]Z_{ij}+\sum _{i=1}^n Z_{ij}Z_i(\hat{\beta }_{1j}^*-\beta _{0j}^*) \\&~~~~~+n\left\{ p'_{\lambda _{j}}(|\beta _{0j}^*|)\mathrm{sgn}(\beta _{0j}^*)+\left[ p''_{\lambda _{j}}(|\beta _{0j}^*|)+o_{p}(1)\right] (\hat{\beta }_{1j}^*-\beta _{0j}^*)\right\} =0. \end{aligned}$$

The above equations can be rewritten in matrix form as

$$\begin{aligned} - {\varvec{Z}}_1^T( {{\varvec{Y}}} - {\varvec{Z}}_1 {{\varvec{\beta }}}_{0}^*)+{\varvec{Z}}_1^T{\varvec{Z}}_1(\hat{{\varvec{\beta }}}_{1}^*-{{\varvec{\beta }}}_{0}^*)+ n{\varvec{b}}_n+n\left[ {{\varvec{\Sigma }}_\lambda }+o_{p}(1)\right] (\hat{{\varvec{\beta }}}_{1}^*-{{\varvec{\beta }}}_{0}^*)=0, \end{aligned}$$

where \(\varvec{b}_n=\{ p'_{\lambda _{0}}(|\rho _{0}|)\mathrm{sgn}(\rho _{0}),p'_{\lambda _{1}}(|\beta _{01}|)\mathrm{sgn}(\beta _{01}),\ldots ,p'_{\lambda _{s}}(|\beta _{0s}|)\mathrm{sgn}(\beta _{0s}) \}^T\), and

$$\begin{aligned} {\varvec{\Sigma }}_\lambda = \text {diag} \{ p''_{\lambda _{0}}(|\rho _{0}|),p'' _{\lambda _{1}}(|\beta _{01}|) ,\ldots ,p''_{\lambda _{s}}(|\beta _{0s}|)\}. \end{aligned}$$

By the definition of instrumental variable \({\varvec{H}}\) and Theorem 1, we have

$$\begin{aligned} \frac{1}{\sqrt{n}}{\varvec{Z}}_1^T( {{\varvec{Y}}} - {\varvec{Z}}_1 {{\varvec{\beta }}}_{0}^*)\triangleq & {} \frac{1}{\sqrt{n}}{\varvec{Z}}_1 ^T \left( {\varvec{Y}} -\varvec{Z}\varvec{\theta }_0\right) \\= & {} \frac{1}{\sqrt{n}}{\varvec{Z}}_1 ^T \left( {\varvec{B}}\varvec{\theta }_0+{\varvec{\varepsilon }}_n-\varvec{Z}\varvec{\theta }_0\right) \\= & {} \frac{1}{\sqrt{n}}{\varvec{Z}}_1 ^T {\varvec{\varepsilon }}_n + \frac{1}{\sqrt{n}}{\varvec{Z}}_1 ^T \left( {\varvec{B}} -\varvec{Z} \right) \varvec{\theta }_0\\= & {} \frac{1}{\sqrt{n}}{\varvec{Z}}_1 ^T {\varvec{\varepsilon }}_n +\frac{1}{\sqrt{n}}{\varvec{Z}}_1 ^T \left( \varvec{I} -{\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T \right) {\varvec{B}}\varvec{\theta }_0 \\= & {} \frac{1}{\sqrt{n}}{\varvec{Z}}_1 ^T {\varvec{\varepsilon }}_n \\= & {} \frac{1}{\sqrt{n}}{{\varvec{B}}}_1 ^T { {\varvec{\varepsilon }}}_n +\frac{1}{\sqrt{n}}( {\varvec{G}}_n{ {\varvec{\varepsilon }}}_n,0)^T{\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T { {\varvec{\varepsilon }}}_n\\= & {} \frac{1}{\sqrt{n}}{{\varvec{B}}}_1 ^T { {\varvec{\varepsilon }}}_n +O_p\left( \frac{p_n}{\sqrt{n}}\right) . \end{aligned}$$

Combining these results, we conclude that

$$\begin{aligned} \left[ {\varvec{Z}}_1^T{\varvec{Z}}_1+n {{\varvec{\Sigma }}_\lambda } \right] (\hat{{\varvec{\beta }}}_{1}^*-{{\varvec{\beta }}}_{0}^*)+ n{\varvec{b}}_n = {{\varvec{B}}}_1 ^T { {\varvec{\varepsilon }}}_n +O_p\left( p_n \right) , \end{aligned}$$


or equivalently

$$\begin{aligned} \left[ {\varvec{Z}}_1^T{\varvec{Z}}_1+n{{\varvec{\Sigma }}_\lambda } \right] {\left( \hat{{\varvec{\beta }}}_{1}^*-{{\varvec{\beta }}}_{0}^*+\left[ {\varvec{Z}}_1^T{\varvec{Z}}_1/n+ {{\varvec{\Sigma }}_\lambda } \right] ^{-1} {\varvec{b}}_n\right) } = {{\varvec{B}}}_1 ^T { {\varvec{\varepsilon }}}_n +O_p\left( p_n\right) . \end{aligned}$$


Multiplying both sides by \({\varvec{A}}_n/\sqrt{n}\) yields

$$\begin{aligned}&{{\sqrt{n}}}{\varvec{A}}_n\left[ {\varvec{Z}}_1^T{\varvec{Z}}_1/n+{{\varvec{\Sigma }}_\lambda } \right] {\left( \hat{{\varvec{\beta }}}_{1}^*-{{\varvec{\beta }}}_{0}^*+\left[ {\varvec{Z}}_1^T{\varvec{Z}}_1/n+ {{\varvec{\Sigma }}_\lambda } \right] ^{-1} {\varvec{b}}_n\right) }\\&\quad = {\frac{1}{\sqrt{n}}}{\varvec{A}}_n{{\varvec{B}}}_1 ^T { {\varvec{\varepsilon }}}_n +o_p(1). \end{aligned}$$

Invoking Lindeberg–Feller central limit theorem, we have

$$\begin{aligned} \sqrt{n}{\varvec{A}}_n {\varvec{\Sigma }}_{1 }^{-1/2} \big ({\varvec{\Sigma }}_{1}+{{\varvec{\Sigma }}_{\lambda }}\big )\left[ (\hat{{\varvec{\beta }}}_{1}^*-{{\varvec{\beta }}}_{0}^* )+\big ({\varvec{\Sigma }}_{1}+{{\varvec{\Sigma }}_{\lambda }}\big )^{-1}\varvec{b}_n\right] {\mathop {\longrightarrow }\limits ^{L}}N(0,\sigma ^2{\varvec{G}}), \end{aligned}$$

where \({\varvec{\Sigma }}_1=\mathop {\text {lim}}\limits _{n\rightarrow \infty } E({\varvec{Z}}_1^T{\varvec{Z}}_1)/n\) and \({\varvec{G}}=\mathop {\text {lim}}\limits _{n\rightarrow \infty } {\varvec{A}}_n^T{\varvec{A}}_n\). Since \({\varvec{\beta }}_1^*=(\rho ,\beta _{11},\ldots ,\beta _{1s})^T\) and \({\varvec{\beta }}_0^*=(\rho _0,\beta _{01},\ldots ,\beta _{0s})^T\), the proof is completed. \(\square \)
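The unpenalized instrumental-variable step underlying the estimator, \(\varvec{Z}={\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T{\varvec{B}}\) with \({\varvec{B}}=(\varvec{D}_n,{\varvec{X}}_n)\), can be sketched on simulated data. The toy example below is an assumption-laden illustration rather than the paper's full procedure: it omits the SCAD penalty, generates a random row-normalized weight matrix, and takes \({\varvec{H}}=({\varvec{X}}_n,{\varvec{W}}_n{\varvec{X}}_n)\) as a simple instrument matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 400, 3
rho0, beta0 = 0.4, np.array([1.0, -2.0, 0.5])

# Sparse row-normalized spatial weight matrix with zero diagonal (toy choice).
W = rng.random((n, n)) * (rng.random((n, n)) < 0.02)
np.fill_diagonal(W, 0.0)
W = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)

# SAR data-generating process: Y = rho0 * W Y + X beta0 + eps.
X = rng.standard_normal((n, p))
eps = rng.standard_normal(n)
Y = np.linalg.solve(np.eye(n) - rho0 * W, X @ beta0 + eps)

# B stacks the endogenous spatial lag WY with the exogenous regressors;
# H = (X, WX) serves as the instrument matrix (an illustrative choice).
B = np.column_stack([W @ Y, X])
H = np.column_stack([X, W @ X])
Z = H @ np.linalg.solve(H.T @ H, H.T @ B)      # Z = P_H B

theta_hat = np.linalg.solve(Z.T @ Z, Z.T @ Y)  # 2SLS estimate of (rho, beta^T)^T
```

With the SCAD penalty added to the squared-error criterion, the same \(\varvec{Z}\) enters \(Q_n(\varvec{\theta })\), and the minimization can proceed via the local quadratic approximation used in the proof of Theorem 2.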


About this article

Cite this article

Xie, T., Cao, R. & Du, J. Variable selection for spatial autoregressive models with a diverging number of parameters. Stat Papers 61, 1125–1145 (2020).


Keywords

  • Spatial autoregressive models
  • Variable selection
  • Instrumental variable
  • Oracle property

Mathematics Subject Classification

  • 62G08
  • 62G20