Abstract
Variable selection plays a fundamental role in regression analysis. The spatial autoregressive model is a useful tool in econometrics and statistics, yet variable selection in this context, though necessary, has not been adequately investigated. In this paper, we consider variable selection in spatial autoregressive models with a diverging number of parameters. The smoothly clipped absolute deviation (SCAD) penalty is used to obtain the estimators, and the dimension of the covariates is allowed to grow with the sample size. To attenuate the bias caused by endogeneity, an instrumental variable is adopted in the estimation procedure. The proposed method performs parameter estimation and variable selection simultaneously. Under mild conditions, we establish the asymptotic and oracle properties of the proposed estimators. Finally, the performance of the proposed estimation procedure is examined via Monte Carlo simulation studies, and a Boston housing price data set is analyzed as an illustrative example.
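To fix ideas, the SCAD penalty of Fan and Li (2001) underlying the estimators can be sketched as follows. This is an illustrative implementation, not the authors' code; the tuning constant a = 3.7 follows the recommendation in Fan and Li (2001).

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(theta) of Fan and Li (2001), applied elementwise.

    Linear for |theta| <= lam, quadratic on (lam, a*lam], constant beyond a*lam,
    so large coefficients are not shrunk (the source of the oracle property).
    """
    t = np.abs(np.asarray(theta, dtype=float))
    small = lam * t                                              # |theta| <= lam
    mid = -(t**2 - 2.0 * a * lam * t + lam**2) / (2.0 * (a - 1.0))  # lam < |theta| <= a*lam
    flat = (a + 1.0) * lam**2 / 2.0                              # |theta| > a*lam
    return np.where(t <= lam, small, np.where(t <= a * lam, mid, flat))
```

The penalty is continuous at both knots, which is what makes simultaneous estimation and selection possible without the excess bias of the LASSO on large coefficients.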
References
Ai CR, Zhang YQ (2017) Estimation of partially specified spatial panel data models with fixed-effects. Econ Rev 36(1–3):6–22
Anderson TW, Hsiao C (1981) Estimation of dynamic models with error components. J Am Stat Assoc 76:598–606
Anderson TW, Hsiao C (1982) Formulation and estimation of dynamic models using panel data. J Econ 18:47–82
Anselin L (1988) Spatial econometrics: methods and models. Kluwer Academic Publishers, The Netherlands
Anselin L, Bera AK (1998) Spatial dependence in linear regression models with an introduction to spatial econometrics. In: Ullah A (ed) Handbook of applied economic statistics. CRC Press, Marcel Dekker, New York, pp 237–290
Breiman L (1995) Better subset regression using the nonnegative garrote. Technometrics 37(4):373–384
Cliff A, Ord JK (1973) Spatial autocorrelation. Pion, London
Dai XW, Jin LB, Shi L, Yang CP, Liu SZ (2015) Local influence analysis for general spatial models. Adv Stat Anal (Online)
Dai XW, Jin LB, Shi AQ, Shi L (2016) Outlier detection and accommodations in general spatial models. Stat Methods Appl (Online)
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan J, Li R (2004) New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J Am Stat Assoc 99:710–723
Fan J, Peng H (2004) On nonconcave penalized likelihood with diverging number of parameters. Ann Stat 32(3):928–961
Frank IE, Friedman JH (1993) A statistical view of some chemometrics regression tools. Technometrics 35:109–148
Koenker R, Ng P, Portnoy S (1994) Quantile smoothing splines. Biometrika 81:673–680
Lee LF (2003) Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econ Rev 22:307–335
Lee LF (2004) Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72:1899–1925
Lee LF (2007) GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. J Econ 137:489–514
Lee LF, Yu JH (2010) Estimation of spatial autoregressive panel data models with fixed effects. J Econ 154:165–185
Leng C (2010) Variable selection and coefficient estimation via regularized rank regression. Stat Sinica 20(1):167–181
Li R, Liang H (2008) Variable selection in semiparametric regression model. Ann Stat 36:261–286
Pace RK, Gilley OW (1997) Using the spatial configuration of the data to improve estimation. J Real Estate Financ Econ 14:333–340
Su L, Yang Z (2009) Instrumental variable quantile estimation of spatial autoregressive models. Working paper, Singapore Management University
Tibshirani RJ (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc: Ser. B 58:267–288
Wang H, Xia Y (2009) Shrinkage estimation of the varying coefficient model. J Am Stat Assoc 104:747–757
Wang H, Li G, Jiang G (2007) Robust regression shrinkage and consistent variable selection through the LAD-Lasso. J Bus Econ Stat 25:347–355
Wang XQ, Jiang Y, Huang M, Zhang H (2013) Robust variable selection with exponential squared loss. J Am Stat Assoc 108(502):632–643
Xiong S (2010) Some notes on the nonnegative garrote. Technometrics 52(3):349–361
Zhang YQ, Shen DM (2015) Estimation of semi-parametric varying-coefficient spatial panel data models with random-effects. J Stat Plan Inference 159:64–80
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Additional information
Xie’s work is supported by the National Natural Science Foundation of China (No. 11571340) and the Science and Technology Project of Beijing Municipal Education Commission (KM201710005032). Cao’s work is supported by China Postdoctoral Science Foundation (No. 2016M591030) and the National Natural Science Foundation of China (No. 11701020). Du’s work is supported by the National Natural Science Foundation of China (Nos. 11501018, 11771032, 11571340), and Program for Rixin Talents in Beijing University of Technology (No. 006000514116003).
Appendix: Proofs
In this part of the paper, we denote by C a generic positive constant, which may take different values at different places.
Proof of Theorem 1
Let \(\gamma _n=\sqrt{p_n}(n^{-1/2}+a_n)\) and set \(\Vert \varvec{u}\Vert =C\), where C is a sufficiently large constant. Following Fan and Li (2001), we first show that \(\Vert \hat{\varvec{\theta }}-\varvec{\theta }_0\Vert =O_p(\gamma _n).\) It suffices to show that for any given \(\eta >0\), there is a large constant C such that, for large n,
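The display this claim refers to is missing from the source. In the argument of Fan and Li (2001) it typically takes the following form; this is a reconstruction in the paper's notation, not the authors' exact display:

```latex
P\Bigl\{ \inf_{\Vert \varvec{u}\Vert = C}
  Q_n\bigl(\varvec{\theta}_0 + \gamma_n \varvec{u}\bigr) > Q_n(\varvec{\theta}_0)
\Bigr\} \ge 1 - \eta .
```

This implies that, with probability at least \(1-\eta \), there exists a local minimizer in the ball \(\{\varvec{\theta }_0+\gamma _n\varvec{u}:\Vert \varvec{u}\Vert \le C\}\), so that \(\Vert \hat{\varvec{\theta }}-\varvec{\theta }_0\Vert =O_p(\gamma _n)\).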
Denote
Then, \(S_n(\varvec{u})\) can be written as
where \(R_1=\left( {\varvec{Y}}_n-\varvec{Z}\varvec{\theta }_0\right) ^T\varvec{Z} \varvec{u}\) and \(R_2= \varvec{u}^T \varvec{Z} ^T\varvec{Z} \varvec{u}.\)
First, we analyze \(R_1\). Note that
where \({\varvec{B}}=(\varvec{D}_n,{\varvec{X}}_n)\), \({\varvec{B}}^*=({\varvec{G}}_n{\varvec{X}}_n{\varvec{\beta }}_0,{\varvec{X}}_n)\), \(\varvec{e}=({\varvec{G}}_n{\varvec{\varepsilon }}_n,0)=({\varvec{G}}_n,0) {\varvec{\varepsilon }}_n\), \({\varvec{G}}_n={\varvec{W}}_n(\varvec{I}_n-\rho _0{\varvec{W}}_n)^{-1}\) and \(R_{11},R_{12}\) are as follows:
Consider \(R_{11}\). Obviously, \(E(R_{11})=0 \) and
Thus, one has \(\Vert {\varvec{\varepsilon }}_n^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1/2}\Vert ^2=O_p( p_n ).\) By Assumption 2 and the condition \(\Vert \varvec{u}\Vert =C\), we have \( \varvec{u}^T{{\varvec{B}}^*}^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T {\varvec{B}}^*\varvec{u}=O_p(n\Vert \varvec{u}\Vert ^2).\) Invoking the Cauchy–Schwarz inequality, we have \(R_{11}=O_p(\sqrt{np_n}\Vert \varvec{u}\Vert )\). Similarly, by Assumption 3, one has
Therefore, by the Cauchy–Schwarz inequality, we have
Combining the convergence rates of \(R_{11}\) and \(R_{12}\), one has \(R_{1}=O_p(\sqrt{np_n}\Vert \varvec{u}\Vert ).\)
Next, we consider \(R_2.\) By the fact \({\varvec{B}}^*=({\varvec{G}}_n{\varvec{X}}_n{\varvec{\beta }}_0,{\varvec{X}}_n) \) and the definition of \(\varvec{Z}\), one has
where
By Assumption 2, we have \( R_{21} \asymp \Vert \varvec{u}\Vert ^2 \). Similarly, \( R_{22} \asymp \Vert \varvec{u}\Vert ^2,\) and \( R_{23} \asymp \Vert \varvec{u}\Vert ^2.\) Thus, \( R_{2} \asymp n \Vert \varvec{u}\Vert ^2.\) Consequently, \(\gamma _n R_1=O_p(\gamma _n\sqrt{np_n}\Vert \varvec{u}\Vert )\) and \(\gamma _n^2 R_2=O_p(\gamma _n^2n {\Vert \varvec{u}\Vert ^2})\).
Summarizing the above results, \(\gamma _n^2 R_2\) dominates \(\gamma _n R_1\) uniformly in \(\Vert \varvec{u}\Vert =C\) for a large enough C. Note that
where s is the dimension of \({{\varvec{\beta }}}_{10}\) and \(K_1({\varvec{u}})=n\sum \nolimits _{j=0}^s\{ p_{\lambda _{j}}(|\beta _{j0}+\gamma _nu_j|)-p_{\lambda _{j}}(|\beta _{j0}|)\}\). Then, by Taylor's expansion, we obtain
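The display produced by this expansion is missing from the source. In proofs of this type (cf. Fan and Peng 2004) the bound is typically of the following form; this is a hedged reconstruction, with \(a_n=\max _j p'_{\lambda _j}(|\beta _{j0}|)\) and \(b_n=\max _j |p''_{\lambda _j}(|\beta _{j0}|)|\) denoting the usual quantities:

```latex
K_1(\varvec{u})
\le \sqrt{s+1}\; n \gamma_n a_n \Vert \varvec{u}\Vert
  + n \gamma_n^2 b_n \Vert \varvec{u}\Vert^2 \{1+o(1)\},
```

and since \(\gamma _n=\sqrt{p_n}(n^{-1/2}+a_n)\ge \sqrt{s+1}\,a_n\), both terms are bounded by \(n\gamma _n^2\Vert \varvec{u}\Vert \) and \(o(n\gamma _n^2)\Vert \varvec{u}\Vert ^2\) respectively, so \(K_1({\varvec{u}})\) is dominated by \(\gamma _n^2R_2\) for C large enough.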
Thus, by taking C large enough, \(\gamma _n^2R_2\) dominates both \(\gamma _nR_1\) and \(K_1({\varvec{u}})\), and \(\gamma _n^2R_2\) is positive. This proves Theorem 1. \(\square \)
Proof of Theorem 2
We now establish sparsity. It suffices to show that, with probability tending to one as \(n\rightarrow \infty \), for any \({{\varvec{\beta }}}_1\) satisfying \({{\varvec{\beta }}}_1-{{\varvec{\beta }}}_{10}=O_p(\sqrt{{p_n}/{n}})\), some small \(\delta _n=C\sqrt{ {p_n}/{n}}\), and \(j=s+1,\ldots ,p_n\),
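The inequality referred to here is not reproduced in the source. In the standard sparsity argument of Fan and Li (2001) it reads as follows; this is a reconstruction, not the authors' exact display:

```latex
\frac{\partial Q_n(\varvec{\beta})}{\partial \beta_j} > 0
  \quad \text{for } 0 < \beta_j < \delta_n,
\qquad
\frac{\partial Q_n(\varvec{\beta})}{\partial \beta_j} < 0
  \quad \text{for } -\delta_n < \beta_j < 0.
```

That is, the penalized objective is increasing to the right of zero and decreasing to the left, so its minimizer in \(\beta _j\) must be exactly zero.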
In fact, for any \(\beta _{j},\ j=s+1,\ldots ,p_n\), using Taylor’s expansion we obtain
where \(Y_{i}\) is the ith component of \({\varvec{Y}}\) and \(Z_{ij}\) is the jth component of \(Z_i\), respectively. By the regularity conditions, we conclude
By the proof of Theorem 1, we can conclude that \(R_{31}=O_p(\sqrt{np_n})\), \(R_{32}=O_p(n\sqrt{\frac{p_n}{n}})\) and \(R_{33}=O_p(n\sqrt{\frac{p_n}{n}}).\) Then, we have
According to Assumption 4, as \( n\rightarrow \infty \),
and
Hence the sign of \(\beta _j\) completely determines the sign of \( \frac{\partial Q_n({{\varvec{\beta }}})}{\partial \beta _{j}}\); in other words, for \(j=s+1,\ldots ,p_n\),
It follows that \(\hat{{\varvec{\beta }}}_2={\varvec{0}}\).
We now prove (II), namely the asymptotic normality of \((\hat{\rho },\hat{{\varvec{\beta }}}_1^T)^T\). For ease of presentation, let \(\beta _{10}^*=\rho \) and \(\beta _{1j}^*=\beta _{1j},\ j=1,\ldots ,s\), and denote \({\varvec{\beta }}_1^*=(\rho ,\beta _{11},\ldots ,\beta _{1s})^T\) and \({\varvec{\beta }}_{0}^*=(\rho _0,\beta _{01},\ldots ,\beta _{0s})^T\). Since \(\hat{\varvec{\theta }}\) minimizes \(Q_n({\varvec{\theta }})\), it satisfies the stationarity condition for \({{\varvec{\beta }}^*}\):
Invoking the local quadratic approximation proposed by Fan and Li (2001), for \(j=0,1,\ldots ,s,\)
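The approximation itself is not reproduced in the source; in Fan and Li (2001) it replaces the nondifferentiable penalty by a quadratic around the true value, and in the present notation it reads:

```latex
p_{\lambda_j}(|\beta_j|)
\approx p_{\lambda_j}(|\beta_{j0}|)
  + \frac{1}{2}\,\frac{p'_{\lambda_j}(|\beta_{j0}|)}{|\beta_{j0}|}
    \bigl(\beta_j^2 - \beta_{j0}^2\bigr),
\qquad \beta_j \approx \beta_{j0}.
```

This renders the penalized objective differentiable, so the stationarity condition above can be written out explicitly.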
The above equation can be rewritten as follows
where \(\varvec{b}_n=\{ p'_{\lambda _{0}}(|\rho _{0}|)\mathrm{sgn}(\rho _{0}),p'_{\lambda _{1}}(|\beta _{01}|)\mathrm{sgn}(\beta _{01}),\ldots ,p'_{\lambda _{s}}(|\beta _{0s}|)\mathrm{sgn}(\beta _{0s}) \}^T\), and
By the definition of instrumental variable \({\varvec{H}}\) and Theorem 1, we have
Combining this, we can conclude that
i.e.
Then,
Invoking Lindeberg–Feller central limit theorem, we have
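The limiting distribution display is missing from the source. Given the definitions in the line that follows, the standard oracle statement (cf. Fan and Peng 2004) takes a form like the following; this is a reconstruction, and the exact scaling matrix may differ from the authors' display:

```latex
\sqrt{n}\, \varvec{A}_n \varvec{\Sigma}_1^{1/2}
\bigl( \hat{\varvec{\beta}}_1^* - \varvec{\beta}_0^* \bigr)
\xrightarrow{\;d\;} N\bigl( \varvec{0},\, \sigma^2 \varvec{G} \bigr),
```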
where \({\varvec{\Sigma }}_1=\mathop {\text {lim}}\limits _{n\rightarrow \infty } E({\varvec{Z}}_1^T{\varvec{Z}}_1)/n\) and \({\varvec{G}}=\mathop {\text {lim}}\limits _{n\rightarrow \infty } {\varvec{A}}_n^T{\varvec{A}}_n\). Since \({\varvec{\beta }}_1^*=(\rho ,\beta _{11},\ldots ,\beta _{1s})^T\) and \({\varvec{\beta }}_0^*=(\rho _0,\beta _{01},\ldots ,\beta _{0s})^T\), the proof is completed. \(\square \)
Xie, T., Cao, R. & Du, J. Variable selection for spatial autoregressive models with a diverging number of parameters. Stat Papers 61, 1125–1145 (2020). https://doi.org/10.1007/s00362-018-0984-2