Abstract
Variable selection plays a fundamental role in regression analysis. The spatial autoregressive model is a useful tool in econometrics and statistics, yet variable selection in this context, though necessary, has not been adequately investigated. In this paper, we consider variable selection in spatial autoregressive models with a diverging number of parameters. The smoothly clipped absolute deviation (SCAD) penalty is used to obtain the estimators, and the dimension of the covariates is allowed to grow with the sample size. To attenuate the bias caused by endogeneity, an instrumental variable is adopted in the estimation procedure. The proposed method performs parameter estimation and variable selection simultaneously. Under mild conditions, we establish the asymptotic and oracle properties of the proposed estimators. Finally, the performance of the proposed estimation procedure is examined via Monte Carlo simulation studies, and a Boston housing price data set is analyzed as an illustrative example.
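To fix ideas, the SCAD penalty of Fan and Li (2001) underlying the estimators can be sketched as follows. This is an illustrative implementation, not the authors' code; the tuning constant a = 3.7 follows the recommendation in Fan and Li (2001).

```python
import numpy as np

def scad_penalty(theta, lam, a=3.7):
    """SCAD penalty p_lambda(theta) of Fan and Li (2001), applied elementwise.

    Linear for |theta| <= lam, quadratic on (lam, a*lam], constant beyond a*lam,
    so large coefficients are not shrunk (the source of the oracle property).
    """
    t = np.abs(np.asarray(theta, dtype=float))
    small = lam * t                                              # |theta| <= lam
    mid = -(t**2 - 2.0 * a * lam * t + lam**2) / (2.0 * (a - 1.0))  # lam < |theta| <= a*lam
    flat = (a + 1.0) * lam**2 / 2.0                              # |theta| > a*lam
    return np.where(t <= lam, small, np.where(t <= a * lam, mid, flat))
```

The penalty is continuous at both knots, which is what makes simultaneous estimation and selection possible without the excess bias of the LASSO on large coefficients.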
References
Ai CR, Zhang YQ (2017) Estimation of partially specified spatial panel data models with fixed-effects. Econ Rev 36(1–3):6–22
Anderson TW, Hsiao C (1981) Estimation of dynamic models with error components. J Am Stat Assoc 76:598–606
Anderson TW, Hsiao C (1982) Formulation and estimation of dynamic models using panel data. J Econ 18:47–82
Anselin L (1988) Spatial econometrics: methods and models. Kluwer Academic Publishers, The Netherlands
Anselin L, Bera AK (1998) Spatial dependence in linear regression models with an introduction to spatial econometrics. In: Ullah A (ed) Handbook of applied economic statistics. CRC Press, Marcel Dekker, New York, pp 237–290
Breiman L (1995) Better subset regression using the nonnegative garrote. Technometrics 37(4):373–384
Cliff A, Ord JK (1973) Spatial autocorrelation. Pion, London
Dai XW, Jin LB, Shi L, Yang CP, Liu SZ (2015) Local influence analysis for general spatial models. Adv Stat Anal (Online)
Dai XW, Jin LB, Shi AQ, Shi L (2016) Outlier detection and accommodations in general spatial models. Stat Methods Appl (Online)
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Fan J, Li R (2004) New estimation and model selection procedures for semiparametric modeling in longitudinal data analysis. J Am Stat Assoc 99:710–723
Fan J, Peng H (2004) On nonconcave penalized likelihood with diverging number of parameters. Ann Stat 32(3):928–961
Frank IE, Friedman JH (1993) A statistical view of some chemometrics regression tools. Technometrics 35:109–148
Koenker R, Ng P, Portnoy S (1994) Quantile smoothing splines. Biometrika 81:673–680
Lee LF (2003) Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econ Rev 22:307–335
Lee LF (2004) Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72:1899–1925
Lee LF (2007) GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. J Econ 137:489–514
Lee LF, Yu JH (2010) Estimation of spatial autoregressive panel data models with fixed effects. J Econ 154:165–185
Leng C (2010) Variable selection and coefficient estimation via regularized rank regression. Stat Sinica 20(1):167–181
Li R, Liang H (2008) Variable selection in semiparametric regression model. Ann Stat 36:261–286
Pace RK, Gilley OW (1997) Using the spatial configuration of the data to improve estimation. J Real Estate Financ Econ 14:333–340
Su L, Yang Z (2009) Instrumental variable quantile estimation of spatial autoregressive models. Working paper, Singapore Management University
Tibshirani RJ (1996) Regression shrinkage and selection via the LASSO. J R Stat Soc: Ser. B 58:267–288
Wang H, Xia Y (2009) Shrinkage estimation of the varying coefficient model. J Am Stat Assoc 104:747–757
Wang H, Li G, Jiang G (2007) Robust regression shrinkage and consistent variable selection through the LAD-Lasso. J Bus Econ Stat 25:347–355
Wang XQ, Jiang Y, Huang M, Zhang H (2013) Robust variable selection with exponential squared loss. J Am Stat Assoc 108(502):632–643
Xiong S (2010) Some notes on the nonnegative garrote. Technometrics 52(3):349–361
Zhang YQ, Shen DM (2015) Estimation of semi-parametric varying-coefficient spatial panel data models with random-effects. J Stat Plan Inference 159:64–80
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Additional information
Xie’s work is supported by the National Natural Science Foundation of China (No. 11571340) and the Science and Technology Project of Beijing Municipal Education Commission (KM201710005032). Cao’s work is supported by China Postdoctoral Science Foundation (No. 2016M591030) and the National Natural Science Foundation of China (No. 11701020). Du’s work is supported by the National Natural Science Foundation of China (Nos. 11501018, 11771032, 11571340), and Program for Rixin Talents in Beijing University of Technology (No. 006000514116003).
Appendix: Proofs
In this part of the paper, we denote by C a generic positive constant, which may take different values at different places.
Proof of Theorem 1
Let \(\gamma _n=\sqrt{p_n}(n^{-1/2}+a_n)\) and set \(\Vert \varvec{u}\Vert =C\), where C is a sufficiently large constant. Following Fan and Li (2001), we first show that \(\Vert \hat{\varvec{\theta }}-\varvec{\theta }_0\Vert =O_p(\gamma _n).\) It suffices to show that for any given \(\eta >0\), there is a large constant C such that, for large n,
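The display this claim refers to is missing from the source. In the argument of Fan and Li (2001) it typically takes the following form; this is a reconstruction in the paper's notation, not the authors' exact display:

```latex
P\Bigl\{ \inf_{\Vert \varvec{u}\Vert = C}
  Q_n\bigl(\varvec{\theta}_0 + \gamma_n \varvec{u}\bigr) > Q_n(\varvec{\theta}_0)
\Bigr\} \ge 1 - \eta .
```

This implies that, with probability at least \(1-\eta \), there exists a local minimizer in the ball \(\{\varvec{\theta }_0+\gamma _n\varvec{u}:\Vert \varvec{u}\Vert \le C\}\), so that \(\Vert \hat{\varvec{\theta }}-\varvec{\theta }_0\Vert =O_p(\gamma _n)\).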
Denote
Then, \(S_n(\varvec{u})\) can be written as
where \(R_1=\left( {\varvec{Y}}_n-\varvec{Z}\varvec{\theta }_0\right) ^T\varvec{Z} \varvec{u}\) and \(R_2= \varvec{u}^T \varvec{Z} ^T\varvec{Z} \varvec{u}.\)
First, we analyze \(R_1\). Note that
where \({\varvec{B}}=(\varvec{D}_n,{\varvec{X}}_n)\), \({\varvec{B}}^*=({\varvec{G}}_n{\varvec{X}}_n{\varvec{\beta }}_0,{\varvec{X}}_n)\), \(\varvec{e}=({\varvec{G}}_n{\varvec{\varepsilon }}_n,0)=({\varvec{G}}_n,0) {\varvec{\varepsilon }}_n\), \({\varvec{G}}_n={\varvec{W}}_n(\varvec{I}_n-\rho _0{\varvec{W}}_n)^{-1}\) and \(R_{11},R_{12}\) are as follows:
Consider \(R_{11}\). Obviously, \(E(R_{11})=0 \) and
Thus, one has \(\Vert {\varvec{\varepsilon }}_n^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1/2}\Vert ^2=O_p( p_n ).\) By Assumption 2 and the condition \(\Vert \varvec{u}\Vert =C\), we have \( \varvec{u}^T{{\varvec{B}}^*}^T {\varvec{H}}({\varvec{H}}^T{\varvec{H}})^{-1}{\varvec{H}}^T {\varvec{B}}^*\varvec{u}=O_p(n\Vert \varvec{u}\Vert ^2).\) Invoking the Cauchy–Schwarz inequality, we have \(R_{11}=O_p(\sqrt{np_n}\Vert \varvec{u}\Vert )\). Similarly, by Assumption 3, one has
Therefore, by the Cauchy–Schwarz inequality, we have
Combining the convergence rates of \(R_{11}\) and \(R_{12}\), one has \(R_{1}=O_p(\sqrt{np_n}\Vert \varvec{u}\Vert ).\)
Next, we consider \(R_2.\) By the fact \({\varvec{B}}^*=({\varvec{G}}_n{\varvec{X}}_n{\varvec{\beta }}_0,{\varvec{X}}_n) \) and the definition of \(\varvec{Z}\), one has
where
By Assumption 2, we have \( R_{21} \asymp \Vert \varvec{u}\Vert ^2 \). Similarly, \( R_{22} \asymp \Vert \varvec{u}\Vert ^2,\) and \( R_{23} \asymp \Vert \varvec{u}\Vert ^2.\) Thus, \( R_{2} \asymp n \Vert \varvec{u}\Vert ^2.\) Consequently, \(\gamma _n R_1=O_p(\gamma _n\sqrt{np_n}\Vert \varvec{u}\Vert )\) and \(\gamma _n^2 R_2=O_p(\gamma _n^2n {\Vert \varvec{u}\Vert ^2})\).
Summarizing the above results, \(\gamma _n^2 R_2\) dominates \(\gamma _n R_1\) uniformly in \(\Vert \varvec{u}\Vert =C\) for a large enough C. Note that
where s is the dimension of \({{\varvec{\beta }}}_{10}\) and \(K_1({\varvec{u}})=n\sum \nolimits _{j=0}^s\{ p_{\lambda _{j}}(|\beta _{j0}+\gamma _nu_j|)-p_{\lambda _{j}}(|\beta _{j0}|)\}\). Then, by Taylor's expansion, we obtain
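The display produced by this expansion is missing from the source. In proofs of this type (cf. Fan and Peng 2004) the bound is typically of the following form; this is a hedged reconstruction, with \(a_n=\max _j p'_{\lambda _j}(|\beta _{j0}|)\) and \(b_n=\max _j |p''_{\lambda _j}(|\beta _{j0}|)|\) denoting the usual quantities:

```latex
K_1(\varvec{u})
\le \sqrt{s+1}\; n \gamma_n a_n \Vert \varvec{u}\Vert
  + n \gamma_n^2 b_n \Vert \varvec{u}\Vert^2 \{1+o(1)\},
```

and since \(\gamma _n=\sqrt{p_n}(n^{-1/2}+a_n)\ge \sqrt{s+1}\,a_n\), both terms are bounded by \(n\gamma _n^2\Vert \varvec{u}\Vert \) and \(o(n\gamma _n^2)\Vert \varvec{u}\Vert ^2\) respectively, so \(K_1({\varvec{u}})\) is dominated by \(\gamma _n^2R_2\) for C large enough.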
Thus, by taking C large enough, \(\gamma _n^2R_2\) dominates both \(\gamma _nR_1\) and \(K_1({\varvec{u}})\), and \(\gamma _n^2R_2\) is positive. This proves Theorem 1. \(\square \)
Proof of Theorem 2
We now establish sparsity. It suffices to show that, with probability tending to one as \(n\rightarrow \infty \), for any \({{\varvec{\beta }}}_1\) satisfying \({{\varvec{\beta }}}_1-{{\varvec{\beta }}}_{10}=O_p(\sqrt{{p_n}/{n}})\), some small \(\delta _n=C\sqrt{ {p_n}/{n}}\), and \(j=s+1,\ldots ,p_n\),
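The inequality referred to here is not reproduced in the source. In the standard sparsity argument of Fan and Li (2001) it reads as follows; this is a reconstruction, not the authors' exact display:

```latex
\frac{\partial Q_n(\varvec{\beta})}{\partial \beta_j} > 0
  \quad \text{for } 0 < \beta_j < \delta_n,
\qquad
\frac{\partial Q_n(\varvec{\beta})}{\partial \beta_j} < 0
  \quad \text{for } -\delta_n < \beta_j < 0.
```

That is, the penalized objective is increasing to the right of zero and decreasing to the left, so its minimizer in \(\beta _j\) must be exactly zero.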
In fact, for any \(\beta _{j},\ j=s+1,\ldots ,p_n\), using Taylor’s expansion we obtain
where \(Y_{i}\) is the ith component of \({\varvec{Y}}\) and \(Z_{ij}\) is the jth component of \(Z_i\), respectively. By the regularity conditions, we conclude
By the proof of Theorem 1, we can conclude that \(R_{31}=O_p(\sqrt{np_n})\), \(R_{32}=O_p(n\sqrt{\frac{p_n}{n}})\) and \(R_{33}=O_p(n\sqrt{\frac{p_n}{n}}).\) Then, we have
According to Assumption 4, as \( n\rightarrow \infty \),
and
Hence the sign of \(\beta _j\) completely determines the sign of \( \frac{\partial Q_n({{\varvec{\beta }}})}{\partial \beta _{j}}\); in other words, for \(j=s+1,\ldots ,p_n\),
It follows that \(\hat{{\varvec{\beta }}}_2={\varvec{0}}\).
We now prove (II), namely the asymptotic normality of \((\hat{\rho },\hat{{\varvec{\beta }}}_1^T)^T\). For ease of presentation, let \(\beta _{10}^*=\rho \) and \(\beta _{1j}^*=\beta _{1j},\ j=1,\ldots ,s\), and denote \({\varvec{\beta }}_1^*=(\rho ,\beta _{11},\ldots ,\beta _{1s})^T\) and \({\varvec{\beta }}_{0}^*=(\rho _0,\beta _{01},\ldots ,\beta _{0s})^T\). Since \(\hat{\varvec{\theta }}\) minimizes \(Q_n({\varvec{\theta }})\), it satisfies the stationarity condition for \({{\varvec{\beta }}^*}\):
Invoking the local quadratic approximation proposed by Fan and Li (2001), for \(j=0,1,\ldots ,s,\)
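The approximation itself is not reproduced in the source; in Fan and Li (2001) it replaces the nondifferentiable penalty by a quadratic around the true value, and in the present notation it reads:

```latex
p_{\lambda_j}(|\beta_j|)
\approx p_{\lambda_j}(|\beta_{j0}|)
  + \frac{1}{2}\,\frac{p'_{\lambda_j}(|\beta_{j0}|)}{|\beta_{j0}|}
    \bigl(\beta_j^2 - \beta_{j0}^2\bigr),
\qquad \beta_j \approx \beta_{j0}.
```

This renders the penalized objective differentiable, so the stationarity condition above can be written out explicitly.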
The above equation can be rewritten as follows
where \(\varvec{b}_n=\{ p'_{\lambda _{0}}(|\rho _{0}|)\mathrm{sgn}(\rho _{0}),p'_{\lambda _{1}}(|\beta _{01}|)\mathrm{sgn}(\beta _{01}),\ldots ,p'_{\lambda _{s}}(|\beta _{0s}|)\mathrm{sgn}(\beta _{0s}) \}^T\), and
By the definition of instrumental variable \({\varvec{H}}\) and Theorem 1, we have
Combining this, we can conclude that
i.e.
Then,
Invoking Lindeberg–Feller central limit theorem, we have
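The limiting distribution display is missing from the source. Given the definitions in the line that follows, the standard oracle statement (cf. Fan and Peng 2004) takes a form like the following; this is a reconstruction, and the exact scaling matrix may differ from the authors' display:

```latex
\sqrt{n}\, \varvec{A}_n \varvec{\Sigma}_1^{1/2}
\bigl( \hat{\varvec{\beta}}_1^* - \varvec{\beta}_0^* \bigr)
\xrightarrow{\;d\;} N\bigl( \varvec{0},\, \sigma^2 \varvec{G} \bigr),
```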
where \({\varvec{\Sigma }}_1=\mathop {\text {lim}}\limits _{n\rightarrow \infty } E({\varvec{Z}}_1^T{\varvec{Z}}_1)/n\) and \({\varvec{G}}=\mathop {\text {lim}}\limits _{n\rightarrow \infty } {\varvec{A}}_n^T{\varvec{A}}_n\). Since \({\varvec{\beta }}_1^*=(\rho ,\beta _{11},\ldots ,\beta _{1s})^T\) and \({\varvec{\beta }}_0^*=(\rho _0,\beta _{01},\ldots ,\beta _{0s})^T\), the proof is completed. \(\square \)
Xie, T., Cao, R. & Du, J. Variable selection for spatial autoregressive models with a diverging number of parameters. Stat Papers 61, 1125–1145 (2020). https://doi.org/10.1007/s00362-018-0984-2