Abstract
Issues concerning spatial dependence among cross-sectional units in econometrics have received increasing attention. Motivated by an analysis of Boston housing price data, this paper studies sparse inference for the varying coefficient partially linear spatial autoregressive model, which is valuable in econometrics with high-dimensional data. A novel, efficient and convenient one-step variable selection procedure is proposed that uses a twofold penalty for simultaneous estimation and variable selection of the parametric components and the varying coefficient functions, with the varying coefficient functions approximated by a B-spline basis. Under some regularity conditions, asymptotic properties of the resulting estimators are established, including consistency, asymptotic normality and the oracle property. In addition, the optimal choices of the tuning parameters are discussed, and a practical iterative algorithm based on the local quadratic approximation approach is presented for implementation. Finally, extensive numerical simulations and a Boston housing price data analysis are conducted to confirm the finite sample performance and theoretical findings of the new method.
Availability of data
The dataset is freely available from the R package spdep.
References
Ai C, Zhang Y (2017) Estimation of partially specified spatial panel data models with fixed-effects. Econom Rev 36:6–22
Anselin L, Bera AK (1998) Spatial dependence in linear regression models with an introduction to spatial econometrics. In: Ullah A, Giles DEA (eds) Handbook of applied economic statistics. Marcel Dekker, New York, pp 237–289
Baltagi BH, Li D (2001) LM tests for functional form and spatial error correlation. Int Reg Sci Rev 24:194–225
Basile R, Gress B (2005) Semi-parametric spatial auto-covariance models of regional growth in Europe. Rég Dév 21:93–118
Chen Y, Wang Q, Yao W (2015) Adaptive estimation for varying coefficient models. J Multivar Anal 137:17–31
Cheng S, Chen J, Liu X (2019) GMM estimation of partially linear single-index spatial autoregressive model. Spat Stat 31:100354
De Boor C (2001) A practical guide to splines. Springer, New York
Du J, Sun X, Cao R, Zhang Z (2018) Statistical inference for partially linear additive spatial autoregressive models. Spat Stat 25:52–67
Fan J, Huang T (2005) Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli 11:1031–1057
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Frank I, Friedman J (1993) A statistical view of some chemometrics tools. Technometrics 35:109–135
Harrison D, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. J Environ Econ Manag 5:81–102
Hu T, Xia Y (2012) Adaptive semi-varying coefficient model selection. Stat Sin 22:575–599
Jencks C, Mayer S (1990) The social consequences of growing up in a poor neighborhood. Inner-city poverty in the United States. National Academy, Washington
Kelejian HH, Prucha IR (1998) A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J Real Estate Finance Econ 17:99–121
Kim MO (2007) Quantile regression with varying coefficients. Ann Stat 35:92–108
Kostov P (2009) A spatial quantile regression hedonic model of agriculture land prices. Spat Econ Anal 4:53–72
Lee LF (2003) Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econom Rev 22:307–335
Lee LF (2004) Asymptotic distributions of quasi-maximum likelihood estimators for spatial econometric models. Econometrica 72:1899–1926
Lee LF (2007) GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. J Econom 137:489–514
Leng C (2009) A simple approach for varying-coefficient model selection. J Stat Plan Inference 139:2138–2146
Li R, Liang H (2008) Variable selection in semiparametric regression model. Ann Stat 36:261–286
Lian H (2012) Semiparametric estimation of additive quantile regression models by two-fold penalty. J Bus Econ Stat 30:337–350
Lin X, Lee LF (2010) GMM estimation of spatial autoregressive models with unknown heteroskedasticity. J Econom 157:34–52
Noh H, Chung K, Van Keilegom I (2012) Variable selection of varying coefficient models in quantile regression. Electron J Stat 6:1220–1238
Pace RK, Gilley OW (1997) Using the spatial configuration of the data to improve estimation. J Real Estate Finance Econ 14:333–340
Pal AB, Dubey AK, Chaturvedi A (2016) Shrinkage estimation in spatial autoregressive model. J Multivar Anal 143:362–373
Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464
Su L (2012) Semiparametric GMM estimation of spatial autoregressive models. J Econom 167:543–560
Su L, Jin S (2010) Profile quasi-maximum likelihood estimation of partially linear spatial autoregressive models. J Econom 157:18–33
Su L, Yang Z (2011) Instrumental variable quantile estimation of spatial autoregressive models. Working paper, Singapore Management University
Sun Y (2017) Estimation of single-index model with spatial interaction. Reg Sci Urban Econ 62:36–45
Sun Y, Wu Y (2018) Estimation and testing for a partially linear single-index spatial regression model. Spat Econ Anal 13:473–489
Sun Y, Zhang Y, Huang JZ (2019) Estimation of a semiparametric varying-coefficient mixed regressive spatial autoregressive model. Econom Stat 9:140–155
Tang Y, Wang HJ, Zhu Z, Song X (2012) A unified variable selection approach for varying coefficient models. Stat Sin 22:601–628
Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B 58:267–288
Wakefield J (2007) Disease mapping and spatial regression with count data. Biostatistics 8:158–183
Waller LA, Gotway CA (2004) Applied spatial statistics for public health data. Wiley, Hoboken
Wang D, Kulasekera KB (2012) Parametric component detection and variable selection in varying-coefficient partially linear models. J Multivar Anal 112:117–129
Wang H, Xia Y (2009) Shrinkage estimation of the varying coefficient model. J Am Stat Assoc 104:747–757
Wang HJ, Zhu Z, Zhou J (2009) Quantile regression in partially linear varying coefficient models. Ann Stat 37:3841–3866
Wei C, Guo S, Zhai S (2017) Statistical inference of partially linear varying coefficient spatial autoregressive models. Econ Model 64:553–559
Wu Y, Sun Y (2017) Shrinkage estimation of the linear model with spatial interaction. Metrika 80:51–68
Xia Y, Zhang W, Tong H (2004) Efficient estimation for semivarying-coefficient models. Biometrika 91:661–681
Xie T, Cao R, Du J (2020) Variable selection for spatial autoregressive models with a diverging number of parameters. Stat Pap 61:1125–1145
Xu X, Lee LF (2015) A spatial auto regressive model with a nonlinear transformation of the dependent variable. J Econom 186:1–18
Yang Z, Li C, Tse YK (2006) Functional form and spatial dependence in dynamic panels. Econ Lett 91:138–145
Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38:894–942
Zhang Y, Shen D (2015) Estimation of semi-parametric varying-coefficient spatial panel data models with random-effects. J Stat Plan Inference 159:64–80
Zhang HH, Cheng G, Liu Y (2011) Linear or nonlinear? Automatic structure discovery for partially linear models. J Am Stat Assoc 106:1099–1112
Zhao P, Xue L (2012) Variable selection in semiparametric regression analysis for longitudinal data. Ann Inst Stat Math 64:213–231
Zou H (2006) The adaptive LASSO and its oracle properties. J Am Stat Assoc 101:1418–1429
Acknowledgements
The authors would like to thank the editor, associate editor and the referees for their valuable comments, which significantly enhanced the quality of this paper.
Funding
Fang Lu’s research was funded by the National Natural Science Foundation of China (Grant 11801169) and the Natural Science Foundation of Hunan Province (Grant 2019JJ50378). Jing Yang’s research was funded by the National Natural Science Foundation of China (Grant 11801168), the Natural Science Foundation of Hunan Province (Grant 2018JJ3322), the Scientific Research Fund of Hunan Provincial Education Department (Grant 18B024) and the China Scholarship Council, which supported his visit to the University of California, Riverside. Xuewen Lu’s research was funded by Discovery Grants (RGPIN-2018-06466) from the Natural Sciences and Engineering Research Council (NSERC) of Canada.
Ethics declarations
Code availability
All code is written in R and is available from the corresponding author on request.
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical approval
This article does not contain any studies with human participants performed by any of the authors.
Appendix
Let C denote a generic constant that might assume different values at different places. Throughout the appendix, we use \(\Vert \cdot \Vert \) to represent the Euclidean norm, which means \(\Vert a\Vert =\sqrt{\sum _{i=1}^p {a_i^2}}\) for any vector \(a=(a_1,\ldots ,a_p)\) and \(\Vert M\Vert =\lambda _{\max }^{1/2}(M^TM)\) for any matrix M. Let \(B(u)^T\gamma _{s0}\) be the best approximating spline function to \(\alpha _s(u)\). It follows from the result on page 149 of De Boor (2001) that
\[\sup _{u}|r_s(u)|\le Ck_n^{-r} \qquad (11)\]
under conditions (C6) and (C7), where \(r_s(u)=\alpha _s(u)-B(u)^T\gamma _{s0}\), \(s=1,\ldots ,q\).
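The way the spline approximation error \(r_s(u)\) shrinks as the number of knots grows can be checked numerically. The following is a minimal sketch, not the authors' code: the coefficient function `alpha`, the equally spaced knots, the cubic degree and the helper `bspline_basis` are all assumptions made for illustration, and a least-squares fit stands in for the best approximating spline.

```python
import numpy as np

def bspline_basis(u, n_interior, degree=3):
    """Clamped B-spline basis on [0, 1], evaluated with the Cox-de Boor recursion."""
    u = np.asarray(u, dtype=float)
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]   # equally spaced interior knots
    t = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    n_int = len(t) - 1
    # Degree 0: indicators of the half-open knot intervals [t_j, t_{j+1}).
    B = np.zeros((len(u), n_int))
    for j in range(n_int):
        B[:, j] = (t[j] <= u) & (u < t[j + 1])
    # Assign the right endpoint u = 1 to the last non-empty interval.
    B[u == 1.0, degree + n_interior] = 1.0
    # Raise the degree one step at a time, skipping zero-length knot spans.
    for d in range(1, degree + 1):
        B_next = np.zeros((len(u), n_int - d))
        for j in range(n_int - d):
            left = t[j + d] - t[j]
            right = t[j + d + 1] - t[j + 1]
            if left > 0:
                B_next[:, j] += (u - t[j]) / left * B[:, j]
            if right > 0:
                B_next[:, j] += (t[j + d + 1] - u) / right * B[:, j + 1]
        B = B_next
    return B

def alpha(u):
    """Hypothetical smooth coefficient function used only for this illustration."""
    return np.sin(2 * np.pi * u)

u = np.linspace(0.0, 1.0, 2001)
sup_errors = {}
for k in (2, 4, 8, 16):
    B = bspline_basis(u, k)
    gamma, *_ = np.linalg.lstsq(B, alpha(u), rcond=None)    # least-squares spline coefficients
    sup_errors[k] = float(np.max(np.abs(alpha(u) - B @ gamma)))

row_sums = bspline_basis(u, 8).sum(axis=1)                  # B-splines form a partition of unity
```

Printing `sup_errors` shows the maximum error on the grid falling as the number of interior knots increases, which is the qualitative behaviour the approximation bound describes.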
Proof of Theorem 1
(i) Let \(\delta _n=k_n^{-r}+a_n\), \(\theta ^*=\theta +\delta _n\omega \), where \(\omega =(\omega _1,\ldots ,\omega _{p+qL+1})^T\) is a \((p+qL+1)\)-dimensional vector. It is sufficient to show that, for any given \(\eta >0\), there exists a large constant C such that
\[P\Big \{ \inf _{\Vert \omega \Vert =C} L_n(\theta ^*)>L_n(\theta )\Big \} \ge 1-\eta . \qquad (12)\]
Note that
where the inequality holds due to \(P_{\lambda }(\cdot )\ge 0\) and \(P_{\lambda }(0)=0\).
For the term \(L_{n1}\), it can be expressed as
where the last equality holds by the idempotent property of M. Let \(r_n=Z\alpha (U)-\varPi \gamma \), since \(V=(W(I-\rho W)^{-1}(X\beta +Z\alpha (U)+\varepsilon ),X,\varPi )\) and \(Y-V\theta =\varepsilon +r_n\), we have
where \(\tilde{Q}=(W(I-\rho W)^{-1}(X\beta +Z\alpha (U)),X,\varPi )\) and \(e=(W(I-\rho W)^{-1}\varepsilon ,0,0)\).
Let \(\varDelta _n=\tilde{Q}-Q=(0,0,(1-k_n^{1/2})\varPi )\), then \(\tilde{Q}=Q+\varDelta _n\). Note that
where the inequality holds due to condition (C3). Moreover, we can similarly prove \(E(\Vert \varepsilon ^TM\varDelta _n\Vert ^2)=O_p(k_n)\) using Lemma A.4 of Kim (2007), which states that the eigenvalues of \(k_n\varPi \varPi ^T/n\) are bounded in probability. Hence, we have \(R_1=O_p(k_n^{1/2})\) by noting that \(E(R_1)=0\).
In addition, \(E(\Vert \varepsilon ^TM\Vert ^2)=E\{ \text{ trace }(M^T\varepsilon \varepsilon ^TM) \}=O_p(k_n)\), and
from condition (C3), then \(\Vert \varepsilon ^TM\Vert =O_p(k_n^{1/2})\) and \(\Vert MS\varepsilon \Vert =O_p(k_n^{1/2})\) by noting that \(E(\varepsilon ^TM)=0\) and \(E(MS\varepsilon )=0\). As a result,
Moreover, it is easy to verify that \(\Vert r_n\Vert =O_p(\sqrt{n}k_n^{-r})\) by condition (C5) and Eq. (11). Combining this result with condition (C3) and arguments similar to those used for \(R_1\), we obtain \(\Vert R_3\Vert =O_p(\sqrt{n}k_n^{-r})\).
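The trace bounds in this part of the proof rely on M being idempotent. The definition of M is not given in this appendix, so the sketch below is an illustration only: it takes M to be the orthogonal projection onto the columns of a hypothetical \(n\times k\) matrix `H` and checks by simulation that \(E(\Vert \varepsilon ^TM\Vert ^2)=\sigma ^2\,\text{trace}(M)=\sigma ^2k\), here with \(\sigma ^2=1\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 8                           # hypothetical sample size and projection rank
H = rng.normal(size=(n, k))             # hypothetical matrix spanning the projection space
M = H @ np.linalg.solve(H.T @ H, H.T)   # orthogonal projection onto col(H)

is_idempotent = bool(np.allclose(M @ M, M))
trace_M = float(np.trace(M))            # equals the rank k for a projection

# Monte Carlo estimate of E||eps^T M||^2 with unit-variance errors.
eps = rng.normal(size=(5000, n))
mean_sq_norm = float(np.mean(np.sum((eps @ M) ** 2, axis=1)))
```

With a projection of rank k, the simulated mean of \(\Vert \varepsilon ^TM\Vert ^2\) sits close to k rather than n, which is why such terms are of order \(k_n\) when the rank grows like \(k_n\).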
Notice that \(E(\Vert r_n^TM\Vert ^2)=E\{ \text{ trace }(r_n^TMM^Tr_n) \} \le E\{ \text{ trace }(r_n^Tr_n) \}=O_p(nk_n^{-2r})\), and
Then, \(\Vert R_4\Vert =\Vert r_n^TMMe\Vert \le \Vert r_n^TM\Vert \Vert Me\Vert =O_p(\sqrt{n}k_n^{-(r-1/2)})\). Taking into account the condition \(k_n=O(n^{1/(2r+1)})\), we have
On the other hand, \(V^TMV=(\tilde{Q}+e)^TM(\tilde{Q}+e)=\tilde{Q}^TM\tilde{Q}+\tilde{Q}^TMe+e^TM\tilde{Q}+e^TMe\). Based on arguments similar to those above and condition (C4), we can verify that \(\Vert \tilde{Q}^TM\tilde{Q}\Vert =O_p(n)\), \(\Vert \tilde{Q}^TMe\Vert =\Vert e^TM\tilde{Q}\Vert =O_p(k_n^{1/2})\) and \(\Vert e^TMe\Vert =O_p(k_n^{1/2})\). Consequently,
Clearly, \(L_{n1}\) is uniformly dominated by \(\delta _n^2\omega ^TV^TMV\omega /2\) from (13)–(15) in \(\Vert \omega \Vert =C\).
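The orders appearing in this step can be reconciled with a short calculation. As a sketch, assume \(k_n\) is of exact order \(n^{1/(2r+1)}\) (the text only states \(k_n=O(n^{1/(2r+1)})\)), so that \(n\asymp k_n^{2r+1}\). Then

\[\sqrt{n}\,k_n^{-r}\asymp k_n^{r+1/2}\,k_n^{-r}=k_n^{1/2},\qquad \sqrt{n}\,k_n^{-(r-1/2)}\asymp k_n,\qquad n\,k_n^{-2r}\asymp k_n .\]

Under this assumption, the approximation-error terms of order \(\sqrt{n}k_n^{-r}\) are of the same order \(k_n^{1/2}\) as the stochastic terms, and \(n\delta _n^2\) is of order \(k_n\) whenever \(a_n\) is at most of order \(k_n^{-r}\).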
Now, we consider \(L_{n2}\). Recall that \(P_{\lambda }(0)=0\), applying the Taylor expansion approach to the penalty \(P_{\lambda _{1k}}(|\beta _k^*|)\) yields
where the definitions of \(a_n\) and \(b_n\) are given in Sect. 3. Therefore, \(\delta _n^2\omega ^TV^TMV\omega /2\) also uniformly dominates \(L_{n2}\) in \(\Vert \omega \Vert =C\). With the same arguments, we can demonstrate that \(L_{n3}\) is also uniformly dominated by \(\delta _n^2\omega ^TV^TMV\omega /2\). Therefore, (12) holds for sufficiently large C. This means that, with probability at least \(1-\eta \), there exists a local minimizer \(\hat{\theta }\) such that \(\Vert \hat{\theta }-\theta \Vert =O_p(\delta _n)\). This completes the proof.
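The proof uses only generic properties of the penalty, namely \(P_{\lambda }(\cdot )\ge 0\), \(P_{\lambda }(0)=0\) and a well-behaved derivative. This appendix does not name a specific penalty, so the sketch below assumes the SCAD penalty of Fan and Li (2001) purely for illustration; the function names and the conventional choice \(a=3.7\) are likewise assumptions.

```python
import numpy as np

def scad_penalty(t, lam, a=3.7):
    """SCAD penalty P_lam(|t|): linear near zero, quadratic in between, then constant."""
    t = np.abs(np.asarray(t, dtype=float))
    linear = lam * t
    quadratic = (2 * a * lam * t - t**2 - lam**2) / (2 * (a - 1))
    constant = lam**2 * (a + 1) / 2
    return np.where(t <= lam, linear, np.where(t <= a * lam, quadratic, constant))

def scad_derivative(t, lam, a=3.7):
    """Derivative of the SCAD penalty with respect to |t|."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1))

lam = 0.5
grid = np.linspace(0.0, 4.0, 401)
values = scad_penalty(grid, lam)       # penalty values on the grid
slopes = scad_derivative(grid, lam)    # derivative values on the grid
```

For this penalty the derivative is exactly zero once the argument exceeds \(a\lambda \), so large coefficients are not shrunk, while the slope \(\lambda \) near zero is what drives small coefficients to zero.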
(ii) For \(s=1,\ldots ,q\), it follows from result (i) and Eq. (11) that
where the penultimate equation holds due to \(B=\int B(u)B(u)^T {\mathrm{d}}u = O(1)\). Consequently, \(\Vert \hat{\alpha }_s(\cdot )- \alpha _{s}(\cdot )\Vert =O_p(\delta _n)=O_p(k_n^{-r}+a_n)\); this completes the proof. \(\square \)
Proof of Theorem 2
(i) Obviously, \(a_n\rightarrow 0\) as \(\lambda _{\max }\rightarrow 0\). Based on Theorem 1, it is sufficient to show that, for any \(\hat{\gamma }\) satisfying \(\Vert \hat{\gamma }-\gamma \Vert =O_p(k_n^{-r})\), \(\hat{\beta }_k\) satisfying \(\Vert \hat{\beta }_k-\beta _{k}\Vert =O_p(k_n^{-r})\), \(k=1,\ldots ,p_0\), and some given small \(\xi _n=Ck_n^{-r}\), with probability approaching 1 as \(n\rightarrow \infty \), we have
Thus, (16) and (17) imply that \(L_n(\theta )\) attains its minimum at \(\hat{\beta }_k=0\), \(k=p_0+1,\ldots ,p\).
For any matrix A, let \(A_{ij}\) be the (i, j)th element, \(A_{i\cdot }\) and \(A_{\cdot j}\), respectively, be the ith row and jth column of A. In fact,
where the last equality holds since M is an idempotent matrix.
By a similar proof of Theorem 1, we can obtain
Here \(O_p((\lambda _{1k}k_n^{r})^{-1})\rightarrow 0\) because \(\lambda _{1k}k_n^{r}=\lambda _{1k}n^{r/(2r+1)}>\lambda _{\min }n^{r/(2r+1)}\rightarrow \infty \). By the Taylor expansion and assumptions (A1) and (A2), we have \(\lambda _{1k}^{-1}P^\prime _{\lambda _{1k}}(|\hat{\beta }_k|)>0\). Hence, the sign of \(\frac{\partial L_n(\theta )}{\partial \beta _k}\Big |_{\hat{\beta }_k}\) is completely determined by the sign of \(\hat{\beta }_k\), which implies (16) and (17). This completes the proof.
(ii) Following arguments similar to those in the proof of (i), we obtain that \(\hat{\gamma }_s=0\), \(s=q_0+1,\ldots ,q\), with probability approaching 1 as \(n\rightarrow \infty \). Combining this with the fact that \(\sup _{u} \Vert B(u)\Vert =O(1)\) and \(\hat{\alpha }_s(u)=B(u)^T\hat{\gamma }_s\), the proof is complete. \(\square \)
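The sign argument can be seen in a one-dimensional toy problem. The sketch below is an illustration under stated assumptions, not the paper's objective: it uses the hypothetical criterion \(\frac{1}{2}(\beta -z)^2+P_{\lambda }(|\beta |)\) with an assumed SCAD penalty and a small signal \(|z|<\lambda \), and checks that the derivative has the same sign as \(\beta \) near zero, so the criterion decreases towards \(\beta =0\) from both sides.

```python
import numpy as np

def scad_derivative(t, lam, a=3.7):
    """Derivative of the (assumed) SCAD penalty with respect to |t|."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= lam, lam, np.maximum(a * lam - t, 0.0) / (a - 1))

def score(beta, z, lam):
    """Derivative of 0.5 * (beta - z)**2 + P_lam(|beta|) at beta != 0."""
    return (beta - z) + scad_derivative(beta, lam) * np.sign(beta)

lam, z = 0.5, 0.2                      # hypothetical tuning parameter and small signal
grid = np.linspace(-0.3, 0.3, 601)
grid = grid[grid != 0.0]               # the penalty is not differentiable at zero

# If the derivative always shares the sign of beta, the minimum is attained at beta = 0.
signs_match = bool(np.all(np.sign(score(grid, z, lam)) == np.sign(grid)))
```

Here the penalty slope \(\lambda \) outweighs the data term \(|\beta -z|\) in a neighbourhood of zero, which mirrors the role of \(\lambda _{1k}k_n^{r}\rightarrow \infty \) in the argument above.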
Proof of Theorem 3
Based on Theorems 1 and 2, we know that with probability approaching 1 as \(n\rightarrow \infty \), \(L_n(\theta )\) achieves the minimal value at \(\hat{\rho }\), \((\hat{\beta }_I^T,0^T)^T\) and \((\hat{\gamma }_I^T,0^T)^T\). Define \(L_{n1}(\theta )=\partial L_{n}(\theta )/\partial \beta _I\) and \(L_{n2}(\theta )=\partial L_{n}(\theta )/\partial \gamma _I\), then \(\hat{\rho }\), \((\hat{\beta }_I^T,0^T)^T\) and \((\hat{\gamma }_I^T,0^T)^T\) must satisfy
where \(V_I=(W(I-\rho W)^{-1}(X_I\beta _I+Z_I\alpha _I(U)+\varepsilon ),X_I)\), \(\xi =(\rho ,\beta _I)\) is defined in Sect. 3. By the Taylor expansion, we have
Since \(P_{\lambda _{1k}}^{\prime \prime }(|\beta _k|)=o_p(1)\) by assumption (A1) and \(P_{\lambda _{1k}}^\prime (|\beta _k|)\rightarrow 0 \) as \(\lambda _{\max }\rightarrow 0\), we have \(\sum _{k=1}^{p_0} { P_{\lambda _{1k}}^\prime (|\hat{\beta }_k|)\text{ sgn }(\hat{\beta }_k) }=o_p(1)\). Similarly, \(\sum _{s=1}^{q_0} { P_{\lambda _{2s}}(\Vert \hat{\gamma }_s\Vert _B) \frac{B\hat{\gamma }_s}{\Vert \hat{\gamma }_s\Vert _B} }=o_p(1)\). By arguments similar to those in the proof of Theorem 1, the above two equations can be written as
where \(\varXi _I=(W(I-\rho W)^{-1}(X_I\beta _I+Z_I\alpha _I(U)),X_I)\) is defined in assumption (A3).
Let \(\varPhi _n=\varPi _I^TM\varPi _I/n\) and \(\varPsi _n=\varPi _I^TM \varXi _I/n\), it follows from Eq. (19) that
That is,
Let \(\varGamma _n=\varXi _I^TM\varXi _I/n\), inserting Eq. (20) into (18) leads to
which is equivalent to
Note that \(\frac{1}{\sqrt{n}}(\varPsi _n^T\varPhi _n^{-1}\varPi _I^TM-\varXi _I^TM)r_{n}=o_p(1)\) from (11) and assumption (A3), then
Based on Slutsky’s theorem and the central limit theorem, we have
This completes the proof of Theorem 3. \(\square \)
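The substitution step in this proof, in which the spline coefficients are eliminated so that only the parametric part remains, has the structure of block elimination with the matrices \(\varPhi _n\), \(\varPsi _n\) and \(\varGamma _n\). The sketch below illustrates that algebra only, under assumptions: the matrices `Xi`, `Pi`, `H` and the response `y` are random stand-ins rather than the paper's quantities, and M is taken to be a projection matrix. It checks that solving the joint normal equations and solving the profiled (Schur complement) system give the same parametric estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_xi, d_pi = 300, 3, 6
Xi = rng.normal(size=(n, d_xi))          # stand-in for the parametric design block
Pi = rng.normal(size=(n, d_pi))          # stand-in for the spline design block
H = rng.normal(size=(n, 20))             # hypothetical matrix defining the projection
M = H @ np.linalg.solve(H.T @ H, H.T)    # idempotent projection matrix
y = rng.normal(size=n)                   # stand-in response

Phi = Pi.T @ M @ Pi / n
Psi = Pi.T @ M @ Xi / n
Gamma = Xi.T @ M @ Xi / n

# Joint normal equations for minimising (y - Xi a - Pi g)^T M (y - Xi a - Pi g).
A = np.block([[Gamma, Psi.T], [Psi, Phi]])
b = np.concatenate([Xi.T @ M @ y, Pi.T @ M @ y]) / n
joint = np.linalg.solve(A, b)

# Profiled system: eliminate the spline block first, then solve for the parametric part.
schur = Gamma - Psi.T @ np.linalg.solve(Phi, Psi)
rhs = (Xi.T @ M - Psi.T @ np.linalg.solve(Phi, Pi.T @ M)) @ y / n
profiled = np.linalg.solve(schur, rhs)
```

The combination \(\varXi _I^TM-\varPsi _n^T\varPhi _n^{-1}\varPi _I^TM\) on the right-hand side is, up to sign, the same factor that multiplies \(r_n\) in the proof above.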
Cite this article
Lu, F., Yang, J. & Lu, X. One-step oracle procedure for semi-parametric spatial autoregressive model and its empirical application to Boston housing price data. Empir Econ 62, 2645–2671 (2022). https://doi.org/10.1007/s00181-021-02118-z
DOI: https://doi.org/10.1007/s00181-021-02118-z