One-step oracle procedure for semi-parametric spatial autoregressive model and its empirical application to Boston housing price data

Empirical Economics

Abstract

Issues concerning spatial dependence among cross-sectional units have received increasing attention in econometrics. Motivated by an analysis of Boston housing price data, this paper studies sparse inference for the varying coefficient partially linear spatial autoregressive model, which is valuable in econometric applications with high-dimensional data. A novel, efficient and convenient one-step variable selection procedure is proposed that uses a twofold penalty to simultaneously estimate and select both the parametric components and the varying coefficient functions, where the varying coefficient functions are approximated by a B-spline basis. Under some regularity conditions, asymptotic properties of the resulting estimators are established, including consistency, asymptotic normality and the oracle property. In addition, the optimal choice of the tuning parameters is discussed, and a practical iterative algorithm based on the local quadratic approximation is presented for implementation. Finally, extensive numerical simulations and an analysis of the Boston housing price data confirm the finite sample performance of the new method and support the theoretical findings.
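The iterative algorithm mentioned above is based on the local quadratic approximation (LQA). As a minimal sketch of the LQA idea only, the code below minimizes a penalized least-squares criterion for a plain linear model, assuming the SCAD penalty of Fan and Li (2001) with a = 3.7 and individually penalized coefficients. It omits the spatial lag, the projection matrix and the group penalty on the B-spline coefficients, and the function names are hypothetical; it is not the authors' R code.

```python
import numpy as np


def scad_deriv(t, lam, a=3.7):
    """Derivative P'_lambda(t) of the SCAD penalty for t >= 0 (assumed penalty)."""
    t = np.abs(np.asarray(t, dtype=float))
    return lam * np.where(t <= lam, 1.0, np.maximum(a * lam - t, 0.0) / ((a - 1) * lam))


def lqa_penalized_ls(X, y, lam, a=3.7, n_iter=200, tol=1e-8, eps=1e-6):
    """Minimise 0.5 * ||y - X b||^2 + n * sum_j P_lambda(|b_j|) by LQA iterations."""
    n, p = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]      # unpenalised starting value
    for _ in range(n_iter):
        active = np.abs(b) > eps                   # coefficients not yet shrunk to zero
        b_new = np.zeros(p)
        if active.any():
            Xa, ba = X[:, active], b[active]
            # LQA: P(|b|) ~ P(|b0|) + 0.5 * P'(|b0|) / |b0| * (b^2 - b0^2) near b0
            sigma = np.diag(scad_deriv(ba, lam, a) / np.abs(ba))
            b_new[active] = np.linalg.solve(Xa.T @ Xa + n * sigma, Xa.T @ y)
        converged = np.max(np.abs(b_new - b)) < tol
        b = b_new
        if converged:
            break
    b[np.abs(b) <= eps] = 0.0                      # report tiny coefficients as exact zeros
    return b


# Simulated example (hypothetical data, not the Boston housing data)
rng = np.random.default_rng(0)
n, beta_true = 200, np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0])
X = rng.normal(size=(n, beta_true.size))
y = X @ beta_true + rng.normal(size=n)
b_hat = lqa_penalized_ls(X, y, lam=0.5)
print(np.round(b_hat, 3))
```

Each LQA step replaces the nonconvex penalty by a quadratic surrogate, so the update is a ridge-type linear solve; coefficients whose surrogate weight explodes are driven to zero and then dropped.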


Fig. 1
Fig. 2
Fig. 3

Availability of data

The dataset is freely available from the R package spdep.

References

  • Ai C, Zhang Y (2017) Estimation of partially specified spatial panel data models with fixed-effects. Econom Rev 36:6–22

  • Anselin L, Bera AK (1998) Spatial dependence in linear regression models with an introduction to spatial econometrics. In: Ullah A, Giles DEA (eds) Handbook of applied economic statistics. Marcel Dekker, New York, pp 237–289

  • Baltagi BH, Li D (2001) LM tests for functional form and spatial error correlation. Int Reg Sci Rev 24:194–225

  • Basile R, Gress B (2005) Semi-parametric spatial auto-covariance models of regional growth in Europe. Rég Dév 21:93–118

  • Chen Y, Wang Q, Yao W (2015) Adaptive estimation for varying coefficient models. J Multivar Anal 137:17–31

  • Cheng S, Chen J, Liu X (2019) GMM estimation of partially linear single-index spatial autoregressive model. Spat Stat 31:100354

  • De Boor C (2001) A practical guide to splines. Springer, New York

  • Du J, Sun X, Cao R, Zhang Z (2018) Statistical inference for partially linear additive spatial autoregressive models. Spat Stat 25:52–67

  • Fan J, Huang T (2005) Profile likelihood inferences on semiparametric varying-coefficient partially linear models. Bernoulli 11:1031–1057

  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360

  • Frank I, Friedman J (1993) A statistical view of some chemometrics tools. Technometrics 35:109–135

  • Harrison D, Rubinfeld DL (1978) Hedonic housing prices and the demand for clean air. J Environ Econ Manag 5:81–102

  • Hu T, Xia Y (2012) Adaptive semi-varying coefficient model selection. Stat Sin 22:575–599

  • Jencks C, Mayer S (1990) The social consequences of growing up in a poor neighborhood. Inner-city poverty in the United States. National Academy, Washington

  • Kelejian HH, Prucha IR (1998) A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J Real Estate Finance Econ 17:99–121

  • Kim MO (2007) Quantile regression with varying coefficients. Ann Stat 35:92–108

  • Kostov P (2009) A spatial quantile regression hedonic model of agricultural land prices. Spat Econ Anal 4:53–72

  • Lee LF (2003) Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econom Rev 22:307–335

  • Lee LF (2004) Asymptotic distributions of quasi-maximum likelihood estimators for spatial econometric models. Econometrica 72:1899–1926

  • Lee LF (2007) GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. J Econom 137:489–514

  • Leng C (2009) A simple approach for varying-coefficient model selection. J Stat Plan Inference 139:2138–2146

  • Li R, Liang H (2008) Variable selection in semiparametric regression model. Ann Stat 36:261–286

  • Lian H (2012) Semiparametric estimation of additive quantile regression models by two-fold penalty. J Bus Econ Stat 30:337–350

  • Lin X, Lee LF (2010) GMM estimation of spatial autoregressive models with unknown heteroskedasticity. J Econom 157:34–52

  • Noh H, Chung K, Van Keilegom I (2012) Variable selection of varying coefficient models in quantile regression. Electron J Stat 6:1220–1238

  • Pace RK, Gilley OW (1997) Using the spatial configuration of the data to improve estimation. J Real Estate Finance Econ 14:333–340

  • Pal AB, Dubey AK, Chaturvedi A (2016) Shrinkage estimation in spatial autoregressive model. J Multivar Anal 143:362–373

  • Schwarz G (1978) Estimating the dimension of a model. Ann Stat 6:461–464

  • Su L (2012) Semiparametric GMM estimation of spatial autoregressive models. J Econom 167:543–560

  • Su L, Jin S (2010) Profile quasi-maximum likelihood estimation of partially linear spatial autoregressive models. J Econom 157:18–33

  • Su L, Yang Z (2011) Instrumental variable quantile estimation of spatial autoregressive models. Working paper, Singapore Management University

  • Sun Y (2017) Estimation of single-index model with spatial interaction. Reg Sci Urban Econ 62:36–45

  • Sun Y, Wu Y (2018) Estimation and testing for a partially linear single-index spatial regression model. Spat Econ Anal 13:473–489

  • Sun Y, Zhang Y, Huang JZ (2019) Estimation of a semiparametric varying-coefficient mixed regressive spatial autoregressive model. Econom Stat 9:140–155

  • Tang Y, Wang HJ, Zhu Z, Song X (2012) A unified variable selection approach for varying coefficient models. Stat Sin 22:601–628

  • Tibshirani R (1996) Regression shrinkage and selection via the Lasso. J R Stat Soc Ser B 58:267–288

  • Wakefield J (2007) Disease mapping and spatial regression with count data. Biostatistics 8:158–183

  • Waller LA, Gotway CA (2004) Applied spatial statistics for public health data. Wiley, Hoboken

  • Wang D, Kulasekera KB (2012) Parametric component detection and variable selection in varying-coefficient partially linear models. J Multivar Anal 112:117–129

  • Wang H, Xia Y (2009) Shrinkage estimation of the varying coefficient model. J Am Stat Assoc 104:747–757

  • Wang HJ, Zhu Z, Zhou J (2009) Quantile regression in partially linear varying coefficient models. Ann Stat 37:3841–3866

  • Wei C, Guo S, Zhai S (2017) Statistical inference of partially linear varying coefficient spatial autoregressive models. Econ Model 64:553–559

  • Wu Y, Sun Y (2017) Shrinkage estimation of the linear model with spatial interaction. Metrika 80:51–68

  • Xia Y, Zhang W, Tong H (2004) Efficient estimation for semivarying-coefficient models. Biometrika 91:661–681

  • Xie T, Cao R, Du J (2020) Variable selection for spatial autoregressive models with a diverging number of parameters. Stat Pap 61:1125–1145

  • Xu X, Lee LF (2015) A spatial autoregressive model with a nonlinear transformation of the dependent variable. J Econom 186:1–18

  • Yang Z, Li C, Tse YK (2006) Functional form and spatial dependence in dynamic panels. Econ Lett 91:138–145

  • Zhang CH (2010) Nearly unbiased variable selection under minimax concave penalty. Ann Stat 38:894–942

  • Zhang Y, Shen D (2015) Estimation of semi-parametric varying-coefficient spatial panel data models with random-effects. J Stat Plan Inference 159:64–80

  • Zhang HH, Cheng G, Liu Y (2011) Linear or nonlinear? Automatic structure discovery for partially linear models. J Am Stat Assoc 106:1099–1112

  • Zhao P, Xue L (2012) Variable selection in semiparametric regression analysis for longitudinal data. Ann Inst Stat Math 64:213–231

  • Zou H (2006) The adaptive LASSO and its oracle properties. J Am Stat Assoc 101:1418–1429


Acknowledgements

The authors would like to thank the editor, associate editor and the referees for their valuable comments, which significantly enhanced the quality of this paper.

Funding

Fang Lu’s research was funded by the National Natural Science Foundation of China (Grant 11801169) and the Natural Science Foundation of Hunan Province (Grant 2019JJ50378). Jing Yang’s research was funded by the National Natural Science Foundation of China (Grant 11801168), the Natural Science Foundation of Hunan Province (Grant 2018JJ3322) and the Scientific Research Fund of Hunan Provincial Education Department (Grant 18B024); his visit to the University of California, Riverside was supported by the China Scholarship Council. Xuewen Lu’s research was funded by Discovery Grants (RGPIN-2018-06466) from the Natural Sciences and Engineering Research Council (NSERC) of Canada.

Author information

Corresponding author

Correspondence to Jing Yang.

Ethics declarations

Code availability

All code is written in R and is available from the corresponding author on request.

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants performed by any of the authors.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

Let C denote a generic constant whose value may change from place to place. Throughout the appendix, \(\Vert \cdot \Vert \) denotes the Euclidean norm, that is, \(\Vert a\Vert =\sqrt{\sum _{i=1}^p {a_i^2}}\) for any vector \(a=(a_1,\ldots ,a_p)\) and \(\Vert M\Vert =\lambda _{\max }^{1/2}(M^TM)\) (the spectral norm) for any matrix M. Let \(B(u)^T\gamma _{s0}\) be the best approximating spline function to \(\alpha _s(u)\). It then follows from the result on page 149 of De Boor (2001) that

$$\begin{aligned} \sup _{u\in [0,1]}|r_s(u)|=O_p\big (k_n^{-r}\big ) \end{aligned}$$
(11)

under conditions (C6) and (C7), where \(r_s(u)=\alpha _s(u)-B(u)^T\gamma _{s0}\), \(s=1,\ldots ,q\).
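Equation (11) is a standard spline approximation bound. The sketch below is illustrative only: it builds a clamped cubic B-spline basis on [0, 1] with the Cox-de Boor recursion (De Boor 2001), fits a smooth stand-in for \(\alpha _s(u)\) by least squares, and reports the root-mean-square approximation error as the number of equally spaced interior knots roughly doubles. The test function \(\sin (2\pi u)\), the knot placement and the helper name `bspline_basis` are assumptions, and least squares is used in place of the best approximation in the sup norm.

```python
import numpy as np


def bspline_basis(u, n_interior, degree=3):
    """Clamped B-spline basis on [0, 1] with equally spaced interior knots.

    Returns an array of shape (len(u), n_interior + degree + 1); rows sum to one.
    """
    u = np.asarray(u, dtype=float)
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    m = len(knots) - 1
    # Degree-0 basis: indicators of the half-open knot intervals [t_j, t_{j+1})
    B = np.zeros((len(u), m))
    for j in range(m):
        if knots[j] < knots[j + 1]:
            B[:, j] = (u >= knots[j]) & (u < knots[j + 1])
    last = np.max(np.nonzero(knots[:-1] < knots[1:])[0])
    B[u == 1.0, last] = 1.0                        # include the right endpoint u = 1
    # Cox-de Boor recursion raises the degree one step at a time
    for d in range(1, degree + 1):
        B_new = np.zeros((len(u), m - d))
        for j in range(m - d):
            left = knots[j + d] - knots[j]
            right = knots[j + d + 1] - knots[j + 1]
            if left > 0:
                B_new[:, j] += (u - knots[j]) / left * B[:, j]
            if right > 0:
                B_new[:, j] += (knots[j + d + 1] - u) / right * B[:, j + 1]
        B = B_new
    return B


grid = np.linspace(0.0, 1.0, 2001)
alpha = np.sin(2 * np.pi * grid)                   # smooth stand-in for alpha_s(u)
errors = []
for n_int in (3, 7, 15):                           # nested knot sets: spacing 1/4, 1/8, 1/16
    B = bspline_basis(grid, n_int)
    gamma, *_ = np.linalg.lstsq(B, alpha, rcond=None)
    r = alpha - B @ gamma                          # plays the role of r_s(u)
    errors.append(np.sqrt(np.mean(r ** 2)))
print(errors)
```

Because the knot sets are nested, the fitted error cannot increase; for a smooth function and cubic splines one expects it to shrink roughly like \(k_n^{-4}\), in line with (11) when \(r=4\).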

Proof of Theorem 1

(i) Let \(\delta _n=k_n^{-r}+a_n\), \(\theta ^*=\theta +\delta _n\omega \), where \(\omega =(\omega _1,\ldots ,\omega _{p+qL+1})^T\) is a \((p+qL+1)\)-dimensional vector. It is sufficient to show that, for any given \(\eta >0\), there exists a large constant C such that

$$\begin{aligned} P\left\{ \inf _{\Vert \omega \Vert =C}L_n(\theta ^*)>L_n(\theta )\right\} \ge 1-\eta . \end{aligned}$$
(12)

Note that

$$\begin{aligned}&L_n(\theta ^*)-L_n(\theta ) \\&\quad =\frac{1}{2}\left\{ \Vert Y-MV\theta ^*\Vert ^2-\Vert Y-MV\theta \Vert ^2\right\} + n \sum _{k=1}^p { \left\{ P_{\lambda _{1k}}(|\beta _k^*|)-P_{\lambda _{1k}}(|\beta _k|) \right\} } \\&\qquad +n \sum _{s=1}^q {\left\{ P_{\lambda _{2s}}(\Vert \gamma _s^*\Vert _B)-P_{\lambda _{2s}}(\Vert \gamma _s\Vert _B)\right\} } \\&\quad \ge \frac{1}{2}\left\{ \Vert Y-MV\theta ^*\Vert ^2-\Vert Y-MV\theta \Vert ^2\right\} + n \sum _{k=1}^{p_0} { \left\{ P_{\lambda _{1k}}(|\beta _k^*|)-P_{\lambda _{1k}}(|\beta _k|) \right\} } \\&\qquad +n \sum _{s=1}^{q_0} {\left\{ P_{\lambda _{2s}}(\Vert \gamma _s^*\Vert _B)-P_{\lambda _{2s}}(\Vert \gamma _s\Vert _B)\right\} } \\&\qquad \triangleq L_{n1}+L_{n2}+L_{n3}, \end{aligned}$$

where the inequality holds because \(P_{\lambda }(\cdot )\ge 0\) and \(P_{\lambda }(0)=0\), together with \(\beta _k=0\) for \(k>p_0\) and \(\gamma _s=0\) for \(s>q_0\).

The term \(L_{n1}\) can be expressed as

$$\begin{aligned} L_{n1}= & {} \frac{1}{2}\left\{ \Vert Y-MV(\theta +\delta _n\omega )\Vert ^2-\Vert Y-MV\theta \Vert ^2\right\} \nonumber \\= & {} -\delta _n(Y-MV\theta )^TMV\omega + \frac{\delta _n^2}{2}\omega ^TV^TM^TMV\omega \nonumber \\= & {} -\delta _n(Y-V\theta )^TMV\omega + \frac{\delta _n^2}{2}\omega ^TV^TMV\omega , \end{aligned}$$
(13)

where the last equality holds because M is symmetric and idempotent. Let \(r_n=Z\alpha (U)-\varPi \gamma \). Since \(V=(W(I-\rho W)^{-1}(X\beta +Z\alpha (U)+\varepsilon ),X,\varPi )\) and \(Y-V\theta =\varepsilon +r_n\), we have

$$\begin{aligned}&(Y-V\theta )^TMV \\&\quad =(\varepsilon +r_n)^TM \{ (W(I-\rho W)^{-1}(X\beta +Z\alpha (U)),X,\varPi ) + (W(I-\rho W)^{-1}\varepsilon ,0,0)\} \\&\quad =(\varepsilon +r_n)^TM (\tilde{Q}+e) = \varepsilon ^TM\tilde{Q} + \varepsilon ^TM e + r_n^TM \tilde{Q} + r_n^TM e \\&\quad \triangleq R_1 + R_2 + R_3 + R_4, \end{aligned}$$

where \(\tilde{Q}=(W(I-\rho W)^{-1}(X\beta +Z\alpha (U)),X,\varPi )\) and \(e=(W(I-\rho W)^{-1}\varepsilon ,0,0)\).
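The expansion (13) uses only that M is symmetric and idempotent. As a numerical sanity check, the sketch below takes M to be the orthogonal projection \(H(H^TH)^{-1}H^T\) onto the columns of an arbitrary matrix H (an assumption consistent with the bound involving \(H(H^TH)^{-1}H^T\) in this proof, not a definition taken from this appendix) and compares both sides of (13) for random Y, V, \(\theta \), \(\omega \) and \(\delta _n\); all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, d = 50, 8, 4                     # sample size, columns of H, dim(theta): hypothetical
H = rng.normal(size=(n, k))
M = H @ np.linalg.solve(H.T @ H, H.T)  # assumed form of M: projection onto col(H)
V = rng.normal(size=(n, d))
Y = rng.normal(size=n)
theta = rng.normal(size=d)
omega = rng.normal(size=d)
delta = 0.3

# Left-hand side of (13): half the change in the squared residual norm
lhs = 0.5 * (np.sum((Y - M @ V @ (theta + delta * omega)) ** 2)
             - np.sum((Y - M @ V @ theta) ** 2))
# Right-hand side of (13): linear term in delta plus quadratic term in delta
rhs = (-delta * (Y - V @ theta) @ M @ V @ omega
       + 0.5 * delta ** 2 * omega @ V.T @ M @ V @ omega)
print(lhs, rhs)
```

The two sides agree up to rounding error; dropping the symmetry of M would break the step \(M^TM=M\).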

Let \(\varDelta _n=\tilde{Q}-Q=(0,0,(1-k_n^{1/2})\varPi )\), so that \(\tilde{Q}=Q+\varDelta _n\). Note that

$$\begin{aligned}&E(\Vert \varepsilon ^TMQ\Vert ^2)=E\{ \text{ trace }(Q^TM\varepsilon \varepsilon ^TMQ) \}=\sigma ^2 E\{ \text{ trace }(Q^TMMQ) \} \\&\quad =\sigma ^2 E\{ \text{ trace }(MQQ^TM) \}\le C_2 \sigma ^2 E\{ \text{ trace }(H(H^TH)^{-1}H^T) \}=O_p(k_n), \end{aligned}$$

where the inequality holds due to condition (C3). Moreover, using Lemma A.4 of Kim (2007), which shows that the eigenvalues of \(k_n\varPi \varPi ^T/n\) are bounded in probability, we can similarly prove \(E(\Vert \varepsilon ^TM\varDelta _n\Vert ^2)=O_p(k_n)\). Hence, since \(E(R_1)=0\), we have \(R_1=O_p(k_n^{1/2})\).

In addition, \(E(\Vert \varepsilon ^TM\Vert ^2)=E\{ \text{ trace }(M^T\varepsilon \varepsilon ^TM) \}=O_p(k_n)\), and

$$\begin{aligned} E(\Vert MS\varepsilon \Vert ^2)=E\{ \text{ trace }(MS\varepsilon \varepsilon ^TS^TM) \}\le C_1\sigma ^2 E\{ \text{ trace }(M) \}=O_p(k_n) \end{aligned}$$

by condition (C3). Hence, by Markov's inequality, \(\Vert \varepsilon ^TM\Vert =O_p(k_n^{1/2})\) and \(\Vert MS\varepsilon \Vert =O_p(k_n^{1/2})\). As a result,

$$\begin{aligned} \Vert R_2\Vert =\Vert \varepsilon ^TMe\Vert =\Vert \varepsilon ^TMS\varepsilon \Vert =\Vert \varepsilon ^TMMS\varepsilon \Vert \le \Vert \varepsilon ^TM\Vert \Vert MS\varepsilon \Vert =O_p(k_n). \end{aligned}$$
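The bounds \(E(\Vert \varepsilon ^TM\Vert ^2)=O(k_n)\) and \(E(\Vert MS\varepsilon \Vert ^2)=O(k_n)\) rest on the identity \(E(\varepsilon ^TM\varepsilon )=\sigma ^2\,\text{trace}(M)\) for errors with mean zero and covariance \(\sigma ^2I\), together with \(\text{trace}(M)\) being of order \(k_n\). A quick Monte Carlo check of that identity follows, assuming \(M=H(H^TH)^{-1}H^T\) with k columns in H (so that \(\text{trace}(M)=k\)) and i.i.d. normal errors; the sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, sigma = 400, 12, 1.5
H = rng.normal(size=(n, k))
M = H @ np.linalg.solve(H.T @ H, H.T)      # assumed projection of rank k
reps = 4000
eps = rng.normal(scale=sigma, size=(reps, n))
sq_norms = np.sum((eps @ M) ** 2, axis=1)  # ||eps^T M||^2 for each replicate
mc_mean = sq_norms.mean()
theory = sigma ** 2 * np.trace(M)          # sigma^2 * trace(M) = sigma^2 * k
print(mc_mean, theory)
```

Here \(\Vert \varepsilon ^TM\Vert ^2=\varepsilon ^TM\varepsilon \) because M is symmetric and idempotent, so the simulated mean should sit close to \(\sigma ^2k\).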

Moreover, it is easy to verify that \(\Vert r_n\Vert =O_p(\sqrt{n}k_n^{-r})\) by condition (C5) and Eq. (11). Combining this result with condition (C3) and arguments similar to those used for \(R_1\), we obtain \(\Vert R_3\Vert =O_p(\sqrt{n}k_n^{-r})\).

Notice that \(E(\Vert r_n^TM\Vert ^2)=E\{ \text{ trace }(r_n^TMM^Tr_n) \} \le E\{ \text{ trace }(r_n^Tr_n) \}=O_p(nk_n^{-2r})\), and

$$\begin{aligned} E(\Vert Me\Vert ^2)= & {} E(\Vert MS\varepsilon \Vert ^2)=E\{ \text{ trace }(\varepsilon ^TS^TMMS\varepsilon ) \} \\= & {} E\{ \text{ trace }(MS\varepsilon \varepsilon ^TS^TM) \} \le C_1\sigma ^2 E\{ \text{ trace }(M) \}=O_p(k_n). \end{aligned}$$

Then, \(\Vert R_4\Vert =\Vert r_n^TMMe\Vert \le \Vert r_n^TM\Vert \Vert Me\Vert =O_p(\sqrt{n}k_n^{-(r-1/2)})\). Taking into account the condition \(k_n=O(n^{1/(2r+1)})\), we have

$$\begin{aligned} \Vert \delta _n(Y-V\theta )^TMV\omega \Vert =O_p\left( \delta _n\big (\sqrt{n}k_n^{-(r-1/2)}+k_n\big )\Vert \omega \Vert \right) =O_p\left( \delta _nk_n\Vert \omega \Vert \right) . \end{aligned}$$
(14)
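The last step in (14) is a rate calculation. Assuming \(k_n\) is of exact order \(n^{1/(2r+1)}\), the two terms inside \(O_p(\cdot )\) are of the same order:

$$\begin{aligned} \sqrt{n}\,k_n^{-(r-1/2)} \asymp n^{\frac{1}{2}-\frac{r-1/2}{2r+1}} = n^{\frac{(2r+1)-(2r-1)}{2(2r+1)}} = n^{\frac{1}{2r+1}} \asymp k_n , \end{aligned}$$

so that \(\delta _n(\sqrt{n}k_n^{-(r-1/2)}+k_n)\Vert \omega \Vert =O(\delta _nk_n\Vert \omega \Vert )\).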

On the other hand, \(V^TMV=(\tilde{Q}+e)^TM(\tilde{Q}+e)=\tilde{Q}^TM\tilde{Q}+\tilde{Q}^TMe+e^TM\tilde{Q}+e^TMe\). Based on arguments similar to those above and condition (C4), we can verify that \(\Vert \tilde{Q}^TM\tilde{Q}\Vert =O_p(n)\), \(\Vert \tilde{Q}^TMe\Vert =\Vert e^TM\tilde{Q}\Vert =O_p(k_n^{1/2})\) and \(\Vert e^TMe\Vert =\Vert MS\varepsilon \Vert ^2=O_p(k_n)\). Consequently,

$$\begin{aligned} \Vert \delta _n^2\omega ^TV^TMV\omega /2\Vert =O_p\big (n\delta _n^2\Vert \omega \Vert ^2\big ). \end{aligned}$$
(15)

By (13)–(15), \(L_{n1}\) is uniformly dominated by \(\delta _n^2\omega ^TV^TMV\omega /2\) on \(\Vert \omega \Vert =C\) for sufficiently large C.
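The domination claim can be made explicit. Assuming, in addition, that the smallest eigenvalue of \(V^TMV/n\) is bounded away from zero in probability (so that the quadratic term is of exact order \(n\delta _n^2\Vert \omega \Vert ^2\)), the ratio of the bound in (14) to the quadratic term on \(\Vert \omega \Vert =C\) satisfies

$$\begin{aligned} \frac{\delta _nk_n\Vert \omega \Vert }{n\delta _n^2\Vert \omega \Vert ^2} =\frac{k_n}{n\delta _nC} \le \frac{k_n^{r+1}}{nC} =O\big (n^{-r/(2r+1)}\big )\rightarrow 0, \end{aligned}$$

where the inequality uses \(\delta _n\ge k_n^{-r}\) and the last step uses \(k_n=O(n^{1/(2r+1)})\).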

Now consider \(L_{n2}\). Recall that \(P_{\lambda }(0)=0\). Applying a Taylor expansion to the penalty \(P_{\lambda _{1k}}(|\beta _k^*|)\) around \(|\beta _k|\) yields

$$\begin{aligned} L_{n2}= & {} n \sum _{k=1}^{p_0} { \left\{ P_{\lambda _{1k}}(|\beta _k^*|)-P_{\lambda _{1k}}(|\beta _k|) \right\} } \\\le & {} \sum _{k=1}^{p_0} { \left\{ n \delta _n P_{\lambda _{1k}}^\prime (|\beta _k|)\text{ sgn }(\beta _k)|\omega _{k+1}|+ n \delta _n^2 P_{\lambda _{1k}}^{\prime \prime }(|\beta _k|)|\omega _{k+1}|^2(1+o(1))\right\} } \\\le & {} \sqrt{p_0}n\delta _na_n\Vert \omega \Vert + n \delta _n^2 b_n \Vert \omega \Vert ^2, \end{aligned}$$

where \(a_n\) and \(b_n\) are defined in Sect. 3. Therefore, \(\delta _n^2\omega ^TV^TMV\omega /2\) also uniformly dominates \(L_{n2}\) on \(\Vert \omega \Vert =C\). By the same arguments, \(L_{n3}\) is also uniformly dominated by \(\delta _n^2\omega ^TV^TMV\omega /2\). Therefore, (12) holds for sufficiently large C. This implies that, with probability at least \(1-\eta \), there exists a local minimizer \(\hat{\theta }\) such that \(\Vert \hat{\theta }-\theta \Vert =O_p(\delta _n)\). This completes the proof of (i).
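The quantities \(a_n\) and \(b_n\) are defined in Sect. 3, which is not reproduced here. If, as in Fan and Li (2001), they are the largest first and second penalty derivatives evaluated at the nonzero true coefficients, then for the SCAD penalty (an assumed choice, with a = 3.7) both vanish once \(\lambda \) is small relative to the smallest nonzero \(|\beta _k|\), so that \(\delta _n\) reduces to \(k_n^{-r}\). The sketch computes them for a hypothetical set of nonzero coefficients:

```python
import numpy as np


def scad_deriv(t, lam, a=3.7):
    """First derivative of the SCAD penalty for t >= 0."""
    t = np.abs(np.asarray(t, dtype=float))
    return lam * np.where(t <= lam, 1.0, np.maximum(a * lam - t, 0.0) / ((a - 1) * lam))


def scad_second_deriv(t, lam, a=3.7):
    """Second derivative of the SCAD penalty for t > 0 (zero outside (lam, a*lam))."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where((t > lam) & (t < a * lam), -1.0 / (a - 1), 0.0)


beta_nonzero = np.array([3.0, 1.5, 2.0])   # hypothetical nonzero true coefficients
a_n, b_n = {}, {}
for lam in (1.0, 0.5, 0.3, 0.1):
    a_n[lam] = float(scad_deriv(beta_nonzero, lam).max())
    b_n[lam] = float(np.abs(scad_second_deriv(beta_nonzero, lam)).max())
    print(f"lambda={lam}: a_n={a_n[lam]:.4f}, b_n={b_n[lam]:.4f}")
```

Once \(a\lambda \) falls below \(\min _k|\beta _k|\) (here for \(\lambda =0.3\) and \(\lambda =0.1\)), both quantities are exactly zero, which is the unbiasedness property that SCAD was designed to have.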

(ii) For \(s=1,\ldots ,q\), it follows from result (i) and Eq. (11) that

$$\begin{aligned} \Vert \hat{\alpha }_s(\cdot )- \alpha _{s}(\cdot )\Vert ^2= & {} \int _0^1 \big ( \hat{\alpha }_s(u)- \alpha _{s}(u) \big )^2 {\mathrm{d}}u \\= & {} \int _0^1 \big ( B(u)^T \hat{\gamma }_s - B(u)^T \gamma _{s0} - r_s(u) \big )^2 {\mathrm{d}}u \\\le & {} 2 \int _0^1 \big ( B(u)^T \hat{\gamma }_s - B(u)^T \gamma _{s0} \big )^2 {\mathrm{d}}u + 2 \int _0^1 r_s(u)^2 {\mathrm{d}}u \\= & {} 2 \big ( \hat{\gamma }_s - \gamma _{s0} \big )^T \left( \int B(u)B(u)^T {\mathrm{d}}u\right) \big ( \hat{\gamma }_s - \gamma _{s0} \big ) + 2 \int _0^1 r_s(u)^2 {\mathrm{d}}u \\= & {} O_p(\delta _n^2)+O_p\big (k_n^{-2r}\big )=O_p\big (\delta _n^2\big ), \end{aligned}$$

where the penultimate equality follows from part (i) and \(B=\int B(u)B(u)^T {\mathrm{d}}u = O(1)\). Consequently, \(\Vert \hat{\alpha }_s(\cdot )- \alpha _{s}(\cdot )\Vert =O_p(\delta _n)=O_p(k_n^{-r}+a_n)\), which completes the proof. \(\square \)
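The step \(\int B(u)B(u)^T {\mathrm{d}}u = O(1)\) can be checked numerically for the normalized B-spline basis. Because these basis functions are nonnegative and sum to one, each row sum of the Gram matrix equals \(\int B_j(u){\mathrm{d}}u\), which is at most the knot spacing, so the spectral norm is in fact bounded by the knot spacing. The sketch re-creates a hypothetical `bspline_basis` helper (equally spaced knots, cubic degree; both assumptions) and approximates the integrals with the trapezoidal rule:

```python
import numpy as np


def bspline_basis(u, n_interior, degree=3):
    """Clamped B-spline basis on [0, 1] (Cox-de Boor recursion); rows sum to one."""
    u = np.asarray(u, dtype=float)
    interior = np.linspace(0.0, 1.0, n_interior + 2)[1:-1]
    knots = np.concatenate([np.zeros(degree + 1), interior, np.ones(degree + 1)])
    m = len(knots) - 1
    B = np.zeros((len(u), m))
    for j in range(m):
        if knots[j] < knots[j + 1]:
            B[:, j] = (u >= knots[j]) & (u < knots[j + 1])
    B[u == 1.0, np.max(np.nonzero(knots[:-1] < knots[1:])[0])] = 1.0
    for d in range(1, degree + 1):
        B_new = np.zeros((len(u), m - d))
        for j in range(m - d):
            left = knots[j + d] - knots[j]
            right = knots[j + d + 1] - knots[j + 1]
            if left > 0:
                B_new[:, j] += (u - knots[j]) / left * B[:, j]
            if right > 0:
                B_new[:, j] += (knots[j + d + 1] - u) / right * B[:, j + 1]
        B = B_new
    return B


grid = np.linspace(0.0, 1.0, 4001)
w = np.full(grid.size, grid[1] - grid[0])
w[[0, -1]] *= 0.5                                  # trapezoidal quadrature weights
norms = []
for n_int in (3, 7, 15, 31):
    B = bspline_basis(grid, n_int)
    G = B.T @ (w[:, None] * B)                     # approximates the integral of B(u) B(u)^T
    norms.append(np.linalg.norm(G, 2))             # spectral norm
    print(n_int, norms[-1], 1.0 / (n_int + 1))
```

With a different normalization of the basis (for example, rescaling by \(k_n^{1/2}\)) the Gram matrix changes scale accordingly, so the normalization used in Sect. 3 should be checked before relying on a specific rate.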

Proof of Theorem 2

(i) Note that \(a_n\rightarrow 0\) as \(\lambda _{\max }\rightarrow 0\). By Theorem 1, it suffices to show that, for any \(\hat{\gamma }\) satisfying \(\Vert \hat{\gamma }-\gamma \Vert =O_p(k_n^{-r})\), any \(\hat{\beta }_k\) satisfying \(\Vert \hat{\beta }_k-\beta _{k}\Vert =O_p(k_n^{-r})\), \(k=1,\ldots ,p_0\), and some given small \(\xi _n=Ck_n^{-r}\), with probability tending to 1 as \(n\rightarrow \infty \), we have

$$\begin{aligned}&\frac{\partial L_n(\theta )}{\partial \beta _k}\Big |_{\hat{\beta }_k} >0,\quad \text{ for }\;0<\hat{\beta }_k<\xi _n,\; k=p_0+1,\ldots ,p, \end{aligned}$$
(16)
$$\begin{aligned}&\frac{\partial L_n(\theta )}{\partial \beta _k}\Big |_{\hat{\beta }_k}<0,\quad \text{ for }\;-\xi _n<\hat{\beta }_k<0,\;k=p_0+1,\ldots ,p. \end{aligned}$$
(17)

Thus, (16) and (17) imply that the minimizer of \(L_n(\theta )\) is attained at \(\hat{\beta }_k=0\), \(k=p_0+1,\ldots ,p\).

For any matrix A, let \(A_{ij}\) denote its (i, j)th element, and let \(A_{i\cdot }\) and \(A_{\cdot j}\) denote its ith row and jth column, respectively. Then

$$\begin{aligned} \frac{\partial L_n(\theta )}{\partial \beta _k}\Big |_{\hat{\beta }_k}= & {} -\sum _{i=1}^n{ (MV)_{ik}\big [Y_i-(MV)_{i\cdot }^T\hat{\theta }\big ] } + n P^\prime _{\lambda _{1k}}(|\hat{\beta }_k|)\text{ sgn }(\hat{\beta }_k) \\= & {} -\sum _{i=1}^n{ (MV)_{ik}\big [Y_i-V_{i\cdot }^T\theta +V_{i\cdot }^T\theta -(MV)_{i\cdot }^T\hat{\theta }\big ] } + n P^\prime _{\lambda _{1k}}(|\hat{\beta }_k|)\text{ sgn }(\hat{\beta }_k) \\= & {} -\sum _{i=1}^n{ (MV)_{ik}(\varepsilon _i+r_{ni}) } +\sum _{i=1}^n{ (MV)_{ik}\big [(MV)_{i\cdot }^T\hat{\theta }-V_{i\cdot }^T\theta \big ] }\\&+ n P^\prime _{\lambda _{1k}}(|\hat{\beta }_k|)\text{ sgn }(\hat{\beta }_k) \\= & {} - V_{\cdot k}^T M \varepsilon -V_{\cdot k}^T M r_n + V_{\cdot k}^T M V (\hat{\theta }-\theta )+ nP^\prime _{\lambda _{1k}}(|\hat{\beta }_k|)\text{ sgn }(\hat{\beta }_k), \end{aligned}$$

where the last equality holds since M is an idempotent matrix.

By arguments similar to those in the proof of Theorem 1, we obtain

$$\begin{aligned} \frac{\partial L_n(\theta )}{\partial \beta _k}\Big |_{\hat{\beta }_k}= & {} O_p\big (nk_n^{-r}\big )+nP^\prime _{\lambda _{1k}}(|\hat{\beta }_k|)\text{ sgn }(\hat{\beta }_k)\\= & {} n \lambda _{1k} \left\{ O_p\big (\big (\lambda _{1k}k_n^{r}\big )^{-1}\big ) + \lambda _{1k}^{-1}P^\prime _{\lambda _{1k}}(|\hat{\beta }_k|)\text{ sgn }(\hat{\beta }_k) \right\} . \end{aligned}$$

Since \(\lambda _{1k}k_n^{r}=\lambda _{1k}n^{r/(2r+1)}\ge \lambda _{\min }n^{r/(2r+1)}\rightarrow \infty \), the term \(O_p((\lambda _{1k}k_n^{r})^{-1})\) tends to 0. Moreover, by a Taylor expansion together with assumptions (A1) and (A2), we have \(\lambda _{1k}^{-1}P^\prime _{\lambda _{1k}}(|\hat{\beta }_k|)>0\). Hence, the sign of \(\frac{\partial L_n(\theta )}{\partial \beta _k}\Big |_{\hat{\beta }_k}\) is completely determined by the sign of \(\hat{\beta }_k\), which yields (16) and (17). This completes the proof of (i).
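The sign argument behind (16) and (17) is what produces exact zeros: near the origin the slope of the penalty outweighs the gradient of the loss, so the objective increases in both directions away from zero. In the simplest one-dimensional case, minimizing \((z-b)^2/2+P_{\lambda }(|b|)\) for the SCAD penalty (assumed here, with a = 3.7) has the closed-form thresholding solution of Fan and Li (2001). The sketch implements it and checks it against a brute-force grid search:

```python
import numpy as np


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty P_lambda(t) for t >= 0."""
    t = np.abs(np.asarray(t, dtype=float))
    quad = -(t ** 2 - 2 * a * lam * t + lam ** 2) / (2 * (a - 1))
    return np.where(t <= lam, lam * t, np.where(t <= a * lam, quad, (a + 1) * lam ** 2 / 2))


def scad_threshold(z, lam, a=3.7):
    """Closed-form minimiser of 0.5 * (z - b)^2 + P_lambda(|b|)."""
    z = np.asarray(z, dtype=float)
    az = np.abs(z)
    soft = np.sign(z) * np.maximum(az - lam, 0.0)          # region |z| <= 2*lam
    mid = ((a - 1) * z - np.sign(z) * a * lam) / (a - 2)   # region 2*lam < |z| <= a*lam
    return np.where(az <= 2 * lam, soft, np.where(az <= a * lam, mid, z))


lam = 0.5
zs = np.array([-3.0, -1.5, -0.4, 0.3, 0.9, 1.5, 5.0])
closed = scad_threshold(zs, lam)
b_grid = np.linspace(-6.0, 6.0, 120001)                    # grid step 1e-4
brute = np.array([b_grid[np.argmin(0.5 * (z - b_grid) ** 2 + scad_penalty(b_grid, lam))]
                  for z in zs])
print(np.round(closed, 4))
```

Inputs with \(|z|\le \lambda \) are set exactly to zero, while inputs with \(|z|>a\lambda \) are left unchanged.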

(ii) Following arguments similar to those in the proof of (i), we obtain \(\hat{\gamma }_s=0\), \(s=q_0+1,\ldots ,q\), with probability tending to 1 as \(n\rightarrow \infty \). Since \(\hat{\alpha }_s(u)=B(u)^T\hat{\gamma }_s\) and \(\sup _{u} \Vert B(u)\Vert =O(1)\), it follows that \(\hat{\alpha }_s(u)\equiv 0\) for these s, which completes the proof. \(\square \)

Proof of Theorem 3

By Theorems 1 and 2, with probability tending to 1 as \(n\rightarrow \infty \), \(L_n(\theta )\) attains its minimum at \(\hat{\rho }\), \((\hat{\beta }_I^T,0^T)^T\) and \((\hat{\gamma }_I^T,0^T)^T\). Define \(L_{n1}(\theta )=\partial L_{n}(\theta )/\partial \beta _I\) and \(L_{n2}(\theta )=\partial L_{n}(\theta )/\partial \gamma _I\); then \(\hat{\rho }\), \((\hat{\beta }_I^T,0^T)^T\) and \((\hat{\gamma }_I^T,0^T)^T\) must satisfy

$$\begin{aligned}&\frac{1}{n}L_{n1}\left( \hat{\rho },\left( \hat{\beta }_I^T,0^T\right) ^T,\left( \hat{\gamma }_I^T,0^T\right) ^T\right) \\&\quad =\frac{1}{n}\varXi _I^TM\big (Y-V_I\hat{\xi }-\varPi _I\hat{\gamma }_I\big ) + \sum _{k=1}^{p_0} { P_{\lambda _{1k}}^\prime (|\hat{\beta }_k|)\text{ sgn }(\hat{\beta }_k) } =0, \\&\frac{1}{n}L_{n2}\left( \hat{\rho },\left( \hat{\beta }_I^T,0^T\right) ^T,\left( \hat{\gamma }_I^T,0^T\right) ^T\right) \\&\quad =\frac{1}{n} \varPi _I^TM(Y-V_I\hat{\xi }-\varPi _I\hat{\gamma }_I) + \sum _{s=1}^{q_0} { P_{\lambda _{2s}}^\prime (\Vert \hat{\gamma }_s\Vert _B) \frac{B\hat{\gamma }_s}{\Vert \hat{\gamma }_s\Vert _B} }=0, \end{aligned}$$

where \(V_I=(W(I-\rho W)^{-1}(X_I\beta _I+Z_I\alpha _I(U)+\varepsilon ),X_I)\) and \(\xi =(\rho ,\beta _I)\) is defined in Sect. 3. By a Taylor expansion, we have

$$\begin{aligned} P_{\lambda _{1k}}^\prime (|\hat{\beta }_k|)=P_{\lambda _{1k}}^\prime (|\beta _k|) + P_{\lambda _{1k}}^{\prime \prime }(|\beta _k|)(1+o(1))(\hat{\beta }_k-\beta _k). \end{aligned}$$

Since \(P_{\lambda _{1k}}^{\prime \prime }(|\beta _k|)=o_p(1)\) by assumption (A1) and \(P_{\lambda _{1k}}^\prime (|\beta _k|)\rightarrow 0 \) as \(\lambda _{\max }\rightarrow 0\), we have \(\sum _{k=1}^{p_0} { P_{\lambda _{1k}}^\prime (|\hat{\beta }_k|)\text{ sgn }(\hat{\beta }_k) }=o_p(1)\). Similarly, \(\sum _{s=1}^{q_0} { P_{\lambda _{2s}}^\prime (\Vert \hat{\gamma }_s\Vert _B) \frac{B\hat{\gamma }_s}{\Vert \hat{\gamma }_s\Vert _B} }=o_p(1)\). By arguments similar to those in the proof of Theorem 1, the above two equations can be written as

$$\begin{aligned}&\frac{1}{n}\varXi _I^TM\left( \varXi _I(\xi -\hat{\xi })-\varPi _I(\gamma _I-\hat{\gamma }_I)+r_n+\varepsilon \right) + o_p(1)=0, \end{aligned}$$
(18)
$$\begin{aligned}&\frac{1}{n}\varPi _I^TM\left( \varXi _I(\xi -\hat{\xi })-\varPi _I(\gamma _I-\hat{\gamma }_I)+r_{n}+\varepsilon \right) + o_p(1)=0, \end{aligned}$$
(19)

where \(\varXi _I=(W(I-\rho W)^{-1}(X_I\beta _I+Z_I\alpha _I(U)),X_I)\) is defined in assumption (A3).
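The second stationarity condition involves the gradient of the group penalty \(P_{\lambda }(\Vert \gamma _s\Vert _B)\), where \(\Vert \gamma _s\Vert _B=(\gamma _s^TB\gamma _s)^{1/2}\). For symmetric B the chain rule gives \(P^\prime _{\lambda }(\Vert \gamma _s\Vert _B)B\gamma _s/\Vert \gamma _s\Vert _B\) whenever \(\gamma _s\ne 0\); the kink at \(\gamma _s=0\) is what allows an entire coefficient function to be removed. The sketch checks this gradient against central finite differences, assuming the SCAD penalty and a randomly generated positive definite B (both hypothetical):

```python
import numpy as np


def scad_penalty(t, lam, a=3.7):
    """SCAD penalty P_lambda(t) for scalar t >= 0."""
    t = abs(float(t))
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return -(t ** 2 - 2 * a * lam * t + lam ** 2) / (2 * (a - 1))
    return (a + 1) * lam ** 2 / 2


def scad_deriv(t, lam, a=3.7):
    """SCAD derivative P'_lambda(t) for scalar t >= 0."""
    t = abs(float(t))
    return lam if t <= lam else max(a * lam - t, 0.0) / (a - 1)


def group_norm(g, B):
    return np.sqrt(g @ B @ g)


def group_penalty(g, B, lam):
    return scad_penalty(group_norm(g, B), lam)


def group_penalty_grad(g, B, lam):
    """Analytic gradient P'_lambda(||g||_B) * B g / ||g||_B, valid for g != 0."""
    nb = group_norm(g, B)
    return scad_deriv(nb, lam) * (B @ g) / nb


rng = np.random.default_rng(3)
L = 6                                       # hypothetical number of B-spline coefficients
A = rng.normal(size=(L, L))
B = A @ A.T / L + np.eye(L)                 # symmetric positive definite
g = rng.normal(size=L)
lam = group_norm(g, B) / 2                  # places ||g||_B in the middle SCAD region
h = 1e-6
fd = np.array([(group_penalty(g + h * e, B, lam) - group_penalty(g - h * e, B, lam)) / (2 * h)
               for e in np.eye(L)])
analytic = group_penalty_grad(g, B, lam)
print(np.max(np.abs(fd - analytic)))
```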

Let \(\varPhi _n=\varPi _I^TM\varPi _I/n\) and \(\varPsi _n=\varPi _I^TM \varXi _I/n\). It follows from Eq. (19) that

$$\begin{aligned} \varPhi _n(\gamma _I-\hat{\gamma }_I)=\varPsi _n(\xi -\hat{\xi })+\frac{1}{n}\varPi _I^TM(r_{n}+\varepsilon )+o_p(1). \end{aligned}$$

That is,

$$\begin{aligned} \gamma _I-\hat{\gamma }_I=\varPhi _n^{-1}\varPsi _n(\xi -\hat{\xi })+\frac{1}{n}\varPhi _n^{-1}\varPi _I^TM(r_{n}+\varepsilon )+o_p(1). \end{aligned}$$
(20)

Let \(\varGamma _n=\varXi _I^TM\varXi _I/n\). Substituting Eq. (20) into (18) leads to

$$\begin{aligned}&\varGamma _n(\xi -\hat{\xi })-\varPsi _n^T\left\{ \varPhi _n^{-1}\varPsi _n(\xi -\hat{\xi })+\frac{1}{n}\varPhi _n^{-1}\varPi _I^TM(r_{n}+\varepsilon )+o_p(1) \right\} \\&\qquad +\frac{1}{n} \varXi _I^TM (r_{n}+\varepsilon )+o_p(1)=0, \end{aligned}$$

which is equivalent to

$$\begin{aligned}&\left( \varGamma _n-\varPsi _n^T\varPhi _n^{-1}\varPsi _n\right) \sqrt{n}(\xi -\hat{\xi }) \\&\quad =\frac{1}{\sqrt{n}}\left( \varPsi _n^T\varPhi _n^{-1}\varPi _I^TM -\varXi _I^TM\right) \varepsilon +\frac{1}{\sqrt{n}}\left( \varPsi _n^T\varPhi _n^{-1}\varPi _I^TM-\varXi _I^TM\right) r_{n}+o_p(1). \end{aligned}$$

Note that \(\frac{1}{\sqrt{n}}(\varPsi _n^T\varPhi _n^{-1}\varPi _I^TM-\varXi _I^TM)r_{n}=o_p(1)\) by (11) and assumption (A3), so that

$$\begin{aligned} \left( \varGamma _n-\varPsi _n^T\varPhi _n^{-1}\varPsi _n\right) \sqrt{n}(\xi -\hat{\xi })=\frac{1}{\sqrt{n}}\left( \varPsi _n^T\varPhi _n^{-1}\varPi _I^TM -\varXi _I^TM\right) \varepsilon +o_p(1). \end{aligned}$$
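Equations (18)–(20) eliminate \(\gamma _I\) by profiling, which amounts to block Gaussian elimination with the Schur complement \(\varGamma _n-\varPsi _n^T\varPhi _n^{-1}\varPsi _n\). Ignoring the \(o_p(1)\) penalty terms, the sketch checks on random data that the profiled solution for \(\xi \) coincides with the joint solution of the normal equations in \((\xi ,\gamma _I)\). The dimensions, the random matrices standing in for \(\varXi _I\) and \(\varPi _I\), and the projection form of M are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k, d, q = 300, 20, 3, 6              # hypothetical: sample, instruments, dim(xi), dim(gamma_I)
H = rng.normal(size=(n, k))
M = H @ np.linalg.solve(H.T @ H, H.T)   # assumed projection matrix
Xi = rng.normal(size=(n, d))            # stand-in for Xi_I
Pi = rng.normal(size=(n, q))            # stand-in for Pi_I
Y = rng.normal(size=n)

# Joint solution of the unpenalised normal equations in (xi, gamma_I)
D = np.hstack([Xi, Pi])
joint = np.linalg.solve(D.T @ M @ D, D.T @ M @ Y)
xi_joint = joint[:d]

# Profiled solution built from Gamma_n, Psi_n and Phi_n as defined in the proof
Gamma = Xi.T @ M @ Xi / n
Psi = Pi.T @ M @ Xi / n
Phi = Pi.T @ M @ Pi / n
schur = Gamma - Psi.T @ np.linalg.solve(Phi, Psi)
rhs = (Xi.T @ M @ Y - Psi.T @ np.linalg.solve(Phi, Pi.T @ M @ Y)) / n
xi_profiled = np.linalg.solve(schur, rhs)
print(xi_joint, xi_profiled)
```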

By Slutsky's theorem and the central limit theorem, we have

$$\begin{aligned} \sqrt{n}(\xi -\hat{\xi })~\mathop \rightarrow \limits ^d~ N\left( 0,~ \sigma ^2 (\varGamma -\varPsi ^T\varPhi ^{-1}\varPsi )^{-1} \right) . \end{aligned}$$

This completes the proof of Theorem 3. \(\square \)
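The limiting covariance collapses to a single inverse because, with \(A_n=n^{-1/2}(\varPsi _n^T\varPhi _n^{-1}\varPi _I^TM-\varXi _I^TM)\) and M symmetric and idempotent, \(A_nA_n^T=\varGamma _n-\varPsi _n^T\varPhi _n^{-1}\varPsi _n\) holds exactly. The sandwich \(D^{-1}(\sigma ^2D)D^{-1}\) with \(D=\varGamma _n-\varPsi _n^T\varPhi _n^{-1}\varPsi _n\) therefore reduces to \(\sigma ^2D^{-1}\). A numerical check of this identity, assuming \(M=H(H^TH)^{-1}H^T\) and using random stand-ins for \(\varXi _I\) and \(\varPi _I\):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k, d, q = 250, 15, 3, 5              # hypothetical sizes
H = rng.normal(size=(n, k))
M = H @ np.linalg.solve(H.T @ H, H.T)   # assumed projection matrix
Xi = rng.normal(size=(n, d))            # stand-in for Xi_I
Pi = rng.normal(size=(n, q))            # stand-in for Pi_I

Gamma = Xi.T @ M @ Xi / n
Psi = Pi.T @ M @ Xi / n
Phi = Pi.T @ M @ Pi / n
D = Gamma - Psi.T @ np.linalg.solve(Phi, Psi)               # Schur complement

A = (Psi.T @ np.linalg.solve(Phi, Pi.T @ M) - Xi.T @ M) / np.sqrt(n)   # d x n
cov_score = A @ A.T                                          # Var(A eps) / sigma^2
sandwich = np.linalg.solve(D, cov_score) @ np.linalg.inv(D)  # D^{-1} (A A^T) D^{-1}
print(np.max(np.abs(cov_score - D)))
```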

About this article

Cite this article

Lu, F., Yang, J. & Lu, X. One-step oracle procedure for semi-parametric spatial autoregressive model and its empirical application to Boston housing price data. Empir Econ 62, 2645–2671 (2022). https://doi.org/10.1007/s00181-021-02118-z

  • DOI: https://doi.org/10.1007/s00181-021-02118-z
