Abstract
The linear model with spatial interaction has attracted considerable attention over the past several decades. In contrast to most existing research, which focuses on estimation, we study its variable selection problem using the adaptive lasso. Our results show that the method identifies the true model consistently, and that the resulting estimator is as efficient as the oracle estimator, which is obtained when the zero coefficients in the model are known in advance. Simulation studies show that the proposed method performs very well.
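To fix ideas, the selection step behind these results can be sketched in a few lines of Python. The sketch below assumes the model \(Y=\rho WY+X{\varvec{\beta }}+\epsilon \) studied in the paper, together with a preliminary consistent estimate \(\hat{\rho }\) and an unpenalized initial estimate of \({\varvec{\beta }}\); the function name, the scikit-learn solver, and the column-rescaling device are illustrative choices, not taken from the paper.

```python
# A minimal sketch (not the paper's code) of the adaptive-lasso step for
# the model Y = rho * W @ Y + X @ beta + eps.  rho_hat and beta_init are
# assumed to come from a preliminary consistent fit (the paper uses the
# quasi-maximum likelihood estimator).
import numpy as np
from sklearn.linear_model import Lasso

def adaptive_lasso_sar(Y, X, W, rho_hat, beta_init, lam, gamma=1.0):
    # Spatially filtered response A(rho_hat) Y = (I - rho_hat * W) Y.
    Y_tilde = Y - rho_hat * (W @ Y)
    # Adaptive weights w_j = 1 / |beta_init_j|^gamma (small initial
    # coefficients receive large penalties).
    w = 1.0 / (np.abs(beta_init) ** gamma + 1e-12)
    # A weighted-l1 problem in beta is an ordinary lasso in b_j = w_j * beta_j,
    # obtained by dividing column j of X by w_j.
    fit = Lasso(alpha=lam, fit_intercept=False).fit(X / w, Y_tilde)
    return fit.coef_ / w
```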
Notes
For a matrix \(A=(a_{ij})_{1\le i,j \le n}\), its row sum matrix norm is defined by \(\Vert A\Vert _\infty =\max _{1\le i\le n} \sum \nolimits _{j=1}^n |a_{ij}|\), and its column sum matrix norm by \(\Vert A\Vert _1=\max _{1\le j\le n} \sum \nolimits _{i=1}^n |a_{ij}|\).
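A quick numerical check of these two definitions (a minimal numpy sketch with an arbitrary \(2\times 2\) matrix; numpy exposes them as the \(\infty \)- and 1-matrix norms):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  0.5]])
row_norm = np.abs(A).sum(axis=1).max()   # max_i sum_j |a_ij| = 3.5
col_norm = np.abs(A).sum(axis=0).max()   # max_j sum_i |a_ij| = 4.0
assert row_norm == np.linalg.norm(A, ord=np.inf)
assert col_norm == np.linalg.norm(A, ord=1)
```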
References
Breiman L (1995) Better subset regression using the nonnegative garrote. Technometrics 37:373–384
Case AC (1991) Spatial patterns in household demand. Econometrica 59:953–965
Cliff AD, Ord JK (1973) Spatial autocorrelation. Pion Ltd., London
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Geyer C (1994) On the asymptotics of constrained M-estimation. Ann Stat 22:1993–2010
Jencks C, Mayer S (1990) The social consequences of growing up in a poor neighborhood. In: Lynn LE, McGeary MGH (eds) Inner-city poverty in the United States. National Academy, Washington
Kelejian HH, Prucha IR (1998) A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J Real Estate Finance Econ 17:99–121
Kelejian HH, Prucha IR (1999) A generalized moments estimator for the autoregressive parameter in a spatial model. Int Econ Rev 40:509–533
Kelejian HH, Prucha IR (2010) Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. J Econom 157:53–67
Knight K, Fu W (2000) Asymptotics for lasso-type estimators. Ann Stat 28:1356–1378
Lee LF (2003) Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econom Rev 22:307–335
Lee LF (2004) Asymptotic distribution of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72:1899–1925
Lee LF (2007) GMM and 2SLS estimation of mixed regressive spatial autoregressive models. J Econom 137:489–514
Ord JK (1975) Estimation methods for models of spatial interaction. J Am Stat Assoc 70:120–126
Smirnov O, Anselin L (2001) Fast maximum likelihood estimation of very large spatial autoregressive models: a characteristic polynomial approach. Comput Stat Data Anal 35:301–319
Su L, Jin S (2010) Profile quasi-maximum likelihood estimation of partially linear spatial autoregressive models. J Econom 157:18–33
Sun Y, Yan H, Zhang W, Lu Z (2014) A semiparametric spatial dynamic model. Ann Stat 42:700–727
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
Wang H, Xia Y (2009) Shrinkage estimation of the varying coefficient model. J Am Stat Assoc 104:747–757
Zhang Y, Li R, Tsai C (2010) Regularization parameter selections via generalized information criterion. J Am Stat Assoc 105:312–323
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant 11271242), the Key Laboratory of Mathematical Economics (SUFE), Ministry of Education of China, the Program for Changjiang Scholars and Innovative Research Team in SUFE (IRT13077) and the Shanghai Municipal Science and Technology Research Project (14DZ1201900).
Appendix
In this section, we prove Theorem 1. We first establish two fundamental lemmas.
Lemma 1
Under Assumptions 1–5, \({\varvec{\theta }}_0=({\varvec{\beta }}_0^T,~ \rho _0, ~\sigma _0^2)^T\) is globally identifiable and \(\hat{{\varvec{\theta }}}_n=(\hat{{\varvec{\beta }}}(\hat{\rho })^T, ~\hat{\rho }, ~\hat{\sigma }(\hat{\rho })^2)^T\) is a consistent estimator of \({\varvec{\theta }}_0\).
The proof is similar to that of Theorem 3.1 of Lee (2004) and is omitted here.
Lemma 2
Under Assumptions 1–5, we have
\(-\frac{1}{n}\frac{\partial ^2\log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T}\mathop {\longrightarrow }\limits ^{P}\Sigma \) and \(\frac{1}{\sqrt{n}}\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}} \mathop {\longrightarrow }\limits ^{D}N(0,\Sigma +\varOmega )\).
Proof
By the proof of Theorem 3.2 of Lee (2004), we know that
and
The only difference is that Lee (2004) assumed the \(X_i\) to be constant regressors, while we allow \(X_i\) to be random. Based on the results of Lee (2004), we have
Therefore,
and combining this with (6.1) we can conclude that \(-\frac{1}{n}\frac{\partial ^2\log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T}\mathop {\longrightarrow }\limits ^{P}\Sigma \).
On the other hand, from the results of Lee (2004) we also have
where
and \(G_i\) is the \(i\)th row of \(G\). Since \(\varOmega _n\) is a symmetric matrix, we list only its lower triangular part here. It is obvious that \(\lim _{n\rightarrow \infty }E_X\varOmega _n=\varOmega \), which implies that \(\lim _{n\rightarrow \infty }E\left( \frac{1}{\sqrt{n}}\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}}\frac{1}{\sqrt{n}}\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}^T}\right) =\Sigma +\varOmega \). Substituting this into (6.2) completes the proof of Lemma 2. \(\square \)
Proof of Theorem 1
The proof follows a similar outline to that of Zou (2006). Let \({\varvec{\beta }}={\varvec{\beta }}_0+\text{ u }/\sqrt{n}\) and
Define \(\hat{\text{ u }}^{(n)}=\text{ argmin }\varPsi _n(\text{ u })\), then \(\tilde{{\varvec{\beta }}}={\varvec{\beta }}_0+\hat{\text{ u }}^{(n)}/\sqrt{n}\) or \(\hat{\text{ u }}^{(n)}=\sqrt{n}(\tilde{{\varvec{\beta }}}-{\varvec{\beta }}_0)\). By straightforward calculations, we have
and
By taking the Taylor expansion of \(\partial \log L(\hat{{\varvec{\theta }}})/\partial {\varvec{\theta }}\) at \({\varvec{\theta }}={\varvec{\theta }}_0\) and using the first-order condition \(\partial \log L(\hat{{\varvec{\theta }}})/\partial {\varvec{\theta }}=0\), we can get that
\(\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}}+\frac{\partial ^2\log L({\varvec{\theta }}^*)}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T}(\hat{{\varvec{\theta }}}-{\varvec{\theta }}_0)=0, \qquad (6.6)\)
where \({\varvec{\theta }}^*\) is between \({\varvec{\theta }}_0\) and \(\hat{{\varvec{\theta }}}\); it is obvious that \({\varvec{\theta }}^*\mathop {\longrightarrow }\limits ^{P}{\varvec{\theta }}_0\) because \(\hat{{\varvec{\theta }}}\mathop {\longrightarrow }\limits ^{P}{\varvec{\theta }}_0\). Equation (6.6) is equivalent to
Let
where we use the notation \((\eta ~k_1~k_2)\) to denote the second row of the matrix \(K\). By Lemma 2 we know that \(K\mathop {\longrightarrow }\limits ^{P}\Sigma ^{-1}\), where
By (2.2) and through some calculations, we can get that
Combining (6.7), (6.8) and (6.10), we can obtain that
This implies that
From (6.4), (6.5), (6.10) and (6.11), we can get that
where \(D=\frac{X^TGX{\varvec{\beta }}_0}{n}\big [\eta -\sigma _0^2\frac{(GX{\varvec{\beta }}_0)^TX}{n}\big [\frac{X^TGX{\varvec{\beta }}_0}{n} \frac{(GX{\varvec{\beta }}_0)^TX}{n}\big ]^{-1}, ~k_1,~k_2\big ].\) From (6.9) and the fact that \(\frac{X^TGX{\varvec{\beta }}_0}{n}\mathop {\longrightarrow }\limits ^{P}\text{ v }\), it is easy to obtain that \(D\mathop {\longrightarrow }\limits ^{P}B\). Combining this with the result of Lemma 2, we have
Moreover, by Lemma 1 we know that \(\hat{\rho }-\rho _0\mathop {\longrightarrow }\limits ^{P}0\), and \(X^TG\epsilon /\sqrt{n}\) converges in distribution to some normal distribution, which implies that
Therefore, from (6.12), (6.13) and (6.14) we have
Next we consider the last term of (6.3). If \(\beta _{0j}\ne 0\) (i.e., \(j\in \mathcal {A}\)), then
under the assumption that \(a_nn^{-1/2}\rightarrow 0\) as \(n\) goes to infinity. If \(\beta _{0j}=0\) (i.e., \(j\in \mathcal {A}^c\)), then under the assumption that \(b_nn^{-1/2}\rightarrow \infty \) we have
Hence, it is obvious that as n goes to infinity,
Let \(\text{ u }=\left( \begin{array}{c}\text{ u }_{\mathcal {A}}\\ \text{ u }_{\mathcal {A}^c}\end{array}\right) \) and \({\varvec{\xi }}=\left( \begin{array}{c}{\varvec{\xi }}_{\mathcal {A}}\\ {\varvec{\xi }}_{\mathcal {A}^c}\end{array}\right) \). From (6.3), (6.15), (6.16) and the fact that \(X^TX/n\mathop {\longrightarrow }\limits ^{P}\varPhi \) we can get that \(\varPsi _n(\text{ u })-\varPsi _n(0)\mathop {\longrightarrow }\limits ^{D}G(\text{ u })\), where
We see that \(G(\text{ u })\) has the unique minimum at \(\left( {\varvec{\xi }}_{\mathcal {A}}^T\varPhi _1^{-1},~ \text{0 }_{1\times (p-p_0)}\right) ^T\). Following the epi-convergence results of Geyer (1994) and Knight and Fu (2000), we then have \(\hat{\text{ u }}_{\mathcal {A}}^{(n)}\mathop {\longrightarrow }\limits ^{D}\varPhi _1^{-1}{\varvec{\xi }}_{\mathcal {A}}\) and \(\hat{\text{ u }}_{\mathcal {A}^c}^{(n)}\mathop {\longrightarrow }\limits ^{D}\text{0 }\). That is, \(\sqrt{n}(\tilde{{\varvec{\beta }}}_{\mathcal {A}}-{\varvec{\beta }}_{0\mathcal {A}})\mathop {\longrightarrow }\limits ^{D}\varPhi _1^{-1}{\varvec{\xi }}_{\mathcal {A}}\), and (i) is proved.
Now we prove the selection consistency part. For any \(j\in \mathcal {A}\) (i.e., \(\beta _{0j}\ne 0\)), the result of (i) implies that \(\tilde{\beta }_j\mathop {\longrightarrow }\limits ^{P}\beta _{0j}\) as \(n\rightarrow \infty \). Hence, \(P(j\in \mathcal {A}_n)=P(\tilde{\beta }_j\ne 0)\rightarrow 1\) as \(n\rightarrow \infty \). That is,
Next we prove that \(P(\mathcal {A}\supseteq \mathcal {A}_n)\rightarrow 1\). Consider the event \(j'\in \mathcal {A}_n\), which means that \(\tilde{\beta }_{j'}\ne 0\). By the Karush–Kuhn–Tucker (KKT) optimality condition, we have
where \(\tilde{X}_j\) denotes the \(j\)th column of \(X\). This is equivalent to
Because
Observe that
where \(\varPhi _{j'k}\) is the \((j',k)\)th element of the matrix \(\varPhi \). By the arguments in the proof of result (i), we know that \(\sqrt{n}(\tilde{{\varvec{\beta }}}_{\mathcal {A}}-{\varvec{\beta }}_{0\mathcal {A}}) \mathop {\longrightarrow }\limits ^{D}\varPhi _1^{-1}{\varvec{\xi }}_{\mathcal {A}}\) and \(\sqrt{n}(\tilde{{\varvec{\beta }}}_{\mathcal {A}^c}-{\varvec{\beta }}_{0\mathcal {A}^c})\mathop {\longrightarrow }\limits ^{D}0\), so we can conclude that \(\frac{\tilde{X}_{j'}^TX(\tilde{{\varvec{\beta }}}-{\varvec{\beta }}_0)}{\sqrt{n}}\) converges in distribution to some normal distribution. Using arguments similar to those that gave \(\left[ \frac{(A(\hat{\rho })Y-X{\varvec{\beta }}_0)^TX}{\sqrt{n}}\right] ^T\mathop {\longrightarrow }\limits ^{D}{\varvec{\xi }}\), it is not difficult to show that \(\frac{[A(\hat{\rho })Y-X{\varvec{\beta }}_0]^T\tilde{X}_{j'}}{\sqrt{n}}\) also converges in distribution to some normal distribution. By the assumptions of Theorem 1, for any \(j'\in \mathcal {A}^c\) we have \(\lambda _{j'}/\sqrt{n}\ge b_n/\sqrt{n}\rightarrow \infty \), and hence
That is, \(P(\mathcal {A}^c\subseteq \mathcal {A}_n^c)\rightarrow 1\), which is equivalent to \(P(\mathcal {A}\supseteq \mathcal {A}_n)\rightarrow 1\). Combining this with (6.17), we obtain result (ii). \(\square \)
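To illustrate the selection consistency just proved, the following self-contained toy simulation is offered as a sketch, not a replication of the paper's simulation studies: the circulant weight matrix, the plug-in of the true \(\rho \) in the spatial filter (used only to isolate the selection step; the paper estimates \(\rho \)), and all tuning values are illustrative assumptions.

```python
# Toy check of selection consistency for a sparse SAR model
# Y = rho * W @ Y + X @ beta0 + eps with an assumed circulant W.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, rho, lam = 400, 0.4, 0.05
beta0 = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0])

# Row-normalized circulant weight matrix: each unit's two neighbours.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

X = rng.standard_normal((n, len(beta0)))
Y = np.linalg.solve(np.eye(n) - rho * W, X @ beta0 + rng.standard_normal(n))

Y_tilde = Y - rho * (W @ Y)                       # A(rho) Y, true rho plugged in
beta_init = np.linalg.lstsq(X, Y_tilde, rcond=None)[0]
w = 1.0 / np.abs(beta_init)                       # adaptive weights, gamma = 1
fit = Lasso(alpha=lam, fit_intercept=False).fit(X / w, Y_tilde)
beta_tilde = fit.coef_ / w
print(np.round(beta_tilde, 2))  # zeros should appear at positions 2, 3, 5
```

With the settings above, the penalized fit typically returns exact zeros at the three null positions while leaving the nonzero coefficients close to their true values.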
Cite this article
Wu, Y., Sun, Y. Shrinkage estimation of the linear model with spatial interaction. Metrika 80, 51–68 (2017). https://doi.org/10.1007/s00184-016-0590-z