
Shrinkage estimation of the linear model with spatial interaction


Abstract

The linear model with spatial interaction has attracted considerable attention over the past several decades. Unlike most existing research, which focuses on estimation, we study the variable selection problem for this model using the adaptive lasso. Our results show that the method identifies the true model consistently, and that the resulting estimator is as efficient as the oracle estimator, i.e., the estimator obtained when the zero coefficients in the model are known in advance. Simulation studies show that the proposed method performs very well.
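To make the procedure concrete, the following is a minimal simulation sketch (ours, not the authors' code) of the two steps studied in the paper: estimate the spatial autoregressive model \(A(\rho )Y=X{\varvec{\beta }}+\epsilon \), \(A(\rho )=I_n-\rho W\), by profile quasi-maximum likelihood, and then minimize the adaptive-lasso criterion \(\Vert A(\hat{\rho })Y-X{\varvec{\beta }}\Vert ^2+\sum _j\lambda _j|\beta _j|\) by coordinate descent. The ring-graph weight matrix, the weights \(\lambda _j=\lambda /|\hat{\beta }_j|\) and the tuning value \(\lambda =\sqrt{n}\) are illustrative assumptions only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, rho0 = 200, 6, 0.4
beta0 = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0])      # sparse truth

# Row-normalised ring graph: an illustrative spatial weight matrix W.
W = np.zeros((n, n))
for i in range(n):
    W[i, [(i - 1) % n, (i + 1) % n]] = 0.5

X = rng.standard_normal((n, p))
eps = rng.standard_normal(n)
y = np.linalg.solve(np.eye(n) - rho0 * W, X @ beta0 + eps)   # Y = A(rho0)^{-1}(X beta0 + eps)

# Step 1: profile quasi-maximum likelihood for rho
# (beta and sigma^2 concentrated out; constants dropped).
def profile_loglik(rho):
    Ay = y - rho * (W @ y)                                   # A(rho)Y
    b = np.linalg.lstsq(X, Ay, rcond=None)[0]
    s2 = np.mean((Ay - X @ b) ** 2)
    return -0.5 * n * np.log(s2) + np.linalg.slogdet(np.eye(n) - rho * W)[1]

grid = np.linspace(-0.9, 0.9, 181)
rho_hat = grid[np.argmax([profile_loglik(r) for r in grid])]
Ay = y - rho_hat * (W @ y)
beta_pilot = np.linalg.lstsq(X, Ay, rcond=None)[0]           # unpenalised pilot estimate

# Step 2: adaptive lasso with weights lambda_j = sqrt(n)/|pilot_j| (illustrative tuning).
lam_j = np.sqrt(n) / np.abs(beta_pilot)

def soft(z, t):                                              # soft-thresholding operator
    return np.sign(z) * max(abs(z) - t, 0.0)

beta = beta_pilot.copy()
for _ in range(200):                # coordinate descent on ||Ay - Xb||^2 + sum_j lam_j|b_j|
    for j in range(p):
        r = Ay - X @ beta + X[:, j] * beta[j]                # partial residual
        beta[j] = soft(X[:, j] @ r, lam_j[j] / 2) / (X[:, j] @ X[:, j])

print(np.round(beta, 3))            # the three zero coefficients are typically set exactly to 0
```

In practice \(\lambda \) would be chosen by a data-driven criterion, e.g. the generalized information criterion of Zhang et al. (2010).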


Notes

  1. For a matrix \(A=(a_{ij})_{1\le i,j \le n}\), its row matrix norm (maximum absolute row sum) is defined by \(\Vert A\Vert _\infty =\max _{1\le i\le n} \sum \nolimits _{j=1}^n |a_{ij}|\), and its column matrix norm (maximum absolute column sum) is defined by \(\Vert A\Vert _1=\max _{1\le j\le n} \sum \nolimits _{i=1}^n |a_{ij}|\).
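In NumPy these are the matrix \(\infty \)-norm and 1-norm, respectively; a quick sketch:

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  0.5]])
row_norm = np.abs(A).sum(axis=1).max()   # max absolute row sum:    3.5
col_norm = np.abs(A).sum(axis=0).max()   # max absolute column sum: 4.0
# Equivalently: np.linalg.norm(A, np.inf) and np.linalg.norm(A, 1).
```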

References

  • Breiman L (1995) Better subset regression using the nonnegative garrote. Technometrics 37:373–384
  • Case AC (1991) Spatial patterns in household demand. Econometrica 59:953–965
  • Cliff AD, Ord JK (1973) Spatial autocorrelation. Pion Ltd., London
  • Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
  • Geyer C (1994) On the asymptotics of constrained M-estimation. Ann Stat 22:1993–2010
  • Jencks C, Mayer S (1990) The social consequences of growing up in a poor neighborhood. In: Lynn LE, McGeary MGH (eds) Inner-city poverty in the United States. National Academy, Washington
  • Kelejian HH, Prucha IR (1998) A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J Real Estate Finance Econ 17:99–121
  • Kelejian HH, Prucha IR (1999) A generalized moments estimator for the autoregressive parameter in a spatial model. Int Econ Rev 40:509–533
  • Kelejian HH, Prucha IR (2010) Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. J Econom 157:53–67
  • Knight K, Fu W (2000) Asymptotics for lasso-type estimators. Ann Stat 28:1356–1378
  • Lee LF (2003) Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econom Rev 22:307–335
  • Lee LF (2004) Asymptotic distribution of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72:1899–1925
  • Lee LF (2007) GMM and 2SLS estimation of mixed regressive, spatial autoregressive models. J Econom 137:489–514
  • Ord JK (1975) Estimation methods for models of spatial interaction. J Am Stat Assoc 70:120–126
  • Smirnov O, Anselin L (2001) Fast maximum likelihood estimation of very large spatial autoregressive models: a characteristic polynomial approach. Comput Stat Data Anal 35:301–319
  • Su L, Jin S (2010) Profile quasi-maximum likelihood estimation of partially linear spatial autoregressive models. J Econom 157:18–33
  • Sun Y, Yan H, Zhang W, Lu Z (2014) A semiparametric spatial dynamic model. Ann Stat 42:700–727
  • Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
  • Wang H, Xia Y (2009) Shrinkage estimation of the varying coefficient model. J Am Stat Assoc 104:747–757
  • Zhang Y, Li R, Tsai C (2010) Regularization parameter selections via generalized information criterion. J Am Stat Assoc 105:312–323
  • Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429


Acknowledgments

This work was partially supported by the National Natural Science Foundation of China (Grant 11271242), the Key Laboratory of Mathematical Economics (SUFE), Ministry of Education of China, the Program for Changjiang Scholars and Innovative Research Team in SUFE (IRT13077), and the Shanghai Municipal Science and Technology Research Project (14DZ1201900).

Author information

Corresponding author

Correspondence to Yueqin Wu.

Appendix

In this section we prove Theorem 1. We first establish two fundamental lemmas.

Lemma 1

Under Assumptions 1–5, \({\varvec{\theta }}_0=({\varvec{\beta }}_0^T,~ \rho _0, ~\sigma _0^2)^T\) is globally identifiable and \(\hat{{\varvec{\theta }}}_n=(\hat{{\varvec{\beta }}}(\hat{\rho })^T, ~\hat{\rho }, ~\hat{\sigma }(\hat{\rho })^2)^T\) is a consistent estimator of \({\varvec{\theta }}_0\).

The proof is similar to that of Theorem 3.1 of Lee (2004) and is omitted here.

Lemma 2

Under Assumptions 1–5, we have

\(-\frac{1}{n}\frac{\partial ^2\log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T}\mathop {\longrightarrow }\limits ^{P}\Sigma \)      and       \(\frac{1}{\sqrt{n}}\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}} \mathop {\longrightarrow }\limits ^{D}N(0,\Sigma +\varOmega )\).

Proof

By the proof of Theorem 3.2 of Lee (2004), we know that

$$\begin{aligned} -\frac{1}{n}\frac{\partial ^2\log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T}\mathop {\longrightarrow }\limits ^{P}\lim _{n\rightarrow \infty }E\left( -\frac{1}{n}\frac{\partial ^2\log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T}\right) \end{aligned}$$
(6.1)

and

$$\begin{aligned} \frac{1}{\sqrt{n}}\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}}\mathop {\longrightarrow }\limits ^{D}N\left( 0,\lim _{n\rightarrow \infty }E\left( \frac{1}{\sqrt{n}}\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}}\frac{1}{\sqrt{n}}\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}^T}\right) \right) . \end{aligned}$$
(6.2)

The only difference is that Lee (2004) assumed the \(X_i\) to be constant regressors, whereas we allow the \(X_i\) to be random variables. Based on the results of Lee (2004), we have

$$\begin{aligned}&E\left( \left. -\frac{1}{n}\frac{\partial ^2\log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T}\right| X\right) \\&\quad =\left( \begin{array}{ccc} \frac{1}{n\sigma _0^2}X^TX &{}\quad \frac{1}{n\sigma _0^2}X^T(GX{\varvec{\beta }}_0) &{}\quad \text{0 }_{p\times 1} \\ \frac{1}{n\sigma _0^2}(GX{\varvec{\beta }}_0)^TX &{}\quad \frac{1}{n\sigma _0^2}(GX{\varvec{\beta }}_0)^TGX{\varvec{\beta }}_0+\frac{1}{n}tr((G+G^T)G) &{}\quad \frac{1}{n\sigma _0^2}tr(G)\\ \text{0 }_{1\times p} &{}\quad \frac{1}{n\sigma _0^2}tr(G) &{}\quad \frac{1}{2\sigma _0^4} \end{array} \right) . \end{aligned}$$

Therefore,

$$\begin{aligned} \lim \limits _{n\rightarrow \infty }E\left( -\frac{1}{n}\frac{\partial ^2\log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T}\right) =\lim \limits _{n\rightarrow \infty }E\left( E\left( \left. -\frac{1}{n}\frac{\partial ^2\log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T}\right| X\right) \right) =\Sigma , \end{aligned}$$

and combining this with (6.1) we conclude that \(-\frac{1}{n}\frac{\partial ^2\log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T}\mathop {\longrightarrow }\limits ^{P}\Sigma \).

On the other hand, from the results of Lee (2004) we also have

$$\begin{aligned} E\left( \left. \frac{1}{\sqrt{n}}\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}}\frac{1}{\sqrt{n}}\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}^T}\right| X\right) =-E\left( \left. \frac{1}{n}\frac{\partial ^2\log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T}\right| X\right) +\varOmega _n, \end{aligned}$$

where

$$\begin{aligned} \varOmega _n=\left( \begin{array}{ccc} \text{0 }_{p\times p} &{}\quad * &{}\quad * \\ \frac{\mu _3}{n\sigma _0^4}\sum \limits _{i=1}^ng_{ii}X_i^T &{}\quad \frac{2\mu _3}{n\sigma _0^4}\sum \limits _{i=1}^ng_{ii}G_iX{\varvec{\beta }}_0+ \frac{\mu _4-3\sigma _0^4}{n\sigma _0^4}\sum \limits _{i=1}^ng_{ii}^2 &{}\quad *\\ \frac{\mu _3}{2n\sigma _0^6}\sum \limits _{i=1}^nX_i^T &{}\quad \frac{\mu _3\varvec{1}_n^TGX{\varvec{\beta }}_0+(\mu _4-3\sigma _0^4)tr(G)}{2n\sigma _0^6} &{}\quad \frac{\mu _4-3\sigma _0^4}{4\sigma _0^8} \end{array} \right) \end{aligned}$$

and \(G_i\) is the ith row of G. Since \(\varOmega _n\) is a symmetric matrix, we list only its lower triangular part. It is obvious that \(\lim _{n\rightarrow \infty }E_X\varOmega _n=\varOmega \), which implies that \(\lim _{n\rightarrow \infty }E\left( \frac{1}{\sqrt{n}}\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}}\frac{1}{\sqrt{n}}\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}^T}\right) =\Sigma +\varOmega \). Substituting this into (6.2) completes the proof of Lemma 2. \(\square \)
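For readers who want to check Lemma 2 numerically: the derivatives above are those of the Gaussian quasi-log-likelihood of the model \(A(\rho )Y=X{\varvec{\beta }}+\epsilon \). A minimal sketch of that function, assuming \(A(\rho )=I_n-\rho W\) as in Lee (2004):

```python
import numpy as np

def sar_loglik(theta, y, X, W):
    """Gaussian quasi-log-likelihood of A(rho)Y = X beta + eps,
    A(rho) = I - rho*W, eps_i iid (0, sigma2); theta = (beta, rho, sigma2)."""
    n, p = X.shape
    beta, rho, sigma2 = theta[:p], theta[p], theta[p + 1]
    resid = (y - rho * (W @ y)) - X @ beta                 # A(rho)Y - X beta
    logdet = np.linalg.slogdet(np.eye(n) - rho * W)[1]     # log|A(rho)|
    return (-0.5 * n * np.log(2 * np.pi * sigma2) + logdet
            - 0.5 * resid @ resid / sigma2)
```

Averaging numerical second derivatives of this function at \({\varvec{\theta }}_0\) over simulated samples gives a finite-sample check of the limit \(\Sigma \) in (6.1), and its numerical gradient can be compared with the score expression in (6.10) below.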

Proof of Theorem 1

The proof follows a similar outline to that of Zou (2006). Let \({\varvec{\beta }}={\varvec{\beta }}_0+\text{ u }/\sqrt{n}\) and

$$\begin{aligned} \varPsi _n(\text{ u })= & {} [A(\hat{\rho })Y-X({\varvec{\beta }}_0+\text{ u }/\sqrt{n})]^T [A(\hat{\rho })Y-X ({\varvec{\beta }}_0+\text{ u }/\sqrt{n})]\\&+\,\sum \limits _{j=1}^p\lambda _j| \beta _{0j}+u_j/\sqrt{n}|. \end{aligned}$$

Define \(\hat{\text{ u }}^{(n)}=\text{ argmin }\varPsi _n(\text{ u })\); then \(\tilde{{\varvec{\beta }}}={\varvec{\beta }}_0+\hat{\text{ u }}^{(n)}/\sqrt{n}\), i.e., \(\hat{\text{ u }}^{(n)}=\sqrt{n}(\tilde{{\varvec{\beta }}}-{\varvec{\beta }}_0)\). By straightforward calculations, we have

$$\begin{aligned} \varPsi _n(\text{ u })-\varPsi _n(\text{0 })= & {} \frac{\text{ u }^TX^TX\text{ u }}{n}-2 \frac{(A(\hat{\rho })Y-X{\varvec{\beta }}_0)^TX\text{ u }}{\sqrt{n}}\nonumber \\&+ \sum _{j=1}^p\lambda _j(|\beta _{0j}+\frac{u_j}{\sqrt{n}}|-|\beta _{0j}|), \end{aligned}$$
(6.3)
$$\begin{aligned} \left[ -\frac{(A(\hat{\rho })Y-X{\varvec{\beta }}_0)^TX}{\sqrt{n}}\right] ^ T=\left[ -\frac{Y^T(A(\hat{\rho })-A(\rho _0))^TX}{\sqrt{n}}\right] ^T- \frac{X^T\epsilon }{\sqrt{n}} \end{aligned}$$
(6.4)

and

$$\begin{aligned} -\frac{Y^T(A(\hat{\rho })-A(\rho _0))^TX}{\sqrt{n}}=\sqrt{n}(\hat{\rho }-\rho _0) \frac{(GX{\varvec{\beta }}_0)^TX}{n}+(\hat{\rho }-\rho _0)\frac{\epsilon ^TG^TX}{\sqrt{n}}.\nonumber \\ \end{aligned}$$
(6.5)

By taking a Taylor expansion of \(\partial \log L(\hat{{\varvec{\theta }}})/\partial {\varvec{\theta }}\) at \({\varvec{\theta }}={\varvec{\theta }}_0\), we obtain

$$\begin{aligned} 0=\frac{\partial \log L(\hat{{\varvec{\theta }}})}{\partial {\varvec{\theta }}}=\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}}+\frac{\partial ^2\log L({\varvec{\theta }}^*)}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T}(\hat{{\varvec{\theta }}}-{\varvec{\theta }}_0), \end{aligned}$$
(6.6)

where \({\varvec{\theta }}^*\) lies between \({\varvec{\theta }}_0\) and \(\hat{{\varvec{\theta }}}\); clearly \({\varvec{\theta }}^*\mathop {\longrightarrow }\limits ^{P}{\varvec{\theta }}_0\) because \(\hat{{\varvec{\theta }}}\mathop {\longrightarrow }\limits ^{P}{\varvec{\theta }}_0\). Equation (6.6) is equivalent to

$$\begin{aligned} \sqrt{n}(\hat{{\varvec{\theta }}}-{\varvec{\theta }}_0)=-\left[ \frac{1}{n}\frac{\partial ^2\log L({\varvec{\theta }}^*)}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T} \right] ^{-1}\frac{1}{\sqrt{n}}\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}}. \end{aligned}$$
(6.7)

Let

$$\begin{aligned} K=-\left[ \frac{1}{n}\frac{\partial ^2\log L({\varvec{\theta }}^*)}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T} \right] ^{-1}\mathop {=}\limits ^{\varDelta }\left( \begin{array}{ccc} * &{}\quad * &{}\quad *\\ \eta &{}\quad k_1 &{}\quad k_2 \\ * &{}\quad * &{}\quad * \end{array} \right) , \end{aligned}$$
(6.8)

where \((\eta ~k_1~k_2)\) denotes the second row of the matrix K. By Lemma 2, \(K\mathop {\longrightarrow }\limits ^{P}\Sigma ^{-1}\), where

$$\begin{aligned} \Sigma ^{-1}=\left( \begin{array}{ccc} \varPhi ^{-1}+\varPhi ^{-1}\text{ v }\text{ v }^T\varPhi ^{-1}/\gamma _2 &{}\quad -\varPhi ^{-1}\text{ v }/\gamma _2 &{}\quad 2\sigma _0^2\pi _2\varPhi ^{-1}\text{ v }/\gamma _2 \\ -\text{ v }^{T}\varPhi ^{-1}/\gamma _2 &{}\quad 1/\gamma _2 &{}\quad -2\pi _2\sigma _0^2/\gamma _2\\ 2\sigma _0^2\pi _2\text{ v }^T\varPhi ^{-1}/\gamma _2 &{}\quad -2\pi _2\sigma _0^2/\gamma _2 &{}\quad 2\sigma _0^2\gamma _1/\gamma _2 \end{array} \right) \sigma _0^2. \end{aligned}$$

It follows that

$$\begin{aligned} \eta \mathop {\longrightarrow }\limits ^{P}-\frac{\text{ v }^{T}\varPhi ^{-1}}{\gamma _2}\sigma _0^2,~~~~~~ k_1\mathop {\longrightarrow }\limits ^{P}\frac{\sigma _0^2}{\gamma _2},~~~~~~~~~k_2\mathop {\longrightarrow }\limits ^{P}-\frac{2\pi _2}{\gamma _2}\sigma _0^4. \end{aligned}$$
(6.9)

By (2.2) and some direct calculation, we obtain

$$\begin{aligned} \frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}}=\left( \begin{array}{c} \frac{X^T\epsilon }{\sigma _0^2} \\ -tr(G)+\frac{\epsilon ^TGX{\varvec{\beta }}_0}{\sigma _0^2}+\frac{\epsilon ^TG\epsilon }{\sigma _0^2}\\ \frac{\epsilon ^T\epsilon -n\sigma _0^2}{2\sigma _0^4} \end{array} \right) . \end{aligned}$$
(6.10)

Combining (6.7), (6.8) and (6.10), we obtain

$$\begin{aligned} \sqrt{n}(\hat{{\varvec{\theta }}}-{\varvec{\theta }}_0)=\left( \begin{array}{c} \sqrt{n}(\hat{{\varvec{\beta }}}-{\varvec{\beta }}_0) \\ \sqrt{n}(\hat{\rho }-\rho _0)\\ \sqrt{n}(\hat{\sigma }^2-\sigma _0^2) \end{array} \right) =\left( \begin{array}{ccc} * &{}\quad * &{}\quad *\\ \eta &{}\quad k_1 &{}\quad k_2 \\ * &{}\quad * &{}\quad * \end{array} \right) \left( \begin{array}{c} \frac{X^T\epsilon }{\sqrt{n}\sigma _0^2} \\ -\frac{tr(G)}{\sqrt{n}}+\frac{\epsilon ^TGX {\varvec{\beta }}_0}{\sqrt{n}\sigma _0^2}+\frac{\epsilon ^TG\epsilon }{\sqrt{n}\sigma _0^2}\\ \frac{\epsilon ^T\epsilon -n\sigma _0^2}{2\sqrt{n}\sigma _0^4} \end{array} \right) . \end{aligned}$$

This implies that

$$\begin{aligned} \sqrt{n}(\hat{\rho }-\rho _0)=\eta \frac{X^T\epsilon }{\sqrt{n} \sigma _0^2}+k_1\left[ -\frac{tr(G)}{\sqrt{n}}+ \frac{\epsilon ^TGX{\varvec{\beta }}_0}{\sqrt{n}\sigma _0^2}+\frac{\epsilon ^TG\epsilon }{\sqrt{n}\sigma _0^2}\right] +k_2\frac{\epsilon ^T\epsilon -n \sigma _0^2}{2\sqrt{n}\sigma _0^4}.\nonumber \\ \end{aligned}$$
(6.11)

From (6.4), (6.5), (6.10) and (6.11), we obtain

$$\begin{aligned} \left[ -\frac{(A(\hat{\rho })Y-X{\varvec{\beta }}_0)^TX}{\sqrt{n}}\right] ^T =D\frac{1}{\sqrt{n}}\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}} +(\hat{\rho }-\rho _0)\frac{X^TG\epsilon }{\sqrt{n}} \end{aligned}$$
(6.12)

where \(D=\frac{X^TGX{\varvec{\beta }}_0}{n}\big [\eta -\sigma _0^2\frac{(GX{\varvec{\beta }}_0)^TX}{n}\big [\frac{X^TGX{\varvec{\beta }}_0}{n} \frac{(GX{\varvec{\beta }}_0)^TX}{n}\big ]^{-1}, ~k_1,~k_2\big ].\) From (6.9) and the fact that \(\frac{X^TGX{\varvec{\beta }}_0}{n}\mathop {\longrightarrow }\limits ^{P}\text{ v }\), it is easy to see that \(D\mathop {\longrightarrow }\limits ^{P}B\). Combining this with Lemma 2, we have

$$\begin{aligned} D\frac{1}{\sqrt{n}}\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}}\mathop {\longrightarrow }\limits ^{D}N(0, B(\Sigma +\varOmega )B^T). \end{aligned}$$
(6.13)

Moreover, by Lemma 1 we know that \(\hat{\rho }-\rho _0\mathop {\longrightarrow }\limits ^{P}0\), and \(X^TG\epsilon /\sqrt{n}\) converges in distribution to a normal distribution, so that

$$\begin{aligned} (\hat{\rho }-\rho _0)\frac{X^TG\epsilon }{\sqrt{n}}\mathop {\longrightarrow }\limits ^{P}0. \end{aligned}$$
(6.14)

Therefore, from (6.12), (6.13) and (6.14) we have

$$\begin{aligned} \left[ \frac{(A(\hat{\rho })Y-X{\varvec{\beta }}_0)^TX}{\sqrt{n}}\right] ^T\mathop {\longrightarrow }\limits ^{D}{\varvec{\xi }}\sim N(0,B(\Sigma +\varOmega )B^T). \end{aligned}$$
(6.15)

Next we consider the last term of (6.3). If \(\beta _{0j}\ne 0\) (i.e., \(j\in \mathcal {A}\)), then

$$\begin{aligned} \left| \lambda _j(|\beta _{0j}+\frac{u_j}{\sqrt{n}}|-|\beta _{0j}|)\right| =\frac{|u_j|}{\sqrt{n}}\lambda _j\le a_nn^{-1/2}|u_j|\rightarrow 0 \end{aligned}$$

provided \(a_nn^{-1/2}\rightarrow 0\) as \(n\rightarrow \infty \). If \(\beta _{0j}=0\) (i.e., \(j\in \mathcal {A}^c\)), then under the assumption that \(b_nn^{-1/2}\rightarrow \infty \) we have

$$\begin{aligned} \lambda _j(|\beta _{0j}+\frac{u_j}{\sqrt{n}}|-|\beta _{0j}|)=\frac{|u_j|}{\sqrt{n}}\lambda _j\ge b_nn^{-1/2}|u_j|\rightarrow \left\{ \begin{array}{cc} \infty , &{}\quad \text {if}~~~ u_j\ne 0\\ 0, &{}\quad \text {if}~~~ u_j=0 \end{array} \right. . \end{aligned}$$

Hence, as \(n\rightarrow \infty \),

$$\begin{aligned} \sum \limits _{j=1}^p\lambda _j(|\beta _{0j}+\frac{u_j}{\sqrt{n}}|-| \beta _{0j}|)\rightarrow \left\{ \begin{array}{ll} 0, &{}\quad \text {if}~~~ u_j=0 ~~~\text {for all }j\notin \mathcal {A}\\ \infty , &{}\quad \text {otherwise} \end{array} \right. . \end{aligned}$$
(6.16)
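For instance, with the usual adaptive weights \(\lambda _j=\lambda _n/|\hat{\beta }_j|\) built from a \(\sqrt{n}\)-consistent pilot estimator (as in Zou 2006), any sequence with \(\lambda _n/\sqrt{n}\rightarrow 0\) and \(\lambda _n\rightarrow \infty \), say \(\lambda _n=n^{1/4}\), yields the two displayed limits: for \(j\in \mathcal {A}\), \(\lambda _jn^{-1/2}=\lambda _n/(\sqrt{n}|\hat{\beta }_j|)\rightarrow 0\) because \(|\hat{\beta }_j|\mathop {\longrightarrow }\limits ^{P}|\beta _{0j}|>0\), while for \(j\in \mathcal {A}^c\), \(\lambda _jn^{-1/2}=\lambda _n/|\sqrt{n}\hat{\beta }_j|\rightarrow \infty \) in probability because \(\sqrt{n}\hat{\beta }_j=O_p(1)\).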

Let \(\text{ u }=\left( \begin{array}{c}\text{ u }_{\mathcal {A}}\\ \text{ u }_{\mathcal {A}^c}\end{array}\right) \) and \({\varvec{\xi }}=\left( \begin{array}{c}{\varvec{\xi }}_{\mathcal {A}}\\ {\varvec{\xi }}_{\mathcal {A}^c}\end{array}\right) \). From (6.3), (6.15), (6.16) and the fact that \(X^TX/n\mathop {\longrightarrow }\limits ^{P}\varPhi \), we obtain \(\varPsi _n(\text{ u })-\varPsi _n(0)\mathop {\longrightarrow }\limits ^{D}G(\text{ u })\), where

$$\begin{aligned} G(\text{ u })=\left\{ \begin{array}{ll} \text{ u }_{\mathcal {A}}^T\varPhi _1\text{ u }_{\mathcal {A}}-2{\varvec{\xi }}_{\mathcal {A}}^T\text{ u }_{\mathcal {A}}, &{}\quad \text {if}~~~ \text{ u }_{\mathcal {A}^c}=0 \\ \infty , &{}\quad \text {otherwise} \end{array} \right. . \end{aligned}$$

We see that \(G(\text{ u })\) has a unique minimum at \(\left( {\varvec{\xi }}_{\mathcal {A}}^T\varPhi _1^{-1},~ \text{0 }_{1\times (p-p_0)}\right) ^T\). Following the epi-convergence results of Geyer (1994) and Knight and Fu (2000), we then have \(\hat{\text{ u }}_{\mathcal {A}}^{(n)}\mathop {\longrightarrow }\limits ^{D}\varPhi _1^{-1}{\varvec{\xi }}_{\mathcal {A}}\) and \(\hat{\text{ u }}_{\mathcal {A}^c}^{(n)}\mathop {\longrightarrow }\limits ^{D}\text{0 }\). That is, \(\sqrt{n}(\tilde{{\varvec{\beta }}}_{\mathcal {A}}-{\varvec{\beta }}_{0\mathcal {A}})\mathop {\longrightarrow }\limits ^{D}\varPhi _1^{-1}{\varvec{\xi }}_{\mathcal {A}}\), and (i) is proved.

Now we prove the consistency part. For any \(j\in \mathcal {A}\) (i.e., \(\beta _{0j}\ne 0\)), the result of (i) implies that \(\tilde{\beta }_j\mathop {\longrightarrow }\limits ^{P}\beta _{0j}\) as \(n\rightarrow \infty \). Hence, \(P(j\in \mathcal {A}_n)=P(\tilde{\beta }_j\ne 0)\rightarrow 1\) as \(n\rightarrow \infty \). That is,

$$\begin{aligned} P(\mathcal {A}\subseteq \mathcal {A}_n)\rightarrow 1 ~~~\text {as}~~~ n\rightarrow \infty . \end{aligned}$$
(6.17)

Next we prove that \(P(\mathcal {A}\supseteq \mathcal {A}_n)\rightarrow 1\). Consider the event \(j'\in \mathcal {A}_n\), which means that \(\tilde{\beta }_{j'}\ne 0\). By the Karush–Kuhn–Tucker (KKT) optimality condition, we have

$$\begin{aligned} -2\tilde{X}_{j'}^T(A(\hat{\rho })Y-X\tilde{{\varvec{\beta }}})+\lambda _{j'}sgn (\tilde{\beta }_{j'})=0, \end{aligned}$$

where \(\tilde{X}_j\) denotes the jth column of X. This is equivalent to

$$\begin{aligned} 2\tilde{X}_{j'}^T(A(\hat{\rho })Y-X\tilde{{\varvec{\beta }}})=\lambda _{j'}sgn(\tilde{\beta }_{j'}). \end{aligned}$$
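Numerically, these KKT conditions characterize the minimizer of the penalized criterion and can be used to verify a computed solution. A hypothetical helper (with `Ay` standing for \(A(\hat{\rho })Y\)):

```python
import numpy as np

def kkt_violations(Ay, X, beta, lam, tol=1e-8, slack=1e-6):
    """Indices violating the KKT conditions of ||Ay - X b||^2 + sum_j lam_j |b_j|:
    active j:   2 x_j'(Ay - X b) = lam_j * sign(b_j);
    inactive j: |2 x_j'(Ay - X b)| <= lam_j."""
    g = 2 * X.T @ (Ay - X @ beta)          # gradient of the quadratic part
    bad = []
    for j, b in enumerate(beta):
        if abs(b) > tol and abs(g[j] - lam[j] * np.sign(b)) > slack:
            bad.append(j)
        elif abs(b) <= tol and abs(g[j]) > lam[j] + slack:
            bad.append(j)
    return bad
```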

Note that

$$\begin{aligned} \frac{\tilde{X}_{j'}^T(A(\hat{\rho })Y-X\tilde{{\varvec{\beta }}})}{\sqrt{n}}=\frac{\tilde{X}_{j'}^T(A(\hat{\rho })Y-X{\varvec{\beta }}_0)}{\sqrt{n}} -\frac{\tilde{X}_{j'}^TX(\tilde{{\varvec{\beta }}}-{\varvec{\beta }}_0)}{\sqrt{n}}. \end{aligned}$$
(6.18)

Observe that

$$\begin{aligned} \frac{\tilde{X}_{j'}^TX}{n}=\left( \frac{\tilde{X}_{j'}^T \tilde{X}_1}{n}~~\cdots ~~\frac{\tilde{X}_{j'}^T\tilde{X}_p}{n}\right) \mathop {\longrightarrow }\limits ^{P}(\varPhi _{j'1}~~\cdots ~~{\varPhi _{j'p}}), \end{aligned}$$

where \(\varPhi _{j'k}\) is the \((j',k)\)th element of the matrix \(\varPhi \). By the arguments in the proof of result (i), we know that \(\sqrt{n}(\tilde{{\varvec{\beta }}}_{\mathcal {A}}-{\varvec{\beta }}_{0\mathcal {A}}) \mathop {\longrightarrow }\limits ^{D}\varPhi _1^{-1}{\varvec{\xi }}_{\mathcal {A}}\) and \(\sqrt{n}(\tilde{{\varvec{\beta }}}_{\mathcal {A}^c}-{\varvec{\beta }}_{0\mathcal {A}^c})\mathop {\longrightarrow }\limits ^{D}0\). Hence \(\frac{\tilde{X}_{j'}^TX(\tilde{{\varvec{\beta }}}-{\varvec{\beta }}_0)}{\sqrt{n}}\) converges in distribution to a normal distribution. By arguments similar to those used to establish \(\left[ \frac{(A(\hat{\rho })Y-X{\varvec{\beta }}_0)^TX}{\sqrt{n}}\right] ^T\mathop {\longrightarrow }\limits ^{D}{\varvec{\xi }}\), it follows that \(\frac{[A(\hat{\rho })Y-X{\varvec{\beta }}_0]^T\tilde{X}_{j'}}{\sqrt{n}}\) also converges in distribution to a normal distribution. By the assumptions of Theorem 1, for any \(j'\in \mathcal {A}^c\) we have \(\lambda _{j'}/\sqrt{n}\ge b_n/\sqrt{n}\rightarrow \infty \), and hence

$$\begin{aligned} P(j'\in \mathcal {A}_n)=P\left( \frac{2\tilde{X}_{j'}^T(A(\hat{\rho }) Y-X\tilde{{\varvec{\beta }}})}{\sqrt{n}} =\frac{sgn(\tilde{\beta }_{j'})\lambda _{j'}}{\sqrt{n}}\right) \rightarrow 0,~~~~~\text {as}~~ n\rightarrow \infty . \end{aligned}$$

That is, \(P(\mathcal {A}^c\subseteq \mathcal {A}_n^c)\rightarrow 1\), which is equivalent to \(P(\mathcal {A}\supseteq \mathcal {A}_n)\rightarrow 1\). Combining this with (6.17), we obtain result (ii). \(\square \)


Cite this article

Wu, Y., Sun, Y. Shrinkage estimation of the linear model with spatial interaction. Metrika 80, 51–68 (2017). https://doi.org/10.1007/s00184-016-0590-z
