Abstract
The linear model with spatial interaction has attracted considerable attention over the past several decades. In contrast to most existing research, which focuses on estimation, we study its variable selection problem using the adaptive lasso. Our results show that the method identifies the true model consistently, and that the resulting estimator is as efficient as the oracle estimator, which is obtained when the zero coefficients in the model are known in advance. Simulation studies show that the proposed method performs very well.
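To fix ideas, the selection step behind these results can be sketched in a few lines of Python. The sketch below assumes the model \(Y=\rho WY+X{\varvec{\beta }}+\epsilon \) studied in the paper, together with a preliminary consistent estimate \(\hat{\rho }\) and an unpenalized initial estimate of \({\varvec{\beta }}\); the function name, the scikit-learn solver, and the column-rescaling device are illustrative choices, not taken from the paper.

```python
# A minimal sketch (not the paper's code) of the adaptive-lasso step for
# the model Y = rho * W @ Y + X @ beta + eps.  rho_hat and beta_init are
# assumed to come from a preliminary consistent fit (the paper uses the
# quasi-maximum likelihood estimator).
import numpy as np
from sklearn.linear_model import Lasso

def adaptive_lasso_sar(Y, X, W, rho_hat, beta_init, lam, gamma=1.0):
    # Spatially filtered response A(rho_hat) Y = (I - rho_hat * W) Y.
    Y_tilde = Y - rho_hat * (W @ Y)
    # Adaptive weights w_j = 1 / |beta_init_j|^gamma (small initial
    # coefficients receive large penalties).
    w = 1.0 / (np.abs(beta_init) ** gamma + 1e-12)
    # A weighted-l1 problem in beta is an ordinary lasso in b_j = w_j * beta_j,
    # obtained by dividing column j of X by w_j.
    fit = Lasso(alpha=lam, fit_intercept=False).fit(X / w, Y_tilde)
    return fit.coef_ / w
```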
Notes
For a matrix \(A=(a_{ij})_{1\le i,j \le n}\), its row sum matrix norm is defined by \(\Vert A\Vert _\infty =\max _{1\le i\le n} \sum \nolimits _{j=1}^n |a_{ij}|\), and its column sum matrix norm by \(\Vert A\Vert _1=\max _{1\le j\le n} \sum \nolimits _{i=1}^n |a_{ij}|\).
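A quick numerical check of these two definitions (a minimal numpy sketch with an arbitrary \(2\times 2\) matrix; numpy exposes them as the \(\infty \)- and 1-matrix norms):

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [3.0,  0.5]])
row_norm = np.abs(A).sum(axis=1).max()   # max_i sum_j |a_ij| = 3.5
col_norm = np.abs(A).sum(axis=0).max()   # max_j sum_i |a_ij| = 4.0
assert row_norm == np.linalg.norm(A, ord=np.inf)
assert col_norm == np.linalg.norm(A, ord=1)
```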
References
Breiman L (1995) Better subset regression using the nonnegative garrote. Technometrics 37:373–384
Case AC (1991) Spatial patterns in household demand. Econometrica 59:953–965
Cliff AD, Ord JK (1973) Spatial autocorrelation. Pion Ltd., London
Fan J, Li R (2001) Variable selection via nonconcave penalized likelihood and its oracle properties. J Am Stat Assoc 96:1348–1360
Geyer C (1994) On the asymptotics of constrained M-estimation. Ann Stat 22:1993–2010
Jencks C, Mayer S (1990) The social consequences of growing up in a poor neighborhood. In: Lynn LE, McGeary MGH (eds) Inner-city poverty in the United States. National Academy, Washington
Kelejian HH, Prucha IR (1998) A generalized spatial two-stage least squares procedure for estimating a spatial autoregressive model with autoregressive disturbances. J Real Estate Finance Econ 17:99–121
Kelejian HH, Prucha IR (1999) A generalized moments estimator for the autoregressive parameter in a spatial model. Int Econ Rev 40:509–533
Kelejian HH, Prucha IR (2010) Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. J Econom 157:53–67
Knight K, Fu W (2000) Asymptotics for lasso-type estimators. Ann Stat 28:1356–1378
Lee LF (2003) Best spatial two-stage least squares estimators for a spatial autoregressive model with autoregressive disturbances. Econom Rev 22:307–335
Lee LF (2004) Asymptotic distribution of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72:1899–1925
Lee LF (2007) GMM and 2SLS estimation of mixed regressive spatial autoregressive models. J Econom 137:489–514
Ord JK (1975) Estimation methods for models of spatial interaction. J Am Stat Assoc 70:120–126
Smirnov O, Anselin L (2001) Fast maximum likelihood estimation of very large spatial autoregressive models: a characteristic polynomial approach. Comput Stat Data Anal 35:301–319
Su L, Jin S (2010) Profile quasi-maximum likelihood estimation of partially linear spatial autoregressive models. J Econom 157:18–33
Sun Y, Yan H, Zhang W, Lu Z (2014) A semiparametric spatial dynamic model. Ann Stat 42:700–727
Tibshirani R (1996) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
Wang H, Xia Y (2009) Shrinkage estimation of the varying coefficient model. J Am Stat Assoc 104:747–757
Zhang Y, Li R, Tsai C (2010) Regularization parameter selections via generalized information criterion. J Am Stat Assoc 105:312–323
Zou H (2006) The adaptive lasso and its oracle properties. J Am Stat Assoc 101:1418–1429
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant 11271242), the Key Laboratory of Mathematical Economics (SUFE), Ministry of Education of China, the Program for Changjiang Scholars and Innovative Research Team in SUFE (IRT13077) and the Shanghai Municipal Science and Technology Research Project (14DZ1201900).
Appendix
In this section, we prove Theorem 1. We first establish two fundamental lemmas.
Lemma 1
Under Assumptions 1–5, \({\varvec{\theta }}_0=({\varvec{\beta }}_0^T,~ \rho _0, ~\sigma _0^2)^T\) is globally identifiable and \(\hat{{\varvec{\theta }}}_n=(\hat{{\varvec{\beta }}}(\hat{\rho })^T, ~\hat{\rho }, ~\hat{\sigma }(\hat{\rho })^2)^T\) is a consistent estimator of \({\varvec{\theta }}_0\).
The proof is similar to that of Theorem 3.1 of Lee (2004) and is omitted here.
Lemma 2
Under Assumptions 1–5, we have
\(-\frac{1}{n}\frac{\partial ^2\log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T}\mathop {\longrightarrow }\limits ^{P}\Sigma \) and \(\frac{1}{\sqrt{n}}\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}} \mathop {\longrightarrow }\limits ^{D}N(0,\Sigma +\varOmega )\).
Proof
By the proof of Theorem 3.2 of Lee (2004), we know that
and
The only difference is that Lee (2004) assumed the \(X_i\) to be constant regressors, while we allow \(X_i\) to be random. Based on the results of Lee (2004), we have
Therefore,
and combining this with (6.1) we can conclude that \(-\frac{1}{n}\frac{\partial ^2\log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T}\mathop {\longrightarrow }\limits ^{P}\Sigma \).
On the other hand, from the results of Lee (2004) we also have
where
and \(G_i\) is the \(i\)th row of \(G\). Since \(\varOmega _n\) is a symmetric matrix, we list only its lower triangular part here. It is obvious that \(\lim _{n\rightarrow \infty }E_X\varOmega _n=\varOmega \), which implies that \(\lim _{n\rightarrow \infty }E\left( \frac{1}{\sqrt{n}}\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}}\frac{1}{\sqrt{n}}\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}^T}\right) =\Sigma +\varOmega \). Substituting this into (6.2) completes the proof of Lemma 2. \(\square \)
Proof of Theorem 1
The proof follows a similar outline to that of Zou (2006). Let \({\varvec{\beta }}={\varvec{\beta }}_0+\text{ u }/\sqrt{n}\) and
Define \(\hat{\text{ u }}^{(n)}=\text{ argmin }\varPsi _n(\text{ u })\), then \(\tilde{{\varvec{\beta }}}={\varvec{\beta }}_0+\hat{\text{ u }}^{(n)}/\sqrt{n}\) or \(\hat{\text{ u }}^{(n)}=\sqrt{n}(\tilde{{\varvec{\beta }}}-{\varvec{\beta }}_0)\). By straightforward calculations, we have
and
By taking the Taylor expansion of \(\partial \log L(\hat{{\varvec{\theta }}})/\partial {\varvec{\theta }}\) at \({\varvec{\theta }}={\varvec{\theta }}_0\) and using the first-order condition \(\partial \log L(\hat{{\varvec{\theta }}})/\partial {\varvec{\theta }}=0\), we can get that
\(\frac{\partial \log L({\varvec{\theta }}_0)}{\partial {\varvec{\theta }}}+\frac{\partial ^2\log L({\varvec{\theta }}^*)}{\partial {\varvec{\theta }}\partial {\varvec{\theta }}^T}(\hat{{\varvec{\theta }}}-{\varvec{\theta }}_0)=0, \qquad (6.6)\)
where \({\varvec{\theta }}^*\) is between \({\varvec{\theta }}_0\) and \(\hat{{\varvec{\theta }}}\); it is obvious that \({\varvec{\theta }}^*\mathop {\longrightarrow }\limits ^{P}{\varvec{\theta }}_0\) because \(\hat{{\varvec{\theta }}}\mathop {\longrightarrow }\limits ^{P}{\varvec{\theta }}_0\). Equation (6.6) is equivalent to
Let
where we use the notation \((\eta ~k_1~k_2)\) to denote the second row of the matrix \(K\). By Lemma 2 we know that \(K\mathop {\longrightarrow }\limits ^{P}\Sigma ^{-1}\), where
By (2.2) and through some calculations, we can get that
Combining (6.7), (6.8) and (6.10), we can obtain that
This implies that
From (6.4), (6.5), (6.10) and (6.11), we can get that
where \(D=\frac{X^TGX{\varvec{\beta }}_0}{n}\big [\eta -\sigma _0^2\frac{(GX{\varvec{\beta }}_0)^TX}{n}\big [\frac{X^TGX{\varvec{\beta }}_0}{n} \frac{(GX{\varvec{\beta }}_0)^TX}{n}\big ]^{-1}, ~k_1,~k_2\big ].\) From (6.9) and the fact that \(\frac{X^TGX{\varvec{\beta }}_0}{n}\mathop {\longrightarrow }\limits ^{P}\text{ v }\), it is easy to obtain that \(D\mathop {\longrightarrow }\limits ^{P}B\). Combining this with the result of Lemma 2, we have
Moreover, by Lemma 1 we know that \(\hat{\rho }-\rho _0\mathop {\longrightarrow }\limits ^{P}0\), and \(X^TG\epsilon /\sqrt{n}\) converges in distribution to some normal distribution, which implies that
Therefore, from (6.12), (6.13) and (6.14) we have
Next we consider the last term of (6.3). If \(\beta _{0j}\ne 0\) (i.e., \(j\in \mathcal {A}\)), then
under the assumption that \(a_nn^{-1/2}\rightarrow 0\) as \(n\) goes to infinity. If \(\beta _{0j}=0\) (i.e., \(j\in \mathcal {A}^c\)), then under the assumption that \(b_nn^{-1/2}\rightarrow \infty \) we have
Hence, it is obvious that as n goes to infinity,
Let \(\text{ u }=\left( \begin{array}{c}\text{ u }_{\mathcal {A}}\\ \text{ u }_{\mathcal {A}^c}\end{array}\right) \) and \({\varvec{\xi }}=\left( \begin{array}{c}{\varvec{\xi }}_{\mathcal {A}}\\ {\varvec{\xi }}_{\mathcal {A}^c}\end{array}\right) \). From (6.3), (6.15), (6.16) and the fact that \(X^TX/n\mathop {\longrightarrow }\limits ^{P}\varPhi \) we can get that \(\varPsi _n(\text{ u })-\varPsi _n(0)\mathop {\longrightarrow }\limits ^{D}G(\text{ u })\), where
We see that \(G(\text{ u })\) has the unique minimum at \(\left( {\varvec{\xi }}_{\mathcal {A}}^T\varPhi _1^{-1},~ \text{0 }_{1\times (p-p_0)}\right) ^T\). Following the epi-convergence results of Geyer (1994) and Knight and Fu (2000), we then have \(\hat{\text{ u }}_{\mathcal {A}}^{(n)}\mathop {\longrightarrow }\limits ^{D}\varPhi _1^{-1}{\varvec{\xi }}_{\mathcal {A}}\) and \(\hat{\text{ u }}_{\mathcal {A}^c}^{(n)}\mathop {\longrightarrow }\limits ^{D}\text{0 }\). That is, \(\sqrt{n}(\tilde{{\varvec{\beta }}}_{\mathcal {A}}-{\varvec{\beta }}_{0\mathcal {A}})\mathop {\longrightarrow }\limits ^{D}\varPhi _1^{-1}{\varvec{\xi }}_{\mathcal {A}}\), and (i) is proved.
Now we prove the selection consistency part. For any \(j\in \mathcal {A}\) (i.e., \(\beta _{0j}\ne 0\)), the result of (i) implies that \(\tilde{\beta }_j\mathop {\longrightarrow }\limits ^{P}\beta _{0j}\) as \(n\rightarrow \infty \). Hence, \(P(j\in \mathcal {A}_n)=P(\tilde{\beta }_j\ne 0)\rightarrow 1\) as \(n\rightarrow \infty \). That is,
Next we prove that \(P(\mathcal {A}\supseteq \mathcal {A}_n)\rightarrow 1\). Consider the event \(j'\in \mathcal {A}_n\), which means that \(\tilde{\beta }_{j'}\ne 0\). By the Karush–Kuhn–Tucker (KKT) optimality condition, we have
where \(\tilde{X}_j\) denotes the \(j\)th column of \(X\). This is equivalent to
Because
Observe that
where \(\varPhi _{j'k}\) is the \((j',k)\)th element of the matrix \(\varPhi \). By the arguments in the proof of result (i), we know that \(\sqrt{n}(\tilde{{\varvec{\beta }}}_{\mathcal {A}}-{\varvec{\beta }}_{0\mathcal {A}}) \mathop {\longrightarrow }\limits ^{D}\varPhi _1^{-1}{\varvec{\xi }}_{\mathcal {A}}\) and \(\sqrt{n}(\tilde{{\varvec{\beta }}}_{\mathcal {A}^c}-{\varvec{\beta }}_{0\mathcal {A}^c})\mathop {\longrightarrow }\limits ^{D}0\), so we can conclude that \(\frac{\tilde{X}_{j'}^TX(\tilde{{\varvec{\beta }}}-{\varvec{\beta }}_0)}{\sqrt{n}}\) converges in distribution to some normal distribution. Using arguments similar to those that gave \(\left[ \frac{(A(\hat{\rho })Y-X{\varvec{\beta }}_0)^TX}{\sqrt{n}}\right] ^T\mathop {\longrightarrow }\limits ^{D}{\varvec{\xi }}\), it is not difficult to show that \(\frac{[A(\hat{\rho })Y-X{\varvec{\beta }}_0]^T\tilde{X}_{j'}}{\sqrt{n}}\) also converges in distribution to some normal distribution. By the assumptions of Theorem 1, for any \(j'\in \mathcal {A}^c\) we have \(\lambda _{j'}/\sqrt{n}\ge b_n/\sqrt{n}\rightarrow \infty \), and hence
That is, \(P(\mathcal {A}^c\subseteq \mathcal {A}_n^c)\rightarrow 1\), which is equivalent to \(P(\mathcal {A}\supseteq \mathcal {A}_n)\rightarrow 1\). Combining this with (6.17), we obtain result (ii). \(\square \)
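To illustrate the selection consistency just proved, the following self-contained toy simulation is offered as a sketch, not a replication of the paper's simulation studies: the circulant weight matrix, the plug-in of the true \(\rho \) in the spatial filter (used only to isolate the selection step; the paper estimates \(\rho \)), and all tuning values are illustrative assumptions.

```python
# Toy check of selection consistency for a sparse SAR model
# Y = rho * W @ Y + X @ beta0 + eps with an assumed circulant W.
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, rho, lam = 400, 0.4, 0.05
beta0 = np.array([3.0, 1.5, 0.0, 0.0, 2.0, 0.0])

# Row-normalized circulant weight matrix: each unit's two neighbours.
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 0.5

X = rng.standard_normal((n, len(beta0)))
Y = np.linalg.solve(np.eye(n) - rho * W, X @ beta0 + rng.standard_normal(n))

Y_tilde = Y - rho * (W @ Y)                       # A(rho) Y, true rho plugged in
beta_init = np.linalg.lstsq(X, Y_tilde, rcond=None)[0]
w = 1.0 / np.abs(beta_init)                       # adaptive weights, gamma = 1
fit = Lasso(alpha=lam, fit_intercept=False).fit(X / w, Y_tilde)
beta_tilde = fit.coef_ / w
print(np.round(beta_tilde, 2))  # zeros should appear at positions 2, 3, 5
```

With the settings above, the penalized fit typically returns exact zeros at the three null positions while leaving the nonzero coefficients close to their true values.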
Cite this article
Wu, Y., Sun, Y. Shrinkage estimation of the linear model with spatial interaction. Metrika 80, 51–68 (2017). https://doi.org/10.1007/s00184-016-0590-z