Variable Selection with Spatially Autoregressive Errors: A Generalized Moments LASSO Estimator


Abstract

We propose a generalized moments LASSO estimator, combining the LASSO with generalized method of moments (GMM) estimation, for penalized variable selection and estimation under the spatial error model with spatially autoregressive errors. We establish parameter consistency and selection sign consistency of the proposed estimator both in the low dimensional setting, where the parameter dimension p is less than the sample size n, and in the high dimensional setting, where p exceeds and grows with n. Finite sample performance of the method is examined by simulation and compared against the LASSO for IID data. The methods are applied to estimation of a spatial Durbin model for the Aveiro housing market (Portugal).


References

  • Ahrens, A. and Bhattacharjee, A. (2015). Two-step lasso estimation of the spatial weights matrix. Econometrics (MDPI) 3, 1, 128–155.

  • Ando, T. and Bai, J. (2016). Panel data models with grouped factor structure under unknown group membership. J. Appl. Econ. 31, 163–191.

  • Anselin, L. (1988). Spatial Econometrics: Methods and Models. Kluwer Academic, Boston.

  • Bai, Z.D. (1999). Methodologies in spectral analysis of large dimensional random matrices: A review. Statistica Sinica 9, 611–677.

  • Bailey, N., Holly, S. and Pesaran, M.H. (2016). A two-stage approach to spatio-temporal analysis with strong and weak cross-sectional dependence. J. Appl. Econ. 31, 249–280.

  • Belloni, A. and Chernozhukov, V. (2011). High dimensional sparse econometric models: An introduction. arXiv:1106.5242v2.

  • Belloni, A. and Chernozhukov, V. (2013). Least squares after model selection in high-dimensional sparse models. Bernoulli 19, 521–547.

  • Belloni, A., Chernozhukov, V. and Wang, L. (2011). Square-root LASSO: pivotal recovery of sparse signals via conic programming. Biometrika 98, 791–806.

  • Belloni, A., Chen, D., Chernozhukov, V. and Hansen, C. (2012). Sparse models and methods for optimal instruments with an application to eminent domain. Econometrica 80, 2369–2429.

  • Belloni, A., Chernozhukov, V. and Wei, Y. (2016). Post-selection inference for generalized linear models with many controls. J. Bus. Econ. Stat., forthcoming.

  • Bhattacharjee, A., Castro, E.A. and Marques, J.L. (2012). Understanding spatial diffusion with factor-based hedonic pricing models: the urban housing market of Aveiro, Portugal. Spat. Econ. Anal. 7, 1, 133–167.

  • Bhattacharjee, A., Castro, E., Maiti, T. and Marques, J. (2016). Endogenous spatial regression and delineation of submarkets: A new framework with application to housing markets. J. Appl. Econ. 31, 32–57.

  • Bickel, P.J., Ritov, Y. and Tsybakov, A.B. (2009). Simultaneous analysis of LASSO and Dantzig selector. Ann. Stat. 37, 1705–1732.

  • Brady, R.R. (2011). Measuring the diffusion of housing prices across space and over time. J. Appl. Econ. 26, 2, 213–231.

  • Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data. Springer.

  • Caner, M. and Zhang, H.H. (2014). Adaptive elastic net for generalized methods of moments. J. Bus. Econ. Stat. 32, 1, 30–47.

  • Castle, J.L. and Hendry, D.F. (2014). Model selection in under-specified equations with breaks. J. Econ. 178, 286–293.

  • Castle, J.L., Doornik, J.A., Hendry, D.F. and Pretis, F. (2015). Detecting location shifts during model selection by step-indicator saturation. Econometrics (MDPI) 3, 2, 240–264.

  • Chudik, A. and Pesaran, M.H. (2011). Infinite-dimensional VARs and factor models. J. Econ. 163, 1, 4–22.

  • Chudik, A., Grossman, V. and Pesaran, M.H. (2016). A multi-country approach to forecasting output growth using PMIs. J. Econ., forthcoming.

  • Cliff, A.D. and Ord, J.K. (1973). Spatial Autocorrelation. Pion, London.

  • Cuaresma, C. and Feldkircher, M. (2013). Spatial filtering, model uncertainty and the speed of income convergence in Europe. J. Appl. Econ. 28, 4, 720–741.

  • Feng, W., Lim, C., Maiti, T. and Zhang, Z. (2016). Spatial regression and estimation of disease risks: A clustering based approach. Stat. Anal. Data Min., forthcoming.

  • Flores-Lagunes, A. and Schnier, K.E. (2012). Estimation of sample selection models with spatial dependence. J. Appl. Econ. 27, 2, 173–204.

  • Friedman, J., Hastie, T. and Tibshirani, R. (2010). Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22.

  • Fu, W. and Knight, K. (2000). Asymptotics for LASSO-type estimators. Ann. Stat. 28, 1356–1378.

  • Geyer, C.J. (1996). On the asymptotics of convex stochastic optimization. Unpublished manuscript.

  • Hall, P. and Horowitz, J.L. (2005). Nonparametric methods for inference in the presence of instrumental variables. Ann. Stat. 33, 2904–2929.

  • Hendry, D.F., Johansen, S. and Santos, C. (2008). Automatic selection of indicators in a fully saturated regression. Comput. Stat. 23, 317–335. Erratum, 337–339.

  • Ishwaran, H. and Rao, J.S. (2005). Spike and slab variable selection: Frequentist and Bayesian strategies. Ann. Stat. 33, 2, 730–773.

  • Johansen, S. and Nielsen, B. (2009). An analysis of the indicator saturation estimator as a robust regression estimator. In The Methodology and Practice of Econometrics (J.L. Castle and N. Shephard, eds.). Oxford University Press, Oxford, pp. 1–36.

  • Kapoor, M., Kelejian, H.H. and Prucha, I.R. (2007). Panel data models with spatially correlated error components. J. Econ. 140, 97–130.

  • Kelejian, H.H. and Prucha, I.R. (1999). A generalized moments estimator for the autoregressive parameter in a spatial model. Int. Econ. Rev. 40, 509–533.

  • Kelejian, H.H. and Prucha, I.R. (2010). Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. J. Econ. 157, 53–67.

  • Kock, A.B. and Callot, L. (2015). Oracle inequalities for high dimensional vector autoregressions. J. Econ. 186, 2, 325–344.

  • Lam, C. and Souza, P.C. (2016). Regularization for spatial panel time series using the adaptive LASSO. J. Reg. Sci., forthcoming.

  • Lee, L.-F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72, 6, 1899–1925.

  • Lee, L.-F. and Yu, J. (2010). Estimation of spatial autoregressive panel data models with fixed effects. J. Econ. 154, 165–185.

  • Lee, L.-F. and Yu, J. (2016). Identification of spatial Durbin panel models. J. Appl. Econ. 31, 1, 133–162.

  • Lin, X. and Lee, L.-F. (2010). GMM estimation of spatial autoregressive models with unknown heteroskedasticity. J. Econ. 157, 1, 34–52.

  • Lounici, K. (2008). Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators. Electron. J. Stat. 2, 90–102.

  • Meinshausen, N. and Bühlmann, P. (2006). High dimensional graphs and variable selection with the LASSO. Ann. Stat. 34, 1436–1462.

  • Nandy, S., Lim, C. and Maiti, T. (2016). Additive model building for spatial regression. J. R. Stat. Soc. Series B, forthcoming.

  • Nowak, A. and Smith, P. (2017). Textual analysis in real estate. J. Appl. Econ., forthcoming.

  • Pesaran, M.H., Schuermann, T. and Weiner, S.M. (2004). Modelling regional interdependencies using a global error-correcting macroeconometric model. J. Bus. Econ. Stat. 22, 2, 129–162.

  • Pollard, D. (1991). Asymptotics for least absolute deviation regression estimators. Econ. Theory 7, 186–199.

  • Stock, J.H. and Watson, M.W. (2002). Forecasting using principal components from a large number of predictors. J. Am. Stat. Assoc. 97, 1167–1179.

  • Su, L. and Yang, Z. (2015). QML estimation of dynamic panel data models with spatial errors. J. Econ. 185, 1, 230–258.

  • Tibshirani, R. (1996). Regression shrinkage and selection via the LASSO. J. R. Stat. Soc. Series B 58, 267–288.

  • Varian, H.R. (2014). Big data: new tricks for econometrics. J. Econ. Perspect. 28, 3–28.

  • Whittle, P. (1954). On stationary processes in the plane. Biometrika 41, 434–449.

  • Yu, J., de Jong, R.M. and Lee, L.-F. (2008). Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both N and T are large. J. Econ. 146, 1, 118–134.

  • Zhao, P. and Yu, B. (2006). On model selection consistency of LASSO. J. Mach. Learn. Res. 7, 2541–2563.

  • Zou, H. (2006). The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429.

  • Zou, H. and Zhang, H. (2009). On the adaptive elastic-net with a diverging number of parameters. Ann. Stat. 37, 1733–1751.

Acknowledgments

We thank organisers and participants in seminars at the University of Illinois and Indian Statistical Institute, the USC Dornsife INET Conference on Big Data in Economics, and invited presentations at the 26th (EC)2 Conference and American Statistical Association JSM (Business & Economic Statistics Section) for valuable comments and suggestions. The usual disclaimer applies.

Author information

Corresponding author

Correspondence to Liqian Cai.

Electronic supplementary material

The supplementary material for this article is available online (PDF 166 KB).

Appendix: Proofs of technical results

Proof of Theorem 1.

Define a random function of ρ and ϕ,

$$Z_{n}(\phi ,\rho )=\frac{1}{n}(Y_{n}-X_{n}\phi )^{\prime }{\Sigma} (\rho )(Y_{n}-X_{n}\phi )+\frac{\lambda_{n}}{n}\sum\limits_{j = 1}^{p}|\phi_{j}|. $$

By the definition of the LASSO estimator, for any fixed ρ, $Z_{n}(\phi,\rho)$ is minimized at \(\phi =\hat {\beta }_{L}(\rho )\).
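Since ${\Sigma}(\rho )=(I-\rho M_{n})^{\prime }(I-\rho M_{n})$, this minimization is an ordinary LASSO applied to the spatially filtered data $(I-\rho M_{n})Y_{n}$ and $(I-\rho M_{n})X_{n}$. The following is a minimal computational sketch of this step in Python (not the authors' code; it assumes numpy and scikit-learn, and the names are illustrative):

```python
# A minimal sketch of the LASSO step of the generalized moments LASSO
# estimator, assuming a consistent estimate `rho_hat` of the spatial
# parameter is already available (e.g., from the GMM step).
import numpy as np
from sklearn.linear_model import Lasso

def gm_lasso_step(Y, X, M, rho_hat, lam):
    """Minimize (1/n)(Y - X phi)' Sigma(rho_hat) (Y - X phi) + (lam/n)||phi||_1."""
    n = len(Y)
    A = np.eye(n) - rho_hat * M      # spatial filter I - rho_hat * M_n
    Y_t, X_t = A @ Y, A @ X          # filtered response and design
    # scikit-learn's Lasso minimizes (1/(2n))||y - Xw||_2^2 + alpha*||w||_1,
    # which matches Z_n(., rho_hat) up to a factor 1/2 when alpha = lam/(2n).
    fit = Lasso(alpha=lam / (2 * n), fit_intercept=False)
    fit.fit(X_t, Y_t)
    return fit.coef_
```

Here $Y_{n}$ follows the spatial error model $Y_{n}=X_{n}\beta +(I-\rho M_{n})^{-1}\epsilon_{n}$, so the filtered regression has approximately uncorrelated errors once $\hat{\rho}_{n}$ is close to $\rho$.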

However, we do not know the true value of ρ; instead, we use the GMM estimator \(\hat {\rho }_{n}\) as a substitute. Then the function \(Z_{n}(\phi , \hat {\rho }_{n})\) is minimized at the generalized moments LASSO estimator \( \phi =\hat {\beta }_{L}(\hat {\rho }_{n})\). Furthermore, denote by β the true value of the unknown parameter, and let

$$Z(\phi ,\rho )=(\beta -\phi )^{\prime }C(\rho )(\beta -\phi )+\sigma^{2}. $$

Then, it is easy to see that for any given ρ, $Z(\phi,\rho)$ is minimized at ϕ = β. For each $\phi \in \mathbb{R}^{p}$,

$$\begin{array}{@{}rcl@{}} Z_{n}(\phi ,\hat{\rho}_{n}) &=&\frac{1}{n}(Y_{n}-X_{n}\phi )^{\prime }{\Sigma} (\hat{\rho}_{n})(Y_{n}-X_{n}\phi )+\frac{\lambda_{n}}{n}\sum\limits_{j = 1}^{p}|\phi_{j}| \\ &=&{\Phi}_{1}-{\Phi}_{2}+{\Phi}_{2}+{\Phi}_{3} \end{array} $$

where

$${\Phi}_{1}=\frac{1}{n}(Y_{n}-X_{n}\phi )^{\prime }{\Sigma} (\hat{\rho} _{n})(Y_{n}-X_{n}\phi ) $$
$${\Phi}_{2}=\frac{1}{n}(Y_{n}-X_{n}\phi )^{\prime }{\Sigma} (\rho )(Y_{n}-X_{n}\phi ) $$
$${\Phi}_{3}=\frac{\lambda_{n}}{n}\sum\limits_{j = 1}^{p}|\phi_{j}| $$

Since \(\frac {\lambda _{n}}{n}\rightarrow 0\), we have Φ3 → 0. Also,

$$\begin{array}{@{}rcl@{}} {\Phi}_{2} &=&\frac{1}{n}[(I-\rho M_{n})X_{n}(\beta -\phi )+\epsilon_{n}]^{\prime }[(I-\rho M_{n})X_{n}(\beta -\phi )+\epsilon_{n}] \\ &=&\frac{1}{n}(\beta -\phi )^{\prime }X_{n}^{\prime }{\Sigma} (\rho )X_{n}(\beta -\phi )+\frac{1}{n}\epsilon_{n}^{\prime }(I-\rho M_{n})X_{n}(\beta -\phi )+ \\ &&\frac{1}{n}(\beta -\phi )^{\prime }X_{n}^{\prime }(I-\rho M_{n})^{\prime }\epsilon_{n}+\frac{1}{n}\epsilon_{n}^{\prime }\epsilon_{n} \\ \rightarrow_{p} &&(\beta -\phi )^{\prime }C(\rho )(\beta -\phi )+\sigma^{2} \\ &=&Z(\phi ,\rho ) \end{array} $$

by Assumption 6 and the weak law of large numbers.

Moreover, since \(\hat {\rho }_{n}\) is a consistent estimator of ρ,

$$\begin{array}{@{}rcl@{}} {\Phi}_{1}-{\Phi}_{2} &=&\frac{1}{n}(Y_{n}-X_{n}\phi )^{\prime }[{\Sigma} (\hat{ \rho}_{n})-{\Sigma} (\rho )](Y_{n}-X_{n}\phi ) \\ &=&\frac{1}{n}(Y_{n}-X_{n}\phi )^{\prime }[(\rho -\hat{\rho} _{n})(M_{n}+M_{n}^{\prime })+(\hat{\rho}_{n}^{2}-\rho^{2})M_{n}^{\prime }M_{n}](Y_{n}-X_{n}\phi ) \\ &=&\frac{1}{n}\left[ (\beta -\phi )^{\prime }X_{n}^{\prime }+\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}\right] \\ &&\left[ (\rho -\hat{\rho}_{n})(M_{n}+M_{n}^{\prime })+(\hat{\rho} _{n}^{2}-\rho^{2})M_{n}^{\prime }M_{n}\right] \\ &&\left[ X_{n}(\beta -\phi )+(I-\rho M_{n})^{-1}\epsilon_{n}\right] \end{array} $$
$$\begin{array}{@{}rcl@{}} &=&\frac{1}{n}(\rho -\hat{\rho}_{n})(\beta -\phi )^{\prime }X_{n}^{\prime }(M_{n}+M_{n}^{\prime })X_{n}(\beta -\phi ) \\ &&+\frac{1}{n}(\hat{\rho}_{n}^{2}-\rho^{2})(\beta -\phi )^{\prime }X_{n}^{\prime }(M_{n}^{\prime }M_{n})X_{n}(\beta -\phi ) \\ &&+\frac{1}{n}(\rho -\hat{\rho}_{n})\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(M_{n}+M_{n}^{\prime })X_{n}(\beta -\phi ) \\ &&+\frac{1}{n}(\hat{\rho}_{n}^{2}-\rho^{2})\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(M_{n}^{\prime }M_{n})X_{n}(\beta -\phi ) \\ &&+\frac{1}{n}(\rho -\hat{\rho}_{n})(\beta -\phi )^{\prime }X_{n}^{\prime }(M_{n}+M_{n}^{\prime })(I-\rho M_{n})^{-1}\epsilon_{n} \\ &&+\frac{1}{n}(\hat{\rho}_{n}^{2}-\rho^{2})(\beta -\phi )^{\prime }X_{n}^{\prime }(M_{n}^{\prime }M_{n})(I-\rho M_{n})^{-1}\epsilon_{n} \\ &&+\frac{1}{n}(\rho -\hat{\rho}_{n})\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(M_{n}+M_{n}^{\prime })(I-\rho M_{n})^{-1}\epsilon_{n} \\ &&+\frac{1}{n}(\hat{\rho}_{n}^{2}-\rho^{2})\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(M_{n}^{\prime }M_{n})(I-\rho M_{n})^{-1}\epsilon_{n} \\ \rightarrow_{p} &&0 \end{array} $$

Therefore, \(Z_{n}(\phi ,\hat {\rho }_{n})-Z(\phi ,\rho )\rightarrow _{p}0\) for each $\phi \in \mathbb{R}^{p}$. Combined with the fact that \(Z_{n}(\phi ,\hat {\rho }_{n})\) is a convex function of ϕ, we have

$$\sup\limits_{\phi \in \mathcal{K}}|Z_{n}(\phi ,\hat{\rho}_{n})-Z(\phi ,\rho )|\rightarrow_{p}0 $$

for any compact set $\mathcal{K}$, and \(\hat {\beta }_{L}(\hat {\rho }_{n})=O_{p}(1)\), by applying the convexity lemma of Pollard (1991) (pointwise convergence in probability of convex functions implies uniform convergence on compact sets). From the above result we have

$$\arg \min (Z_{n}(\phi ,\hat{\rho}_{n}))\rightarrow_{p}\arg \min (Z(\phi ,\rho )) $$

which implies that

$$\hat{\beta}_{L}(\hat{\rho}_{n})\rightarrow_{p}\beta . $$

For asymptotic normality of the estimator, we need λn to grow slowly; specifically, assume $\lambda_{n}/\sqrt{n}\rightarrow \lambda_{0}\geqslant 0$, so that \(\lambda _{n}=O(\sqrt {n})\). From the above proof, we already know that

$$nZ_{n}(\phi ,\hat{\rho}_{n})=(Y_{n}-X_{n}\phi )^{\prime }{\Sigma} (\hat{\rho} _{n})(Y_{n}-X_{n}\phi )+\lambda_{n}\sum\limits_{j = 1}^{p}|\phi_{j}| $$

is minimized at \(\phi =\hat {\beta }_{L}(\hat {\rho }_{n})\). Now define \(w=\sqrt { n}(\phi -\beta )\). Then \(nZ_{n}(\phi ,\hat {\rho }_{n})\) can be treated as a function of w and

$$\begin{array}{@{}rcl@{}} nZ_{n}(\phi ,\hat{\rho}_{n}) &=&\left[ Y_{n}-X_{n}\left( \frac{w}{\sqrt{n}} +\beta \right) \right]^{\prime }{\Sigma} (\hat{\rho}_{n})\left[ Y_{n}-X_{n}\left( \frac{w}{\sqrt{n}}+\beta \right) \right] \\ &&+\lambda_{n}\sum\limits_{j = 1}^{p}\left\vert \frac{w_{j}}{\sqrt{n}}+\beta_{j}\right\vert \\ &=&\tilde{V}_{n}(w) \\ && \end{array} $$

is minimized at \(\sqrt {n}\left (\hat {\beta }_{L}(\hat {\rho }_{n})-\beta \right ) \). The same is true for

$$\begin{array}{@{}rcl@{}} V_{n}(w) &=&\tilde{V}_{n}(w)-(Y_{n}-X_{n}\beta )^{\prime }{\Sigma} (\hat{\rho} _{n})(Y_{n}-X_{n}\beta )-\lambda_{n}\sum\limits_{j = 1}^{p}|\beta_{j}|. \\ && \end{array} $$

Since $\lambda_{n}/\sqrt{n}\rightarrow \lambda_{0}$, while $\sqrt{n}\left[ \left\vert \frac{w_{j}}{\sqrt{n}}+\beta_{j}\right\vert -|\beta_{j}|\right] \rightarrow w_{j}\,sgn(\beta_{j})$ for $\beta_{j}\neq 0$ and equals $|w_{j}|$ for $\beta_{j}= 0$, it follows that

$$\lambda_{n}\sum\limits_{j = 1}^{p}\left[ \left\vert \frac{w_{j}}{\sqrt{n}}+\beta_{j}\right\vert -|\beta_{j}|\right] \rightarrow \lambda_{0}\sum\limits_{j = 1}^{p}[w_{j}sgn(\beta_{j})I(\beta_{j}\neq 0)+|w_{j}|I(\beta_{j}= 0)]. $$

Also, define

$$\begin{array}{@{}rcl@{}} {\Omega}_{n}(w) &=&\left( Y_{n}-X_{n}\frac{w}{\sqrt{n}}-X_{n}\beta \right)^{\prime }{\Sigma} (\hat{\rho}_{n})\left( Y_{n}-X_{n}\frac{w}{\sqrt{n}} -X_{n}\beta \right) - \\ &&(Y_{n}-X_{n}\beta )^{\prime }{\Sigma} (\hat{\rho}_{n})(Y_{n}-X_{n}\beta ) \\ &=&{\Omega}_{n}(w)-{\Omega}_{1}(w)+{\Omega}_{1}(w), \end{array} $$

where

$${\Omega}_{1}(w)=\left( Y_{n}-X_{n}\frac{w}{\sqrt{n}}-X_{n}\beta \right)^{\prime }{\Sigma} (\rho )\left( Y_{n}-X_{n}\frac{w}{\sqrt{n}}-X_{n}\beta \right) -\epsilon_{n}^{\prime }\epsilon _{n}. $$

It is easy to see that

$$\begin{array}{@{}rcl@{}} {\Omega}_{1}(w) &=&\left[ \epsilon_{n}-(I-\rho M_{n})X_{n}\frac{w}{\sqrt{n}} \right]^{\prime }\left[ \epsilon_{n}-(I-\rho M_{n})X_{n}\frac{w}{\sqrt{n}} \right] -\epsilon_{n}^{\prime }\epsilon_{n} \\ &=&-2\frac{1}{\sqrt{n}}w^{\prime }X_{n}^{\prime }(I-\rho M_{n})^{\prime }\epsilon_{n}+\frac{1}{n}w^{\prime }X_{n}^{\prime }(I-\rho M_{n})^{\prime }(I-\rho M_{n})X_{n}w \\ \rightarrow_{D} &&-2w^{\prime }U+w^{\prime }C(\rho )w \end{array} $$

where $U\sim N(0,\sigma^{2}C(\rho ))$. Also

$$\begin{array}{@{}rcl@{}} {\Omega}_{n}(w)-{\Omega}_{1}(w) &=&\frac{-2}{\sqrt{n}}\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}{\Sigma} (\hat{\rho}_{n})X_{n}w+\frac{1}{n} w^{\prime }X_{n}^{\prime }{\Sigma} (\hat{\rho}_{n})X_{n}w \\ &&+\frac{2}{\sqrt{n}}\epsilon_{n}^{\prime }(I-\rho M_{n})X_{n}w-\frac{1}{n} w^{\prime }X_{n}^{\prime }{\Sigma} (\rho )X_{n}w \\ &=&\frac{2}{\sqrt{n}}\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}[(\hat{\rho}_{n}-\rho )(M_{n}^{\prime }+M_{n})-(\hat{\rho}_{n}^{2}-\rho^{2})M_{n}^{\prime }M_{n}]X_{n}w \\ &&-\frac{1}{n}w^{\prime }X_{n}^{\prime }[(\hat{\rho}_{n}-\rho )(M_{n}^{\prime }+M_{n})-(\hat{\rho}_{n}^{2}-\rho^{2})M_{n}^{\prime }M_{n}]X_{n}w \\ \rightarrow_{p} &&0 \end{array} $$

where we use the consistency of \(\hat {\rho }_{n}\), as in the proof above. Thus $V_{n}(w)\rightarrow_{D}V(w)$; combined with the fact that Vn is convex and V has a unique minimum, it follows from Geyer (1996) that

$$\arg \min (V_{n})=\sqrt{n}\left[ \hat{\beta}_{L}(\hat{\rho}_{n})-\beta \right] \rightarrow_{D}\arg \min (V(w)). $$
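Collecting the two limits above, the limit function is, explicitly,

$$V(w)=-2w^{\prime }U+w^{\prime }C(\rho )w+\lambda_{0}\sum\limits_{j = 1}^{p}[w_{j}sgn(\beta_{j})I(\beta_{j}\neq 0)+|w_{j}|I(\beta_{j}= 0)]; $$

in particular, when λ0 = 0 the minimizer is $C(\rho )^{-1}U\sim N(0,\sigma^{2}C(\rho )^{-1})$, recovering the usual asymptotic normality. □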

Proof of Proposition 1.

By the definition of the estimator in the second estimation step,

$$\hat{\beta}_{L}(\hat{\rho}_{n})=\arg \min_{\phi }[(Y_{n}-X_{n}\phi )^{\prime }{\Sigma} (\hat{\rho}_{n})(Y_{n}-X_{n}\phi )]+\lambda_{n}||\phi ||_{1}, $$

where the estimator is the minimizer of the penalized least squares criterion when the true spatial parameter ρ is replaced by its consistent estimator \(\hat { \rho }_{n}\). Let φ = ϕ − β, which corresponds to \(\frac {w}{ \sqrt {n}}\) in the proof of Theorem 1; the argument parallels that proof. Define

$$\begin{array}{@{}rcl@{}} D_{n}(\varphi ) &=&[(Y_{n}-X_{n}(\varphi +\beta ))^{\prime }{\Sigma} (\hat{\rho }_{n})(Y_{n}-X_{n}(\varphi +\beta ))]+\lambda_{n}||\varphi +\beta ||_{1} \\ &&-(Y_{n}-X_{n}\beta )^{\prime }{\Sigma} (\hat{\rho}_{n})(Y_{n}-X_{n}\beta ) \end{array} $$

Then

$$\begin{array}{@{}rcl@{}} \hat{\varphi} &=&\hat{\beta}_{L}(\hat{\rho}_{n})-\beta \\ &=&\arg \min_{\varphi }D_{n}(\varphi ). \end{array} $$

Separate Dn(φ) into two parts, Dn1(φ) and Dn2(φ), where $D_{n2}(\varphi )=\lambda_{n}||\varphi +\beta ||_{1}$ is the penalty term. Let

$$\begin{array}{@{}rcl@{}} D_{n1}(\varphi ) &=&[(Y_{n}-X_{n}(\varphi +\beta ))^{\prime }{\Sigma} (\hat{ \rho}_{n})(Y_{n}-X_{n}(\varphi +\beta ))] \\ &&-(Y_{n}-X_{n}\beta )^{\prime }{\Sigma} (\hat{\rho}_{n})(Y_{n}-X_{n}\beta ) \\ &=&[(I-\hat{\rho}_{n}M_{n})((I-\rho M_{n})^{-1}\epsilon_{n}-X_{n}\varphi )]^{\prime }[(I-\hat{\rho}_{n}M_{n})((I-\rho M_{n})^{-1}\epsilon_{n}-X_{n}\varphi )] \\ &&-\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(I-\hat{\rho} _{n}M_{n}^{\prime })(I-\hat{\rho}_{n}M_{n})(I-\rho M_{n})^{-1}\epsilon_{n} \\ &=&-2\varphi^{\prime }X_{n}^{\prime }(I-\hat{\rho}_{n}M_{n})^{\prime }(I- \hat{\rho}_{n}M_{n})(I-\rho M_{n})^{-1}\epsilon_{n} \\ &&+\varphi^{\prime }X_{n}^{\prime }(I-\hat{\rho}_{n}M_{n}^{\prime })(I-\hat{\rho}_{n}M_{n})X_{n}\varphi \\ &=&-2(\sqrt{n}\varphi )^{\prime }W^{n}+(\sqrt{n}\varphi )^{\prime }C^{n}(\hat{\rho}_{n})(\sqrt{n}\varphi ) \end{array} $$

where

$$W^{n}=W^{n}(\hat{\rho}_{n})=X_{n}^{\prime }{\Sigma} (\hat{\rho}_{n})(I-\rho M_{n})^{-1}\epsilon_{n}/\sqrt{n}\qquad \text{and}\qquad C^{n}(\hat{\rho}_{n})=\frac{1}{n}X_{n}^{\prime }{\Sigma} (\hat{\rho}_{n})X_{n}. $$

Differentiating Dn1(φ) with respect to φ, we have

$$\frac{dD_{n1}(\varphi )}{d\varphi }=-2\sqrt{n}W^{n}+ 2nC^{n}(\hat{\rho} _{n})\varphi . $$

Note here that both \(\hat {\varphi }\) and Wn are vectors of dimension p × 1. Let \(\hat {\varphi }(1)\), Wn(1) and \(\hat {\varphi } (2)\), Wn(2) denote the first q and last p − q entries of \(\hat {\varphi }\) and Wn respectively. Then by definition:

$$\{sign(\hat{\beta}_{Lj}(\hat{\rho}_{n}))=sign(\beta_{j}),\ \text{for}\ j = 1,2,{\cdots} ,q\}\supseteq \{sign(\beta (1))\hat{\varphi}(1)>-|\beta (1)|\}. $$

Hence if there exists \(\hat {\varphi }\) such that

$$C_{11}^{n}(\hat{\rho}_{n})(\sqrt{n}\hat{\varphi}(1))-W^{n}(1)=-\frac{\lambda_{n}}{2\sqrt{n}}sign(\beta (1)), $$
$$|\hat{\varphi}(1)|<|\beta (1)|, $$
$$-\frac{\lambda_{n}}{2\sqrt{n}}\mathbf{1}\leqslant C_{21}^{n}(\hat{\rho}_{n})(\sqrt{n}\hat{\varphi}(1))-W^{n}(2)\leqslant \frac{\lambda_{n}}{2\sqrt{ n}}\mathbf{1}, $$

then by Lemma 1 and the uniqueness of the LASSO solution, \(sign(\hat {\beta }_{L}(\hat {\rho }_{n})(1))=sign(\beta (1))\) and \(\hat {\beta }_{L}(\hat {\rho } _{n})(2)=\beta (2)= 0\).

The existence of such a \(\hat {\varphi }\) is implied by

$$ |(C_{11}^{n}(\hat{\rho}_{n}))^{-1}W^{n}(1)|<\sqrt{n}\left( |\beta (1)|-\frac{\lambda_{n}}{2n}|(C_{11}^{n}(\hat{\rho}_{n}))^{-1}sign(\beta (1))|\right) , $$
(A.1)
$$ |C_{21}^{n}(\hat{\rho}_{n})(C_{11}^{n}(\hat{\rho}_{n}))^{-1}W^{n}(1)-W^{n}(2)|\leqslant \frac{\lambda_{n}}{2\sqrt{n}}\left( \mathbf{1}-|C_{21}^{n}(\hat{\rho}_{n})\left( C_{11}^{n}(\hat{\rho}_{n})\right)^{-1}sign(\beta (1))|\right) $$
(A.2)

Here, (A.1) coincides with the event An and (A.2) contains the event Bn. The result of Proposition 1 follows. □

Proof of Theorem 2.

From Proposition 1, we have

$$P(\hat{\beta}_{L}(\hat{\rho}_{n};\lambda )=_{s}\beta )\geqslant P(A_{n}\cap B_{n}). $$

Thus,

$$\begin{array}{@{}rcl@{}} P(A_{n}\cap B_{n}) &\geqslant &1-P({A_{n}^{c}})-P({B_{n}^{c}}) \\ &\geqslant &1-\sum\limits_{i = 1}^{q}P\left( |{z_{i}^{n}}|\geqslant \sqrt{n}\left( |{\beta_{i}^{n}}|-\frac{\lambda_{n}}{2n}{b_{i}^{n}}\right) \right) -\sum\limits_{i = 1}^{p-q}P\left( |{\zeta_{i}^{n}}|>\frac{\lambda_{n}}{2\sqrt{n}}\eta_{i}\right) , \end{array} $$

where \(z^{n}=({z_{1}^{n}},{\cdots } ,{z_{q}^{n}})^{\prime }=(C_{11}^{n})^{-1}W^{n}(1)\), \(\zeta ^{n}=({\zeta _{1}^{n}},{\cdots } ,\zeta _{p-q}^{n})^{\prime }=C_{21}^{n}(C_{11}^{n})^{-1}W^{n}(1)-W^{n}(2)\) and \(b^{n}=({b_{1}^{n}},{\cdots } ,{b_{q}^{n}})^{\prime }=(C_{11}^{n})^{-1}sign(\beta (1))\).

Since \(\hat {\rho }_{n}\) is a consistent estimator of ρ, similar to the proof of Theorem 1, and under the regularity conditions in Assumption 6, we have

$$(C_{11}^{n})^{-1}W^{n}(1)\rightarrow_{D}N(0,C_{11}^{-1}(\rho )\sigma^{2}) $$

This is because

$$\begin{array}{@{}rcl@{}} C^{n} &=&\frac{1}{n}X_{n}^{\prime }{\Sigma} (\hat{\rho}_{n})X_{n} \\ &=&\frac{1}{n}X_{n}^{\prime }{\Sigma} (\hat{\rho}_{n})X_{n}-\frac{1}{n} X_{n}^{\prime }{\Sigma} (\rho )X_{n}+\frac{1}{n}X_{n}^{\prime }{\Sigma} (\rho )X_{n} \\ &=&\frac{1}{n}X_{n}^{\prime }[(\hat{\rho}_{n}^{2}-\rho^{2})M_{n}^{\prime }M_{n}-(\hat{\rho}_{n}-\rho )(M_{n}^{\prime }+M_{n})]X_{n}+\frac{1}{n} X_{n}^{\prime }{\Sigma} (\rho )X_{n} \\ \rightarrow_{p} &&C \end{array} $$

The final step follows from Assumptions 3 and 6, together with the consistency of \(\hat {\rho }_{n}\). Thus, \((C_{11}^{n}(\hat {\rho }_{n}))^{-1}\rightarrow _{p}(C_{11}(\rho ))^{-1}\). Similarly,

$$\begin{array}{@{}rcl@{}} &&X_{n}^{\prime }{\Sigma} (\hat{\rho}_{n})(I-\rho M_{n})^{-1}\epsilon_{n}/ \sqrt{n} \\ &=&X_{n}^{\prime }[(\hat{\rho}_{n}^{2}-\rho^{2})M_{n}^{\prime }M_{n}-(\hat{ \rho}_{n}-\rho )(M_{n}^{\prime }+M_{n})](I-\rho M_{n})^{-1}\epsilon_{n}/ \sqrt{n} \\ &&+X_{n}^{\prime }(I-\rho M_{n}^{\prime })\epsilon_{n}/\sqrt{n} \\ &=&o_{p}(1)O_{p}(1)+X_{n}^{\prime }(I-\rho M_{n}^{\prime })\epsilon_{n}/ \sqrt{n} \\ && \end{array} $$

Since \(X_{n}^{\prime }(I-\rho M_{n}^{\prime })\epsilon _{n}/\sqrt {n} \rightarrow _{d}N(0,\sigma ^{2}C(\rho ))\), we have

$$W_{n}=X_{n}^{\prime }{\Sigma} (\hat{\rho}_{n})(I-\rho M_{n})^{-1}\epsilon_{n}/ \sqrt{n}\rightarrow_{d}N(0,\sigma^{2}C(\rho )) $$

Thus Wn(1) →DN(0,σ2C11(ρ)). Applying Slutsky’s theorem, we have

$$z^{n}=(C_{11}^{n})^{-1}W^{n}(1)\rightarrow_{D}N(0,(C_{11}(\rho ))^{-1}\sigma^{2}). $$

Making use of the above result, combined with the fact that

$$C_{21}^{n}(C_{11}^{n})^{-1}W^{n}(1)-W^{n}(2)=(C_{21}^{n}(C_{11}^{n})^{-1},-I_{p-q})W_{n} $$

we have

$$\zeta^{n}=C_{21}^{n}(C_{11}^{n})^{-1}W^{n}(1)-W^{n}(2)\rightarrow_{d}N\left( 0,\sigma^{2}\left( C_{22}(\rho )-C_{21}(\rho )C_{11}(\rho )^{-1}C_{12}(\rho )\right) \right) . $$

Hence all the \({z_{i}^{n}}\)'s and \({\zeta _{i}^{n}}\)'s converge in distribution to Gaussian random variables with mean 0 and finite variance bounded by s2(ρ), for some function s(ρ) not depending on n. For t > 0, the Gaussian tail probability is bounded by

$$1-{\Phi} (t)<t^{-1}e^{-\frac{1}{2}t^{2}} $$
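This is the standard Mills-ratio bound: for $t>0$, since $x/t\geqslant 1$ over the range of integration,

$$1-{\Phi} (t)=\int_{t}^{\infty }\frac{1}{\sqrt{2\pi }}e^{-\frac{1}{2}x^{2}}dx\leqslant \int_{t}^{\infty }\frac{x}{t}\frac{1}{\sqrt{2\pi }}e^{-\frac{1}{2}x^{2}}dx=\frac{1}{t\sqrt{2\pi }}e^{-\frac{1}{2}t^{2}}<t^{-1}e^{-\frac{1}{2}t^{2}}. $$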

Since λn/n → 0 and \(\lambda _{n}/n^{\frac {1+c}{2} }\geqslant r\) with 0 ≤ c < 1, we have

$$\begin{array}{@{}rcl@{}} &&\sum\limits_{i = 1}^{q}P\left( |{z_{i}^{n}}|\geqslant \sqrt{n}\left( |\beta_{i}|-\frac{\lambda_{n}}{2n}{b_{i}^{n}}\right) \right) \\ &\leqslant &\left( 1+o(1)\right) {\sum}_{i = 1}^{q}2\left( 1-{\Phi} \left( \frac{1 }{s(\rho )}n^{\frac{1}{2}}|{\beta_{i}^{n}}|\left( 1+o(1)\right) \right) \right) \\ &=&o\left( s(\rho )e^{\frac{-n^{c}}{s^{2}(\rho )}}\right) \end{array} $$

and

$$\sum\limits_{i = 1}^{p-q}P\left( |{\zeta_{i}^{n}}|\geqslant \frac{\lambda_{n}}{2\sqrt{ n}}\eta_{i}\right) =\left( 1+o(1)\right) \sum\limits_{i = 1}^{p-q}2\left( 1-{\Phi} \left( \frac{1}{s}\frac{\lambda_{n}}{2\sqrt{n}}\eta_{i}\right) \right) =o\left( s(\rho )e^{\frac{-n^{c}}{s^{2}(\rho )}}\right) . $$

Theorem 2 follows.□

Proof of Proposition 2.

$$\begin{array}{@{}rcl@{}} &&P\left( \max\limits_{1\leqslant j\leqslant p}2|\epsilon_{n}^{\prime }T^{(j)}|/n>\lambda_{0}\right) \\ &=&P\left( \max\limits_{1\leqslant j\leqslant p}\left\vert \frac{\epsilon_{n}^{\prime }(I-\rho M_{n})X_{n}^{(j)}}{n}+\frac{\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}{\Sigma} (\hat{\rho}_{n})X_{n}^{(j)}}{n}\right. \right. \\ &&\left. \left. -\frac{\epsilon_{n}^{\prime }(I-\rho M_{n})X_{n}^{(j)}}{n} \right\vert >\frac{\lambda_{0}}{2}\right) \\ &\leqslant &P\left( \max\limits_{1\leqslant j\leqslant p}\left\vert \frac{\epsilon_{n}^{\prime }(I-\rho M_{n})X_{n}^{(j)}}{n}\right\vert \right. \\ &&\left. +\max\limits_{1\leqslant j\leqslant p}\left\vert \frac{\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}{\Sigma} (\hat{\rho}_{n})X_{n}^{(j)}-\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}{\Sigma} (\rho )X_{n}^{(j)}}{n}\right\vert >\frac{\lambda_{0}}{2}\right) \end{array} $$

Let \(r=\sigma \sqrt {\frac {\log 2p}{n}}\), and denote

$$A=\max\limits_{1\leqslant j\leqslant p}\left\vert \frac{\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}{\Sigma} (\hat{\rho}_{n})X_{n}^{(j)}-\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}{\Sigma} (\rho )X_{n}^{(j)}}{n} \right\vert $$

Then

$$\begin{array}{@{}rcl@{}} &&P(A>r) \\ &=&P\left( \max\limits_{1\leqslant j\leqslant p}\left\vert \frac{\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(\hat{\rho}_{n}^{2}-\rho^{2})M_{n}^{\prime }M_{n}X_{n}^{(j)}}{n}\right. \right. \\ &&\left. \left. -\frac{\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(\hat{\rho}_{n}-\rho )(M_{n}^{\prime }+M_{n})X_{n}^{(j)}}{n}\right\vert >r\right) \\ &\leqslant &P\left( \max\limits_{1\leqslant j\leqslant p}\left\vert \frac{\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(\hat{\rho}_{n}^{2}-\rho^{2})M_{n}^{\prime }M_{n}X_{n}^{(j)}}{n}\right\vert \right. \\ &&\left. +\max\limits_{1\leqslant j\leqslant p}\left\vert \frac{\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(\hat{\rho}_{n}-\rho )(M_{n}^{\prime }+M_{n})X_{n}^{(j)}}{n}\right\vert >r\right) . \end{array} $$

Further define

$$A1=\max\limits_{1\leqslant j\leqslant p}\left\vert \frac{\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(\hat{\rho}_{n}^{2}-\rho^{2})M_{n}^{\prime }M_{n}X_{n}^{(j)}}{n}\right\vert \text{ \ \ and} $$
$$A2=\max\limits_{1\leqslant j\leqslant p}\left\vert \frac{\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(\hat{\rho}_{n}-\rho )(M_{n}^{\prime }+M_{n})X_{n}^{(j)}}{n}\right\vert ; $$

therefore,

$$\begin{array}{@{}rcl@{}} P(A>r) &\leqslant &P(A1+A2>r) \\ &\leqslant &P\left( A1>\frac{r}{2}\right) +P\left( A2>\frac{r}{2}\right) , \end{array} $$

where the final inequality holds because, if {A1 + A2 > r}, then at least one of {A1 > r/2} and {A2 > r/2} must hold. Since \(\hat {\rho }_{n}\) is a consistent estimator of ρ, that is, \(\hat {\rho }_{n}\rightarrow _{p}\rho \), we have, for any t > 0 and with \(c=\frac {1}{2}\exp (-\frac {t^{2}}{2})\), for n large enough,

$$P(|\hat{\rho}_{n}-\rho |>c)<c $$

and

$$P(|\hat{\rho}_{n}^{2}-\rho^{2}|>c)<c. $$

Then, it is easy to see that

$$\begin{array}{@{}rcl@{}} P\left( A1>\frac{r}{2}\right) &=&P\left( \frac{\left\{ \max_{1\leqslant j\leqslant p}|\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}M_{n}^{\prime }M_{n}X_{n}^{(j)}|\right\} |\hat{\rho}_{n}^{2}-\rho^{2}|}{n}>\frac{r}{2}\right) \\ &=&P\left( \left\{ \max\limits_{1\leqslant j\leqslant p}|\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}M_{n}^{\prime }M_{n}X_{n}^{(j)}|\right\} \left\vert \hat{\rho}_{n}^{2}-\rho^{2}\right\vert /n>\frac{r}{2}\right. \\ &&\left. \bigcap |\hat{\rho}_{n}^{2}-\rho^{2}|>c\right) \\ &&+P\left( \left\{ \max\limits_{1\leqslant j\leqslant p}|\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}M_{n}^{\prime }M_{n}X_{n}^{(j)}|\right\} \left\vert \hat{\rho}_{n}^{2}-\rho^{2}\right\vert /n>\frac{r}{2}\right. \\ &&\left. \bigcap |\hat{\rho}_{n}^{2}-\rho^{2}|\leq c\right) \\ &\leqslant &c+P\left( \max\limits_{1\leqslant j\leqslant p}|\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}M_{n}^{\prime }M_{n}X_{n}^{(j)}|>\frac{rn}{2c} \right) \end{array} $$

and

$$\begin{array}{@{}rcl@{}} P\left( A2>\frac{r}{2}\right) &=&P\left( \frac{\left\{ \max_{1\leqslant j\leqslant p}|\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(M_{n}^{\prime }+M_{n})X_{n}^{(j)}|\right\} \left\vert \hat{\rho} _{n}-\rho \right\vert }{n}>\frac{r}{2}\right) \\ &=&P\left( \left\{ \max\limits_{1\leqslant j\leqslant p}|\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(M_{n}^{\prime }+M_{n})X_{n}^{(j)}|\right\} \left\vert \hat{\rho}_{n}-\rho \right\vert /n>\frac{r}{2}\right. \\ &&\left. \bigcap \left\vert \hat{\rho}_{n}-\rho \right\vert >c\right) \\ &&+P\left( \left\{ \max\limits_{1\leqslant j\leqslant p}|\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(M_{n}^{\prime }+M_{n})X_{n}^{(j)}|\right\} \left\vert \hat{\rho}_{n}-\rho \right\vert /n>\frac{r}{2}\right. \\ &&\left. \bigcap \left\vert \hat{\rho}_{n}-\rho \right\vert \leq c\right) \\ &\leqslant &c+P\left( \max\limits_{1\leqslant j\leqslant p}|\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(M_{n}^{\prime }+M_{n})X_{n}^{(j)}|>\frac{rn}{ 2c}\right) \\ && \end{array} $$

Next, we need the tail probability of \(\max _{1\leqslant j\leqslant p}|\epsilon _{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}M_{n}^{\prime }M_{n}X_{n}^{(j)}|\) and \(\max _{1\leqslant j\leqslant p}|\epsilon _{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(M_{n}^{\prime }+M_{n})X_{n}^{(j)}|\). Note, however, that we do not assume a Gaussian distribution for the error 𝜖n; we only assume errors with zero mean and finite second moments. Thus, we use the following moment inequality, derived from Nemirovski's inequality:

$$E\left( \max\limits_{1\leqslant j\leqslant p}|\epsilon_{n}^{\prime }U^{(j)}|\right)^{2}\leqslant 8\log(2p)\sum\limits_{i = 1}^{n}\left( \max\limits_{1\leqslant j\leqslant p}|U_{i}^{(j)}|\right) ^{2}E{\epsilon_{i}^{2}} $$

for any design matrix U, with U(j) as its j-th column.
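As a quick numerical illustration (not part of the argument), this moment inequality can be checked by simulation for a fixed design; the snippet below is a sketch in Python with illustrative names, using standard normal errors so that $E{\epsilon_{i}^{2}}= 1$:

```python
# Monte Carlo sanity check of the moment inequality above for a fixed
# design U; numpy only, all names illustrative.
import numpy as np

rng = np.random.default_rng(0)
n, p, reps = 200, 50, 2000
U = rng.uniform(-1.0, 1.0, size=(n, p))              # fixed design matrix
eps = rng.standard_normal((reps, n))                 # zero-mean errors, E eps_i^2 = 1
lhs = np.mean(np.max(np.abs(eps @ U), axis=1) ** 2)  # E (max_j |eps' U^(j)|)^2
rhs = 8.0 * np.log(2 * p) * np.sum(np.max(np.abs(U), axis=1) ** 2)
print(f"LHS = {lhs:.1f} <= RHS = {rhs:.1f}")         # the bound holds comfortably
```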

By assumption, the row and column sums of Mn and (I − ρMn)− 1 are uniformly bounded in absolute value, and each element of Xn is nonstochastic and uniformly bounded in absolute value. Moreover, if An and Bn are matrices conformable for multiplication whose row and column sums are uniformly bounded in absolute value, then the row and column sums of AnBn are also uniformly bounded in absolute value; this result extends to products of three or more matrices. Thus, the row and column sums of I − ρMn, \((I-\rho M_{n}^{\prime })^{-1}M_{n}^{\prime }M_{n}\) and \((I-\rho M_{n}^{\prime })^{-1}(M_{n}^{\prime }+M_{n})\) are all uniformly bounded in absolute value. So every element of the vectors \((I-\rho M_{n})X_{n}^{(j)}\), \((I-\rho M_{n}^{\prime })^{-1}M_{n}^{\prime }M_{n}X_{n}^{(j)}\) and \((I-\rho M_{n}^{\prime })^{-1}(M_{n}^{\prime }+M_{n})X_{n}^{(j)}\) is bounded in absolute value; denote the common bound by κB.

Then, we have

$$\begin{array}{@{}rcl@{}} P\left( A1>\frac{r}{2}\right) &\leqslant &c+\frac{E[\max_{1\leqslant j\leqslant p}|\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}M_{n}^{\prime }M_{n}X_{n}^{(j)}|]^{2}}{(rn/2c)^{2}} \\ &\leqslant &c+\frac{8(2c)^{2}\log(2p)\sigma^{2}\kappa_{B}}{nr^{2}}, \end{array} $$

and similarly,

$$\begin{array}{@{}rcl@{}} P\left( A2>\frac{r}{2}\right) &\leqslant &c+\frac{E[\max_{1\leqslant j\leqslant p}|\epsilon_{n}^{\prime }(I-\rho M_{n}^{\prime })^{-1}(M_{n}^{\prime }+M_{n})X_{n}^{(j)}|]^{2}}{(rn/2c)^{2}} \\ &\leqslant &c+\frac{8(2c)^{2}\log(2p)\sigma^{2}\kappa_{B}}{nr^{2}}. \end{array} $$

As a result, writing κB0 for a constant multiple of κB that absorbs the numerical factors,

$$\begin{array}{@{}rcl@{}} P(A>r) &\leqslant &P\left( A1>\frac{r}{2}\right) +P\left( A2>\frac{r}{2} \right) \\ &\leqslant &2c+\frac{(2c)^{2}\log(2p)\sigma^{2}\kappa_{B0}}{nr^{2}} \end{array} $$

Substituting the above probability bounds, we have

$$\begin{array}{@{}rcl@{}} &&P\left( \max\limits_{1\leqslant j\leqslant p}2|\epsilon_{n}^{\prime }T^{(j)}|/n>\lambda_{0}\right) \\ &\leqslant &P\left( \max\limits_{1\leqslant j\leqslant p}\left\vert \frac{\epsilon_{n}^{\prime }(I-\rho M_{n})X_{n}^{(j)}}{n}\right\vert +A>\frac{\lambda_{0} }{2}\right) \\ &\leqslant &P\left( \max\limits_{1\leqslant j\leqslant p}\left\vert \frac{\epsilon_{n}^{\prime }(I-\rho M_{n})X_{n}^{(j)}}{n}\right\vert +A>\frac{\lambda_{0} }{2}\bigcap A>r\right) \\ &&+P\left( \max\limits_{1\leqslant j\leqslant p}\left\vert \frac{\epsilon_{n}^{\prime }(I-\rho M_{n})X_{n}^{(j)}}{n}\right\vert +A>\frac{\lambda_{0} }{2}\bigcap A\leqslant r\right) \end{array} $$
$$\begin{array}{@{}rcl@{}} &\leqslant &2c+\frac{(2c)^{2}\log(2p)\sigma^{2}\kappa_{B0}}{nr^{2}} +P\left( \max\limits_{1\leqslant j\leqslant p}\left\vert \epsilon_{n}^{\prime }(I-\rho M_{n})X_{n}^{(j)}\right\vert >n\left( \frac{\lambda_{0}}{2} -r\right) \right) \\ &\leqslant &2c+\frac{(2c)^{2}\log(2p)\sigma^{2}\kappa_{B0}}{nr^{2}}+ \frac{E\left( \max_{1\leqslant j\leqslant p}|\epsilon_{n}^{\prime }(I-\rho M_{n})X_{n}^{(j)}|\right)^{2}}{n^{2}(\frac{\lambda_{0}}{2}-r)^{2}} \\ &\leqslant &2c+\frac{(2c)^{2}\log(2p)\sigma^{2}\kappa_{B0}}{nr^{2}}+ \frac{\log(2p)\sigma^{2}\kappa_{B0}}{n(\frac{\lambda_{0}}{2}-r)^{2}} \\ &\leqslant &\exp [-t^{2}/2]+\kappa_{B0}\exp [-t^{2}]+\kappa_{B0}\exp [-t^{2}/2] \\ &\leqslant &K\exp [-t^{2}/2]. \end{array} $$

This implies the stated result:

$$\begin{array}{@{}rcl@{}} P(\Im ) &=&1-P\left( \max\limits_{1\leqslant j\leqslant p}2|\epsilon_{n}^{\prime }T^{(j)}|/n>\lambda_{0}\right) \\ &\geqslant &1-K\exp [-t^{2}/2]. \end{array} $$

Proof of Theorem 3.

On the set $\Im$, with \(\lambda _{n}\geqslant 2\lambda _{0}\),

$$\begin{array}{@{}rcl@{}} &&2\frac{\left. \left\Vert (I-\hat{\rho}_{n}M_{n})X_{n}(\hat{\beta}-\beta )\right\Vert_{2}\right.^{2}}{n}+\lambda_{n}||\hat{\beta}-\beta ||_{1} \\ &=&2\frac{\left. \left\Vert (I-\hat{\rho}_{n}M_{n})X_{n}(\hat{\beta}-\beta )\right\Vert_{2}\right.^{2}}{n}+\lambda_{n}||\hat{\beta}_{S_{0}}-\beta_{S_{0}}||_{1}+\lambda_{n}||\hat{\beta}_{{S_{0}^{c}}}||_{1} \\ &\leqslant &4\lambda_{n}||\hat{\beta}_{S_{0}}-\beta_{S_{0}}||_{1} \\ &\leqslant &4\lambda_{n}\sqrt{s_{0}}\left\Vert (I-\hat{\rho}_{n}M_{n})X_{n}(\hat{\beta}-\beta )\right\Vert_{2}/(\sqrt{n}\phi_{0}) \\ &\leqslant &\left. \left\Vert (I-\hat{\rho}_{n}M_{n})X_{n}(\hat{\beta}-\beta )\right\Vert_{2}\right.^{2}/n + 4{\lambda_{n}^{2}}s_{0}/{\phi_{0}^{2}}, \end{array} $$

where the final inequality follows from the elementary fact (a rearrangement of $(u-2v)^{2}\geqslant 0$) that

$$4uv\leqslant u^{2}+ 4v^{2}. $$

Rearranging the above display yields the oracle inequality $\frac{1}{n}\left\Vert (I-\hat{\rho}_{n}M_{n})X_{n}(\hat{\beta}-\beta )\right\Vert_{2}^{2}+\lambda_{n}||\hat{\beta}-\beta ||_{1}\leqslant 4{\lambda_{n}^{2}}s_{0}/{\phi_{0}^{2}}$. Combining this oracle inequality with Proposition 2, which bounds the probability of the set $\Im$, the result follows. □

Proof of Theorem 4.

Using the result of Proposition 1 and following the line of proof of Theorem 2, we have

$$\begin{array}{@{}rcl@{}} P(A_{n}\cap B_{n}) &\geqslant &1-P({A_{n}^{c}})-P({B_{n}^{c}}) \\ &\geqslant &1-\sum\limits_{i = 1}^{q}P\left( |{z_{i}^{n}}|\geqslant \sqrt{n}\left( |\beta_{i}|-\frac{\lambda_{n}}{2n}{b_{i}^{n}}\right) \right) -\sum\limits_{i = 1}^{p-q}P\left( |{\zeta_{i}^{n}}|>\frac{\lambda _{n}}{2\sqrt{n}}\eta_{i}\right) , \end{array} $$

where \(z^{n}=({z_{1}^{n}},{\cdots } ,{z_{q}^{n}})^{\prime }=(C_{11}^{n})^{-1}W^{n}(1)\), \(\zeta ^{n}=({\zeta _{1}^{n}},{\cdots } ,\zeta _{p-q}^{n})^{\prime }=C_{21}^{n}(C_{11}^{n})^{-1}W^{n}(1)-W^{n}(2)\) and \(b^{n}=({b_{1}^{n}},{\cdots } ,{b_{q}^{n}})^{\prime }=(C_{11}^{n})^{-1}sign(\beta (1))\).

Replace all the \(\hat {\rho }_{n}\) in the notation above with the true parameter value ρ, and denote the resulting quantities by \({C_{0}^{n}}\), \({W_{0}^{n}}\), \( {z_{0}^{n}}\), \({\zeta _{0}^{n}}\), and \({b_{0}^{n}}\) for notational simplicity. Then each element in the first sum on the right hand side of the above inequality is:

$$\begin{array}{@{}rcl@{}} &&P\left( |{z_{i}^{n}}|\geqslant \sqrt{n}\left( |\beta_{i}|-\frac{\lambda_{n}}{2n} {b_{i}^{n}}\right) \right) \\ &=&P\left( |{z_{i}^{n}}|\geqslant \sqrt{n}(|\beta_{i}|-\frac{\lambda_{n}}{2n} {b_{i}^{n}}),|z_{0i}^{n}-{z_{i}^{n}}|>\delta ,|b_{0i}^{n}-{b_{i}^{n}}|>\delta \right) \\ &&+P\left( |{z_{i}^{n}}|\geqslant \sqrt{n}(|\beta_{i}|-\frac{\lambda_{n}}{2n} {b_{i}^{n}}),|z_{0i}^{n}-{z_{i}^{n}}|\leqslant \delta ,|b_{0i}^{n}-{b_{i}^{n}}|\leqslant \delta \right) \\ &&+P\left( |{z_{i}^{n}}|\geqslant \sqrt{n}(|\beta_{i}|-\frac{\lambda_{n}}{2n} {b_{i}^{n}}),|z_{0i}^{n}-{z_{i}^{n}}|>\delta ,|b_{0i}^{n}-{b_{i}^{n}}|\leqslant \delta \right) \\ &&+P\left( |{z_{i}^{n}}|\geqslant \sqrt{n}(|\beta_{i}|-\frac{\lambda_{n}}{2n} {b_{i}^{n}}),|z_{0i}^{n}-{z_{i}^{n}}|\leqslant \delta ,|b_{0i}^{n}-{b_{i}^{n}}|>\delta \right) \\ &=&A_{1}+A_{2}+A_{3}+A_{4} \end{array} $$

for any δ > 0.

Since \(C^{n}-{C_{0}^{n}}\rightarrow _{p}0\) and \(W^{n}-{W_{0}^{n}}\rightarrow _{p}0\), we have \(z^{n}-{z_{0}^{n}}=o_{p}(1)\), \(\zeta ^{n}-{\zeta _{0}^{n}}=o_{p}(1)\) and \( b^{n}-{b_{0}^{n}}=o_{p}(1)\). Note that here we cannot use \(C=\lim _{n\rightarrow \infty }\frac {1}{n} X_{n}^{\prime }{\Sigma } (\rho )X_{n}\) as defined in Assumption 6, since in the high-dimensional setting this limit may be singular, or may not even exist.

Thus, for n large enough, A1 + A3 + A4 < 3δ, and

$$\begin{array}{@{}rcl@{}} A_{2} &=&P\left( |{z_{i}^{n}}|\geqslant \sqrt{n}(|{\beta_{i}^{n}}|-\frac{ \lambda_{n}}{2n}{b_{i}^{n}}),|z_{0i}^{n}-{z_{i}^{n}}|\leqslant \delta ,|b_{0i}^{n}-{b_{i}^{n}}|\leqslant \delta \right) \\ &\leqslant &P\left( |z_{0i}|\geqslant \sqrt{n}(|{\beta_{i}^{n}}|-\frac{ \lambda_{n}}{2n}(b_{i0}^{n}+\delta ))-\delta \right) . \end{array} $$

Now if we write \({z_{0}^{n}}=H_{A}^{\prime }\epsilon _{n}\), where $H_{A}^{\prime }=({h_{1}^{a}},{\cdots} ,{h_{q}^{a}})^{\prime }=(C_{11}^{0})^{-1}\frac{1}{\sqrt{n}}[(I-\rho M_{n})X_{n}](1)^{\prime }$, then

$$H_{A}^{\prime }H_{A}=(C_{11}^{0})^{-1}n^{-1}[(I-\rho M_{n})X_{n}](1)^{\prime }[(I-\rho M_{n})X_{n}](1)(C_{11}^{0})^{-1}=(C_{11}^{0})^{-1}. $$

Therefore, \(z_{0i}^{n}=({h_{i}^{a}})^{\prime }\epsilon _{n}\) with

$$ ||{h_{i}^{a}}||_{2}^{2}\leqslant \frac{1}{K_{2}}\forall i = 1,{\cdots} ,q. $$
(A.3)

Similarly,

$$\begin{array}{@{}rcl@{}} &&P\left( |{\zeta_{i}^{n}}|>\frac{\lambda_{n}}{2\sqrt{n}}\eta_{i}\right) \\ &=&P\left( |{\zeta_{i}^{n}}|>\frac{\lambda_{n}}{2\sqrt{n}}\eta_{i},|\zeta_{i}^{n}-\zeta_{0i}|>\delta )+P(|{\zeta_{i}^{n}}|>\frac{\lambda_{n}}{2\sqrt{ n}}\eta_{i},|{\zeta_{i}^{n}}-\zeta_{0i}|\leqslant \delta \right) \\ &\leqslant &\delta +P\left( |\zeta_{0i}|>\frac{\lambda_{n}}{2\sqrt{n}}\eta_{i}-\delta \right) . \end{array} $$

If we write \({\zeta _{0}^{n}}=H_{B}^{\prime }\epsilon _{n}\) where \(H_{B}^{\prime }=({h_{1}^{b}},{\cdots } ,h_{p-q}^{b})^{\prime }=C_{21}^{0}(C_{11}^{0})^{-1}n^{-\frac {1}{2}}[(I-\rho M_{n})X_{n}](1)^{\prime }-n^{-\frac {1}{2}}[(I-\rho M_{n})X_{n}](2)^{\prime }\), then

$$\begin{array}{@{}rcl@{}} H_{B}^{\prime }H_{B} &=&\frac{1}{n}[(I-\rho M_{n})X_{n}](2)^{\prime }\{I-[(I-\rho M_{n})X_{n}](1) \\ &&\{[(I-\rho M_{n})X_{n}](1)^{\prime }[(I-\rho M_{n})X_{n}](1)\}^{-1}[(I-\rho M_{n})X_{n}](1)^{\prime }\} \\ &&[(I-\rho M_{n})X_{n}](2). \end{array} $$

Since $I-[(I-\rho M_{n})X_{n}](1)\{[(I-\rho M_{n})X_{n}](1)^{\prime }[(I-\rho M_{n})X_{n}](1)\}^{-1}[(I-\rho M_{n})X_{n}](1)^{\prime }$ has eigenvalues between 0 and 1, it follows that \(\zeta _{0i}^{n}=({h_{i}^{b}})^{\prime }\epsilon _{n}\) with

$$ ||{h_{i}^{b}}||_{2}^{2}\leqslant K_{1}\quad \forall i = 1,{\cdots} ,p-q. $$
(A.4)

Also note that,

$$ \left\vert \frac{\lambda_{n}}{n}{b_{0}^{n}}\right\vert =\frac{\lambda_{n}}{n} \left\vert (C_{11}^{0})^{-1}sign(\beta (1))\right\vert \leqslant \frac{ \lambda_{n}}{nK_{2}}\left\Vert sign(\beta (1))\right\Vert_{2}=\frac{ \lambda_{n}}{nK_{2}}\sqrt{q} $$
(A.5)

Now given (A.3) and (A.4), it can be shown that \( E({\epsilon _{i}^{n}})^{4}<\infty \) in Assumption 1 implies \( E({z_{i}^{n}})^{4}<\infty \) and \(E({\zeta _{i}^{n}})^{4}<\infty \). In fact, given any constant n-dimensional vector α,

$$E(\alpha^{\prime }\epsilon^{n})^{2k}\leqslant (2k-1)!\,\left\Vert \alpha \right\Vert_{2}^{2k}E({\epsilon_{i}^{n}})^{2k}. $$

For IID errors with bounded fourth moments, the tail probability is therefore bounded by

$$P(z_{0i}^{n}>t)=O(t^{-4}). $$

Therefore, for \(\lambda _{n}/\sqrt {n}=O(n^{\frac {c_{2}-c_{1}}{2}})\), using (A.5), and taking δ arbitrarily small, we have

$$\begin{array}{@{}rcl@{}} &&\sum\limits_{i = 1}^{q}P\left( |{z_{i}^{n}}|\geqslant \sqrt{n}\left( |\beta_{i}|- \frac{\lambda_{n}}{2n}{b_{i}^{n}}\right) \right) \\ &\leqslant &q(3\delta +O(\sqrt{n}(|\beta_{i}|-\frac{\lambda_{n}}{2n} (b_{i0}^{n}+\delta ))-\delta )^{-4}) \\ &=&qO\left( r(\rho )n^{-2c_{2}+ 2c_{1}-2}\right) \\ &=&O\left( r(\rho )n^{-2 + 2c_{2}}\right) , \end{array} $$

where r(ρ) is the bound for the absolute value of the elements in the matrix \((C_{11}^{n}(\rho ))^{-1}\). Likewise,

$$\begin{array}{@{}rcl@{}} &&\sum\limits_{i = 1}^{p-q}P\left( |{\zeta_{i}^{n}}|>\frac{\lambda_{n}}{2\sqrt{n}} \eta_{i}\right) \\ &\leqslant &\delta +(p-q)O\left( \frac{n^{2}}{{\lambda_{n}^{4}}}\right) \\ &=&O\left( \frac{pn^{2}}{{\lambda_{n}^{4}}}\right) \\ &=&o(1) \\ && \end{array} $$

Adding these two terms, Theorem 4 follows.□


About this article

Cite this article

Cai, L., Bhattacharjee, A., Calantone, R. et al. Variable Selection with Spatially Autoregressive Errors: A Generalized Moments LASSO Estimator. Sankhya B 81 (Suppl 1), 146–200 (2019). https://doi.org/10.1007/s13571-018-0176-z

  • DOI: https://doi.org/10.1007/s13571-018-0176-z
