Estimation for partially linear single-index spatial autoregressive model with covariate measurement errors

  • Regular Article
  • Published in Statistical Papers

Abstract

This paper studies estimation of the parameters of a partially linear single-index spatial autoregressive model in which all covariates are subject to measurement error. We propose an efficient estimation methodology that combines a local-linear-smoother-based pseudo-\(\theta \) algorithm, the simulation-extrapolation (SIMEX) algorithm, estimating equations, and profile maximum likelihood estimation. Under some regularity conditions, we derive the asymptotic properties of the estimators of the link function and the unknown parameters. Simulations indicate that our estimation method performs well. Finally, we apply our method to the Boston housing price data set; the results show that our model fits the data well.
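As a rough illustration of the SIMEX component of the proposed methodology, the following minimal Python sketch implements the generic simulate–estimate–extrapolate cycle on which the procedure is built; the function names, the \(\lambda \) grid, and the quadratic extrapolant are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def simex(naive_fit, T, Sigma_v, lambdas=(0.5, 1.0, 1.5, 2.0), B=50, seed=None):
    """Generic SIMEX sketch: remeasure the error-prone covariates T with extra
    noise of variance lam*Sigma_v, average the naive estimator over B replicates
    for each lam, fit a quadratic trend in lam, and extrapolate it to lam = -1."""
    rng = np.random.default_rng(seed)
    lams = np.concatenate(([0.0], np.asarray(lambdas)))
    n, p = T.shape
    L = np.linalg.cholesky(Sigma_v)          # Sigma_v: measurement error covariance
    estimates = []
    for lam in lams:
        reps = [naive_fit(T + np.sqrt(lam) * rng.standard_normal((n, p)) @ L.T)
                for _ in range(B if lam > 0 else 1)]
        estimates.append(np.mean(reps, axis=0))
    # quadratic extrapolant fitted coordinatewise, evaluated at lam = -1
    coef = np.polynomial.polynomial.polyfit(lams, np.asarray(estimates), deg=2)
    return np.polynomial.polynomial.polyval(-1.0, coef)
```

For example, `simex(lambda Tb: np.linalg.lstsq(Tb, y, rcond=None)[0], T, Sigma_v)` would SIMEX-correct a naive least-squares coefficient; the paper applies the same simulate–estimate–extrapolate recipe to its pseudo-\(\theta \), estimating-equation, and profile-likelihood estimators.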


References

  • Alexeeff SE, Carroll RJ, Coull B (2016) Spatial measurement error and correction by spatial SIMEX in linear regression models when using predicted air pollution exposures. Biostatistics 17(2):377–389

  • Anselin L (1988) Spatial econometrics: methods and models. Kluwer Academic, Dordrecht

  • Anselin L, Bera AK (1998) Spatial dependence in linear regression models with an introduction to spatial econometrics. In: Handbook of applied economic statistics, pp 237–289

  • Apanasovich TV, Carroll RJ, Maity A (2009) SIMEX and standard error estimation in semiparametric measurement error models. Electron J Stat 3:318–348

  • Basile R, Gress B (2005) Semiparametric spatial auto-covariance models of regional growth behavior in Europe. Région Dév 21:93–118

  • Can A (1992) Specification and estimation of hedonic housing price models. Reg Sci Urban Econ 22(3):453–474

  • Carroll RJ, Lombard F, Küchenhoff H, Stefanski LA (1996) Asymptotics for the SIMEX estimator in structural measurement error models. J Am Stat Assoc 91:242–250

  • Carroll RJ, Fan J, Gijbels I, Wand MP (1997) Generalized partially linear single-index models. J Am Stat Assoc 92(438):477–489

  • Carroll RJ, Maca JD, Ruppert D (1999) Nonparametric regression in the presence of measurement error. Biometrika 86(3):541–554

  • Carroll RJ, Ruppert D, Stefanski LA, Crainiceanu CM (2006) Measurement error in nonlinear models: a modern perspective, 2nd edn. Chapman and Hall, New York

  • Chang Z, Xue L, Zhu L (2010) On an asymptotically more efficient estimation of the single-index model. J Multivar Anal 101(8):1898–1901

  • Cheng S, Chen J (2021) Estimation of partially linear single-index spatial autoregressive model. Stat Pap 62(4):495–531

  • Cook JR, Stefanski LA (1994) Simulation-extrapolation estimation in parametric measurement error models. J Am Stat Assoc 89(428):1314–1328

  • Cressie NAC (1993) Statistics for spatial data. Wiley, New York

  • Delaigle A, Hall P (2008) Using SIMEX for smoothing parameter choice in errors-in-variables problems. J Am Stat Assoc 103:280–287

  • Goodchild M, Gopal S (1989) The accuracy of spatial databases. Comput Geosci-UK 17(4):593–594

  • Härdle W (1991) Applied nonparametric regression. Cambridge University Press, Cambridge

  • Huang Z, Zhao X (2019) Statistical estimation for a partially linear single-index model with errors in all variables. Commun Stat-Theor M 48(5):1136–1148

  • Huang J, Wang D (2021) Statistical inference for single-index-driven varying coefficient time series model with explanatory variables. Proc Math Sci 131:21

  • Huque MH, Bondell HD, Carroll RJ, Ryan LM (2016) Spatial regression with covariate measurement error: a semi-parametric approach. Biometrics 72(3):678–686

  • Kelejian HH, Prucha IR (2009) Specification and estimation of spatial autoregressive models with autoregressive and heteroscedastic disturbances. J Econom 157(1):53–67

  • Lee LF (2004) Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica 72(6):1899–1925

  • Li T, Mei C (2016) Statistical inference on the parametric component in partially linear spatial autoregressive models. Commun Stat-Simul C 45(6):1991–2006

  • Liang H (2009) Generalized partially linear mixed-effects models incorporating mismeasured covariates. Ann Inst Stat Math 61:27–46

  • Liang H, Wang N (2005) Partially linear single-index measurement error models. Stat Sin 15:99–116

  • Liang H, Ren H (2005) Generalized partially linear measurement error models. J Comput Graph Stat 14(1):237–250

  • Liang H, Härdle W, Carroll RJ (1999) Estimation in a semiparametric partially linear errors-in-variables model. Ann Stat 27(5):1519–1535

  • Lin X, Carroll RJ (2000) Nonparametric function estimation for clustered data when the predictor is measured without/with error. J Am Stat Assoc 95:520–534

  • Liu S, Trenkler G, Kollo T et al (2023) Professor Heinz Neudecker and matrix differential calculus. Stat Pap. https://doi.org/10.1007/s00362-023-01499-w

  • Lv Y, Zhang R, Zhao W, Liu J (2015) Quantile regression and variable selection of partial linear single-index model. Ann Inst Stat Math 67:375–409

  • Mack YP, Silverman BW (1982) Weak and strong uniform consistency of kernel regression estimates. Z Wahrsch Verw Gebiete 61(3):405–415

  • Magnus JR, Neudecker H (2019) Matrix differential calculus with applications in statistics and econometrics. Wiley. https://doi.org/10.1002/9781119541219.fmatter

  • Staudenmayer J, Ruppert D (2004) Local polynomial regression and simulation-extrapolation. J R Stat Soc B 66:17–30

  • Su L, Jin S (2010) Profile quasi-maximum likelihood estimation of partially linear spatial autoregressive models. J Econom 157(1):18–33

  • Wei C, Guo S, Zhai S (2017) Statistical inference of partially linear varying coefficient spatial autoregressive models. Econ Model 64:553–559

  • Yang Y, Tong T, Li G (2019) SIMEX estimation for single-index model with covariate measurement error. AStA Adv Stat Anal 103:137–161

  • Zhu L, Xue L (2006) Empirical likelihood confidence regions in a partially linear single-index model. J R Stat Soc B 68:549–570


Acknowledgements

We gratefully thank the anonymous reviewers for their careful work and thoughtful suggestions, which have substantially improved this paper. This work was supported by the National Natural Science Foundation of China (Nos. 12271231, 12001229, 11901053).

Author information


Corresponding author

Correspondence to Dehui Wang.


Appendices

Appendix A

Proof of Theorem 3.1

Firstly, we consider the parameter \(\theta \). Assume that \(\theta _0(\lambda )\) and \(\beta _0(\lambda )\) are the true values in the model \({\textrm{E}}[Y-\rho _0 WY-T\theta _0(\lambda )|\beta ^{\textrm{T}}_0(\lambda )Z_b(\lambda )]=g(\beta ^{\textrm{T}}_0(\lambda )Z_b(\lambda ))\). By Liang et al. (1999), we have

$$\begin{aligned} \sqrt{n}(\hat{\theta }_b(\lambda )-\theta _0(\lambda ))=&n^{-1/2}\left\{ {\textrm{E}}[(\xi -{\textrm{E}}(\xi |\beta ^{\textrm{T}}_0(\lambda )X))(\xi -{\textrm{E}}(\xi |\beta ^{\textrm{T}}_0(\lambda )X))^{\textrm{T}}]\right\} ^{-1}\\&\times \sum _{i=1}^{n}\left[ (T_i-{\textrm{E}}(\xi _i|\beta ^{\textrm{T}}_0(\lambda )Z_{ib}(\lambda )))(\varepsilon _i-\theta ^{\textrm{T}}_0(\lambda )V_i)+\Sigma _{v}\theta _0(\lambda )\right] \\&+o_P(1)\\ =&n^{-1/2}\left\{ {\textrm{E}}\left[ (\xi -{\textrm{E}}(\xi |\beta ^{\textrm{T}}_0(\lambda )X))(\xi -{\textrm{E}}(\xi |\beta ^{\textrm{T}}_0(\lambda )X))^{\textrm{T}}\right] \right\} ^{-1}\\&\times \sum _{i=1}^{n}\left[ (\xi _i-{\textrm{E}}(\xi _i|\beta ^{\textrm{T}}_0(\lambda )Z_{ib}(\lambda )))(\varepsilon _i-\theta ^{\textrm{T}}_0(\lambda )V_i)\right. \\&\left. -(V_iV_i^{\textrm{T}}-\Sigma _{v})\theta _0(\lambda )+V_i\varepsilon _i\right] +o_P(1). \end{aligned}$$

Combining this with the results of Liang et al. (1999), the SIMEX estimator \(\hat{\theta }_{SIMEX}\) has asymptotic variance \(\mathcal {G}_{\Gamma _1}(-1,\Gamma _1)\Sigma (\Gamma _1)\mathcal {G}^{\textrm{T}}_{\Gamma _1}(-1,\Gamma _1)\), where \(\Sigma (\Gamma _1)=D^{-1}(\Gamma _1)s(\Gamma _1)\Omega s^{\textrm{T}}(\Gamma _1)D^{-1}(\Gamma _1)\) (the remaining quantities are defined in the proof for \(\beta \) below).

Next, we prove the result for the index parameter \(\beta \). Assume that \(\beta _0(\lambda )\) is the true value in the model \({\textrm{E}}[Y-\rho _0 WY-T\theta _0(\lambda )|\beta ^{\textrm{T}}_0(\lambda )Z_b(\lambda )]=g(\beta ^{\textrm{T}}_0(\lambda )Z_b(\lambda ))\). For each fixed b, we have

$$\begin{aligned} \sqrt{n}({\hat{\beta }}_b(\lambda )-\beta _0(\lambda ))=\sqrt{n}J_{\beta _0^{(r)}(\lambda )}A_n^{-1}(\beta _0(\lambda ),\lambda )B_n(\beta _0(\lambda ),\lambda )+o_{_P}(1), \end{aligned}$$

where

$$\begin{aligned} A_n(\beta _0(\lambda ),\lambda )=\frac{1}{n}\sum _{i=1}^n[g_0'(\lambda ;\beta _0^\textrm{T}(\lambda )Z_{ib}(\lambda ))]^2J_{\beta _0^{(r)}(\lambda )}^\textrm{T}{\widetilde{Z}}_{ib}(\lambda ){\widetilde{Z}}_{ib}^{\textrm{T}}(\lambda )J_{\beta _0^{(r)}(\lambda )}, \end{aligned}$$

and

$$\begin{aligned} B_n(\beta _0(\lambda ),\lambda )=\frac{1}{n}\sum _{i=1}^n\varepsilon _{ib}(\lambda )g_0'(\lambda ;\beta _0^\textrm{T}(\lambda )Z_{ib}(\lambda ))J_{\beta _0^{(r)}(\lambda )}^\textrm{T}{\widetilde{Z}}_{ib}(\lambda ) \end{aligned}$$

with

$$\begin{aligned} \varepsilon _{ib}(\lambda )=Y_i-\rho _0\sum _{j\ne i}w_{ij}Y_j-\theta _0^{\textrm{T}}\xi _i-g_0(\lambda ;\beta _0^\textrm{T}(\lambda )Z_{ib}(\lambda )) \end{aligned}$$

and \({\widetilde{Z}}_{ib}(\lambda )=Z_{ib}(\lambda )-{\textrm{E}}(Z_{ib}(\lambda )\vert \beta _0^\textrm{T}(\lambda )Z_{ib}(\lambda ))\).

Thus,

$$\begin{aligned} \sqrt{n}({\hat{\beta }}(\lambda )-\beta _0(\lambda ))=J_{\beta _0^{(r)}(\lambda )}\mathcal {A}^{-1}(\beta _0(\lambda ),\lambda )n^{-\frac{1}{2}}\sum _{i=1}^n\eta _{iB}(\beta _0(\lambda ),\lambda )+o_{_P}(1),\quad \end{aligned}$$
(16)

where

$$\begin{aligned} \eta _{iB}(\beta _0(\lambda ),\lambda )=\frac{1}{B}\sum _{b=1}^{B}\varepsilon _{ib}(\lambda )g_0'(\lambda ;\beta _0^\textrm{T}(\lambda )Z_{ib}(\lambda ))J_{\beta _0^{(r)}(\lambda )}^\textrm{T}{\widetilde{Z}}_{ib}(\lambda ), \end{aligned}$$

and

$$\begin{aligned} \mathcal {A}(\beta _0(\lambda ),\lambda )={\textrm{E}}\Big \{[g_0'(\lambda ;\beta _0^\textrm{T}(\lambda )Z_{ib}(\lambda ))]^2J_{\beta _0^{(r)}(\lambda )}^\textrm{T}{\widetilde{Z}}_{ib}(\lambda ){\widetilde{Z}}_{ib}^{\textrm{T}}(\lambda )J_{\beta _0^{(r)}(\lambda )}\Big \}. \end{aligned}$$

We calculate \(\hat{\beta }(\lambda )\) on a grid of values \(\Lambda =\{\lambda _1,\ldots ,\lambda _M\}\) and denote by \(\hat{\beta }(\Lambda )\) the stacked estimator vec\(\{\hat{\beta }(\lambda ),\lambda \in \Lambda \}\). Then, by Equation (16), \(\sqrt{n}({\hat{\beta }}(\Lambda )-\beta _0(\Lambda ))\) is asymptotically multivariate normal \(N(\textbf{0},\Omega )\) with

$$\begin{aligned} \Omega =\mathcal {J}\{\beta _0(\Lambda ),\Lambda \}\mathcal {A}_{11}^{-1}\{\beta _0(\Lambda ),\Lambda \}C_{11}\{\beta _0(\Lambda ),\Lambda \}\{\mathcal {A}_{11}^{-1}\{\beta _0(\Lambda ),\Lambda \}\}^{\textrm{T}}\mathcal {J}^{\textrm{T}}\{\beta _0(\Lambda ),\Lambda \}. \end{aligned}$$

Write \(\beta _0(\lambda )=\mathcal {G}(\lambda ,\Gamma _2)\) with parameter vector \(\Gamma _2\), and fit \(\mathcal {G}(\lambda ,\Gamma _2),~\lambda \in \Lambda \), to the sample \( \{(\lambda ,\hat{\beta }(\lambda ))|\lambda \in \Lambda \}\). In the extrapolation step, \(\hat{\Gamma }_2\) is obtained by minimizing \(\textrm{Res}(\Gamma _2)\textrm{Res}^{\textrm{T}}(\Gamma _2)\), and the corresponding estimating equation is \(s(\Gamma _2)\textrm{Res}(\Gamma _2)=0\). Via a Taylor expansion, we have

$$\begin{aligned} \sqrt{n}s(\Gamma _2)\textrm{Res}(\Gamma _2)+\sqrt{n}\left[ \frac{\partial s(\Gamma ^*)}{\partial \Gamma _2}\mathrm{\textrm{Res}}(\Gamma ^*)+D(\Gamma ^*)\right] (\hat{\Gamma }_2-\Gamma _2)=0, \end{aligned}$$

where \(\Gamma ^*\) lies between \(\Gamma _2\) and \({\hat{\Gamma }}_2\).

Because

$$\begin{aligned} \frac{\partial s(\Gamma ^*)}{\partial \Gamma _2}\textrm{Res}(\Gamma ^*)+D(\Gamma ^*)\xrightarrow {P}D(\Gamma _2), \end{aligned}$$

we obtain

$$\begin{aligned} \sqrt{n}(\hat{\Gamma }_2-\Gamma _2)\xrightarrow {L}N(\textbf{0},\Sigma (\Gamma _2)), \end{aligned}$$

where \(\Sigma (\Gamma _2)=D^{-1}(\Gamma _2)s(\Gamma _2)\Omega s^{\textrm{T}}(\Gamma _2)D^{-1}(\Gamma _2)\).

By the delta method, we get

$$\begin{aligned} \sqrt{n}(\hat{\beta }(\lambda )-\mathcal {G}(\lambda ,\Gamma _2))\xrightarrow {L}N\left( \textbf{0},\frac{\partial \mathcal {G}(\lambda ,\Gamma _2)}{\partial \Gamma _2}\Sigma (\Gamma _2)\frac{\partial \mathcal {G}(\lambda ,\Gamma _2)}{\partial \Gamma _2^{\textrm{T}}}\right) . \end{aligned}$$

Because \({\hat{\beta }}_{_{SIMEX}}={\hat{\beta }}(-1)=\mathcal {G}(-1,\hat{\Gamma }_2)\), we obtain

$$\begin{aligned} \sqrt{n}(\hat{\beta }_{_{SIMEX}}-\beta _0)\xrightarrow {L}N(\textbf{0},\mathcal {G}_{\Gamma _2}(-1,\Gamma _2)\Sigma (\Gamma _2)\mathcal {G}^{\textrm{T}}_{\Gamma _2}(-1,\Gamma _2)). \end{aligned}$$

This completes the proof. \(\square \)
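To make the extrapolation step of the preceding proof concrete, here is a minimal numerical sketch of fitting a quadratic extrapolant \(\mathcal {G}(\lambda ,\Gamma )=\gamma _0+\gamma _1\lambda +\gamma _2\lambda ^2\) by least squares over the grid \(\Lambda \) and applying the delta method at \(\lambda =-1\); the grid, the per-\(\lambda \) estimates, and the covariance placeholder are all hypothetical.

```python
import numpy as np

# hypothetical grid and per-lambda estimates of one coordinate of beta_hat(lam)
lams = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
beta_hat = np.array([0.82, 0.74, 0.68, 0.63, 0.59])   # illustrative values only

# least-squares fit of the quadratic extrapolant G(lam, Gamma) = S @ Gamma
S = np.column_stack([np.ones_like(lams), lams, lams**2])
Gamma_hat, *_ = np.linalg.lstsq(S, beta_hat, rcond=None)

# SIMEX estimate: evaluate the extrapolant and its gradient at lam = -1
grad = np.array([1.0, -1.0, 1.0])                     # dG/dGamma at lam = -1
beta_simex = grad @ Gamma_hat

# delta method: with Sigma_Gamma a placeholder for the sandwich
# D^{-1} s Omega s' D^{-1} of the proof, the variance is grad' Sigma grad
Sigma_Gamma = np.linalg.inv(S.T @ S)                  # placeholder only
var_simex = grad @ Sigma_Gamma @ grad
print(beta_simex, var_simex)
```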

Proof of Theorem 3.2

Given \(\hat{\theta }_{_{SIMEX}}\xrightarrow {P}\theta _0\), \(\hat{\beta }_{_{SIMEX}}\xrightarrow {P}\beta _0\) and \({\hat{\rho }}\xrightarrow {P}\rho _0\), we have

$$\begin{aligned} \begin{aligned}&{\hat{g}}(\lambda ;t_0)-g_0(\lambda ;t_0)\\&\quad =\frac{1}{n}f^{-1}_{\lambda }(t_0)\sum _{i=1}^n\left\{ \frac{1}{B}\sum _{b=1}^{B}\left[ Y_i-\rho _0\sum _{j\ne i}w_{ij}Y_j-\theta ^{\textrm{T}}T_i-g_0(\lambda ;\beta _0^\textrm{T}Z_{ib}(\lambda ))\right] \right. \\&\qquad \left. K_{h_2}(\beta _0^\textrm{T}Z_{ib}(\lambda )-t_0)\right\} \\&\qquad +\frac{1}{2}{h_2}^2\mu _2g_0''(\lambda ;t_0)+o_{_P}({h_2}^2+(nh_2)^{-1/2}). \end{aligned} \end{aligned}$$
(17)

If \(\lambda =0\), direct calculation gives

$$\begin{aligned} \begin{aligned}&{\hat{g}}(0;t_0)-g_0(0;t_0)\\&\quad =\frac{1}{nf_0(t_0)}\sum _{i=1}^n\left[ Y_i-\rho _0\sum _{j\ne i}w_{ij}Y_j-\theta ^{\textrm{T}}T_i-g_0(0;\beta _0^\textrm{T}Z_{ib}(\lambda ))\right] K_{h_2}(\beta _0^\textrm{T}Z_{ib}(\lambda )-t_0)\\&\qquad +\frac{1}{2}{h_2}^2\mu _2g_0''(0;t_0)+o_{_P}({h_2}^2+(nh_2)^{-\frac{1}{2}}), \end{aligned} \end{aligned}$$

which has mean zero and asymptotic variance

$$\begin{aligned}{}[nh_2f_0(t_0)]^{-1}\textrm{var}(Y-\rho _0WY-\theta _0^{\textrm{T}}T\vert \beta _0^\textrm{T}Z=t_0)v_2. \end{aligned}$$

For \(\lambda >0\), using an argument similar to (A8) in Carroll et al. (1999), we have

$$\begin{aligned} \textrm{var}({\hat{g}}(\lambda ;t_0))=O\left( (nh_2B)^{-1}\right) +O\left( n^{-1}\right) , \end{aligned}$$
(18)

while for \(\lambda =0\),

$$\begin{aligned} \textrm{var}({\hat{g}}(\lambda ;t_0))=O\left( (nh_2)^{-1}\right) . \end{aligned}$$
(19)

Comparing (18) with (19), we note that, for n and B sufficiently large, the extra variability in (18) is negligible. Hence we ignore this variability by treating B as if it were infinite; this simplifies the analysis of the SIMEX extrapolants.

We obtain \(\hat{\mathbb {A}}\) by minimizing \( \sum _{\lambda \in \Lambda }\{{\hat{g}}(\rho _0,\beta _0,\theta _0,\lambda ;t_0)-\mathcal {G}(\lambda ,\mathbb {A})\}^2,\) yielding

$$\begin{aligned} \hat{\mathbb {A}}-\mathbb {A}=\left\{ \sum _{\lambda \in \Lambda }\gamma (\lambda ,\mathbb {A})\gamma ^{\textrm{T}}(\lambda ,\mathbb {A})\right\} ^{-1}\sum _{\lambda \in \Lambda }\left\{ {\hat{g}}(\rho _0,\beta _0,\theta _0,\lambda ;t_0)-\mathcal {G}(\lambda ,\mathbb {A})\right\} \gamma (\lambda ,\mathbb {A}).\nonumber \\ \end{aligned}$$
(20)

The right-hand side of (20) has approximate mean

$$\begin{aligned} \left\{ \sum _{\lambda \in \Lambda }\gamma (\lambda ,\mathbb {A})\gamma ^{\textrm{T}}(\lambda ,\mathbb {A})\right\} ^{-1}\sum _{\lambda \in \Lambda }\frac{1}{2}{h_2}^2\mu _2g_0''(\lambda ;t_0)\gamma (\lambda ,\mathbb {A}), \end{aligned}$$

and because B is large, its approximate variance is

$$\begin{aligned} {[}nh_2f_0(t_0)]^{-1}&v_2\textrm{var}(Y-\rho _0WY-T^{\textrm{T}}\theta _0\vert \beta _0^\textrm{T}Z=t_0)\\&\left\{ \sum _{\lambda \in \Lambda }\gamma (\lambda ,\mathbb {A})\gamma ^{\textrm{T}}(\lambda ,\mathbb {A})\right\} ^{-1}D\left\{ \sum _{\lambda \in \Lambda }\gamma (\lambda ,\mathbb {A})\gamma ^{\textrm{T}}(\lambda ,\mathbb {A})\right\} ^{-1}, \end{aligned}$$

where \(D=\gamma (0,\mathbb {A})\gamma ^{\textrm{T}}(0,\mathbb {A})\). Since \({\hat{g}}_{_{SIMEX}}(t_0)=\mathcal {G}(-1,\hat{\mathbb {A}})\), its asymptotic bias and variance are

$$\begin{aligned} \textrm{bias}\{\hat{g}_{_{SIMEX}}(t_0)\}&=\frac{1}{2} C(\Lambda ,\mathbb {A})\sum _{\lambda \in \Lambda }h^2_2\mu _2g_0''(\lambda ;t_0)\gamma (\lambda ,\mathbb {A})\quad \textrm{and}\\ \textrm{var}\{\hat{g}_{_{SIMEX}}(t_0)\}&=[nh_2f_0(t_0)]^{-1}v_2\textrm{var}(Y-\rho _0 WY-T^{\textrm{T}}\theta _0\vert \beta _0^{\textrm{T}}Z=t_0)\\&\quad C(\Lambda ,\mathbb {A})DC^{\textrm{T}}(\Lambda ,\mathbb {A}). \end{aligned}$$

This completes the proof. \(\square \)
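The kernel average at the heart of (17) is a standard weighted mean over the B remeasured samples; a minimal sketch (Gaussian kernel chosen for illustration; the residual and index arrays are hypothetical inputs):

```python
import numpy as np

def g_hat(t0, resid, index, h2):
    """Kernel estimate of g(lam; t0) in the spirit of (17): average the
    partial residuals Y_i - rho0*sum_j w_ij Y_j - theta'T_i over the B
    remeasured samples, weighted by K_{h2}(beta'Z_ib(lam) - t0).
    resid, index: (B, n) arrays, hypothetical inputs."""
    u = (index - t0) / h2
    w = np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * h2)   # K_{h2}(.)
    return np.sum(w * resid) / np.sum(w)                    # weighted-mean form
```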

Proof of Theorem 3.3

Firstly, we prove \(\hat{\rho }\xrightarrow {P}\rho _0\). We adopt the idea of Lee (2004) and Su and Jin (2010) to prove the theorem. The major difference lies in the appearance of nonparametric objects in our setting. It suffices to show

$$\begin{aligned} \frac{1}{n}({\textrm{ln}}L(\rho )-Q(\rho ))=o_{_P}(1) ~~uniformly~on~\triangle , \end{aligned}$$
(21)

and

$$\begin{aligned} \mathop {\lim \sup }\limits _{n\rightarrow \infty }\mathop {\max }\limits _{\rho \in N^c_{\varepsilon }(\rho _0)}\frac{1}{n}(Q(\rho )-Q(\rho _0))<0~~for~any~~\varepsilon >0, \end{aligned}$$
(22)

where \(N^c_{\varepsilon }(\rho _0)\) is the complement of an open neighborhood of \(\rho _0\) on \(\bigtriangleup \) of diameter \(\varepsilon \).

By (8) and (31), we have

$$\begin{aligned} \frac{1}{n}({\textrm{ln}}L(\rho )-Q(\rho ))=-\frac{1}{2}({\textrm{ln}}\hat{\sigma }^{2}(\rho )-{\textrm{ln}}\sigma ^{*2}(\rho )). \end{aligned}$$

To show (21), it is sufficient to show \(\hat{\sigma }^2(\rho )-\sigma ^{*2}(\rho )=o_{_P}(1)\) uniformly on \(\bigtriangleup \).

By (8) and (30), we have

$$\begin{aligned} \hat{\sigma }^2(\rho )-\sigma ^{*2}(\rho )= & {} ~2H_1(\rho )+H_2(\rho )-2H_3(\rho )+H_4(\rho )-2H_5(\rho )\nonumber \\{} & {} -\frac{\sigma _0^2}{n}\textrm{tr}\left\{ (A(\rho )A^{-1}(\rho _0))^{\textrm{T}}(I-S)^{\textrm{T}}(I-S)A(\rho )A^{-1}(\rho _0)\right\} \nonumber \\{} & {} -\frac{1}{n}{\textrm{E}}[V\theta ]^{\textrm{T}}(I-S)^{\textrm{T}}(I-S)V\theta , \end{aligned}$$
(23)

where \(H_i(\rho )(i=1,2,3)\) are defined in Lemma 3,

$$\begin{aligned} H_4(\rho )=\frac{1}{n}[V\theta ]^{\textrm{T}}(I-S)^{\textrm{T}}(I-S)V\theta , \end{aligned}$$

and

$$\begin{aligned} H_5(\rho )= \frac{1}{n}[\xi (\theta _0-\theta )]^{\textrm{T}}(I-S)^{\textrm{T}}(I-S)V\theta . \end{aligned}$$

Therefore (21) follows from (23) and Lemma 3.

To show (22), we define \(Q_n(\rho )=\mathop {\max }\limits _{\sigma ^2}{\textrm{E}}[{\textrm{ln}} L(\rho )]\) and write

$$\begin{aligned} \frac{1}{n}(Q(\rho )-Q(\rho _0))=\frac{1}{n}(Q_n(\rho )-Q_n(\rho _0))+\frac{1}{2}H_6(\rho )+\frac{1}{2}H_7, \end{aligned}$$

where

$$\begin{aligned} Q_n(\rho )&=-\frac{n}{2}({\textrm{ln}}2\pi +1)-\frac{n}{2}{\textrm{ln}}\sigma _n^{2}(\rho )+{\textrm{ln}}\vert A(\rho )\vert ,\\ \sigma _n^2(\rho )&=\frac{\sigma _0^2}{n}\textrm{tr}\left\{ (A(\rho )A^{-1}(\rho _0))^{\textrm{T}}A(\rho )A^{-1}(\rho _0)\right\} ,\\ H_6(\rho )&={\textrm{ln}}\sigma _n^2(\rho )-{\textrm{ln}}\sigma ^{*2}(\rho ),\quad \textrm{and}\quad H_7={\textrm{ln}}\sigma ^{*2}(\rho _0)-{\textrm{ln}}\sigma _n^{2}(\rho _0). \end{aligned}$$

To show \(\frac{1}{n}(Q_n(\rho )-Q_n(\rho _0))\le 0\) uniformly on \(\bigtriangleup \), we follow Lee (2004) and define an auxiliary SAR process: \(Y=\rho WY+\varepsilon \), where \(\varepsilon \sim N(0,\sigma _0^2I_n)\). Denote the log-likelihood of this process as \({\textrm{ln}}L_a(\rho ,\sigma ^2)\). Here, \(Q_n(\rho )=\mathop {\max }\limits _{\sigma ^2}{\textrm{E}}_a({\textrm{ln}}L_a(\rho ,\sigma ^2))\) and \({\textrm{E}}_a\) denotes expectation under the auxiliary SAR process. Consequently, for any \(\rho \in \bigtriangleup ,~Q_n(\rho )\le \mathop {\max }\limits _{\rho ,\sigma ^2}{\textrm{E}}_a({\textrm{ln}}L_a(\rho ,\sigma ^2))={\textrm{E}}_a({\textrm{ln}}L_a(\rho _0,\sigma _0^2))=Q_n(\rho _0)\). Hence, \(\frac{1}{n}(Q_n(\rho )-Q_n(\rho _0))\le 0\) uniformly on \(\bigtriangleup \).

We can show that \(\sigma _n^2(\rho _0)-\sigma ^{*2}(\rho _0)=o_{_P}(1)\), which implies that \(H_7=o_{_P}(1)\). For \(H_6(\rho )\), write

$$\begin{aligned} \sigma _n^2(\rho )-\sigma ^{*2}(\rho )= & {} \frac{\sigma _0^2}{n}\left\{ \textrm{tr}[(A(\rho )A^{-1}(\rho _0))^{\textrm{T}}A(\rho )A^{-1}(\rho _0)]\right. \nonumber \\{} & {} \left. -\textrm{tr}\left[ (A(\rho )A^{-1}(\rho _0))^{\textrm{T}}(I-S)^{\textrm{T}}(I-S)A(\rho )A^{-1}(\rho _0)\right] \right\} \nonumber \\{} & {} -\frac{1}{n}(G_0+(\rho _0-\rho )R)^{\textrm{T}}(I-S)^{\textrm{T}}(I-S)(G_0+(\rho _0-\rho )R)\nonumber \\{} & {} -\frac{2}{n}(G_0+(\rho _0-\rho )R)^{\textrm{T}}(I-S)^{\textrm{T}}(I-S)(\xi (\theta _0-\theta )-V\theta )\nonumber \\{} & {} -\frac{1}{n}(\xi (\theta _0-\theta )-V\theta )^{\textrm{T}}(I-S)^{\textrm{T}}(I-S)(\xi (\theta _0-\theta )-V\theta ).\nonumber \\ \end{aligned}$$
(24)

The first term of the above expression is o(1) uniformly, while the remaining terms are nonpositive. Consequently, \(\mathop {\lim \sup }\limits _{n\rightarrow \infty }\mathop {\max }\limits _{\rho \in N^c_{\varepsilon }(\rho _0)}\frac{1}{n}(Q(\rho )-Q(\rho _0))\le 0\) for any \(\varepsilon >0\). Similar to the proof of Theorem 4.1 in Su and Jin (2010), using Lemma 1 we find that (22) holds.

Secondly, we prove \(\hat{\sigma }^2\xrightarrow {P}\sigma _0^2.\)

From \(\hat{\rho }\xrightarrow {P}\rho _0\), Lemma 1 and Assumption 2, we have \(\sigma ^{*2}(\hat{\rho })\xrightarrow {P}\sigma _0^2\). According to \(\hat{\sigma }^2(\rho )-\sigma ^{*2}(\rho )=o_{_P}(1)\), it suffices to show \(\hat{\sigma }^2\xrightarrow {P}\hat{\sigma }^2(\rho )\).

Because \(\beta ^{\textrm{T}}Z_i\) is continuous with respect to \(\beta \), we have \(S=S(\beta _0^{\textrm{T}}Z_i)(1+o_{_P}(1))\), where \(S(\beta _0^{\textrm{T}}Z_i)=(S_1^{\textrm{T}}(\beta _0^{\textrm{T}}Z_1),\ldots ,S_n^{\textrm{T}}(\beta _0^{\textrm{T}}Z_n))^{\textrm{T}}\). It is easy to verify that

$$\begin{aligned} \hat{\sigma }^2=&\frac{1}{n}[(I-S)(A(\hat{\rho })Y-T^{\textrm{T}}\theta _0)]^{\textrm{T}}(I-S)(A(\hat{\rho })Y-T^{\textrm{T}}\theta _0)\\ =&(1+o_{_P}(1))^2\frac{1}{n}[(I-S(\beta _0^{\textrm{T}}Z_i))[(A(\hat{\rho })Y-T^{\textrm{T}}\theta _0)]]^{\textrm{T}}(I-S(\beta _0^{\textrm{T}}Z_i))[(A(\hat{\rho })Y\\&-T^{\textrm{T}}\theta _0)], \end{aligned}$$

where

$$\begin{aligned}&\frac{1}{n}[(I-S(\beta _0^{\textrm{T}}Z_i))(A(\hat{\rho })Y-T^{\textrm{T}}\theta _0)]^{\textrm{T}}(I-S(\beta _0^{\textrm{T}}Z_i))(A(\hat{\rho })Y-T^{\textrm{T}}\theta _0)\\&=\frac{1}{n}[(I-S(\beta _0^{\textrm{T}}Z_i))(A(\hat{\rho })Y-A(\rho _0)Y)]^{\textrm{T}}(I-S(\beta _0^{\textrm{T}}Z_i))(A(\hat{\rho })Y-A(\rho _0)Y)\\&\qquad +\frac{2}{n}[(I-S(\beta _0^{\textrm{T}}Z_i))(A(\hat{\rho })Y-A(\rho _0)Y)]^{\textrm{T}}(I-S(\beta _0^{\textrm{T}}Z_i))(A(\rho _0)Y-T^{\textrm{T}}\theta _0)\\&\qquad +\frac{1}{n}[(I-S(\beta _0^{\textrm{T}}Z_i))(A(\rho _0)Y-T^{\textrm{T}}\theta _0)]^{\textrm{T}}(I-S(\beta _0^{\textrm{T}}Z_i))(A(\rho _0)Y-T^{\textrm{T}}\theta _0). \end{aligned}$$

Since

$$\begin{aligned}&\frac{1}{n}[(I-S(\beta _0^{\textrm{T}}Z_i))(A(\hat{\rho })Y-A(\rho _0)Y)]^{\textrm{T}}(I-S(\beta _0^{\textrm{T}}Z_i))(A(\hat{\rho })Y-A(\rho _0)Y)\\&\quad =\frac{1}{n}(\rho _0-\hat{\rho })^2[(I-S(\beta _0^{\textrm{T}}Z_i))T(G_0+\varepsilon )]^{\textrm{T}}(I-S(\beta _0^{\textrm{T}}Z_i))T(G_0+\varepsilon )\\&\quad =o_{_P}(1), \end{aligned}$$

and

$$\begin{aligned}&\frac{2}{n}[(I-S(\beta _0^{\textrm{T}}Z_i))(A(\hat{\rho })Y-A(\rho _0)Y)]^{\textrm{T}}(I-S(\beta _0^{\textrm{T}}Z_i))(A(\rho _0)Y-T^{\textrm{T}}\theta _0)\\&\quad =\frac{2}{n}(\rho _0-\hat{\rho })[(I-S(\beta _0^{\textrm{T}}Z_i))T(G_0+\varepsilon )]^{\textrm{T}}(I-S(\beta _0^{\textrm{T}}Z_i))(G_0+\varepsilon )\\&\quad =o_{_P}(1), \end{aligned}$$

we have

$$\begin{aligned} \hat{\sigma }^2=(1+o_{_P}(1))^2(\hat{\sigma }^2(\rho )+o_{_P}(1)), \end{aligned}$$

that is \(\hat{\sigma }^2\xrightarrow {P}\hat{\sigma }^2(\rho )\). \(\square \)
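The concentrated log-likelihood maximized in this proof, \({\textrm{ln}}L(\rho )=-\frac{n}{2}({\textrm{ln}}2\pi +1)-\frac{n}{2}{\textrm{ln}}\hat{\sigma }^{2}(\rho )+{\textrm{ln}}\vert A(\rho )\vert \), lends itself to a simple grid search. A minimal sketch, assuming a matrix M standing in for \((I-S)^{\textrm{T}}(I-S)\) and a vector Ttheta standing in for \(T^{\textrm{T}}\theta \) are available (all inputs hypothetical):

```python
import numpy as np

def profile_rho(Y, W, Ttheta, M, rho_grid):
    """Grid maximization of the concentrated log-likelihood
    lnL(rho) = -n/2 (ln 2pi + 1) - n/2 ln sigma2_hat(rho) + ln|A(rho)|,
    with A(rho) = I - rho*W and
    sigma2_hat(rho) = (A(rho)Y - Ttheta)' M (A(rho)Y - Ttheta) / n."""
    n = len(Y)
    best = (-np.inf, None, None)
    for rho in rho_grid:
        A = np.eye(n) - rho * W
        e = A @ Y - Ttheta
        sigma2 = e @ M @ e / n
        _, logdet = np.linalg.slogdet(A)   # ln|A(rho)|; A nonsingular on the grid
        ll = -0.5 * n * (np.log(2 * np.pi) + 1) - 0.5 * n * np.log(sigma2) + logdet
        if ll > best[0]:
            best = (ll, rho, sigma2)
    return best[1], best[2]                # (rho_hat, sigma2_hat)
```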

Proof of Theorem 3.4

By Theorem 3.2 of Lee (2004) and Theorem 4.3 of Su and Jin (2010), the first-order Taylor expansion of \(\frac{\partial {\textrm{ln}}L(\alpha )}{\partial \alpha }\bigg \vert _{\alpha =\hat{\alpha }}\) at \(\mathrm {\alpha }_0\) is

$$\begin{aligned} \frac{\partial {\textrm{ln}}L(\alpha )}{\partial \alpha }\bigg \vert _{\alpha =\alpha _0}+\frac{\partial ^2 {\textrm{ln}}L(\alpha )}{\partial \alpha \partial \alpha ^{\textrm{T}}}\bigg \vert _{\alpha ={\widetilde{\alpha }}}(\hat{\alpha }-\alpha _0)=0, \end{aligned}$$

where \({\widetilde{\alpha }}\) lies between \({\hat{\alpha }}\) and \(\alpha _0\), and \({\widetilde{\alpha }}\) converges to \(\alpha _0\) in probability by Theorem 3.3. Then we have

$$\begin{aligned} \sqrt{n}(\hat{\alpha }-\alpha _0)=-\left( \frac{1}{n}\frac{\partial ^2 {\textrm{ln}}L({\widetilde{\alpha }})}{\partial \alpha \partial \alpha ^{\textrm{T}}}\right) ^{-1}\frac{1}{\sqrt{n}}\frac{\partial {\textrm{ln}}L(\alpha _0)}{\partial \alpha }. \end{aligned}$$

The proof is complete if we can show

$$\begin{aligned}&\frac{1}{n}\frac{\partial ^2 {\textrm{ln}}L({\widetilde{\alpha }})}{\partial \alpha \partial \alpha ^{\textrm{T}}}-\frac{1}{n}\frac{\partial ^2 {\textrm{ln}}L(\alpha _0)}{\partial \alpha \partial \alpha ^{\textrm{T}}}=o_{_P}(1)~~~~~uniformly~in~\widetilde{\alpha },\end{aligned}$$
(25)
$$\begin{aligned}&\frac{1}{n}\frac{\partial ^2 {\textrm{ln}}L(\alpha _0)}{\partial \alpha \partial \alpha ^{\textrm{T}}}+\Sigma _{\alpha }=o_{_P}(1),\end{aligned}$$
(26)
$$\begin{aligned}&\frac{1}{\sqrt{n}}\frac{\partial {\textrm{ln}}L(\alpha _0)}{\partial \alpha }\xrightarrow {L}N(0,\Sigma _{\alpha }), \end{aligned}$$
(27)

and \(\Sigma _{\alpha }\) is a nonsingular matrix.

To show (25), we need to show that each element of \(\frac{1}{n}\frac{\partial ^2 {\textrm{ln}}L({\widetilde{\alpha }})}{\partial \alpha \partial \alpha ^{\textrm{T}}}\) converges to \(\frac{1}{n}\frac{\partial ^2 {\textrm{ln}}L(\alpha _0)}{\partial \alpha \partial \alpha ^{\textrm{T}}}\) uniformly in probability, where

$$\begin{aligned} \frac{\partial {\textrm{ln}}L(\alpha )}{\partial \rho }=\frac{1}{\sigma ^2}(WY)^{\textrm{T}}P(A(\rho )Y-T^{\textrm{T}}\theta )-\textrm{tr}(WA^{-1}(\rho )), \end{aligned}$$

and

$$\begin{aligned} \frac{\partial {\textrm{ln}}L(\alpha )}{\partial \sigma ^2}=-\frac{n}{2\sigma ^2}+\frac{1}{2\sigma ^4}(A(\rho )Y-T^{\textrm{T}}\theta )^{\textrm{T}}P(A(\rho )Y-T^{\textrm{T}}\theta ). \end{aligned}$$

Then,

$$\begin{aligned}&\frac{\partial ^2 {\textrm{ln}}L(\alpha )}{\partial \rho \partial \rho }=-\frac{1}{\sigma ^2}(WY)^{\textrm{T}}PWY-\textrm{tr}\{(WA^{-1}(\rho ))^2\},\\&\frac{\partial ^2 {\textrm{ln}}L(\alpha )}{\partial \rho \partial \sigma ^2}=-\frac{1}{\sigma ^4}(WY)^{\textrm{T}}P(A(\rho )Y-T^{\textrm{T}}\theta ),\\&\frac{\partial ^2 {\textrm{ln}}L(\alpha )}{\partial \sigma ^2\partial \sigma ^2}=\frac{n}{2\sigma ^4}-\frac{1}{\sigma ^6}(A(\rho )Y-T^{\textrm{T}}\theta )^{\textrm{T}}P(A(\rho )Y-T^{\textrm{T}}\theta ). \end{aligned}$$

Noting that \(\frac{1}{\sigma ^2}\) appears only in linear, quadratic or cubic form in \(\frac{\partial ^2 {\textrm{ln}}L(\alpha )}{\partial \alpha \partial \alpha ^{\textrm{T}}}\), it is easy to show that (25) holds for all elements but the second derivative of \({\textrm{ln}}L(\alpha )\) with respect to \(\rho \). By the mean value theorem, we have \(\textrm{tr}[\mathcal {T}^2({\widetilde{\rho }})]=\textrm{tr}(\mathcal {T}^2)+2({\widetilde{\rho }}-\rho _0)\textrm{tr}[\mathcal {T}^3({\widetilde{\rho }}^{*})]\) for some \({\widetilde{\rho }}^{*}\) between \({\widetilde{\rho }}\) and \(\rho _0\), where \(\mathcal {T}=WA^{-1}(\rho _0),~\mathcal {T}({\widetilde{\rho }})=WA^{-1}({\widetilde{\rho }})\) and \(\mathcal {T}({\widetilde{\rho }}^{*})=WA^{-1}({\widetilde{\rho }}^{*})\). Consequently,

$$\begin{aligned} \frac{1}{n}\left( \frac{\partial ^2{\textrm{ln}}L({\widetilde{\alpha }})}{\partial \rho ^2}-\frac{\partial ^2{\textrm{ln}}L(\alpha _0)}{\partial \rho ^2}\right)= & {} \left( \frac{1}{\sigma _0^2}-\frac{1}{{\widetilde{\sigma }}^2}\right) \frac{1}{n}(WY)^{\textrm{T}}PWY\nonumber \\{} & {} -\frac{2}{n}({\widetilde{\rho }}-\rho _0)\textrm{tr}\{\mathcal {T}^3({\widetilde{\rho }}^*)\}. \end{aligned}$$
(28)

By Lemma 1, the first term in the above equation is \(o_{_P}(1)\) because \(\frac{1}{n}(WY)^{\textrm{T}}PWY=O_{_P}(1/l_n)\) (see Theorem 4.3 in Su and Jin (2010)). Since \(\mathcal {T}\) is uniformly bounded in row and column sums in a neighborhood of \(\rho _0\) by Assumption 1, the second term in (28) is also \(o_{_P}(1)\). Consequently, we have

$$\begin{aligned} \frac{1}{n}\left( \frac{\partial ^2{\textrm{ln}}L({\widetilde{\alpha }})}{\partial \rho ^2}-\frac{\partial ^2{\textrm{ln}}L(\alpha _0)}{\partial \rho ^2}\right) =o_{_P}(1). \end{aligned}$$

Next, we use Assumption 1, Lemma 1 and Lemma 3 to show (26).

$$\begin{aligned} {\textrm{E}}\left( \frac{1}{n}\frac{\partial ^2 {\textrm{ln}}L(\alpha _0)}{\partial \alpha \partial \alpha ^{\textrm{T}}}\right) =\begin{pmatrix} L_{11} &{} L_{12}\\ L_{21} &{} L_{22} \end{pmatrix}, \end{aligned}$$

where

$$\begin{aligned} L_{11}&=\frac{1}{n\sigma _0^2}R^{\textrm{T}}P_0R+\frac{1}{n}\textrm{tr}(\mathcal {T}^{\textrm{T}}P_0\mathcal {T})+\frac{1}{n}\textrm{tr}(\mathcal {T}^2),\\ L_{12}&=L_{21}^{\textrm{T}}=\frac{1}{n\sigma _0^4}R^{\textrm{T}}P_0G_0+\frac{1}{n\sigma _0^2}\textrm{tr}(\mathcal {T}^{\textrm{T}}P_0),\\ L_{22}&=-\frac{1}{2\sigma _0^4}+\frac{1}{n\sigma _0^4}\textrm{tr}(P_0)+\frac{1}{n\sigma _0^6}G_0^{\textrm{T}}P_0G_0. \end{aligned}$$

Denote

$$\begin{aligned} \Sigma _{\alpha }=\displaystyle \lim _{n \rightarrow \infty }{\textrm{E}}\left( -\frac{1}{n}\frac{\partial ^2 {\textrm{ln}} L(\alpha _0)}{\partial \alpha \partial \alpha ^{\textrm{T}}}\right) =\begin{pmatrix} {\widetilde{L}}_{11} &{} {\widetilde{L}}_{12}\\ {\widetilde{L}}_{21} &{} {\widetilde{L}}_{22} \end{pmatrix}, \end{aligned}$$

where

$$\begin{aligned} {\widetilde{L}}_{11}&=\frac{1}{n}\textrm{tr}(\mathcal {T}^{\textrm{T}}P_0\mathcal {T})+\frac{1}{n}\textrm{tr}(\mathcal {T}^2),\\ {\widetilde{L}}_{12}&={\widetilde{L}}_{21}^{\textrm{T}}=\frac{1}{n\sigma _0^2}\textrm{tr}(\mathcal {T}^{\textrm{T}}P_0),\\ {\widetilde{L}}_{22}&=\frac{1}{2\sigma _0^4}. \end{aligned}$$

By the law of large numbers, we have

$$\begin{aligned} \frac{1}{n}\frac{\partial ^2 {\textrm{ln}}L(\alpha _0)}{\partial \alpha \partial \alpha ^{\textrm{T}}}-{\textrm{E}}\left( \frac{1}{n}\frac{\partial ^2 {\textrm{ln}}L(\alpha _0)}{\partial \alpha \partial \alpha ^{\textrm{T}}}\right) \xrightarrow {P}0, \end{aligned}$$

that is

$$\begin{aligned} \frac{1}{n}\frac{\partial ^2 {\textrm{ln}}L(\alpha _0)}{\partial \alpha \partial \alpha ^{\textrm{T}}}+\Sigma _{\alpha }=o_{_P}(1). \end{aligned}$$

The proof of (27) is straightforward: one shows that the deviations of linear and quadratic functions of \(\varepsilon \) from their means are all \(o_{_P}(1)\). We apply the central limit theorem of Kelejian and Prucha (2009) to obtain

$$\begin{aligned} \frac{1}{\sqrt{n}}\frac{\partial {\textrm{ln}}L(\alpha _0)}{\partial \alpha }\xrightarrow {L}N(\textbf{0},I(\alpha _0)), \end{aligned}$$

where

$$\begin{aligned} I(\alpha _0)={\textrm{E}}\left( \frac{1}{\sqrt{n}}\frac{\partial {\textrm{ln}}L(\alpha _0)}{\partial \alpha }\frac{1}{\sqrt{n}}\frac{\partial {\textrm{ln}}L(\alpha _0)}{\partial \alpha ^{\textrm{T}}}\right) . \end{aligned}$$

According to Lee (2004),

$$\begin{aligned} {\textrm{E}}\left( \frac{1}{\sqrt{n}}\frac{\partial {\textrm{ln}}L(\alpha _0)}{\partial \alpha }\frac{1}{\sqrt{n}}\frac{\partial {\textrm{ln}}L(\alpha _0)}{\partial \alpha ^{\textrm{T}}}\right) =-{\textrm{E}}\left( \frac{1}{n}\frac{\partial ^2 {\textrm{ln}}L(\alpha _0)}{\partial \alpha \partial \alpha ^{\textrm{T}}}\right) +\Omega _{\alpha ,n}. \end{aligned}$$

Noting that \(\varepsilon \sim N(\textbf{0},\sigma ^2I)\), we have \(\Omega _{\alpha ,n}=0\), and hence \(I(\alpha _0)=\Sigma _{\alpha }+o_{_P}(1)\). Thus, \(\frac{1}{\sqrt{n}}\frac{\partial {\textrm{ln}}L(\alpha _0)}{\partial \alpha }\xrightarrow {L}N(\textbf{0},\Sigma _{\alpha })\).

Finally, we need to show that \(\Sigma _{\alpha }\) is a nonsingular matrix, that is, \(\Sigma _{\alpha }\xi =\textbf{0}\) if and only if \(\xi =\textbf{0}\), where \(\xi =(\xi _1^{\textrm{T}},\xi _2^{\textrm{T}})^{\textrm{T}}\) and \(\xi _1\), \(\xi _2\) are constant vectors. We have

$$\begin{aligned} {\left\{ \begin{array}{ll} L_{11}\xi _1+L_{12}\xi _2=0, \\ L_{21}\xi _1+L_{22}\xi _2=0. \end{array}\right. } \end{aligned}$$
(29)

It follows from (29) and Assumption 4 that \(\Sigma _{\alpha }\) is nonsingular. \(\square \)
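The score and Hessian entries displayed in this proof translate directly into code. The following sketch evaluates them for given \((\rho ,\sigma ^2)\), with the profiled regression part \(T^{\textrm{T}}\theta \) and the projection \(P=(I-S)^{\textrm{T}}(I-S)\) passed in as hypothetical inputs:

```python
import numpy as np

def score_and_hessian(rho, sigma2, Y, W, Ttheta, P):
    """Evaluate the displayed derivatives of lnL(alpha), alpha = (rho, sigma^2).
    Ttheta stands for T'theta and P for (I-S)'(I-S); both hypothetical inputs."""
    n = len(Y)
    A = np.eye(n) - rho * W
    e = A @ Y - Ttheta                       # A(rho)Y - T'theta
    WY = W @ Y
    T_mat = W @ np.linalg.inv(A)             # script-T = W A^{-1}(rho)
    s_rho = (WY @ P @ e) / sigma2 - np.trace(T_mat)
    s_sig = -n / (2 * sigma2) + (e @ P @ e) / (2 * sigma2**2)
    h_rr = -(WY @ P @ WY) / sigma2 - np.trace(T_mat @ T_mat)
    h_rs = -(WY @ P @ e) / sigma2**2
    h_ss = n / (2 * sigma2**2) - (e @ P @ e) / sigma2**3
    return np.array([s_rho, s_sig]), np.array([[h_rr, h_rs], [h_rs, h_ss]])
```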

Appendix B

To investigate the asymptotic properties of the estimators and the link function, let \(\alpha _0=(\rho _0,\sigma _0^2)^{\textrm{T}}\), \(\theta _0\) and \(\beta _0\) denote the true parameter values. From models (2) and (3), the reduced-form vector equation for Y can be written as follows:

$$\begin{aligned} Y=\xi \theta _0+G_0+\rho _0\mathcal {T}(\xi \theta _0+G_0)+A^{-1}(\rho _0)\varepsilon , \end{aligned}$$

where \(A^{-1}(\rho _0)=I+\rho _0\mathcal {T}\), \(\mathcal {T}=WA^{-1}(\rho _0)\), \(R=\mathcal {T}(\xi \theta _0+G_0)=WA^{-1}(\rho _0)(\xi \theta _0+G_0)\), \(G_0=(g(\beta _0^{\textrm{T}}X_1),\ldots ,g(\beta _0^{\textrm{T}}X_n))^{\textrm{T}}\) and \(A(\rho _0)\) is nonsingular. Define \(Q(\rho )=\mathop {\max }\limits _{\sigma ^{*2}}{\textrm{E}}[{\textrm{ln}}L(\rho )]\). The optimal solution of this maximization problem is

$$\begin{aligned} \sigma ^{*2}(\rho )= & {} {\textrm{E}}\Big \{\frac{1}{n}(A(\rho )Y-T^{\textrm{T}}\theta )^{\textrm{T}}(I-S)^{\textrm{T}}(I-S)(A(\rho )Y-T^{\textrm{T}}\theta )\Big \}\nonumber \\= & {} \frac{1}{n}(G_0+(\rho _0-\rho )R)^{\textrm{T}}(I-S)^{\textrm{T}}(I-S)(G_0+(\rho _0-\rho )R)\nonumber \\{} & {} +2{\textrm{E}}\Big \{\frac{1}{n}(G_0+(\rho _0-\rho )R)^{\textrm{T}}(I-S)^{\textrm{T}}(I-S)(\xi (\theta _0-\theta )-V^{\textrm{T}}\theta )\Big \}\nonumber \\{} & {} +{\textrm{E}}\Big \{\frac{1}{n}(\xi (\theta _0-\theta )-V^{\textrm{T}}\theta )^{\textrm{T}}(I-S)^{\textrm{T}}(I-S)(\xi (\theta _0-\theta )-V^{\textrm{T}}\theta )\Big \}\nonumber \\{} & {} +\frac{\sigma _0^2}{n}\textrm{tr}\Big \{(A(\rho )A^{-1}(\rho _0))^{\textrm{T}}(I-S)^{\textrm{T}}(I-S)A(\rho )A^{-1}(\rho _0)\Big \}. \end{aligned}$$
(30)

Consequently, we have

$$\begin{aligned} Q(\rho )=-\frac{n}{2}({\textrm{ln}}2\pi +1)-\frac{n}{2}{\textrm{ln}}\sigma ^{*2}(\rho )+{\textrm{ln}}\vert A(\rho )\vert . \end{aligned}$$
(31)
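For simulation purposes, the reduced form displayed above can be sampled directly; a minimal sketch with hypothetical inputs:

```python
import numpy as np

def simulate_reduced_form(rho0, W, xi_theta, G0, sigma0, seed=None):
    """Draw Y from the reduced form
    Y = xi@theta0 + G0 + rho0 * T (xi@theta0 + G0) + A^{-1}(rho0) eps,
    where T = W A^{-1}(rho0); xi_theta stands for the n-vector xi@theta0
    (all inputs hypothetical)."""
    rng = np.random.default_rng(seed)
    n = len(G0)
    A_inv = np.linalg.inv(np.eye(n) - rho0 * W)
    mean_part = xi_theta + G0
    eps = sigma0 * rng.standard_normal(n)
    return mean_part + rho0 * (W @ (A_inv @ mean_part)) + A_inv @ eps
```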

To prove the consistency and asymptotic properties of the estimators and the link function, we first present some regularity conditions.

Assumption 1

(i) For all \(\vert \rho \vert <1\), the matrix \(A(\rho )\) is nonsingular. (ii) The elements of the spatial weight matrix W are non-random, with \(w_{ii}=0\), \(w_{ij}=O(1/l_n)\) for all \(i,j = 1,2,\ldots ,n\), and \(\displaystyle \lim _{n \rightarrow \infty } l_n/n = 0\). (iii) The matrices W and \(A^{-1}(\rho _0)\) are uniformly bounded in both row and column sums in absolute value. (iv) The matrix \(A^{-1}(\rho )\) is uniformly bounded in either row or column sums, uniformly in \(\rho \) in a compact convex parameter space \(\triangle \). The true value \(\rho _0\) is an interior point of \(\triangle \).

Assumption 2

(i) The density function f(t) of \(\beta ^\textrm{T}x_i\) is positive and satisfies a first-order Lipschitz condition for \(\beta \) in a neighborhood of \(\beta _0\). Furthermore, f(t) is bounded on \(T^*\), where \(T^*=\{t=\beta ^\textrm{T}x_i:x_i\in \mathbb {R}^q\}\); it is twice differentiable at \(t_{i,0}=\beta _0^\textrm{T}x_i\) and uniformly bounded away from zero on \(T^*\). (ii) \(g(\cdot )\) has a continuous second derivative on \(T^*\) and satisfies a first-order Lipschitz condition at any \(t\in T^*\); moreover, there exists a positive constant \(M_g\) such that \(\vert g(t) \vert \le M_g \) for all \(t\in T^*\).

Assumption 3

(i) The kernel function \(K(\cdot )\) is a bounded continuous symmetric function with a bounded closed support, satisfying the Lipschitz condition of order 1 and \(\int v^2k^2(v)dv\ne 0\). (ii) Let \(f_0(\cdot )\) be the density function of \(\beta _0^{\textrm{T}}Z\), and let \(\mu _l=\int k(v)v^l dv,~v_l=\int k^l(v) dv\), where l is a nonnegative integer. (iii) \(\vert k^2(\cdot )\vert \le M_k\), where \(M_k\) is a constant. (iv) If \(n\rightarrow \infty \) and \(h \rightarrow 0\), then \(nh\rightarrow \infty \), \(nh^2/({\textrm{ln}}n)^2\rightarrow \infty \), \(nh^4{\textrm{ln}}n\rightarrow 0\), \(nhh_1^3/({\textrm{ln}}n)^2\rightarrow \infty \), and \(\mathop {\lim \sup }\limits _{n\rightarrow \infty }nh_1^5 <\infty \).
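The kernel constants \(\mu _l=\int k(v)v^l dv\) and \(v_l=\int k^l(v)dv\) of Assumption 3 are easy to evaluate numerically; for instance, for the Epanechnikov kernel (the kernel choice here is purely illustrative):

```python
import numpy as np

# kernel constants of Assumption 3 for the Epanechnikov kernel
# k(v) = 0.75 * (1 - v^2) on [-1, 1]
v = np.linspace(-1.0, 1.0, 200001)
dv = v[1] - v[0]
k = 0.75 * (1.0 - v**2)
mu2 = np.sum(k * v**2) * dv           # mu_2 = int v^2 k(v) dv = 1/5
v2 = np.sum(k**2) * dv                # v_2 = int k(v)^2 dv = 3/5
print(round(mu2, 3), round(v2, 3))    # 0.2 0.6
```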

Assumption 4

The limit \(\displaystyle \lim _{n\rightarrow \infty }\Big \{\frac{1}{n}\textrm{tr}(\mathcal {T}^\textrm{T}P_0\mathcal {T})+\frac{1}{n}\textrm{tr}(\mathcal {T}^2)-\frac{1}{n^2}\textrm{tr}^2(\mathcal {T}^\textrm{T}P_0) \Big \}\) exists and is nonnegative, where \(P_0=(I-S(\beta _0^\textrm{T}Z_i))^\textrm{T}(I-S(\beta _0^\textrm{T}Z_i))\), \(S(\beta _0^{\textrm{T}}Z_i)=(S_1^{\textrm{T}}(\beta _0^{\textrm{T}}Z_1),\ldots ,S_n^{\textrm{T}}(\beta _0^{\textrm{T}}Z_n))^{\textrm{T}}\).

Assumption 5

\(\mathcal {A}(\beta _0(\lambda ),\lambda )\) is a positive definite matrix for \(\lambda \in \Lambda \), where

$$\begin{aligned} \mathcal {A}(\beta _0(\lambda ),\lambda )={\textrm{E}}\Big \{[{g}_0'(\lambda ;\beta _0^\textrm{T}(\lambda )Z_{ib}(\lambda ))]^2J_{\beta _0^{(r)}(\lambda )}^\textrm{T}{\widetilde{Z}}_{ib}(\lambda ){\widetilde{Z}}_{ib}^{\textrm{T}}(\lambda )J_{\beta _0^{(r)}(\lambda )} \Big \} \end{aligned}$$

with \({\widetilde{Z}}_{ib}(\lambda )=Z_{ib}(\lambda )-{\textrm{E}}[Z_{ib}(\lambda )\vert \beta _0^\textrm{T}(\lambda )Z_{ib}(\lambda )]\).

Assumption 6

The extrapolant function is theoretically exact.

Remark 7.1

Assumption 1 involves the basic characteristics of the spatial weight matrix. It is similar to Assumptions 2, 5 and 7 of Lee (2004) and Assumptions 3–4 of Su and Jin (2010), and it is always satisfied if \(l_n\) is a bounded sequence. In Anselin (1988), it is routine to row-normalize W so that its ith row is \(w_i=(w_{i1}, w_{i2},\dots ,w_{in})/\sum _{j=1}^n w_{ij}\), where \(w_{ij}\ge 0\) represents a function of the spatial distance between the ith and jth units in some space; the weighting operation can then be interpreted as averaging the adjacent values (see the sketch after this remark). Assumption 2 provides the essential features of the regressors and disturbances for the model. Assumption 3 involves the kernel function and the bandwidth sequence; it is a standard condition in the nonparametric literature on local linear estimation. For simplicity of proof, we denote \(P=(I-S)^{\textrm{T}}(I-S)\). Assumption 4 is necessary for the consistency and asymptotic normality of the estimators. Assumption 5 ensures that the estimator \({\hat{\beta }}_{SIMEX}\) has a well-defined asymptotic variance, and Assumption 6 is a common assumption for the SIMEX method (see Liang and Ren 2005).
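The row-normalization of W described in the remark can be coded in a few lines; a minimal sketch:

```python
import numpy as np

def row_normalize(W):
    """Row-normalize a nonnegative spatial weight matrix as in Anselin (1988):
    w_ij <- w_ij / sum_j w_ij, with w_ii = 0, so that WY averages neighbors."""
    W = np.array(W, dtype=float)
    np.fill_diagonal(W, 0.0)              # w_ii = 0, as in Assumption 1(ii)
    rs = W.sum(axis=1, keepdims=True)
    rs[rs == 0.0] = 1.0                   # leave isolated units with a zero row
    return W / rs
```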

Proof of Lemma 1

According to Lemma 1 and Theorem 1 in Cheng and Chen (2021), we have \(A(\rho )A^{-1}(\rho _0)\varepsilon =\varepsilon +(\rho _0-\rho )\mathcal {T}\varepsilon \). Since \(H_1(\rho )=\frac{1}{n}[(\rho _0-\rho )R+G_0+\xi (\theta _0-\theta )-V^{\textrm{T}}\theta ]^{\textrm{T}}(I-S)^{\textrm{T}}(I-S)\varepsilon +\frac{\rho _0-\rho }{n}[(\rho _0-\rho )R+G_0+\xi (\theta _0-\theta )-V^{\textrm{T}}\theta ]^{\textrm{T}}(I-S)^{\textrm{T}}(I-S)\mathcal {T}\varepsilon ,\) it follows that \(H_1(\rho )=o_{_P}(1).\) Similarly, we have \(H_3(\rho )=\frac{1}{n}[(\rho _0-\rho )R+G_0]^{\textrm{T}}(I-S)^{\textrm{T}}(I-S)V^{\textrm{T}}\theta =o_{_P}(1).\) By \({\textrm{E}}(H_2(\rho ))=\frac{\sigma _0^2}{n}\textrm{tr}\left\{ (A(\rho )A^{-1}(\rho _0))^{\textrm{T}}(I-S)^{\textrm{T}}(I-S)A(\rho )A^{-1}(\rho _0)\right\} \) and Theorem A in Mack and Silverman (1982), we have \(H_2(\rho )\xrightarrow {P}{\textrm{E}}(H_2(\rho ))\). This completes the proof of Lemma 1. \(\square \)

Proof of Lemma 2

Using a method similar to that of Theorem 3.2 in Huang and Wang (2021), we know that \({\hat{\beta }}^{(r)}(\lambda )\) is the solution to

$$\begin{aligned}{} & {} \frac{1}{\sqrt{n}}\sum _{i=1}^n\left\{ Y_i-\rho _0\sum _{j\ne i}w_{ij}Y_j-\theta _0^{\textrm{T}}T_i-\hat{g}(\rho _0,{\hat{\beta }},\lambda ;{\hat{\beta }}^\textrm{T}Z_{ib}(\lambda ))\right\} \\{} & {} \quad {\hat{g}}'(\rho _0,{\hat{\beta }},\lambda ;{\hat{\beta }}^\textrm{T}Z_{ib}(\lambda ))J^{\textrm{T}}_{\beta ^{(r)}}Z_{ib}(\lambda )=0. \end{aligned}$$

Through direct calculation, we find that

$$\begin{aligned}&\frac{1}{\sqrt{n}}\sum _{i=1}^n\varepsilon _{ib}(\lambda )A_{is}-\frac{1}{\sqrt{n}}\sum _{i=1}^n[{\hat{g}}(\rho _0,{\hat{\beta }},\lambda ;{\hat{\beta }}^\textrm{T}(\lambda )Z_{ib}(\lambda ))-g_0(\lambda ;\beta ^{\textrm{T}}_0(\lambda )Z_{ib}(\lambda ))]A_{is}\\&\quad +\frac{1}{\sqrt{n}}\sum _{i=1}^n\varepsilon _{ib}(\lambda )B_{is}-\frac{1}{\sqrt{n}}\sum _{i=1}^n[{\hat{g}}(\rho _0,{\hat{\beta }},\lambda ;{\hat{\beta }}^\textrm{T}(\lambda )Z_{ib}(\lambda ))\\&\quad -g_0(\lambda ;\beta ^{\textrm{T}}_0(\lambda )Z_{ib}(\lambda ))]B_{is}=0, \end{aligned}$$

where

$$\begin{aligned} A_{is}=g'_0(\lambda ;\beta ^{\textrm{T}}_0(\lambda )Z_{ib}(\lambda ))J^{\textrm{T}}_{\beta _0^{(r)}(\lambda )}Z_{ib}(\lambda ), \end{aligned}$$

and

$$\begin{aligned} B_{is}=\left[ {\hat{g}}'(\rho _0,{\hat{\beta }},\lambda ;{\hat{\beta }}^\textrm{T}(\lambda )Z_{ib}(\lambda ))-g_0'(\lambda ;\beta ^{\textrm{T}}_0(\lambda )Z_{ib}(\lambda ))\right] J^{\textrm{T}}_{\beta _0^{(r)}(\lambda )}Z_{ib}(\lambda ). \end{aligned}$$

So, we have the equation

$$\begin{aligned}&\frac{1}{\sqrt{n}}\sum _{i=1}^n\varepsilon _{ib}(\lambda )A_{is}-\frac{1}{\sqrt{n}}\sum _{i=1}^n\left[ {\hat{g}}(\rho _0,{\hat{\beta }},\lambda ;{\hat{\beta }}^\textrm{T}(\lambda )Z_{ib}(\lambda ))-g_0(\lambda ;\beta ^{\textrm{T}}_0(\lambda )Z_{ib}(\lambda ))\right] A_{is}\nonumber \\&\quad +o_{_P}(1)=0. \end{aligned}$$
(32)

Obviously,

$$\begin{aligned} \begin{aligned}&{\hat{g}}(\rho _0,{\hat{\beta }},\lambda ;{\hat{\beta }}^\textrm{T}(\lambda )Z_{ib}(\lambda ))-g_0(\lambda ;\beta ^{\textrm{T}}_0(\lambda )Z_{ib}(\lambda ))\\&\quad =g'_0(\lambda ;\beta ^{\textrm{T}}_0(\lambda )Z_{ib}(\lambda ))({\hat{\beta }}^{(r)}(\lambda )-\beta _0^{(r)}(\lambda ))J^{\textrm{T}}_{\beta _0^{(r)}(\lambda )}Z_{ib}(\lambda )\\&\qquad +{\hat{g}}(\rho _0,\hat{\beta },\lambda ;\beta ^{\textrm{T}}_0(\lambda )Z_{ib}(\lambda ))\\&\quad -g_0(\lambda ;\beta ^{\textrm{T}}_0(\lambda )Z_{ib}(\lambda ))+o_{_P}(n^{-1/2}). \end{aligned} \end{aligned}$$
(33)

Substituting (33) into (32) and using the Ergodic theorem, we have

$$\begin{aligned} \begin{aligned}&\frac{1}{\sqrt{n}}\sum _{i=1}^n\varepsilon _{ib}(\lambda )A_{is}-\sqrt{n}{\textrm{E}}[A_{is}A_{is}^{\textrm{T}}]({\hat{\beta }}^{(r)}(\lambda )-\beta _0^{(r)}(\lambda ))\\&\quad -\frac{1}{\sqrt{n}}\sum _{i=1}^n\left[ {\hat{g}}(\rho _0,{\hat{\beta }},\lambda ;\beta _0^\textrm{T}(\lambda )Z_{ib}(\lambda ))-g_0(\lambda ;\beta ^{\textrm{T}}_0(\lambda )Z_{ib}(\lambda ))\right] A_{is}+o_{_P}(1)=0. \end{aligned} \end{aligned}$$
(34)

On the other hand, following the estimation procedure, \((\hat{a}_0,\hat{a}_1)\) minimize

$$\begin{aligned} \sum _{i=1}^n\{Y_i-\rho _0\sum _{j\ne i}w_{ij}Y_j-[a_0+a_1(\hat{\beta }^{\textrm{T}}(\lambda )Z_{ib}(\lambda )-t_0)]\}^2K_h(\hat{\beta }^{\textrm{T}}(\lambda )Z_{ib}(\lambda )-t_0), \end{aligned}$$

then \((\hat{a}_0,\hat{a}_1)\) satisfies the formula

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n\{Y_i-&\rho _0\sum _{j\ne i}w_{ij}Y_j-[a_0+ha_1(\hat{\beta }^{\textrm{T}}(\lambda )Z_{ib}(\lambda )-t_0)/h]\}\\&\begin{pmatrix} 1\\ \frac{\hat{\beta }^{\textrm{T}}(\lambda )Z_{ib}(\lambda )-t_0}{h} \end{pmatrix}K_h(\hat{\beta }^{\textrm{T}}(\lambda )Z_{ib}(\lambda )-t_0)=0. \end{aligned}$$

Thus,

$$\begin{aligned} \begin{pmatrix} {\hat{a}}_0-a_0\\ h({\hat{a}}_1-a_1) \end{pmatrix}&=\frac{1}{n}\sum _{i=1}^n\varepsilon _{ib}(\lambda )f_0^{-1}(t_0)\begin{pmatrix} 1\\ \frac{\hat{\beta }^{\textrm{T}}(\lambda )Z_{ib}(\lambda )-t_0}{h\mu _2} \end{pmatrix}K_h(\beta _0^{\textrm{T}}(\lambda )Z_{ib}(\lambda )-t_0)\\&\quad -f_0^{-1}(t_0)\begin{pmatrix} g_0'(t_0){\textrm{E}}\left( Z_{ib}^{\textrm{T}}(\lambda )\vert \beta _0^{\textrm{T}}(\lambda )Z_{ib}(\lambda )=t_0\right) f_0(t_0)\\ 0 \end{pmatrix}\\&\quad ({\hat{\beta }}^{(r)}(\lambda )-\beta _0^{(r)}(\lambda ))J^{\textrm{T}}_{\beta ^{(r)}(\lambda )}+o_{_P}(n^{-\frac{1}{2}}). \end{aligned}$$

Then, it can be shown that

$$\begin{aligned} \begin{aligned} {\hat{g}}(\rho _0,{\hat{\beta }},\lambda ;t_0)-g_0(\lambda ;t_0)=&\frac{1}{n}\sum _{i=1}^n\varepsilon _{ib}(\lambda )f_0^{-1}(t_0)K_h(\beta _0^{\textrm{T}}(\lambda )Z_{ib}(\lambda )-t_0)\\&-g_0'(\lambda ;t_0)({\hat{\beta }}^{(r)}(\lambda )-\beta _0^{(r)}(\lambda ))J^{\textrm{T}}_{\beta ^{(r)}(\lambda )}{\textrm{E}}\\&\quad \left( Z_{ib}^{\textrm{T}}(\lambda )\vert \beta _0^{\textrm{T}}(\lambda )Z_{ib}(\lambda )=t_0\right) +o_{_P}(n^{-1/2}). \end{aligned} \end{aligned}$$
(35)

Substituting (35) into (34) and again applying the Ergodic theorem, we get

$$\begin{aligned} \begin{aligned}&\frac{1}{\sqrt{n}}\sum _{i=1}^n\varepsilon _{ib}(\lambda )A_{is}-\sqrt{n}A_n({\hat{\beta }}^{(r)}(\lambda )-\beta _0^{(r)}(\lambda ))\\&\quad -\frac{1}{\sqrt{n}}\sum _{i=1}^n\frac{1}{n}\sum _{j=1}^nf_0^{-1}(\beta _0^{\textrm{T}}(\lambda )Z_{ib}(\lambda ) )K_h(\beta _0^{\textrm{T}}(\lambda )Z_{jb}(\lambda )-\beta _0^{\textrm{T}}(\lambda )Z_{ib}(\lambda ))\\&\quad A_{is}\varepsilon _{jb}(\lambda )+o_{_P}(1)=0, \end{aligned} \end{aligned}$$
(36)

where

$$\begin{aligned} \begin{aligned} A_n=&{\textrm{E}}[A_{is}A^{\textrm{T}}_{is}]-{\textrm{E}}\Big \{{\textrm{E}}\left( J^{\textrm{T}}_{\beta _0^{(r)}(\lambda )}Z_{ib}^{\textrm{T}}(\lambda )g'_0(\lambda ;\beta ^{\textrm{T}}_0(\lambda )Z_{ib}(\lambda ))\vert \beta _0^{\textrm{T}}(\lambda )Z_{ib}(\lambda )=t_0\right) \\&g'_0(\lambda ;\beta ^{\textrm{T}}_0(\lambda )Z_{ib}(\lambda )){\textrm{E}}\left( Z_{ib}^{\textrm{T}}(\lambda )\vert \beta _0^{\textrm{T}}(\lambda )Z_{ib}(\lambda )=t_0\right) J_{\beta _0^{(r)}(\lambda )}\Big \}. \end{aligned} \end{aligned}$$

We handle the third term in (36) by interchanging the order of summation, obtaining

$$\begin{aligned} -\frac{1}{\sqrt{n}}\sum _{j=1}^n\varepsilon _{jb}(\lambda ) \frac{1}{n}\sum _{i=1}^nf_0^{-1}(\beta _0^{\textrm{T}}(\lambda )Z_{ib}(\lambda ) )K_h(\beta _0^{\textrm{T}}(\lambda )Z_{jb}(\lambda )-\beta _0^{\textrm{T}}(\lambda )Z_{ib}(\lambda ))A_{is}. \end{aligned}$$

Furthermore, by the Ergodic theorem, this term is asymptotically equivalent to

$$\begin{aligned} -\frac{1}{\sqrt{n}}\sum _{i=1}^n\varepsilon _{ib}(\lambda )J^{\textrm{T}}_{\beta _0^{(r)}(\lambda )} {\textrm{E}}\left( Z_{ib}(\lambda )\vert \beta _0^{\textrm{T}}(\lambda )Z_{ib}(\lambda )\right) g'_0(\lambda ;\beta ^{\textrm{T}}_0(\lambda )Z_{ib}(\lambda )). \end{aligned}$$
(37)

Combining (36) and (37), we obtain

$$\begin{aligned} \sqrt{n}({\hat{\beta }}^{(r)}(\lambda )-\beta _0^{(r)}(\lambda ))=&A_n^{-1}\frac{1}{\sqrt{n}}\sum _{i=1}^n\varepsilon _{ib}(\lambda )g'_0(\lambda ; \beta ^{\textrm{T}}_0(\lambda )Z_{ib}(\lambda ))J^{\textrm{T}}_{\beta _0^{(r)}(\lambda )}\\&\times \Big [Z_{ib}(\lambda )-{\textrm{E}}\left( Z_{ib}(\lambda )\vert \beta _0^{\textrm{T}}(\lambda )Z_{ib}(\lambda )\right) \Big ]. \end{aligned}$$

After a simple rearrangement, the proof is complete. \(\square \)
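The weighted least-squares problem defining \((\hat{a}_0,\hat{a}_1)\) in this proof has a closed-form solution; a minimal sketch (Gaussian kernel and all inputs hypothetical):

```python
import numpy as np

def local_linear(t0, index, resid, h):
    """Solve min_{a0,a1} sum_i [resid_i - a0 - a1*(index_i - t0)]^2 K_h(index_i - t0),
    the weighted least-squares problem defining (a0_hat, a1_hat); index_i plays
    the role of beta'Z_ib(lam) and resid_i of Y_i - rho0*sum_j w_ij Y_j."""
    u = (index - t0) / h
    w = np.exp(-0.5 * u**2) / (np.sqrt(2.0 * np.pi) * h)    # K_h weights
    X = np.column_stack([np.ones_like(index), index - t0])
    a = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * resid))
    return a[0], a[1]    # a0_hat estimates g(t0); a1_hat estimates g'(t0)
```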

About this article

Cite this article

Wang, K., Wang, D. Estimation for partially linear single-index spatial autoregressive model with covariate measurement errors. Stat Papers (2024). https://doi.org/10.1007/s00362-024-01551-3