
Robust estimation and inference of spatial panel data models with fixed effects

  • Original Paper
  • Spatial statistics

Japanese Journal of Statistics and Data Science

Abstract

It is well established that quasi maximum likelihood (QML) estimation of spatial regression models is generally inconsistent under unknown cross-sectional heteroskedasticity (CH), and CH-robust methods have been developed. The same issue arises for spatial panel data (SPD) models, but similar studies based on the QML approach do not seem to have been carried out. This paper focuses on the SPD model with fixed effects (FE). We argue that under unknown CH the QML estimator for the SPD-FE model is inconsistent in general, although there are ‘special cases’ where it may remain consistent; the exact conditions may be impossible to check, as in practice the type of CH is generally unknown. Thus, we introduce a new set of estimation and inference methods based on the adjusted quasi scores (AQS), which are fully robust against unknown CH. Consistency and asymptotic normality of the proposed AQS estimators are established. Robust standard error estimates are provided and their consistency is proved. To improve finite-sample performance, a set of AQS methods based on concentrated quasi scores is also introduced and its asymptotic properties examined. Extensive Monte Carlo results show that the new estimator outperforms the QML estimator even when the latter seems robust.


References

  • Amemiya, T. (1985). Advanced econometrics. Cambridge, MA: Harvard University Press.

  • Anselin, L. (1988). Spatial econometrics: Methods and models. The Netherlands: Kluwer.

  • Anselin, L., Le Gallo, J., & Jayet, H. (2008). Spatial panel econometrics. In L. Mátyás & P. Sevestre (Eds.), The econometrics of panel data: Fundamentals and recent developments in theory and practice (pp. 625–660). Berlin, Heidelberg: Springer.

  • Badinger, H., & Egger, P. (2011). Estimation of higher-order spatial autoregressive cross-section models with heteroskedastic disturbances. Papers in Regional Science, 90, 213–235.

  • Badinger, H., & Egger, P. (2015). Fixed effects and random effects estimation of higher-order spatial autoregressive models with spatial autoregressive and heteroskedastic disturbances. Spatial Economic Analysis, 10(1), 11–35.

  • Baltagi, B., Egger, P., & Pfaffermayr, M. (2013). A generalised spatial panel data model with random effects. Econometric Reviews, 32, 650–685.

  • Baltagi, B., Egger, P., & Kesina, M. (2016). Firm level productivity spillovers in China’s chemical industry: A spatial Hausman-Taylor approach. Journal of Applied Econometrics, 31, 214–248.

  • Baltagi, B., Song, S. H., & Koh, W. (2003). Testing panel data regression models with spatial error correlation. Journal of Econometrics, 117, 123–150.

  • Baltagi, B., & Yang, Z. L. (2013). Heteroskedasticity and non-normality robust LM tests of spatial dependence. Regional Science and Urban Economics, 43, 725–739.

  • Davidson, J. (1994). Stochastic limit theory. Oxford: Oxford University Press.

  • Fingleton, B. (2008). A generalised method of moments estimator for a spatial panel model with an endogenous spatial lag and spatial moving average errors. Spatial Economic Analysis, 3, 27–44.

  • Griffith, D. A. (1988). Advanced spatial statistics. Dordrecht, The Netherlands: Kluwer.

  • Hsieh, C. S., & Lee, L. F. (2014). A social interactions model with endogenous friendship formation and selectivity. Journal of Applied Econometrics, 31, 301–319.

  • Jin, F., & Lee, L. F. (2012). Approximated likelihood and root estimators for spatial interaction in spatial autoregressive models. Regional Science and Urban Economics, 42, 446–458.

  • Kapoor, M., Kelejian, H. H., & Prucha, I. R. (2007). Panel data models with spatially correlated error components. Journal of Econometrics, 140, 97–130.

  • Kelejian, H. H., & Piras, G. (2016). An extension of the J-test to spatial panel data framework. Journal of Applied Econometrics, 31, 387–402.

  • Kelejian, H. H., & Prucha, I. R. (2001). On the asymptotic distribution of the Moran \(I\) test statistic with applications. Journal of Econometrics, 104, 219–257.

  • Kelejian, H. H., & Prucha, I. R. (2007). HAC estimation in a spatial framework. Journal of Econometrics, 140, 131–154.

  • Kelejian, H. H., & Prucha, I. R. (2010). Specification and estimation of spatial autoregressive models with autoregressive and heteroskedastic disturbances. Journal of Econometrics, 157, 53–67.

  • Lee, L. F. (2004). Asymptotic distributions of quasi-maximum likelihood estimators for spatial autoregressive models. Econometrica, 72, 1899–1925.

  • Lee, L. F., & Yu, J. (2010a). Estimation of spatial autoregressive panel data models with fixed effects. Journal of Econometrics, 154, 165–185.

  • Lee, L. F., & Yu, J. (2010b). Some recent developments in spatial panel data models. Regional Science and Urban Economics, 40, 255–271.

  • Lee, L. F., & Yu, J. (2012). Spatial panels: Random components vs. fixed effects. International Economic Review, 53, 1369–1412.

  • Lee, L. F., & Yu, J. (2015). Spatial panel data models. In B. H. Baltagi (Ed.), The Oxford handbook of panel data (pp. 363–401). Oxford: Oxford University Press.

  • LeSage, J. (1997). Bayesian estimation of spatial autoregressive models. International Regional Science Review, 20, 113–129.

  • Li, L. Y., & Yang, Z. L. (2019). Spatial dynamic panel data models with correlated random effects. Journal of Econometrics, conditionally accepted.

  • Li, L. Y., & Yang, Z. L. (2020). M-estimation of fixed effects spatial dynamic panel data models with small \(T\) and unknown heteroskedasticity. Regional Science and Urban Economics, forthcoming.

  • Lin, X., & Lee, L. F. (2010). GMM estimation of spatial autoregressive models with unknown heteroskedasticity. Journal of Econometrics, 157, 34–52.

  • Liu, S. F., & Yang, Z. L. (2015). Modified QML estimation of spatial autoregressive models with unknown heteroskedasticity and non-normality. Regional Science and Urban Economics, 52, 50–70.

  • Millimet, D. L., & Roy, J. (2016). Empirical tests of the pollution haven hypothesis when environmental regulations is endogenous. Journal of Applied Econometrics, 31, 652–677.

  • Moscone, F., & Tosetti, E. (2011). GMM estimation of spatial panels with fixed effects and unknown heteroskedasticity. Regional Science and Urban Economics, 41, 487–497.

  • Mundlak, Y. (1978). On the pooling of time series and cross section data. Econometrica, 46, 69–85.

  • Neyman, J., & Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica, 16, 1–32.

  • Newey, W. K. (1991). Uniform convergence in probability and stochastic equicontinuity. Econometrica, 59, 1161–1167.

  • Robinson, P. M., & Rossi, F. (2015). Refinements in maximum likelihood inference on spatial autocorrelation in panel data. Journal of Econometrics, 189, 447–456.

  • Serfling, R. J. (1980). Approximation theorems of mathematical statistics. London: Wiley.

  • van der Vaart, A. W. (1998). Asymptotic statistics. Cambridge: Cambridge University Press.

  • White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48, 817–838.

  • Xu, Y. H., & Yang, Z. L. (2020). Specification tests for temporal heterogeneity in spatial panel data models with fixed effects. Regional Science and Urban Economics, forthcoming.

  • Yang, Z. L. (2018). Unified M-estimation of fixed-effects spatial dynamic models with short panels. Journal of Econometrics, 205, 423–447.

  • Yang, Z. L., Yu, J., & Liu, S. F. (2016). Bias correction and refined inferences for fixed effects spatial panel data models. Regional Science and Urban Economics, 61, 52–72.

  • Yu, J., de Jong, R., & Lee, L. F. (2008). Quasi-maximum likelihood estimators for spatial dynamic panel data with fixed effects when both \(n\) and \(T\) are large. Journal of Econometrics, 146, 118–134.

Acknowledgements

We thank the two referees for their helpful comments. Zhenlin Yang gratefully acknowledges the financial support from Singapore Management University under Grant C244/MSS16E003.

Author information

Correspondence to Zhenlin Yang.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Some useful lemmas

The following lemmas extend selected lemmas from Lee (2004), Yu et al. (2008), Lin and Lee (2010), and Kelejian and Prucha (2010), which are essential in proving our main results.

Lemma A.1

For \({\mathbb {X}}_N(\rho )\) defined in Sect. 2, under Assumptions 1, 3 and 4, the projection matrices \({\mathbb {P}}_N(\rho )={\mathbb {X}}_N(\rho )[{\mathbb {X}}_{N}^{\prime }(\rho ){\mathbb {X}}_N(\rho )]^{-1}{\mathbb {X}}_{N}^{\prime }(\rho )\) and \({\mathbb {M}}_N(\rho )={\mathbf {I}}_N-{\mathbb {P}}_N(\rho )\) are uniformly bounded in both row and column sums, for each \(\rho \) in its compact parameter space.

Lemma A.2

Let \({\mathbf {A}}_N\) and \({\mathbf {B}}_N\) be \(N\times N\) matrices, uniformly bounded in both row and column sums, and \({\mathbb {M}}_N(\rho )\) be defined in Lemma A.1. Then, we have,

  1. (i)

the elements of \({\mathbf {A}}_{N}\) are uniformly bounded,

  2. (ii)

    \(\mathtt{tr}({\mathbf {A}}^m_N)=O(N)\) for \(m\ge 1\),

  3. (iii)

    \(\mathtt{tr}({\mathbf {A}}'_N{\mathbf {A}}_N)=O(N)\),

  4. (iv)

    \(\mathtt{tr}(({\mathbb {M}}_N(\rho ){\mathbf {A}}_N)^m)=\mathtt{tr}({\mathbf {A}}^m_N)+O(1)\) for \(m\ge 1\) and each \(\rho \),

  5. (v)

    \(\mathtt{tr}(({\mathbf {A}}'_N{\mathbb {M}}_N(\rho ){\mathbf {A}}_N)^m) =\mathtt{tr}(({\mathbf {A}}'_N{\mathbf {A}}_N)^m)+O(1)\) for \(m\ge 1\) and each \(\rho \),

  6. (vi)

    \({\mathbf {A}}_N{\mathbf {B}}_N\) is uniformly bounded in both row and column sums.

Lemma A.3

Let \({\mathbf {A}}_N\) be an \(N\times N\) matrix of uniformly bounded column sums, \({\mathbf {C}}_{N}\) be an \(N\times k\) matrix (\(k<N\)) of uniformly bounded elements, and \({\mathbb {V}}_{N}\) be an \(N\times 1\) random vector of independent elements with zero mean, and uniformly bounded third absolute moments. Then,

  1. (i)

    \(\frac{1}{\sqrt{N}}{\mathbf {C}}_{N}^{\prime }{\mathbf {A}}_{N}{\mathbb {V}} = O_{p}(1)\)  and  \(\frac{1}{N}{\mathbf {C}}_{N}^{\prime }{\mathbf {A}}_{N}{\mathbb {V}} = o_{p}(1)\),

  2. (ii)

    \(\frac{1}{\sqrt{N}}{\mathbf {C}}_{N}^{\prime }{\mathbf {A}}_{N}{\mathbb {V}} \overset{D}{\rightarrow } N(0, \lim _{N\rightarrow \infty }\frac{1}{N}{\mathbf {C}}_{N}^{\prime }{\mathbf {A}}_{N}{\mathbf {H}}_{N}{\mathbf {A}}_{N}^{\prime }{\mathbf {C}}_{N})\), where \({\mathbf {H}}_{N}={\text{Var}}({\mathbb{V}})\) and the ‘limit’ is assumed to exist and to be positive definite.

Lemma A.4

(Moments and Limiting Distribution for Linear Quadratic forms) Let \({\mathbf {B}}_{rN}\) be \(N\times N\) matrices of uniformly bounded row and column sums, and \({\mathbf {c}}_{rN}\) be \(N\times 1\) vectors with elements \(c_{ri}\) satisfying \(\sup _{N}\frac{1}{N}\sum _{i=1}^{N}|c_{ri}|^{2+\epsilon } < \infty \) for some \(\epsilon >0\). Let \({\mathbb {V}}_{N}\) be an \(N\times 1\) random vector with elements: \(\{v_{i}\} \sim inid(0,\sigma ^2_{0}h_{i})\), where \(h_{i}>0\) such that \(\frac{1}{N}\sum _{i=1}^{N}h_{i}=1\), and \(E|v_{i}|^{4+\epsilon }< c < \infty \) for all i, for some \(\epsilon >0\) and constant c. Consider the linear-quadratic forms: \({\mathbf {Q}}_{rN}={\mathbb {V}}'_N{\mathbf {B}}_{rN}{\mathbb {V}}_N +{\mathbf {c}}'_{rN}{\mathbb {V}}_N,\ r=1,2\). Denote the diagonal elements of \({\mathbf {B}}_{rN}\) by \(b_{r,ii}\). Let \(s_{i}\) and \(\kappa _{i}\) be, respectively, the measures of skewness and excess kurtosis of \(v_{i}\). We have,

  1. (i)

    \(\mathrm{E}({\mathbf {Q}}_{rN})=\sigma ^2_{0}\mathtt{tr}({\mathbf {H}}_{N}{\mathbf {B}}_{rN})\), where \({\mathbf {H}}_N=\mathtt{diag}(h_{1},\ldots , h_{N})\),

  2. (ii)

\(\begin{aligned} \mathrm{Var}({\mathbf {Q}}_{rN})= & {} \sigma ^4_{0}\mathtt{tr}[{\mathbf {H}}_{N}{\mathbf {B}}_{rN}{\mathbf {H}}_{N}({\mathbf {B}}_{rN}+{\mathbf {B}}'_{rN})]+\sigma _0^2{\mathbf {c}}'_{rN}{\mathbf {H}}_N{\mathbf {c}}_{rN}\\&+\sum _{i=1}^{N}(\sigma _{0}^{4}b_{r,ii}^{2}h_i^2\kappa _{i}+2\sigma _{0}^{3}b_{r,ii}c_{ri}h_{i}^{3/2}s_{i}), \end{aligned}\)

  3. (iii)

\(\begin{aligned} \mathrm{Cov}({\mathbf {Q}}_{1N},{\mathbf {Q}}_{2N})= & {} \sigma _{0}^{4}\mathtt{tr}[{\mathbf {H}}_N{\mathbf {B}}_{1N}{\mathbf {H}}_N({\mathbf {B}}_{2N}+{\mathbf {B}}'_{2N})]+\sigma ^2_0{\mathbf {c}}'_{1N}{\mathbf {H}}_N{\mathbf {c}}_{2N}\\&+\sum _{i=1}^{N}\big [\sigma _{0}^{4}b_{1,ii}b_{2,ii}h^2_{i}\kappa _{i} +\sigma _{0}^{3}(b_{1,ii}c_{2i}+b_{2,ii}c_{1i})h^{3/2}_{i}s_i\big ], \end{aligned}\)

  4. (iv)

\(\mathrm{E}({\mathbf {Q}}_{rN})=O(N)\), \(\mathrm{Var}({\mathbf {Q}}_{rN})=O(N)\), and \({\mathbf {Q}}_{rN}=O_{p}(N)\),

  5. (v)

    \(\begin{aligned} \frac{1}{N}{\mathbf {Q}}_{rN}-\frac{1}{N}\mathrm{E}({\mathbf {Q}}_{rN}) =O_p\big (N^{-\frac{1}{2}}\big ), \end{aligned}\)

  6. (vi)

    \(\frac{{\mathbf {Q}}_{rN}-\mathrm{E}({\mathbf {Q}}_{rN})}{\sqrt{\mathrm{Var}({\mathbf {Q}}_{rN})}} \overset{D}{\longrightarrow }N(0,1)\), and for \({\mathbf {Q}}_N=({\mathbf {Q}}_{1N},{\mathbf {Q}}_{2N})'\),

  7. (vii)

    \(\Sigma _{N}^{-1/2}({\mathbf {Q}}_N-\mathrm{E}({\mathbf {Q}}_N))\overset{D}{\longrightarrow }N({\mathbf {0}}, I_2)\), where \(\Sigma _{N}=\mathrm{Var}({\mathbf {Q}}_N)\), and \(\Sigma _{N}^{1/2}\Sigma _{N}^{1/2}=\Sigma _{N}\).
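As a numerical sanity check on Lemma A.4(i)–(ii), the following minimal Python sketch (not part of the paper; the matrix \({\mathbf {B}}\), the weights \(h_i\), and the standardized chi-square errors are illustrative assumptions) simulates the linear-quadratic form and compares its Monte Carlo mean and variance with the formulas above.

```python
import numpy as np

# Monte Carlo check of Lemma A.4(i)-(ii); all choices below are illustrative.
# Q = V'BV + c'V, with v_i independent, Var(v_i) = sigma0^2 * h_i and
# (1/N) sum_i h_i = 1; errors are standardized chi-square (skewed, kurtotic).
rng = np.random.default_rng(42)
N, sigma0, df, R = 30, 1.2, 4.0, 200_000

B = rng.normal(size=(N, N)) / N          # an arbitrary fixed N x N matrix
c = rng.normal(size=N)
h = rng.uniform(0.5, 1.5, size=N)
h /= h.mean()                            # enforce (1/N) sum_i h_i = 1
H = np.diag(h)

s_i = np.sqrt(8.0 / df)                  # skewness of chi2(df), invariant to standardization
k_i = 12.0 / df                          # excess kurtosis of chi2(df)

e = (rng.chisquare(df, size=(R, N)) - df) / np.sqrt(2.0 * df)   # mean 0, variance 1
V = sigma0 * np.sqrt(h) * e              # Var(v_i) = sigma0^2 * h_i
Q = ((V @ B) * V).sum(axis=1) + V @ c    # R draws of the LQ form

b = np.diag(B)
mean_th = sigma0**2 * np.trace(H @ B)                       # Lemma A.4(i)
var_th = (sigma0**4 * np.trace(H @ B @ H @ (B + B.T))       # Lemma A.4(ii)
          + sigma0**2 * c @ H @ c
          + np.sum(sigma0**4 * b**2 * h**2 * k_i
                   + 2.0 * sigma0**3 * b * c * h**1.5 * s_i))

print(f"mean: MC {Q.mean():+.5f}   formula {mean_th:+.5f}")
print(f"var : MC {Q.var():+.5f}   formula {var_th:+.5f}")
```

The two printed pairs should agree up to Monte Carlo error; dropping the skewness and kurtosis corrections (the final summation) makes the variance formula fail whenever the errors are non-normal.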

Appendix B: Proofs of theorems

More on the robustness of QMLE. We continue the discussion at the end of Sect. 2.1 to give some more useful details on the nature of Condition I and Condition II.

First, given \(\delta \), \({\bar{\ell }}_{N}(\theta )=\mathrm{E}[\ell _N(\theta )]\) is partially maximized at

$$\begin{aligned} {\bar{\beta }}_{N}(\delta )&= [{\mathbb {X}}_{N}^{\prime }(\rho ){\mathbb {X}}_{N}(\rho )]^{-1}{\mathbb {X}}_{N}^{\prime } (\rho ) {\mathbf {D}}_{N}(\delta ){\mathbf {f}}_{N}, \end{aligned}$$
(B-1)
$$\begin{aligned} {\bar{\sigma }}_{N}^{2}(\delta )&=\textstyle \frac{1}{N}{\mathbf {f}}_{N}^{\prime } {\mathbf {D}}_{N}^{\prime }(\delta ){\mathbb {M}}_{N}(\rho ){\mathbf {D}}_{N}(\delta ) {\mathbf {f}}_{N} + \frac{\sigma _{0}^{2}}{N}{} \mathtt{tr}[{\mathbf {H}}_{N} {\mathbf {D}}_{N}^{\prime -1}{\mathbf {D}}_{N}^{\prime }(\delta ){\mathbf {D}}_{N} (\delta ){\mathbf {D}}_{N}^{-1}], \end{aligned}$$
(B-2)

giving the population counterpart of \(\ell ^c_N(\delta )\) (see (2.6)) upon substitution:

$$\begin{aligned} \textstyle {\bar{\ell }}^c_N(\delta )=\max _{\beta ,\sigma ^2}\mathrm{E}[\ell _N(\theta )]=-\frac{N}{2}(\ln (2\pi )+1)+\ln |{\mathbf {D}}_{N}(\delta )|-\frac{N}{2}\ln ({\bar{\sigma }}^2_N(\delta )), \end{aligned}$$
(B-3)

recalling \({\mathbf {D}}_{N}(\delta )=I_{T-1}\otimes D_{n}(\delta ), {\mathbf {D}}_{N}={\mathbf {D}}_{N}(\delta _{0})\) and \({\mathbf {f}}_{N}={\mathbf {A}}_{1N}^{-1}{\mathbf {X}}_{N}\beta _{0}\). We have \({\bar{\sigma }}_{N}^{2}(\delta _0)={\sigma }_{0}^{2}\), and \({\bar{\sigma }}_{N}^{2}(\delta )=\sigma _{n}^{2}(\delta )\big [1+\frac{1}{N\sigma _{n}^{2}(\delta )}{\mathbf {f}}_{N}^{\prime }{\mathbf {D}}_{N}^{\prime }(\delta ){\mathbb {M}}_{N}(\rho ){\mathbf {D}}_{N}(\delta ){\mathbf {f}}_{N}\big ] \equiv \sigma _{n}^{2}(\delta )\mu _{N}(\delta )\). Thus,

$$\begin{aligned} \textstyle {\bar{\ell }}_N^{c}(\delta )-{\bar{\ell }}_N^{c}(\delta _0) = \ln |{\mathbf {D}}_{N}(\delta )|-\ln |{\mathbf {D}}_{N}| - \frac{N}{2}(\ln (\sigma _{n}^{2}(\delta ))-\ln (\sigma _0^{2})) - \frac{N}{2}\ln (\mu _{N}(\delta )). \end{aligned}$$

It can be shown that \(\sigma _{n}^{2}(\delta )\) (which is the second term of (B-2)) is bounded from below away from 0 (see the proof of Theorem 3.1). By the first part of Condition I(a), \(\frac{1}{N}{\mathbf {f}}_{N}^{\prime }{\mathbf {D}}_{N}^{\prime }(\delta ){\mathbb {M}}_{N}(\rho ){\mathbf {D}}_{N}(\delta ){\mathbf {f}}_{N}>0\), and thus \(\mu _{N}(\delta )>1\) for \(\lambda \ne \lambda _{0}\) given any \(\rho \). Now, given \(\lambda _{0}\), \(\lim _{N\rightarrow \infty }\frac{1}{N}[{\bar{\ell }}_N^{c}(\lambda _{0},\rho )-{\bar{\ell }}_N^{c}(\delta _0)] \ne 0\) for \(\rho \ne \rho _0\) by the second part of Condition I(a). Hence, \(\delta _0\) is identified if, further, \(\lim _{N\rightarrow \infty }\frac{1}{N}[{\bar{\ell }}_N^{c}(\lambda _{0},\rho )-{\bar{\ell }}_N^{c}(\delta _0)] \le 0\) for \(\rho \ne \rho _0\), which is a special case of the following.

When Condition I(a) fails, \({\bar{\ell }}_N^{c}(\delta )-{\bar{\ell }}_N^{c}(\delta _0)\ne 0\) \(\forall \delta \ne \delta _0\) by Condition I(b). To ensure \({\bar{\ell }}_N^{c}(\delta ) < {\bar{\ell }}_N^{c}(\delta _0)\) \(\forall \delta \ne \delta _0\), one needs additional conditions so that \({\bar{\ell }}_N^{c}(\delta ) \le {\bar{\ell }}_N^{c}(\delta _0)\) \(\forall \delta \ne \delta _0\). Note that \(p_N(\theta _0)=\exp [\ell _N(\theta _0)]\) is the quasi joint pdf of \({\mathbf {Y}}_{N}\) under \({\mathbf {V}}_N \sim N(0,\sigma _0^2I_N)\). Let \(p_N^0(\theta _0)\) be the true joint pdf of \({\mathbf {Y}}_{N}\) under \({\mathbf {V}}_N\sim (0,\sigma _0^2{\mathbf {H}}_N)\). Let \(\mathrm{E}^q\) denote the expectation with respect to \(p_N(\theta _0)\), to differentiate it from the usual notation E that corresponds to \(p_N^0(\theta _0)\). Write

$$\begin{aligned} {\mathbf {D}}_{N}(\delta ){\mathbf {Y}}_N = {\mathbf {D}}_{N}(\delta ){\mathbf {f}}_{N}+{\mathbf {B}}_N(\delta ){\mathbf {V}}_{N}, \text { and } {\mathbf {V}}_N(\beta ,\delta ) =\ {\mathbf {B}}_N(\delta ){\mathbf {V}}_N+{\mathbf {b}}_N(\beta ,\delta ), \end{aligned}$$

where \({\mathbf {B}}_N(\delta )={\mathbf {D}}_{N}(\delta ){\mathbf {D}}_{N}^{-1}\) and \({\mathbf {b}}_N(\beta ,\delta )={\mathbf {D}}_{N}(\delta ){\mathbf {f}}_{N}-{\mathbf {A}}_{2N}(\rho ){\mathbf {X}}_N\beta \). Then, for \(\ell _N(\theta )\) in (2.4),

$$\begin{aligned} \mathrm{E}^q[\ell _N(\theta _0)]&= \mathrm{E}[\ell _N(\theta _0)] =\textstyle -\frac{N}{2}\ln (2\pi \sigma ^2)+\ln |{\mathbf {D}}_{N}|-\frac{N}{2},\text { as } \frac{1}{N}\mathtt{tr}({\mathbf {H}}_N)=1, \text { and} \\ \mathrm{E}^q[\ell _N(\theta )]&= \textstyle -\frac{N}{2}\ln (2\pi \sigma ^2)+\ln |{\mathbf {D}}_{N}(\delta )| - \frac{1}{2\sigma ^2}[\sigma ^2_0\mathtt{tr}({\mathbf {B}}^{\prime }_N(\delta ){\mathbf {B}}_N(\delta ))\\&\quad +{\mathbf {b}}^{\prime }_N(\beta ,\delta ){\mathbf {b}}_N(\beta ,\delta )],\\ \mathrm{E}[\ell _N(\theta )]&= \textstyle -\frac{N}{2}\ln (2\pi \sigma ^2)+\ln |{\mathbf {D}}_{N}(\delta )| -\frac{1}{2\sigma ^2}[\sigma ^2_0\mathtt{tr}({\mathbf {H}}_N{\mathbf {B}}^{\prime }_N (\delta ){\mathbf {B}}_N(\delta ))\\&\quad +{\mathbf {b}}^{\prime }_N(\beta ,\delta ) {\mathbf {b}}_N(\beta ,\delta )]. \end{aligned}$$

By Jensen’s inequality, \(\mathrm{E}^q\big [\ln \big (\frac{p_N(\theta )}{p_N(\theta _0)}\big )\big ] \le \ln \mathrm{E}^q\big (\frac{p_N(\theta )}{p_N(\theta _0)}\big ) = 0\), as \(\mathrm{E}^q\big (\frac{p_N(\theta )}{p_N(\theta _0)}\big )=\int p_N(\theta )\,\mathrm{d}{\mathbf {Y}}_N=1\). If \(\mathrm{E}[\ell _N(\theta )]-\mathrm{E}^q[\ell _N(\theta )]=o(N)\), then \(\mathrm{E}[\ln p_N(\theta )]\le \mathrm{E}[\ln p_N(\theta _0)]\) for large enough N. Thus, \({\bar{\ell }}_{N}^{c}(\delta )=\max _{\beta ,\sigma ^2}\mathrm{E}[\ln p_N(\theta )]\le \max _{\beta ,\sigma ^2}\mathrm{E}[\ln p_N(\theta _0)]={\bar{\ell }}_{N}^{c}(\delta _0), \forall \delta \ne \delta _0\), and N large enough. Clearly,

$$\begin{aligned} \textstyle \mathrm{E}[\ell _N(\theta )]-\mathrm{E}^q[\ell _N(\theta )] = \frac{\sigma ^2_0}{2\sigma ^2}[\mathtt{tr}({\mathbf {B}}^{\prime }_N(\delta ){\mathbf {B}}_N(\delta ))-\mathtt{tr}({\mathbf {H}}_N{\mathbf {B}}^{\prime }_N(\delta ){\mathbf {B}}_N(\delta ))]. \end{aligned}$$

Using \({\mathbf {A}}_{1N}(\lambda ) = {\mathbf {A}}_{1N} + (\lambda _0-\lambda ) {\mathbf {W}}_{1N}\) and \({\mathbf {A}}_{2N}(\rho ) = {\mathbf {A}}_{2N} + (\rho _0-\rho ) {\mathbf {W}}_{2N}\), we have

$$\begin{aligned} {\mathbf {B}}_{N}(\delta ) = I_{N}+(\rho _{0}-\rho ){\mathbf {G}}_{2N}+(\lambda _{0}-\lambda ){\bar{\mathbf {G}}}_{1N}+(\lambda _{0}-\lambda )(\rho _{0}-\rho ){\mathbf {G}}_{2N}{\bar{\mathbf {G}}}_{1N}. \end{aligned}$$
(B-4)

Using (B-4), it is easy to see that Condition II ensures \(\mathrm{E}[\ell _N(\theta )]-\mathrm{E}^q[\ell _N(\theta )]=o(N)\). Therefore, if Condition I and Condition II are met, \(\sup _{\delta : d(\delta ,\delta _{0})>\varepsilon }{\bar{\ell }}_N^{c}(\delta ) < {\bar{\ell }}_N^{c}(\delta _0)\) for every \(\varepsilon >0\), i.e., \(\delta _{0}\) is uniquely identified by the QML estimation. Finally, it can be seen that the uniform convergence, \(\sup _{\delta \in \Delta }\frac{1}{N}|\ell _N^{c}(\delta )-{\bar{\ell }}_N^{c}(\delta )| \overset{p}{\longrightarrow } 0\), also requires Condition II.

Proof of Theorem 3.1:

Proof of consistency. Let \({\bar{\psi }}_{N}(\theta ) = \mathrm{E}[\psi _{N}(\theta )]\), the population counterpart of the joint estimating function \(\psi _{N}(\theta )\) given in (3.2). Given \(\delta \), \({\bar{\psi }}_{N}(\theta )\) is partially solved at \({\bar{\beta }}_{N}(\delta )\) and \({\bar{\sigma }}_{N}^{2}(\delta )\), given in (B-1) and (B-2). Plugging \({\bar{\beta }}_{N}(\delta )\) and \({\bar{\sigma }}_{N}^{2}(\delta )\) back into the \(\lambda \)- and \(\rho \)-components of \({\bar{\psi }}_{N}(\theta )\), we get the population counterpart of \({{\tilde{\psi }}}_{N}^{c}(\delta )\):

$$\begin{aligned} {\bar{\psi }}_{N}^{c}(\delta )= {\left\{ \begin{array}{ll} \frac{1}{{\bar{\sigma }}_{N}^{2}(\delta )}\text{ E }\big \{ {\mathbf {V}}({\bar{\beta }}_{N}(\delta ),\delta )^{\prime }[\varvec{\eta }_{N}({\bar{\beta }}_{N}(\delta ),\delta )+\bar{{\mathbf {G}}}_{1N}^{\circ }(\delta ){\mathbf {V}}({\bar{\beta }}_{N}(\delta ),\delta )] \big \} \\ \frac{1}{{\bar{\sigma }}_{N}^{2}(\delta )}\text{ E }\big \{ {\mathbf {V}}({\bar{\beta }}_{N}(\delta ),\delta )^{\prime }{\mathbf {G}}_{2N}^{\circ }(\rho ){\mathbf {V}}({\bar{\beta }}_{N}(\delta ),\delta ) \big \}. \end{array}\right. } \end{aligned}$$

where \({\mathbf {V}}({\bar{\beta }}_{N}(\delta ),\delta )={\mathbb {Y}}_{N}(\delta )-{\mathbb {X}}_{N}(\rho ){\bar{\beta }}_{N}(\delta )\). Working on the numerators of \({\bar{\psi }}_{N}^{c}(\delta )\) and dropping the terms of smaller order, we arrive at \(F_N(\delta )\) given in Assumption 6, which shows that the identification uniqueness condition of Theorem 5.9 of van der Vaart (1998) holds, i.e., for every \(\epsilon >0\), \(\inf _{\delta : d(\delta , \delta _0) \ge \epsilon }\frac{1}{N}\Vert {\bar{\psi }}_{N}^{c}(\delta )\Vert > 0 = \frac{1}{N}\Vert {\bar{\psi }}_{N}^{c}(\delta _0)\Vert \), provided that \({\bar{\sigma }}_{N}^{2}(\delta )\) is bounded from below away from zero. Then, \({\hat{\delta }}_\mathtt{AQS1}\) is consistent if the uniform convergence condition of Theorem 5.9 of van der Vaart (1998) holds, i.e., \(\sup _{\delta \in \Delta }\frac{1}{N}\Vert {{\tilde{\psi }}}_{N}^{c}(\delta )-{\bar{\psi }}_{N}^{c}(\delta )\Vert =o_{p}(1)\). This amounts to showing:

(a):

\({\bar{\sigma }}_{N}^{2}(\delta )\) is bounded from below away from zero;

(b):

\(\sup _{\delta \in \Delta }|{\tilde{\sigma }}_{N}^{2}(\delta )-{\bar{\sigma }}_{N}^{2}(\delta )| = o_{p}(1)\);

(c):

\(\sup _{\delta \in \Delta }\frac{1}{N}\big | {\mathbf {V}}({\tilde{\beta }}_{N}(\delta ),\delta )^{\prime }\varvec{\eta }_{N}({\tilde{\beta }}_{N}(\delta ),\delta ) - \mathrm{E}[{\mathbf {V}}({\bar{\beta }}_{N}(\delta ),\delta )]^{\prime }\varvec{\eta }_{N}({\bar{\beta }}_{N}(\delta ),\delta )\big | = o_{p}(1)\);

(d):

\(\sup _{\delta \in \Delta }\frac{1}{N}\big | {\mathbf {V}}({\tilde{\beta }}_{N}(\delta ),\delta )^{\prime }\bar{{\mathbf {G}}}_{1N}^{\circ }(\delta ){\mathbf {V}}({\tilde{\beta }}_{N}(\delta ),\delta ) - \text{ E }[{\mathbf {V}}({\bar{\beta }}_{N}(\delta ),\delta )^{\prime }\bar{{\mathbf {G}}}_{1N}^{\circ }(\delta ){\mathbf {V}}({\bar{\beta }}_{N}(\delta ),\delta )] \big | = o_{p}(1)\);

(e):

\(\sup _{\delta \in \Delta }\frac{1}{N}\big | {\mathbf {V}}({\tilde{\beta }}_{N}(\delta ),\delta )^{\prime }{\mathbf {G}}_{2N}^{\circ }(\delta ){\mathbf {V}}({\tilde{\beta }}_{N}(\delta ),\delta ) - \text{ E }[{\mathbf {V}}({\bar{\beta }}_{N}(\delta ),\delta )^{\prime }{\mathbf {G}}_{2N}^{\circ }(\delta ){\mathbf {V}}({\bar{\beta }}_{N}(\delta ),\delta )] \big | = o_{p}(1)\);

where \({\mathbf {V}}({\tilde{\beta }}_{N}(\delta ),\delta )={\mathbb {Y}}_{N}(\delta )-{\mathbb {X}}_{N}(\rho ){\tilde{\beta }}_{N}(\delta ) = {\mathbb {M}}_N(\rho ){\mathbb {Y}}_{N}(\delta )\), following the notation defined between (2.4) and (2.6). Similarly, \({\mathbf {V}}({\bar{\beta }}_{N}(\delta ),\delta )={\mathbb {Y}}_{N}(\delta )-[{\mathbf {I}}_{N}-{\mathbb {M}}_N(\rho )]\mathrm{E}[{\mathbb {Y}}_{N}(\delta )]\).

For condition (a), from (B-2), it is obvious that the first term of \({\bar{\sigma }}_{N}^{2}(\delta )\) is nonnegative. It suffices to show that the second term, which is \(\sigma _{n}^{2}(\delta )\) defined in Condition I, is uniformly bounded from below away from zero. Consider the model with \(\beta _0=0\) and \(H_{n}=I_{n}\). We have the loglikelihood: \(\ell ^{*}_{N}(\theta ) = -\frac{N}{2} \ln (2\pi \sigma ^2) + \ln |{\mathbf {D}}_{N}(\delta )| - \frac{1}{2\sigma ^2} {\mathbb {Y}}^{\prime }_{N}(\delta ){\mathbb {Y}}_{N}(\delta )\) and \({\bar{\ell }}^{*}_{N}(\delta ) = \max _{\sigma ^2} \mathrm{E}[\ell ^{*}_{N}(\theta )]= const. - \frac{N}{2}\ln (\sigma _{\circ n}^{2}(\delta )) + \ln |{\mathbf {D}}_{N}(\delta )|\), where \(\sigma _{\circ n}^{2}(\delta )=\frac{\sigma _{0}^{2}}{n}\mathtt{tr}[D_{n}^{\prime -1}D_{n}^{\prime }(\delta )D_{n}(\delta )D_{n}^{-1}]\). As \(D_{n}^{\prime -1}D_{n}^{\prime }(\delta )D_{n}(\delta )D_{n}^{-1}\) is positive semidefinite (p.s.d.), \(\sigma _{\circ n}^{2}(\delta ) \ge 0\). By Jensen’s inequality, \({\bar{\ell }}^{*}_{N}(\delta ) \le \max _{\sigma ^2} \mathrm{E}[\ell ^{*}_{N}(\theta _0)] = {\bar{\ell }}^{*}_{N}(\delta _0)\), implying \(-\ln (\sigma _{\circ n}^{2}(\delta )) \le -\ln (\sigma ^2_0) + \frac{2}{N}\ln |{\mathbf {D}}_{N}| - \frac{2}{N}\ln |{\mathbf {D}}_{N}(\delta )| = O(1)\) by Lemma A.2 and the fact that \(\sigma _{0}^{2}\) is bounded away from 0. Thus, \(-\ln (\sigma _{\circ n}^{2}(\delta ))\) is bounded from above, implying that \(\sigma _{\circ n}^{2}(\delta )\) is bounded from below away from 0. Finally, \(\sigma _{n}^{2}(\delta )=\frac{\sigma _{0}^{2}}{n}\mathtt{tr}[H_{n}D_{n}^{\prime -1}D_{n}^{\prime }(\delta )D_{n}(\delta )D_{n}^{-1}] \ge \min _{i}(h_{i})\,\sigma _{\circ n}^{2}(\delta ) \ge c > 0\), as \(D_{n}^{\prime -1}D_{n}^{\prime }(\delta )D_{n}(\delta )D_{n}^{-1}\) is p.s.d. and \(H_{n}\) is a diagonal matrix with strictly positive elements.

For condition (b), using \({\mathbb {Y}}_{N}(\delta )={\mathbf {D}}_{N}(\delta ){\mathbf {D}}_{N}^{-1}({\mathbf {A}}_{2N}{\mathbf {X}}_{N}\beta _{0}+(F_{T,T-1}^{\prime }\otimes I_{n}){\mathbb {V}}_{nT})\), where \({\mathbb {V}}_{nT}\) is the \(nT\times 1\) vector of original errors, we can write \({{\tilde{\sigma }}}_{N}^{2}(\delta )= \frac{1}{N}{\mathbb {Y}}_{N}^{\prime }(\delta ){\mathbb {M}}_N(\rho ) {\mathbb {Y}}_{N}(\delta )\) as

$$\begin{aligned} {{\tilde{\sigma }}}_{N}^{2}(\delta )&= \textstyle \frac{1}{N}{\mathbf {f}}_{N}^{\prime }{\mathbf {D}}_{N}^{\prime }(\delta ) {\mathbb {M}}_{N}(\rho ){\mathbf {D}}_{N}(\delta ){\mathbf {f}}_{N} + \frac{2}{N}{\mathbf {f}}_{N}^{\prime }{\mathbf {D}}_{N}^{\prime }(\delta ) {\mathbb {M}}_{N}(\rho ){\mathbf {D}}_{N}(\delta ){\mathbf {D}}_{N}^{-1} (F_{T,T-1}^{\prime }\otimes I_{n}){\mathbb {V}}_{nT}\\&\quad \textstyle + \frac{1}{N}{\mathbb {V}}_{nT}^{\prime }(F_{T,T-1}\otimes I_{n}) {\mathbf {D}}_{N}^{\prime -1}{\mathbf {D}}_{N}^{\prime }(\delta ) {\mathbb {M}}_{N}(\rho ){\mathbf {D}}_{N}(\delta ){\mathbf {D}}_{N}^{-1} (F_{T,T-1}^{\prime }\otimes I_{n}){\mathbb {V}}_{nT}, \end{aligned}$$

giving \({\tilde{\sigma }}_{N}^{2}(\delta )-{\bar{\sigma }}_{N}^{2}(\delta ) = Q_{1}(\delta ) + Q_{2}(\delta )-\sigma _{n}^{2}(\delta )\), where \(Q_{1}(\delta ) = \frac{2}{N}{\mathbf {f}}_{N}^{\prime }{\mathbf {D}}_{N}^{\prime }(\delta ) {\mathbb {M}}_{N}(\rho ){\mathbf {D}}_{N}(\delta ){\mathbf {D}}_{N}^{-1} (F_{T,T-1}^{\prime }\otimes I_{n}){\mathbb {V}}_{nT}\), and \(Q_{2}(\delta ) = \frac{1}{N}{\mathbb {V}}_{nT}^{\prime }(F_{T,T-1}\otimes I_{n}) {\mathbf {D}}_{N}^{\prime -1}{\mathbf {D}}_{N}^{\prime }(\delta ){\mathbb {M}}_{N}(\rho ) {\mathbf {D}}_{N}(\delta ){\mathbf {D}}_{N}^{-1}(F_{T,T-1}^{\prime }\otimes I_{n}) {\mathbb {V}}_{nT}\).

For \(Q_{1}(\delta )\), it is easy to see that, under Assumptions 3–5 and by Lemma A.2, the elements of \({\mathbf {f}}_{N}^{\prime }{\mathbf {D}}_{N}^{\prime }(\delta ){\mathbb {M}}_{N}(\rho ){\mathbf {D}}_{N}(\delta ){\mathbf {D}}_{N}^{-1}(F_{T,T-1}^{\prime }\otimes I_{n})\) are uniformly bounded for each \(\delta \in \Delta \); the pointwise convergence, \(Q_{1}(\delta )\overset{p}{\rightarrow } 0\), therefore follows from Lemma A.3. For \(Q_{2}(\delta )\), under Assumptions 3–5 and by Lemma A.2(v), \(\mathtt{tr}[{\mathbf {D}}_{N}^{\prime -1}{\mathbf {D}}_{N}^{\prime }(\delta ){\mathbb {M}}_{N}(\rho ){\mathbf {D}}_{N}(\delta ){\mathbf {D}}_{N}^{-1}] = \mathtt{tr}[{\mathbf {D}}_{N}^{\prime -1}{\mathbf {D}}_{N}^{\prime }(\delta ){\mathbf {D}}_{N}(\delta ){\mathbf {D}}_{N}^{-1}] + O(1)\). It follows, by Lemma A.4(v), that \(Q_{2}(\delta )-\sigma _{n}^{2}(\delta )\overset{p}{\rightarrow } 0\) for each \(\delta \in \Delta \).

To show that \(Q_{r}(\delta ), r=1,2\), are stochastically equicontinuous, let \(\delta _{1}\) and \(\delta _{2}\) be two points in \(\Delta \). We have by the mean value theorem:

$$\begin{aligned} \textstyle Q_{r}(\delta _{2}) - Q_{r}(\delta _{1}) = \frac{\partial }{\partial \delta ^{\prime }}Q_{r}({{\bar{\delta }}})(\delta _{2}-\delta _{1}),\; r=1,2, \end{aligned}$$

where \({{\bar{\delta }}}\) lies between \(\delta _{1}\) and \(\delta _{2}\) elementwise. It is easy to show that \(\sup _{\delta \in \Delta }|\frac{\partial }{\partial \lambda }Q_{r}(\delta )| = O_{p}(1)\), by Assumptions 1, 3, 4, and 5, and Lemma A.2, as \(Q_{r}(\delta )\) are linear or quadratic in \(\lambda \) by the expression \({\mathbf {D}}_{N}(\delta ){\mathbf {D}}_{N}^{-1}=I_{N}+(\rho _{0}-\rho ){{\mathbf {G}}}_{2N}+(\lambda _{0}-\lambda )\bar{{\mathbf {G}}}_{1N}+(\lambda _{0}-\lambda )(\rho _{0}-\rho ){{\mathbf {G}}}_{2N}\bar{{\mathbf {G}}}_{1N}\). Now, \(\rho \) appears in \(Q_{r}(\delta )\) nonlinearly only through \({\mathbb {M}}_{N}(\rho )\). It is easy to show that \(\frac{\partial }{\partial \rho }{\mathbb {M}}_{N}(\rho )\) is uniformly bounded in both row and column sums by Lemma A.2, uniformly in \(\rho \) in its compact space, and that \(\sup _{\delta \in \Delta }|\frac{\partial }{\partial \rho }Q_{r}(\delta )| = O_{p}(1)\). Therefore, \(Q_{r}(\delta ), r=1,2\), are stochastically equicontinuous. The pointwise convergence and stochastic equicontinuity imply that \(Q_{r}(\delta )-\mathrm{E}[Q_{r}(\delta )] \overset{p}{\longrightarrow } 0\), uniformly in \(\delta \in \Delta , r=1,2\), leading to condition (b) (Newey 1991).

For condition (c), we have \(\varvec{\eta }_{N}({\tilde{\beta }}_{N}(\delta ),\delta )=\bar{{\mathbf {G}}}_{1N}(\delta ){\mathbb {P}}_{N}(\rho ){\mathbb {Y}}_{N}(\delta )\) and \(\varvec{\eta }_{N}({\bar{\beta }}_{N}(\delta ),\delta )=\bar{{\mathbf {G}}}_{1N}(\delta ){\mathbb {P}}_{N}(\rho )\mathrm{E}[{\mathbb {Y}}_{N}(\delta )]\). With \({\mathbf {V}}({\tilde{\beta }}_{N}(\delta ),\delta )= {\mathbb {M}}_N(\rho ){\mathbb {Y}}_{N}(\delta )\) and \({\mathbb {Y}}_{N}(\delta )={\mathbf {D}}_{N}(\delta ){\mathbf {D}}_{N}^{-1}({\mathbf {A}}_{2N}{\mathbf {X}}_{N}\beta _{0}+(F_{T,T-1}^{\prime }\otimes I_{n}){\mathbb {V}}_{nT})\), we see that \(\frac{1}{N}\{{\mathbf {V}}({\tilde{\beta }}_{N}(\delta ),\delta )^{\prime }\varvec{\eta }_{N}({\tilde{\beta }}_{N}(\delta ),\delta )-\mathrm{E}[{\mathbf {V}}({\bar{\beta }}_{N}(\delta ),\delta )]^{\prime }\varvec{\eta }_{N}({\bar{\beta }}_{N}(\delta ),\delta )\}\) is of the linear-quadratic form \({\mathbb {V}}_{nT}^{\prime }{\mathbb {A}}_{nT}(\delta ){\mathbb {V}}_{nT} + {\mathbf {c}}_{nT}^{\prime }(\delta ){\mathbb {V}}_{nT}\), for a suitably defined matrix \({\mathbb {A}}_{nT}(\delta )\) and vector \({\mathbf {c}}_{nT}(\delta )\). Its pointwise convergence follows from Lemma A.4(v), and uniform convergence is proved in a similar way to that for (b), based on the theorem of Newey (1991).

For condition (d), again with the expressions for \({\mathbf {V}}({\tilde{\beta }}_{N}(\delta ),\delta )\) and \({\mathbb {Y}}_{N}(\delta )\), we can write \(\frac{1}{N}\{{\mathbf {V}}({\tilde{\beta }}_{N}(\delta ),\delta )^{\prime }\bar{{\mathbf {G}}}_{1N}^{\circ }(\delta ){\mathbf {V}}({\tilde{\beta }}_{N}(\delta ),\delta )-\text{ E }[{\mathbf {V}}({\bar{\beta }}_{N}(\delta ),\delta )^{\prime }\bar{{\mathbf {G}}}_{1N}^{\circ }(\delta ){\mathbf {V}}({\bar{\beta }}_{N}(\delta ),\delta )]\}\) as a linear-quadratic form in \({\mathbb {V}}_{nT}\), and the proof of uniform convergence proceeds similarly.

For condition (e), the proof is similar to that of (d).

Proof of asymptotic normality. First note that \(\hbox {tr}(H_n) = n\). By the mean value theorem,

$$\begin{aligned} \textstyle \sqrt{N}({\hat{\theta }}_\mathtt{AQS1} - \theta _{0}) = -\big [ \frac{1}{N} \frac{\partial }{\partial \theta '} \psi _{N}({\tilde{\theta }}) \big ]^{-1} \frac{1}{\sqrt{N}} \psi _{N}(\theta _{0}), \end{aligned}$$

where \({\tilde{\theta }}\) lies element-wise between \({\hat{\theta }}_\mathtt{AQS1}\) and \(\theta _{0}\). It amounts to showing that,

(i):

\(\frac{1}{\sqrt{N}}\psi _{N}(\theta _{0}) \overset{D}{\longrightarrow } N(0,\; \lim _{N\rightarrow \infty } \Omega _{N})\), where \(\Omega _{N} = \frac{1}{N} \mathrm{Var}[\psi _{N}(\theta _{0})]\)

(ii):

\(\frac{1}{N}\big [\frac{\partial }{\partial \theta '}\psi _{N}({\tilde{\theta }}) - \frac{\partial }{\partial \theta '} \psi _{N}(\theta _{0})\big ] = o_p(1)\), and

(iii):

\(\frac{1}{N}\big [\frac{\partial }{\partial \theta '}\psi _{N}(\theta _{0}) - \text {E}(\frac{\partial }{\partial \theta '}\psi _{N}(\theta _{0})) \big ]= o_p(1)\).

As argued above Theorem 3.1, the components of \(\psi _{N}(\theta _{0})\) are linear or linear-quadratic forms in the original error vector \({\mathbb {V}}_{nT}\), since \({\mathbf {V}}_{N}=(F_{T,T-1}^{\prime }\otimes I_{n}){\mathbb {V}}_{nT}\). Assumptions 1–5 ensure that every fixed linear combination of \(\frac{1}{\sqrt{N}}\psi _{N}(\theta _{0})\) satisfies the conditions of the central limit theorem (CLT) for linear-quadratic (LQ) forms of Kelejian and Prucha (2001) and hence is asymptotically normal. The Cramér–Wold device therefore leads to \(\frac{1}{\sqrt{N}} \psi _{N}(\theta _{0}) \overset{D}{\longrightarrow } N(0,\lim _{N\rightarrow \infty }\Omega _{N})\).
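To illustrate this step numerically, the following sketch (again with illustrative assumptions, not the paper's design) standardizes an LQ form by its exact mean and variance from Lemma A.4 and checks that its tail frequencies approach those of N(0, 1) as N grows, as the CLT for LQ forms predicts.

```python
import numpy as np

# Illustrative check of the CLT for LQ forms (Kelejian and Prucha 2001):
# standardize Q_N = V'BV + c'V by its exact moments and compare tail
# frequencies with N(0,1). All parameter choices are assumptions for the demo.
rng = np.random.default_rng(1)

def standardized_lq(N, R=20_000, sigma0=1.0):
    B = rng.normal(size=(N, N)) / N
    c = rng.normal(size=N)
    h = rng.uniform(0.5, 1.5, size=N); h /= h.mean()
    H = np.diag(h)
    e = rng.exponential(1.0, size=(R, N)) - 1.0     # skewed, mean 0, variance 1
    V = sigma0 * np.sqrt(h) * e
    Q = ((V @ B) * V).sum(axis=1) + V @ c
    b = np.diag(B)
    s, k = 2.0, 6.0                                 # skewness / excess kurtosis of exp(1)
    mean = sigma0**2 * np.trace(H @ B)
    var = (sigma0**4 * np.trace(H @ B @ H @ (B + B.T))
           + sigma0**2 * c @ H @ c
           + np.sum(sigma0**4 * b**2 * h**2 * k
                    + 2.0 * sigma0**3 * b * c * h**1.5 * s))
    return (Q - mean) / np.sqrt(var)

for N in (10, 50, 200):
    z = standardized_lq(N)
    print(N, f"P(|Z| > 1.96) = {np.mean(np.abs(z) > 1.96):.4f}  (normal: 0.0500)")
```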

For condition (ii): letting \({\mathcal {H}}_N(\theta ) = -\frac{1}{N}\frac{\partial }{\partial \theta '} \psi _{N}(\theta )\) and denoting \(A_{n}^{s}= A_{n}+A_{n}^{\prime }\) for a matrix \(A_{n}\), we have the expression for \(N\sigma ^{2}{\mathcal {H}}_N(\theta )\):

$$\begin{aligned} \left( \begin{array}{llll} {\mathbb {X}}_{N}^{\prime }(\rho ){\mathbb {X}}_{N}(\rho ), &{} \frac{1}{\sigma ^{2}}{\mathbb {X}}_{N}^{\prime }(\rho )\mathbf{V}_{N}(\beta ,\delta ), &{} {\mathbb {X}}_{N}^{\prime }(\rho )\varvec{\Pi }_{1N}, &{} {\mathbb {X}}_{N}^{\prime }(\rho )\mathbf{G}_{2N}^{s}(\rho )\mathbf{V}_{N}(\beta ,\delta )\\ \frac{1}{\sigma ^{2}}\mathbf{V}_{N}^{\prime }(\beta ,\delta ){\mathbb {X}}_{N}(\rho ), &{} \frac{1}{\sigma ^{4}}\Vert \mathbf{V}_{N}(\beta ,\delta )\Vert ^{2}-\frac{N}{2\sigma ^{2}}, &{} \frac{1}{\sigma ^{2}}\mathbf{V}_{N}^{\prime }(\beta ,\delta )\varvec{\Pi }_{1N}, &{} \frac{1}{\sigma ^{2}}\mathbf{V}_{N}^{\prime }(\beta ,\delta )\varvec{\Pi }_{2N}\\ \varvec{\Pi }_{1N}^{\circ \prime }{\mathbb {X}}_{N}(\rho ), &{} \frac{1}{\sigma ^{2}}\mathbf{V}_{N}^{\prime }(\beta ,\delta )\varvec{\Pi }_{1N}^{\circ }, &{} {\mathcal {H}}_{33}(\beta ,\delta ), &{} {\mathcal {H}}_{34}(\beta ,\delta )\\ \mathbf{V}_{N}^{\prime }(\beta ,\delta )\mathbf{G}_{2N}^{\circ s}(\rho ){\mathbb {X}}_{N}(\rho ), &{} \frac{1}{\sigma ^{2}}\mathbf{V}_{N}^{\prime }(\beta ,\delta )\varvec{\Pi }_{2N}^{\circ }, &{} {\mathcal {H}}_{43}(\beta ,\delta ), &{} {\mathcal {H}}_{44}(\beta ,\delta ) \end{array}\right) , \end{aligned}$$

where \({\mathcal {H}}_{33}(\beta ,\delta ) = \varvec{\Pi }_{1N}^{\circ \prime }\varvec{\Pi }_{1N}^{\circ }+\mathbf{V}_{N}^{\prime }(\beta ,\delta )\dot{\varvec{\Pi }}_{1N}^{\circ }\), \({\mathcal {H}}_{43}(\beta ,\delta ) =\varvec{\Pi }_{1N}^{\circ \prime }\varvec{\Pi }_{2N}^{\circ }+\mathbf{V}_{N}^{\prime }(\beta ,\delta )\mathbf{G}_{2N}^{\circ s}(\rho )\mathbf{V}_{N}(\beta ,\delta )={\mathcal {H}}_{34}^{\prime }(\beta ,\delta )\), \({\mathcal {H}}_{44}(\beta ,\delta ) = \mathbf{V}_{N}^{\prime }(\beta ,\delta )\mathbf{G}_{2N}^{\circ s}(\rho )\mathbf{V}_{N}(\beta ,\delta )\), \(\varvec{\Pi }_{1N}=\varvec{\eta }_{N}(\beta ,\delta )+\bar{\mathbf{G}}_{1N}(\delta )\mathbf{V}_{N}(\beta ,\delta )\), \(\dot{\varvec{\Pi }}_{1N}=\frac{\partial }{\partial \lambda }\varvec{\Pi }_{1N}\), \(\varvec{\Pi }_{2N}=\mathbf{G}_{2N}(\rho )\mathbf{V}_{N}(\beta ,\delta )\), \(\varvec{\Pi }_{1N}^{\circ }=\varvec{\eta }_{N}(\beta ,\delta )+\bar{\mathbf{G}}_{1N}^{\circ }(\delta )\mathbf{V}_{N}(\beta ,\delta )\), and \(\varvec{\Pi }_{2N}^{\circ }=\mathbf{G}_{2N}^{\circ }(\rho )\mathbf{V}_{N}(\beta ,\delta )\).

By Assumptions 3–5, Lemmas A.2–A.3, and the following facts: \({\tilde{\theta }} - \theta _0 = o_p(1)\), \(\mathbf{V}_{N}({\tilde{\beta }},{\tilde{\delta }}) = \mathbf{A}_{2N}\mathbf{X}_{N}(\beta _{0}-{{\tilde{\beta }}})+(\lambda _{0}-{{\tilde{\lambda }}})\mathbf{A}_{2N}\mathbf{W}_{1N}\mathbf{Y}_{N}+(\rho _{0}-{{\tilde{\rho }}})\mathbf{W}_{2N}\mathbf{A}_{1N}\mathbf{Y}_{N}+(\lambda _{0}-{{\tilde{\lambda }}})(\rho _{0}-{{\tilde{\rho }}})\mathbf{W}_{2N}\mathbf{W}_{1N}\mathbf{Y}_{N}-(\rho _{0}-{{\tilde{\rho }}})\mathbf{W}_{2N}\mathbf{X}_{N}{{\tilde{\beta }}} + \mathbf{V}_{N}\), \(\frac{1}{N}\mathbf{V}^{\prime }_{N}({\tilde{\beta }},{\tilde{\delta }})\mathbf{V}_{N}({\tilde{\beta }},{\tilde{\delta }}) = \frac{1}{N}\mathbf{V}^{\prime }_{N} \mathbf{V}_{N} + o_p(1)\), and the fact that the \(\varvec{\eta }_{N}\) and G-quantities are all smooth functions of \(\beta \) and \(\delta \), it is straightforward but tedious to show that each term in \({\mathcal {H}}_N({\tilde{\theta }})-{\mathcal {H}}_N(\theta _{0})\) is \(o_p(1)\). We thus omit the details.

For condition (iii), recall \(\Phi _{N}=\mathrm{E}[{\mathcal {H}}_N(\theta _{0})]\). We have

$$\begin{aligned} \Phi _{N} = \frac{1}{N\sigma _{0}^{2}}\left( \begin{array}{llll} {\mathbb {X}}_{N}^{\prime }{\mathbb {X}}_{N}, &{} \sim , &{} \sim , &{} \sim \\ 0, &{} \frac{N}{2\sigma _{0}^{2}},\; &{} \mathtt{tr}(\mathbf{H}_{N}\bar{\mathbf{G}}_{1N}), &{} \mathtt{tr}(\mathbf{H}_{N}{} \mathbf{G}_{2N})\\ \varvec{\eta }_{N}^{\prime }{\mathbb {X}}_{N} &{} 0, &{} \varvec{\eta }_{N}^{\prime }\varvec{\eta }_{N}+\sigma _{0}^{2}\mathtt{tr}(\mathbf{H}_{N}\bar{\mathbf{G}}_{1N}^{\circ s}\bar{\mathbf{G}}_{1N}^{\circ }), &{} \sim \\ 0, &{} 0, &{} \sigma _{0}^{2}\mathtt{tr}(\mathbf{H}_{N}{} \mathbf{G}_{2N}^{\circ s}\bar{\mathbf{G}}_{1N}^{\circ }), &{} \sigma _{0}^{2}\mathtt{tr}(\mathbf{H}_{N}{} \mathbf{G}_{2N}^{\circ s}{} \mathbf{G}_{2N}^{\circ }) \end{array}\right) . \end{aligned}$$

By Lemma A.4 and \(\mathbf{V}_{N}=(F_{T,T-1}^{\prime }\otimes I_{n}){\mathbb {V}}_{nT}\), we have \(\mathrm{Var}[\frac{1}{N}(\mathbf{V}^{\prime }_{N}\mathbf{B}_N \mathbf{V}_{N} + c_{N}^{\prime }\mathbf{V}_{N})] = o(1)\) for any \(N\times N\) matrix \(\mathbf{B}_N\) and \(N\times 1\) vector \(c_{N}\) satisfying the conditions of Lemma A.4. By these results and Chebyshev’s inequality, we can show that all the terms in \({\mathcal {H}}_N(\theta _{0})-\Phi _N\) are \(o_p(1)\). \(\square \)

Proof of Theorem 3.2

The result \({{\widehat{\Phi }}}_\mathtt{AQS1}-\Phi _{N} \overset{p}{\longrightarrow } 0\) follows from results (ii) and (iii) in the asymptotic normality part of the proof of Theorem 3.1. This result holds irrespective of whether the errors are normal or non-normal, and whether T is small or large.

To show \({{\widehat{\Omega }}}_\mathtt{AQS1}^{\dagger }-\Omega _{N} \overset{p}{\longrightarrow } 0\), we first prove the following general result:

$$\begin{aligned} \textstyle \frac{1}{N}\sum _{j=1}^{N}[\hat{\mathbf{s}}_{N,j}\hat{\mathbf{s}}_{N,j}^{\prime }-\mathrm{E}(\mathbf{s}_{N,j}{} \mathbf{s}_{N,j}^{\prime })] \overset{p}{\longrightarrow } 0. \end{aligned}$$
(B-5)

Under normality, \(\Omega _{N}=\frac{1}{N}\sum _{j=1}^{N}\mathrm{E}(\mathbf{s}_{N,j}\mathbf{s}_{N,j}^{\prime })\) and therefore (B-5) already gives the desired result. The proof of (B-5) is relatively simple, as in this case the transformed errors \(\mathbf{v}_{j}\) are inid normal, and hence \(\{\mathbf{s}_{N,j}, {{\mathcal {F}}}_{N,j}\}\) form a martingale difference (MD) sequence. See the proof of Theorem 3.5 for details.

Under non-normality, the proof of (B-5) is not trivial, and therefore for the proof of this theorem we concentrate on the case of non-normal errors. First, we prove (B-5) by showing \(\frac{1}{N}\sum _{j=1}^{N}(\hat{\mathbf{s}}_{N,j}\hat{\mathbf{s}}_{N,j}^{\prime }-\mathbf{s}_{N,j}\mathbf{s}_{N,j}^{\prime })\overset{p}{\longrightarrow } 0\) and \(\frac{1}{N}\sum _{j=1}^{N}[\mathbf{s}_{N,j}\mathbf{s}_{N,j}^{\prime }-\mathrm{E}(\mathbf{s}_{N,j}\mathbf{s}_{N,j}^{\prime })]\overset{p}{\longrightarrow } 0\). The proof of the former is trivial by applying the mean value theorem, due to the consistency of the parameter estimates; we focus on the proof of the latter result. To facilitate the proofs, we freely switch between the single index j for the combined unit and time, and the double indices (i,t) for unit i and time t. Recall that \(v_{it}\) are the original errors, \(v_{it}^{*}\) are the transformed errors, and \(v_{t}^{*}\) is the \(n\times 1\) vector of transformed errors for period t. As \(\mathbf{s}_{N,j}\) or \(\mathbf{s}_{N,it}\) contains only two types of quantities, \(\Pi _{N,j}{} \mathbf{v}_{N,j}\) and \(\mathbf{v}_{N,j}\varvec{\zeta }_{N,j}^{\circ }\), or \(\Pi _{it}\mathbf{v}_{it}^{*}\) and \(\mathbf{v}_{it}^{*}\varvec{\zeta }_{it}^{\circ }\), it suffices to show

(a):

\(\frac{1}{N}\sum _{j=1}^{N}[\Pi _{N,j}\Pi _{N,j}^{\prime }(\mathbf{v}_{N,j}^{2}-\mathrm{E}{} \mathbf{v}_{N,j}^{2})] \overset{p}{\rightarrow } 0\),

(b):

\(\frac{1}{N}\sum _{j=1}^{N}[\Pi _{N,j}(\mathbf{v}_{N,j}^{2} \varvec{\zeta }_{N,j}^{\circ } -\mathrm{E}(\mathbf{v}_{N,j}^{2} \varvec{\zeta }_{N,j}^{\circ }))] \overset{p}{\rightarrow } 0\), and

(c):

\(\frac{1}{N}\sum _{j=1}^{N}[(\mathbf{v}_{N,j}\varvec{\zeta }_{N,j}^{\circ })^{2}-\mathrm{E}((\mathbf{v}_{N,j}\varvec{\zeta }_{N,j}^{\circ })^{2}))] \overset{p}{\rightarrow } 0\).

To show (a), we have \(\frac{1}{N}\sum _{j=1}^{N}[\Pi _{N,j}\Pi _{N,j}^{\prime }(\mathbf{v}_{N,j}^{2}-\mathrm{E}(\mathbf{v}_{N,j}^{2}))]=\frac{1}{T-1}\sum _{t=1}^{T-1}\big \{\frac{1}{n}\sum _{i=1}^{n}[\Pi _{it}\Pi _{it}^{\prime }(v_{it}^{* 2}-\mathrm{E}(v_{it}^{* 2}))]\big \} \equiv \frac{1}{T-1}\sum _{t=1}^{T-1} P_{nt}\). For each t, \(v_{it}^{*}\) are independent over i, and thus \(\{v_{it}^{* 2}-\mathrm{E}(v_{it}^{* 2})\}\) form an MD sequence. The weak law of large numbers (WLLN) for MD arrays of Davidson (1994, p.299) leads to \(P_{nt} \overset{p}{\longrightarrow } 0\). Thus, \(\frac{1}{T-1}\sum _{t=1}^{T-1}P_{nt}\overset{p}{\longrightarrow } 0\), as \(n \rightarrow \infty \) and then \(T\rightarrow \infty \).
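For intuition on this WLLN step, here is a minimal numerical sketch (with assumed ingredients: bounded weights and Student-t errors scaled so that \(\mathrm{Var}(v_{it}^{*})=\sigma _{0}^{2}h_{i}\)); the centered weighted average shrinks as n grows.

```python
import numpy as np

# Minimal illustration of the WLLN step: P_n = (1/n) sum_i pi_i (v_i^2 - E v_i^2),
# with independent v_i ~ (0, sigma0^2 h_i) and bounded weights pi_i.
rng = np.random.default_rng(7)
sigma0 = 1.0
for n in (100, 1_000, 10_000, 100_000):
    h = rng.uniform(0.5, 1.5, size=n); h /= h.mean()
    pi = rng.uniform(-1.0, 1.0, size=n)                      # bounded weights
    t = rng.standard_t(df=8, size=n) / np.sqrt(8.0 / 6.0)    # variance 1, 4+eps moments
    v = sigma0 * np.sqrt(h) * t
    print(n, f"{np.mean(pi * (v**2 - sigma0**2 * h)):+.5f}")
```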

To show (b), note that \(\varvec{\zeta }_{N}^{\circ }=\mathbf{B}_{N}{} \mathbf{V}_{N}\) by definition given above (3.6), where \(\mathbf{B}_{N}\) is a strictly lower triangular matrix. Decompose \(\varvec{\zeta }_{N}^{\circ }\) into \(\{\varvec{\zeta }_{t}^{\circ }\}\) and \(\mathbf{B}_{N}\) into \(\{\mathbf{B}_{ts}\}\), \(t,s=1,\ldots ,T-2\). Note that \(\mathbf{B}_{ts}\) is a zero matrix if \(s>t\), a strictly lower triangular matrix if \(s=t\) and a full \(n\times n\) matrix if \(s<t\). We have, \(\frac{1}{N}\sum _{j=1}^{N}[\Pi _{N,j}(\mathbf{v}_{N,j}^{2} \varvec{\zeta }_{N,j}^{\circ } -\mathrm{E}(\mathbf{v}_{N,j}^{2} \varvec{\zeta }_{N,j}^{\circ }))] = \frac{1}{T-1}\sum _{t=1}^{T-1}\big \{\frac{1}{n}\sum _{i=1}^{n}[\Pi _{it}(v_{it}^{* 2}\varvec{\zeta }_{it}^{\circ } -\mathrm{E}(v_{it}^{* 2}\varvec{\zeta }_{it}^{\circ }))]\} \equiv \frac{1}{T-1}\sum _{t=1}^{T-1}Q_{nt}\). We shall show that for each t, \(Q_{nt}\overset{p}{\longrightarrow } 0\). First, we have,

$$\begin{aligned} \textstyle Q_{n1} = \frac{1}{n}\sum _{i=1}^{n}[\Pi _{i1}((v_{i1}^{* 2}-\sigma _{0}^{2}h_{i})\varvec{\zeta }_{i1}^{\circ } +\sigma _{0}^{2}h_{i}\varvec{\zeta }_{i1}^{\circ }-\mathrm{E}(v_{i1}^{* 2}\varvec{\zeta }_{i1}^{\circ }))] = Q_{n1}^{a}+Q_{n1}^{b}. \end{aligned}$$

Let \({\mathcal {G}}_{n,i}\) be the increasing \(\sigma \)-field generated by \((v_{1\cdot }, \ldots , v_{i\cdot })\), where \(v_{i\cdot }\) is the \(T\times 1\) vector of the original idiosyncratic errors corresponding to the ith spatial unit. As \(\varvec{\zeta }_{i1}^{\circ }\) is \({\mathcal {G}}_{n,i-1}\)-measurable, \(\mathrm{E}[(v_{i1}^{* 2}-\sigma _{0}^{2}h_{i})\varvec{\zeta }_{i1}^{\circ }|{\mathcal {G}}_{n,i-1}]=0\). Thus, \(Q_{n1}^{a}=\frac{1}{n}\sum _{i=1}^{n}\Pi _{i1}(v_{i1}^{* 2}-\sigma _{0}^{2}h_{i})\varvec{\zeta }_{i1}^{\circ }\) is the sum of an MD array. By the WLLN for MD arrays of Davidson (1994, p. 299), \(Q_{n1}^{a}\overset{p}{\longrightarrow } 0\). Now, as \(\mathrm{E}(v_{i1}^{* 2}\varvec{\zeta }_{i1}^{\circ })=0\), \(Q_{n1}^{b} = \frac{\sigma _{0}^{2}}{n}\sum _{i=1}^{n}\Pi _{i1}h_{i}\varvec{\zeta }_{i1}^{\circ }\). Then, \(Q_{n1}^{b}=\frac{\sigma _{0}^{2}}{n}\Pi _{1}^{\prime }H_{n}\varvec{\zeta }_{1}^{\circ }=\frac{\sigma _{0}^{2}}{n}\Pi _{1}^{\prime }H_{n}\mathbf{B}_{11}v_{1}^{*}\overset{p}{\longrightarrow } 0\), by Assumptions 2–5, Lemma A.2 and Chebyshev’s WLLN (Serfling 1980, p. 27). Therefore, \(Q_{n1}\overset{p}{\longrightarrow } 0\).

Next, to show \(Q_{n2}\overset{p}{\longrightarrow } 0\), first note that \(\varvec{\zeta }_{2}^{\circ }=\mathbf{B}_{21}v_{1}^{*}+\mathbf{B}_{22}v_{2}^{*}=(\mathbf{B}_{21}^{u}+\mathbf{B}_{21}^{l}+\mathbf{B}_{21}^{d})v_{1}^{*}+\mathbf{B}_{22}v_{2}^{*}=\varvec{\zeta }_{2,1}^{\circ u}+\varvec{\zeta }_{2,1}^{\circ l}+\varvec{\zeta }_{2,1}^{\circ d}+\varvec{\zeta }_{2,2}^{\circ }\). We have, \(Q_{n2} = \frac{1}{n}\sum _{i=1}^{n}[\Pi _{i2}(v_{i2}^{* 2}\varvec{\zeta }_{i2}^{\circ }-\mathrm{E}(v_{i2}^{* 2}\varvec{\zeta }_{i2}^{\circ }))] = \sum _{r=1}^{4}Q_{n2}^{(r)}\), where

$$\begin{aligned} Q_{n2}^{(1)}= & {} \frac{1}{n}\sum _{i=1}^{n}[\Pi _{i2}(v_{i2}^{* 2}-\sigma _{0}^{2}h_{i})\varvec{\zeta }_{i2,1}^{\circ u}] + \frac{\sigma _{0}^{2}}{n}\sum _{i=1}^{n}\Pi _{i2}h_{i}\varvec{\zeta }_{i2,1}^{\circ u},\\ Q_{n2}^{(2)}= & {} \frac{1}{n}\sum _{i=1}^{n}[\Pi _{i2}(v_{i2}^{* 2}-\sigma _{0}^{2}h_{i})\varvec{\zeta }_{i2,1}^{\circ l}] + \frac{\sigma _{0}^{2}}{n}\sum _{i=1}^{n}\Pi _{i2}h_{i}\varvec{\zeta }_{i2,1}^{\circ l},\\ Q_{n2}^{(3)}= & {} \frac{1}{n}\sum _{i=1}^{n}[\Pi _{i2}(v_{i2}^{* 2}\varvec{\zeta }_{i2,1}^{\circ d} - \mathrm{E}(v_{i2}^{* 2}\varvec{\zeta }_{i2,1}^{\circ d}))],\\ Q_{n2}^{(4)}= & {} \frac{1}{n}\sum _{i=1}^{n}[\Pi _{i2}(v_{i2}^{* 2}-\sigma _{0}^{2}h_{i})\varvec{\zeta }_{i2,2}^{\circ }] + \frac{\sigma _{0}^{2}}{n}\sum _{i=1}^{n}\Pi _{i2}h_{i}\varvec{\zeta }_{i2,2}^{\circ }. \end{aligned}$$

The first terms of \(Q_{n2}^{(2)}\) and \(Q_{n2}^{(4)}\) are like \(Q_{n1}^{a}\) and their second terms are like \(Q_{n1}^{b}\); thus \(Q_{n2}^{(2)}=o_{p}(1)\) and \(Q_{n2}^{(4)}=o_{p}(1)\). As \(\varvec{\zeta }_{i2,1}^{\circ u}\) is \(\bar{{\mathcal {G}}}_{n,i+1}\)-measurable, where \(\bar{{\mathcal {G}}}_{n,i}\) is the decreasing \(\sigma \)-field generated by \((v_{i\cdot }, \ldots , v_{n\cdot })\), \(\frac{1}{n}\sum _{i=1}^{n}[\Pi _{i2}(v_{i2}^{* 2}-\sigma _{0}^{2}h_{i})\varvec{\zeta }_{i2,1}^{\circ u}]\) is the sum of an MD array, shown to be \(o_{p}(1)\); that \(\frac{\sigma _{0}^{2}}{n}\sum _{i=1}^{n}\Pi _{i2}h_{i}\varvec{\zeta }_{i2,1}^{\circ u}\) is \(o_{p}(1)\) follows from arguments similar to those for \(Q_{n1}^{b}\). Thus, \(Q_{n2}^{(1)}=o_{p}(1)\). Finally, \(v_{i2}^{* 2}\varvec{\zeta }_{i2,1}^{\circ d}\) is measurable w.r.t. \(v_{i\cdot }\) and thus independent across i; an application of the WLLN for MD arrays shows that \(Q_{n2}^{(3)}=o_{p}(1)\). Therefore, \(Q_{n2}=o_{p}(1)\). The proof of \(Q_{nt}\overset{p}{\longrightarrow } 0\) for \(t \ge 3\) follows arguments similar to those for \(Q_{n2}\), although more tedious.

To show (c), we have \(\frac{1}{N}\sum _{j=1}^{N}[(\mathbf{v}_{N,j}\varvec{\zeta }_{N,j}^{\circ })^{2}-\mathrm{E}((\mathbf{v}_{N,j}\varvec{\zeta }_{N,j}^{\circ })^{2}))] = \frac{1}{T-1}\sum _{t=1}^{T-1}\big \{\frac{1}{n}\sum _{i=1}^{n}[(v_{it}^{*}\varvec{\zeta }_{it}^{\circ })^{2}-\mathrm{E}((v_{it}^{*}\varvec{\zeta }_{it}^{\circ })^{2})]\big \} = \frac{1}{T-1}\sum _{t=1}^{T-1}R_{nt}\). Thus, the result follows if each \(R_{nt}\) is \(o_{p}(1)\).

First, we have \(R_{n1}=\frac{1}{n}\sum _{i=1}^{n}(v_{i1}^{* 2}-\sigma _{0}^{2}h_{i})\varvec{\zeta }_{i1}^{\circ 2} + \frac{\sigma _{0}^{2}}{n}\sum _{i=1}^{n}h_{i}(\varvec{\zeta }_{i1}^{\circ 2}-\mathrm{E}(\varvec{\zeta }_{i1}^{\circ 2}))\). Obviously, the first term of \(R_{n1}\) is the sum of an MD sequence, which can easily be shown to be \(o_{p}(1)\) by applying the WLLN for MD arrays. For the second term, note that \(\varvec{\zeta }_{i1}^{\circ }=\sum _{k=1}^{i-1}b_{11,ik}v_{1k}^{*}\), where \(b_{11,ik}\) is the (i,k)th element of \(\mathbf{B}_{11}\). Thus, \(\varvec{\zeta }_{i1}^{\circ 2}=\sum _{k=1}^{i-1}b_{11,ik}^{2}v_{1k}^{* 2} + 2\sum _{k=1}^{i-1}\sum _{l=1}^{k-1}b_{11,ik}v_{1k}^{*}b_{11,il}v_{1l}^{*}\). Then,

$$\begin{aligned}&\textstyle \frac{\sigma _{0}^{2}}{n}\sum _{i=1}^{n}h_{i}(\varvec{\zeta }_{i1}^{\circ 2}-\mathrm{E}(\varvec{\zeta }_{i1}^{\circ 2})) \\&\quad = \textstyle \frac{\sigma _{0}^{2}}{n}\sum _{i=1}^{n}h_{i}[\sum _{k=1}^{i-1}b_{11,ik}^{2}(v_{1k}^{* 2}-\mathrm{E}(v_{1k}^{* 2}))]+\frac{2\sigma _{0}^{2}}{n}\sum _{i=1}^{n}h_{i}[\sum _{k=1}^{i-1}\sum _{l=1}^{k-1}b_{11,ik}b_{11,il}v_{1k}^{*}v_{1l}^{*}]\\&\quad = \textstyle \frac{\sigma _{0}^{2}}{n}\sum _{k=1}^{n-1}(\sum _{i=k+1}^{n}h_{i}b_{11,ik}^{2})(v_{1k}^{* 2}-\mathrm{E}(v_{1k}^{* 2})) + \frac{2\sigma _{0}^{2}}{n}\sum _{k=1}^{n-1}\xi _{k}^{*}v_{1k}^{*}, \end{aligned}$$

where \(\xi _{k}^{*}=\sum _{l=1}^{k-1}(\sum _{i=k+1}^{n}h_{i}b_{11,ik}b_{11,il})v_{1l}^{*}\), and the last equality is obtained by switching the orders of summations. Both terms are sums of MD sequences, as \(\xi _{k}^{*}\) is \({{\mathcal {G}}}_{n,k-1}\)-measurable, and both are shown to be \(o_{p}(1)\) by applying the WLLN for MD sequences. Therefore, \(R_{n1}=o_{p}(1)\).
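For reference, the interchange of summations used in the last display is the elementary identity

$$\begin{aligned} \textstyle \sum _{i=1}^{n}h_{i}\sum _{k=1}^{i-1}a_{ik}x_{k} = \sum _{k=1}^{n-1}\big (\sum _{i=k+1}^{n}h_{i}a_{ik}\big )x_{k}, \end{aligned}$$

applied with \(a_{ik}=b_{11,ik}^{2}\) and \(x_{k}=v_{1k}^{* 2}-\mathrm{E}(v_{1k}^{* 2})\) for the first term, and analogously (with a further interchange over l) for the second.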

Next, we have \(R_{n2}=\frac{1}{n}\sum _{i=1}^{n}(v_{i2}^{* 2}-\sigma _{0}^{2}h_{i})\varvec{\zeta }_{i2}^{\circ 2} + \frac{\sigma _{0}^{2}}{n}\sum _{i=1}^{n}h_{i}(\varvec{\zeta }_{i2}^{\circ 2}-\mathrm{E}(\varvec{\zeta }_{i2}^{\circ 2}))\). Applying the decomposition \(\varvec{\zeta }_{2}^{\circ }=\varvec{\zeta }_{2,1}^{\circ u}+\varvec{\zeta }_{2,1}^{\circ l}+\varvec{\zeta }_{2,1}^{\circ d}+\varvec{\zeta }_{2,2}^{\circ }\), as used in proving \(Q_{n2}\overset{p}{\rightarrow } 0\), we are able to decompose \(R_{n2}\) into a sum of a finite number of terms, each of which is \(o_{p}(1)\), and hence \(R_{n2}\) itself is \(o_{p}(1)\). The details for this and those for \(R_{nt}, t\ge 3\), are very tedious and hence omitted.

It remains to show that \(\frac{2}{n(T-1)}\sum _{i=1}^{n}\sum _{t=2}^{T-1}\sum _{s=1}^{t-1}[{\mathbf {s}}_{N,it}{\mathbf {s}}_{N,is}^{\prime }-\mathrm{E}({\mathbf {s}}_{N,it}{\mathbf {s}}_{N,is}^{\prime })]\overset{p}{\longrightarrow } 0\), and that \(\frac{2}{n(T-1)}\sum _{i=1}^{n}\sum _{t=2}^{T-1}\sum _{s=1}^{t-1}[\hat{{\mathbf {s}}}_{N,it}\hat{{\mathbf {s}}}_{N,is}^{\prime }-{\mathbf {s}}_{N,it}{\mathbf {s}}_{N,is}^{\prime }]\overset{p}{\longrightarrow } 0\). The latter is straightforward by applying the mean value theorem, and the former can be proved along the same lines as the proof above.

Finally, we offer a discussion on the magnitude of the additional term in \({\widehat{\Omega }}_\mathtt{AQS1}^{\dagger }\). It is asymptotically equivalent to \(\frac{2}{N}\sum _{i=1}^{n}\sum _{t=2}^{T-1}\sum _{s=1}^{t-1}\mathrm{E}({\mathbf {s}}_{N,it}{\mathbf {s}}_{N,is}^{\prime })\). Denote the elements of \(\mathrm{E}({\mathbf {s}}_{N,it}{\mathbf {s}}_{N,is}^{\prime })\) by \(\Upsilon _{i,pq}\), where \(p,q=1,2,3,4\), corresponding to \(\beta , \sigma ^{2},\lambda \), and \(\rho \), respectively. Let \(f_{t}\) be the tth column of \(F_{T,T-1}\) and \(v_{i\cdot }\) be the \(T\times 1\) vector of idiosyncratic errors of the ith spatial unit. We have \(v_{it}^{*}=f_{t}^{\prime }v_{i\cdot }\). It is easy to see that \(\Upsilon _{i,11}=0\). By Lemma A.4(iii) and the homoskedasticity of \(v_{it}^{*}\) across t given i, we have \(\Upsilon _{i,22}=\frac{1}{4\sigma _{0}^{8}}\mathrm{Cov}(v_{it}^{* 2},v_{is}^{* 2})=\frac{1}{4\sigma _{0}^{4}}h_{i}^{2}\kappa _{i}(f_{t}^{2})^{\prime }f_{s}^{2}\), \(\Upsilon _{i,33}=\frac{1}{\sigma _{0}^{4}}\mathrm{Cov}[\mathbf{v}_{N,it}(\varvec{\eta }_{N,it}+\varvec{\zeta }_{N,it}), \mathbf{v}_{N,is}(\varvec{\eta }_{N,is}+\varvec{\zeta }_{N,is})]=\frac{1}{\sigma _{0}^{4}}\mathrm{E}(v_{it}^{* 2}v_{is}^{*} b_{ts,ii}\varvec{\eta }_{is})=\frac{1}{\sigma _{0}^{4}}\gamma _{i}f_{t}^{\prime }f_{s}^{2}b_{ts,ii}\varvec{\eta }_{is}\), and \(\Upsilon _{i,44}=0\), where \(\{b_{ts,ii}\}\) are the diagonal elements of \(\mathbf{B}_{ts}\), \(\varvec{\eta }_{is}\) is the (i,s)th element of \(\varvec{\eta }_{N}\), and \(\gamma _{i}\) and \(\kappa _{i}\) are the measures of skewness and excess kurtosis of \(v_{it}\). Thus, \(\frac{1}{N}\sum _{i=1}^{n}\sum _{t=2}^{T-1}\sum _{s=1}^{t-1}\mathrm{Cov}({\mathbf {s}}_{N,it},{\mathbf {s}}_{N,is})=o(1)\), if

(a):

\(\frac{1}{N}\sum _{i=1}^{n}h_{i}^{2}\kappa _{i}\sum _{t=2}^{T-1}\sum _{s=1}^{t-1}(f_{t}^{2})^{\prime }f_{s}^{2}=o(1)\), and

(b):

\(\frac{1}{N}\sum _{i=1}^{n}\gamma _{i}\sum _{t=2}^{T-1}\sum _{s=1}^{t-1}f_{t}^{\prime }f_{s}^{2}b_{ts,ii}\varvec{\eta }_{is}=o(1)\),

as for the other terms with \(p \ne q\), we have \(|\Upsilon _{i,pq}| \le |\Upsilon _{i,pp}||\Upsilon _{i,qq}|\). \(\square \)

Proof of Theorem 3.3

Proof of consistency. Let \({\bar{\psi }}_{N}^{*}(\delta ) = \mathrm{E}({{\tilde{\psi }}}_{N}^{*}(\delta ))\). By Theorem 5.9 of van der Vaart (1998), consistency of \({{\hat{\delta }}}_\mathtt{AQS1}^*\) follows from (a) \(\sup _{\delta \in \Delta }\frac{1}{N}\Vert {{\tilde{\psi }}}_{N}^{*}(\delta )-{\bar{\psi }}^{*}(\delta )\Vert = o_{p}(1)\) and (b) for every \(\epsilon >0\), \(\inf _{\delta : d(\delta , \delta _0) \ge \epsilon }\frac{1}{N}\Vert {\bar{\psi }}^{*}(\delta )\Vert > 0 = \frac{1}{N}\Vert {\bar{\psi }}^{*}(\delta _0)\Vert \). Write the two components of the AQS function \({{\tilde{\psi }}}_{N}^{*}(\delta )\) as \(R_{rN}(\delta ) = T_{rN}(\delta )-S_{rN}(\delta ), r=1,2\), where \(T_{rN}(\delta )= {\mathbb {Y}}'_N(\delta ){\mathbb {M}}_N(\rho ){\bar{\mathbf {G}}}_{rN}(\delta ){\mathbb {Y}}_N(\delta )\) and \(S_{rN}(\delta )= {\mathbb {Y}}'_N(\delta ){\mathbb {M}}_N(\rho )\text {diag}[{\mathbb {M}}_N(\rho )]^{-1}\text {diag}[{\mathbb {M}}_N(\rho ){\bar{\mathbf {G}}}_{rN}(\delta )]{\mathbb {Y}}_N(\delta )\).

For condition (a), with \({\mathbb {Y}}_{N}(\delta )={\mathbf {D}}_{N}(\delta ){\mathbf {D}}_{N}^{-1}({\mathbf {A}}_{2N}{\mathbf {X}}_{N}\beta _{0}+(F_{T,T-1}^{\prime }\otimes I_{n}){\mathbb {V}}_{nT})\), we see that \(T_{rN}(\delta )\) and \(S_{rN}(\delta )\) are both linear-quadratic in \({\mathbb {V}}_{nT}\). Therefore, for each \(\delta \), the pointwise convergence to zero of \(\frac{1}{N}[T_{rN}(\delta )-\mathrm{E}(T_{rN}(\delta ))]\) and \(\frac{1}{N}[S_{rN}(\delta )-\mathrm{E}(S_{rN}(\delta ))]\), \(r=1,2\), can easily be established along the lines of the proof of Theorem 3.1. For the stochastic equicontinuity of the two types of quantities, note that \({\mathbf {D}}_{N}(\delta ){\mathbf {D}}_{N}^{-1}=I_{N}+(\rho _{0}-\rho ){{\mathbf {G}}}_{2N}+(\lambda _{0}-\lambda )\bar{{\mathbf {G}}}_{1N}+(\lambda _{0}-\lambda )(\rho _{0}-\rho ){{\mathbf {G}}}_{2N}\bar{{\mathbf {G}}}_{1N}\), and that the partial derivatives \(\frac{\partial }{\partial \lambda }{\bar{\mathbf {G}}}_{1N}(\delta )\), \(\frac{\partial }{\partial \rho }{\bar{\mathbf {G}}}_{1N}(\delta )\), \(\frac{\partial }{\partial \rho }{\bar{\mathbf {G}}}_{2N}(\rho )\), and \(\frac{\partial }{\partial \rho }{\mathbb {M}}_N(\rho )\) are all uniformly bounded in row and column sums, uniformly in \(\delta \in \Delta \), by Lemma A.2. Therefore, \(T_{rN}(\delta )\) and \(S_{rN}(\delta )\) are stochastically equicontinuous. The pointwise convergence and stochastic equicontinuity lead, by the theorem of Newey (1991) and under Assumptions 1–6, to the uniform convergence results \(\sup _{\delta \in \Delta }\frac{1}{N}|T_{rN}(\delta )-\mathrm{E}(T_{rN}(\delta ))| = o_p(1)\) and \(\sup _{\delta \in \Delta }\frac{1}{N}|S_{rN}(\delta )-\mathrm{E}(S_{rN}(\delta ))| = o_p(1)\) for \(r=1,2\). Thus, \(\frac{1}{N}[R_{rN}(\delta )-\mathrm{E}(R_{rN}(\delta ))] = o_p(1)\), uniformly in \(\delta \in \Delta \).

For condition (b), we first have \(\mathrm{E}[R_{rN}(\delta _0)]=0\). By Assumption 6 and Lemma A.2, \(\mathrm{E}[R_{rN}(\delta )] \ne 0\) for any \(\delta \ne \delta _0\). It follows that both conditions of Theorem 5.9 of van der Vaart (1998) hold, and hence \({{\hat{\delta }}}_\mathtt{AQS1}^*\) is consistent.
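To see the Z-estimator logic of conditions (a) and (b) in isolation, consider the following toy scalar score (ours, unrelated to the SPD-FE model): the sample score converges uniformly to a limit whose unique zero is the true parameter, so its root is consistent even under cross-sectional heteroskedasticity.

```python
import numpy as np
from scipy import optimize

# Toy Z-estimator (ours, not the paper's model): psi_N(theta) = mean(x) - theta
# converges uniformly to theta0 - theta, whose unique zero is theta0.
rng = np.random.default_rng(0)
theta0 = 1.5
for N in (100, 10_000, 1_000_000):
    sigma = np.sqrt(1.0 + 0.5 * rng.random(N))  # heteroskedastic scales
    x = theta0 + sigma * rng.standard_normal(N)
    root = optimize.brentq(lambda th: np.mean(x - th), -10.0, 10.0)
    print(N, root)                              # -> theta0 as N grows
```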

Proof of asymptotic normality. To establish the asymptotic normality of \({{\hat{\delta }}}_\mathtt{AQS1}^{*}\), we have, by the mean value theorem,

$$\begin{aligned} \textstyle 0 = \frac{1}{\sqrt{N}}{\tilde{\psi }}_{N}^{*}({{\hat{\delta }}}_\mathtt{AQS1}^{*}) = \frac{1}{\sqrt{N}}{\tilde{\psi }}_{N}^{*}(\delta _{0}) + \frac{1}{N}\frac{\partial }{\partial \delta '}{\tilde{\psi }}_{N}^{*}({\bar{\delta }}_N) \sqrt{N}({{\hat{\delta }}}_\mathtt{AQS1}^{*}-\delta _0), \end{aligned}$$
(B-6)

where \({\bar{\delta }}_N\) lies between \({{\hat{\delta }}}_\mathtt{AQS1}^{*}\) and \(\delta _0\) elementwise. It suffices to show that

(i):

\(\frac{1}{\sqrt{N}}{\tilde{\psi }}_{N}^{*}(\delta _{0}) \overset{D}{\longrightarrow } N(0,\; \lim _{N\rightarrow \infty }\Omega _{N}^{*})\),

(ii):

\(\frac{1}{N}\big [\frac{\partial }{\partial \delta '}{\tilde{\psi }}_{N}^{*}({\bar{\delta }}_N) - \frac{\partial }{\partial \delta '} {\tilde{\psi }}_{N}^{*}(\delta _{0})\big ] = o_p(1)\), and

(iii):

\(\frac{1}{N}\big [\frac{\partial }{\partial \delta '} {\tilde{\psi }}_{N}^{*}(\delta _{0}) - \text {E}(\frac{\partial }{\partial \delta '} {\tilde{\psi }}_{N}^{*}(\delta _{0})) \big ] = o_p(1)\).

To prove (i), note that \({\tilde{\psi }}_{N}^{*}(\delta _{0})\) can be written as linear-quadratic (LQ) forms in the original errors; the CLT for LQ forms of Kelejian and Prucha (2001) then leads to the result.
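For the reader's convenience, we record the generic variance expression underlying this CLT (a standard result, stated here for independent mean-zero errors \(v_{i}\) with variances \(\sigma _{i}^{2}\), third moments \(\mu _{3i}\), and fourth moments \(\mu _{4i}\)): an LQ form \(Q = v^{\prime }Av + b^{\prime }v\) has

$$\begin{aligned} \mathrm{Var}(Q) = \sum _{i\ne j}a_{ij}(a_{ij}+a_{ji})\sigma _{i}^{2}\sigma _{j}^{2} + \sum _{i}a_{ii}^{2}(\mu _{4i}-\sigma _{i}^{4}) + \sum _{i}b_{i}^{2}\sigma _{i}^{2} + 2\sum _{i}a_{ii}b_{i}\mu _{3i}, \end{aligned}$$

and the entries of \(\Omega _{N}^{*}\) are built from variances and covariances of exactly such forms.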

To prove (ii), let \({{\mathcal {H}}}_{N}^{*}(\delta )=-\frac{\partial }{\partial \delta '}{\tilde{\psi }}_{N}^{*}(\delta ) =[{{\mathcal {H}}}_{N,11}^{*}(\delta ), {{\mathcal {H}}}_{N,12}^{*}(\delta );\; {{\mathcal {H}}}_{N,21}^{*}(\delta ), {{\mathcal {H}}}_{N,22}^{*}(\delta )]\), where

$$\begin{aligned} {{\mathcal {H}}}_{N,11}^{*}(\delta )&={\mathbb {Y}}_{N}^{\prime }(\delta ) [{\dot{\mathbf {B}}}_{11N}^{*}(\delta )+{\bar{\mathbf {G}}}_{1N}^{'}(\lambda ) {\mathbf {B}}_{1N}^{*}(\delta )+{\mathbf {B}}_{1N}^{*}(\delta ) {\bar{\mathbf {G}}}_{1N}(\lambda )] {\mathbb {Y}}_{N}(\delta ), \\ {{\mathcal {H}}}_{N,12}^{*}(\delta )&={\mathbb {Y}}_{N}^{\prime }(\delta ) [{\dot{\mathbf {B}}}_{12N}^{*}(\delta )+{\mathbf {G}}_{2N}^{'}(\lambda ) {\mathbf {B}}_{1N}^{*}(\delta )\\&\quad +{\mathbf {B}}_{1N}^{*}(\delta ) {\mathbf {G}}_{2N}(\lambda )+\dot{{\mathbb {M}}}_{N}(\rho ) {\bar{\mathbf {G}}}_{1N}^{*}(\delta )] {\mathbb {Y}}_{N}(\delta ), \\ {{\mathcal {H}}}_{N,21}^{*}(\delta )&={\mathbb {Y}}_{N}^{\prime }(\delta ) [{\bar{\mathbf {G}}}_{1N}^{'}(\delta ){\mathbf {B}}_{2N}^{*}(\rho ) +{\mathbf {B}}_{2N}^{*}(\rho ){\bar{\mathbf {G}}}_{1N}(\delta )]{\mathbb {Y}}_{N}(\delta ),\\ {{\mathcal {H}}}_{N,22}^{*}(\delta )&={\mathbb {Y}}_{N}^{\prime }(\delta ) [{\dot{\mathbf {B}}}_{22N}^{*}(\delta )+{\mathbf {G}}_{2N}^{'}(\lambda ) {\mathbf {B}}_{2N}^{*}(\rho )+{\mathbf {B}}_{2N}^{*}(\rho ){\mathbf {G}}_{2N}(\lambda )\\&\quad +\dot{{\mathbb {M}}}_{N}(\rho ){\bar{\mathbf {G}}}_{2N}^{*}(\delta )] {\mathbb {Y}}_{N}(\delta ), \end{aligned}$$

where \({\mathbf {B}}_{rN}^*(\delta )={\mathbb {M}}_{N}(\rho )\bar{{\mathbf {G}}}_{rN}^{*}(\delta )\), \({\dot{\mathbf {B}}}_{rsN}^*(\delta )={\mathbb {M}}_{N}(\rho )\dot{\bar{{\mathbf {G}}}}_{rs,N}^{*}(\delta )\), \(\dot{\bar{{\mathbf {G}}}}_{rs,N}^{*}(\delta )\) denotes the partial derivative of \(\bar{{\mathbf {G}}}_{rN}^{*}(\delta )\) (\(r=1,2\)) w.r.t. \(\lambda \) (\(s=1\)) or \(\rho \) (\(s=2\)), and \(\dot{{\mathbb {M}}}_{N}(\rho )\) is the derivative of \({\mathbb {M}}_{N}(\rho )\) w.r.t. \(\rho \), with

$$\begin{aligned} \dot{\bar{{\mathbf {G}}}}_{11,N}^{*}(\delta )= & {} \bar{{\mathbf {G}}}_{1N}^{2}(\delta )-\text{ diag }[{\mathbb {M}}_{N}(\rho )]^{-1}\text{ diag }[{\mathbf {B}}_{1N}(\delta )\bar{{\mathbf {G}}}_{1N}(\delta )],\\ \dot{\bar{{\mathbf {G}}}}_{12,N}^{*}(\delta )= & {} \bar{{\mathbf {G}}}_{1N}(\delta ){\mathbf {G}}_{2N}(\rho )-{\mathbf {G}}_{2N}(\rho )\bar{{\mathbf {G}}}_{1N}(\delta ) \\&+\text{ diag }[{\mathbb {M}}_{N}(\rho )]^{-2}\text{ diag }[\dot{{\mathbb {M}}}_{N}(\rho )]\text{ diag }[{\mathbf {B}}_{2N}(\rho )]\\&+\text{ diag }[{\mathbb {M}}_{N}(\rho )]^{-1}\text{ diag }[{\mathbb {M}}_{N}(\rho ){\mathbf {G}}_{2N}(\rho )\bar{{\mathbf {G}}}_{1N}(\delta )-{\mathbf {B}}_{1N}(\delta ){\mathbf {G}}_{2N}(\rho )\\&-\dot{{\mathbb {M}}}_{N}(\rho )\bar{{\mathbf {G}}}_{1N}(\delta )],\\ \dot{\bar{{\mathbf {G}}}}_{22,N}^{*}(\delta )= & {} {\mathbf {G}}_{2N}(\rho )\bar{{\mathbf {G}}}_{2N}(\rho )+{\mathbf {G}}_{2N}(\rho )\dot{{\mathbb {M}}}_{N}(\rho ) \\&+\text{ diag }[{\mathbb {M}}_{N}(\rho )]^{-2}\text{ diag }[\dot{{\mathbb {M}}}_{N}(\rho )]\text{ diag }[{\mathbf {B}}_{2N}(\rho )]\\&-\text{ diag }[{\mathbb {M}}_{N}(\rho )]^{-1}\text{ diag }[{\mathbb {M}}_{N}(\rho ){\mathbf {G}}_{2N}(\rho )\bar{{\mathbf {G}}}_{2N}(\rho )\\&+{\mathbb {M}}_{N}(\rho ){\mathbf {G}}_{2N}(\rho )\dot{{\mathbb {M}}}_{N}(\rho )+\dot{{\mathbb {M}}}_{N}(\rho )\bar{{\mathbf {G}}}_{2N}(\rho )],\\ \dot{{\mathbb {M}}}_{N}(\rho )= & {} {\mathbb {M}}_{N}(\rho ){\mathbf {G}}_{2N}(\rho ){\mathbf {P}}_{N}(\rho )+{\mathbf {P}}_{N}(\rho ){\mathbf {G}}_{2N}^{\prime }(\rho ){\mathbb {M}}_{N}(\rho ),\\ {\mathbf {B}}_{rN}(\delta )= & {} {\mathbb {M}}_{N}(\rho )\bar{{\mathbf {G}}}_{rN}(\delta ), \text { for } r=1,2. \end{aligned}$$

By Assumptions 4 and 5 and the continuous mapping theorem (CMT), \(\bar{{\mathbf {G}}}_{rN}^*({{\bar{\delta }}}_N)=\bar{{\mathbf {G}}}_{rN}^*+o_p(1)\) and \(\dot{\bar{{\mathbf {G}}}}_{rN}^*({{\bar{\delta }}}_N)=\dot{\bar{{\mathbf {G}}}}_{rN}^*+o_p(1)\) for \(r=1,2\). Thus, by a mean value expansion, terms of the sort \({\mathbb {Q}}_{1N}({{\bar{\delta }}}) = \frac{1}{N} {\mathbb {Y}}_{N}^{\prime }({{\bar{\delta }}})Q_{1N}({{\bar{\delta }}}){\mathbb {Y}}_{N}({{\bar{\delta }}})\) can be written as \({\mathbb {Q}}_{1N}+({{\bar{\delta }}}-\delta _0)^\prime \frac{\partial }{\partial \delta }{\mathbb {Q}}_{1N}\), where the gradient term is \(O_p(1)\) and \({{\bar{\delta }}}-\delta _0 = o_p(1)\). Together with the CMT, Lemma A.2, Assumptions 3–5, and some tedious algebra, we have \({\mathbb {Q}}_{1N}({{\bar{\delta }}}) = {\mathbb {Q}}_{1N} + o_p(1)\). Collecting these results, we have \(\frac{\partial }{\partial \delta '}{\tilde{\psi }}_{N}^{*}({\bar{\delta }}_{N})-\frac{\partial }{\partial \delta '}{\tilde{\psi }}_{N}^{*}(\delta _0)=o_{p}(1)\).

To prove (iii), note that the negative of the (normalized) expected Hessian, \(\Phi _{N}^{*}\), is given by:

$$\begin{aligned} \Phi _{N}^{*} = \frac{1}{N}\left( \begin{array}{cc} \sigma _{0}^{2}\text{ tr }({\mathbf {H}}_{N}\phi _{11,N})+\beta _{0}^{'}{\mathbb {X}}_{N}^{'}\phi _{11,N}{\mathbb {X}}_{N}\beta _{0}, &{}\sigma _{0}^{2}\text{ tr }({\mathbf {H}}_{N}\phi _{12,N})+\beta _{0}^{'}{\mathbb {X}}_{N}^{'}\phi _{12,N}{\mathbb {X}}_{N}\beta _{0}\\ \sigma _{0}^{2}\text{ tr }({\mathbf {H}}_{N}\phi _{21,N})+\beta _{0}^{'}{\mathbb {X}}_{N}^{'}\phi _{21,N}{\mathbb {X}}_{N}\beta _{0}, &{}\sigma _{0}^{2}\text{ tr }({\mathbf {H}}_{N}\phi _{22,N})+\beta _{0}^{'}{\mathbb {X}}_{N}^{'}\phi _{22,N}{\mathbb {X}}_{N}\beta _{0} \end{array}\right) , \end{aligned}$$

where

$$\begin{aligned} \phi _{11,N}= & {} {\dot{\mathbf {B}}}_{11N}^{*}+{\bar{\mathbf {G}}}_{1N}^{'} {\mathbf {B}}_{1N}^{*}+{\mathbf {B}}_{1N}^{*}{\bar{\mathbf {G}}}_{1N},\\ \phi _{12,N}= & {} {\dot{\mathbf {B}}}_{12N}^{*}+{\mathbf {G}}_{2N}^{'} {\mathbf {B}}_{1N}^{*}+{\mathbf {B}}_{1N}^{*}{\mathbf {G}}_{2N} +\dot{{\mathbb {M}}}_{N}{\bar{\mathbf {G}}}_{1N}^{*},\\ \phi _{21,N}= & {} {\bar{\mathbf {G}}}_{1N}^{'}{\mathbf {B}}_{2N}^{*} +{\mathbf {B}}_{2N}^{*}{\bar{\mathbf {G}}}_{1N} \text { and }\\ \phi _{22,N}= & {} {\dot{\mathbf {B}}}_{22N}^{*}+{\mathbf {G}}_{2N}^{'} {\mathbf {B}}_{2N}^{*}+{\mathbf {B}}_{2N}^{*} {\mathbf {G}}_{2N}+\dot{{\mathbb {M}}}_{N}{\bar{\mathbf {G}}}_{2N}^{*}. \end{aligned}$$

Result (iii) then follows by showing \(\frac{1}{N}{{\mathcal {H}}}_{N,rs}^{*}(\delta _0)-\Phi ^*_{N,rs}=o_p(1)\) for \(r,s = 1, 2\).

With (B-6) and (i)–(iii), the asymptotic normality follows; the limiting distribution takes the sandwich form displayed below. \(\square \)
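Explicitly, assuming \(\lim _{N\rightarrow \infty }\Phi _{N}^{*}\) is nonsingular (as in the main text), solving (B-6) and applying (i)–(iii) gives

$$\begin{aligned} \sqrt{N}({{\hat{\delta }}}_\mathtt{AQS1}^{*}-\delta _0) = \Phi _{N}^{*-1}\tfrac{1}{\sqrt{N}}{\tilde{\psi }}_{N}^{*}(\delta _{0}) + o_p(1) \overset{D}{\longrightarrow } N\big (0,\; \lim _{N\rightarrow \infty }\Phi _{N}^{*-1}\Omega _{N}^{*}\Phi _{N}^{*-1\prime }\big ). \end{aligned}$$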

Proof of Theorem 3.4

The proof is straightforward, following the derivations preceding Theorem 3.4 in the main text and the proof of Theorem 3.3, and is thus omitted. \(\square \)

Proof of Theorem 3.5

The result \({{\widehat{\Phi }}}_\mathtt{AQS1}^{*}-\Phi _{N}^{*} \overset{p}{\longrightarrow } 0\) follows directly from results (ii) and (iii) in the asymptotic normality part of the proof of Theorem 3.3. It holds irrespective of whether the errors are normal or non-normal, and of whether \(T\) is small or large.

To prove the consistency of the OPMD-type estimators of \(\Omega _N^*\) and \(\Sigma _N\), we focus for this theorem on the case of normal errors, i.e., on the estimators \({{\widehat{\Omega }}}_\mathtt{AQS1}^{*}\) and \({{\widehat{\Sigma }}}_\mathtt{AQS1}\); the case of non-normal errors can be proved in a similar manner as in the proof of Theorem 3.2. It amounts to showing that \(\frac{1}{N}\sum _{j=1}^{N}[\hat{{\mathbf {v}}}_{N,j}^{2}\hat{{\mathbf {s}}}_{N,j}^{*}\hat{{\mathbf {s}}}_{N,j}^{* \prime } - \mathrm{E}({\mathbf {v}}_{N,j}^{2}{\mathbf {s}}_{N,j}^{*}{\mathbf {s}}_{N,j}^{* \prime })] = o_p(1)\), where \({\mathbf {s}}_{N,j}^{*}=(\zeta _{rN,j}+c_{rN,j})^{\prime }_{r=1,2} \equiv ({\mathbf {s}}_{1N,j}^{*},{\mathbf {s}}_{2N,j}^{*})^{\prime }\), and \(\hat{{\mathbf {v}}}_{N,j}\) and \(\hat{{\mathbf {s}}}_{N,j}^{*}\) are estimates based on \({{\hat{\theta }}}_\mathtt{AQS1}^{*}\). The result holds if

$$\begin{aligned} \frac{1}{N}\sum _{j=1}^{N}[\hat{{\mathbf {v}}}_{N,j}^{2}\hat{{\mathbf {s}}}_{N,j}^{*}\hat{{\mathbf {s}}}_{N,j}^{* \prime } - {\mathbf {v}}_{N,j}^{2}{\mathbf {s}}_{N,j}^{*}{\mathbf {s}}_{N,j}^{* \prime }] = o_p(1) \text { and }\\ \frac{1}{N}\sum _{j=1}^{N}[{{\mathbf {v}}}_{N,j}^{2}{{\mathbf {s}}}_{N,j}^{*}{{\mathbf {s}}}_{N,j}^{* \prime } - \mathrm{E}({\mathbf {v}}_{N,j}^{2}{\mathbf {s}}_{N,j}^{*}{\mathbf {s}}_{N,j}^{* \prime })] = o_p(1). \end{aligned}$$

The former is straightforward by using the mean value theorem, and therefore we focus on the latter. Denote \(\{\Delta _{r,s}\}_{r,s=1,2} = \frac{1}{N}\sum _{j=1}^{N}[{{\mathbf {v}}}_{N,j}^{2}{{\mathbf {s}}}_{N,j}^{*}{{\mathbf {s}}}_{N,j}^{* \prime } - \mathrm{E}({\mathbf {v}}_{N,j}^{2}{\mathbf {s}}_{N,j}^{*}{\mathbf {s}}_{N,j}^{* \prime })]\). We have, for \(r,s=1,2\),

$$\begin{aligned} \Delta _{r,s}&=\textstyle \frac{1}{N}\sum _{j=1}^N [{\mathbf {s}}_{rN,j}^{*}{\mathbf {s}}_{sN,j}^{*}({\mathbf {v}}_{N,j}^{2}-\mathrm{E}({\mathbf {v}}_{N,j}^{2}))] \\&\quad \textstyle + \frac{1}{N}\sum _{j=1}^N [\mathrm{E}({\mathbf {v}}_{N,j}^{2})(\zeta _{rN,j}\zeta _{sN,j}-\mathrm{E}(\zeta _{rN,j}\zeta _{sN,j}))]\\&\quad \textstyle + \frac{1}{N}\sum _{j=1}^N (c_{rN,j}\zeta _{sN,j}+c_{sN,j}\zeta _{rN,j})\mathrm{E}({\mathbf {v}}_{N,j}^{2}) \equiv \textstyle \sum _{k=1}^3 T_{kN}. \end{aligned}$$

As \({\mathbf {s}}_{1N,j}^{*}\) and \({\mathbf {s}}_{2N,j}^{*}\) are \({\mathcal {F}}_{N,j-1}\)-measurable, where \({\mathcal {F}}_{N,j}\) is the increasing \(\sigma \)-field generated by \(\{{\mathbf {v}}_{N,1}, \dots ,{\mathbf {v}}_{N,j}\}\), we have \(\mathrm{E}[{\mathbf {s}}_{rN,j}^{*}{\mathbf {s}}_{sN,j}^{*}({\mathbf {v}}_{N,j}^{2}-\mathrm{E}({\mathbf {v}}_{N,j}^{2}))|{\mathcal {F}}_{N,j-1}]=0\), and thus \(T_{1N}\) is the sum of a martingale difference (MD) sequence. The conditions of Theorem 19.7 of Davidson (1994) (the WLLN for MD sequences) are easily verified under Assumptions 1–5, and hence \(T_{1N}=o_{p}(1)\).

For \(T_{2N}\), note that \(\zeta _{rN,j} = \sum _{k=1}^{j-1}b_{rN,jk}{\mathbf {v}}_{N,k}\), where \(b_{rN,jk}\) is the \((j,k)\)th element of \({\mathbf {B}}_{rN}^{u \prime }+{\mathbf {B}}_{rN}^{l}\). Hence, \(\mathrm{E}(\zeta _{rN,j}\zeta _{sN,j})=\sum _{k=1}^{j-1}b_{rN,jk}b_{sN,jk}\mathrm{E}({\mathbf {v}}_{N,k}^{2}) \equiv d_{rsN,j}\) and

$$\begin{aligned} T_{2N}= & {} \textstyle \frac{1}{N}\sum _{j=1}^N \mathrm{E}({\mathbf {v}}_{N,j}^{2})(\zeta _{rN,j} \zeta _{sN,j}-d_{rsN,j})\\= & {} \textstyle \frac{1}{N}\sum _{j=1}^N \mathrm{E}({\mathbf {v}}_{N,j}^{2})\sum _{k=1}^{j-1}b_{rN,jk}b_{sN,jk}({\mathbf {v}}_{N,k}^{2} - \mathrm{E}({\mathbf {v}}_{N,k}^{2}))\\&\textstyle + \frac{2}{N}\sum _{j=1}^N \mathrm{E}({\mathbf {v}}_{N,j}^{2})\sum _{k=1}^{j-1}b_{rN,jk}{\mathbf {v}}_{N,k}\sum _{l=1}^{k-1} b_{sN,jl} {\mathbf {v}}_{N,l}\\= & {} \textstyle \frac{1}{N}\sum _{j=1}^{N-1}\phi _{rsN,j}({\mathbf {v}}_{N,j}^{2}-\mathrm{E}({\mathbf {v}}_{N,j}^{2})) + \frac{1}{N}\sum _{j=1}^{N-1} \varphi _{rsN,j}{\mathbf {v}}_{N,j}, \end{aligned}$$

where, by switching the order of summations, \(\phi _{rsN,j}=\sum _{k=j+1}^{N}b_{rN,kj}b_{sN,kj}\mathrm{E}({\mathbf {v}}_{N,k}^{2})\), \(\varphi _{rsN,j}=\sum _{k=1}^{j-1}\xi _{rsN,jk}{\mathbf {v}}_{N,k}\), and \(\xi _{rsN,jk} = 2\sum _{l=j+1}^{N} b_{rN,lj} b_{sN,lk}\mathrm{E}({\mathbf {v}}_{N,l}^{2})\). Thus, \(T_{2N}\) is the sum of two MD sequences, and the WLLN for MD sequences implies \(T_{2N} \overset{p}{\longrightarrow } 0\). The last term \(T_{3N}\) has a simpler structure than \(T_{2N}\), so similar but simpler arguments show \(T_{3N} \overset{p}{\longrightarrow } 0\).
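The mechanics of these MD-based WLLN arguments can be illustrated by a small generic simulation (ours; the bounded predictable weights below are stand-ins for quantities such as \(\phi _{rsN,j}\), not the model's matrices).

```python
import numpy as np

# Generic illustration (not the paper's quantities): a normalized sum of
# martingale differences w_j (v_j^2 - E v_j^2), with w_j depending only on
# v_1, ..., v_{j-1}, vanishes in probability as N grows.
rng = np.random.default_rng(1)
for N in (100, 10_000, 1_000_000):
    sigma2 = 1.0 + 0.5 * rng.random(N)          # heteroskedastic variances
    v = rng.standard_normal(N) * np.sqrt(sigma2)
    w = np.concatenate(([0.0], np.sign(np.cumsum(v[:-1]))))  # predictable, bounded
    print(N, (w * (v ** 2 - sigma2)).mean())    # -> 0
```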

For the case of non-normal errors, refer to the proof of Theorem 3.2 for details.

A similar line of argument shows \({{\hat{\Sigma }}}_\mathtt{AQS1}-\Sigma _N = o_p(1)\). \(\square \)
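As a practical footnote (ours, not part of the proof), the estimators whose consistency is established above reduce to a simple sandwich computation. A minimal sketch, assuming arrays v_hat \((N,)\), s_hat \((N,2)\), and Phi_hat \((2,2)\) hold the estimated residual pieces, score pieces, and negative expected Hessian:

```python
import numpy as np

# Minimal sketch (ours, not the authors' code) of the OPMD-type robust
# variance: g_j = v_hat_j * s_hat_j are the estimated MD pieces, so
# Omega_hat = (1/N) sum_j v_j^2 s_j s_j'.
def robust_variance(Phi_hat, s_hat, v_hat):
    N = s_hat.shape[0]
    g = v_hat[:, None] * s_hat                  # per-observation MD pieces
    Omega_hat = g.T @ g / N                     # OPMD estimate of Omega*_N
    Phi_inv = np.linalg.inv(Phi_hat)
    return Phi_inv @ Omega_hat @ Phi_inv.T / N  # sandwich Var_hat(delta_hat)

# illustrative call with placeholder arrays
rng = np.random.default_rng(0)
N = 500
print(robust_variance(np.eye(2), rng.standard_normal((N, 2)),
                      rng.standard_normal(N)))
```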
