1 Introduction

Progress in spatial econometrics has been stimulated by the increasing availability of spatio-temporal data, as shown by Anselin et al. (2008), Elhorst (2014), Pesaran (2015) and Baltagi (2021), among others. Static spatial panel data methods are long established, but dynamic spatial panel data models have more recently come to the fore as highly informative and relevant (see, for example, Parent and LeSage 2012; Yu and Lee 2008; Elhorst 2014). The predominant approach in the spatial panel literature has been to assume that the regressors are exogenous, apart from spatial and/or temporal lags of the dependent variable, and estimation has typically focused on maximum likelihood and related methods. This presents a challenge in the presence of endogenous regressors. The focus of this paper is on dynamic spatial panel models in which endogeneity extends beyond spatial and temporal lags of the dependent variable. Baltagi et al. (2019) provide extensive Monte Carlo simulations demonstrating the consistency of the estimates produced by a spatial version of 'difference generalized method of moments (GMM)', an approach which easily accommodates endogenous regressors.

The main contribution of this paper is to extend the approach of Le Gallo and Páez (2013), who advocated the use of synthetic instruments to eliminate endogeneity bias in cross-sectional regressions, to GMM estimation of dynamic spatial panel models. The intuition behind synthetic instruments is that, because they are based on the invariant topology of the geographic space, they are typically exogenous to the spatio-temporal process being modelled. Moreover, because an endogenous variable almost invariably has a non-random spatial distribution, some inherent dimensions of the topology will often be quite strongly correlated with it. Synthetic instruments derived from these inherent dimensions therefore tend to possess the hard-to-find ideal properties of instrumental variables: exogeneity combined with correlation with the endogenous variable.

In this paper, synthetic instruments help resolve some pitfalls of GMM estimation arising when there is an overabundance of instrumental variables. This can be exacerbated with spatial data, where the suite of instruments might be enhanced by the use of the spatial lags of variables in addition to the variables per se (Kelejian and Prucha 1998, 1999; Pace et al. 2012; Baltagi et al. 2019). Moreover, instruments are often weak, with negligible correlation with the endogenous variables. In contrast, synthetic instruments are invariably strong, with typically very high correlations. By replacing many weak instruments with fewer synthetic instruments, one is likely to obtain more reliable inference. In particular, reducing the number of instruments mitigates problems relating to the crucial Sargan–Hansen test of overidentifying restrictions. An associated problem is the downward bias in parameter standard errors associated with two-step GMM estimation, which causes upward bias in t-ratios. To remedy this, two related finite sample corrections are reported: the well-known Windmeijer correction and the 'HKL' double correction (Hwang et al. 2022), which also allows for overidentification bias. Finally, as an illustration, synthetic instruments and standard error corrections are applied to real data to provide a stronger inferential basis for published research.

2 A dynamic spatial panel model

Consider first the estimation of the simple dynamic model given by Eq. (1),

$$y_{it} = \gamma y_{it - 1} + \rho \sum\limits_{j = 1}^{N} {w_{ij} y_{jt} } + \theta \sum\limits_{j = 1}^{N} {w_{ij} y_{jt - 1} } + \beta_{1} x_{it} + \beta_{2} \tilde{x}_{it} + \varepsilon_{it}; \quad i = 1,\ldots,N,\; t = 1,\ldots,T$$
(1)

in which there are \(N\) regions/locations/individuals and \(T\) time periods, \(x\) is an exogenous variable, \(\tilde{x}\) is an endogenous variable, \(w_{ij}\) is the \(i,j\)th element of an exogenous, time-invariant \(N\) by \(N\) connectivity matrix \({\mathbf{W}}_{N}\), and \(\gamma ,\rho ,\theta ,\beta_{1}\) and \(\beta_{2}\) are parameters to be estimated. The error term is compound, thus

$$\varepsilon_{it} = \mu_{i} + \nu_{it}$$

where \(\mu_{i}\) is a set of individual effects, one for each of the \(N\) regions, controlling for unobserved time-invariant heterogeneity across regions or locations. The term \(\nu_{it}\) varies both by region and by time and represents other, unpredictable, random effects. The assumption is that the \(\mu_{i}\) and \(\nu_{it}\) are random draws from independent and identically distributed distributions, with \(\mu_{i} \sim iid(0,\sigma_{\mu }^{2} )\) and \(\nu_{it} \sim iid(0,\sigma_{\nu }^{2} )\), and with \(\mu_{i}\) and \(\nu_{it}\) independent of each other and among themselves. Given \(\sigma_{\mu }^{2} > 0\), there is interregional heterogeneity, with \(\mu_{i}\) capturing unmodelled individual effects such as physical geography, together with regional variation in other unobserved effects.

A more general specification written in matrix terms is

$${\mathbf{y}}_{t} = {\mathbf{B}}_{N}^{ - 1} {\mathbf{C}}_{N} {\mathbf{y}}_{t - 1} + {\mathbf{B}}_{N}^{ - 1} {\mathbf{x}}_{t} \beta_{1} + {\mathbf{B}}_{N}^{ - 1} {\tilde{\mathbf{x}}}_{t} \beta_{2} + {\mathbf{B}}_{N}^{ - 1} {{\varvec{\upvarepsilon}}}_{t}$$
(2)

in which \({\mathbf{B}}_{N} = \left( {{\mathbf{I}}_{N} - \rho {\mathbf{W}}_{N} } \right)\) and \({\mathbf{C}}_{N} = \left( {\gamma {\mathbf{I}}_{N} + \theta {\mathbf{W}}_{N} } \right)\); \({\mathbf{B}}_{N}\), \({\mathbf{C}}_{N}\) and the identity matrix \({\mathbf{I}}_{N}\) are matrices of dimension \(N\) by \(N\); \(\rho ,\gamma\) and \(\theta\) are scalar coefficients; \({\mathbf{y}}_{t}\) is an \(N\) by 1 vector; \({\mathbf{x}}_{t}\) is an \(N\) by \(k_{1}\) matrix of exogenous regressors with \(\beta_{1}\) a \(k_{1}\) by 1 vector of coefficients; \({\tilde{\mathbf{x}}}_{t}\) is an \(N\) by \(k_{2}\) matrix of endogenous regressors with \(\beta_{2}\) a \(k_{2}\) by 1 vector of coefficients; and \({{\varvec{\upvarepsilon}}}_{t} = {{\varvec{\upmu}}} + {{\varvec{\upnu}}}_{t}\) is an \(N\) by 1 compound error term. Standard assumptions are that \({\mathbf{B}}_{N}\) is non-singular and that \({\mathbf{W}}_{N}\), \({\mathbf{B}}_{N}^{ - 1}\) and the regressors are uniformly bounded in absolute value. The model satisfies stationarity conditions only if the maximum absolute characteristic root of \({\tilde{\mathbf{A}}} = {\mathbf{C}}_{N} {\mathbf{B}}_{N}^{ - 1}\) is less than one (Elhorst 2001, 2014; Parent and LeSage 2011, 2012; Debarsy et al. 2012).
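The stationarity condition on \({\tilde{\mathbf{A}}} = {\mathbf{C}}_{N} {\mathbf{B}}_{N}^{ - 1}\) is straightforward to check numerically. The following sketch (Python with NumPy; the small ring-shaped connectivity matrix is purely illustrative, not from the paper) tests whether a given \((\gamma ,\rho ,\theta )\) combination is dynamically stable.

```python
import numpy as np

def is_stationary(W, gamma, rho, theta):
    """Check the stability condition for Eq. (2): the largest absolute
    eigenvalue of A_tilde = C_N B_N^{-1} must be below one, where
    B_N = I - rho*W and C_N = gamma*I + theta*W."""
    N = W.shape[0]
    B = np.eye(N) - rho * W
    C = gamma * np.eye(N) + theta * W
    A_tilde = C @ np.linalg.inv(B)
    return bool(np.max(np.abs(np.linalg.eigvals(A_tilde))) < 1.0)

# Toy example: 4 regions on a ring, row-standardised contiguity.
W = np.array([[0., .5, 0., .5],
              [.5, 0., .5, 0.],
              [0., .5, 0., .5],
              [.5, 0., .5, 0.]])
print(is_stationary(W, gamma=0.75, rho=0.3, theta=-0.2))
```

Because \({\mathbf{B}}_{N}\) and \({\mathbf{C}}_{N}\) are polynomials in \({\mathbf{W}}_{N}\), the eigenvalues of \({\tilde{\mathbf{A}}}\) are \((\gamma + \theta \omega )/(1 - \rho \omega )\) for each eigenvalue \(\omega\) of \({\mathbf{W}}_{N}\), which the code above verifies by brute force.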

Consistent estimation of Eq. (2) by maximum likelihood is complicated by the presence of the endogenous variables, by assumptions regarding initial conditions, and by how \(T\) and \(N\) tend to infinity. Bond (2002) argues that the distribution of the dependent variable depends in a non-negligible way on what is assumed about the distribution of the initial conditions. For example, the initial condition could be stochastic or non-stochastic, correlated or uncorrelated with the individual effects, or required to satisfy stationarity properties. Different assumptions about the nature of the initial conditions lead to different likelihood functions, and the resulting ML estimators can be inconsistent when the assumptions on the initial conditions are misspecified; Hsiao (2003, pp. 80–135) gives more details. GMM makes weaker assumptions about the initial conditions, indeed according to Baltagi (2013, p. 158) the 'GMM estimator requires no knowledge concerning the initial conditions', and it naturally accommodates the presence of endogenous regressors. The paper therefore focuses on GMM, which is very well documented in the literature, so only a brief summary is provided here.

Following Arellano and Bond (1991), estimation of linear GMM panel data regressions involves first differences, to avoid dynamic panel bias (Nickell 1981), eliminating the individual effects \(\mu_{i}\) which would otherwise be correlated with the spatial lag and time lag of the dependent variable. First differencing Eq. (2) gives

$$\Delta {\mathbf{y}}_{t} = \gamma \Delta {\mathbf{y}}_{t - 1} + \rho {\mathbf{W}}_{N} \Delta {\mathbf{y}}_{t} + \theta {\mathbf{W}}_{N} \Delta {\mathbf{y}}_{t - 1} + \Delta {\mathbf{x}}_{t} \beta_{1} + \Delta {\tilde{\mathbf{x}}}_{t} \beta_{2} + \Delta {{\varvec{\upvarepsilon}}}_{t}$$
(3)

Because of the presence of endogenous variables, instrumental variables are required for consistent estimation. Typically, instruments that are correlated with endogenous variables and yet independent of the errors are difficult to find. One solution is to use lags of regressors already present in the model, but for IV estimation generally, more lags means less data. The usual instrument set for difference GMM, namely HENR instruments (after Holtz-Eakin, Newey and Rosen 1988), avoids this by zeroing out missing observations while including separate instruments for each time period. With HENR one therefore has one instrument per variable, time period and lag distance, amounting to \((T - 2)(T - 1)/2\) instruments for each endogenous variable. Because endogenous variables are contemporaneously correlated with the errors, and provided the \(\nu_{it}\) are not serially autocorrelated of order one, regressors lagged by two or more periods satisfy the orthogonality conditions relating instruments and differenced errors. Arellano and Bond (1991) provide a test for serial correlation, the \(m_{2}\) test statistic, which tests for second-order serial correlation in the first-differenced residuals and is asymptotically normal under the null of zero correlation. Additionally, following the approach adopted by Baltagi et al. (2019), given data with identifiable spatial locations, spatially weighted earlier time-lagged levels of the dependent and explanatory variables are also potentially viable instruments. Accordingly, Baltagi et al. (2019) set out moments equations thus

$$E\left( {y_{il} \Delta \nu_{it} } \right) = 0 \quad {\text{hence}} \quad \sum\limits_{i} {y_{il} \Delta \nu_{it} } = 0, \quad \forall i,\; l = 0,1,\ldots,t - 2,\; t = 2,3,\ldots,T$$
(4)
$$E\left( {{\mathbf{w}}_{i} {\mathbf{y}}_{l} \Delta \nu_{it} } \right) = 0 \quad {\text{hence}} \quad \sum\limits_{i} {\sum\limits_{j \ne i} {w_{ij} y_{jl} } \Delta \nu_{it} } = 0, \quad \forall i,\; l = 0,1,\ldots,t - 2,\; t = 2,3,\ldots,T$$
(5)

where \({\mathbf{w}}_{i} = \left( {w_{i1} ,\ldots,w_{iN} } \right)\) is a 1 by \(N\) vector corresponding to the \(i\)th row of \({\mathbf{W}}_{N}\). Similar expressions give additional moments equations involving the lagged endogenous regressors: for regressor \(j\), \(\tilde{x}_{j,il}\) and \({\tilde{\mathbf{x}}}_{j,l}\) replace \(y_{il}\) and \({\mathbf{y}}_{l}\) in Eqs. (4) and (5). With regard to strictly exogenous regressors, there is no feedback from the dependent variable, and in this case the moments conditions are

$$E\left( {x_{j,im} \Delta \nu_{it} } \right) = 0, \, \forall i,j, \, m = 1,...,T; \quad t = 2,...,T$$
(6)
$$E\left( {{\mathbf{w}}_{i} {\mathbf{x}}_{j,m} \Delta \nu_{it} } \right) = 0, \, \forall i,j, \, m = 1,...,T; \quad t = 2,...,T \,$$
(7)

Baltagi et al. (2019) also introduce a spatial moving average (SMA) error dependence process, but for simplicity the Monte Carlo simulations assume that the errors are spatially independent.Footnote 4

HENR instruments lead to quadratic growth in the number of instruments with respect to \(T\), so there is the possibility of an overabundance of instruments. One solution is to limit the number of lags applying to endogenous variables in the moments equations. A second solution is to collapse the instrument matrix so that there is one instrument for each variable and lag distance, rather than one for each time period, variable, and lag distance; this amounts to adding together columns of the instrument matrix, replacing the set of instruments, one for each period, with a single column. The two solutions can also be combined. Under collapsing, the set of moments equations given by Eqs. (4) and (5) is replaced by

$$\sum\limits_{i,t} {y_{it - 2} \Delta \nu_{it} } = 0$$
(8)
$$\sum\limits_{i,t} {\sum\limits_{j \ne i} {w_{ij} y_{jt - 2} } \Delta \nu_{it} } = 0$$
(9)

with similar expressions for the other endogenous regressors.

However, these approaches have limitations. Limiting lags alone may not solve the instrument proliferation problem, depending on the context, and collapsing, by omitting time variation, will tend to give less precise estimates. Alternatively, each strictly exogenous variable can be introduced as a single column in the matrix of instruments, producing far fewer instruments than the moments conditions of Eqs. (6) and (7). These are referred to as IV-style instruments, rather than HENR-type instruments.
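To make the collapsing operation concrete, the sketch below (Python/NumPy; an illustrative implementation, not the authors' code) builds the HENR instrument block for a single individual and its collapsed counterpart, in which HENR columns sharing a lag distance are added together.

```python
import numpy as np

def henr_block(y_i, T):
    """HENR (Holtz-Eakin/Newey/Rosen) instrument block for one individual:
    one column per (period, lag) pair, with missing entries zeroed out.
    Rows index the T-2 differenced equations; the columns for equation t
    hold the available levels y_{i0}, ..., y_{i,t-2}."""
    rows = T - 2
    cols = (T - 2) * (T - 1) // 2
    Z = np.zeros((rows, cols))
    c = 0
    for r, t in enumerate(range(2, T)):
        for l in range(t - 1):              # levels l = 0, ..., t-2
            Z[r, c] = y_i[l]
            c += 1
    return Z

def collapsed_block(y_i, T):
    """Collapsed instruments: one column per lag distance only, formed by
    adding together the HENR columns that share a lag distance."""
    rows = T - 2
    Z = np.zeros((rows, rows))              # lag distances 2, ..., T-1
    for r, t in enumerate(range(2, T)):
        for d in range(2, t + 1):           # y_{i,t-d}, zero when unavailable
            Z[r, d - 2] = y_i[t - d]
    return Z
```

With \(T = 10\), as in the simulations below, the 36 HENR columns per variable collapse to 8, matching the \((T - 2)(T - 1)/2\) and \((T - 2)\) counts in the text.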

3 Consequences of instrument proliferation

3.1 Parameter standard errors

With numerous instruments, the estimated asymptotic standard errors of the efficient two-step GMM estimator are downward biased in small samples. Windmeijer (2005) corrects for the bias which results from estimating the optimal weight matrix used in the second step of linear two-step GMM. The optimal weight matrix is the inverse of the covariance of the sample moments, leading to the smallest covariance matrix for the GMM estimator. The bias results from the weight matrix being evaluated at estimated, rather than true, parameter values. However, additional bias may occur because of overidentification bias, which affects the finite sample bias of the GMM estimator itself; Hwang et al. (2022) also correct for overidentification bias.

3.2 The Sargan–Hansen J test statistic

Theoretically, the moments conditions require that instruments be orthogonal to the error term. However, with more instruments than variables to be instrumented, the model is overidentified and not all the moments equations can be satisfied exactly and simultaneously. The solution is to satisfy the moments as closely as possible, and the success of this is measured by Sargan–Hansen's J test (Sargan 1958; Hansen 1982), as defined by Eq. (10), which tests the null hypothesis of joint validity of the moments conditions under overidentification. Though it is robust to non-sphericity of the errors, it can be greatly weakened by instrument proliferation (Andersen and Sørensen 1996; Bowsher 2002; Roodman 2009a, b).

The \(J\) test statistic is given by

$$J = {\mathbf{S}}_{1} {\mathbf{A}}{\mathbf{S}}_{2} /N$$
(10)
$$\begin{gathered} {\mathbf{S}}_{1} = \sum\limits_{i = 1}^{N} {\Delta \nu_{i2}^{\prime } {\mathbf{Z}}_{i} } \hfill \\ {\mathbf{S}}_{2} = \sum\limits_{i = 1}^{N} {{\mathbf{Z}}_{i}^{\prime } \Delta \nu_{i2} } \hfill \\ {\mathbf{A}} = \left( {\frac{1}{N}\sum\limits_{i = 1}^{N} {{\mathbf{Z}}_{i}^{\prime } \Delta \nu_{i1} \Delta \nu_{i1}^{\prime } {\mathbf{Z}}_{i} } } \right)^{ - 1} \hfill \\ \end{gathered}$$

In \({\mathbf{S}}_{1}\) and \({\mathbf{S}}_{2}\), the \(\Delta \nu_{i2}\) are differenced second-step errors, and \({\mathbf{Z}}\) is the matrix of instruments, comprising \(N\) blocks \({\mathbf{Z}}_{i}\), each of dimension \((T - 2)\) by \(p\), where \(p\) is the number of instruments. Under the null hypothesis that the moments conditions are valid, \(J\) is distributed as \(\chi_{p - k}^{2}\), where \(k\) is the number of estimated parameters and \(p > k\); if \(J\) exceeds the relevant critical value of \(\chi_{p - k}^{2}\), some or all of the moments conditions are not supported by the data. The preliminary (one-step) consistent estimator giving the differenced first-step errors is based on

$${\mathbf{A}}_{1} = \left( {\frac{1}{N}\sum\limits_{i = 1}^{N} {{\mathbf{Z}}_{i}^{\prime } {\mathbf{HZ}}_{i} } } \right)^{ - 1}$$
(11)

In the above, \({\mathbf{H}}\) is a \((T - 2)\) by \((T - 2)\) matrix with 2s on the main diagonal, −1s on the adjacent upper and lower diagonals, and zeros elsewhere.
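A compact sketch of the computation in Eqs. (10) and (11) is given below (Python/NumPy; the array shapes are assumptions made for illustration, and the data in the test are synthetic).

```python
import numpy as np

def h_matrix(T):
    """The (T-2) by (T-2) first-difference matrix H: 2 on the main
    diagonal, -1 on the adjacent off-diagonals, zeros elsewhere."""
    m = T - 2
    return 2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)

def hansen_j(dv2, dv1, Z):
    """Sargan-Hansen J of Eq. (10), a sketch.
    dv2 : (N, T-2) second-step differenced residuals
    dv1 : (N, T-2) first-step differenced residuals
    Z   : (N, T-2, p) instrument blocks Z_i"""
    N = Z.shape[0]
    S2 = sum(Z[i].T @ dv2[i] for i in range(N))      # S_2, a p-vector
    A = np.linalg.inv(sum(np.outer(Z[i].T @ dv1[i],
                                   Z[i].T @ dv1[i]) for i in range(N)) / N)
    return float(S2 @ A @ S2) / N                    # S_1 is the transpose of S_2
```

Because \({\mathbf{A}}\) is the inverse of a positive definite moment covariance, \(J\) is non-negative by construction, and it is compared with \(\chi_{p - k}^{2}\) critical values as described above.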

4 Synthetic instruments

Normally, it is difficult to find legitimate external exogenous instruments that correlate with the endogenous variables and yet are unrelated to the error term. Typically, instruments that correlate closely with endogenous variables tend also to correlate with the errors, while avoiding correlation with the errors commonly leads to instruments that are weak and irrelevant with respect to the endogenous variable. A solution to the problem is based on the spatial filtering literature (notably Griffith 1988, 1996, 2000, 2003; Getis and Griffith 2002; Boots and Tiefelsdorf 2000; Patuelli et al. 2006), which is the basis for the construction of synthetic instruments, as advocated by Le Gallo and Páez (2013) for cross-sectional regression. Synthetic instruments have the properties of ideal instruments, because normally they are well correlated with the endogenous variables and yet independent of the errors. Specifically, the synthetic instruments are the fitted values resulting from regressing the endogenous regressors and their spatial lags on weighted linear combinations of subsets of orthogonal eigenvectors deriving from a symmetric \(N\) by \(N\) contiguity matrix \({\mathbf{M}}_{N}\), in which the \(m_{ij}\), \(i = 1,\ldots,N\), \(j = 1,\ldots,N\), take the values

$$\begin{gathered} m_{ij}^{{}} = 1{\text{ if}}\; \, i\;{\text{ and}}\; \, j \, \;{\text{are }}\;{\text{neighbours}} \hfill \\ m_{ij}^{{}} = 0{\text{ otherwise}} \hfill \\ \end{gathered}$$

\({\mathbf{M}}_{N}\) simply reflects the spatial connectivity of \(N\) regions and this is normally unaffected by the data under analysis. Likewise, the eigenvectors are exogenous, in other words not determined by \({\mathbf{y}}_{t}\), and so are an appropriate basis for synthetic instruments.

The effectiveness of the eigenvectors as instruments derives from the fact that each one represents a different orthogonal latent map pattern and so it is likely that one or more will correlate strongly with a non-randomly spatially distributed endogenous variable. Following Griffith (2000) and much related literature, we first consider the Moran Coefficient spatial autocorrelation index \(MC_{t}\) which measures the spatial autocorrelation in \({\mathbf{y}}\) at time \(t\) as given by

$$MC_{t} = \frac{N}{{{\mathbf{1}}_{N}^{\prime } {\mathbf{M}}_{N} {\mathbf{1}}_{N} }}\frac{{{\mathbf{y}}_{t}^{\prime } {\mathbf{P}}_{N} {\mathbf{y}}_{t} }}{{{\mathbf{y}}_{t}^{\prime } \left( {{\mathbf{I}}_{N} - {\mathbf{1}}_{N} {\mathbf{1}}_{N}^{\prime } /N} \right){\mathbf{y}}_{t} }}$$
(12)

where

$${\mathbf{P}}_{N} = \left( {{\mathbf{I}}_{N} - {\mathbf{1}}_{N} {\mathbf{1}}_{N}^{\prime } /N} \right){\mathbf{M}}_{N} \left( {{\mathbf{I}}_{N} - {\mathbf{1}}_{N} {\mathbf{1}}_{N}^{\prime } /N} \right)$$

in which \({\mathbf{I}}_{N}\) is an \(N\) by \(N\) identity matrix, and \({\mathbf{1}}_{N}\) denotes an \(N\) by 1 vector of ones. The \(N\) by \(N\) matrix \({\mathbf{P}}_{N}\) yields \(N\) 'orthogonal' eigenvectors \({\mathbf{E}}_{i}\), \(i = 1,\ldots,N\), each of dimension \(N\) by 1. Replacing \({\mathbf{y}}_{t}\) in Eq. (12) by \({\mathbf{E}}_{i}\) measures the spatial autocorrelation of eigenvector \({\mathbf{E}}_{i}\). So each of the eigenvectors of \({\mathbf{P}}_{N}\) can be understood as a distinctive map pattern, with its own \(MC\), ranging from strongly positive to strongly negative autocorrelation, given by the different \({\mathbf{E}}_{i}\) values distributed across the regions implied by the connectivity matrix \({\mathbf{M}}_{N}\).
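The construction of \({\mathbf{P}}_{N}\), its eigenvectors, and the Moran Coefficient of Eq. (12) can be sketched as follows (Python/NumPy; the 4-region ring contiguity matrix is purely illustrative). For an eigenvector of \({\mathbf{P}}_{N}\) with non-zero eigenvalue \(\lambda\), the \(MC\) reduces to \(N\lambda /{\mathbf{1}}_{N}^{\prime } {\mathbf{M}}_{N} {\mathbf{1}}_{N}\), which the code exploits as a check.

```python
import numpy as np

def moran_eigenvectors(M):
    """Eigendecomposition of P_N = (I - 11'/N) M (I - 11'/N). Each
    eigenvector is a latent orthogonal map pattern whose eigenvalue is
    proportional to its Moran Coefficient."""
    N = M.shape[0]
    C = np.eye(N) - np.ones((N, N)) / N       # centring projector
    vals, vecs = np.linalg.eigh(C @ M @ C)    # symmetric M => symmetric P_N
    order = np.argsort(vals)[::-1]            # most positive MC first
    return vals[order], vecs[:, order]

def moran_coefficient(z, M):
    """Moran Coefficient of Eq. (12) for an N by 1 vector z."""
    N = M.shape[0]
    C = np.eye(N) - np.ones((N, N)) / N
    return (N / M.sum()) * (z @ C @ M @ C @ z) / (z @ C @ z)

# Four regions on a ring: 1-2-3-4-1.
M = np.array([[0., 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
vals, vecs = moran_eigenvectors(M)
```

Here the last-ordered eigenvector is the alternating pattern on the ring, the most negatively autocorrelated map possible for this \({\mathbf{M}}_{N}\).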

The synthetic instruments actually employed are weighted linear combinations of subsets of the \({\mathbf{E}}_{i}\) which are referred to in the literature as spatial filters (see Griffith 2003; Boots and Tiefelsdorf 2000). In the empirical analysis below, we use iterative regression to identify which subset of the \({\mathbf{E}}_{i}\) are appropriate for a given endogenous regressor, with the regression coefficient estimates giving the weights to apply in the weighted linear combination.

Applying such a spatial filter to even a completely spatially random variable will tend to find a significant relationship, and a moderately strong correlation between the random variable and the synthetic instrument, because the filter is the outcome of a search through many candidate \({\mathbf{E}}_{i}\)s. With a spatially organised variable, for example a quadratic trend surface defined by its geographic coordinates, the outcome will typically be a much stronger correlation. Because spatio-temporal panel data are unlikely to be randomly distributed and are almost invariably spatially organised in some way, the spatial filter can be used to obtain a synthetic instrument that is highly correlated with an endogenous variable. One thus has a way to generate relevant instrumental variables that are unrelated to the error term, and yet which are highly correlated with endogenous variables that are related to the error term. This is very helpful, because relevant and exogenous instrumental variables are difficult to find. As noted by Le Gallo and Páez (2013), working in the context of cross-sectional data, 'Synthetic variables, being artificial map patterns derived from the spatial configuration of the system, provide a near ideal solution—as long as spatial partitioning is not codetermined with other variables, which is typically the case'.

The aim is to obtain spatial filters for the endogenous regressors \(\tilde{x}_{itk}\); \(i = 1,\ldots,N\), \(t = 1,\ldots,T\), \(k = 1,\ldots,k_{2}\). The approach adopted involves iteratively fitting regressions in which the dependent variable is the \(k\)th endogenous variable \({\tilde{\mathbf{x}}}_{tk}\) and the independent variables are the eigenvectors \({\mathbf{E}}_{i}\). The outcome is the isolation of the relevant subset of the \({\mathbf{E}}_{i}\) and their relative weights, as given by the estimated regression coefficients. For each endogenous variable we can then form a weighted linear combination of the subset so as to give an appropriate synthetic instrument. This can be summarised thus:

1. Set up empty vectors \({\mathbf{z}}_{k} ;\quad k = 1,\ldots,k_{2}\), each ultimately to contain the synthetic instrument for variable \(k\).

2. Use data for the panel at time \(t = 1\).

3. Set \(k = 1\).

4. Set the \(N\) by 1 vector \({\mathbf{V}} = {\mathbf{1}}\).

5. Set \(j = 1\).

6. For variable \(k\) and eigenvector \(j\), regress \({\tilde{\mathbf{x}}}_{tk}\) on \({\mathbf{E}}_{j}\).

7. If the regression coefficient \(\beta\) is significantly different from 0, set \({\mathbf{V}} = {\mathbf{V}} + \beta {\mathbf{E}}_{j}\).

8. Set \(j = j + 1\); if \(j \le N\) go to step 6.

9. Regress \({\tilde{\mathbf{x}}}_{tk}\) on \({\mathbf{V}}\) to obtain the fitted values \({\mathbf{\hat{\tilde{x}}}}_{tk}\).

10. Append \({\mathbf{\hat{\tilde{x}}}}_{tk}\) so that \({\mathbf{z}}_{k} = \left[ {{\mathbf{z}}_{k} ;{\mathbf{\hat{\tilde{x}}}}_{tk} } \right]\).

11. Set \(k = k + 1\); if \(k \le k_{2}\) go to step 4.

12. Set \(t = t + 1\); if \(t \le T\) go to step 3.

The \(NT\) by 1 vectors \({\mathbf{z}}_{k} ,k = 1,...,k_{2}\) are then used as external synthetic instruments in GMM estimation.
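A minimal implementation of this procedure might look as follows (Python/NumPy). It is a sketch under stated assumptions: the 5% two-sided critical value is approximated by 1.96 rather than the exact t quantile, and the orthonormal matrix and data in the usage lines are synthetic stand-ins for the eigenvectors of \({\mathbf{P}}_{N}\) and an endogenous regressor.

```python
import numpy as np

def synthetic_instruments(X_endog, E, crit=1.96):
    """Steps 1-12 above: for each period t and endogenous variable k, scan
    the eigenvectors E_j, accumulate the significant ones into the spatial
    filter V, and stack the fitted values as the synthetic instrument z_k.
    X_endog : (T, N, k2) endogenous regressors by period
    E       : (N, N) matrix whose columns are the eigenvectors E_j
    crit    : approximate 5% two-sided critical value (an assumption)"""
    T, N, k2 = X_endog.shape
    Z = np.zeros((N * T, k2))
    for t in range(T):
        for k in range(k2):
            x = X_endog[t, :, k]
            V = np.ones(N)                           # step 4: V = 1
            for j in range(N):                       # steps 5-8
                e = E[:, j]
                sxx = e @ e - N * e.mean() ** 2
                if sxx < 1e-12:                      # skip constant patterns
                    continue
                b, a = np.polyfit(e, x, 1)           # simple regression of x on e
                resid = x - (a + b * e)
                tstat = b / np.sqrt(resid @ resid / (N - 2) / sxx)
                if abs(tstat) > crit:
                    V = V + b * e                    # step 7: accumulate filter
            A = np.column_stack([np.ones(N), V])     # step 9: fitted values
            coef, *_ = np.linalg.lstsq(A, x, rcond=None)
            Z[t * N:(t + 1) * N, k] = A @ coef       # step 10: append
    return Z

# Usage with synthetic data: x built from two latent patterns plus noise.
rng = np.random.default_rng(1)
N, T = 60, 2
E, _ = np.linalg.qr(rng.normal(size=(N, N)))   # stand-in orthonormal eigenvectors
x = 3 * E[:, 2] - 2 * E[:, 5] + 0.1 * rng.normal(size=N)
X = np.broadcast_to(x[None, :, None], (T, N, 1)).copy()
Z = synthetic_instruments(X, E)
```

In this constructed example, the filter recovers the two latent patterns, so the synthetic instrument correlates very strongly with the endogenous variable, consistent with the argument above.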

5 The Monte Carlo simulations

5.1 The basic DGP

Our simulations assume that there are four regressors, though these are subsequently extended to give a spatial Durbin type of specification. At its simplest, our data generating process (DGP) is based on a version of Eqs. (2) and (3) thus

$$\begin{gathered} y_{it} = \gamma y_{it - 1} + \rho \sum\limits_{j = 1}^{N} {w_{ij} y_{jt} } + \theta \sum\limits_{j = 1}^{N} {w_{ij} y_{jt - 1} } + \beta_{1} x_{1it} + \beta_{2} x_{2it} + \beta_{3} \tilde{x}_{3it} + \beta_{4} \tilde{x}_{4it} + \varepsilon_{it}; \quad i = 1,\ldots,N,\; t = 1,\ldots,T \hfill \\ \varepsilon_{it} = \mu_{i} + \nu_{it} \hfill \\ \end{gathered}$$
(13)

The aim is to devise a design that captures all sources of endogeneity, the ultimate outcome of which is \(\tilde{x}_{3it}\) and \(\tilde{x}_{4it}\) being correlated with \(\varepsilon_{it}\). The approach adopted is similar to Liu and Saraiva (2015), but in the context of compound errors, so that endogeneity occurs because of correlation between the regressors and \(\nu\) and hence \(\varepsilon\). In this simple initial case, the DGP draws from the Gaussian multivariate distribution,

$$\left( {\begin{array}{*{20}c} \nu \\ {\tilde{\nu }} \\ {{\mathbf{x}}_{1} } \\ {{\mathbf{x}}_{2} } \\ {{\tilde{\mathbf{x}}}_{3} } \\ {{\tilde{\mathbf{x}}}_{4} } \\ \end{array} } \right)\sim N\left( {\left( {\begin{array}{*{20}c} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \end{array} } \right),\left[ {\begin{array}{*{20}c} {\sigma_{\nu }^{2} } & {p_{1} } & 0 & 0 & 0 & 0 \\ {p_{1} } & 1 & 0 & 0 & {p_{2} } & {p_{3} } \\ 0 & 0 & 1 & {p_{4} } & 0 & 0 \\ 0 & 0 & {p_{4} } & 1 & 0 & 0 \\ 0 & {p_{2} } & 0 & 0 & 1 & 0 \\ 0 & {p_{3} } & 0 & 0 & 0 & 1 \\ \end{array} } \right]} \right)$$
(14)

in which the leading diagonal of the covariance matrix contains the variances and the off-diagonal elements \(p_{1} ,p_{2} ,p_{3}\) and \(p_{4}\) are the covariances between the random variables. The set-up in Eq. (14) indicates that the exogenous variables are unrelated to the other variables, except that they are correlated with each other via \(p_{4}\). The endogenous regressors \({\tilde{\mathbf{x}}}_{3}\) and \({\tilde{\mathbf{x}}}_{4}\) are correlated with \(\tilde{\nu }\), which belongs to a separate equation system, but \(\tilde{\nu }\) is correlated with the remainder error component \(\nu\) (with \(\tilde{\nu }\) separate from \(\nu\), there is the option of different \(\tilde{\nu }\)s for different endogenous regressors, as applied subsequently). The outcome is a set of \(NT\) by 1 random vectors. The individual error component \(\mu_{i}\) is generated via a univariate normal distribution with zero mean and variance \(\sigma_{\mu }^{2}\). Combining the error components \(\mu_{i}\) and \(\nu_{it}\) gives \(\varepsilon_{it}\), so that \(\tilde{x}_{3it}\) and \(\tilde{x}_{4it}\) are correlated with \(\varepsilon_{it}\). A similar approach is applied subsequently in the context of non-spatial data and the spatial Durbin specification.

Given true values of the various parameters, drawing in each replication from the multivariate normal distribution provides numerous realisations of \(y_{it} ,i = 1,\ldots,N;t = 1,\ldots,T\). Draws leading to a maximum absolute characteristic root of \({\tilde{\mathbf{A}}}\) equal to or greater than 1 are rejected, so the simulated data sets are all dynamically stable and stationary. These data are the basis of estimates of the model parameters, and the aim is to compare the resulting estimates with the true parameter values of Eq. (13).

In practice, various alternative true parameter values have been considered, but the results presented subsequently for the DGP are based on \((\sigma_{\mu }^{2} ,\sigma_{\nu }^{2} ) = (0.2,0.8)\) and \((0.8,0.2)\). The simulations thus encompass low and high individual heterogeneity, paired with high and low levels of remainder variance. Setting \(p_{1} = 0.5\), \(p_{2} = 0.75\), \(p_{3} = 0.25\) and \(p_{4} = 0.3\) makes \(\tilde{x}_{3it}\) strongly endogenous and \(\tilde{x}_{4it}\) weakly endogenous. It is also assumed that \(\gamma = 0.75\), \(\rho = 0.3\), \(\theta = - 0.2\) and \(\beta_{1} = 4\), \(\beta_{2} = 3\), \(\beta_{3} = 2\), \(\beta_{4} = 1\).

The reported outcomes for this simple specification are based on the 'r ahead and r behind' connectivity matrix of Kelejian and Prucha (1999), subsequently row-standardised. Setting r = 5 means that each row of the spatial matrix \({\mathbf{W}}_{N}\) (i.e. \(w_{ij}\), \(i = 1,\ldots,N\), \(j = 1,\ldots,N\)) has up to 10 connections (five ahead and five behind, each with equal weight), with zeros elsewhere and on the main diagonal. We also subsequently consider results based on a dense \({\mathbf{W}}_{N}\) matrix in the context of the spatial Durbin specification.

Results are reported for 100 replications, which smooths out aberrant outcomes and is sufficient to highlight the main traits in the simulation. In each replication, the initial 51 simulation outcomes of \(x_{1it} ,x_{2it} ,\tilde{x}_{3it} {\text{ and }}\tilde{x}_{4it}\) are discarded in order to minimise any effect of the zero initial values at \(t = - 50\) (i.e. simulation outcomes for \(t = - 50, - 49,\ldots,0\) are discarded). Also, \(T = 10\) and there are \(N = 100\) regions.
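The draw from Eq. (14) underlying each replication can be sketched directly (Python/NumPy; the parameter values are those stated above, and the sample size of 100,000 is chosen purely so that the sample correlations settle close to the design values):

```python
import numpy as np

# Covariance of (nu, nu_tilde, x1, x2, x3_tilde, x4_tilde) as in Eq. (14),
# using sigma_nu^2 = 0.8, p1 = 0.5, p2 = 0.75, p3 = 0.25, p4 = 0.3.
s2, p1, p2, p3, p4 = 0.8, 0.5, 0.75, 0.25, 0.3
S = np.array([[s2, p1, 0,  0,  0,  0],
              [p1, 1,  0,  0,  p2, p3],
              [0,  0,  1,  p4, 0,  0],
              [0,  0,  p4, 1,  0,  0],
              [0,  p2, 0,  0,  1,  0],
              [0,  p3, 0,  0,  0,  1]])

rng = np.random.default_rng(7)
d = rng.multivariate_normal(np.zeros(6), S, size=100_000)

# Sample correlations recover the design: corr(nu_tilde, x3_tilde) is near p2,
# corr(nu_tilde, x4_tilde) near p3, and corr(x1, x2) near p4.
print(np.corrcoef(d[:, 1], d[:, 4])[0, 1])
```

A useful preliminary check is that the stated covariance matrix is positive definite, so that the multivariate normal draw is well defined; the assertions below confirm this for these parameter values.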

5.2 Results for simple spatial DGP

The results obtained depend on various set-ups regarding instruments. The idea is to show the impact on estimation of having fewer instruments, as well as the effect of different error distribution assumptions. The largest number of instruments is given by applying the standard solution to the existence of endogenous variables, namely HENR instrumentation. Full HENR instrumentation for the temporally lagged endogenous variables \(y_{it}\) and \(Wy_{it}\), together with \(Wy_{it - 1} ,\tilde{x}_{3it} ,\tilde{x}_{4it}\) and their spatial lags \(W^{2} y_{it - 1} ,W\tilde{x}_{3it} ,W\tilde{x}_{4it}\), amounts to \(8(T - 1)(T - 2)/2\) = 288 instruments. In addition, there are four IV-style instruments equal to the exogenous variables and their spatial lags, namely \(x_{1it} ,x_{2it} ,Wx_{1it} ,Wx_{2it}\). The result is 288 + 4 = 292 instruments overall.

One side-issue relating to the existence of many instruments is the possibility that some are almost collinear. One could drop these from the instrument set, but this would change the number of degrees of freedom available for the Sargan–Hansen J statistic. Moreover, the definition of collinearity is somewhat subjective, and eliminating collinear instruments has minimal impact on outcomes.

Synthetic instruments are usually strongly correlated with the variables they instrument. For example, on the basis of 100 replications, the mean correlations between \(\tilde{x}_{4it}\) and its spatial lag \(W\tilde{x}_{4it}\) and their respective synthetic instruments are 0.6687 and 0.7205. Applying synthetic instruments to the endogenous regressors and their spatial lags, \(\tilde{x}_{3it} ,W\tilde{x}_{3it}\) and \(\tilde{x}_{4it} ,W\tilde{x}_{4it}\), together with the exogenous regressors and their spatial lags, gives eight IV-style instruments. Full HENR instrumentation for \(y_{it,} Wy_{it,} Wy_{it - 1,} W^{2} y_{it - 1}\) adds \(4(T - 1)(T - 2)/2\) = 144 instruments, giving a total of 152 instruments overall. A considerable reduction in the number of instruments results from combining collapsing with synthetic instruments. Collapsing the standard HENR instrumentation for \(y_{it} ,Wy_{it} ,Wy_{it - 1}\) and \(W^{2} y_{it - 1}\) gives \(4(T - 2)\) = 32 instruments; adding the aforementioned eight IV-style instruments gives an overall total of \(4(T - 2) + 8\) = 40 instruments. Collapsing standard HENR instrumentation for \(y_{it}\) alone gives \((T - 2)\) = 8 instruments; using synthetic instruments for \(Wy_{it} ,Wy_{it - 1}\) and \(W^{2} y_{it - 1}\), together with \(\tilde{x}_{3it} ,W\tilde{x}_{3it} ,\tilde{x}_{4it} ,W\tilde{x}_{4it}\), creates seven IV-style instruments, and there are four more from the exogenous variables and their spatial lags \(x_{1it} ,x_{2it} ,Wx_{1it} ,Wx_{2it}\). Combined, there are 19 instruments in total.
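As a check on the arithmetic, the four instrument counts just quoted follow directly from T = 10 (a simple verification, not part of the estimation itself):

```python
# Instrument counts for T = 10, as quoted in the text.
T = 10
full_henr = 8 * (T - 1) * (T - 2) // 2 + 4   # full HENR for 8 variables + 4 IV-style
synthetic = 4 * (T - 1) * (T - 2) // 2 + 8   # HENR for 4 variables + 8 IV-style
collapsed = 4 * (T - 2) + 8                  # collapsed HENR for 4 variables + 8 IV-style
minimal = (T - 2) + 7 + 4                    # collapsed y only + 7 synthetic + 4 exogenous
print(full_henr, synthetic, collapsed, minimal)
```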

The results obtained show remarkably little adverse consequence from reducing the number of instruments, together with the significant benefit of an unbiased Sargan–Hansen J statistic and less biased standard errors. Tables 1 and 2 show the mean parameter estimates resulting from 100 Monte Carlo simulations according to the number of instruments, true parameter values, and error distribution assumptions. It is clear that the estimates closely approximate the true values. Table 3 summarises the outcomes, showing the mean absolute parameter bias, obtained by averaging across the mean absolute bias for each parameter, and the average of the mean RMSEs, again averaging across the mean RMSE of each parameter. Bias tends to be smaller with larger \(\sigma_{\nu }^{2}\), but the opposite is true for RMSE. There is little variation in either as the number of instruments varies. In Tables 4 and 5, for each cell relating to each parameter, we give the average of the mean simulation outcomes, averaging across classic two-step, Windmeijer and HKL standard errors. Also, the mean classic two-step, Windmeijer and HKL standard errors are obtained by taking the average of the mean standard errors, averaging across parameter standard errors. The tables indicate that standard errors tend to rise as the number of instruments falls, though this is confounded by the fact that with 152 and 292 instruments the weight matrix is not symmetric positive definite and a generalised inverse is applied, so the standard errors cannot be guaranteed to be accurate. Compared with the uncorrected classic two-step standard errors, the Windmeijer and HKL corrections are associated with larger standard errors. Tables 4 and 5 also indicate that standard errors are higher when remainder variance is high and individual variance is low.

Table 1 Mean parameter estimates: \(\sigma_{\mu }^{2} = 0.8\), \(\sigma_{\nu }^{2} = 0.2\)
Table 2 Mean parameter estimates: \(\sigma_{\mu }^{2} = 0.2\), \(\sigma_{\nu }^{2} = 0.8\)
Table 3 Mean parameter bias and root mean squared error (RMSE) for selected data generating processes (DGPs)
Table 4 Mean standard errors: \(\sigma_{\mu }^{2} = 0.8\), \(\sigma_{\nu }^{2} = 0.2\)
Table 5 Mean standard errors: \(\sigma_{\mu }^{2} = 0.2\), \(\sigma_{\nu }^{2} = 0.8\)
Table 6 Mean value of diagnostics: \(\sigma_{\mu }^{2} = 0.8\), \(\sigma_{\nu }^{2} = 0.2\)

Tables 6 and 7 summarise the diagnostic indicators in terms of means across the 100 Monte Carlo simulations, again broken down by the number of instruments and error distribution assumptions. A prominent feature is the downward bias in the Sargan–Hansen J statistic for overidentifying restrictions when the number of instruments is large. The J test statistics reflect the low power associated with too many moment conditions, leading to an implausibly high mean p-value of 1.000 (Andersen and Sørensen 1996; Bowsher 2002; Roodman 2009b). Reducing from 292 to 152 instruments is insufficient to bring the mean p-value much below 1.0. While the J statistic is therefore an unreliable indicator of instrument validity, the z-ratios for error serial correlation, referred to the N(0,1) distribution, point to significant negative first-order correlation (\(m_{1}\)) and effectively zero second-order serial correlation (\(m_{2}\)) in the first-differenced residuals (see Arellano and Bond 1991). This points clearly to consistent estimates. The tables of diagnostics also show that the estimates obtained are dynamically stable, as measured by the maximum absolute eigenvalue of \({\tilde{\mathbf{A}}}\).

Table 7 Mean value of diagnostics: \(\sigma_{\mu }^{2} = 0.2\), \(\sigma_{\nu }^{2} = 0.8\)
Table 8 Mean parameter estimates: \(\sigma_{\mu }^{2} = 0.8\), \(\sigma_{\nu }^{2} = 0.2\)

5.3 Results with non-spatial data

Thus far, we have reduced instrument proliferation using collapsing combined with synthetic instruments derived from the \(N\) by \(N\) contiguity matrix \({\mathbf{M}}_{N}\). In this section, we explore the efficacy of this approach in the absence of spatial effects. Consider therefore the DGP of Eq. (13) but with \(\rho = 0,\theta = 0\). There are four estimators under consideration. First, full HENR instruments based on the endogenous variables \(y_{it}\), \(\tilde{x}_{3it}\) and \(\tilde{x}_{4it}\), with IV-style instruments for the exogenous \(x_{1it}\) and \(x_{2it}\), gives \(3(T - 2)(T - 1)/2 + 2 = 110\) instruments. Secondly, full HENR instruments based on \(y_{it}\), with synthetic instruments for \(\tilde{x}_{3it}\) and \(\tilde{x}_{4it}\) which become IV-style instruments alongside \(x_{1it}\) and \(x_{2it}\), gives \((T - 2)(T - 1)/2 + 4 = 40\) instruments. Thirdly, collapsing the HENR instruments based on \(y_{it}\), plus the four above-mentioned IV-style instruments, gives \((T - 2) + 4 = 12\) instruments. Fourthly, collapsing the HENR instruments based on \(y_{it}\), \(\tilde{x}_{3it}\) and \(\tilde{x}_{4it}\) gives \(3(T - 2) + 2 = 26\) instruments.
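The four counts follow one rule: full HENR yields \((T-1)(T-2)/2\) moment conditions per instrumented variable, while collapsing reduces this to \(T-2\). A small sketch, again assuming \(T = 10\) as implied by \(3(T-2)(T-1)/2 + 2 = 110\) (the helper name is ours, not the paper's):

```python
# Count check for the four non-spatial estimators. henr() is a
# hypothetical helper encoding the per-variable HENR formulas.
def henr(T, n_vars, collapsed=False):
    per_var = (T - 2) if collapsed else (T - 1) * (T - 2) // 2
    return n_vars * per_var

T = 10
counts = [
    henr(T, 3) + 2,                  # full HENR: y, x3~, x4~; IV-style x1, x2
    henr(T, 1) + 4,                  # full HENR: y; synthetic + exogenous IVs
    henr(T, 1, collapsed=True) + 4,  # collapsed HENR: y; 4 IV-style
    henr(T, 3, collapsed=True) + 2,  # collapsed HENR: y, x3~, x4~; 2 IV-style
]
print(counts)  # → [110, 40, 12, 26]
```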

Data are generated by again randomly sampling from a multivariate normal distribution as in Eq. (15). In this case correlation between \({\mathbf{x}}_{1}\) and endogenous regressors \({\tilde{\mathbf{x}}}_{3}\) and \({\tilde{\mathbf{x}}}_{4}\) is introduced via \(p_{5} = 0.4\) and \(p_{6} = 0.5\). Also \(p_{1} = 0.5,p_{2} = 0.75,p_{3} = 0.25\) and \(p_{4} = 0.3\).

$$\left( {\begin{array}{*{20}c} \nu \\ {\tilde{\nu }} \\ {{\mathbf{x}}_{1} } \\ {{\mathbf{x}}_{2} } \\ {{\tilde{\mathbf{x}}}_{3} } \\ {{\tilde{\mathbf{x}}}_{4} } \\ \end{array} } \right)\sim N\left( {\left( {\begin{array}{*{20}c} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \end{array} } \right),\left[ {\begin{array}{*{20}c} {\sigma_{\nu }^{2} } & {p_{1} } & 0 & 0 & 0 & 0 \\ {p_{1} } & 1 & 0 & 0 & {p_{2} } & {p_{3} } \\ 0 & 0 & 1 & {p_{4} } & {p_{5} } & {p_{6} } \\ 0 & 0 & {p_{4} } & 1 & 0 & 0 \\ 0 & {p_{2} } & {p_{5} } & 0 & 1 & 0 \\ 0 & {p_{3} } & {p_{6} } & 0 & 0 & 1 \\ \end{array} } \right]} \right)$$
(15)

Tables 8, 9, 10, 11, 12, 13 and 14 summarise the outcomes over a range of simulation set-ups. Broadly stated, there is minimal impact from reducing the number of instruments, apart from the usual effects of eliminating bias in the Sargan–Hansen J test statistic and in the parameter standard errors. However, it is worth noting that estimates of \(\beta_{1}\) are persistently downwardly biased, and \(\beta_{2}\), \(\beta_{3}\) and \(\beta_{4}\) upwardly biased. Also, Table 10 indicates that bias tends to be larger with greater individual heterogeneity, but no obvious change in bias or RMSE occurs with varying numbers of instruments. In contrast, as shown in Tables 11 and 12, there is a clear increase in parameter standard errors as the number of instruments diminishes. Comparing Tables 11 and 12, standard errors also tend to be larger as remainder error variance increases and individual heterogeneity reduces. Again, the corrected standard errors are invariably larger than the classic two-step standard errors. Tables 13 and 14 highlight the usual downward bias in the Sargan–Hansen J statistic, reflecting the rule of thumb that the J statistic should be more reliable when the number of instruments is smaller than the number of individuals (\(N = 100\)).

Table 9 Mean parameter estimates: \(\sigma_{\mu }^{2} = 0.2\), \(\sigma_{\nu }^{2} = 0.8\)
Table 10 Mean parameter bias and root mean squared error (RMSE) for selected data generating processes (DGPs)
Table 11 Mean standard errors: \(\sigma_{\mu }^{2} = 0.8\), \(\sigma_{\nu }^{2} = 0.2\)
Table 12 Mean standard errors: \(\sigma_{\mu }^{2} = 0.2\), \(\sigma_{\nu }^{2} = 0.8\)
Table 13 Mean value of diagnostics: \(\sigma_{\mu }^{2} = 0.8\), \(\sigma_{\nu }^{2} = 0.2\)
Table 14 Mean value of diagnostics: \(\sigma_{\mu }^{2} = 0.2\), \(\sigma_{\nu }^{2} = 0.8\)

5.4 Results for Spatial Durbin specification with dense connectivity matrix

The data generating process for spatial data implemented a row-normalised 'five ahead and five behind' matrix \({\mathbf{W}}_{N}\) for the spatial lag of the dependent variable and for instruments. In this section, we replace this with a Lehmer matrix, a symmetric positive definite matrix \({\mathbf{W}}_{N}^{*}\) with elements \(w_{ij}^{*}\), \(i = 1,...,N\), \(j = 1,...,N\), in which

$$w_{ij}^{*} = \begin{cases} i/j, & j \ge i \\ j/i, & j < i \end{cases}$$
(16)

\({\mathbf{W}}_{N}^{*}\) is a dense matrix with cell values diminishing away from the main diagonal. \({\mathbf{W}}_{N}\) is \({\mathbf{W}}_{N}^{*}\) with its main diagonal replaced by zeros, then row normalised, as illustrated in Fig. 1. As with the other \({\mathbf{W}}_{N}\), the row and column sums are uniformly bounded.
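The construction of \({\mathbf{W}}_{N}\) from the Lehmer matrix is straightforward; a minimal sketch (the function name is ours):

```python
import numpy as np

def lehmer_connectivity(N):
    """Build W_N from the Lehmer matrix W*_ij = min(i,j)/max(i,j)
    (i.e. i/j for j >= i, j/i for j < i), zero the main diagonal,
    then row-normalise, as described in the text."""
    idx = np.arange(1, N + 1)
    W_star = np.minimum.outer(idx, idx) / np.maximum.outer(idx, idx)
    W = W_star.copy()
    np.fill_diagonal(W, 0.0)          # remove self-connectivity
    return W / W.sum(axis=1, keepdims=True)
```

After row normalisation \({\mathbf{W}}_{N}\) is no longer symmetric, but each row sums to one, consistent with the row-standardised weight matrices used elsewhere in the simulations.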

Fig. 1 Connectivity matrix based on the Lehmer matrix

To add diversity, we attempt to approximate the reality of many analytical situations in which all variables, denoted by \(\tilde{x}_{1it}, \tilde{x}_{2it}, \tilde{x}_{3it}\) and \(\tilde{x}_{4it}\), are endogenous. Also, we include spatial lags of the explanatory variables to give a spatial Durbin specification (see Halleck Vega and Elhorst 2015 for discussion). Inclusion of the spatial lags \(W\tilde{x}_{1it}, W\tilde{x}_{2it}, W\tilde{x}_{3it}, W\tilde{x}_{4it}\), with corresponding parameters \(\beta_{5}, \beta_{6}, \beta_{7}\) and \(\beta_{8}\), gives

$$\begin{gathered} y_{it} = \gamma y_{it - 1} + \rho \sum\limits_{j = 1}^{N} {w_{ij} y_{jt} } + \theta \sum\limits_{j = 1}^{N} {w_{ij} y_{jt - 1} } + \beta_{1} \tilde{x}_{1it} + \beta_{2} \tilde{x}_{2it} + \beta_{3} \tilde{x}_{3it} + \beta_{4} \tilde{x}_{4it} \hfill \\ \quad + \beta_{5} \sum\limits_{j = 1}^{N} {w_{ij} \tilde{x}_{1jt} } + \beta_{6} \sum\limits_{j = 1}^{N} {w_{ij} \tilde{x}_{2jt} } + \beta_{7} \sum\limits_{j = 1}^{N} {w_{ij} \tilde{x}_{3jt} } + \beta_{8} \sum\limits_{j = 1}^{N} {w_{ij} \tilde{x}_{4jt} } + \varepsilon_{it} ;\quad i = 1,...,N,\;t = 1,...,T \hfill \\ \varepsilon_{it} = \mu_{i} + \nu_{it} \hfill \\ \end{gathered}$$
(17)

Again the variables are generated from a multivariate Gaussian distribution, in this case allowing each endogenous variable to depend on a separate error process, as might occur in reality with diverse sources of endogeneity. Thus, in Eq. (18), \(p_{7}\) is the correlation between \({\tilde{\mathbf{x}}}_{1}\) and \(\tilde{\nu }_{1}\), \(p_{8}\) the \({\tilde{\mathbf{x}}}_{2}\), \(\tilde{\nu }_{2}\) correlation, \(p_{2}\) the \({\tilde{\mathbf{x}}}_{3}\), \(\tilde{\nu }_{3}\) correlation and \(p_{3}\) the correlation between \({\tilde{\mathbf{x}}}_{4}\) and \(\tilde{\nu }_{4}\). The correlation between the \(\tilde{\nu }\)s and \(\nu\) is \(p_{1}\), and the correlations between the regressors are \(p_{4}, p_{5}\) and \(p_{6}\).

$$\left( {\begin{array}{*{20}c} \nu \\ {\tilde{\nu }_{1} } \\ {\tilde{\nu }_{2} } \\ {\tilde{\nu }_{3} } \\ {\tilde{\nu }_{4} } \\ {{\tilde{\mathbf{x}}}_{1} } \\ {{\tilde{\mathbf{x}}}_{2} } \\ {{\tilde{\mathbf{x}}}_{3} } \\ {{\tilde{\mathbf{x}}}_{4} } \\ \end{array} } \right)\sim N\left( {\left( {\begin{array}{*{20}c} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ \end{array} } \right),\left[ {\begin{array}{*{20}c} {\sigma_{\nu }^{2} } & {p_{1} } & {p_{1} } & {p_{1} } & {p_{1} } & 0 & 0 & 0 & 0 \\ {p_{1} } & 1 & 0 & 0 & 0 & {p_{7} } & 0 & 0 & 0 \\ {p_{1} } & 0 & 1 & 0 & 0 & 0 & {p_{8} } & 0 & 0 \\ {p_{1} } & 0 & 0 & 1 & 0 & 0 & 0 & {p_{2} } & 0 \\ {p_{1} } & 0 & 0 & 0 & 1 & 0 & 0 & 0 & {p_{3} } \\ 0 & {p_{7} } & 0 & 0 & 0 & 1 & {p_{4} } & {p_{5} } & {p_{6} } \\ 0 & 0 & {p_{8} } & 0 & 0 & {p_{4} } & 1 & 0 & 0 \\ 0 & 0 & 0 & {p_{2} } & 0 & {p_{5} } & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & {p_{3} } & {p_{6} } & 0 & 0 & 1 \\ \end{array} } \right]} \right)$$
(18)

Variables entering Eq. (17) are generated on the basis of \(p_{1} = 0.5, p_{2} = 0.75, p_{3} = 0.25\), \(p_{4} = 0.3\), \(p_{5} = 0.4\), \(p_{6} = 0.5\), \(p_{7} = 0.8\) and \(p_{8} = 0.2\). Additionally, data are generated on the basis of \(p_{1} = 0.75, p_{2} = 0.75, p_{3} = 0.25\), \(p_{4} = -0.5\), \(p_{5} = -0.2\), \(p_{6} = 0.75\), \(p_{7} = 0.8\) and \(p_{8} = 0.3\). In this case, the stronger correlation between \(\nu\) and the \(\tilde{\nu }\)s implies greater endogeneity.

The true parameter values utilised in Eq. (17) are \(\gamma = 0.3, \rho = 0.2, \theta = -0.4\) and \(\beta_{1} = 4, \beta_{2} = 3, \beta_{3} = 2\), \(\beta_{4} = 1, \beta_{5} = 1, \beta_{6} = 2\), \(\beta_{7} = 3\) and \(\beta_{8} = 4\). Consider first an estimator which applies synthetic instruments, based on \({\mathbf{M}}_{N}\), to the endogenous variables \(\tilde{x}_{1it}, \tilde{x}_{2it}, \tilde{x}_{3it}, \tilde{x}_{4it}\) and their spatial lags \(W\tilde{x}_{1it}, W\tilde{x}_{2it}, W\tilde{x}_{3it}, W\tilde{x}_{4it}\). Combining these eight IV-style instruments with HENR instruments from \(y_{it}, Wy_{it}, Wy_{it - 1}\) and \(W^{2} y_{it - 1}\) gives \(4(T - 1)(T - 2)/2 + 8 = 152\) instrumental variables. Secondly, applying synthetic instruments to all variables except \(y_{it}\) and its spatial lag \(Wy_{it}\) results in \((T - 1)(T - 2) + 10 = 82\) instruments. Collapsing these latter two HENR sets of instruments gives \(2(T - 2) + 10 = 26\) instruments. A simple comparator assumes \(\beta_{5} = 0, \beta_{6} = 0, \beta_{7} = 0\) and \(\beta_{8} = 0\), so that the spatial lags and their respective instruments are eliminated, leaving 148, 78 and 22 instruments, respectively.

Tables 15, 16, 17, 18, 19, 20, 21, 22 and 23 give the mean outcomes, broken down by the number of instruments, the values assigned to \(\sigma_{\mu }^{2}\) and \(\sigma_{\nu }^{2}\), and the values of \(p_{1},...,p_{8}\), denoted either by \(p_{1} = 0.5\) etc. or \(p_{1} = 0.75\) etc. The italicised numbers are the outcomes for the simple comparator. The salient features are the usual many-instrument biases in the Sargan–Hansen J statistic and in the parameter standard errors. For the spatial Durbin specification, some parameter estimation bias is apparent in Tables 15 and 16; notably, the \(\rho\) and \(\theta\) estimates are persistently upwardly biased regardless of the number of instruments or error assumptions, and likewise the spatial lag parameters \(\beta_{5}, \beta_{6}, \beta_{7}\) and \(\beta_{8}\) are negatively biased.Footnote 11 In contrast, the bias for the comparator is negligible, and since everything else is the same, it appears that the primary cause of the larger bias is the presence of spatial lags. For the spatial Durbin specification, Table 17 shows that on the whole bias tends to increase as the number of instruments decreases. It is also larger as endogeneity increases, with \(p_{1}\) going from 0.5 to 0.75, and it increases as error variance goes from \(\sigma_{\nu }^{2} = 0.2\) to \(\sigma_{\nu }^{2} = 0.8\). Mean RMSE follows a similar pattern. The simple comparator shows similar traits, but its bias and RMSE are always much smaller. Tables 18 and 19 show that for the spatial Durbin specification more instruments are again associated with extra downward bias in standard errors and therefore upward bias in t-ratios. Downward bias is also evident from comparing the classic two-step standard errors with the corrected Windmeijer and HKL standard errors.
The spatial Durbin diagnostics in Tables 20, 21, 22 and 23 all point to the viability of the instruments, according to the evidence provided by the \(m_{2}\) statistic, although again the J test statistic is severely undersized when the number of instruments is large.

Table 15 Mean parameter estimates: spatial Durbin and simple specifications
Table 16 Mean parameter estimates: spatial Durbin and simple specifications
Table 17 Mean parameter bias and root mean squared error (RMSE) for selected data generating processes (DGPs): spatial Durbin and simple specifications
Table 18 Mean standard errors: spatial Durbin specification
Table 19 Mean standard errors: spatial Durbin specification
Table 20 Spatial Durbin specification, mean value of diagnostics: \(\sigma_{\mu }^{2} = 0.8\), \(\sigma_{\nu }^{2} = 0.2\), \(p_{1} = 0.5\) etc.
Table 21 Spatial Durbin specification, mean value of diagnostics: \(\sigma_{\mu }^{2} = 0.2\), \(\sigma_{\nu }^{2} = 0.8\), \(p_{1} = 0.5\) etc.
Table 22 Spatial Durbin specification, mean value of diagnostics: \(\sigma_{\mu }^{2} = 0.8\), \(\sigma_{\nu }^{2} = 0.2\), \(p_{1} = 0.75\) etc.
Table 23 Spatial Durbin specification, mean value of diagnostics: \(\sigma_{\mu }^{2} = 0.2\), \(\sigma_{\nu }^{2} = 0.8\), \(p_{1} = 0.75\) etc.

6 Example with real data

Fingleton et al. (2020) provide a good example where the application of synthetic instruments yields valuable information regarding the reality of causal effects. The data analysed are taken from successive UK censuses carried out in 1971, 1981, 1991, 2001 and 2011 and relate to employment across each of 760 small areas of Greater London known as wards. For each ward and each census, data are available on the level of employment and on the number of people born in Ireland, India, Pakistan, mainland Europe, the UK, and the rest of the world (i.e. elsewhere), the so-called country-of-birth cohorts. Thus, for each ward and each census year there are six country-of-birth cohorts. Additionally, data are available on the location quotientFootnote 12 of the unemployment rate. There are therefore seven possibly endogenous right-hand-side explanatory variables. These data, in logarithmic form, are analysed via a dynamic spatial panel data model in the spirit of Eqs. (1) and (2), in which log employment is the dependent variable: employment depends on the level of employment in the previous census, the level of employment in nearby (contiguous) wards, and the level of employment in contiguous wards in the previous census. Additionally, unlike Fingleton et al. (2020), year dummies are introduced, since they are statistically significant. The data allow only two, because lagging and differencing leave only three usable years, and one year dummy must be dropped to avoid perfect collinearity. The choice of dummy years has no effect on the estimates obtained, apart from the dummy variable parameter estimates themselves. Time-invariant heterogeneity across wards is eliminated by differencing. The analysis employs a row-standardised contiguity matrix \({\mathbf{W}}_{N}\) in order to capture spatial spillover effects. Additionally, following Baltagi et al. (2019), spatial error dependence is represented by a spatial moving average (SMA) error process. As they explain, this should control for omitted spatial lags of regressors typical of the spatial Durbin specification. With SMA errors, a negative coefficient for the spatial error parameter \(\lambda\) indicates positive spatial error dependence.

Fingleton et al. (2020) consider two scenarios: one in which the explanatory variables are exogenous, and one in which the regressors are endogenous, with instruments controlling for endogeneity-inducing effects, for example due to reverse causation whereby an increase in the level of employment, the dependent variable, causes country-of-birth numbers to increase, perhaps through inward migration attracted by employment opportunities. Eliminating this kind of feedback is likely to provide evidence of causal impacts, isolating the effect of an increase in a country-of-birth cohort on the level of employment. However, their conclusions are cautionary because of the short time period considered. The data allow only \(T = 4\) effective census years, with 1971 providing the lagged data, so one is unable to calculate the Arellano and Bond (1991) \(m_{2}\) test statistic for zero second-order serial correlation in the first-differenced residuals, which is only defined for \(T \ge 5\).

Applying synthetic instruments involves just 22 instruments and leads to a Sargan–Hansen J test that supports an assumption of consistent estimation. The instrument set comprises 14 IV-style synthetic instruments derived from the seven presumably endogenous explanatory variables and their spatial lags, plus eight collapsed HENR instruments based on lagged values of \(y_{it}, Wy_{it}, Wy_{it - 1}\) and \(W^{2} y_{it - 1}\). The synthetic instruments are based on the eigenvectors \({\mathbf{E}}_{i}, i = 1,...,N\), deriving from the matrix \({\mathbf{P}}_{N}\) and the symmetric contiguity matrix \({\mathbf{M}}_{N}\). Given the large number of eigenvectors \((N = 760)\), a quite stringent rule is applied to rule out spurious correlation between eigenvectors and each endogenous variable, with a t-ratio threshold of 2.58, equivalent to a two-sided p-value of 1%. The estimates are given in Tables 24 and 25.
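The selection rule can be sketched as follows. This is not the authors' code: the paper defines \({\mathbf{P}}_{N}\) and the synthetic-instrument construction earlier; the sketch assumes \({\mathbf{P}}_{N}\) is the usual centering projector of Moran eigenvector spatial filtering, and `synthetic_instrument` is a hypothetical helper returning the fitted values from regressing the endogenous variable on the retained eigenvectors.

```python
import numpy as np

def synthetic_instrument(M, x, t_crit=2.58):
    """Sketch: eigenvectors of the doubly-centred contiguity matrix are
    screened by a bivariate-regression t-ratio threshold; the synthetic
    instrument is the fitted value from regressing x on those retained."""
    N = M.shape[0]
    P = np.eye(N) - np.ones((N, N)) / N     # assumed form of P_N (centering projector)
    _, E = np.linalg.eigh(P @ M @ P)        # candidate eigenvectors (unit norm columns)
    xc = x - x.mean()
    keep = []
    for j in range(N):
        e = E[:, j]
        b = e @ xc                          # bivariate OLS slope (e'e = 1)
        resid = xc - b * e
        se = np.sqrt(resid @ resid / (N - 2))
        if se > 0 and abs(b) / se > t_crit: # stringent screen against spurious fits
            keep.append(j)
    if not keep:                            # nothing passes: fall back to the mean
        return np.full(N, x.mean())
    X = np.column_stack([np.ones(N), E[:, keep]])
    beta, *_ = np.linalg.lstsq(X, x, rcond=None)
    return X @ beta                         # fitted values = synthetic instrument
```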

Table 24 Model diagnostics
Table 25 Parameter estimates using synthetic instruments

The diagnostics in Table 24 indicate acceptable model estimates. The maximum characteristic root of \({\tilde{\mathbf{A}}}\) points to a dynamically stable and stationary model. Moreover, there is evidence from the Sargan–Hansen J test that the 10 overidentifying restrictions are valid, indicating that endogeneity bias has been purged, that the estimates are consistent, and that the effects can be interpreted as causal.

The interpretation of Table 25 reflects to some extent the analysis of Fingleton et al. (2020). The signs of the parameter estimates are not dissimilar, although the presence of the year dummies in the current specification has an impact; for example, the estimate of the spatial lag parameter \(\rho\) is smaller though still significantly different from zero (see Lee and Yu 2010). Some estimates are larger, and although the classic standard errors are also larger, t-ratios are larger and pick out some significant effects. On the basis of the uncorrected standard errors, the significant variables are the Irish, Indian, UK, Pakistani and rest-of-the-world countries of birth, plus a significant negative effect of the unemployment location quotient and significant positive spatial spillovers from log employment (\(\rho\)) and residuals (\(\lambda\)) in contiguous districts. A 1% increase in Indian-born residents evidently causes an approximately 0.16% increase in employment. For UK-born residents, the estimate is a 0.6% increase, and for migrants from the rest of the world, a 0.22% increase. In contrast, a 1% increase in the Irish-born population evidently causes a 0.35% reduction in employment, and for Pakistani-born residents there is a 0.085% reduction. While we might infer causality from the avoidance of endogeneity bias in our estimation, caution is required, partly because these conclusions are based on uncorrected standard errors. As shown in Table 25, the Windmeijer and HKL corrections do increase standard errors, but the significant effects persist.

An additional consideration is that the parameter point estimates are not the true elasticities, because the significant spatial spillover effect (\(\rho\)) magnifies the initial point estimates. Accordingly, the true elasticity for Indian-born residents (variable \(x_{2}\)) is derived from

$$\left[ {\frac{dy}{{dx_{12} }}...\frac{dy}{{dx_{N2} }}} \right]_{t} = \hat{\beta }_{2} {\mathbf{B}}_{N}^{ - 1}$$
(19)

which is an \(N\) by \(N\) matrix of partial derivatives. From this, a simplified average measure of the total effect of a 1% increase in the Indian-born population at time t is given by the mean column sum of \(\hat{\beta }_{2} {\mathbf{B}}_{N}^{ - 1}\), which is equal to 0.269%. For migrants born in the rest of the world, the elasticity is 0.374%. By comparison, the true elasticity for the UK-born cohort is 1.01%. On the negative side, the elasticity for Pakistani-born residents is − 0.144% and for the Irish-born it is − 0.603%.
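The mean-column-sum calculation in Eq. (19) is easily sketched. The form \({\mathbf{B}}_{N} = {\mathbf{I}}_{N} - \hat{\rho }{\mathbf{W}}_{N}\) is our assumption (the paper defines \({\mathbf{B}}_{N}\) earlier), and the numbers in the usage note are illustrative rather than the paper's estimates.

```python
import numpy as np

def mean_total_effect(beta_hat, rho_hat, W):
    """Mean column sum of beta_hat * B_N^{-1} as in Eq. (19),
    assuming B_N = I_N - rho_hat * W_N (an assumed standard form)."""
    N = W.shape[0]
    effects = beta_hat * np.linalg.inv(np.eye(N) - rho_hat * W)  # N x N partial derivatives
    return effects.sum(axis=0).mean()                            # average total effect
```

For a row-standardised \({\mathbf{W}}_{N}\) this reduces exactly to \(\hat{\beta }/(1 - \hat{\rho })\), so with an illustrative \(\hat{\beta } = 0.16\) and \(\hat{\rho } = 0.2\) the average total effect is 0.2.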

Extending beyond time \(t\), assuming dynamic stability and stationarity, elasticities converge in the long-run to steady-state levels given by

$$\left[ {\frac{dy}{{dx_{1k} }}...\frac{dy}{{dx_{Nk} }}} \right] = \left[ { - {\mathbf{C}}_{N} + {\mathbf{B}}_{N} } \right]^{ - 1} \left( {\hat{\beta }_{k} {\mathbf{I}}_{N} } \right)$$
(20)

The total long-run effect of a 1% increase in the Indian-born population is given by the mean column sum of Eq. (20) with \(k = 2\), which is equal to 0.699%. For migrants born in the rest of the world, the elasticity is 0.971%. By comparison, the true elasticity for the UK-born cohort is 2.599%. On the negative side, the elasticity for Pakistani-born residents is − 0.374% and for the Irish-born it is − 1.567%.
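The long-run calculation in Eq. (20) differs from the time-\(t\) one only in the matrix being inverted. A self-contained sketch, assuming \({\mathbf{B}}_{N} = {\mathbf{I}}_{N} - \hat{\rho }{\mathbf{W}}_{N}\) and \({\mathbf{C}}_{N} = \hat{\gamma }{\mathbf{I}}_{N} + \hat{\theta }{\mathbf{W}}_{N}\) (assumed forms; the paper defines both matrices earlier):

```python
import numpy as np

def long_run_total_effect(beta_hat, rho_hat, gamma_hat, theta_hat, W):
    """Mean column sum of (B_N - C_N)^{-1} (beta_hat I_N) as in Eq. (20),
    under the assumed forms B_N = I - rho*W and C_N = gamma*I + theta*W."""
    N = W.shape[0]
    B = np.eye(N) - rho_hat * W
    C = gamma_hat * np.eye(N) + theta_hat * W
    effects = np.linalg.inv(B - C) * beta_hat   # N x N long-run partial derivatives
    return effects.sum(axis=0).mean()           # steady-state average total effect
```

For a row-standardised \({\mathbf{W}}_{N}\) this equals \(\hat{\beta }/(1 - \hat{\rho } - \hat{\gamma } - \hat{\theta })\); for example, with the Sect. 5.4 simulation values \(\gamma = 0.3, \rho = 0.2, \theta = -0.4\) and an illustrative \(\hat{\beta } = 1\), the long-run effect is \(1/0.9 \approx 1.11\).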

7 Conclusion

One of the main problems in analysing dynamic (spatial) panel data estimated by GMM with endogenous explanatory variables is the difficulty of finding appropriate instruments. A standard approach is to use instruments that are internal to the model, lagging the regressors appropriately in order to satisfy the moment conditions, which require instruments to be orthogonal to the differenced errors. The default HENR instrumentation typically generates a large number of internal instruments, as can the use of multiple spatial lags suggested in the spatial econometrics literature. But instrument proliferation makes the Sargan–Hansen J test statistic severely undersized and low-powered, and therefore unreliable as an indicator of estimation consistency. Several options are available to reduce the problem, and this paper proposes a new one: extending the use of synthetic instruments, advocated by Le Gallo and Páez (2013) for cross-sectional data, to dynamic spatial panel data modelling. The key to the approach is a set of exogenous instruments derived from a connectivity matrix which is not causally related to the data. Nevertheless, the synthetic instruments thus created tend to be strongly correlated with the endogenous regressors and can thus remedy the problem of weak instruments. Using Monte Carlo simulation, the paper provides evidence that collapsing and applying synthetic instruments, thereby significantly reducing the number of instruments, alleviates the problems associated with instrument overabundance while producing, on the whole, plausible parameter estimates, although there is evidence of some parameter estimation bias. An important additional problem is the downward bias in parameter standard error estimates, which can result in serious upward bias in t-ratios; collapsing and the use of synthetic instruments greatly diminish this problem.
The results of Monte Carlo simulation are of course conditional on assumptions made, so one cannot be too dogmatic regarding the generality of the conclusions reached. However, application to real data given in Fingleton et al. (2020) produces plausible outcomes and new insights which put the analysis provided in that paper on a stronger inferential footing.