1 Introduction

The paper aims to simulate potential outcomes for employment across 255 EU regions as a result of the impending and, at the time of writing, probable departure of the UK from the EU, commonly known as Brexit. Most analysts agree that Brexit will have momentous consequences for the UK and (remaining) EU economies, but there is very little analysis at the regional scale, and such analysis typically fails to account for interconnectivity at the regional level. Some regional impact studies have been carried out by Dhingra et al. (2017a, b), Los et al. (2017) and McCann (2018). The current paper complements or contrasts with this research by applying a state-of-the-art dynamic spatial panel data model and adopting a pan-European approach involving the majority of EU regions and all UK regions. This modelling approach is ideally suited to capturing the spatial interconnectivity of the European regions and to projecting the long-run consequences of Brexit across EU and UK regions, thus enabling comparison of the impact on both sides of the Channel and the Irish Sea. To this end, we use the dynamic spatial panel data model and prediction equation recently introduced into the literature by Baltagi et al. (2018) and applied in different contexts by Fingleton et al. (2018) and Fingleton and Szumilo (2019). The model assumes that employment in a given region depends on the levels of production and investment within that region, as shown in the basic economic model which underpins the estimating equation, and also on demand coming from all the regions of the EU and UK, as determined by interregional trade flows. Additionally, the model proposes that employment in any region is closely linked to its own employment level in the previous period and to employment levels in trade-connected EU and UK regions in the previous period.
Following this literature, we provide a rational basis for the presence of spatial and temporal lags, which in the spatial econometrics literature are more typically introduced ad hoc. In addition, the model takes account of unobserved factors which also affect the level of employment. These are captured by region-specific random effects, which are also spatially interdependent. A further feature of the approach is the way in which endogeneity is handled: internal instruments are applied in the spirit of Arellano and Bond (1991), thus avoiding the often difficult search for valid external instruments.

The focus of analysis is the so-called job-shortfall which could arise due to Brexit. In other words, the intention is not to forecast the actual levels of employment in each region, which would give the predicted change in the number of jobs. Rather, it is to simulate the impact of Brexit assuming no consequential responses. Such responses include jobs created by new trade links formed post-Brexit; changes to the UK’s competitiveness and the consequences for demand and employment of changes in exchange rates and prices; changes to migration flows into and out of the UK; changes in the competitiveness of firms if trade barriers are increased and regulations relaxed; and possible changes in levels of inward and outward investment and in the capital stock if capital relocates. These and other possible consequences add to the uncertainty about the actual change in employment in the UK regions, so in this paper the focus is on simulating the job-shortfall due to Brexit per se.

Stated more explicitly, the empirical analysis bases the spatial interdependence of employment levels across regions on how closely they are connected by trade. We assume that employment levels partly reflect demand for a given region’s goods and services coming from the UK and EU regions. Naturally, since about \(50\%\) of the UK’s trade in 2019 was with countries outside the EU, demand from these non-EU countries will also affect employment levels. For both UK and EU regions, we assume that the non-EU component of demand is reflected in the levels of production and investment within each region. In this way, we attempt to isolate the impact of reduced trade between UK and EU regions from the potential effects of changes in non-EU trade on employment. In the simulations, non-EU trade flows, and hence capital and output, are assumed to remain at their previous levels, which is why we focus on job-shortfall rather than job-loss or job-gain. Moreover, to make the simulations robust, simplifying assumptions are made regarding the impact of Brexit on trade flows between individual UK and EU regions. This leads to a geography of the Brexit impact that is immune to changes in the assumed level of trade.

More specifically, employment levels are estimated across \(N=255\) EU regions both with and without Brexit. The explicit drivers of employment are output and capital, which are approximated by Gross Value Added (GVA) and a function of Gross Fixed Capital Formation (GFCF), respectively. Estimation is based on a viable data series over the period from 2001 to 2010. Data for 2011 and 2012 are not used in estimation but are held back for one- and two-step-ahead prediction. Different assumptions can be made about post-2011 paths for GVA and GFCF, given that accessible data with the same geography are not available for the more recent period, although these assumptions have been found to have relatively little effect on outcomes.

The structure of the paper is as follows. Section 2 outlines the model, and Sect. 3 describes the data. Section 4 summarizes the estimator, and Sect. 5 the resulting estimates. Section 6 focuses on prediction methodology, and Sect. 7 gives details of the method for simulating the Brexit effect. Section 8 gives the simulation results, and Sect. 9 concludes. The “Appendices” section gives the theoretical basis of the estimating equation, provides details of the outcomes from alternative estimators, and summarizes the Chow–Lin approach to obtaining an interregional connectivity matrix. The last section lists the references.

2 The model

The reduced form used as a basis to simulate the Brexit effect assumes that employment partly depends on the level of output, as measured by GVA (Gross Value Added), denoted by \({\mathbf {q}}_{t}\), and on (a proxy for) the level of capital within the region, based on GFCF (Gross Fixed Capital Formation), which is denoted by \({\mathbf {k}}_{t}\). To show this, we start with the theoretical model given as Eq. (1), which is based on Eq. (30) given in “Appendix”. The N by 1 vector \({\mathbf {e}}_{t}^{d}\) is the density of employment per unit area, and \({\mathbf {a}}_{t}\) is the level of efficiency of labour at time t, so that the product \({\mathbf {e}}_{t}^{d}{\mathbf {a}}_{t}\) is the number of labour efficiency units per unit area. This is related to \(\widetilde{{\mathbf {q}}}_{t}\), a measure of output in the competitive final goods and services sector in each region at time t, via the constant parameters \(\phi\) and \({\widetilde{\gamma }}\), thus

$$\begin{aligned} \widetilde{{\mathbf {q}}}_{t}=\phi ({\mathbf {e}}_{t}^{d}{\mathbf {a}}_{t})^{ {\widetilde{\gamma }}} \end{aligned}$$
(1)

In order to obtain total output \({\mathbf {q}}_{t}\), it is assumed that \(\widetilde{{\mathbf {q}}}_{t}=\varvec{\pi }{\mathbf {q}}_{t}\), in which \(\varvec{\pi }\) is an N by 1 vector giving the share of total output in each region that is competitive final goods and services output. For simplicity of estimation, \(\varvec{\pi }\) is assumed to be constant over time. Also, employment levels are \({\mathbf {e}}_{t}={\mathbf {h}}{\mathbf {e}}_{t}^{d}\), in which \({\mathbf {h}}\) is the land area of each region. Taking logs gives

$$\begin{aligned} \ln \varvec{\pi }+\ln {\mathbf {q}}_{t}=\ln \phi +{\widetilde{\gamma }}\ln {\mathbf {e}}_{t}^{{}}+{\widetilde{\gamma }}\ln {\mathbf {a}}_{t}-{\widetilde{\gamma }} \ln {\mathbf {h}} \end{aligned}$$
(2)

Rearranging (2) gives

$$\begin{aligned} \ln {\mathbf {e}}_{t}=\frac{1}{{\widetilde{\gamma }}}\left( \ln \varvec{\pi }+\ln {\mathbf {q}}_{t}-\ln \phi \right) -\ln {\mathbf {a}}_{t}+\ln {\mathbf {h}} \end{aligned}$$

To obtain (3), we assume that labour efficiency is \({\mathbf {a}}_{t}=\frac{{\mathbf {q}}_{t}}{\widetilde{{\mathbf {k}}}_{t}}\), so that more efficient labour has a higher level of output per unit of capital \(\widetilde{{\mathbf {k}}}_{t}\). As shown in Eq. (14) in Sect. 3, an approximation to the log level of capital is \(\ln \widetilde{{\mathbf {k}}}_{t}=-\ln {\widetilde{a}}+{\widetilde{b}}\ln {\mathbf {k}}_{t}\), hence \(\ln {\mathbf {a}}_{t}=\ln {\mathbf {q}}_{t}+\ln {\widetilde{a}}-{\widetilde{b}}\ln {\mathbf {k}}_{t}\), and from this

$$\begin{aligned} \ln {\mathbf {e}}_{t}=\frac{1}{{\widetilde{\gamma }}}\left( \ln \varvec{\pi }+\ln {\mathbf {q}}_{t}-\ln \phi \right) +\ln {\mathbf {h}}-\ln {\mathbf {q}}_{t}-\ln {\widetilde{a}}+{\widetilde{b}}\ln {\mathbf {k}}_{t} \end{aligned}$$
(3)

Collecting together constants as c with \(\varvec{\iota }\) an N by 1 vector of ones, and reorganizing gives

$$\begin{aligned} \ln {\mathbf {e}}_{t}=c\varvec{\iota }+\frac{1-{\widetilde{\gamma }}}{\widetilde{ \gamma }}\ln {\mathbf {q}}_{t}+{\widetilde{b}}\ln {\mathbf {k}}_{t}+\varvec{ \varepsilon }_{t} \end{aligned}$$
(4)

in which the error term \(\varvec{\varepsilon }_{t}\) captures the time-invariant regional heterogeneity in land \({\mathbf {h}}\) and in shares \(\varvec{\pi }\), which are unobserved, as given in Eq. (12).
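As a numerical sanity check on the algebra from Eq. (1) to Eq. (4), the following Python sketch draws arbitrary, purely illustrative values for \(\phi\), \({\widetilde{\gamma }}\), \({\widetilde{a}}\), \({\widetilde{b}}\), \(\varvec{\pi }\), \({\mathbf {h}}\), \({\mathbf {q}}_{t}\) and \({\mathbf {k}}_{t}\). It computes employment by inverting Eq. (1) with \(\widetilde{{\mathbf {q}}}_{t}=\varvec{\pi }{\mathbf {q}}_{t}\) and Eq. (14). It then confirms that the reduced form (4) reproduces the result when \(c=-\frac{1}{{\widetilde{\gamma }}}\ln \phi -\ln {\widetilde{a}}\) and the error term absorbs \(\frac{1}{{\widetilde{\gamma }}}\ln \varvec{\pi }+\ln {\mathbf {h}}\). None of the numbers are estimates, and the error term is not demeaned here.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5

# Purely illustrative structural parameters (not estimates).
phi, gamma_tilde = 2.0, 0.6
a_tilde, b_tilde = 0.0859, 1.0195
pi = rng.uniform(0.3, 0.8, N)      # share of competitive final output
h = rng.uniform(1e3, 5e4, N)       # land area
q = rng.uniform(1e3, 1e5, N)       # total output (GVA)
k = rng.uniform(1e2, 1e4, N)       # GFCF-based proxy

# Structural route: capital stock from Eq. (14), efficiency a = q / ktilde,
# then invert Eq. (1) with qtilde = pi * q to get employment density e^d.
ktilde = k ** b_tilde / a_tilde
a_eff = q / ktilde
e_density = (pi * q / phi) ** (1.0 / gamma_tilde) / a_eff
ln_e_structural = np.log(h * e_density)          # e = h * e^d

# Reduced-form route, Eq. (4): constant, slopes, and an error term that
# absorbs the unobserved regional terms in pi and h.
c = -np.log(phi) / gamma_tilde - np.log(a_tilde)
eps = np.log(pi) / gamma_tilde + np.log(h)
ln_e_reduced = (c + (1 - gamma_tilde) / gamma_tilde * np.log(q)
                + b_tilde * np.log(k) + eps)

print(np.max(np.abs(ln_e_structural - ln_e_reduced)))  # numerically zero
```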

In the dynamic context, it is reasonable to assume that disparities in employment levels across locations will persist as an equilibrium outcome of unchanging and fundamental causes. We therefore proceed, following in parallel the exposition in Baltagi et al. (2018), to assume that (log) employment levels across regions, denoted by the N by 1 vector \(\ln {\mathbf {e}}_{t}\) at time t, will persist at dynamically stable levels, so that \(\ln {\mathbf {e}}_{t}=\ln {\mathbf {e}}_{t-1}\) unless there are changes in the levels of \({\mathbf {q}}_{t}\) or \({\mathbf {k}}_{t}\), or changes in common factors, interregional trade, or unobserved effects. If such a disturbance occurs at time t and is ephemeral, then \(\ln {\mathbf {e}}_{t}\ne \ln {\mathbf {e}}_{t-1}\). Over a subsequent period of quiescence, \(t\rightarrow T\), we again expect employment levels to converge on a new equilibrium, at which \(\ln {\mathbf {e}}_{T}=\ln {\mathbf {e}}_{T-1}\). Suppose that the observed data are such that \(\ln {\mathbf {e}}_{t}\ne \ln {\mathbf {e}}_{t-1}\) but are tending to converge, so that \(\ln {\mathbf {e}}_{t}=f(\ln {\mathbf {e}}_{t-1})\), and that this relationship is an autoregressive process, hence

$$\begin{aligned} \ln {\mathbf {e}}_{t}=\varvec{\varsigma }+\gamma \ln {\mathbf {e}}_{t-1} \end{aligned}$$
(5)

in which \(\varvec{\varsigma }\) is an N by 1 vector and \(\gamma\) is a scalar parameter. In the long run, with \(\left| \gamma \right| <1\) and no subsequent disturbances, the process converges to \(\ln {\mathbf {e}}_{T}=\frac{\varvec{\varsigma }}{1-\gamma }\).
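A minimal Python sketch of this convergence claim uses illustrative values of \(\varvec{\varsigma }\) and \(\gamma\), not estimates. Iterating Eq. (5) from an arbitrary starting point approaches \(\varvec{\varsigma }/(1-\gamma )\) when \(\left| \gamma \right| <1\).

```python
import numpy as np

def iterate_ar1(varsigma, gamma, ln_e0, steps):
    """Iterate Eq. (5), ln e_t = varsigma + gamma * ln e_{t-1}, `steps` times."""
    ln_e = np.asarray(ln_e0, dtype=float)
    for _ in range(steps):
        ln_e = varsigma + gamma * ln_e
    return ln_e

varsigma = np.array([1.0, 2.0, 0.5])     # illustrative N = 3 vector
gamma = 0.8                               # |gamma| < 1
ln_e_T = iterate_ar1(varsigma, gamma, ln_e0=np.zeros(3), steps=200)
long_run = varsigma / (1.0 - gamma)       # approximately [5, 10, 2.5]
print(np.allclose(ln_e_T, long_run))      # True
```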

Consider next connectivity between regions in the form of a matrix \({\mathbf {W}}_{N}^{*}\), which is a time-invariant N by N matrix, where N is the number of regions. Spatial interdependence between regions is a feature of many different situations. It can be modelled via an autoregressive process involving the dependent variable, via spatial interdependence of the errors, or, as in this paper, via both. The problem of how to model dependence between N regions is typically resolved by means of an N by N matrix of constant weights assigned to the cells of \({\mathbf {W}}_{N}^{*}\), which indicate the existence and importance of a link between each pair of regions. In many spatial econometric applications, connectivity of the N regions is some function of the distance between them, be it geographical distance or some measure of economic distance. In this paper, we proxy economic distance by the level of trade between each pair of regions, with more trade implying shorter economic distance. Usefully, \({\mathbf {W}}_{N}^{*}\) provides a parsimonious parametrization of the interdependence between, in this case, employment levels in different regions. As explained by LeSage and Pace (2009), once we allow for dependence relations between a set of N entities on a single variable, for example as represented by the N by 1 vector \(\ln {\mathbf {e}}_{t}\), there are potentially \(N^{2}-N\) parameters that define individual interdependence, such as the relation between \(\ln e_{it}\) and \(\ln e_{jt}\), having excluded dependence of an observation on itself. This leads to an over-parametrization problem. The problem can be solved by imposing an a priori structure, or weights matrix \({\mathbf {W}}_{N}^{*}\), on the interdependence relations, thus reducing the number of parameters to be estimated from \(N^{2}-N\) to one, denoted here by \(\rho _{1}\).
For purposes of interpreting parameter estimates, we normalize \({\mathbf {W}}_{N}^{*}\) by dividing it by its maximum eigenvalue to give \({\mathbf {W}}_{N}\). With this normalization, the maximum eigenvalue of \({\mathbf {W}}_{N}\) is 1. The continuous range of \(\rho _{1}\) containing zero for which \(\left( {\mathbf {I}}_{N}-\rho _{1}{\mathbf {W}}_{N}\right)\) is non-singular is \(\frac{1}{\min (eig)}<\rho _{1}<\frac{1}{\max (eig)}=1\), in which \(\rho _{1}\) is a scalar spatial autoregressive parameter and eig denotes the eigenvalues of \({\mathbf {W}}_{N}\).
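The normalization can be sketched as follows. The trade matrix below is a small hypothetical example, and the helper `normalise_by_max_eigenvalue` is illustrative only. It assumes a symmetric, non-negative \({\mathbf {W}}_{N}^{*}\) with zero diagonal so that all eigenvalues are real, which need not hold for an asymmetric trade-flow matrix.

```python
import numpy as np

def normalise_by_max_eigenvalue(W_star):
    """Return W_N = W* / max eig(W*) and the interval (1/min eig, 1/max eig) of W_N.

    Assumes W* is symmetric, so eigvalsh gives real eigenvalues in ascending order.
    """
    eig = np.linalg.eigvalsh(W_star)
    W_N = W_star / eig[-1]
    eig_N = eig / eig[-1]                 # largest is now exactly 1
    return W_N, (1.0 / eig_N[0], 1.0 / eig_N[-1])

# Hypothetical symmetric trade flows between four regions (zero diagonal).
W_star = np.array([[0.0, 5.0, 1.0, 0.5],
                   [5.0, 0.0, 2.0, 1.0],
                   [1.0, 2.0, 0.0, 3.0],
                   [0.5, 1.0, 3.0, 0.0]])
W_N, (rho_low, rho_high) = normalise_by_max_eigenvalue(W_star)
print(rho_low, rho_high)    # rho_low < 0, rho_high = 1
```

Because the diagonal is zero, the eigenvalues sum to zero, so the smallest one is negative and the admissible interval for \(\rho _{1}\) straddles zero.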

Multiplying (5) by \(\rho _{1}{\mathbf {W}}_{N}\) gives

$$\begin{aligned} \rho _{1}{\mathbf {W}}_{N}\ln {\mathbf {e}}_{t}=\rho _{1}{\mathbf {W}}_{N}\varvec{\varsigma }+\rho _{1}{\mathbf {W}}_{N}\gamma \ln {\mathbf {e}}_{t-1} \end{aligned}$$
(6)

Subtracting (6) from (5) leads to another logically consistent expression, in which the spatial dependence implied by (6) can be seen in (7) as an explicit cause of variation in \(\ln {\mathbf {e}}_{t}\). Thus,

$$\begin{aligned} \ln {\mathbf {e}}_{t}-\rho _{1}{\mathbf {W}}_{N}\ln {\mathbf {e}}_{t}&= \varvec{\varsigma }+\gamma \ln {\mathbf {e}}_{t-1}-\left( \rho _{1}{\mathbf {W}}_{N}\varvec{\varsigma }+\rho _{1}{\mathbf {W}}_{N}\gamma \ln {\mathbf {e}}_{t-1}\right) \\ \left( {\mathbf {I}}_{N}-\rho _{1}{\mathbf {W}}_{N}\right) \ln {\mathbf {e}}_{t}&= \left( \gamma {\mathbf {I}}_{N}-\rho _{1}\gamma {\mathbf {W}}_{N}\right) \ln {\mathbf {e}}_{t-1}+\left( {\mathbf {I}}_{N}-\rho _{1}{\mathbf {W}}_{N}\right) \varvec{\varsigma } \end{aligned}$$

Writing \(\theta =-\rho _{1}\gamma\) gives

$$\begin{aligned} \ln {\mathbf {e}}_{t}={\mathbf {B}}_{N}^{-1}\left[ {\mathbf {C}}_{N}\ln {\mathbf {e}} _{t-1}+{\mathbf {B}}_{N}\varvec{\varsigma }\right] \end{aligned}$$
(7)

in which \({\mathbf {B}}_{N}=\left( {\mathbf {I}}_{N}-\rho _{1}{\mathbf {W}}_{N}\right)\), \({\mathbf {C}}_{N}=(\gamma {\mathbf {I}}_{N}+\theta {\mathbf {W}}_{N})\) and \({\mathbf {I}}_{N}\) is an identity matrix of order N. Given appropriate parameter restrictions, iterating Eq. (7) leads to convergence to \(\ln {\mathbf {e}}_{T}=\left( {\mathbf {B}}_{N}-{\mathbf {C}}_{N}\right) ^{-1}{\mathbf {B}}_{N}\varvec{\varsigma }\).
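A short worked step shows that the spatial terms cancel in this long-run solution. It assumes that the restriction \(\theta =-\rho _{1}\gamma\) holds exactly, that \(\gamma \ne 1\), and that \({\mathbf {B}}_{N}\) is non-singular:

$$\begin{aligned} {\mathbf {B}}_{N}-{\mathbf {C}}_{N}&=\left( {\mathbf {I}}_{N}-\rho _{1}{\mathbf {W}}_{N}\right) -\left( \gamma {\mathbf {I}}_{N}-\rho _{1}\gamma {\mathbf {W}}_{N}\right) =(1-\gamma ){\mathbf {B}}_{N} \\ \ln {\mathbf {e}}_{T}&=\left[ (1-\gamma ){\mathbf {B}}_{N}\right] ^{-1}{\mathbf {B}}_{N}\varvec{\varsigma }=\frac{\varvec{\varsigma }}{1-\gamma } \end{aligned}$$

Under the restriction, therefore, the equilibrium coincides with that of the aspatial process (5). The cancellation no longer holds once \(\theta\) is estimated freely.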

Introducing additional covariates by writing \({\mathbf {B}}_{N}\varvec{\varsigma }=\left( c\varvec{\iota }+\mathbf {x\beta }\right)\), in which \(c\varvec{\iota }\) is a constant N by 1 vector, \({\mathbf {x}}\) is an N by k matrix and \(\varvec{\beta }\) is a k by 1 vector, gives

$$\begin{aligned} \ln {\mathbf {e}}_{t}={\mathbf {B}}_{N}^{-1}\left[ {\mathbf {C}}_{N}\ln {\mathbf {e}} _{t-1}+c\varvec{\iota }+\mathbf {x\beta }\right] \end{aligned}$$

To maintain dynamically stable simulations, following Elhorst (2001, 2014, p. 98), Parent and LeSage (2011, p. 478; 2012, p. 731) and Debarsy et al. (2012, p. 162), the largest characteristic root \(\left( e_{\max }\right)\) of \({\mathbf {B}}_{N}^{-1}{\mathbf {C}}_{N}\) should be less than 1 in absolute value. This restriction ensures that employment converges to the equilibrium levels

$$\begin{aligned} \ln {\mathbf {e}}_{T}=\left( {\mathbf {B}}_{N}-{\mathbf {C}}_{N}\right) ^{-1}\left( c \varvec{\iota }+\mathbf {x\beta }\right) \end{aligned}$$
(8)
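The stability condition and Eq. (8) can be checked numerically. In this sketch, `check_and_iterate` is a hypothetical helper, the ring-shaped \({\mathbf {W}}_{N}\) and all parameter values are illustrative, and the vector `const` stands for \(c\varvec{\iota }+\mathbf {x\beta }\), held fixed over time. Iterating the deterministic part of the model from an arbitrary start should reach the closed form (8) whenever \(\left| e_{\max }\right| <1\).

```python
import numpy as np

def check_and_iterate(W, rho1, gamma, theta, const, ln_e0, steps=500):
    """Hypothetical helper: spectral radius of B^{-1}C, the iterated
    deterministic path end-point, and the closed form of Eq. (8).

    const is the N-vector c*iota + x*beta, held fixed over time.
    """
    I = np.eye(W.shape[0])
    B = I - rho1 * W
    C = gamma * I + theta * W
    A = np.linalg.solve(B, C)                       # B^{-1} C
    e_max = np.max(np.abs(np.linalg.eigvals(A)))    # largest root in modulus
    if e_max >= 1.0:
        raise ValueError(f"not dynamically stable: e_max = {e_max:.3f}")
    drift = np.linalg.solve(B, const)               # B^{-1}(c*iota + x*beta)
    ln_e = np.asarray(ln_e0, dtype=float)
    for _ in range(steps):
        ln_e = A @ ln_e + drift                     # Eq. (9) without errors
    closed_form = np.linalg.solve(B - C, const)     # Eq. (8)
    return e_max, ln_e, closed_form

# Six regions on a ring; the ring's largest eigenvalue is 2, so W_N = adj / 2.
adj = np.roll(np.eye(6), 1, axis=1) + np.roll(np.eye(6), -1, axis=1)
W = adj / 2.0
const = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])    # illustrative c*iota + x*beta
e_max, ln_e_end, ln_e_eq = check_and_iterate(W, rho1=0.3, gamma=0.6, theta=-0.1,
                                             const=const, ln_e0=np.zeros(6))
print(round(e_max, 4), np.allclose(ln_e_end, ln_e_eq))   # 0.7143 True
```

Because \({\mathbf {B}}_{N}\) and \({\mathbf {C}}_{N}\) are both functions of \({\mathbf {W}}_{N}\), the roots of \({\mathbf {B}}_{N}^{-1}{\mathbf {C}}_{N}\) here are \((\gamma +\theta \lambda )/(1-\rho _{1}\lambda )\) for each eigenvalue \(\lambda\) of \({\mathbf {W}}_{N}\). With the values used, the largest is \(0.5/0.7\approx 0.714\).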

Additional realism is introduced in three ways. First, the restriction \(\theta =-\rho _{1}\gamma\) is removed, since this greatly simplifies estimation; however, we anticipate that \({\widehat{\theta }}\approx -{\widehat{\rho }}_{1}{\widehat{\gamma }}\). Second, taking account of the variables in Eq. (4), the time-invariant matrix \({\mathbf {x}}\) is replaced by the time-varying matrix \({\mathbf {x}}_{t}\). Third, spatially dependent unobservables are represented by the error term \(\varvec{\varepsilon }_{t}\). The system may, depending on \({\mathbf {B}}_{N}^{-1}{\mathbf {C}}_{N}\), still tend towards equilibrium. However, the equilibrium will be continuously disturbed and new equilibrium levels established as t varies. For simplicity of estimation, interregional connectivity is assumed to remain constant over the estimation period. These considerations lead to the model of employment levels given in Eqs. (9) to (12), which is a time-space dynamic panel data model, thus

$$\begin{aligned} \ln {\mathbf {e}}_{t}={\mathbf {B}}_{N}^{-1}\left[ {\mathbf {C}}_{N}\ln {\mathbf {e}} _{t-1}+c\varvec{\iota }+{\mathbf {x}}_{t}\varvec{\beta }+\varvec{\varepsilon } _{t}\right] \end{aligned}$$
(9)

Given \({\mathbf {x}}_{1t}=\ln {\mathbf {q}}_{t}\), \({\mathbf {x}}_{2t}=\ln {\mathbf {k}}_{t}\), \({\mathbf {x}}_{3t}=\ln \overline{{\mathbf {e}}}_{t}\), \({\mathbf {x}}_{t}=\left[ {\mathbf {x}}_{1t}\ {\mathbf {x}}_{2t}\ {\mathbf {x}}_{3t}\right]\) and \(\varvec{\beta }=\left[ \beta _{1}\ \beta _{2}\ \beta _{3}\right] ^{T}\), Eq. (9) can be stated more explicitly as

$$\begin{aligned}\ln {\mathbf {e}}_{t} &=c\varvec{\iota }+\gamma \ln {\mathbf {e}}_{t-1}+\rho _{1}{\mathbf {W}}_{N}\ln {\mathbf {e}}_{t}+\beta _{1}\ln {\mathbf {q}}_{t} \nonumber \\&\quad +\beta _{2}\ln {\mathbf {k}}_{t}+\beta _{3}\ln \overline{{\mathbf {e}}}_{t}+\theta {\mathbf {W}}_{N}\ln {\mathbf {e}}_{t-1}+\varvec{\varepsilon }_{t} \end{aligned}$$
(10)
$$\begin{aligned}&\varvec{\varepsilon }_{t} ={\mathbf {u}}_{t}-\rho _{2}{\mathbf {M}}_{N}{\mathbf {u}} _{t} \end{aligned}$$
(11)
$$\begin{aligned}&u_{it} =\mu _{i}+\nu _{it} \quad i=1,\ldots ,N,t=1,\ldots ,T \nonumber \\&\quad \mu _{i} \sim iid(0,\sigma _{\mu }^{2}) \nonumber \\&\quad \nu _{it} \sim iid(0,\sigma _{\nu }^{2}) \end{aligned}$$
(12)
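To make the data-generating process in Eqs. (9) to (12) concrete, the following sketch simulates it for a toy panel. It is an illustration under stated assumptions, not the authors' code. The helper `simulate_panel` is hypothetical, and \({\mathbf {W}}_{N}\) and \({\mathbf {M}}_{N}\) are both built from a small ring of regions. The covariates are random draws, and the common-factor regressor \(\ln \overline{{\mathbf {e}}}_{t}\) is omitted for simplicity, so `x` holds only stand-ins for \(\ln {\mathbf {q}}_{t}\) and \(\ln {\mathbf {k}}_{t}\).

```python
import numpy as np

def simulate_panel(W, M, params, x, ln_e0, sigma_mu, sigma_nu, rng):
    """Hypothetical helper: draw ln e_1..ln e_T from Eqs. (9)-(12).

    x      : (T, N, k) covariates, x[t-1] holding period t
    params : dict with keys c, gamma, rho1, theta, beta (length k), rho2
    Returns a (T + 1, N) array whose row 0 is the starting value ln_e0.
    """
    T, N, _ = x.shape
    I = np.eye(N)
    B = I - params["rho1"] * W
    C = params["gamma"] * I + params["theta"] * W
    G = I - params["rho2"] * M                      # SMA-RE errors, Eq. (11)
    mu = rng.normal(0.0, sigma_mu, N)               # random effects mu_i
    ln_e = np.empty((T + 1, N))
    ln_e[0] = ln_e0
    for t in range(1, T + 1):
        u = mu + rng.normal(0.0, sigma_nu, N)       # u_it = mu_i + nu_it, Eq. (12)
        rhs = C @ ln_e[t - 1] + params["c"] + x[t - 1] @ params["beta"] + G @ u
        ln_e[t] = np.linalg.solve(B, rhs)           # Eq. (9)
    return ln_e

rng = np.random.default_rng(2)
N, T = 6, 10
adj = np.roll(np.eye(N), 1, axis=1) + np.roll(np.eye(N), -1, axis=1)
W = adj / 2.0                                       # eigenvalue-normalised ring
M = adj / adj.sum(axis=1, keepdims=True)            # row-standardised contiguity
params = dict(c=0.5, gamma=0.6, rho1=0.3, theta=-0.15,
              beta=np.array([0.4, 0.1]), rho2=0.4)
x = rng.normal(size=(T, N, 2))                      # stand-ins for ln q_t, ln k_t
ln_e = simulate_panel(W, M, params, x, np.zeros(N), 0.2, 0.05, rng)
print(ln_e.shape)                                   # (11, 6)
```

Setting both variances to zero and holding `x` constant reduces the simulation to the deterministic recursion, whose end-point should match Eq. (8).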

The presence of the region-invariant mean of the dependent variable \(\overline{{\mathbf {e}}}_{t}\) is an attempt to allow for observed or unobserved common factors affecting all regions at each point in time. This approach is motivated by Pesaran (2015), who provides a major treatise on the different approaches to modelling dynamic spatial panel data with common factors. It is also motivated by Bailey et al. (2016), who ask, ‘to what extent are the observed dependencies between different spatial units due to common factors—for example, aggregate shocks—that affect different units rather than being the result of local interactions that generate spatial spillover effects?’. They propose the use of cross-unit averages to extract common factors, an approach that has also been applied by Fingleton et al. (2018) and Fingleton and Szumilo (2019). The introduction of common factors into spatial econometric models has also been considered by, among others, Vega and Elhorst (2016) and Ertur and Musolesi (2017).

The disturbances \(\varvec{\varepsilon }_{t}\) capture the effects of the spatially dependent unobserved variables. They have a compound structure (12), comprising time-invariant unobserved unit-specific interregional heterogeneity, represented by \(\mu _{i}\) with \(i=1,\ldots ,N\), and unobserved idiosyncratic shocks, represented by \(\nu _{it}\) with \(i=1,\ldots ,N,t=1,\ldots ,T\). These are assumed to be independent of each other and are collectively represented by \(u_{it}\). It is important to recognize that the \(\mu _{i}\)s represent the net effect of unobserved variables which in the short run can be treated as time invariant.

Most commonly, spatial dependence in the errors is assumed to follow an autoregressive (SAR-RE) process, such that \(\varvec{\varepsilon }_{t}=\rho _{2}{\mathbf {M}}_{N}\varvec{\varepsilon }_{t}+{\mathbf {u}}_{t}\). However, in this paper the error process is assumed to be a spatial moving average process (SMA-RE), as in Eq. (11), thus \(\varvec{\varepsilon }_{t}={\mathbf {G}}_{N}{\mathbf {u}}_{t}\), where \({\mathbf {G}}_{N}=\left( {\mathbf {I}}_{N}-\rho _{2}{\mathbf {M}}_{N}\right)\). This means that a shock in a region affects only that region and its neighbours, as defined by a row-standardized interregional contiguity matrix \({\mathbf {M}}_{N}\). In contrast, an SAR-RE process would entail shocks affecting all regions. There are two reasons for preferring SMA-RE errors. First, assuming SMA-RE rather than SAR-RE errors improves the predictive performance of the estimator, as described in Sect. 6. Second, SMA-RE errors might proxy for omitted spillovers, which might otherwise be captured by the spatial lags \({\mathbf {W}}_{N}{\mathbf {x}}_{t}\). This is pertinent since the presence of \({\mathbf {W}}_{N}{\mathbf {x}}_{t}\) on the right-hand side of (9) could adversely affect estimation. As explained by Fingleton et al. (2017) and Baltagi et al. (2018), an SMA-RE error specification ‘mitigates against the problem for instrumental variable estimation identified by Pace et al. (2012)’. In two-stage least squares (2SLS) estimation, the instrument set should comprise the ‘exogenous’ variables (\({\mathbf {x}}_{t}\)) and their spatial lags (\({\mathbf {W}}_{N}{\mathbf {x}}_{t}\)). These should be kept to a low order to avoid linear dependence and retain full column rank for the matrix of instruments (Kelejian and Prucha 1998, 1999). As explained by Pace et al. (2012), including \({\mathbf {W}}_{N}{\mathbf {x}}_{t}\) among the explanatory variables could make the estimation procedure perform suboptimally.
This is because, with spatial lags (\({\mathbf {W}}_{N}{\mathbf {x}}_{t}\)) among the regressors, spatial lags of the spatial lags (\({\mathbf {W}}_{N}^{2}{\mathbf {x}}_{t},{\mathbf {W}}_{N}^{3}{\mathbf {x}}_{t},\ldots\)) feature among the instruments, and this could lead to a weak instrument problem. To avoid this, SMA-RE errors are adopted as an alternative way to capture local spillovers.
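The contrast between SMA-RE and SAR-RE errors can be illustrated with a single unit shock. In this sketch, seven hypothetical regions lie on a line, \({\mathbf {M}}_{N}\) is the row-standardized contiguity matrix, and \(\rho _{2}=0.5\) is an arbitrary value. Under SMA the shock reaches only the region itself and its two neighbours, whereas under SAR it reaches every region.

```python
import numpy as np

N = 7
adj = np.zeros((N, N))
for i in range(N - 1):                     # regions on a line: i borders i + 1
    adj[i, i + 1] = adj[i + 1, i] = 1.0
M = adj / adj.sum(axis=1, keepdims=True)   # row-standardised contiguity M_N
rho2 = 0.5                                  # illustrative value
shock = np.zeros(N)
shock[3] = 1.0                              # unit shock u in the middle region

sma = (np.eye(N) - rho2 * M) @ shock                 # eps = (I - rho2 M) u
sar = np.linalg.solve(np.eye(N) - rho2 * M, shock)   # eps = (I - rho2 M)^{-1} u

print(np.flatnonzero(np.abs(sma) > 1e-12))   # [2 3 4]: own region and two neighbours
print(np.flatnonzero(np.abs(sar) > 1e-12))   # [0 1 2 3 4 5 6]: every region
```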

3 Data

In estimating Eq. (10), data for employment (\({\mathbf {e}}_{t}\)), output as measured by Gross Value Added (GVA, \({\mathbf {q}}_{t}\)) and capital as proxied by a function of Gross Fixed Capital Formation (GFCF, \({\mathbf {k}}_{t}\)), both denominated in €2005m, are taken from the Cambridge Econometrics European Regional Economic database. Observations over the 10-year period from 2001 to 2010 are used to estimate the model. Data are also available for \(2011{-}2012\) but are held back to allow out-of-sample prediction tests of the model and some rivals. \({\mathbf {k}}_{t}\) is used to reflect the capital stock \(\widetilde{{\mathbf {k}}}_{t}\), for which data are unavailable, on the basis of a simple relationship which is assumed to exist between the two variables. \({\mathbf {k}}_{t}\) measures gross investment in fixed capital assets (acquisitions minus disposals of produced fixed assets) and so provides an indicator of changes to the stock of capital. The assumption is that \({\mathbf {k}}_{t}\) is a nonlinear function of a constant fraction \({\widetilde{a}}\) of \(\widetilde{{\mathbf {k}}}_{t}\), so that

$$\begin{aligned} {\mathbf {k}}_{t}=\left( {\widetilde{a}}\widetilde{{\mathbf {k}}}_{t}\right) ^{\frac{ 1}{{\widetilde{b}}}} \end{aligned}$$
(13)

hence

$$\begin{aligned} \widetilde{{\mathbf {k}}}_{t}=\frac{1}{{\widetilde{a}}}\left( {\mathbf {k}}_{t}^{ {\widetilde{b}}}\right) \end{aligned}$$
(14)

As a test of the viability of this approximation, assume a standard model for the evolution of capital stock which is depreciating at a constant rate \({\widetilde{d}}\) so that

$$\begin{aligned} \widetilde{{\mathbf {k}}}_{t}={\mathbf {k}}_{t}+(1-{\widetilde{d}})\widetilde{ {\mathbf {k}}}_{t-1}; \quad t=2,\ldots ,T \end{aligned}$$
(15)

in which T is a large number. One problem with (15) is that it requires the initial capital stock at time \(t=1\), i.e. \(\widetilde{{\mathbf {k}}}_{1}\). However, given arbitrary values for \(\widetilde{{\mathbf {k}}}_{1}\) and \({\widetilde{d}}\), values for \({\widetilde{a}}\) and \({\widetilde{b}}\) can be found whereby (14) provides a reasonable approximation to the outcome of iterating (15). A more realistic test is made possible by the availability of both (albeit experimental) estimates of the capital stock (Derbyshire et al. 2010) and well-founded GFCF data. We use the latest available data for both \({\mathbf {k}}_{t}\) and \(\widetilde{{\mathbf {k}}}_{t}\), which are for the year \(t=2008\). Taking logs of (14) leads to a log-linear regression of \(\ln \widetilde{{\mathbf {k}}}_{t}\) on \(\ln {\mathbf {k}}_{t}\), which gives OLS estimates of the constant \(\ln {\widetilde{a}}^{-1}=2.4546\) (t ratio \(=13.5628\)) and slope \({\widetilde{b}}=1.0195\) (t ratio \(=50.8118\)), with \(R^{2}=0.8888\). The plot of \(\ln {\mathbf {k}}_{t}\) against \(\ln \widetilde{{\mathbf {k}}}_{t}\) shows a significant linear relationship, with no evidence of outliers or of heteroscedasticity. It thus appears that Eq. (13) provides a good approximation. The estimated \({\widetilde{a}}=0.0859\) indicates the approximate proportion of the capital stock that is invested; by comparison, \(\sum {\mathbf {k}}_{t}/\sum \widetilde{{\mathbf {k}}}_{t}=0.0686\).
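The first viability test described above can be sketched as follows. The sketch assumes hypothetical regional investment paths that grow at a common rate with small idiosyncratic noise, an arbitrary depreciation rate, and initial stocks proportional to regional size. The helper names are illustrative. The code iterates Eq. (15), then fits the log of Eq. (14) by OLS across regions in the final year. It also converts the reported intercept \(\ln {\widetilde{a}}^{-1}=2.4546\) back to \({\widetilde{a}}\).

```python
import numpy as np

def perpetual_inventory(k, ktilde_1, d):
    """Iterate Eq. (15): ktilde_t = k_t + (1 - d) * ktilde_{t-1}, t = 2..T.

    k : (N, T) investment flows; ktilde_1 : (N,) assumed initial stocks.
    """
    N, T = k.shape
    ktilde = np.empty((N, T))
    ktilde[:, 0] = ktilde_1
    for t in range(1, T):
        ktilde[:, t] = k[:, t] + (1.0 - d) * ktilde[:, t - 1]
    return ktilde

def fit_log_eq14(k, ktilde):
    """Cross-section OLS of ln ktilde on ln k (logs of Eq. (14)).

    Returns (a_tilde, b_tilde, R^2); the intercept estimates ln(1/a_tilde).
    """
    X = np.column_stack([np.ones_like(k), np.log(k)])
    y = np.log(ktilde)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return np.exp(-coef[0]), coef[1], r2

rng = np.random.default_rng(0)
N, T, g, d = 255, 40, 0.02, 0.05                  # illustrative values only
size = np.exp(rng.normal(0.0, 1.0, N))            # regional scale differences
k = size[:, None] * (1 + g) ** np.arange(T) * np.exp(rng.normal(0.0, 0.05, (N, T)))
ktilde = perpetual_inventory(k, ktilde_1=10.0 * size, d=d)
a_hat, b_hat, r2 = fit_log_eq14(k[:, -1], ktilde[:, -1])
print(a_hat, b_hat, r2)

# Converting the reported intercept ln(1/a_tilde) = 2.4546 back to a_tilde.
print(round(float(np.exp(-2.4546)), 4))           # 0.0859
```

With these synthetic inputs the fitted slope should be close to 1 and the fit nearly perfect, because the simulated capital stock is then almost proportional to current investment across regions. The reported \(R^{2}=0.8888\) indicates that the real data are noisier.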

The matrix \({\mathbf {W}}_{N}\) is based on estimated bilateral trade flows between EU NUTS2 regions. The data come from the PBL (the Netherlands Environmental Assessment Agency), which developed a new methodology close to that of Simini et al. (2012). Details of the methodology are given in Thissen et al. (2013, 2013a, b); see also Gianelle (2014). The method follows a top-down approach and is therefore consistent with the national accounts of the different countries. Given total international exports and imports at the country level, interregional trade flows are derived using data on business travel (services) and on freight transport (goods). Additionally, exports to the final demand of EU destination countries were also included. Trade flows involving regions of non-EU countries such as Switzerland and Norway were obtained on the basis of interregional trade flows estimated by the best linear disaggregation method of Chow and Lin (1971). This method was originally used to break down annual time series into quarterly series (see Abeysinghe and Lee 1998; Doran and Fingleton 2014). In this approach, aggregate trade values between 21 EU countries were allocated to the NUTS2 regions. A parallel approach has been used by Polasek et al. (2010), Vidoli and Mazziotta (2010), and Fingleton et al. (2015). More details of the method are provided in “Appendix”. Finally, OLS regression of the log PBL trade flows on the log Chow–Lin trade flows produced parameters that were used to predict the missing PBL regional trade flows for Switzerland and Norway from the Chow–Lin values for these regions. For estimation, the start-of-period trade flows for the year 2000 are used. This year is chosen because it is the earliest available; since it precedes the estimation period, it is treated as exogenous to \({\mathbf {e}}_{t}\), \({\mathbf {q}}_{t}\) and \({\mathbf {k}}_{t}\) for \(t=2001\) to 2010.
Prediction is based on the 2010 trade flows, supplemented in the same way by Chow–Lin data. Estimates are also given in “Appendix” Table A3 based on a \({\mathbf {W}}_{N}\) matrix constructed entirely from the Chow–Lin trade flows. These flows are based simply on great circle distances and year 2000 GVA levels, and so are also assumed to be exogenous. The comparative predictive performance of each set of estimates is discussed in Sect. 6.
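The final gap-filling step for Switzerland and Norway can be sketched as a log-log OLS fit on the region pairs for which both PBL and Chow–Lin flows exist, followed by prediction for the missing pairs. The function `fill_missing_flows` and the synthetic matrices are hypothetical. Simply exponentiating the log-scale prediction ignores retransformation bias, and whether any correction was applied is not stated here.

```python
import numpy as np

def fill_missing_flows(pbl, chow_lin):
    """Hypothetical helper: fill NaN entries of `pbl` from log-log OLS on `chow_lin`.

    pbl      : (N, N) PBL flows, np.nan where missing
    chow_lin : (N, N) Chow-Lin flows for every pair
    The diagonal and non-positive flows are excluded from the fit.
    """
    off_diag = ~np.eye(pbl.shape[0], dtype=bool)
    observed = np.isfinite(pbl)
    positive = np.where(observed, pbl, 0.0) > 0
    fit = off_diag & observed & positive & (chow_lin > 0)
    X = np.column_stack([np.ones(fit.sum()), np.log(chow_lin[fit])])
    coef, *_ = np.linalg.lstsq(X, np.log(pbl[fit]), rcond=None)
    miss = off_diag & ~observed & (chow_lin > 0)
    filled = pbl.copy()
    # Back-transform without any retransformation-bias correction.
    filled[miss] = np.exp(coef[0] + coef[1] * np.log(chow_lin[miss]))
    return filled, coef

rng = np.random.default_rng(3)
N = 8
chow_lin = rng.uniform(1.0, 100.0, (N, N))
np.fill_diagonal(chow_lin, 0.0)
truth = np.exp(0.3) * chow_lin ** 0.9          # synthetic "PBL" flows
pbl = truth.copy()
pbl[6:, :] = np.nan                             # last two regions: no PBL data
pbl[:, 6:] = np.nan
filled, coef = fill_missing_flows(pbl, chow_lin)
print(coef)                                     # close to [0.3, 0.9]
```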

4 Estimator for the time-space dynamic panel data model

Comprehensive overviews of spatial panel econometrics, which highlight its growing importance for the applied econometrician, are given by Pesaran (2015, Chapters 29 and 30) and Baltagi (2013, Chapter 13). The estimator used in this paper, introduced by Baltagi et al. (2018), adds to the available methodology in two ways. First, it allows a wider range of spatial interaction effects, including the spatial lag of the temporal lag of the dependent variable, \({\mathbf {W}}_{N}\ln {\mathbf {e}}_{t-1}\), thus avoiding bias due to constraints necessary for dynamic stability and stationarity. Second, it allows spatial moving average compound error dependence rather than the autoregressive compound error process found in the majority of spatial econometric models. The estimator, which is applied to Eq. (9), is based on the earlier paper by Baltagi et al. (2014). That paper extends the approach of Arellano and Bond (1991) by introducing extra moments in line with the presence and availability of spatial lags (see also Bouayad-Agha and Védrine 2010). Since the estimator is described elsewhere, only a brief outline is provided here, focussing on the treatment of regressors as predetermined rather than exogenous. Hence in Eq. (10), \(\ln {\mathbf {q}}_{t}\) and \(\ln {\mathbf {k}}_{t}\) are considered to be predetermined, alongside the endogenous right-hand side variables \(\ln {\mathbf {e}}_{t-1}\), \({\mathbf {W}}_{N}\ln {\mathbf {e}}_{t-1}\), \({\mathbf {W}}_{N}\ln {\mathbf {e}}_{t}\) and \(\ln \overline{{\mathbf {e}}}_{t}\).

Focussing on the endogenous dependent variable \(\ln {\mathbf {e}}_{t}\), the instruments include \(\ln {\mathbf {e}}_{t}\) lagged by two or more periods and its spatial lag \({\mathbf {W}}_{N}\ln {\mathbf {e}}_{t}\), also lagged by two or more periods. The moment conditions (16) and (17) then hold, assuming that \(\nu _{it}\) is serially uncorrelated, so that \(E(\Delta \nu _{it}\Delta \nu _{i,t-2})=0\). Thus, following Baltagi et al. (2018), we have

$$\begin{aligned} E\left( \ln e_{il}\Delta \nu _{it}\right)&= 0\ \ \ \forall i;\ l=1,2,\ldots ,t-2;\ t=3,4,\ldots ,T \end{aligned}$$
(16)
$$\begin{aligned} E\left( \sum \limits _{j\ne i}w_{ij}\ln e_{jl}\Delta \nu _{it}\right)&= 0\ \ \ \forall i;\ l=1,2,\ldots ,t-2;\ t=3,4,\ldots ,T \end{aligned}$$
(17)

in which E denotes the expectation. If the regressors \(\left( {\mathbf {x}}_{1},{\mathbf {x}}_{2}\right)\) were instead assumed to be exogenous rather than predetermined, this would lead to the instrument set (18)

$$\begin{aligned} {\mathbf {Z}}_{t}=\left( \begin{array}{c} \ln {\mathbf {e}}_{1},\ldots ,\ln {\mathbf {e}}_{t-2},{\mathbf {W}}_{N}\ln {\mathbf {e}}_{1},\ldots ,{\mathbf {W}}_{N}\ln {\mathbf {e}}_{t-2},{\mathbf {x}}_{11},\ldots ,{\mathbf {x}}_{1T}, \\ {\mathbf {x}}_{21},\ldots ,{\mathbf {x}}_{2T},{\mathbf {W}}_{N}{\mathbf {x}}_{11},\ldots ,{\mathbf {W}}_{N}{\mathbf {x}}_{1T},{\mathbf {W}}_{N}{\mathbf {x}}_{21},\ldots ,{\mathbf {W}}_{N}{\mathbf {x}}_{2T}, \\ {\mathbf {x}}_{31},\ldots ,{\mathbf {x}}_{3t-2},{\mathbf {W}}_{N}{\mathbf {x}}_{31},\ldots ,{\mathbf {W}}_{N}{\mathbf {x}}_{3t-2} \end{array} \right) \end{aligned}$$
(18)

for \(t=3,\ldots ,T\). Given that in (18) the regressors \(\left( {\mathbf {x}}_{1},{\mathbf {x}}_{2}\right)\) are exogenous, the moment conditions are satisfied by the entire set \({\mathbf {x}}_{11},\ldots ,{\mathbf {x}}_{1T},{\mathbf {x}}_{21},\ldots ,{\mathbf {x}}_{2T},{\mathbf {W}}_{N}{\mathbf {x}}_{11},\ldots ,{\mathbf {W}}_{N}{\mathbf {x}}_{1T}\) and \({\mathbf {W}}_{N}{\mathbf {x}}_{21},\ldots ,{\mathbf {W}}_{N}{\mathbf {x}}_{2T}\), regardless of time t. As explained in Baltagi et al. (2018), additional instruments can be generated via the matrix \({\mathbf {W}}_{N}^{2}\), but for simplicity these are omitted from the estimators used in the current paper.

Strict exogeneity rules out any feedback from past shocks to current values of the regressors, and the need to accommodate such feedback leads to the preferred estimator, based on predetermined regressors (see Bond 2002; Pesaran 2015). Predetermined regressors are uncorrelated with contemporaneous shocks, so that \(corr({\mathbf {x}}_{t},\varvec{\nu }_{t})=0\), but may depend on earlier shocks, so that, for example, \(corr({\mathbf {x}}_{t},\varvec{\nu }_{t-1})\ne 0\). This means that a change in \(\ln {\mathbf {e}}\), which embodies \(\varvec{\nu }\), at time t does not have an instantaneous effect on output and capital investment at time t but takes effect at \(t+1\) and later. This allows the set of instruments to be extended (compared with treating the regressors as endogenous, in which case all endogenous variables are lagged by two periods) by including \({\mathbf {x}}_{1t-1},{\mathbf {x}}_{2t-1},{\mathbf {W}}_{N}{\mathbf {x}}_{1t-1}\) and \({\mathbf {W}}_{N}{\mathbf {x}}_{2t-1}\), so that

$$\begin{aligned} {\mathbf {Z}}_{t}=\left( \begin{array}{c} \ln {\mathbf {e}}_{1},\ldots ,\ln {\mathbf {e}}_{t-2},{\mathbf {W}}_{N}\ln {\mathbf {e}}_{1},\ldots ,{\mathbf {W}}_{N}\ln {\mathbf {e}}_{t-2}, \\ {\mathbf {x}}_{11},\ldots ,{\mathbf {x}}_{1t-2},{\mathbf {x}}_{1t-1},{\mathbf {x}}_{21},\ldots ,{\mathbf {x}}_{2t-2},{\mathbf {x}}_{2t-1},{\mathbf {x}}_{31},\ldots ,{\mathbf {x}}_{3t-2}, \\ {\mathbf {W}}_{N}{\mathbf {x}}_{11},\ldots ,{\mathbf {W}}_{N}{\mathbf {x}}_{1t-2},{\mathbf {W}}_{N}{\mathbf {x}}_{1t-1},{\mathbf {W}}_{N}{\mathbf {x}}_{21},\ldots ,{\mathbf {W}}_{N}{\mathbf {x}}_{2t-2},{\mathbf {W}}_{N}{\mathbf {x}}_{2t-1}, \\ {\mathbf {W}}_{N}{\mathbf {x}}_{31},\ldots ,{\mathbf {W}}_{N}{\mathbf {x}}_{3t-2} \end{array} \right) \end{aligned}$$
(19)
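As an illustration only, the following Python/NumPy sketch shows one way the period-t instrument block of Eq. (19) could be assembled from N-vector columns. The (T, N) array layout, the function name `instruments_predetermined` and the toy data are assumptions, not the authors' code. The columns appear in a different order from Eq. (19), which does not affect GMM estimates, and the extra \({\mathbf {W}}_{N}^{2}\) instruments are omitted, as in the text.

```python
import numpy as np


def instruments_predetermined(lne, x1, x2, x3, W, t):
    """Instrument block Z_t of Eq. (19) for period t (periods numbered 1..T).

    lne, x1, x2, x3: (T, N) arrays, row s-1 holding period s.
    W: (N, N) spatial weight matrix W_N.
    Returns an (N, 8t - 12) matrix; column order differs from Eq. (19).
    """
    T = lne.shape[0]
    if not 3 <= t <= T:
        raise ValueError("Eq. (19) is defined for t = 3, ..., T")
    cols = []
    for s in range(1, t - 1):  # ln e and W ln e: lagged two or more periods
        cols += [lne[s - 1], W @ lne[s - 1]]
    for s in range(1, t):      # predetermined x1, x2 and spatial lags: up to t-1
        cols += [x1[s - 1], x2[s - 1], W @ x1[s - 1], W @ x2[s - 1]]
    for s in range(1, t - 1):  # x3 = ln e-bar and its spatial lag: up to t-2
        cols += [x3[s - 1], W @ x3[s - 1]]
    return np.column_stack(cols)


# Toy data (illustrative only): T = 6 periods, N = 4 regions
rng = np.random.default_rng(0)
T, N = 6, 4
lne, x1, x2, x3 = (rng.normal(size=(T, N)) for _ in range(4))
W = np.full((N, N), 1.0 / (N - 1))
np.fill_diagonal(W, 0.0)
Z5 = instruments_predetermined(lne, x1, x2, x3, W, t=5)
print(Z5.shape)  # (4, 28)
```

The number of columns grows linearly with t, which is why the instrument count becomes large over a long panel.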

Given the set of instruments in Eq. (19), initial estimates of \(\gamma ,\rho _{1},\theta\), \(\beta _{1},\beta _{2}\) and \(\beta _{3}\) are obtained after first differencing the data to eliminate the time-invariant individual effects \(\varvec{\mu }\), which are correlated with the time- and space-lagged dependent variables. The resulting estimates give estimated errors, which in turn yield estimates of the parameters of the spatial moving average error process, namely \(\rho _{2},\sigma _{\mu }^{2}\) and \(\sigma _{\nu }^{2}\), using the moment equations given in Fingleton (2008). Given these, preliminary one-stage consistent spatial GM estimates are obtained, followed by two-stage spatial GM estimates of \(\gamma ,\rho _{1},\theta\) and \(\varvec{\beta }\) based on a robust version of the variance–covariance matrix.

5 Estimates

Table 1 Estimates of Eq. 9

Table 1 shows that the estimate of \(\theta\), the coefficient on the spatial lag of the temporal lag (\({\mathbf {W}}_{N}\ln {\mathbf {e}}_{t-1}\)), is close to \(-{\widehat{\gamma }}{\widehat{\rho }}_{1}\), as expected from an equilibrium process. The Table 1 estimates are also stationary and dynamically stable: the largest characteristic root of \({\mathbf {B}}_{N}^{-1}{\mathbf {C}}_{N}\) is 0.6874, and the stationarity bounds for \(\rho _{2}\) are \({\widetilde{e}}_{\min }^{-1}=-1.1239<\rho _{2}<{\widetilde{e}}_{\max }^{-1}=1\). Observe that the negative value of \({\widehat{\rho }}_{2}\) implies positive spatial dependence among the errors. The instrument set includes several endogenous variables: the dependent variable lagged by two periods, \(\ln {\mathbf {e}}_{t-2}\); its spatial lag, \({\mathbf {W}}_{N}\ln {\mathbf {e}}_{t-2}\); and \(\ln \overline{{\mathbf {e}}}_{t-2}\) and \({\mathbf {W}}_{N}\ln \overline{{\mathbf {e}}}_{t-2}\). For the orthogonality conditions and moment equations for these instruments to hold, the \(\nu _{it}\) must be serially uncorrelated, that is, \(E\left( \Delta \nu _{it}\Delta \nu _{it-2}\right) =0\). Arellano and Bond (1991) give a test statistic \(m_{2}=cov\left( \Delta \nu _{it},\Delta \nu _{it-2}\right) /\text{s.e.}\), which is asymptotically N(0, 1) under the null of no serial correlation. In our case \(m_{2}=-0.9389\), with a two-tailed p value of 0.3478, so there is no evidence of second-order serial correlation and the required assumption is maintained. Note also that \(m_{1}=cov\left( \Delta \nu _{it},\Delta \nu _{it-1}\right) /\text{s.e.}=-6.22\), indicating significant first-order serial correlation. This is as expected: if the \(\nu _{it}\) are serially uncorrelated, \(\Delta \nu _{it}\) follows a first-order moving average process.
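The p values quoted for the Arellano–Bond statistics follow directly from the standard normal null distribution. A minimal check in Python, using only the standard library (the helper name `two_sided_normal_pvalue` is illustrative):

```python
import math


def two_sided_normal_pvalue(z):
    """Two-tailed p-value for a statistic that is N(0, 1) under the null."""
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z
    return math.erfc(abs(z) / math.sqrt(2.0))


m1, m2 = -6.22, -0.9389  # Arellano-Bond statistics reported in the text
p_m1 = two_sided_normal_pvalue(m1)
p_m2 = two_sided_normal_pvalue(m2)
print(f"m1: p = {p_m1:.1e}")  # far below 0.05: first-order correlation, as expected
print(f"m2: p = {p_m2:.4f}")  # 0.3478: no evidence of second-order correlation
```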
A second, complementary approach to testing the validity of the instrument set is the Sargan–Hansen test of over-identifying restrictions, whose statistic equals 253.3. This is insignificant when referred to the \(\chi _{314}^{2}\) distribution. Although this supports the moment conditions implied by our dynamic spatial panel model, some caution is warranted, because the test may have low power when there are many moment conditions (Bowsher 2002; Pesaran 2015).
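To gauge how far 253.3 lies from the rejection region, one can approximate the upper-tail probability of \(\chi _{314}^{2}\). The sketch below uses the Wilson–Hilferty cube-root normal approximation so that it needs only the Python standard library; this is a swapped-in approximation, not the computation used in the paper, and an exact value would require the regularized incomplete gamma function (for example `scipy.stats.chi2.sf`).

```python
import math


def chi2_sf_wilson_hilferty(x, k):
    """Approximate P(X >= x) for X ~ chi-squared(k) (Wilson-Hilferty).

    (X / k) ** (1/3) is treated as normal with mean 1 - 2/(9k) and
    variance 2/(9k); accurate for large k such as k = 314.
    """
    c = 2.0 / (9.0 * k)
    z = ((x / k) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 0.5 * math.erfc(z / math.sqrt(2.0))  # upper tail of N(0, 1)


p_sargan = chi2_sf_wilson_hilferty(253.3, 314)
print(f"Sargan-Hansen p = {p_sargan:.3f}")  # far above 0.05
```

A p value this close to 1 is consistent with the low-power caveat in the text: with hundreds of moment conditions, the test rarely rejects.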

Table 4 in the Appendix gives the estimates of some rival estimators, including one with SMA-RE errors but assuming exogenous regressors (Table A1) and one with SAR-RE errors assuming predetermined regressors (Table A2). As noted in Sect. 6, the predictive ability of these rivals is not as good as that of the preferred estimates summarized in Table 1.

6 Prediction

In order to support the preferred model summarized by Table 1, a cross-validation strategy is employed to assess the performance of competing estimators ‘by comparing their predictive ability on data which have not been used in model estimation’ (Anselin 1988). Out-of-sample predictions of the level of employment across regions are obtained for the years 2011 and 2012 using 2011 and 2012 data combined with the parameter estimates obtained for data over the estimation period from 2001 to 2010.

Following Chamberlain (1984), Sevestre and Trognon (1996), and Baltagi et al. (2014, 2018), the linear predictor is

$$\begin{aligned} E\left[ \ln {\mathbf {e}}_{t}\right] ={\mathbf {B}}_{N}^{-1}\left[ {\mathbf {C}}_{N}E \left[ \ln {\mathbf {e}}_{t-1}\right] +{\mathbf {x}}_{t}\varvec{\beta }+c\varvec{ \iota }+{\mathbf {G}}_{N}E\left[ {\mathbf {u}}_{t}\right] \right] \end{aligned}$$
(20)

in which \(E\left[ .\right]\) denotes the expectation, so this can be seen to be identical to Eq. (9) but with expectations. With regard to the estimate of the time-invariant component of the error term \(\varvec{ \mu }\), assuming a spatial moving average error process gives Eq. (9) rewritten thus

$$\begin{aligned} \varvec{\varepsilon }_{t}&= {\mathbf {B}}_{N}\ln {\mathbf {e}}_{t}-{\mathbf {C}}_{N}\ln {\mathbf {e}}_{t-1}-{\mathbf {x}}_{t}\varvec{\beta }-c\varvec{\iota } \nonumber \\ {\mathbf {G}}_{N}{\mathbf {u}}_{t}&= {\mathbf {B}}_{N}\ln {\mathbf {e}}_{t}-{\mathbf {C}}_{N}\ln {\mathbf {e}}_{t-1}-{\mathbf {x}}_{t}\varvec{\beta }-c\varvec{\iota } \end{aligned}$$
(21)
$$\begin{aligned} {\mathbf {u}}_{t}&= \varvec{\mu }+\varvec{\nu }_{t} \nonumber \\ \varvec{\mu }^{(t)}&= {\mathbf {G}}_{N}^{-1}\left( {\mathbf {B}}_{N}\ln {\mathbf {e}} _{t}-{\mathbf {C}}_{N}\ln {\mathbf {e}}_{t-1}-{\mathbf {x}}_{t}\varvec{\beta }-c \varvec{\iota }\right) -\varvec{\nu }_{t} \end{aligned}$$
(22)

To obtain the estimates \(\widehat{\varvec{\mu }}^{(t)}\), the estimates \(\widehat{{\mathbf {G}}}_{N}=\left( {\mathbf {I}}_{N}-{\widehat{\rho }}_{2}{\mathbf {M}}_{N}\right) ,\widehat{{\mathbf {B}}}_{N}=\left( {\mathbf {I}}_{N}-{\widehat{\rho }}_{1}{\mathbf {W}}_{N}\right) ,\widehat{{\mathbf {C}}}_{N}=\left( {\widehat{\gamma }}{\mathbf {I}}_{N}+{\widehat{\theta }}{\mathbf {W}}_{N}\right)\), \({\widehat{c}}\) and \(\widehat{\varvec{\beta }}\) are substituted into Eq. (22), along with random draws \(\varvec{\nu }_{t}\sim N(0,{\widehat{\sigma }}_{\nu }^{2})\). We then take the mean over time of the \(\widehat{\varvec{\mu }}^{(t)}\) for \(t=2,\ldots ,T\) and rescale it so that it has variance equal to \({\widehat{\sigma }}_{\mu }^{2}\), giving the estimate \(\widehat{\varvec{\mu }}\) of the time-invariant error component. The outcome is the prediction Eq. (23) for \(T+1=2011\), in which \({\mathbf {x}}_{1T+1}=\ln {\mathbf {q}}_{T+1}\), \({\mathbf {x}}_{2T+1}=\ln {\mathbf {k}}_{T+1}\) and \({\mathbf {x}}_{3T+1}=\ln \overline{{\mathbf {e}}}_{T+1}\).
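The steps just described can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions: the function name `estimate_mu`, the (T, N) and (T, N, K) array layouts, the use of the population variance and the choice not to re-centre before rescaling are all assumptions, and the numerical inputs are made up rather than taken from Table 1.

```python
import numpy as np


def estimate_mu(lne, X, beta, c, gamma, theta, rho1, rho2, W, M,
                sigma2_nu, sigma2_mu, rng):
    """Sketch of mu-hat via Eq. (22) with SMA errors, G_N = I - rho2 M_N.

    lne: (T, N) log employment; X: (T, N, K) regressors; beta: (K,).
    """
    T, N = lne.shape
    I = np.eye(N)
    B = I - rho1 * W
    C = gamma * I + theta * W
    G = I - rho2 * M
    draws = []
    for t in range(1, T):  # the text's t = 2, ..., T
        resid = B @ lne[t] - C @ lne[t - 1] - X[t] @ beta - c
        nu_t = rng.normal(0.0, np.sqrt(sigma2_nu), size=N)  # draw from N(0, sigma2_nu)
        draws.append(np.linalg.solve(G, resid) - nu_t)        # mu^(t)
    mu_bar = np.mean(draws, axis=0)
    # Rescale (without re-centring, an assumption) so the variance equals sigma2_mu
    return mu_bar * np.sqrt(sigma2_mu / mu_bar.var())


# Illustrative inputs only (not the paper's data or estimates)
rng = np.random.default_rng(2)
T, N, K = 10, 6, 3
W = np.full((N, N), 1.0 / (N - 1))
np.fill_diagonal(W, 0.0)
lne = rng.normal(size=(T, N))
X = rng.normal(size=(T, N, K))
mu_hat = estimate_mu(lne, X, np.array([0.5, 0.2, 0.1]), 0.05, 0.6, -0.2, 0.3,
                     -0.4, W, W, 0.01, 0.02, rng)
```

Solving with \({\mathbf {G}}_{N}\) via `np.linalg.solve` rather than forming \({\mathbf {G}}_{N}^{-1}\) explicitly is a standard numerical choice.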

$$\begin{aligned} \ln \widehat{{\mathbf {e}}}_{T+1}=\widehat{{\mathbf {B}}}_{N}^{-1}\left[ \widehat{ {\mathbf {C}}}_{N}\ln \widehat{{\mathbf {e}}}_{T}+{\mathbf {x}}_{T+1}\widehat{\varvec{ \beta }}+{\widehat{c}}\varvec{\iota }+\widehat{{\mathbf {G}}}_{N}\widehat{\varvec{ \mu }}\right] \end{aligned}$$
(23)
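Eq. (23) amounts to solving one N-dimensional linear system per step. The NumPy sketch below is illustrative: the helper `predict_one_step`, the random row-standardized weight matrix, the assumption \({\mathbf {M}}_{N}={\mathbf {W}}_{N}\) and all parameter values are hypothetical. For the two-step-ahead case, it assumes that the one-step prediction is fed back in as the lagged term.

```python
import numpy as np


def predict_one_step(lne_prev, x_next, beta, c, mu_hat, B, C, G):
    """Eq. (23): solve B ln e_{T+1} = C ln e_T + x_{T+1} beta + c iota + G mu-hat."""
    rhs = C @ lne_prev + x_next @ beta + c + G @ mu_hat
    return np.linalg.solve(B, rhs)  # avoids forming B^{-1} explicitly


# Illustrative inputs only: random row-standardised W, made-up parameters
rng = np.random.default_rng(1)
N, K = 5, 3
W = rng.random((N, N))
np.fill_diagonal(W, 0.0)
W /= W.sum(axis=1, keepdims=True)
I = np.eye(N)
gamma, theta, rho1, rho2 = 0.6, -0.2, 0.3, -0.4
B, C, G = I - rho1 * W, gamma * I + theta * W, I - rho2 * W  # M_N = W_N assumed here
beta, c = np.array([0.5, 0.2, 0.1]), 0.05
mu_hat = rng.normal(size=N)
lne_T = rng.normal(size=N)
x_T1, x_T2 = rng.normal(size=(N, K)), rng.normal(size=(N, K))
lne_T1 = predict_one_step(lne_T, x_T1, beta, c, mu_hat, B, C, G)   # one step ahead
lne_T2 = predict_one_step(lne_T1, x_T2, beta, c, mu_hat, B, C, G)  # two steps ahead
```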

For the two-step-ahead prediction,Footnote 11 \({\mathbf {x}}_{1T+2}=\ln {\mathbf {q}}_{T+2}\), \({\mathbf {x}}_{2T+2}=\ln {\mathbf {k}}_{T+2}\) and \({\mathbf {x}}_{3T+2}=\ln \overline{{\mathbf {e}}}_{T+2}\). Figure 1 shows a close correlation between predicted log employment \(\ln \widehat{{\mathbf {e}}}_{T+1}\) and observed log employment, suggesting that the preferred estimator giving the Table 1 estimates is a good basis for simulating the impact of Brexit on employment.

Fig. 1 Out-of-sample predictions for 2011

The preference for the Table 1 estimates is based on the mean of \(\text{RMSE}=\sqrt{\sum \nolimits _{i=1}^{N}\left( \ln e_{i,T+s}-\ln {\widehat{e}}_{i,T+s}\right) ^{2}/N}\) over \(s=1,2\), denoted by \({\overline{{\text{RMSE}}}}\). In the case of Table 1, \({\overline{{\text{RMSE}}}}=0.0781\). Rival estimators (Appendix Table 4) give less accurate one- and two-step-ahead predictions. Assuming SMA-RE errors and exogenous regressors gives \({\overline{{\text{RMSE}}}}=0.1791\), and assuming SAR-RE errors with predetermined regressors gives \({\overline{{\text{RMSE}}}}=0.2890\). Note that in the case of SAR-RE errors, \(\widehat{{\mathbf {G}}}_{N}=\left( {\mathbf {I}}_{N}-{\widehat{\rho }}_{2}{\mathbf {M}}_{N}\right) ^{-1}\) in Eqs. (20), (21) and (23). Table 4 also gives estimates with SMA-RE errors and predetermined regressors but with \({\mathbf {W}}_{N}\) derived using the Chow–Lin approach. In this case \({\overline{{\text{RMSE}}}}=0.2529\), which supports the choice of \({\mathbf {W}}_{N}\) based on the PBL trade data. Table 4 also gives estimates based on SMA-RE errors and predetermined (and, with results in parentheses, exogenous) regressors, with the additional variables \({\mathbf {W}}_{N}{\mathbf {x}}_{1}\), the spatial lag of \(\ln {\mathbf {q}}\), and \({\mathbf {W}}_{N}{\mathbf {x}}_{2}\), the spatial lag of \(\ln {\mathbf {k}}\), where \({\mathbf {W}}_{N}\) is given by the PBL trade data. This is thus a form of spatial Durbin specification, with regressors \({\mathbf {x}}_{t}=\left( {\mathbf {x}}_{1t},{\mathbf {x}}_{2t},{\mathbf {W}}_{N}{\mathbf {x}}_{1t},{\mathbf {W}}_{N}{\mathbf {x}}_{2t}\right)\). However, the additional covariates evidently cause a problem of weak instruments, giving dynamically unstable, nonstationary estimates, as reflected by the largest characteristic root of \({\mathbf {B}}_{N}^{-1}{\mathbf {C}}_{N}\), which equals 1.0663 (1.9041). With this \({\mathbf {x}}\) in Eqs. (22) and (23), \({\overline{{\text{RMSE}}}}=7.4403\) (3.0918).
The same spatial Durbin specification, again assuming predetermined regressors but with \(\rho _{2}\) restricted to zero, gives a largest characteristic root of 1.1127 and \({\overline{{\text{RMSE}}}}=3.3017\). The same specification assuming exogenous regressors and a spatial autoregressive (SAR) error process gives a largest characteristic root of 2.489 and \({\overline{{\text{RMSE}}}}=20.9333\). These results support the use of the Table 1 estimates for prediction.
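The two diagnostics used to rank the estimators, \({\overline{{\text{RMSE}}}}\) and the largest characteristic root of \({\mathbf {B}}_{N}^{-1}{\mathbf {C}}_{N}\), are straightforward to compute. The sketch below is illustrative: the helper names are hypothetical, and it assumes that "largest characteristic root" means the largest root in modulus (the spectral radius), with values below 1 indicating dynamic stability.

```python
import numpy as np


def rmse(observed, predicted):
    """RMSE across the N regions for one horizon s, as defined in the text."""
    diff = np.asarray(observed, dtype=float) - np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mean_rmse(observed_by_step, predicted_by_step):
    """RMSE-bar: average of the RMSEs over the horizons s = 1, 2."""
    return float(np.mean([rmse(o, p) for o, p in zip(observed_by_step, predicted_by_step)]))


def largest_root(B, C):
    """Largest modulus among the characteristic roots of B^{-1} C."""
    return float(np.max(np.abs(np.linalg.eigvals(np.linalg.solve(B, C)))))


print(rmse([0.0, 0.0], [3.0, 4.0]))                  # sqrt(12.5), about 3.5355
print(largest_root(np.eye(2), np.diag([0.5, -0.9])))  # 0.9 (up to rounding)
```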

7 Simulating the Brexit effect

The approach adopted is to use the parameter estimates in Table 1 to predict the impact on employment of a presumed reduction in trade between the UK and the remaining EU regions in 2020 and beyond. Attention is focussed on 2020 and later because the UK’s formal exit from the EU is scheduled for the first half of 2019, so 2020 will be the first full year outside the EU. Because appropriate and accessible data beyond 2011 are lacking (for instance, data with the same geography as up to 2011), employment beyond 2011 is predicted on the basis of assumptions about the levels of \(\mathbf {q},{\mathbf {k}}\) and \(\overline{{\mathbf {e}}}\) in 2020.

From \(\tau =2020\) onwards, there are two scenarios, one assuming no Brexit effect on trade flows and the other assuming a Brexit effect on trade flows; the difference between them is taken as the Brexit effect. The no-Brexit scenario uses the matrix \({\mathbf {W}}_{N}\), which is based on the latest available trade flows, pertaining to 2010. The prediction is then given by the solution to Eq. (24) with \(\widehat{{\mathbf {B}}}_{N}=\left( {\mathbf {I}}_{N}-{\widehat{\rho }}_{1}{\mathbf {W}}_{N}\right) ,\widehat{{\mathbf {C}}}_{N}=\left( {\widehat{\gamma }}{\mathbf {I}}_{N}+{\widehat{\theta }}{\mathbf {W}}_{N}\right)\) and \(\widehat{{\mathbf {G}}}_{N}=\left( {\mathbf {I}}_{N}-{\widehat{\rho }}_{2}{\mathbf {M}}_{N}\right)\). Also, \({\mathbf {x}}_{\tau }\) is an \(\left( N\text { by }3\right)\) matrix containing the forward projections \(\ln {\mathbf {q}}_{\tau }\), \(\ln {\mathbf {k}}_{\tau }\) and \(\ln \overline{{\mathbf {e}}}_{\tau }\), thus

$$\begin{aligned} \ln \widehat{{\mathbf {e}}}_{\tau }=\widehat{{\mathbf {B}}}_{N}^{-1}\left[ \widehat{{\mathbf {C}}}_{N}\ln \widehat{{\mathbf {e}}}_{\tau -1}+{\mathbf {x}}_{\tau } \widehat{\varvec{\beta }}+{\widehat{c}}\varvec{\iota }+\widehat{{\mathbf {G}}}_{N} \widehat{\varvec{\mu }}\right] \end{aligned}$$
(24)
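Iterating Eq. (24) year by year with the regressors held at assumed levels traces out dynamic paths like those described in the Results section. When \({\mathbf {B}}_{N}^{-1}{\mathbf {C}}_{N}\) has all roots inside the unit circle, the path converges to \(\left( {\mathbf {B}}_{N}-{\mathbf {C}}_{N}\right) ^{-1}\left[ {\mathbf {x}}\varvec{\beta }+c\varvec{\iota }+{\mathbf {G}}_{N}\varvec{\mu }\right]\). The helper `simulate_path` and the toy parameters in this sketch are hypothetical.

```python
import numpy as np


def simulate_path(lne_start, x_tau, beta, c, mu_hat, B, C, G, n_steps):
    """Iterate Eq. (24) n_steps times with the regressors held at x_tau."""
    const = x_tau @ beta + c + G @ mu_hat  # right-hand-side terms fixed over tau
    path = [np.asarray(lne_start, dtype=float)]
    for _ in range(n_steps):
        path.append(np.linalg.solve(B, C @ path[-1] + const))
    return np.array(path)


# Illustrative system: N = 4 regions, made-up parameters (not Table 1)
N = 4
W = np.full((N, N), 1.0 / (N - 1))
np.fill_diagonal(W, 0.0)
I = np.eye(N)
B, C, G = I - 0.3 * W, 0.6 * I - 0.2 * W, I + 0.4 * W
beta, c = np.array([0.5, 0.2, 0.1]), 0.05
x_tau = np.ones((N, 3))
mu_hat = np.zeros(N)
path = simulate_path(np.zeros(N), x_tau, beta, c, mu_hat, B, C, G, n_steps=200)
steady_state = np.linalg.solve(B - C, x_tau @ beta + c + G @ mu_hat)
print(np.allclose(path[-1], steady_state))  # True: roots of B^{-1}C lie inside the unit circle
```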

The second scenario assumes that bilateral trade between the UK regions and the (remaining) EU regions is, for example, \(2\%\) lower than it would otherwise be. Of the \(N=255\) UK plus EU regions, there are \(N^{2}-N=64{,}770\) bilateral trade flows in any one year. With 37 UK regions and 218 EU regions, \(2\times 37\times 218=16{,}132\) of these interregional trade flows (UK to EU and EU to UK) are assumed to be \(2\%\) smaller than under the no-Brexit scenario. This Brexit-affected trade flow matrix is denoted by \(\widetilde{{\mathbf {W}}}_{N}\), which leads to \(\widetilde{{\mathbf {B}}}_{N}=\left( {\mathbf {I}}_{N}-{\widehat{\rho }}_{1}\widetilde{{\mathbf {W}}}_{N}\right) ,\widetilde{{\mathbf {C}}}_{N}=\left( {\widehat{\gamma }}{\mathbf {I}}_{N}+{\widehat{\theta }}\widetilde{{\mathbf {W}}}_{N}\right)\) and the prediction equation

$$\begin{aligned} \ln \widetilde{{\mathbf {e}}}_{\tau }=\widetilde{{\mathbf {B}}}_{N}^{-1}\left[ \widetilde{{\mathbf {C}}}_{N}\ln \widetilde{{\mathbf {e}}}_{\tau -1}+{\mathbf {x}} _{\tau }\widehat{\varvec{\beta }}+{\widehat{c}}\varvec{\iota }+\widehat{ {\mathbf {G}}}_{N}\widehat{\varvec{\mu }}\right] \end{aligned}$$

Thus, the % job-shortfall at time \(\tau\) is \(\ln \widetilde{{\mathbf {e}}} _{\tau }-\ln \widehat{{\mathbf {e}}}_{\tau }\).
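The construction of \(\widetilde{{\mathbf {W}}}_{N}\) can be illustrated with a boolean mask marking cross-border pairs. This sketch assumes the cut is applied directly to the trade-based weights in both directions, with no re-standardization afterwards. The helper `brexit_weights` and the placeholder matrix of ones are hypothetical (they are not the PBL trade data), but the mask reproduces the flow counts given in the text.

```python
import numpy as np


def brexit_weights(W, is_uk, reduction=0.02):
    """Return W-tilde with every UK-EU and EU-UK entry scaled by (1 - reduction).

    Hypothetical helper: assumes the cut is applied directly to the
    trade-based weights, with no re-standardisation afterwards.
    """
    is_uk = np.asarray(is_uk, dtype=bool)
    cross = is_uk[:, None] != is_uk[None, :]  # True only for cross-border pairs
    W_tilde = np.where(cross, (1.0 - reduction) * W, W)
    return W_tilde, cross


n_uk, n_eu = 37, 218
is_uk = np.concatenate([np.ones(n_uk, dtype=bool), np.zeros(n_eu, dtype=bool)])
N = is_uk.size
W = 1.0 - np.eye(N)  # placeholder weights, not the PBL trade flows
W_tilde, cross = brexit_weights(W, is_uk)
print(N * N - N, int(cross.sum()))  # 64770 16132
```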

Using the equilibrium solution of (8), but also taking into account \({\mathbf {x}}\) at time \(\tau =T\) and \(\widehat{{\mathbf {G}}}_{N}\overline{\varvec{ \mu }}\), employment converges to

$$\begin{aligned} \ln \widehat{{\mathbf {e}}}_{T}=\left( \widehat{{\mathbf {B}}}_{N}^{{}}-\widehat{ {\mathbf {C}}}_{N}\right) ^{-1}\left[ {\mathbf {x}}_{T}\widehat{\varvec{\beta }}+ {\widehat{c}}\varvec{\iota }+\widehat{{\mathbf {G}}}_{N}\overline{\varvec{\mu }} \right] \end{aligned}$$

Similarly

$$\begin{aligned} \ln \widetilde{{\mathbf {e}}}_{T}=\left( \widetilde{{\mathbf {B}}}_{N}^{{}}- \widetilde{{\mathbf {C}}}_{N}\right) ^{-1}\left[ {\mathbf {x}}_{T}\widehat{\varvec{ \beta }}+{\widehat{c}}\varvec{\iota }+\widehat{{\mathbf {G}}}_{N}\overline{ \varvec{\mu }}\right] \end{aligned}$$

Thus, the % job-shortfall with long-run convergence at time T is \(\ln \widetilde{{\mathbf {e}}}_{T}-\ln \widehat{{\mathbf {e}}}_{T}\), using the same sign convention as for the shortfall at time \(\tau\), hence

$$\begin{aligned} \ln \widetilde{{\mathbf {e}}}_{T}-\ln \widehat{{\mathbf {e}}}_{T}&= \left[ \left( \widetilde{{\mathbf {B}}}_{N}-\widetilde{{\mathbf {C}}}_{N}\right) ^{-1}-\left( \widehat{{\mathbf {B}}}_{N}-\widehat{{\mathbf {C}}}_{N}\right) ^{-1}\right] \left( {\mathbf {x}}_{T}\widehat{\varvec{\beta }}+{\widehat{c}}\varvec{\iota }+\widehat{{\mathbf {G}}}_{N}\overline{\varvec{\mu }}\right) \end{aligned}$$
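The long-run comparison requires only two linear solves, one with \({\mathbf {W}}_{N}\) and one with \(\widetilde{{\mathbf {W}}}_{N}\), because the bracketed term \({\mathbf {x}}_{T}\widehat{\varvec{\beta }}+{\widehat{c}}\varvec{\iota }+\widehat{{\mathbf {G}}}_{N}\overline{\varvec{\mu }}\) is common to both scenarios. The toy five-region system below (two "UK" and three "EU" regions) and its parameters are made up for illustration. The difference is reported as \(\ln \widetilde{{\mathbf {e}}}-\ln \widehat{{\mathbf {e}}}\), so negative values denote a shortfall.

```python
import numpy as np


def equilibrium_log_employment(Wm, const, gamma, theta, rho1):
    """Steady state (B - C)^{-1} const with B = I - rho1 W and C = gamma I + theta W."""
    I = np.eye(Wm.shape[0])
    B = I - rho1 * Wm
    C = gamma * I + theta * Wm
    return np.linalg.solve(B - C, const)


# Toy system (illustrative only): regions 0-1 "UK", regions 2-4 "EU"
is_uk = np.array([True, True, False, False, False])
N = is_uk.size
W = np.full((N, N), 1.0 / (N - 1))
np.fill_diagonal(W, 0.0)
W_tilde = np.where(is_uk[:, None] != is_uk[None, :], 0.98 * W, W)  # 2% lower UK-EU flows
const = np.ones(N)  # stands in for x_T beta + c iota + G mu-bar (common to both)
gamma, theta, rho1 = 0.6, -0.2, 0.3  # made-up values, not Table 1
shortfall = (equilibrium_log_employment(W_tilde, const, gamma, theta, rho1)
             - equilibrium_log_employment(W, const, gamma, theta, rho1))
print(np.round(shortfall, 4))  # UK about -0.0156, EU about -0.0117
```

In this toy system, the "UK" regions, which have more cross-border links, lose more than the "EU" regions.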

One assumption might be that \(\mathbf {q},{\mathbf {k}}\) and \(\overline{{\mathbf {e}}}\) in 2020 are at the same levels as observed in each region in 2011. An alternative assumption is that from 2011 onwards they grow at their historical rates, calculated over the period 1991 to 2011 in each region. On this basis, the levels of \({\mathbf {q}}\) and \({\mathbf {k}}\) in 2025 are on average approximately \(25\%\) above their 2011 levels. However, Tables 2 and 3 give simulation outcomes for Inner London and Paris, respectively, which illustrate the relative insensitivity of the job-shortfall to the assumed regressor levels. Doubling the levels of \({\mathbf {q}}\) and \({\mathbf {k}}\) gives equilibrium job-shortfalls that are only about 6% higher, because the same increases apply to both the Brexit and no-Brexit outcomes. The tables also show that doubling the % trade reduction, say from \(2\) to \(4\%\), in effect doubles the job-shortfall in each region, and increasing the reduction by a factor of 8, from \(2\) to \(16\%\), increases each region’s job-shortfall by a factor of 8. Consequently, the ratio of the Inner London to the Paris job-shortfall (approximately 1.98) remains stable regardless of the assumed % trade reduction. The same stability of the outcome ratio holds for any pair of regions, so that maps of job-shortfalls would in a sense be identical, identifying the same regions with large or small shortfalls, irrespective of the assumed % trade reduction. This geographical stability results from the simulation assumptions, under which all UK–EU trade flows are reduced by the same %. The resulting map patterns are therefore immune to the assumed reduction in trade, although the scales would differ if a trade reduction other than \(2\%\) were assumed.
In this sense, our simulations have an element of robustness. Of course, in an ideal world one might wish to change trade on a region-by-region and sector-by-sector basis rather than assume that trade falls by the same amount across all regions and all sectors. However, such differentiated changes are very much unknown, although some sector-specific estimates are given elsewhere.
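The proportionality described above can be explored numerically. In the model, the job-shortfall is a smooth nonlinear function of the trade reduction that equals zero when there is no reduction, so doubling it doubles the shortfall to first order. The sketch below, built on a made-up five-region system with hypothetical parameter defaults (not the PBL data or Table 1), reports the ratios of shortfalls at 4% and 16% to the shortfall at 2%. In this toy system they come out close to, though slightly below, 2 and 8.

```python
import numpy as np


def equilibrium_shortfall(W, is_uk, reduction, gamma=0.6, theta=-0.2, rho1=0.3):
    """ln e-tilde - ln e-hat in the steady state for a uniform cut in UK-EU weights.

    Defaults are made-up illustrative values; const is a vector of ones.
    """
    N = W.shape[0]
    I = np.eye(N)
    cross = is_uk[:, None] != is_uk[None, :]
    W_tilde = np.where(cross, (1.0 - reduction) * W, W)
    const = np.ones(N)
    A = (1.0 - gamma) * I - (rho1 + theta) * W              # B - C, no Brexit
    A_tilde = (1.0 - gamma) * I - (rho1 + theta) * W_tilde  # B - C, with Brexit
    return np.linalg.solve(A_tilde, const) - np.linalg.solve(A, const)


is_uk = np.array([True, True, False, False, False])
W = np.full((5, 5), 0.25)
np.fill_diagonal(W, 0.0)
base = equilibrium_shortfall(W, is_uk, 0.02)
ratio_4 = equilibrium_shortfall(W, is_uk, 0.04) / base
ratio_16 = equilibrium_shortfall(W, is_uk, 0.16) / base
print(np.round(ratio_4, 3), np.round(ratio_16, 3))
```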

Table 2 London—doubling trade reduction and regressor levels

Sector-specific Brexit impacts are obtained by assuming that trade in specific sectors alone is restricted. Although this is unrealistic, sectorally differentiated impacts are likely, but it is difficult to know by how much trade in manufactures, for example, will be reduced compared with trade in services.Footnote 12 A simple approach is therefore to assume that a specific sector is affected by Brexit, with zero impact on other sectors. This highlights the geography of the sector-specific trade impacts: because sectoral trade patterns have different geographies, the impacts have geographical distributions that differ from the outcomes assuming a reduction across all sectors (Table 3).

Table 3 Paris—doubling trade reduction and regressor levels

8 Results

The initial outcomes relate to a \(2\%\) reduction in EU–UK trade across all sectors. The predicted % changes in employment across the EU and UK regions, assuming the 2011 levels of \(\mathbf {q},{\mathbf {k}}\) and \(\overline{{\mathbf {e}}}\), are shown in Fig. 2. This shows the dynamic path of each region to 2050, with convergence to the steady state occurring after 2030. The maximum equilibrium job-shortfall is \(-2.56\%\), in Inner London, with most other regions having shortfalls below \(1\%\) in absolute value. Figure 3 shows the geographical pattern of the Brexit impact, \(\ln \widetilde{{\mathbf {e}}}_{\tau }-\ln \widehat{{\mathbf {e}}}_{\tau }\), for \(\tau =2025\), indicating a maximum shortfall by 2025 of \(-2.34\%\) (Inner London). The picture that emerges from the simulation is that the negative Brexit impact varies across regions and is bilateral, with both UK and EU regions likely to see a job-shortfall. Figure 3 shows larger negative impacts in regions with strong trading links to the UK, most notably Ile de France (Paris) (\(-1.19\%\)), the Southern and Eastern region of Ireland (\(-1.35\%\)) and Oberbayern, centred on Munich (\(-0.99\%\)). Figure 4 gives the frequency distribution of the Fig. 3 data, highlighting that, despite some large impacts, Brexit is likely to have close to zero effect on employment in about 160 of the 255 regions. Figure 5 shows that within the UK, Inner London and Outer London (\(-1.32\%\)) are expected to have the largest % shortfalls by 2025, with impacts generally larger along the Thames valley in Berkshire, Bucks and Oxfordshire (\(-1.09\%\)) towards Gloucestershire, Wiltshire and North Somerset (\(-1.19\%\)). In general, % shortfalls are larger around the Greater South East and in some of the large conurbations (Birmingham \(-0.78\%\), Manchester \(-0.76\%\), West Yorkshire \(-0.67\%\)) than in more rural and peripheral regions.
Figure 6 gives the frequency distribution of the Fig. 5 data, emphasizing the Inner London outlier, with many regions having a job-shortfall of less than \(-0.5\%\). As noted above, if a trade reduction other than \(2\%\) were assumed, the employment outcomes would differ but would remain proportional to the \(2\%\) impact, so that the ratio of impacts in different regions and the geographical pattern would be identical.

Fig. 2 Dynamic paths for % employment shortfall

Fig. 3 % employment shortfall across 255 EU regions

Fig. 4 % employment shortfall across 255 regions

Fig. 5 % employment shortfall UK and Ireland

Fig. 6 % employment shortfall

Next, consider the separate impacts on employment of restricted trade in the manufacturing sector, defined as the production of food, beverages and tobacco, textiles and leather, coke, refined petroleum, nuclear fuel and chemicals, electrical and optical equipment and transport equipment, and other manufacturing. Simulating on the same basis, the regional paths converge to equilibrium as in Fig. 2, although the equilibrium levels differ from those of Fig. 2. Figures 7, 8, 9 and 10 take a snapshot across the dynamic paths, showing the % employment shortfall by region in 2025 due to reduced trade in manufactures. Figure 7 shows that the geography of the impact of \(2\%\) less industry trade is very similar to the overall pattern shown in Fig. 3. However, a comparison of Figs. 4 and 8 emphasizes the differences in the levels of impact, with the maximum in Inner London. Figures 9 and 10 show the % job-shortfall in the UK and Ireland. Again, the impact in the South and East of Britain and especially in London is clear, and the effect on Ireland remains pronounced.

We also estimate the impact of reduced trade within a specific part of the manufacturing sector. Of particular interest is the group of industries defined for trade purposes as ‘electrical and optical equipment and transport equipment’, which includes the all-important production of vehicles. With technological development, vehicle manufacture has become geographically fragmented, with spatially dispersed value chains in which different elements of the production process are optimally located in different regions or countries. In addition, just-in-time processes mean that quick and easy access to parts and components is important, so interregional connectivity matters, and any disruption of it due to increased barriers to trade will have a significant impact. The impacts of a 2% trade reduction are summarized in Figs. 11 and 12, which pick out some of the production hot spots. In Figs. 13 and 14, the integrated nature of production and the dispersed knock-on effects of reduced trade are evident from the relatively even spread of impacts across regions.

Compared with industry, the impact of reduced trade in servicesFootnote 13 is much less symmetrical, with the bulk of the job-shortfall occurring in Britain and Ireland, as is clear from Figs. 15 and 16. Inner London stands out with the largest projected job-shortfall of all EU regions, and apart from the Southern and Eastern region of Ireland, almost all nonzero job-shortfalls occur in Great Britain. Figures 17 and 18 emphasize the polarized effect of reduced trade in services, with Inner London an outlier. Outer London and Southern and Eastern Ireland see comparable effects, ahead of all the other UK regions. Focussing on the highly important Financial Intermediation sector, Figs. 19 and 20 emphasize even more strongly the asymmetric impact of Brexit, with more than 200 EU regions outside the UK having almost zero job-shortfall. The most affected EU region outside the UK is the Southern and Eastern region of Ireland, followed by Luxembourg. Figures 21 and 22 illustrate the role of Inner London in particular as a centre for financial intermediation, but for the UK overall, increased trade barriers in this element of services seem to have a less profound impact on the job-shortfall than the more geographically widespread and deeper impact of reduced trade in transport equipment.

Fig. 7 % employment shortfall due to industry

Fig. 8 Frequency distribution from Fig. 7

Fig. 9 % employment shortfall UK and Ireland due to industry

Fig. 10 Frequency distribution from Fig. 9

Fig. 11 % employment shortfall due to transport equipment, etc.

Fig. 12 Frequency distribution from Fig. 11

Fig. 13 % shortfall UK and Ireland due to transport equipment, etc.

Fig. 14 Frequency distribution from Fig. 13

Fig. 15 Services impact: % employment shortfall across 255 EU regions

Fig. 16 Frequency distribution from Fig. 15

Fig. 17 % employment shortfall UK and Ireland due to services

Fig. 18 Frequency distribution from Fig. 17

Fig. 19 % employment shortfall due to financial intermediation

Fig. 20 Frequency distribution from Fig. 19

Fig. 21 % employment shortfall UK and Ireland due to financial intermediation

Fig. 22 Frequency distribution from Fig. 21

8.1 Conclusion

The paper shows Brexit-induced negative impacts on employment that affect not only the UK regions but also EU regions, especially those that are close trading partners of the UK. This pan-European interregional interdependency is captured in the state-of-the-art model by spatial and temporal interactions, with the strength of interdependence determined by the best available trade flow estimates. This means that employment within a region depends not only on the levels of output and capital within the region, but also on demand coming from other regions that are trading partners. The impacts are multi-way: what happens to employment in London depends partly on what happens in Paris, which depends on what happens in Munich, which depends on what happens in London, and so on. The approach adopted has been to assume a reduction in trade between EU and UK regions, which gives a corresponding reduction in the demand for labour. The predicted job-shortfalls depend on the assumed overall % reduction in trade between UK and EU regions, but the modelling assumptions ensure stability in the geography of the Brexit impact.

In the paper, the impact of Brexit is measured in terms of the 2025 job-shortfall, which is the % reduction in the number of jobs in each region due to Brexit, assuming that no alternative sources of employment are put in place. This, of course, might be a false assumption, as the pro-Brexit lobby has consistently emphasized the potential stimulus of new trade deals with non-EU countries. The maps of job-shortfall therefore indicate those regions that could be in the greatest need of alternative, compensating sources of employment. Thus, the paper does not predict job losses per se, only potential job losses in the absence of successful alternative trade arrangements post-Brexit. Additional employment due to trade diversion resulting from higher UK–EU trade barriers (Ortiz Valverde and Latorre 2018; Dhingra et al. 2017a; Krueger 1999) could possibly be captured within the current modelling set-up via changes to the levels of output and capital in each region, but these would be difficult to estimate, and there is some empirical evidence that such effects might be quite small (Krueger 1999; Magee 2008).

Key outcomes are as follows. First, both UK and EU regions are negatively affected; this is a lose–lose scenario. Second, the deepest and most concentrated impact is on the UK. Many EU regions are barely affected, but the Southern and Eastern region of Ireland is the worst affected EU region, with impacts on a par with the worst affected UK regions. In addition, Ile de France, Oberbayern, Stuttgart and Düsseldorf stand out as regions likely to see significant job-shortfalls. Third, in the UK, the southern regions, especially in and around London, and the big cities are expected to see the largest job-shortfalls. Fourth, the effect of reduced manufacturing trade is evidently larger than that of reduced trade in services, but the services impact is more asymmetric, with the bulk of the job-shortfall concentrated in UK regions, especially London. This is even more the case when comparing the effects of trade reductions in vehicles and in financial intermediation.

Overall, the simulations suggest that the biggest Brexit impact on UK regions will occur in the richer South East and urban areas, which is in line with work from LSE based on GVA, which shows that ‘areas in the South of England, and urban areas, are harder hit by Brexit...the areas that were most likely to vote remain are those that are predicted to be most negatively impacted by Brexit’ (Dhingra et al. 2017b). This interpretation is in direct contrast to other work which maintains that ‘the regions which voted Leave also tended to be more dependent on Europe for their prosperity than the regions which voted Remain’ (Los et al. 2017).

Clearly, Brexit is a complex phenomenon leading to diverse interpretations of outcomes, as is evident in the special issue of Papers in Regional Science (McCann 2018). The outcomes presented in this paper are based on model assumptions, but it is argued that the main driver of the results is the data, not imposed assumptions. Nevertheless, great caution is needed in interpreting the validity and value of any ‘prediction’ effort. It is worth recalling the words of Box and Draper (1986): 'Essentially, all models are wrong but some are useful'. David Spiegelhalter, Professor of the Public Understanding of Risk at the University of Cambridge, refers to Donald Rumsfeld as the patron saint of risk analysis; Rumsfeld will be remembered for famously saying 'but there are also unknown unknowns. There are things we do not know we don’t know'. We should therefore put forward predictions with all due humility, but clearly and without fear, because we do not want to come across as ‘dithering scientists’. In defence of the approach adopted, there is support from Pesaran (1990), who points out that ‘Econometric models are important tools for forecasting and policy analysis, and it is unlikely that they will be discarded in the future. The challenge is to recognize their limitations and to work towards turning them into more reliable and effective tools. There seem to be no viable alternatives’.