An alternative semiparametric model for spatial panel data

Abstract

We propose a semiparametric P-Spline model to deal with spatial panel data. This model includes a non-parametric spatio-temporal trend, a spatial lag of the dependent variable, and a time series autoregressive noise. Specifically, we consider a spatio-temporal ANOVA model, disaggregating the trend into spatial and temporal main effects, as well as second- and third-order interactions between them. Algorithms based on spatial anisotropic penalties are used to estimate all the parameters in a closed form without the need for multidimensional optimization. Monte Carlo simulations and an empirical analysis of regional unemployment in Italy show that our model represents a valid alternative to parametric methods aimed at disentangling strong and weak cross-sectional dependence when both spatial and temporal heterogeneity are smoothly distributed.

Notes

  1. Actually, the approach proposed by Pesaran and Tosetti (2011) does not explicitly allow for both forms of cross-sectional dependence (strong and weak). Rather, they demonstrate that the CCE approach remains valid when the errors in the DGP contain both factors and spatial-autoregressive terms.

  2. The two-step method proposed by Bailey et al. (2016) consists of first modeling common factors (e.g. aggregate shocks) using cross-sectional averages of the observations (thus following Pesaran 2006) and then modeling the de-factored observations using spatial econometric techniques. In the one-step method proposed by Vega and Elhorst (2016), common factors and spatial dependence are modeled simultaneously. Another related article is Han and Lee (2016), in which the authors use a Bayesian estimator.

  3. To install an R package from GitHub, the devtools package must first be installed from CRAN. Then run library(devtools) to load devtools, and install_github("rominsal/sptpsar") to install the sptpsar package.

  4. It is assumed that, for row-standardized \(\mathbf{W }_N\), \( \vert \rho \vert < 1\), so that this series expansion holds.

  5. The assumption of fixed \(\varvec{\beta }\) parameters can be relaxed, and a random coefficient specification can be assumed: \(\varvec{\beta }_{i}=\varvec{\beta }+u_{i}\), with \(u_{i} \sim i.i.d. (\mathbf 0 ,\Omega _{u})\). In this case the estimator proposed by Pesaran (2006) is the common correlated effects mean group (CCEMG) estimator. We do not employ this extension in our analysis.

  6. All variables involved, both observable and latent, are stationary in the simulations. The analysis of the statistical properties of the proposed estimator under the assumption of nonstationarity is a subject of current research.

  7. Following the suggestion of an anonymous referee, we have also simulated (only for DGP2) the CCEP and SAR-CCEP specifications including individual time trends. The results are similar to those reported in Table 2 and are available upon request.

  8. If some of these covariates were considered endogenous, the methodology outlined in Sect. 3 could be extended using the control function approach, as explained in Basile et al. (2014).

  9. In addition, there are many other equilibrium and disequilibrium variables affecting regional unemployment differentials. These include, for example, demographic factors (worker migration, commuting, age structure of the population and human capital variables) and institutional factors (unemployment benefits, tax wedge, employment protection legislation, collective bargaining, labor relations, and so on). Valid measures for all these variables are often difficult to find at the spatial unit level adopted in the analysis. This means that there is a large amount of unobserved spatial heterogeneity when modeling regional unemployment rates.

  10. Testing error persistence in the case of fixed effects models (like Models 1–6) is complicated by the ‘artificial’ serial correlation induced by time-demeaning. In fact, if the original errors are serially uncorrelated, the transformed ones are negatively serially correlated with coefficient \(-1/(T-1)\). Thus, following Millo (2015), the null hypothesis for the Wooldridge-type test of serial correlation in the case of Models 1–6 is \(H_{0}: \psi =-1/(T-1)\), while in the case of Models 7–14 (which do not include fixed effects) it is simply \(H_{0}: \psi =0\).

  11. All the computations for smooth spatio-temporal models were carried out with the R package sptpsar, available on GitHub (https://github.com/rominsal/sptpsar).

  12. It is worth noting that the ratio between the indirect effect and the direct effect is the same for every explanatory variable in Table 8. This is a consequence of the SAR specification, where we only consider a spatial lag of the dependent variable, and not of the independent variables. As is well known (Elhorst 2014), a Spatial Durbin specification, which also includes WX terms, would allow for different ratios between direct and indirect effects across the explanatory variables. First, we must observe that this generalization (i.e. the inclusion of WX terms) does not have any effect on the estimators (either SAR-CCEP or PS-ANOVA-SAR). Indeed, we might define a larger matrix including both X and WX terms, and transform the Durbin specification into a SAR model. Second, in our empirical case, we have tried to estimate a Spatial Durbin version of the regional unemployment model, but the WX terms did not enter the model significantly. However, this is not surprising, since the WX terms mainly work to capture unobserved heterogeneity in cross-sectional settings. In panel data settings, when unobserved heterogeneity is properly captured through other tools (fixed effects or smooth trends), the spatial lags of the exogenous variables often lose their relevance.

References

  1. Aragon Y, Haughton D, Haughton J, Leconte E, Malin E, Ruiz-Gazen A, Thomas-Agnan C (2003) Explaining the pattern of regional unemployment: the case of the Midi-Pyrénées region. Pap Reg Sci 82(2):155–174

  2. Bailey N, Holly S, Pesaran MH (2016) A two-stage approach to spatio-temporal analysis with strong and weak cross-sectional dependence. J Appl Econom 31(1):249–280

  3. Bai J, Li K (2013) Spatial panel data models with common shocks. MPRA paper 52786, University of Munich, Germany

  4. Basile R, Girardi A, Mantuano M (2012) Migration and regional unemployment in Italy. Open Urb Stud J 5:1–13

  5. Basile R, Durbán M, Mínguez R, Montero JM, Mur J (2014) Modeling regional economic dynamics: spatial dependence, spatial heterogeneity and nonlinearities. J Econ Dyn Control 48:229–245

  6. Blanchard OJ, Katz LF, Hall RE, Eichengreen B (1992) Regional evolutions. Brook Pap Econ Act 1992(1):1–75

  7. Burridge P, Gordon IR (1981) Unemployment in the British metropolitan labour areas. Oxf Econ Pap 33(2):274–97

  8. Chudik A, Pesaran MH (2015) Common correlated effects estimation of heterogeneous dynamic panel data models with weakly exogenous regressors. J Econom 188(2):393–420

  9. Chudik A, Pesaran MH, Tosetti E (2011) Weak and strong cross-section dependence and estimation of large panels. Econom J 14(1):C45–C90

  10. Claeskens G, Krivobokova T, Opsomer J (2007) Asymptotic properties of penalized spline estimators. Biometrika 96:529–544

  11. Cracolici MF, Cuffaro M, Nijkamp P (2007) Geographical distribution of unemployment: an analysis of provincial differences in Italy. Growth Change 38(4):649–670

  12. Currie ID, Durbán M (2002) Flexible smoothing with P-splines: a unified approach. Stat Model 2:333–349

  13. Currie I, Durbán M, Eilers P (2006) Generalized linear array models with applications to multidimensional smoothing. J R Stat Soc B 68:1–22

  14. De Boor C (1977) Package for calculating with B-splines. J Numer Anal 14:441–472

  15. Decressin J, Fatas A (1995) Regional labor market dynamics in Europe. Eur Econ Rev 39(9):1627–1655

  16. Eilers P, Marx B (1996) Flexible smoothing with B-splines and penalties. Stat Sci 11:89–121

  17. Eilers P, Currie I, Durbán M (2006) Fast and compact smoothing on large multidimensional grids. Comput Stat Data Anal 50(1):61–76

  18. Eilers PH, Marx BD, Durbán M (2015) Twenty years of P-splines. SORT 39(2):149–186

  19. Elhorst JP (1995) Convergence and divergence among European Union. Pion, London, pp 190–200

  20. Elhorst J (2014) Spatial econometrics. From cross-sectional data to spatial panels. SpringerBriefs in regional science. Springer, Berlin

  21. Ertur C, Musolesi A (2016) Weak and strong cross-sectional dependence: a panel data analysis of international technology diffusion. J Appl Econom 32(3):477–503

  22. Han X, Lee L-F (2016) Bayesian analysis of spatial panel autoregressive models with time-varying endogenous spatial weight matrices, common factors, and random coefficients. J Bus Econ Stat 34:642–660

  23. Hoshino T (2018) Semiparametric spatial autoregressive models with endogenous regressors: with an application to crime data. J Bus Econ Stat 36:160–172

  24. Kapetanios G, Pesaran MH, Yamagata T (2011) Panels with non-stationary multifactor error structures. J Econom 160(2):326–348

  25. Lee D (2010) Smoothing mixed models for spatial and spatio-temporal data. Ph.D. thesis, University Carlos-III

  26. Lee D, Durbán M (2011) P-spline ANOVA type interaction models for spatio-temporal smoothing. Stat Model 11:49–69

  27. Lee L-F, Yu J (2010) Estimation of spatial autoregressive panel data models with fixed effects. J Econom 154(2):165–185

  28. Lee DJ, Durban M, Eilers P (2013) Efficient two-dimensional smoothing with P-spline ANOVA mixed models and nested bases. Comput Stat Data Anal 61:22–37

  29. LeSage J, Pace K (2009) Introduction to spatial econometrics. CRC Press, Boca Raton

  30. Lin X, Zhang D (1999) Inference in generalized additive mixed models by using smoothing splines. J R Stat Soc B 61:381–400

  31. Lottmann F (2012) Spatial dependencies in German matching functions. Reg Sci Urb Econ 42(1):27–41

  32. Marston ST (1985) Two views of the geographic distribution of unemployment. Q J Econ 100(1):57–79

  33. Millo G (2015) Testing for serial correlation in spatial panels. Mimeo

  34. Molho I (1995) Spatial autocorrelation in British unemployment. J Reg Sci 35(4):641–658

  35. Montero J, Mínguez R, Durbán M (2012) SAR models with nonparametric spatial trends. A P-spline approach. Estadística Española 54(177):89–111

  36. Overman HG, Puga D (2002) Unemployment clusters across Europe’s regions and countries. Econ Policy 17(34):115–148

  37. Partridge MD, Rickman DS (1997) The dispersion of US state unemployment rates: the role of market and non-market equilibrium factors. Reg Stud 31(6):593–606

  38. Patacchini E, Zenou Y (2007) Spatial dependence in local unemployment rates. J Econ Geogr 7(2):169–191

  39. Patterson H, Thompson R (1971) Recovery of inter-block information when block sizes are unequal. Biometrika 58:545–554

  40. Perperoglou A, Eilers PHC (2009) Penalized regression and individual deviance effects. Comput Stat 25:341–361

  41. Pesaran MH (2004) General diagnostic tests for cross section dependence in panels. Technical report, CESifo working paper series

  42. Pesaran MH (2006) Estimation and inference in large heterogeneous panels with a multifactor error structure. Econometrica 74(4):967–1012

  43. Pesaran MH (2007) A simple panel unit root test in the presence of cross-section dependence. J Appl Econom 22(2):265–312

  44. Pesaran MH (2015) Testing weak cross-sectional dependence in large panels. Econom Rev 34(6–10):1089–1117

  45. Pesaran MH, Tosetti E (2011) Large panels with common factors and spatial correlation. J Econom 161(2):182–202

  46. Ríos V (2014) What drives regional unemployment convergence? In: ERSA conference papers Ersa14p924, European Regional Science Association

  47. Rodriguez-Alvarez MX, Kneib T, Durban M, Lee D, Eilers P (2015) Fast smoothing parameter separation in multidimensional generalized P-splines: the SAP algorithm. Stat Comput 25(5):941–957

  48. Rodríguez-Álvarez MX, Boer MP, Eeuwijk FAV, Eilers PH (2018) Correcting for spatial heterogeneity in plant breeding experiments with P-splines. Spat Stat 23:52–71

  49. Searle S, Casella G, McCulloch C (1992) Variance components. Wiley, New York

  50. Shi W, Lee L-F (2018) A spatial panel data model with time varying endogenous weights matrices and common factors. Reg Sci Urb Econ 72:6–34

  51. Su L, Jin S (2012) Sieve estimation of panel data models with cross section dependence. J Econom 169(1):34–47

  52. Taylor J, Bradley S (1997) Unemployment in Europe: a comparative analysis of regional disparities in Germany, Italy and the UK. Kyklos 50(2):221–245

  53. Thirlwall AP (1966) Regional unemployment as a cyclical phenomenon. Scott J Polit Econ 13(2):205–219

  54. Vega SH, Elhorst JP (2016) A regional unemployment model simultaneously accounting for serial dynamics, spatial dependence and common factors. Reg Sci Urb Econ 60:85–95

  55. Wood S (2006) On confidence intervals for generalized additive models based on penalized regression splines. Aust N Z J Stat 48:445–464

  56. Zeilstra AS, Elhorst JP (2014) Integrated analysis of regional and national unemployment differentials in the European Union. Reg Stud 48(10):1739–1755

Funding

Funding was provided by Ministerio de Economía, Industria y Competitividad, Gobierno de España (Grant Nos. MTM2014-52184 and ECO2015-65826-P) and Grant 2019-GRIN-26913 provided by the University of Castilla-La Mancha (UCLM) and the European Fund for Regional Development (EFRD) to the Research Group “Applied Economics and Quantitative Methods”.

Author information

Corresponding author

Correspondence to Román Mínguez.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A: Penalized splines as mixed models

Given the model:

$$\begin{aligned} y_i=f(x_i)+\varepsilon _i \quad \varepsilon \sim N(0, \sigma ^2\mathbf{I }), \end{aligned}$$

using the penalized regression approach we have (in matrix form):

$$\begin{aligned} \mathbf y =\mathbf{B }\varvec{\theta }+\varvec{\epsilon }, \quad \varvec{\epsilon }\sim N(\mathbf 0 , \sigma ^2\mathbf{I }), \end{aligned}$$

where \(\mathbf{B }\) is a matrix of B-spline bases, and \(\varvec{\theta }\) a vector of regression parameters to be estimated via penalized sum of squares:

$$\begin{aligned} (\mathbf y -\mathbf{B }\varvec{\theta })^\prime (\mathbf y -\mathbf{B }\varvec{\theta })+\varvec{\theta }^\prime \mathbf{P }\varvec{\theta }. \end{aligned}$$

The reformulation of a P-spline into a mixed model can be viewed as a reparameterization of the original non-parametric model; B-spline bases are transformed into a new model basis, i.e. \(\mathbf {B}\rightarrow \left[ \mathbf {X}: \mathbf {Z}\right] \), and coefficients \(\varvec{\theta }\rightarrow \left( \varvec{\beta },\varvec{\alpha }\right) ^{\prime }\). Hence, this representation decomposes the fitted values into the sum of a polynomial (unpenalized) part (\(\mathbf {X}\varvec{\beta }\)) and a nonlinear (penalized) (\(\mathbf {Z}\varvec{\alpha }\)) smooth term. To carry out this transformation, we need to find an (orthogonal) transformation matrix \(\mathbf{T }\), so that \(\mathbf{B }\mathbf{T }=\left[ \mathbf {X}: \mathbf {Z}\right] \) and \(\mathbf{T }^\prime \varvec{\theta }=\left( \varvec{\beta },\varvec{\alpha }\right) ^{\prime }\). There are several possibilities for this matrix; we choose one based on the singular value decomposition of the penalty matrix \(\mathbf{P }=\lambda \mathbf {D}^{\prime }\mathbf {D}\), that is:

$$\begin{aligned} \mathbf {D}^{\prime }\mathbf {D}=\mathbf {U}\varvec{\Sigma }\mathbf {U}^{\prime }\text {,} \end{aligned}$$

where \(\varvec{\Sigma }\) is a diagonal matrix that contains the eigenvalues of \(\mathbf {D}^{\prime }\mathbf {D}\), with 2 zero eigenvalues, and \(\mathbf {U}\) is the corresponding matrix of eigenvectors that can be decomposed into two parts: \(\mathbf {U}_{n}\) of dimension \(c\times 2\) containing the null-part eigenvectors and \(\mathbf {U}_{s}\) of dimension \(c\times (c-2)\) (where c is the rank of the basis and 2 the order of the penalty) with non-null-part eigenvectors. Note that we can write \(\varvec{\Sigma }\) as \(\varvec{\Sigma }=blockdiag\left( \mathbf {0}_{2},\tilde{\varvec{\varSigma }}\right) \), where \(\tilde{\varvec{\varSigma }}\) is a diagonal matrix that contains the non-zero eigenvalues of \(\mathbf {D}^{\prime } \mathbf {D}\) and \(\mathbf {0}_{2}\) is a \(2\times 2\) matrix of zeroes. Therefore, we define the transformation matrix \(\mathbf {T}\) as:

$$\begin{aligned} \mathbf {T}=[\mathbf {U}_{n}:\mathbf {U}_{s}\tilde{\varvec{\varSigma }}^{-1/2}]\text {,} \end{aligned}$$

where the fixed and random effect matrices are \(\mathbf {X}=\mathbf {BU}_{n}\), and \(\mathbf {Z=BU}_{s}\tilde{\varvec{\Sigma }}^{-1/2}\), respectively. Also, given this transformation matrix, the new coefficients are \(\varvec{\beta }=\mathbf {U}_{n}^{\prime } \varvec{\theta }\) and \(\varvec{\alpha }=\mathbf {U}_{s}^{\prime }\tilde{\varvec{\varSigma }}^{-1/2}\varvec{\theta }\). The fixed effect matrix \(\mathbf {X}\) may be replaced by any sub-matrix such that \(\left[ \mathbf {X}:\mathbf {Z}\right] \) has full rank and \(\mathbf {X}^{\prime }\mathbf {Z=0}\) (that is, \(\mathbf {X}\) and \(\mathbf {Z}\) are orthogonal). So, for example, if we assume a second-order penalty (\(d=2\)), the fixed effect matrix can be taken as \( \mathbf {X}=[\mathbf {1}:\mathbf {x}]\), where \(\mathbf {1}\) is a vector of ones and \(\mathbf {x}\) is the explanatory variable. Also, the penalty term \( \varvec{\theta }^{\prime }\mathbf {P} \varvec{\theta }\) becomes \(\varvec{\alpha }^{\prime } \mathbf{F }\varvec{\alpha }\), where \(\mathbf {F}=\lambda \mathbf{I }\). This follows since \(\mathbf {T}\) is orthogonal and \(\left( \varvec{\beta }, \varvec{\alpha }\right) ^{\prime }=\mathbf {T}^{\prime }\varvec{\theta }\). Hence, given the new basis and the new penalty, the penalized sum of squares,

$$\begin{aligned} (\mathbf y -\mathbf{B }\varvec{\theta })^\prime (\mathbf y -\mathbf{B }\varvec{\theta })+ \varvec{\theta }^\prime \mathbf{P }\varvec{\theta }, \end{aligned}$$

becomes:

$$\begin{aligned} \left( \mathbf {y}- \mathbf {X}\varvec{\beta }-\mathbf {Z}\varvec{\alpha }\right) ^{\prime }\left( \mathbf {y}- \mathbf {X}\varvec{\beta }-\mathbf {Z}\varvec{\alpha }\right) + \lambda \varvec{\alpha }^{\prime } \mathbf{I }_{c-2}\varvec{\alpha }\text {.} \end{aligned}$$

This corresponds to the joint log-likelihood of a linear mixed model:

$$\begin{aligned} \mathbf y =\mathbf{X }\varvec{\beta }+\mathbf{Z }\varvec{\alpha }+\varvec{\epsilon }, \quad \varvec{\epsilon }\sim N(\mathbf 0 , \sigma ^2\mathbf{I }), \quad \varvec{\alpha }\sim \mathcal {N}(\mathbf 0 ,\mathbf{G }), \end{aligned}$$

with \(\mathbf{G }= \sigma _\nu ^2 \mathbf{I }_{c-2}\) and \(\lambda =\sigma ^2/ \sigma _\nu ^2\). Therefore, the smoothing parameter is estimated via the estimation of the variance components in the mixed model.
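As a numerical illustration of this reparameterization, the following Python sketch (NumPy is assumed; this is not the authors' code nor the sptpsar implementation) builds \(\mathbf {T}=[\mathbf {U}_{n}:\mathbf {U}_{s}\tilde{\varvec{\varSigma }}^{-1/2}]\) from the eigendecomposition of \(\mathbf {D}^{\prime }\mathbf {D}\) for a second-order difference penalty and checks that the penalty becomes \(blockdiag(\mathbf {0}_{2},\mathbf{I }_{c-2})\) in the new parameterization. The helper name `mixed_model_transform` is hypothetical, and a random matrix is used as a stand-in for the B-spline basis \(\mathbf{B }\).

```python
import numpy as np

def mixed_model_transform(c, order=2):
    """Return D and T = [U_n : U_s Sigma^{-1/2}] for a difference penalty."""
    D = np.diff(np.eye(c), n=order, axis=0)   # (c - order) x c difference matrix
    eigval, U = np.linalg.eigh(D.T @ D)       # eigenvalues in ascending order
    U_n = U[:, :order]                        # eigenvectors of the zero eigenvalues
    U_s = U[:, order:]                        # eigenvectors of the positive eigenvalues
    sigma_tilde = eigval[order:]              # non-zero eigenvalues of D'D
    T = np.hstack([U_n, U_s / np.sqrt(sigma_tilde)])
    return D, T

c = 8
D, T = mixed_model_transform(c)

rng = np.random.default_rng(0)
B = rng.normal(size=(20, c))                  # stand-in for a B-spline basis (assumption)
BT = B @ T
X, Z = BT[:, :2], BT[:, 2:]                   # fixed and random effect matrices

# Penalty in the new basis: zero on the fixed part, identity on the random part.
P_new = T.T @ D.T @ D @ T
expected = np.zeros((c, c))
expected[2:, 2:] = np.eye(c - 2)
print(np.allclose(P_new, expected))
```

With this choice the only quantity left to estimate in the penalty is \(\lambda \), which is what allows the smoothing problem to be read as a variance-components problem.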

Appendix B: Mixed model representation of the semiparametric spatio-temporal autoregressive model and parameter estimation

For the sake of simplicity, we assume here that there are no covariates. The inclusion of covariates with a linear or non-linear functional relationship with the response is immediate by augmenting the matrices for fixed and random effects accordingly, as well as the corresponding covariance matrices. We therefore focus on the following model:

$$\begin{aligned} \mathbf y= & {} f_1(\mathbf s _1)+f_2(\mathbf s _2)+f_t(\varvec{\tau })+f_{1,2}(\mathbf s _1,\mathbf s _2)+ f_{1,t}(\mathbf s _1,\varvec{\tau })\nonumber \\&+ f_{2,t}(\mathbf s _2,\varvec{\tau })+f_{1,2,t}(\mathbf s _1, \mathbf s _2,\varvec{\tau })+\rho (\mathbf{W }_N \otimes \mathbf{I }_{T}) \mathbf y + \varvec{\epsilon }\end{aligned}$$

where the errors are assumed to follow a temporal AR(1) process, see (9). In matrix form:

$$\begin{aligned} (\mathbf{A }_N \otimes \mathbf{I }_{T}) \mathbf y = \mathbf{B }\varvec{\theta }+ \varvec{\epsilon }\qquad \varvec{\epsilon }\sim N \left( \mathbf{0},\frac{\sigma ^2}{1-\phi ^2} (\mathbf{I }_N \otimes \varvec{\varOmega }) \right) \qquad \mathbf{A }_N = \mathbf{I }_N - \rho \mathbf{W }_N \end{aligned}$$

The regression matrix of the model above will be the concatenation of B-spline bases for each of the smooth terms in the model:

$$\begin{aligned} \mathbf{B }= [\mathbf 1 \vert \mathbf{B }_{s_1}\vert \mathbf{B }_{s_2}\vert \mathbf{B }_{\tau }\vert \mathbf{B }_{s_1}\Box \mathbf{B }_{s_2}\vert \mathbf{B }_{s_1}\otimes \mathbf{B }_{\tau } \vert \mathbf{B }_{s_2}\otimes \mathbf{B }_{\tau }\vert (\mathbf{B }_{s_1}\Box \mathbf{B }_{s_2})\otimes \mathbf{B }_{\tau }], \end{aligned}$$

where \(\mathbf{B }_{s_1}\), \(\mathbf{B }_{s_2}\) and \(\mathbf{B }_{\tau }\) correspond to the marginal B-spline basis for the spatial coordinates (\(\mathbf s _1, \mathbf s _2\)) and time (\(\varvec{\tau }\)), and \(\Box \) represents the row-wise tensor product defined as:

$$\begin{aligned} \mathbf{B }_{i}\Box \mathbf{B }_{j}=(\mathbf{B }_i \otimes \mathbf 1 _{c_j}^\prime )*( \mathbf 1 _{c_i}^\prime \otimes \mathbf{B }_j), \end{aligned}$$

and \(\mathbf 1 \) is a column vector of ones, \(c_i\) is the rank of \(\mathbf{B }_i\), and \(\otimes \) and \(*\) are the Kronecker and element-wise matrix products, respectively.
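To make the row-wise tensor product concrete, the short Python sketch below (NumPy assumed; the function name `row_tensor` is hypothetical) computes \(\mathbf{B }_{i}\Box \mathbf{B }_{j}\) for two small matrices with the same number of rows, \(\mathbf{B }_i\) being \(n\times c_i\) and \(\mathbf{B }_j\) being \(n\times c_j\), so that the result is \(n\times c_i c_j\). Each row of the result is the Kronecker product of the corresponding rows of the two inputs.

```python
import numpy as np

def row_tensor(Bi, Bj):
    """Row-wise tensor (box) product of two matrices with equal row counts."""
    ci, cj = Bi.shape[1], Bj.shape[1]
    left = np.kron(Bi, np.ones((1, cj)))    # each column of Bi repeated cj times
    right = np.kron(np.ones((1, ci)), Bj)   # Bj tiled ci times side by side
    return left * right                     # element-wise product, n x (ci * cj)

# Toy inputs (illustrative values only, not actual B-spline bases).
Bi = np.array([[1.0, 2.0],
               [3.0, 4.0]])
Bj = np.array([[1.0, 0.0, 2.0],
               [0.0, 1.0, 1.0]])
R = row_tensor(Bi, Bj)
print(R)
```

Unlike the full Kronecker product, which would have \(n^2\) rows, the row-wise version keeps one row per observation, which is what is needed when the spatial coordinates are scattered rather than on a grid.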

The penalty matrix is now block-diagonal with blocks corresponding to the different terms in the model: \(\lambda _i\mathbf{D }_i^\prime \mathbf{D }_i\) for main effects, \(\lambda _{i}\mathbf{D }_i^\prime \mathbf{D }_i \otimes \mathbf{I }_{c_k} + \lambda _{k} \mathbf{I }_{c_i} \otimes \mathbf{D }_k^\prime \mathbf{D }_k\) for the second-order interactions, and \(\lambda _{i}\mathbf{D }_i^\prime \mathbf{D }_i \otimes \mathbf{I }_{c_k} \otimes \mathbf{I }_{c_l}+ \lambda _{k} \mathbf{I }_{c_i} \otimes \mathbf{D }_k^\prime \mathbf{D }_k \otimes \mathbf{I }_{c_l}+ \lambda _l \mathbf{I }_{c_i} \otimes \mathbf{I }_{c_k} \otimes \mathbf{D }_l^\prime \mathbf{D }_l\) for the three-way interaction.
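The anisotropic structure of these blocks can be sketched in a few lines of Python (NumPy assumed). The function names, basis sizes and smoothing parameter values below are illustrative assumptions, not taken from the paper; the code simply assembles one interaction block as a sum of Kronecker products, with one smoothing parameter per dimension.

```python
import numpy as np

def diff_penalty(c, order=2):
    """D'D for a difference penalty of the given order on c coefficients."""
    D = np.diff(np.eye(c), n=order, axis=0)
    return D.T @ D

def interaction_penalty(sizes, lambdas, order=2):
    """Sum over dimensions d of lambda_d * (I x ... x D_d'D_d x ... x I)."""
    dim = int(np.prod(sizes))
    total = np.zeros((dim, dim))
    for d, lam in enumerate(lambdas):
        term = np.ones((1, 1))
        for e, c in enumerate(sizes):
            # Penalize only dimension d; identity in every other dimension.
            block = diff_penalty(c, order) if e == d else np.eye(c)
            term = np.kron(term, block)
        total += lam * term
    return total

P2 = interaction_penalty([5, 4], [0.5, 2.0])           # second-order interaction block
P3 = interaction_penalty([5, 4, 6], [0.5, 2.0, 1.0])   # three-way interaction block
print(P2.shape, P3.shape)
```

Because each dimension carries its own \(\lambda \), the amount of smoothing along space and along time can differ, which is the sense in which the penalty is anisotropic.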

In this case, several constraints need to be imposed, since the space spanned by any product \(\mathbf{B }_i \otimes \mathbf{B }_j\) contains the space spanned by the marginal bases \(\mathbf{B }_i\) and \(\mathbf{B }_j\). The mixed model reparameterization of this model will automatically provide the necessary constraints. To find that parameterization, a new transformation matrix is needed (again based on the singular value decomposition of the penalty \(\mathbf{P }\)) (see Lee 2010, for details). Then, the model is written as:

$$\begin{aligned}&\left( \mathbf{A}_N \otimes \mathbf{I}_T \right) \mathbf{y} = \mathbf{X} \varvec{\beta }+ \mathbf{Z} \varvec{\alpha }+ \varvec{\epsilon }\\&\varvec{\alpha }\sim N(\mathbf{0},\mathbf{G}) \qquad \varvec{\epsilon }\sim N \left( \mathbf{0},\frac{\sigma ^2}{1-\phi ^2} (\mathbf{I}_N \otimes \varvec{\varOmega }) \right) \nonumber \end{aligned}$$
(12)

with

$$\begin{aligned} \mathbf{X }= & {} \left[ ( \mathbf{X }_{s_1} \Box \mathbf{X }_{s_2}) \otimes \mathbf{X }_{\tau } \right] \\ \mathbf{Z }= & {} \left[ (\mathbf{Z }_{s_1} \Box \mathbf{X }_{s_2}) \otimes \mathbf{X }_{\tau } \vert (\mathbf{X }_{s_1} \Box \mathbf{Z }_{s_2}) \otimes \mathbf{X }_{\tau } \vert (\mathbf{X }_{s_1} \Box \mathbf{X }_{s_2}) \otimes \mathbf{Z }_{\tau } \vert (\mathbf{Z }_{s_1} \Box \mathbf{Z }_{s_2}) \otimes \mathbf{X }_{\tau } \vert \right. \\&\left. (\mathbf{Z }_{s_1} \Box \mathbf{X }_{s_2}) \otimes \mathbf{Z }_{\tau } \vert (\mathbf{X }_{s_1} \Box \mathbf{Z }_{s_2}) \otimes \mathbf{Z }_{\tau } \vert (\mathbf{Z }_{s_1} \Box \mathbf{Z }_{s_2}) \otimes \mathbf{Z }_{\tau } \right] \end{aligned}$$

where \( \mathbf{X }_{k}\), \(\mathbf{Z }_{k}\) \( (k=s_{1},s_{2},\tau \)) are the mixed model matrices obtained for the reparameterization of the marginal basis described in “Appendix A”. The covariance matrix of random effects, \(\mathbf{G }\), is such that:

$$\begin{aligned} \mathbf{G }^{-1} = \text {blockdiag}&\left( \mathbf 0 ,\frac{1}{\sigma _{\nu _1}^2}\varvec{\varLambda }_1,\frac{1}{\sigma _{\nu _2}^2}\varvec{\varLambda }_2,\frac{1}{\sigma _{\nu _3}^2}\varvec{\varLambda }_3,\frac{1}{\sigma _{\nu _{4}}^2} \varvec{\varLambda }_{4} + \frac{1}{\sigma _{\nu _{5}}^2} \varvec{\varLambda }_{5}, \frac{1}{\sigma _{\nu _{6}}^2} \varvec{\varLambda }_{6}+ \frac{1}{\sigma _{\nu _{7}}^2} \varvec{\varLambda }_{7}, \right. \nonumber \\&\left. \frac{1}{\sigma _{\nu _{8}}^2} \varvec{\varLambda }_{8} + \frac{1}{\sigma _{\nu _{9}}^2} \varvec{\varLambda }_{9}, \frac{1}{\sigma _{\nu _{10}}^2} \varvec{\varLambda }_{10} +\frac{1}{\sigma _{\nu _{11}}^2} \varvec{\varLambda }_{11}+\frac{1}{\sigma _{\nu _{12}}^2} \varvec{\varLambda }_{12} \right) \end{aligned}$$
(13)

where

$$\begin{aligned}&\varvec{\varLambda }_1 = \widetilde{\varvec{\varSigma }}_{s_1}, \quad \varvec{\varLambda }_2 = \widetilde{\varvec{\varSigma }}_{s_2}, \quad \varvec{\varLambda }_3 = \widetilde{\varvec{\varSigma }}_{\tau } \nonumber \\&\varvec{\varLambda }_4 = \widetilde{\varvec{\varSigma }}_{s_1} \otimes \mathbf{I }_{c_{s_2}-2}, \quad \varvec{\varLambda }_5 = \mathbf{I }_{c_{s_1}-2} \otimes \widetilde{\varvec{\varSigma }}_{s_2}, \quad \varvec{\varLambda }_6 = \widetilde{\varvec{\varSigma }}_{s_1} \otimes \mathbf{I }_{2} \nonumber \\&\varvec{\varLambda }_7=\mathbf{I }_{c_{s_1}-q_{s_1}} \otimes \mathbf{I }_{2}, \quad \varvec{\varLambda }_8= \widetilde{\varvec{\varSigma }}_{s_2} \otimes \mathbf{I }_{c_{\tau }-2}, \quad \varvec{\varLambda }_9 =\mathbf{I }_{c_{s_2}-2} \otimes \widetilde{\varvec{\varSigma }}_{\tau } \\&\varvec{\varLambda }_{10} = \widetilde{\varvec{\varSigma }}_{s_1} \otimes \mathbf{I }_{c_{s_2}-2} \otimes \mathbf{I }_{c_{\tau }-2},\quad \varvec{\varLambda }_{11}= \mathbf{I }_{c_{s_1}-2} \otimes \widetilde{\varvec{\varSigma }}_{s_2} \otimes \mathbf{I }_{c_{\tau }-2}, \nonumber \\&\varvec{\varLambda }_{12}=\mathbf{I }_{c_{s_1}-2} \otimes \mathbf{I }_{c_{s_2}-2} \otimes \widetilde{\varvec{\varSigma }}_{\tau } \nonumber \end{aligned}$$
(14)

and the \(\widetilde{\varvec{\varSigma }}\) matrices contain the non-zero eigenvalues from the singular value decomposition of the marginal penalty matrices. It is important to be able to decompose the precision matrix of the random effects as a linear combination over the variance parameters, since this is a necessary condition to apply the SAP algorithm.

B.1: Estimation of the PS-ANOVA-SAR(AR1) model via the SAP algorithm

Fixed and random effects in model (12) are estimated (conditional on the correlation parameters and variance components) using the standard mixed model theory (see Searle et al. 1992):

$$\begin{aligned} \widehat{\varvec{\beta }}&= (\mathbf{X }'\mathbf{V }^{-1}\mathbf{X })^{-1}\mathbf{X }'\mathbf{V }^{-1}(\mathbf{A }_N \otimes \mathbf{I }_{T}) \mathbf y \end{aligned}$$
(15)
$$\begin{aligned} \widehat{\varvec{\alpha }}&= \mathbf{G }\mathbf{Z }'\mathbf{V }^{-1}((\mathbf{A }_N \otimes \mathbf{I }_{T}) \mathbf y -\mathbf{X }\widehat{\varvec{\beta }}), \end{aligned}$$
(16)

where \(\mathbf{V }=\frac{\sigma ^2}{1-\phi ^2} (\mathbf{I }_{N}\otimes \varvec{\varOmega })+\mathbf{Z }\mathbf{G }\mathbf{Z }'\).
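A minimal Python sketch of (15) and (16) is given below (NumPy assumed). It treats \(\rho \), \(\phi \), \(\sigma ^2\) and \(\mathbf{G }\) as known, and every numerical value, the toy weight matrix and the random \(\mathbf{X }\) and \(\mathbf{Z }\) are illustrative assumptions rather than quantities from the paper; the function name `mixed_model_estimates` is hypothetical. The data vector is assumed to be ordered by unit first and time second, so that \(\mathbf{A }_N \otimes \mathbf{I }_{T}\) applies the spatial filter period by period.

```python
import numpy as np

def mixed_model_estimates(y_star, X, Z, G, R):
    """GLS fixed effects and BLUP random effects for y_star = X b + Z a + e."""
    V = R + Z @ G @ Z.T                                  # marginal covariance of y_star
    Vinv_X = np.linalg.solve(V, X)
    Vinv_y = np.linalg.solve(V, y_star)
    beta = np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_y)   # as in (15)
    alpha = G @ Z.T @ np.linalg.solve(V, y_star - X @ beta)  # as in (16)
    return beta, alpha

rng = np.random.default_rng(1)
N, T, rho, phi, sigma2 = 3, 4, 0.3, 0.5, 1.0
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])                          # toy row-standardized weights
A = np.eye(N) - rho * W
lags = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
Omega = phi ** lags                                      # AR(1) correlation matrix
R = sigma2 / (1 - phi**2) * np.kron(np.eye(N), Omega)    # error covariance

X = np.column_stack([np.ones(N * T), rng.normal(size=N * T)])
Z = rng.normal(size=(N * T, 3))
G = 0.8 * np.eye(3)
y = rng.normal(size=N * T)

y_star = np.kron(A, np.eye(T)) @ y                       # spatially filtered response
beta, alpha = mixed_model_estimates(y_star, X, Z, G, R)
print(beta, alpha)
```

The sketch solves linear systems with \(\mathbf{V }\) instead of forming \(\mathbf{V }^{-1}\) explicitly, which is a numerical convenience and not something prescribed by the text.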

Variance components (and, therefore, smoothing parameters), and correlation parameters may be estimated by maximizing the residual log-likelihood (REML) of Patterson and Thompson (1971) (slightly modified by the Kronecker matrix product, \(\mathbf{A }_N \otimes \mathbf{I }_T\)):

$$\begin{aligned} \ell (\sigma _{\nu _i}^2,\sigma ^2,\rho ,\phi )&= -\frac{1}{2}\log |\mathbf{V }| -\frac{1}{2}\log |\mathbf{X }'\mathbf{V }^{-1}\mathbf{X }| \nonumber \\&\quad -\frac{1}{2}\left[ (\mathbf{A }_N \otimes \mathbf{I }_T)\mathbf y \right] '(\mathbf{V }^{-1}-\mathbf{V }^{-1}\mathbf{X }(\mathbf{X }'\mathbf{V }^{-1}\mathbf{X })^{-1}\mathbf{X }'\mathbf{V }^{-1})\left[ (\mathbf{A }_N \otimes \mathbf{I }_T)\mathbf y \right] \nonumber \\&\quad + \log \vert \mathbf{A }_N \otimes \mathbf{I }_T \vert \end{aligned}$$
(17)

where the matrices \(\mathbf{V }\), \(\mathbf{X }\) and \(\mathbf{Z }\) are obtained as described above (if linear and non-linear covariates have been added, \(\mathbf{X }\) and \(\mathbf{Z }\) matrices are augmented in a suitable additive way).
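The last term of (17), \(\log \vert \mathbf{A }_N \otimes \mathbf{I }_T \vert \), does not require the full \(NT\times NT\) matrix, because the determinant of a Kronecker product factorizes and \(\log \vert \mathbf{A }_N \otimes \mathbf{I }_T \vert = T\log \vert \mathbf{A }_N\vert \). The Python sketch below (NumPy assumed; the function name `log_jacobian` and the toy weight matrix are hypothetical) checks this identity numerically.

```python
import numpy as np

def log_jacobian(rho, W, T):
    """Return log|A_N (x) I_T| computed as T * log|I_N - rho * W_N|."""
    _, logdet = np.linalg.slogdet(np.eye(W.shape[0]) - rho * W)
    return T * logdet

# Toy row-standardized weights for N = 3 units (illustrative values only).
W = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
T, rho = 4, 0.3

fast = log_jacobian(rho, W, T)
# Direct evaluation on the full NT x NT matrix, for comparison.
_, direct = np.linalg.slogdet(np.kron(np.eye(3) - rho * W, np.eye(T)))
print(np.isclose(fast, direct))
```

Working with the \(N\times N\) matrix keeps the cost of this term independent of the number of time periods.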

Maximization of this REML function is a very complex numerical problem, especially when the number of variance components/correlation parameters is large. Rodriguez-Alvarez et al. (2015) recently developed an algorithm named SAP (Separation of Anisotropic Penalties), which is based on the fact that the inverse variance-covariance matrix of the random effects, \(\mathbf{G }^{-1}\), is a linear combination of precision matrices. This is the case for the PS-ANOVA-SAR(AR1) model, as we showed in (13). This expression allows us to obtain closed-form estimates for all the variance component parameters \(\sigma _{\nu _i}^2\) and \(\sigma ^2\) very efficiently. We have adapted this algorithm to also include the estimation of the \(\rho \) and \(\phi \) parameters. The steps for applying the SAP algorithm to optimize (17) can be summarized as follows:

  1. Initialization. Set:

    • \(k=0\)

    • \(\hat{\varvec{\beta }}^{(k)}=\mathbf 0 ; \quad \hat{\varvec{\alpha }}^{(k)}=\mathbf 0 \)

    • \(\hat{\sigma }_{\nu _i}^{2,(k)} = 1 \quad i=1,2,\ldots ,12\)

    • \(\hat{\sigma }^{2,(k)} = \text {var}(\mathbf y )\)

    • \(\hat{\rho }^{(k)} = 0\)

  2. Compute the matrices \(\hat{\mathbf{G }}^{(k)},\hat{\mathbf{V }}^{(k)},\hat{\mathbf{P }}^{(k)},\hat{\mathbf{A }}_N^{(k)}\) using the following expressions:

    $$\begin{aligned}&\hat{\mathbf{G }}^{-1,(k)} =\sum _{i=1}^{12}\frac{1}{\hat{\sigma }_{\nu _i}^{2,(k)}}\varvec{\varLambda }_i \\&\hat{\mathbf{V }}^{(k)} = \hat{\sigma }^{2,(k)}\mathbf{I }_{NT}+\mathbf{Z }\hat{\mathbf{G }}^{(k)}\mathbf{Z }^\prime \\&\hat{\mathbf{P }}^{(k)} = \hat{\mathbf{V }}^{-1,(k)} - \hat{\mathbf{V }}^{-1,(k)} \mathbf{X }(\mathbf{X }^\prime \hat{\mathbf{V }}^{-1,(k)}\mathbf{X })^{-1} \mathbf{X }^\prime \hat{\mathbf{V }}^{-1,(k)} \\&\hat{\mathbf{A }}_N^{(k)} = \mathbf{I }_N-\hat{\rho }^{(k)}\mathbf{W }_N \end{aligned}$$
  3. 3.

    Compute the estimates:

    $$\begin{aligned} \hat{\varvec{\beta }}^{(k)}&= (\mathbf{X }^\prime \hat{\mathbf{V }}^{-1,(k)}\mathbf{X })^{-1} (\mathbf{X }^\prime \hat{\mathbf{V }}^{-1,(k)}\hat{\mathbf{A }}_N^{(k)}\mathbf y ) \\ \hat{\varvec{\alpha }}^{(k)}&= \hat{\mathbf{G }}^{(k)}\mathbf{Z }^\prime \hat{\mathbf{V }}^{-1,(k)}(\hat{\mathbf{A }}_N^{(k)}\mathbf y -\mathbf{X }\hat{\varvec{\beta }}^{(k)}) \\ ed_i^{(k)}&= \text {trace}(\mathbf{Z }^\prime \hat{\mathbf{P }}^{(k)}\mathbf{Z }\hat{\mathbf{G }}^{(k)} \frac{1}{\hat{\sigma }_{\nu _i}^{2,(k)}}\varvec{\varLambda }_i\hat{\mathbf{G }}^{(k)}) \quad i=1,2,\ldots ,12 \end{aligned}$$

    where \(\varvec{\varLambda }_i \quad i=1,\ldots ,12\) is defined in (14).

  4.

    Estimate the variance components:

    $$\begin{aligned} \hat{\sigma }_{\nu _i}^{2,(k+1)}&= \frac{{\hat{\varvec{\alpha }}^{(k)^\prime }} \varvec{\varLambda }_{i} \hat{\varvec{\alpha }}^{(k)}}{ed_i^{(k)}} \quad i=1,\ldots ,12 \end{aligned}$$

    Estimate the variance of the noise as:

    $$\begin{aligned} \hat{\sigma }^{2,(k+1)} = \frac{(\hat{\mathbf{A }}_N^{(k)}\mathbf y - \mathbf{X }\hat{\varvec{\beta }}^{(k)} - \mathbf{Z }\hat{\varvec{\alpha }}^{(k)})^\prime (\hat{\mathbf{A }}_N^{(k)}\mathbf y - \mathbf{X }\hat{\varvec{\beta }}^{(k)} - \mathbf{Z }\hat{\varvec{\alpha }}^{(k)})}{NT-\sum _i ed_i^{(k)}- \text {rank}(\mathbf{X })-2} \end{aligned}$$

  5.

    Estimate the spatial parameter \(\hat{\rho }^{(k+1)}\) and the serial correlation parameter \(\hat{\phi }^{(k+1)}\) by numerically solving the non-linear equations obtained by setting to zero the score of the REML function with respect to \(\rho \) and \(\phi \), respectively (this additional step is the only difference with respect to the usual SAP algorithm):

    $$\begin{aligned} \frac{\partial \ell (.) }{\partial \rho }&= -\frac{1}{2} \left[ 2 \hat{\mathbf{P }}^{(k)} \left( (\mathbf{A }_N \otimes \mathbf{I }_T)\mathbf y \right) \right] ^\prime \left( \frac{\partial (\mathbf{A }_N \otimes \mathbf{I }_T)}{\partial \rho } \mathbf y \right) \\&\quad + \text {trace} \left( (\mathbf{A }_N \otimes \mathbf{I }_T)^{-1} \frac{\partial (\mathbf{A }_N \otimes \mathbf{I }_T)}{\partial \rho } \right) \\&= \mathbf y ^\prime (\mathbf{A }_N^\prime \otimes \mathbf{I }_T) \hat{\mathbf{P }}^{(k)} (\mathbf{W }_N\otimes \mathbf{I }_T)\mathbf y - T\text {trace}(\mathbf{A }_N^{-1}\mathbf{W }_N) = 0\\ \frac{\partial l(.)}{\partial \phi }&= -\frac{1}{2} \left\{ \text {trace} \left( \mathbf{P }\frac{\partial \mathbf{V }}{\partial \phi } \right) - \left[ \left( \mathbf{A }_N \otimes \mathbf{I }_T \right) \mathbf y - \mathbf{X }\hat{\varvec{\beta }}\right] ^\prime \mathbf{V }^{-1}\right. \nonumber \\&\quad \times \left. \frac{\partial \mathbf{V }}{\partial \phi } \mathbf{V }^{-1} \left[ \left( \mathbf{A }_N \otimes \mathbf{I }_T \right) \mathbf y - \mathbf{X }\hat{\varvec{\beta }} \right] \right\} = 0 \end{aligned}$$

    where:

    $$\begin{aligned} \frac{\partial \mathbf{V }}{\partial \phi } = \frac{ \partial \left\{ \mathbf{Z }\mathbf{G }\mathbf{Z }^\prime + \frac{\sigma ^2}{1-\phi ^2} \left( \mathbf{I }_N \otimes \varvec{\varOmega }\right) \right\} }{\partial \phi }= \left( \mathbf{I }_N \otimes \frac{\partial \left[ (\frac{\sigma ^2}{1-\phi ^2})\varvec{\varOmega }\right] }{\partial \phi } \right) \end{aligned}$$

    and

  6.

    Set \(k=k+1\) and return to step 2 until convergence.
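The numerical solve for \(\rho \) in step 5 can be illustrated with a minimal sketch. It uses two observations. First, \(\hat{\mathbf{P }}^{(k)}\) does not depend on \(\rho \), so the quadratic form in the score is linear in \(\rho \): it equals \(a-\rho b\), with \(a=\mathbf y ^\prime \hat{\mathbf{P }}^{(k)}(\mathbf{W }_N\otimes \mathbf{I }_T)\mathbf y \) and \(b=\mathbf y ^\prime (\mathbf{W }_N^\prime \otimes \mathbf{I }_T)\hat{\mathbf{P }}^{(k)}(\mathbf{W }_N\otimes \mathbf{I }_T)\mathbf y \). Second, if the eigenvalues \(\omega _i\) of \(\mathbf{W }_N\) are real, \(\text {trace}(\mathbf{A }_N^{-1}\mathbf{W }_N)=\sum _i \omega _i/(1-\rho \omega _i)\). The function names, the choice of bisection as solver, and the values of \(a\), \(b\), \(T\) and the eigenvalues (those of a row-standardized ring of four units) are assumptions made for this example; they are not taken from the paper or from the sptpsar package.

```python
def rho_score(rho, a, b, eigvals, T):
    """REML score w.r.t. rho: a - rho*b - T * sum(w / (1 - rho*w))."""
    trace_term = sum(w / (1.0 - rho * w) for w in eigvals)
    return a - rho * b - T * trace_term


def solve_rho(a, b, eigvals, T, tol=1e-12, max_iter=200):
    """Bisection on the open interval (1/min(eigvals), 1/max(eigvals)).

    Assumes min(eigvals) < 0 < max(eigvals) and b >= 0, so the score
    decreases monotonically from +inf to -inf on that interval.
    """
    eps = 1e-8
    lo = 1.0 / min(eigvals) + eps
    hi = 1.0 / max(eigvals) - eps
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if rho_score(mid, a, b, eigvals, T) > 0.0:
            lo = mid  # root lies to the right of mid
        else:
            hi = mid  # root lies to the left of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)


# Hypothetical inputs (not from the paper)
eigvals = [1.0, 0.0, 0.0, -1.0]  # row-standardized ring with N = 4
a, b, T = 2.0, 10.0, 5
rho_hat = solve_rho(a, b, eigvals, T)
print(rho_hat, rho_score(rho_hat, a, b, eigvals, T))
```

Since \(a\) and \(b\) only need to be recomputed when \(\hat{\mathbf{P }}^{(k)}\) changes, each evaluation of the score inside the solver costs a sum over \(N\) eigenvalues rather than a matrix inversion.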

Once convergence is achieved, the effective degrees of freedom of the model can be estimated as:

$$\begin{aligned} \text {edf}=\sum _i ed_i^{(k)} + \text {rank}(\mathbf{X })+2 \end{aligned}$$

This quantity exceeds that of spatio-temporal smooth models by two units because the \(\rho \) and \(\phi \) parameters must also be estimated.

To obtain the covariance matrix of the estimates, we need the Hessian matrix of the REML function with respect to the \(\rho \) and \(\phi \) parameters, given by the expressions:

$$\begin{aligned} \frac{\partial ^2 l(.)}{\partial \rho ^2}&= - \mathbf y ^\prime \left( \mathbf{W }_N^\prime \otimes \mathbf{I }_T \right) \mathbf{P }\left( \mathbf{W }_N \otimes \mathbf{I }_T \right) \mathbf y - T\text {trace}\left( (\mathbf{A }_N^{-1}\mathbf{W }_N)^2\right) \\ \frac{\partial ^2 l(.)}{\partial \phi ^2}&= -\frac{1}{2} \left\{ \frac{\partial \text {trace} \left( \mathbf{P }\frac{\partial \mathbf{V }}{\partial \phi } \right) }{\partial \phi } - \left[ \left( \mathbf{A }_N \otimes \mathbf{I }_T \right) \mathbf y - \mathbf{X }\hat{\varvec{\beta }}\right] ^\prime \right. \nonumber \\&\quad \times \left. \frac{\partial \left( \mathbf{V }^{-1} \frac{\partial \mathbf{V }}{\partial \phi } \mathbf{V }^{-1} \right) }{\partial \phi } \left[ \left( \mathbf{A }_N \otimes \mathbf{I }_T \right) \mathbf y - \mathbf{X }\hat{\varvec{\beta }} \right] \right\} \\ \frac{\partial ^2 l(.)}{\partial \phi \partial \rho }&= \mathbf y ^\prime \left( \mathbf{W }_N^\prime \otimes \mathbf{I }_T \right) \mathbf{V }^{-1} \frac{\partial \mathbf{V }}{\partial \phi } \mathbf{V }^{-1} \left[ \left( \mathbf{A }_N \otimes \mathbf{I }_T \right) \mathbf y - \mathbf{X }\hat{\varvec{\beta }} \right] \end{aligned}$$

where:

$$\begin{aligned}&\frac{\partial \text {trace} \left( \mathbf{P }\frac{\partial \mathbf{V }}{\partial \phi } \right) }{\partial \phi } = \text {trace} \left( \frac{\partial (\mathbf{P }\frac{\partial \mathbf{V }}{\partial \phi })}{\partial \phi } \right) =\text {trace} \left( \frac{\partial \mathbf{P }}{\partial \phi } \frac{\partial \mathbf{V }}{\partial \phi } + \mathbf{P }\frac{\partial ^2 \mathbf{V }}{\partial \phi ^2} \right) \\&\frac{\partial \mathbf{V }}{\partial \phi } = \left( \mathbf{I }_N \otimes \frac{\partial \left[ (\frac{\sigma _{\epsilon }^2}{1-\phi ^2})\varvec{\varOmega }\right] }{\partial \phi } \right) \qquad \frac{\partial ^2 \mathbf{V }}{\partial \phi ^2} = \left( \mathbf{I }_N \otimes \frac{\partial ^2 \left[ (\frac{\sigma _{\epsilon }^2}{1-\phi ^2})\varvec{\varOmega }\right] }{\partial \phi ^2} \right) \\&\frac{\partial \mathbf{P }}{\partial \phi } = - \mathbf{V }^{-1} \frac{\partial \mathbf{V }}{\partial \phi } \mathbf{V }^{-1} - \left( -\mathbf{V }^{-1}\frac{\partial \mathbf{V }}{\partial \phi }\mathbf{V }^{-1} \mathbf{X }(\mathbf{X }^\prime \mathbf{V }^{-1} \mathbf{X })^{-1} \mathbf{X }^\prime \mathbf{V }^{-1} \right. \\&\quad + \mathbf{V }^{-1} \mathbf{X }(\mathbf{X }^\prime \mathbf{V }^{-1} \mathbf{X })^{-1} \mathbf{X }^\prime \mathbf{V }^{-1} \frac{\partial \mathbf{V }}{\partial \phi } \mathbf{V }^{-1} \mathbf{X }(\mathbf{X }^\prime \mathbf{V }^{-1} \mathbf{X })^{-1} \mathbf{X }^\prime \mathbf{V }^{-1}\\&\quad \left. - \mathbf{V }^{-1} \mathbf{X }(\mathbf{X }^\prime \mathbf{V }^{-1} \mathbf{X })^{-1} \mathbf{X }^\prime \mathbf{V }^{-1}\frac{\partial \mathbf{V }}{\partial \phi }\mathbf{V }^{-1} \right) \end{aligned}$$
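The \(\phi \)-derivatives of \(\mathbf{V }\) reduce to element-wise derivatives of the \(T\times T\) matrix \((\frac{\sigma ^2}{1-\phi ^2})\varvec{\varOmega }\). The sketch below assumes the standard stationary AR(1) form \(\varOmega _{ts}=\phi ^{|t-s|}\); since \(\varvec{\varOmega }\) is not written out in this appendix, that form is an assumption, as are the function names and numerical values. It computes the first derivative analytically and checks it against a central finite difference.

```python
def ar1_cov(phi, sigma2, T):
    """T x T matrix (sigma2 / (1 - phi^2)) * Omega, Omega[t][s] = phi^|t-s| (assumed form)."""
    c = sigma2 / (1.0 - phi ** 2)
    return [[c * phi ** abs(t - s) for s in range(T)] for t in range(T)]


def ar1_cov_dphi(phi, sigma2, T):
    """Analytic derivative of ar1_cov with respect to phi, element by element."""
    d = 1.0 - phi ** 2
    out = []
    for t in range(T):
        row = []
        for s in range(T):
            h = abs(t - s)
            # d/dphi [phi^h / (1 - phi^2)] = h*phi^(h-1)/d + 2*phi^(h+1)/d^2
            lead = h * phi ** (h - 1) if h > 0 else 0.0
            row.append(sigma2 * (lead / d + 2.0 * phi ** (h + 1) / d ** 2))
        out.append(row)
    return out


# Hypothetical values, used only to compare with a finite difference
phi, sigma2, T, step = 0.5, 2.0, 4, 1e-6
analytic = ar1_cov_dphi(phi, sigma2, T)
plus = ar1_cov(phi + step, sigma2, T)
minus = ar1_cov(phi - step, sigma2, T)
max_err = max(
    abs(analytic[t][s] - (plus[t][s] - minus[t][s]) / (2.0 * step))
    for t in range(T)
    for s in range(T)
)
print(max_err)
```

Under this assumption, \(\partial \mathbf{V }/\partial \phi \) is block diagonal with \(N\) copies of the matrix returned by `ar1_cov_dphi`, and the second derivative can be validated with the same finite-difference check.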

These second derivatives can be evaluated at the maximum of the REML function to obtain the negative of the Hessian matrix. The inverse of this matrix provides the asymptotic covariance matrix of \(\hat{\rho }\) and \(\hat{\phi }\) in the usual way.
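As a small worked illustration of this last step, the following sketch inverts the negative of a \(2\times 2\) Hessian for \((\rho ,\phi )\) and reads off standard errors. The function name and the three second-derivative values are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the paper.

```python
import math


def cov_from_hessian(h_rr, h_rp, h_pp):
    """Invert the negative of the Hessian [[h_rr, h_rp], [h_rp, h_pp]] of (rho, phi)."""
    a, b, d = -h_rr, -h_rp, -h_pp  # entries of the negative Hessian
    det = a * d - b * b
    if a <= 0.0 or det <= 0.0:
        # At a proper maximum the negative Hessian is positive definite
        raise ValueError("negative Hessian is not positive definite")
    return [[d / det, -b / det], [-b / det, a / det]]


# Hypothetical second derivatives evaluated at the REML maximum
cov = cov_from_hessian(h_rr=-400.0, h_rp=-50.0, h_pp=-250.0)
se_rho = math.sqrt(cov[0][0])
se_phi = math.sqrt(cov[1][1])
print(se_rho, se_phi)
```

The positive-definiteness check is a cheap safeguard: if it fails, the iteration has not reached a maximum and the resulting standard errors would be meaningless.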

Finally, the covariance matrix of \(\rho \) and \(\phi \), together with the covariance matrix of the regression parameters \(\varvec{\beta }\) and \(\varvec{\alpha }\), given by \(Cov(\varvec{\beta },\varvec{\alpha })=\mathbf{C }^{-1}\) (see Sect. 3), can be used to obtain the simulated distributions of total, direct and indirect effects, as explained in Sect. 2. As usual, REML estimates are asymptotically unbiased and normally distributed.
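A heavily simplified sketch of such a simulation is given below. It is not the procedure of Sect. 2, which is not reproduced here. It assumes a single linear covariate with coefficient \(\beta \), a row-standardized \(\mathbf{W }_N\) (so that the average total effect is \(\beta /(1-\rho )\)), and independent normal draws for \(\rho \) and \(\beta \). The function name, point estimates and standard errors are all hypothetical.

```python
import random
import statistics


def simulate_total_effect(beta_hat, se_beta, rho_hat, se_rho, n_draws=5000, seed=0):
    """Draw (rho, beta) from independent normals and return beta / (1 - rho) per draw."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        rho = rng.gauss(rho_hat, se_rho)
        if abs(rho) >= 1.0:
            continue  # discard draws outside the (-1, 1) range assumed for rho
        beta = rng.gauss(beta_hat, se_beta)
        draws.append(beta / (1.0 - rho))
    return draws


# Hypothetical estimates and standard errors (not from the paper)
draws = simulate_total_effect(beta_hat=1.0, se_beta=0.1, rho_hat=0.4, se_rho=0.05)
mean_total = statistics.mean(draws)
sd_total = statistics.stdev(draws)
print(mean_total, sd_total)
```

Direct and indirect effects would be simulated in the same way, but they require the full matrix \((\mathbf{I }_N-\rho \mathbf{W }_N)^{-1}\) for each draw rather than the scalar shortcut used here.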

Cite this article

Mínguez, R., Basile, R. & Durbán, M. An alternative semiparametric model for spatial panel data. Stat Methods Appl 29, 669–708 (2020). https://doi.org/10.1007/s10260-019-00492-8

Keywords

  • Spatial panel
  • Spatio-temporal trend
  • Mixed models
  • P-splines
  • PS-ANOVA

JEL Classification

  • C33
  • C14
  • C63