Appendix A: Penalized splines as mixed models
Given the model:
$$\begin{aligned} y_i=f(x_i)+\varepsilon _i \quad \varepsilon \sim N(0, \sigma ^2\mathbf{I }), \end{aligned}$$
using the penalized regression approach we have (in matrix form):
$$\begin{aligned} \mathbf y =\mathbf{B }\varvec{\theta }+\varvec{\epsilon }, \quad \varvec{\epsilon }\sim N(\mathbf 0 , \sigma ^2\mathbf{I }), \end{aligned}$$
where \(\mathbf{B }\) is a matrix of B-spline bases, and \(\varvec{\theta }\) is a vector of regression parameters estimated by minimizing the penalized sum of squares:
$$\begin{aligned} (\mathbf y -\mathbf{B }\varvec{\theta })^\prime (\mathbf y -\mathbf{B }\varvec{\theta })+\varvec{\theta }^\prime \mathbf{P }\varvec{\theta }. \end{aligned}$$
The reformulation of a P-spline as a mixed model can be viewed as a reparameterization of the original non-parametric model: the B-spline basis is transformed into a new model basis, i.e. \(\mathbf {B}\rightarrow \left[ \mathbf {X}: \mathbf {Z}\right] \), with coefficients \(\varvec{\theta }\rightarrow \left( \varvec{\beta },\varvec{\alpha }\right) ^{\prime }\). This representation decomposes the fitted values into the sum of a polynomial (unpenalized) part, \(\mathbf {X}\varvec{\beta }\), and a nonlinear (penalized) smooth part, \(\mathbf {Z}\varvec{\alpha }\). To carry out this transformation, we need a non-singular transformation matrix \(\mathbf{T }\) such that \(\mathbf{B }\mathbf{T }=\left[ \mathbf {X}: \mathbf {Z}\right] \) and \(\mathbf{T }^{-1}\varvec{\theta }=\left( \varvec{\beta },\varvec{\alpha }\right) ^{\prime }\). There are several possibilities for this matrix; we choose one based on the singular value decomposition of the penalty matrix \(\mathbf{P }=\lambda \mathbf {D}^{\prime }\mathbf {D}\), that is:
$$\begin{aligned} \mathbf {D}^{\prime }\mathbf {D}=\mathbf {U}\varvec{\Sigma }\mathbf {U}^{\prime }\text {,} \end{aligned}$$
where \(\varvec{\Sigma }\) is a diagonal matrix containing the eigenvalues of \(\mathbf {D}^{\prime }\mathbf {D}\), two of which are zero, and \(\mathbf {U}\) is the corresponding matrix of eigenvectors. \(\mathbf {U}\) can be split into two parts: \(\mathbf {U}_{n}\), of dimension \(c\times 2\), containing the eigenvectors of the null space, and \(\mathbf {U}_{s}\), of dimension \(c\times (c-2)\), containing the remaining eigenvectors (where \(c\) is the rank of the basis and 2 the order of the penalty). Note that we can write \(\varvec{\Sigma }=\text {blockdiag}\left( \mathbf {0}_{2},\tilde{\varvec{\varSigma }}\right) \), where \(\tilde{\varvec{\varSigma }}\) is a diagonal matrix that contains the non-zero eigenvalues of \(\mathbf {D}^{\prime } \mathbf {D}\) and \(\mathbf {0}_{2}\) is a \(2\times 2\) matrix of zeros. Therefore, we define the transformation matrix \(\mathbf {T}\) as:
$$\begin{aligned} \mathbf {T}=[\mathbf {U}_{n}:\mathbf {U}_{s}\tilde{\varvec{\varSigma }}^{-1/2}]\text {,} \end{aligned}$$
where the fixed and random effect matrices are \(\mathbf {X}=\mathbf {BU}_{n}\) and \(\mathbf {Z}=\mathbf {BU}_{s}\tilde{\varvec{\varSigma }}^{-1/2}\), respectively. Given this transformation matrix, the new coefficients are \(\varvec{\beta }=\mathbf {U}_{n}^{\prime } \varvec{\theta }\) and \(\varvec{\alpha }=\tilde{\varvec{\varSigma }}^{1/2}\mathbf {U}_{s}^{\prime }\varvec{\theta }\), so that \(\mathbf {X}\varvec{\beta }+\mathbf {Z}\varvec{\alpha }=\mathbf {B}\varvec{\theta }\). The fixed effect matrix \(\mathbf {X}\) may be replaced by any sub-matrix such that \(\left[ \mathbf {X}:\mathbf {Z}\right] \) has full rank and \(\mathbf {X}^{\prime }\mathbf {Z}=\mathbf {0}\) (that is, \(\mathbf {X}\) and \(\mathbf {Z}\) are orthogonal). For example, with a second-order penalty (\(d=2\)), the fixed effect matrix can be taken as \( \mathbf {X}=[\mathbf {1}:\mathbf {x}]\), where \(\mathbf {1}\) is a vector of ones and \(\mathbf {x}\) is the explanatory variable. The penalty term \( \varvec{\theta }^{\prime }\mathbf {P} \varvec{\theta }\) becomes \(\varvec{\alpha }^{\prime } \mathbf{F }\varvec{\alpha }\), where \(\mathbf {F}=\lambda \mathbf{I }\). This follows since \(\mathbf {U}\) is orthogonal, so that \(\varvec{\theta }^{\prime }\mathbf {D}^{\prime }\mathbf {D}\varvec{\theta }=\varvec{\theta }^{\prime }\mathbf {U}_{s}\tilde{\varvec{\varSigma }}\mathbf {U}_{s}^{\prime }\varvec{\theta }=\varvec{\alpha }^{\prime }\varvec{\alpha }\). Hence, given the new basis and the new penalty, the penalized sum of squares,
$$\begin{aligned} (\mathbf y -\mathbf{B }\varvec{\theta })^\prime (\mathbf y -\mathbf{B }\varvec{\theta })+ \varvec{\theta }^\prime \mathbf{P }\varvec{\theta }, \end{aligned}$$
becomes:
$$\begin{aligned} \left( \mathbf {y}- \mathbf {X}\varvec{\beta }-\mathbf {Z}\varvec{\alpha }\right) ^{\prime }\left( \mathbf {y}- \mathbf {X}\varvec{\beta }-\mathbf {Z}\varvec{\alpha }\right) + \lambda \varvec{\alpha }^{\prime } \mathbf{I }_{c-2}\varvec{\alpha }\text {.} \end{aligned}$$
For given variance parameters, minimizing this criterion is equivalent (up to constants) to maximizing the joint log-likelihood of a linear mixed model:
$$\begin{aligned} \mathbf y =\mathbf{X }\varvec{\beta }+\mathbf{Z }\varvec{\alpha }+\varvec{\epsilon }, \quad \varvec{\epsilon }\sim N(\mathbf 0 , \sigma ^2\mathbf{I }), \quad \varvec{\alpha }\sim \mathcal {N}(\mathbf 0 ,\mathbf{G }), \end{aligned}$$
with \(\mathbf{G }= \sigma _\nu ^2 \mathbf{I }_{c-2}\) and \(\lambda =\sigma ^2/ \sigma _\nu ^2\). Therefore, the smoothing parameter is obtained by estimating the variance components of the mixed model.
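The reparameterization can be checked numerically. The following is a minimal NumPy sketch, not the authors' implementation: the basis `B` is a random stand-in for a B-spline basis, the sizes `n` and `c` are arbitrary, and the random-effect coefficients are taken as \(\varvec{\alpha }=\tilde{\varvec{\varSigma }}^{1/2}\mathbf {U}_{s}^{\prime }\varvec{\theta }\), which is the choice that makes \(\mathbf {X}\varvec{\beta }+\mathbf {Z}\varvec{\alpha }\) reproduce \(\mathbf {B}\varvec{\theta }\).

```python
import numpy as np

rng = np.random.default_rng(0)
n, c = 50, 8                          # illustrative sample size and basis dimension
B = rng.normal(size=(n, c))           # stand-in for a B-spline basis (assumption)
theta = rng.normal(size=c)

# Second-order difference matrix D, of size (c - 2) x c
D = np.diff(np.eye(c), n=2, axis=0)

# Eigendecomposition of D'D; eigh returns eigenvalues in ascending order,
# so the two (numerically) zero eigenvalues come first.
eigvals, U = np.linalg.eigh(D.T @ D)
U_n, U_s = U[:, :2], U[:, 2:]
Sigma_tilde = eigvals[2:]

# T = [U_n : U_s Sigma^{-1/2}] and the mixed model bases X and Z
T = np.hstack([U_n, U_s / np.sqrt(Sigma_tilde)])
X = B @ U_n
Z = B @ (U_s / np.sqrt(Sigma_tilde))

# New coefficients: beta = U_n' theta, alpha = Sigma^{1/2} U_s' theta
beta = U_n.T @ theta
alpha = np.sqrt(Sigma_tilde) * (U_s.T @ theta)

fit_original = B @ theta
fit_mixed = X @ beta + Z @ alpha
penalty_original = theta @ D.T @ D @ theta
penalty_mixed = alpha @ alpha
```

Under these assumptions the two fits coincide and the quadratic penalty \(\varvec{\theta }^{\prime }\mathbf {D}^{\prime }\mathbf {D}\varvec{\theta }\) reduces to \(\varvec{\alpha }^{\prime }\varvec{\alpha }\), which is what allows \(\lambda \) to be read as a ratio of variances.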
Appendix B: Mixed model representation of the semiparametric spatio-temporal autoregressive model and parameter estimation
For the sake of simplicity, we assume here that there are no covariates. The inclusion of covariates with a linear or non-linear functional relationship with the response is immediate by augmenting the matrices for fixed and random effects accordingly, as well as the corresponding covariance matrices. We therefore focus on the following model:
$$\begin{aligned} \mathbf y= & {} f_1(\mathbf s _1)+f_2(\mathbf s _2)+f_t(\varvec{\tau })+f_{1,2}(\mathbf s _1,\mathbf s _2)+ f_{1,t}(\mathbf s _1,\varvec{\tau })\nonumber \\&+ f_{2,t}(\mathbf s _2,\varvec{\tau })+f_{1,2,t}(\mathbf s _1, \mathbf s _2,\varvec{\tau })+\rho (\mathbf{W }_N \otimes \mathbf{I }_{T}) \mathbf y + \varvec{\epsilon }\end{aligned}$$
where the errors are assumed to follow a temporal AR(1) process, see (9). In matrix form:
$$\begin{aligned} (\mathbf{A }_N \otimes \mathbf{I }_{T}) \mathbf y = \mathbf{B }\varvec{\theta }+ \varvec{\epsilon }\qquad \varvec{\epsilon }\sim N \left( \mathbf{0},\frac{\sigma ^2}{1-\phi ^2} (\mathbf{I }_N \otimes \varvec{\varOmega }) \right) \qquad \mathbf{A }_N = \mathbf{I }_N - \rho \mathbf{W }_N \end{aligned}$$
The regression matrix of the model above will be the concatenation of B-spline bases for each of the smooth terms in the model:
$$\begin{aligned} \mathbf{B }= [\mathbf 1 \vert \mathbf{B }_{s_1}\vert \mathbf{B }_{s_2}\vert \mathbf{B }_{\tau }\vert \mathbf{B }_{s_1}\Box \mathbf{B }_{s_2}\vert \mathbf{B }_{s_1}\otimes \mathbf{B }_{\tau } \vert \mathbf{B }_{s_2}\otimes \mathbf{B }_{\tau }\vert (\mathbf{B }_{s_1}\Box \mathbf{B }_{s_2})\otimes \mathbf{B }_{\tau }], \end{aligned}$$
where \(\mathbf{B }_{s_1}\), \(\mathbf{B }_{s_2}\) and \(\mathbf{B }_{\tau }\) correspond to the marginal B-spline basis for the spatial coordinates (\(\mathbf s _1, \mathbf s _2\)) and time (\(\varvec{\tau }\)), and \(\Box \) represents the row-wise tensor product defined as:
$$\begin{aligned} \mathbf{B }_{i}\Box \mathbf{B }_{j}=(\mathbf{B }_i \otimes \mathbf 1 _{c_j}^\prime )*( \mathbf 1 _{c_i}^\prime \otimes \mathbf{B }_j), \end{aligned}$$
where \(\mathbf 1 _{c}\) is a column vector of \(c\) ones, \(c_i\) is the rank of \(\mathbf{B }_i\), and \(\otimes \) and \(*\) are the Kronecker and element-wise matrix products, respectively.
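As an illustration, the row-wise tensor product can be written in a few lines of NumPy. This is a sketch under the assumption that both bases have the same number of rows, with \(c_i\) and \(c_j\) columns respectively; the function name `box_product` and the random matrices are hypothetical and used only for the check.

```python
import numpy as np

def box_product(B_i, B_j):
    """Row-wise tensor (box) product of two bases with the same number of rows."""
    c_i, c_j = B_i.shape[1], B_j.shape[1]
    left = np.kron(B_i, np.ones((1, c_j)))     # B_i (x) 1'_{c_j}, size n x (c_i c_j)
    right = np.kron(np.ones((1, c_i)), B_j)    # 1'_{c_i} (x) B_j, size n x (c_i c_j)
    return left * right                        # element-wise product

rng = np.random.default_rng(1)
B_1 = rng.normal(size=(6, 3))
B_2 = rng.normal(size=(6, 4))
B_12 = box_product(B_1, B_2)

# Row r of the box product is the Kronecker product of row r of B_1 and row r of B_2
rowwise = np.vstack([np.kron(B_1[r], B_2[r]) for r in range(B_1.shape[0])])
```

The comparison with `rowwise` makes explicit why the operation is called row-wise: each observation contributes the Kronecker product of its two marginal basis rows.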
The penalty matrix is now block-diagonal with blocks corresponding to the different terms in the model: \(\lambda _i\mathbf{D }_i^\prime \mathbf{D }_i\) for main effects, \(\lambda _{i}\mathbf{D }_i^\prime \mathbf{D }_i \otimes \mathbf{I }_{c_k} + \lambda _{k} \mathbf{I }_{c_i} \otimes \mathbf{D }_k^\prime \mathbf{D }_k\) for the second-order interactions, and \(\lambda _{i}\mathbf{D }_i^\prime \mathbf{D }_i \otimes \mathbf{I }_{c_k} \otimes \mathbf{I }_{c_l}+ \lambda _{k} \mathbf{I }_{c_i} \otimes \mathbf{D }_k^\prime \mathbf{D }_k \otimes \mathbf{I }_{c_l}+ \lambda _l \mathbf{I }_{c_i} \otimes \mathbf{I }_{c_k} \otimes \mathbf{D }_l^\prime \mathbf{D }_l\) for the three-way interaction.
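To make the structure of an interaction block concrete, the sketch below builds the anisotropic penalty of a second-order interaction from two marginal second-order difference penalties. The dimensions and smoothing parameters are illustrative values, and `diff_penalty` is a hypothetical helper name.

```python
import numpy as np

def diff_penalty(c, order=2):
    """Marginal penalty D'D built from differences of the given order."""
    D = np.diff(np.eye(c), n=order, axis=0)
    return D.T @ D

c_i, c_k = 6, 5                 # illustrative marginal basis dimensions
lam_i, lam_k = 2.0, 0.5         # illustrative smoothing parameters

P_i, P_k = diff_penalty(c_i), diff_penalty(c_k)

# lambda_i D_i'D_i (x) I_{c_k} + lambda_k I_{c_i} (x) D_k'D_k
P_ik = lam_i * np.kron(P_i, np.eye(c_k)) + lam_k * np.kron(np.eye(c_i), P_k)

rank_P_ik = np.linalg.matrix_rank(P_ik)
```

With second-order differences each marginal penalty has a two-dimensional null space, so the interaction penalty leaves a \(2\times 2=4\)-dimensional subspace unpenalized; here the rank is \(30-4=26\).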
In this case, several constraints need to be imposed, since the space spanned by any product \(\mathbf{B }_i \otimes \mathbf{B }_j\) contains the space spanned by the marginal bases \(\mathbf{B }_i\) and \(\mathbf{B }_j\). The mixed model reparameterization of this model automatically provides the necessary constraints. To find that parameterization, a new transformation matrix is needed, again based on the singular value decomposition of the penalty \(\mathbf{P }\) (see Lee 2010 for details). Then, the model is written as:
$$\begin{aligned}&\left( \mathbf{A}_N \otimes \mathbf{I}_T \right) \mathbf{y} = \mathbf{X} \varvec{\beta }+ \mathbf{Z} \varvec{\alpha }+ \varvec{\epsilon }\\&\varvec{\alpha }\sim N(\mathbf{0},\mathbf{G}) \qquad \varvec{\epsilon }\sim N \left( \mathbf{0},\frac{\sigma ^2}{1-\phi ^2} (\mathbf{I}_N \otimes \varvec{\varOmega }) \right) \nonumber \end{aligned}$$
(12)
with
$$\begin{aligned} \mathbf{X }= & {} \left[ ( \mathbf{X }_{s_1} \Box \mathbf{X }_{s_2}) \otimes \mathbf{X }_{\tau } \right] \\ \mathbf{Z }= & {} \left[ (\mathbf{Z }_{s_1} \Box \mathbf{X }_{s_2}) \otimes \mathbf{X }_{\tau } \vert (\mathbf{X }_{s_1} \Box \mathbf{Z }_{s_2}) \otimes \mathbf{X }_{\tau } \vert (\mathbf{X }_{s_1} \Box \mathbf{X }_{s_2}) \otimes \mathbf{Z }_{\tau } \vert (\mathbf{Z }_{s_1} \Box \mathbf{Z }_{s_2}) \otimes \mathbf{X }_{\tau } \vert \right. \\&\left. (\mathbf{Z }_{s_1} \Box \mathbf{X }_{s_2}) \otimes \mathbf{Z }_{\tau } \vert (\mathbf{X }_{s_1} \Box \mathbf{Z }_{s_2}) \otimes \mathbf{Z }_{\tau } \vert (\mathbf{Z }_{s_1} \Box \mathbf{Z }_{s_2}) \otimes \mathbf{Z }_{\tau } \right] \end{aligned}$$
where \( \mathbf{X }_{k}\), \(\mathbf{Z }_{k}\) \((k=s_{1},s_{2},\tau )\) are the mixed model matrices obtained from the reparameterization of the marginal bases described in “Appendix A”. The covariance matrix of random effects, \(\mathbf{G }\), is such that:
$$\begin{aligned} \mathbf{G }^{-1} = \text {blockdiag}&\left( \mathbf 0 ,\frac{1}{\sigma _{\nu _1}^2}\varvec{\varLambda }_1,\frac{1}{\sigma _{\nu _2}^2}\varvec{\varLambda }_2,\frac{1}{\sigma _{\nu _3}^2}\varvec{\varLambda }_3,\frac{1}{\sigma _{\nu _{4}}^2} \varvec{\varLambda }_{4} + \frac{1}{\sigma _{\nu _{5}}^2} \varvec{\varLambda }_{5}, \frac{1}{\sigma _{\nu _{6}}^2} \varvec{\varLambda }_{6}+ \frac{1}{\sigma _{\nu _{7}}^2} \varvec{\varLambda }_{7}, \right. \nonumber \\&\left. \frac{1}{\sigma _{\nu _{8}}^2} \varvec{\varLambda }_{8} + \frac{1}{\sigma _{\nu _{9}}^2} \varvec{\varLambda }_{9}, \frac{1}{\sigma _{\nu _{10}}^2} \varvec{\varLambda }_{10} +\frac{1}{\sigma _{\nu _{11}}^2} \varvec{\varLambda }_{11}+\frac{1}{\sigma _{\nu _{12}}^2} \varvec{\varLambda }_{12} \right) \end{aligned}$$
(13)
where
$$\begin{aligned}&\varvec{\varLambda }_1 = \widetilde{\varvec{\varSigma }}_{s_1}, \quad \varvec{\varLambda }_2 = \widetilde{\varvec{\varSigma }}_{s_2}, \quad \varvec{\varLambda }_3 = \widetilde{\varvec{\varSigma }}_{\tau } \nonumber \\&\varvec{\varLambda }_4 = \widetilde{\varvec{\varSigma }}_{s_1} \otimes \mathbf{I }_{c_{s_2}-2}, \quad \varvec{\varLambda }_5 = \mathbf{I }_{c_{s_1}-2} \otimes \widetilde{\varvec{\varSigma }}_{s_2}, \quad \varvec{\varLambda }_6 = \widetilde{\varvec{\varSigma }}_{s_1} \otimes \mathbf{I }_{2} \nonumber \\&\varvec{\varLambda }_7=\mathbf{I }_{c_{s_1}-q_{s_1}} \otimes \mathbf{I }_{2}, \quad \varvec{\varLambda }_8= \widetilde{\varvec{\varSigma }}_{s_2} \otimes \mathbf{I }_{c_t-2} \quad \varvec{\varLambda }_9 =\mathbf{I }_{c_{s_2}-2} \otimes \widetilde{\varvec{\varSigma }}_{\tau } \\&\varvec{\varLambda }_{10} = \widetilde{\varvec{\varSigma }}_{s_1} \otimes \mathbf{I }_{c_{s_2}-2} \otimes \mathbf{I }_{c_{\tau }-2},\quad \varvec{\varLambda }_{11}= \mathbf{I }_{c_{s_1}-2} \otimes \widetilde{\varvec{\varSigma }}_{s_2} \otimes \mathbf{I }_{c_{\tau }-2}, \nonumber \\&\varvec{\varLambda }_{12}=\mathbf{I }_{c_{s_1}-2} \otimes \mathbf{I }_{c_{s_2}-2} \otimes \widetilde{\varvec{\varSigma }}_{\tau } \nonumber \end{aligned}$$
(14)
and the \(\widetilde{\varvec{\varSigma }}\) matrices contain the non-zero eigenvalues obtained from the singular value decomposition of the marginal penalty matrices. It is important to be able to write the precision matrix of the random effects as a linear combination of known matrices weighted by the inverse variance parameters, since this is a necessary condition for applying the SAP algorithm.
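As a small numerical illustration of this linearity, the sketch below assembles the block of \(\mathbf{G }^{-1}\) associated with \(\varvec{\varLambda }_4\) and \(\varvec{\varLambda }_5\) in (13). The basis dimensions and variance values are arbitrary, second-order penalties are assumed, and `nonzero_eigs` is a hypothetical helper.

```python
import numpy as np

def nonzero_eigs(c, order=2):
    """Non-zero eigenvalues of D'D for a difference penalty of the given order."""
    D = np.diff(np.eye(c), n=order, axis=0)
    return np.linalg.eigvalsh(D.T @ D)[order:]   # drop the `order` zero eigenvalues

c_s1, c_s2 = 6, 5                                # illustrative basis dimensions
S_1 = np.diag(nonzero_eigs(c_s1))                # Sigma-tilde for s_1
S_2 = np.diag(nonzero_eigs(c_s2))                # Sigma-tilde for s_2

Lambda_4 = np.kron(S_1, np.eye(c_s2 - 2))
Lambda_5 = np.kron(np.eye(c_s1 - 2), S_2)

sig2_4, sig2_5 = 1.5, 0.3                        # illustrative variance components

# The precision block is linear in 1/sigma^2 ...
G_inv_block = Lambda_4 / sig2_4 + Lambda_5 / sig2_5
# ... and diagonal, so the covariance block is obtained element-wise
G_block = np.diag(1.0 / np.diag(G_inv_block))
```

The covariance block itself is not linear in the variance components, which is why the condition is stated for the precision matrix rather than for \(\mathbf{G }\).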
B.1: Estimation of the PS-ANOVA-SAR(AR1) model via the SAP algorithm
Fixed and random effects in model (12) are estimated (conditional on the correlation parameters and variance components) using the standard mixed model theory (see Searle et al. 1992):
$$\begin{aligned} \widehat{\varvec{\beta }}&= (\mathbf{X }'\mathbf{V }^{-1}\mathbf{X })^{-1}\mathbf{X }'\mathbf{V }^{-1}(\mathbf{A }_N \otimes \mathbf{I }_{T}) \mathbf y \end{aligned}$$
(15)
$$\begin{aligned} \widehat{\varvec{\alpha }}&= \mathbf{G }\mathbf{Z }'\mathbf{V }^{-1}((\mathbf{A }_N \otimes \mathbf{I }_{T}) \mathbf y -\mathbf{X }\widehat{\varvec{\beta }}), \end{aligned}$$
(16)
where \(\mathbf{V }=\frac{\sigma ^2}{1-\phi ^2} (\mathbf{I }_{N}\otimes \varvec{\varOmega })+\mathbf{Z }\mathbf{G }\mathbf{Z }'\).
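Expressions (15) and (16) can be checked against Henderson's mixed model equations in a simplified setting. The sketch below is an illustration only: it assumes white-noise errors (\(\sigma ^2\mathbf{I }\) in place of the AR(1) covariance), a single variance component, simulated design matrices, and a vector `y_star` that plays the role of \((\mathbf{A }_N \otimes \mathbf{I }_{T}) \mathbf y \).

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, q = 40, 2, 6                                   # illustrative dimensions
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Z = rng.normal(size=(n, q))
y_star = rng.normal(size=n)                          # stands for (A_N (x) I_T) y

sigma2, sigma2_nu = 0.8, 2.0                         # treated as known here
G = sigma2_nu * np.eye(q)
R = sigma2 * np.eye(n)                               # white noise (assumption)
V = R + Z @ G @ Z.T

# Estimates in the form of (15) and (16)
V_inv = np.linalg.inv(V)
beta_hat = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y_star)
alpha_hat = G @ Z.T @ V_inv @ (y_star - X @ beta_hat)

# Henderson's mixed model equations: penalized least squares on [X : Z]
C = np.hstack([X, Z])
lam = sigma2 / sigma2_nu
penalty = np.diag(np.r_[np.zeros(p), lam * np.ones(q)])
coef = np.linalg.solve(C.T @ C + penalty, C.T @ y_star)
```

Both routes give the same coefficients, which is the link between the mixed model estimates and the penalized sum of squares of Appendix A.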
Variance components (and, therefore, smoothing parameters) and correlation parameters may be estimated by maximizing the residual log-likelihood (REML) of Patterson and Thompson (1971), modified by an additional Jacobian term involving the Kronecker product \(\mathbf{A }_N \otimes \mathbf{I }_T\):
$$\begin{aligned} \ell (\sigma _{\nu _i}^2,\sigma ^2,\rho ,\phi )&= -\frac{1}{2}\log |\mathbf{V }| -\frac{1}{2}\log |\mathbf{X }'\mathbf{V }^{-1}\mathbf{X }| \nonumber \\&\quad -\frac{1}{2}\left[ (\mathbf{A }_N \otimes \mathbf{I }_T)\mathbf y \right] '(\mathbf{V }^{-1}-\mathbf{V }^{-1}\mathbf{X }(\mathbf{X }'\mathbf{V }^{-1}\mathbf{X })^{-1}\mathbf{X }'\mathbf{V }^{-1})\left[ (\mathbf{A }_N \otimes \mathbf{I }_T)\mathbf y \right] \nonumber \\&\quad + \log \vert \mathbf{A }_N \otimes \mathbf{I }_T \vert \end{aligned}$$
(17)
where the matrices \(\mathbf{V }\), \(\mathbf{X }\) and \(\mathbf{Z }\) are obtained as described above (if linear and non-linear covariates have been added, \(\mathbf{X }\) and \(\mathbf{Z }\) matrices are augmented in a suitable additive way).
Maximization of this REML function is a very complex numerical problem, especially when the number of variance components and correlation parameters is large. Rodriguez-Alvarez et al. (2015) developed an algorithm named SAP (Separation of Anisotropic Penalties), which is based on the fact that the inverse variance-covariance matrix of the random effects, \(\mathbf{G }^{-1}\), is a linear combination of precision matrices. This is the case for the PS-ANOVA-SAR(AR1) model, as we showed in (13). This expression allows us to obtain closed-form estimates for all the variance component parameters \(\sigma _{\nu _i}^2\) and \(\sigma ^2\) very efficiently. We have adapted this algorithm to also include the estimation of the \(\rho \) and \(\phi \) parameters. The steps for applying the SAP algorithm to optimize (17) can be summarized as follows:
1. Initialization. Set:
   - \(k=0\)
   - \(\hat{\varvec{\beta }}^{(k)}=\mathbf 0 ; \quad \hat{\varvec{\alpha }}^{(k)}=\mathbf 0 \)
   - \(\hat{\sigma }_{\nu _i}^{2,(k)} = 1 \quad i=1,2,\ldots ,12\)
   - \(\hat{\sigma }^{2,(k)} = \text {var}(\mathbf y )\)
   - \(\hat{\rho }^{(k)} = 0\)
2.
Compute \(\hat{\mathbf{G }}^{(k)},\hat{\mathbf{V }}^{(k)},\hat{\mathbf{P }}^{(k)},\hat{\mathbf{A }}_N^{(k)}\) matrices using next expressions:
$$\begin{aligned}&\hat{\mathbf{G }}^{-1,(k)} =\sum _{i=1}^{12}\frac{1}{\hat{\sigma }_{\nu _i}^{2,(k)}}\varvec{\varLambda }_i^{(k)} \\&\hat{\mathbf{V }}^{(k)} = \hat{\sigma }^{2,(k)}\mathbf{I }_{NT}+\mathbf{Z }\hat{\mathbf{G }}^{(k)}\mathbf{Z }\\&\hat{\mathbf{P }}^{(k)} = \hat{\mathbf{V }}^{-1,(k)} - \hat{\mathbf{V }}^{-1,(k)} \mathbf{X }(\mathbf{X }^\prime \hat{\mathbf{V }}^{-1,(k)}\mathbf{X })^{-1} \mathbf{X }^\prime \hat{\mathbf{V }}^{-1,(k)} \\&\hat{\mathbf{A }}_N^{(k)} = \mathbf{I }_N-\hat{\rho }^{(k)}\mathbf{W }_N \end{aligned}$$
3. Compute the estimates:
$$\begin{aligned} \hat{\varvec{\beta }}^{(k)}&= (\mathbf{X }^\prime \hat{\mathbf{V }}^{-1,(k)}\mathbf{X })^{-1} (\mathbf{X }^\prime \hat{\mathbf{V }}^{-1,(k)}\hat{\mathbf{A }}_N^{(k)}\mathbf y ) \\ \hat{\varvec{\alpha }}^{(k)}&= \hat{\mathbf{G }}^{(k)}\mathbf{Z }^\prime \hat{\mathbf{V }}^{-1,(k)}(\hat{\mathbf{A }}_N^{(k)}\mathbf y -\mathbf{X }\hat{\varvec{\beta }}^{(k)}) \\ ed_i^{(k)}&= \text {trace}(\mathbf{Z }^\prime \hat{\mathbf{P }}^{(k)}\mathbf{Z }\hat{\mathbf{G }}^{(k)} \frac{1}{\hat{\sigma }_{\nu _i}^{2,(k)}}\varvec{\varLambda }_i\hat{\mathbf{G }}^{(k)}) \quad i=1,2,\ldots ,12 \end{aligned}$$
where \(\varvec{\varLambda }_i\), \(i=1,\ldots ,12\), are defined in (14).
4. Estimate the variance components:
$$\begin{aligned} \hat{\sigma }_{\nu _i}^{2,(k+1)}&= \frac{{\hat{\varvec{\alpha }}^{(k)^\prime }} \varvec{\varLambda }_{i} \hat{\varvec{\alpha }}^{(k)}}{ed_i^{(k)}} \quad i=1,\ldots ,12 \end{aligned}$$
Estimate the variance of the noise as:
$$\begin{aligned} \hat{\sigma }^{2,(k+1)} = \frac{(\hat{\mathbf{A }}_N^{(k)}\mathbf y - \mathbf{X }\hat{\varvec{\beta }}^{(k)} - \mathbf{Z }\hat{\varvec{\alpha }}^{(k)})^\prime (\hat{\mathbf{A }}_N^{(k)}\mathbf y - \mathbf{X }\hat{\varvec{\beta }}^{(k)} - \mathbf{Z }\hat{\varvec{\alpha }}^{(k)})}{N-\sum _i ed_i^{(k)}- \text {rank}(\mathbf{X })-2} \end{aligned}$$
5. Estimate the spatial parameter \(\hat{\rho }^{(k+1)}\) and the serial correlation parameter \(\hat{\phi }^{(k+1)}\) by numerically solving the non-linear equations obtained by setting to zero the score of the REML function with respect to \(\rho \) and \(\phi \), respectively (this additional step is the only difference with respect to the usual SAP algorithm):
$$\begin{aligned} \frac{\partial \ell (.) }{\partial \rho }&= -\frac{1}{2} \left[ 2 \hat{\mathbf{P }}^{(k)} \left( (\mathbf{A }_N \otimes \mathbf{I }_T)\mathbf y \right) \right] ^\prime \left( \frac{\partial (\mathbf{A }_N \otimes \mathbf{I }_T)}{\partial \rho } \mathbf y \right) \\&\quad + \text {trace} \left( (\mathbf{A }_N \otimes \mathbf{I }_T)^{-1} \frac{\partial (\mathbf{A }_N \otimes \mathbf{I }_T)}{\partial \rho } \right) \\&= \mathbf y ^\prime (\mathbf{A }_N^\prime \otimes \mathbf{I }_T) \hat{\mathbf{P }}^{(k)} (\mathbf{W }_N\otimes \mathbf{I }_T)\mathbf y - T\text {trace}(\mathbf{A }_N^{-1}\mathbf{W }_N) = 0\\ \frac{\partial \ell (.)}{\partial \phi }&= -\frac{1}{2} \left\{ \text {trace} \left( \mathbf{P }\frac{\partial \mathbf{V }}{\partial \phi } \right) - \left[ \left( \mathbf{A }_N \otimes \mathbf{I }_T \right) \mathbf y - \mathbf{X }\hat{\varvec{\beta }}\right] ^\prime \mathbf{V }^{-1}\right. \nonumber \\&\quad \times \left. \frac{\partial \mathbf{V }}{\partial \phi } \mathbf{V }^{-1} \left[ \left( \mathbf{A }_N \otimes \mathbf{I }_T \right) \mathbf y - \mathbf{X }\hat{\varvec{\beta }} \right] \right\} = 0 \end{aligned}$$
where:
$$\begin{aligned} \frac{\partial \mathbf{V }}{\partial \phi } = \frac{ \partial \left\{ \mathbf{Z }\mathbf{G }\mathbf{Z }^\prime + \frac{\sigma ^2}{1-\phi ^2} \left( \mathbf{I }_N \otimes \varvec{\varOmega }\right) \right\} }{\partial \phi }= \left( \mathbf{I }_N \otimes \frac{\partial \left[ (\frac{\sigma ^2}{1-\phi ^2})\varvec{\varOmega }\right] }{\partial \phi } \right) \end{aligned}$$
and
6. Set \(k=k+1\) and return to step 2 until convergence.
Once convergence is reached, the effective degrees of freedom of the model can be estimated as:
$$\begin{aligned} \text {edf}=\sum _i ed_i^{(k)} + \text {rank}(\mathbf{X })+2 \end{aligned}$$
This quantity is increased by two units with respect to spatio-temporal smooth models because of the need to estimate \(\rho \) and \(\phi \) parameters.
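The variance-component updates of steps 2 to 4 can be sketched in a deliberately simplified setting. The code below is not the authors' implementation: it assumes a single variance component (the model of Appendix A, with \(\mathbf{G }=\sigma _\nu ^2\mathbf{I }\)), no spatial lag and no AR(1) errors, so step 5 is omitted and the denominator of the noise variance carries no extra \(-2\). The function name `fit_variance_components` and the simulated data are hypothetical.

```python
import numpy as np

def fit_variance_components(y, X, Z, max_iter=200, tol=1e-8):
    """Fixed-point REML updates for y = X beta + Z alpha + eps, alpha ~ N(0, s2_nu I)."""
    n, q = Z.shape
    sigma2, sigma2_nu = np.var(y), 1.0               # initial values as in step 1
    for _ in range(max_iter):
        # Step 2: G, V and P for the current variance estimates
        G = sigma2_nu * np.eye(q)
        V_inv = np.linalg.inv(sigma2 * np.eye(n) + Z @ G @ Z.T)
        VX = V_inv @ X
        P = V_inv - VX @ np.linalg.solve(X.T @ VX, VX.T)
        # Step 3: fixed effects, random effects and effective dimension
        beta = np.linalg.solve(X.T @ VX, VX.T @ y)
        alpha = G @ Z.T @ V_inv @ (y - X @ beta)
        ed = np.trace(Z.T @ P @ Z @ G)
        # Step 4: closed-form updates of the variance components
        resid = y - X @ beta - Z @ alpha
        new_sigma2_nu = alpha @ alpha / ed
        new_sigma2 = resid @ resid / (n - ed - np.linalg.matrix_rank(X))
        change = abs(new_sigma2 - sigma2) + abs(new_sigma2_nu - sigma2_nu)
        sigma2, sigma2_nu = new_sigma2, new_sigma2_nu
        if change < tol:
            break
    return beta, alpha, sigma2, sigma2_nu, ed

# Simulated example (illustrative values only)
rng = np.random.default_rng(3)
n, q = 200, 10
X = np.column_stack([np.ones(n), rng.uniform(size=n)])
Z = rng.normal(size=(n, q))
y = (X @ np.array([1.0, -2.0])
     + Z @ rng.normal(scale=2.0, size=q)
     + rng.normal(scale=0.5, size=n))

beta, alpha, sigma2, sigma2_nu, ed = fit_variance_components(y, X, Z)
edf = ed + np.linalg.matrix_rank(X)                  # no "+2": rho and phi are absent here
```

In the full model the same loop runs over the twelve variance components, with \(\varvec{\varLambda }_i\) entering both the effective dimensions and the quadratic forms, and is followed at each iteration by the numerical update of \(\rho \) and \(\phi \).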
To obtain the covariance matrix of the estimates, we need the Hessian matrix of the REML function with respect to the \(\rho \) and \(\phi \) parameters, given by the expressions:
$$\begin{aligned} \frac{\partial ^2 l(.)}{\partial \rho ^2}&= - \mathbf y ^\prime \left( \mathbf{W }_N^\prime \otimes \mathbf{I }_T \right) \mathbf{P }\left( \mathbf{W }_N \otimes \mathbf{I }_T \right) \mathbf y - T\text {trace}\left( (\mathbf{A }_N^{-1}\mathbf{W }_N)^2\right) \\ \frac{\partial ^2 l(.)}{\partial \phi ^2}&= -\frac{1}{2} \left\{ \frac{\partial \text {trace} \left( \mathbf{P }\frac{\partial \mathbf{V }}{\partial \phi } \right) }{\partial \phi } - \left[ \left( \mathbf{A }_N \otimes \mathbf{I }_T \right) \mathbf y - \mathbf{X }\hat{\varvec{\beta }}\right] ^\prime \right. \nonumber \\&\quad \times \left. \frac{\partial \left( \mathbf{V }^{-1} \frac{\partial \mathbf{V }}{\partial \phi } \mathbf{V }^{-1} \right) }{\partial \phi } \left[ \left( \mathbf{A }_N \otimes \mathbf{I }_T \right) \mathbf y - \mathbf{X }\hat{\varvec{\beta }} \right] \right\} \\ \frac{\partial ^2 l(.)}{\partial \phi \partial \rho }&= \mathbf y ^\prime \left( \mathbf{W }_N^\prime \otimes \mathbf{I }_T \right) \mathbf{V }^{-1} \frac{\partial \mathbf{V }}{\partial \phi } \mathbf{V }^{-1} \left[ \left( \mathbf{A }_N \otimes \mathbf{I }_T \right) \mathbf y - \mathbf{X }\hat{\varvec{\beta }} \right] \end{aligned}$$
where:
$$\begin{aligned}&\frac{\partial \text {trace} \left( \mathbf{P }\frac{\partial \mathbf{V }}{\partial \phi } \right) }{\partial \phi } = \text {trace} \left( \frac{\partial (\mathbf{P }\frac{\partial \mathbf{V }}{\partial \phi })}{\partial \phi } \right) =\text {trace} \left( \frac{\partial \mathbf{P }}{\partial \phi } \frac{\partial \mathbf{V }}{\partial \phi } + \mathbf{P }\frac{\partial ^2 \mathbf{V }}{\partial \phi ^2} \right) \\&\frac{\partial \mathbf{V }}{\partial \phi } = \left( \mathbf{I }_N \otimes \frac{\partial \left[ (\frac{\sigma _{\epsilon }^2}{1-\phi ^2})\varvec{\varOmega }\right] }{\partial \phi } \right) \qquad \frac{\partial ^2 \mathbf{V }}{\partial \phi ^2} = \left( \mathbf{I }_N \otimes \frac{\partial ^2 \left[ (\frac{\sigma _{\epsilon }^2}{1-\phi ^2})\varvec{\varOmega }\right] }{\partial \phi ^2} \right) \\&\frac{\partial \mathbf{P }}{\partial \phi } = - \mathbf{V }^{-1} \frac{\partial \mathbf{V }}{\partial \phi } \mathbf{V }^{-1} - \left( -\mathbf{V }^{-1}\frac{\partial \mathbf{V }}{\partial \phi }\mathbf{V }^{-1} \mathbf{X }(\mathbf{X }^\prime \mathbf{V }^{-1} \mathbf{X })^{-1} \mathbf{X }^\prime \mathbf{V }^{-1} \right. \\&\quad + \mathbf{V }^{-1} \mathbf{X }(\mathbf{X }^\prime \mathbf{V }^{-1} \mathbf{X })^{-1} \mathbf{X }^\prime \mathbf{V }^{-1} \frac{\partial \mathbf{V }}{\partial \phi } \mathbf{V }^{-1} \mathbf{X }(\mathbf{X }^\prime \mathbf{V }^{-1} \mathbf{X })^{-1} \mathbf{X }^\prime \mathbf{V }^{-1}\\&\quad \left. - \mathbf{V }^{-1} \mathbf{X }(\mathbf{X }^\prime \mathbf{V }^{-1} \mathbf{X })^{-1} \mathbf{X }^\prime \mathbf{V }^{-1}\frac{\partial \mathbf{V }}{\partial \phi }\mathbf{V }^{-1} \right) \end{aligned}$$
These expressions can be evaluated at the maximum of the REML function to obtain the negative of the Hessian matrix. The inverse of this matrix provides the asymptotic covariance matrix in the usual way.
Finally, the covariance matrix of \(\rho \) and \(\phi \), together with the covariance matrix of the regression parameters \(\varvec{\beta }\) and \(\varvec{\alpha }\) given by \(Cov(\varvec{\beta },\varvec{\alpha })=\mathbf{C }^{-1}\) (see Sect. 3), can be used to obtain the simulated distributions of total, direct and indirect effects as explained in Sect. 2. As usual, REML estimates are asymptotically unbiased and Gaussian distributed.
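The trace terms in the score and Hessian with respect to \(\rho \) come from the Jacobian term \(\log \vert \mathbf{A }_N \otimes \mathbf{I }_T \vert = T\log \vert \mathbf{A }_N\vert \). The following sketch checks them by finite differences; the ring-shaped, row-standardised weight matrix, the dimensions and the value of \(\rho \) are illustrative assumptions.

```python
import numpy as np

N, T = 6, 4                                   # illustrative numbers of units and periods

# Ring-shaped, row-standardised neighbour matrix (assumption)
W = np.zeros((N, N))
for i in range(N):
    W[i, (i - 1) % N] = 0.5
    W[i, (i + 1) % N] = 0.5

def A_N(rho):
    return np.eye(N) - rho * W

def log_jacobian(rho):
    # log|A_N (x) I_T| evaluated on the full NT x NT matrix
    return np.linalg.slogdet(np.kron(A_N(rho), np.eye(T)))[1]

rho, h = 0.4, 1e-4
AinvW = np.linalg.solve(A_N(rho), W)          # A_N^{-1} W_N

# Analytic first and second derivatives of the Jacobian term
first_analytic = -T * np.trace(AinvW)
second_analytic = -T * np.trace(AinvW @ AinvW)

# Central finite differences
first_numeric = (log_jacobian(rho + h) - log_jacobian(rho - h)) / (2 * h)
second_numeric = (log_jacobian(rho + h) - 2 * log_jacobian(rho)
                  + log_jacobian(rho - h)) / h**2
```

In practice only the \(N\times N\) matrix \(\mathbf{A }_N\) needs to be factorized, since the Kronecker structure multiplies the log-determinant and the traces by \(T\).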