A simulation experiment was performed to assess the ability of the goodness-of-fit test to detect misspecification of the covariance structure of a model. A total of \(n=30\) pairs of longitude and latitude values, \(\varvec{s}_i=(s_{i1}, s_{i2}), \,\, i=1,\ldots , 30\), were sampled randomly from one of three subsets within the unit square. Within the first subset, \(Q_1\), 5 locations were generated uniformly from the lower left [0, 0.2] \(\times \) [0, 0.2] portion of the unit square. In the second subset, \(Q_2\), 10 locations were sampled uniformly from the lower right [0.8, 1] \(\times \) [0, 0.2] portion of the unit square. Finally, in subset \(Q_3\), 15 locations were sampled uniformly from the entire unit square. The motivation, as noted by Jun et al. (2014), is that the fit of a spatial model is best tested by comparing its fit in distinct regions (where its local smoothness properties can be evaluated) with its fit to points distributed throughout the domain (where its global features can be evaluated). Subsets \(Q_1\) and \(Q_2\) provided clusters of locations that allow for the assessment of local model fit, whereas subset \(Q_3\) provided dispersed locations for assessing global model fit. Figure 1 shows the simulated locations and the corresponding subsets.
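The sampling scheme above can be sketched as follows. This is a minimal illustration in Python (the original analysis was carried out in R); the helper name `runif2` and the fixed seed are our own choices, not part of the original study:

```python
import random

random.seed(1)  # arbitrary seed, for reproducibility of the sketch only

def runif2(xlo, xhi, ylo, yhi, n):
    """Sample n locations uniformly from the rectangle [xlo, xhi] x [ylo, yhi]."""
    return [(random.uniform(xlo, xhi), random.uniform(ylo, yhi)) for _ in range(n)]

# Q1: 5 locations in the lower-left corner [0, 0.2] x [0, 0.2]
Q1 = runif2(0.0, 0.2, 0.0, 0.2, 5)
# Q2: 10 locations in the lower-right corner [0.8, 1] x [0, 0.2]
Q2 = runif2(0.8, 1.0, 0.0, 0.2, 10)
# Q3: 15 locations anywhere in the unit square
Q3 = runif2(0.0, 1.0, 0.0, 1.0, 15)

locations = Q1 + Q2 + Q3  # n = 30 locations in total
```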
Three datasets were simulated using the spatio-temporal process defined in Sect. 2. The mean process, \(\varvec{\mu }_t\), was set to zero, to allow for detection of model misspecification through the covariance structure only. Observed data \(\{\varvec{y}_t \}\) were simulated for \(t = 1, \ldots , 5\) time points.
Three variants of the general Matérn correlation function
$$\begin{aligned} R(\varvec{s}_i, \varvec{s}_j; \nu ,\phi ) = \frac{1}{\Gamma (\nu )2^{\nu - 1}} \bigg (\frac{\sqrt{8\nu }}{\phi }||\varvec{s}_i - \varvec{s}_j|| \bigg )^\nu K_{\nu } \bigg ( \frac{\sqrt{8\nu }}{\phi }||\varvec{s}_i - \varvec{s}_j|| \bigg ), \end{aligned}$$
(11)
with closed-form expressions were used to construct the covariance matrix \(\sigma ^2_{\omega }\varvec{R}\). In the equation above, \(\nu >0\) controls the smoothness of the realised random field, \(\phi \) is a spatial scale parameter, \(K_{\nu }\) is a modified Bessel function of the second kind of order \(\nu \), and \(||\varvec{s}_i - \varvec{s}_j||\) is the Euclidean distance between the locations (Banerjee et al. 2014).
The first variant of \(R(\varvec{s}_i, \varvec{s}_j; \nu ,\phi )\),
$$\begin{aligned} R(\varvec{s}_i,\varvec{s}_j;\phi ) = \exp \bigg (\frac{-||\varvec{s}_i - \varvec{s_j}||}{\phi }\bigg ), \end{aligned}$$
(12)
is the closed form of the Matérn correlation function obtained when the smoothness parameter \(\nu \) is set to 0.5, and is known as the exponential correlation function. The second variant,
$$\begin{aligned} R(\varvec{s}_i,\varvec{s}_j; \phi ) = \exp \bigg [-\bigg (\frac{||\varvec{s}_i - \varvec{s_j}||}{\phi }\bigg )^2\bigg ], \end{aligned}$$
(13)
is the closed form of the Matérn correlation function obtained in the limit as the smoothness parameter \(\nu \rightarrow \infty \), and is known as the Gaussian correlation function. The third variant,
$$\begin{aligned} R(\varvec{s}_i,\varvec{s}_j; \phi ) = s_{i2}s_{j2}\exp \bigg (\frac{-||\varvec{s}_i - \varvec{s_j}||}{\phi }\bigg ), \end{aligned}$$
(14)
is a non-stationary form of the exponential correlation function given by Eq. 12 that allows the correlation between observations separated by distance \(||\varvec{s}_i - \varvec{s}_j||\) to be scaled by their latitudes, \(s_{i2}\) and \(s_{j2}\).
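The three correlation variants in Eqs. 12–14 can be implemented directly. The sketch below is in Python (the paper's analysis was done in R) and assumes each location is stored as a (longitude, latitude) pair:

```python
import math

def dist(si, sj):
    """Euclidean distance between two (longitude, latitude) locations."""
    return math.hypot(si[0] - sj[0], si[1] - sj[1])

def R_exp(si, sj, phi):
    """Exponential correlation (Eq. 12): Matern with nu = 0.5."""
    return math.exp(-dist(si, sj) / phi)

def R_gauss(si, sj, phi):
    """Gaussian correlation (Eq. 13): Matern limit as nu -> infinity."""
    return math.exp(-(dist(si, sj) / phi) ** 2)

def R_nonstat(si, sj, phi):
    """Non-stationary exponential correlation (Eq. 14), scaled by latitudes."""
    return si[1] * sj[1] * math.exp(-dist(si, sj) / phi)

def corr_matrix(locs, R, phi):
    """Build the full correlation matrix over a list of locations."""
    return [[R(si, sj, phi) for sj in locs] for si in locs]
```

Note that under Eq. 14 the diagonal entries equal \(s_{i2}^2\) rather than 1, which is what makes the function non-stationary.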
The following parameters were chosen to simulate the data \(\varvec{y}_t\). The measurement variance (nugget variance) was set to \(\sigma ^2_{\varepsilon } = 0.0001\). A small value was chosen to focus on identifying an incorrect spatio-temporal covariance structure. Further, we set \(\rho = 0.7\) and \(\sigma ^2_{\omega } = 1\). Finally, we set \(\phi = 0.2\) in Eqs. 12 and 14, and \(\phi = 0.8\) in Eq. 13.
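With these parameter values, data generation can be sketched as follows. Since Sect. 2 is not reproduced here, the sketch assumes a first-order autoregressive latent process \(\varvec{\omega }_t = \rho \varvec{\omega }_{t-1} + \varvec{\eta }_t\) with \(\varvec{\eta }_t \sim \text {N}(\varvec{0}, \sigma ^2_{\omega }\varvec{R})\), observed as \(\varvec{y}_t = \varvec{\omega }_t + \varvec{\varepsilon }_t\); this form, and the initialisation of \(\varvec{\omega }_1\), are assumptions rather than the paper's exact specification:

```python
import math
import random

random.seed(2)  # arbitrary seed for the sketch

def R_exp(si, sj, phi):
    """Exponential correlation (Eq. 12)."""
    d = math.hypot(si[0] - sj[0], si[1] - sj[1])
    return math.exp(-d / phi)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def mvn_draw(L):
    """Draw from N(0, L L') given the Cholesky factor L."""
    z = [random.gauss(0.0, 1.0) for _ in range(len(L))]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(len(L))]

def simulate(locs, phi=0.2, rho=0.7, sig2_w=1.0, sig2_e=1e-4, T=5):
    """Simulate y_1, ..., y_T under the assumed AR(1) spatio-temporal process."""
    n = len(locs)
    cov = [[sig2_w * R_exp(locs[i], locs[j], phi) for j in range(n)] for i in range(n)]
    L = cholesky(cov)
    w = mvn_draw(L)  # omega_1 ~ N(0, sigma^2_w R): an assumed initialisation
    ys = []
    for t in range(T):
        if t > 0:
            w = [rho * wi + ei for wi, ei in zip(w, mvn_draw(L))]  # AR(1) update
        eps = [random.gauss(0.0, math.sqrt(sig2_e)) for _ in range(n)]  # nugget
        ys.append([wi + e for wi, e in zip(w, eps)])
    return ys
```

Swapping `R_exp` for the Gaussian or non-stationary variants (with \(\phi = 0.8\) for the former) would generate the second and third datasets.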
We fitted the spatio-temporal model given in Sect. 2 with the covariance function in Eq. 12 (exponential correlation function) to each of the three datasets. We excluded covariates, with only a single intercept term included in the mean function, such that \(\varvec{\mu }_t = \varvec{1}_{30} \beta \), where \(\varvec{1}_{30}\) is a vector of 1’s. The parameters \(\varvec{\theta } = (\beta , \sigma ^2_{\varepsilon }, \sigma ^2_{\omega }, \phi , \rho )'\) were assumed a priori independent, and were assigned the non-informative prior distributions,
$$\begin{aligned} \beta \sim \text {N}(0, 100), ~~ \sigma ^2_{\varepsilon } \sim \text {IG}(2,1), ~~ \sigma ^2_{\omega } \sim \text {IG}(2,1), ~~ \phi \sim \text {U}(0.001, 2), ~~ \rho \sim \text {U}(-1,1). \end{aligned}$$
Markov chain Monte Carlo (MCMC) was used to fit the model to the data; this was done in R using the package NIMBLE (NIMBLE Development Team 2017). Two chains of 100,000 iterations each were generated for the parameter vector \(\varvec{\theta } = (\beta , \rho , \phi , \sigma ^2_{\omega },\sigma ^2_{\varepsilon })^T\) for each dataset. The first 90,000 iterations from each chain were discarded as warm-up, and the remaining draws were combined, resulting in a posterior sample of size \(M=20{,}000\). Convergence of the Markov chains was assessed using traceplots (not provided) and potential scale reduction factor (\({\hat{R}}\)) values; we took a value of \({\hat{R}}>1.1\) to indicate lack of convergence. For each fitted model, pivotal quantities were calculated for every posterior sample in line with Eq. 7. We considered three cases of partitioning to assess the impact partitioning has on testing goodness-of-fit. In the first case, the locations were not partitioned into subsets; pivotal quantities \(S(\varvec{y}_t, \tilde{\varvec{\theta }}^{(m)})\) for \(t=1, \ldots , 5\) and \(m = 1, \ldots , 20{,}000\) were calculated, combined, and ordered. In the second case, the locations were partitioned into \(C = 3\) subsets of equal size \(w=10\); pivotal quantities \(S(\varvec{y}_{tj}, \tilde{\varvec{\theta }}^{(m)})\) for \(t=1, \ldots , 5\), \(j=1, 2, 3\), and \(m = 1, \ldots , 20{,}000\) were calculated, combined, and ordered. In the final case, the locations were partitioned into the unequally sized subsets \(Q_1\), \(Q_2\), and \(Q_3\) that were used to simulate the locations, and the pivotal quantities were calculated, combined, and ordered in the same way.
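Eq. 7 is not reproduced in this section. Assuming the pivotal discrepancy takes the usual Mahalanobis form \(S(\varvec{y}, \varvec{\theta }) = (\varvec{y} - \varvec{\mu })' \varvec{\Sigma }(\varvec{\theta })^{-1} (\varvec{y} - \varvec{\mu })\), which follows a \(\chi ^2\) distribution with degrees of freedom equal to the number of observations when the model is correct, it could be computed for each subset and posterior draw as in this sketch:

```python
import math

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L x = b for lower-triangular L by forward substitution."""
    x = [0.0] * len(b)
    for i in range(len(b)):
        x[i] = (b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i]
    return x

def pivotal_quantity(y, mu, Sigma):
    """S(y, theta) = (y - mu)' Sigma^{-1} (y - mu), computed via the
    Cholesky factor: with z = L^{-1}(y - mu), S = z'z."""
    L = cholesky(Sigma)
    r = [yi - mi for yi, mi in zip(y, mu)]
    z = forward_solve(L, r)
    return sum(zi * zi for zi in z)
```

The 20,000 draws of \(S\) for each time point (and subset, where applicable) would then be pooled and ordered, and their 10th and 90th percentiles compared to the nominal \(\chi ^2\) percentiles.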
Table 1 True values and posterior summary statistics of parameters for simulated data

Table 1 displays the true parameter values that were used in the simulation of the data, as well as the summary statistics obtained from the posterior draws when the model was applied to the three datasets. \({\hat{R}}\) values for assessing convergence are also shown. The results show that the summary statistics for the model applied to dataset 1 are the closest to the true values; the 95% credible intervals include the true values for \(\beta \), \(\phi \), and \(\sigma ^2_{\omega }\).
Table 2 The 10th and 90th percentiles of ordered pivotal discrepancy measures for the model applied to each of the three datasets, compared to the nominal 10th and 90th percentiles: 12.76 and 56.33 for the non-subset data; 1.827 and 27.11 for the even subset data; 0.4894 and 31.71 for the uneven subset data

Table 2 gives the 10th and 90th percentiles of the aggregated (over time and subset) ordered pivotal discrepancy measures for the model applied to each of the three datasets, in each of the three cases of subsetting. For the model to be judged a good fit, these percentiles must lie within the interval formed by the corresponding percentiles of the nominal \(\chi ^2\) distributions under which the pivotal quantities were calculated.
In the first case, when the locations were not partitioned for calculation of the pivotal quantities, the 10th and 90th percentiles of the ordered pivotal discrepancy quantities fell within the corresponding nominal percentiles of (12.76, 56.33), suggesting that the model provided a good fit to each of the three simulated datasets. In the second case, when the locations were partitioned into three even subsets, the 10th and 90th percentiles of the aggregated ordered pivotal discrepancy quantities fell within the corresponding nominal percentiles of (1.827, 27.11) when the model was applied to dataset 1, but outside the nominal percentiles when the model was applied to datasets 2 and 3. This suggests that the model provides a good fit only to dataset 1. A similar result was observed for the final case, with the locations partitioned into three unequal subsets: the 10th and 90th percentiles fell within the corresponding nominal percentiles of (0.4894, 31.71) for dataset 1, but outside them for datasets 2 and 3, again suggesting that the model provides a good fit only to dataset 1.
We expected the model to provide a good fit only to dataset 1, because the model used to generate that dataset is the same as the one being fitted. This is what was observed in the two cases with partitioning. In the first case, however, the model appeared to provide a good fit to every dataset, because the lack of partitioning caused a decrease in the power to detect the differences. This is highlighted in Figs. 2, 3 and 4, in which the pivotal discrepancy quantities from the model applied to each dataset in each case of partitioning are plotted as a density and overlaid with the nominal densities. For each dataset, when no partitioning occurs there is sufficient overlap between the observed pivotal quantities and the nominal densities to suggest the model provides a good fit. This is also the case under the partitioning scenarios for dataset 1, but not for datasets 2 and 3.
For comparison, we calculated WAIC values when fitting the model with an exponential correlation function to all three datasets. The results in Table 2 show the smallest WAIC value for dataset 3. This result is counterintuitive: compared to the other two datasets, dataset 3 has the worst model misspecification, and we would therefore expect it to have the highest WAIC value.