Spatial modeling of day-within-year temperature time series: an examination of daily maximum temperatures in Arag\'on, Spain

Acknowledging a considerable literature on modeling daily temperature data, we propose a multi-level spatio-temporal model which introduces several innovations in order to explain the daily maximum temperature in the summer period over 60 years in a region containing Arag\'on, Spain. The model operates over continuous space but adopts two discrete temporal scales, year and day within year. It captures temporal dependence through autoregression on days within year and also on years. Spatial dependence is captured through spatial process modeling of intercepts, slope coefficients, variances, and autocorrelations. The model is expressed in a form which separates fixed effects from random effects and also separates space, years, and days for each type of effect. Motivated by exploratory data analysis, fixed effects to capture the influence of elevation, seasonality and a linear trend are employed. Pure errors are introduced for years, for locations within years, and for locations at days within years. The performance of the model is checked using a leave-one-out cross-validation. Applications of the model are presented including prediction of the daily temperature series at unobserved or partially observed sites and inference to investigate climate change comparison.


Introduction
Evidence of global warming in the climate system is strong and many of the observed changes since the 1950s are unprecedented, with an estimated anthropogenic increase of 0.2 • C per decade due to past and ongoing emissions (IPCC, 2013(IPCC, , 2018. Climate change raises significant concerns as it may result in health problems and death, degradation of flora and fauna biodiversity, reductions in crop production and increase of pests, etc. In this framework, the analysis of daily maximum temperatures and their long-term trends over time is particularly important due to the strong potential impact on public health (Roldán et al., 2016;Rossati, 2017;Watts et al., 2015), agriculture (Hatfield et al., 2011;Schlenker and Roberts, 2009), and economy (Diffenbaugh and Burke, 2019).
We propose a new multi-level spatio-temporal model to explain the daily maximum temperature in the summer period, in an area containing the Comunidad Autónoma de Aragón in the northeast of Spain. The region includes part of the Ebro Valley in the center, with mountainous areas in the south (Iberian System) and north (Pyrenees). The valley is an extensively irrigated production area with garden crops, fruits and vegetables, as well as rainfed agriculture with cereals, almonds, wine and oil. In the mountainous areas there are some protected natural spaces with extensive forests and a high diversity of landscapes. It is an area of great biodiversity with important water resources for the region. Despite its relatively small size, spatio-temporal modeling of the temperatures in this region is a challenge due to the heterogeneous orography and the climatic variability.
The spatio-temporal model seeks to characterize spatial patterns and detect trends over time in the daily maximum temperature during the summer period. It is specified over continuous space but adopts two discrete units of time, years and days within years. This allows us to model the time evolution of daily maximum temperatures during the summer, omitting the cooler months that are not of interest here. The model introduces temporal dependence using autoregression terms for days within years and also for years. The model separates fixed and random effects in the mean. Fixed effects capture the global mean, the seasonal component across days, the average long-term trend across years, and the influence of elevation. Random effects are employed for the spatial dependence in the intercepts, the slope coefficients, the autoregression coefficients, and the variances of the responses. The two temporal scales allow us to separate space, years, and days within years for each type of effect. Three pure error processes are adopted, one for locations at days within years, one for locations within years, and one for years. The full specification is motivated by exploratory analyses. Altogether, the model provides a better understanding of the temporal evolution of temperatures for the entirety of the region along with the spatial uncertainty linked to those features.
The model is specified in a hierarchical Bayesian framework and estimated using a Markov chain Monte Carlo (MCMC) algorithm. In this framework, posterior predictive distributions for the features of daily maximum temperatures (trends, persistence, mean, variance, etc.) can be readily obtained. In particular, we can obtain posterior predictive samples of the spatial processes and the daily maximum temperature series at unobserved sites. Prediction at unobserved sites is particularly important in Aragón since this region is sparsely monitored due to rural depopulation; there is a lack of observed series in many areas of interest. The model can also be used to impute periods of missing observations in a series.
Space-time modeling of environmental series has received substantial attention in the literature. Sahu et al. (2006) proposed a random effects model for fine particulate matter concentrations in the midwestern United States. Sahu et al. (2007) proposed a space-time hierarchical model for daily 8-hour maximum ozone levels in the state of Ohio. This model includes an autoregressive part for the residuals of the fixed effects, a global annual intercept and a spatially correlated error term. Lemos et al. (2007) modeled monthly water temperature data in a Central California Estuary. They used a Bayesian approach to separate the seasonal cycle, short-term fluctuations, and long-term trends by means of local mixtures of two patterns. With regard to temperature models, Craigmile and Guttorp (2011) built space-time hierarchical Bayesian models using daily mean temperatures in Central Sweden that emphasize modeling trend through a wavelet specification, as well as seasonality, and error that may exhibit space-time long-range dependence. Verdin et al. (2015) modeled maximum and minimum temperature to develop a weather generator using spatial Gaussian processes (GPs), where both temperature models are autoregressive with spatially varying model coefficients and spatial correlation. Li et al. (2020) proposed a three step space-time regression-kriging model for monthly average temperature data. With such data, they first remove seasonality, then they regress the revised data on environmental predictors, and finally they take the resulting residuals and administer spatio-temporal variogram modeling. By contrast, models for daily temperatures take a different approach, seeking to explicitly express short-term persistence of temperature. They employ autoregressive terms, e.g., the one-point model by Mohammadi et al. (2021). A modeling approach very different from our mean specification considers extremes in the daily temperature series and leads to extreme value modeling under the block maxima framework or peaks-over-threshold framework (see, e.g., Reich et al., 2014;Bopp and Shaby, 2017).
The outline of the paper is as follows. An exploratory analysis to motivate the complexity of the model is given in Section 2. Section 3 describes the modeling details, and Section 4 presents a leave-one-out cross-validation (LOOCV) analysis for model comparison as well as some results and applications for the selected model. Section 5 ends the paper with some conclusions and future work. Supplementary Materials accompanying this paper appear online.

Data and exploratory analysis
The point-referenced dataset we use contains 18 daily maximum temperature observational series from AEMET (the Spanish Meteorological Office) around the Comunidad Autónoma de Aragón (see Figure 1). The time series include the daily observations from May to September (MJJAS), corresponding to the extended summer period, and span the period from 1956 to 2015. The region of interest is located in the central portion of the Ebro Basin in the northeastern part of Spain and has an area of 53,279 km 2 , wherein the areas above 500 m and 1,000 m are 32,924 km 2 and 15,195 km 2 respectively. The maximum elevation is roughly 3,400 m in the Pyrenees, 2,600 m in the Iberian System, and between 200 and 400 m in the Central Valley. Most of the area is characterized by a Mediterranean-Continental dry climate with irregular rainfall and a large temperature range. However, climate differences can be distinguished by elevation and the influence from the Mediterranean Sea in the east as well as the continental conditions of the Iberian Central Plateau in the southwest (AEMET, 2011).
We summarize an extensive exploratory data analysis of the daily maximum temperature series that helps us establish the covariates and spatio-temporal structures that are candidates for inclusion in the model. The top plots in Figure 2 show the variability in temperature characteristics and the influence of elevation on them. The two plots on the left show the mean and the standard deviation of temperature at each site against elevation. The mean temperature shows an approximately linear decreasing relation with elevation, varying from almost 30 to 18 • C. However, there exist other influential factors, e.g., Sallent in the north and Tornos in the south have both an elevation around 1,000 m but a quite lower mean temperature is observed for the latter (see Table S1 in Supplementary Materials).
The bottom plots in Figure 2 summarize the mean and standard deviation from data corresponding to a month in MJJAS for the 18 sites in the periods 1956-1985 and 1986-2015; the summary measures are calculated in 30-year periods following the recommendation of the WMO (2017). The seasonal pattern for all of the series is quite similar, i.e., the maximum mean temperature is observed in July and the minimum in May, with a difference of around 7 • C between them. The range of the mean temperatures among sites is around 10 • C, so the spatial variability of the mean is a bit higher than the variability at each site within the summer. The mean of the set of standard deviations is slightly higher than 4 • C. However, relevant spatial differences are observed with a range of values around 1.5 • C. Temporal variability is lower within the summer.
To explore the effect of global warming in the region, the changes between 1956-1985 and 1986-2015 periods, expressed as differences for the means and quotients for the standard deviations, are also shown on the bottom-right plot in Figure 2 and Table S1. The mean temperature in 1986-2015 has increased from 1956-1985 by roughly 1 • C, with a slightly smaller increase in the northeastern sites. The increase of the mean temperature is observed in May, June, August and, except for three sites, in July. No relevant change in the seasonal

Quotient of Sd
May June July Aug Sept Figure 2: Top: Mean value, standard deviation, annual time trend, and serial correlation against elevation for the daily maximum temperature series at the 18 sites. Bottom: Mean value and standard deviation of the series in both 30-year periods, 1956-1985 and 1986-2015, and the change between them, expressed as differences for the mean and quotients for the standard deviation.
pattern is observed. The spatial variability in the two periods is similar. As for the standard deviations, no evidence of temporal change is observed, with all of the quotients between the two periods being approximately one. The two plots on the top-right in Figure 2 summarize an exploratory analysis of the behavior of the time series over time. The first shows the slope regressed against year (expressed in • C per decade), fitted by ordinary least squares to the daily maximum temperature series in each site. Clear differences are observed in the 18 fitted trends, suggesting the need to include a spatial random effect to reflect this feature. The variability in the trends does not seem to be related to the elevation. The last plot shows the serial correlation in the temperature series. A strong correlation, higher than 0.72, is observed for all the sites but with spatial differences. The strong autocorrelation is probably caused by a persistent anticyclonic situation that tends to affect the Iberian Peninsula in the summer. Sites with a higher elevation seem to show a slightly higher persistence.
As an additional exploratory analysis, 18 hierarchical temporal models were fitted, one for each of the available sites. These local models, which are summarised in Section S1.1 of the Supplementary Materials, are useful to identify the time structures required for the temperature series and to evaluate the spatial variability of the fitted terms. The results motivate the introduction of spatially varying intercepts, trends, autoregression coefficients, and variances for the spatial variability in the model.

The Model
We propose a multi-level (i.e., hierarchical) full mean model for daily maximum temperatures that operates over continuous space and two discrete temporal scales. It captures temporal dependence through autoregression on days within year and on years. It captures spatial dependence through spatial process modeling of intercepts, slope coefficients, variances, and autocorrelations. We detail this model below and then discuss model fitting, prediction under the model, and model comparison.

Model construction
Let Y t (s) denote the daily maximum temperature for day , = 2, . . . , L of year t, t = 1, . . . , T at location s ∈ D, where D is our study region. Here, for all years, = 1 corresponds to May 1 and L = 153 corresponds to September 30. It is convenient to express the full model in a form which separates fixed effects from random effects and also carefully separates space, years, and days for each type of effect. Specifically, we model daily maximum temperature for day , year t, and location s by (1) Here, µ t (s; θ f ) denotes the fixed effects component and γ t (s) the random effects component. We specify µ t (s; θ f ) = β 0 + αt + β 1 sin(2π /365) + β 2 cos(2π /365) + β 3 elev(s) in which β 0 is a global intercept, α is a global linear trend coefficient, the sin and cos terms are introduced to provide an annual seasonal component, and elev(s) is the elevation at s. We denote these fixed effect parameters by θ f = (β 0 , α, β 1 , β 2 , β 3 ). We specify γ t (s) = β 0 (s) + α(s)t + ψ t + η t (s).
In (3), ψ t follows an AR(1) specification, i.e., ψ t = ρ ψ ψ t−1 + λ t , providing an autoregression in years for annual intercepts. This autoregression could help to capture factors yielding correlation across years such as the influence of variation in solar activity on the earth's surface temperature or the El Niño Southern Oscillation. However, in Section 3.2, we discover that ρ ψ is not significantly different from 0. We still need the ψ's in the model to address the fact that some years are warmer or colder than others but we do not need to specify them autoregressively. We denote the variance for this component by σ 2 λ . Continuing, β 0 (s) is a mean-zero GP with an exponential covariance function having variance parameter σ 2 β 0 and decay parameter φ β 0 , and α(s) is a mean-zero GP with an exponential covariance function having variance parameter σ 2 α and decay parameter φ α . Thus, β 0 (s) provides local spatial adjustment to the intercept and α(s) provides local slope adjustment to the linear trend. Due to the simplicity of linear time trends they are often used in climate studies (IPCC, 2013). Here, they provide an extremely flexible, locally linear baseline specification. Further, we add local space-time varying random effects, η t (s), to provide adjustment to this baseline. We collect the random effects parameters into θ r = (ρ ψ , σ 2 λ , σ 2 β 0 , φ β 0 , σ 2 α , φ α ). The entire specification is supplied distributionally in the form of a multi-level hierarchical model as As a result, we have introduced three pure error terms: λ t iid ∼ N (0, σ 2 λ ) at yearly scale, η t (s) iid ∼ N (0, σ 2 η ) at sites within years, and ∼ N (0, σ 2 (s)) at sites for days within years. Additionally, ρ Y (s) and σ 2 (s) are, respectively, a spatially varying autoregressive term and a spatially varying variance at location s, both of which are assumed constant over days and years. We model log , and log{σ 2 (s)} = Z σ 2 (s) ∼ GP (Z σ 2 , C(·; σ 2 σ 2 , φ σ 2 )), again with exponential covariance functions. Motivation for adopting spatially varying specifications for these terms arises from exploratory data analysis at the level of the individual sites. That is, suppose we fit the model above but ignore spatial structure and treat the sites as conditionally independent. We show in Section S1.1 of the Supplementary Materials that the assumptions of constant autoregression coefficients and constant variances over the region do not seem justified.
All of the components considered in the full model and their relationships are depicted in the graphical model in Figure 3. This diagram, perhaps, reveals the complexity of the full model more readily than through Equations (1) to (4). The reader might wonder if the GPs above are independent. We investigated dependence between the intercept and slope GPs using the following coregionalization (Banerjee et al., 2014, Chapter 9). Suppose v 1 (s) and v 2 (s) are independent GPs with zero mean and unit variance whose exponential covariance functions have decay parameters φ 1 and φ 2 , respectively. In the full model we insert β 0 (s) = a 11 v 1 (s) and α(s) = a 21 v 1 (s) + a 22 v 2 (s). Here, we let a 11 and a 22 each have a half (or folded) Gaussian prior while a 21 has a regular Gaussian prior. The parameter a 21 captures the dependence between the two processes. That is, the induced covariance between β 0 (s) and α(s) is a 21 a 11 . We care whether a 21 is significantly different from zero with little interest in exactly what the correlation is. Under the model above, the posterior distribution of a 21 was centered at zero with wide credible intervals. So, this dependence was not included in the final model for which we present the inference.
Returning to the full model, notice that we have separated the fixed effects according subscripts t, , and s. As for γ t (s), we can see that it has a spatially varying intercept, a spatially varying coefficient for drift, and an AR(1) model for years. Also, γ t (s) has both space and time dependence and, in fact, we can readily calculate cov(γ t (s), γ t+h (s )). Under independence of the intercept and slope processes, the equilibrium covariance becomes Finally, special cases of interest include: β 0 (s) = 0 implies a constant intercept over space, α(s) = 0 implies a constant linear drift over space, ρ ψ = 0 implies no yearly autoregression. These assumptions merely revise the form of γ t (s). We might consider conditioning on a longer history of maximum temperatures. We experimented with introducing additional lags in the modeling but we found no gain in predictive performance. We could also consider additional fixed effects, e.g., longitude, latitude or distance to coast, or even adding interactions, e.g., t × elev(s). However, the exploratory analysis did not reveal a relationship between daily temperatures and these fixed effects, so they were not introduced in the full model.

Model fitting
Model inference is implemented in a Bayesian framework, requiring prior distributions for each of the model parameters. In general, diffuse and, when available, conjugate prior distributions are chosen. Recall that the model adopts a conditional Gaussian distribution for all Y t (s)'s. Thus it is appropriate to assign each of the coefficient parameters β 0 , α, β 1 , β 2 and β 3 , independent and diffuse Gaussian prior distributions with mean 0 and standard deviation 100. The variance parameters, σ 2 λ and σ 2 η , are assigned independent Inverse-Gamma(2, 1) prior distributions. In preliminary analyses, the autoregresive term between years, ρ ψ , was assigned a non-informative Uniform(−1, 1) prior distribution. As its posterior distribution was centered at zero with wide credible intervals, we set the parameter at ρ ψ = 0. For identifiability, the random effect for the first year, ψ 1 , is fixed to zero.
Hyperpriors are assigned to the mean of both Z ρ Y (s) and Z σ 2 (s). That is, Z ρ Y and Z σ 2 are given a Gaussian prior distribution with mean 0 and standard deviation 100 and 1, respectively. The variance parameter for each of the four spatial covariance functions, σ 2 β 0 , σ 2 α , σ 2 ρ Y and σ 2 σ 2 , is assigned an independent Inverse-Gamma(2, 1) prior distribution. Preliminary analyses with a discrete uniform prior distribution for each of the spatial decay parameters indicated that these parameters almost always placed most mass on the smallest decay value. Due to the fact that, with an exponential covariance function, the variance and the decay parameter cannot be individually identified (Zhang, 2004), and the decay parameter where d max is the maximum distance between any pair of spatial locations.
MCMC is used to obtain samples from the joint posterior distribution. The sampling algorithm is a Metropolis-within-Gibbs version. Since we only have 18 sites, we fit the model without marginalization over the spatial random effects. Also, we introduceβ 0 (s) = β 0 +β 0 (s) andα(s) = α+α(s) within γ t (s) for the fitting to enable the benefits of hierarchical centering in the model fitting (Gelfand et al., 1995). Details of the MCMC used for the model fitting are provided in Section S2.1 of the Supplementary Materials. All the covariates have been centered and scaled to have zero mean and standard deviation one to improve the mixing behavior of the algorithm.

Spatial and spatio-temporal prediction
Under the full model, prediction at location s 0 , day , and year t is based on the posterior predictive distribution of Y t (s 0 ) arising from the full model. Here, s 0 may correspond to a fully observed location (held out for validation), a partially observed location (for completion of a record), or a new location in D. Our goal is not forecasting, so we restrict ourselves to the observed time period = 2, . . . , L and t = 1, . . . , T . Within the Bayesian framework, the posterior predictive distribution for Y t (s 0 ) is obtained by integrating over the parameters with respect to the joint posterior distribution. The formal expression for the posterior pre- where Y is the observed data, is given in Section S2.2 of the Supplementary Materials. Customarily, the distribution is obtained empirically through posterior samples. That is, with MCMC algorithms, samples of the posterior parameters are used to obtain posterior predictions of observations, so-called composition sampling (see Banerjee et al., 2014, Chapter 6; and Section S2.2 for the details).

Model evaluation
For model assessment, a LOOCV is carried out to compare the spatial predictive performance of the models. The full model considered includes four spatial GPs. To validate that model as well as the importance of the considered GPs, reduced models incorporating 0, 1, 2, or 3 GPs are fitted. Models are presented explicitly in Section 4.1 where we further clarify that removing particular terms allows explicit interpretation of the resulting reduced models.
Results from Section 4.1 favors the full model and so, results for this model are presented subsequently. However, several of the reduced models yield essentially equivalent global performance, though the fit at some sites is poorer. We attempt to clarify why this might be expected but also show that each set of random effects reveals differences across sites, further encouraging us to retain them in the inference presentation.
For each location in the hold-out set, the entire time series of daily maximum temperatures is withheld during model fitting. Then, for location s i , we conduct our model comparison through the following metrics: (i) root mean square error (RMSE), (ii) mean absolute error (MAE), (iii) continuous ranked probability score (CRPS; Gneiting and Raftery, 2007), and (iv) coverage (CVG). By definition, i.e., the 5th and 95th percentiles of the MCMC samples Y is the indicator function. The smaller the RMSE, MAE and CRPS values, the better the model performance. However, the target for CVG is proximity to 0.90.

Results
We summarize, using LOOCV, the comparison of models with differing inclusion of the foregoing spatial GPs. Each model was fitted to the daily maximum temperature series in months MJJAS for the 60 years from 1956 to 2015. Then, we present the results for the fitting of the full model over the study region.
In the MCMC fitting we ran 10 chains, with 200,000 iterations for each chain, to obtain samples from the joint posterior distribution. The first 100,000 samples were discarded as burn-in and the remaining 100,000 samples were thinned to retain 100 samples from each chain for posterior inference. MCMC diagnostics for the full model are shown in Section S2.3 of the Supplementary Materials.

Validation and model comparison
The full model considered includes four spatial GPs. To compare models and assess the importance of the proposed GPs, simpler models incorporating 0, 1, 2, or 3 GPs are fitted. M p with p = 0, 1, . . . , 4 denotes a model including p spatial processes that are specified in parentheses. For example, M 1 (β 0 (s)) is the model with a single spatial process for the intercept; for simplicity the full model is denoted M 4 .
Using the criteria in Section 3.4 with LOOCV for each of the 18 available locations, Table 1 summarizes the averages across sites for the four metrics. The strongest improvement in predictive performance is obtained by adding a spatially varying intercept process, i.e., M 1 (β 0 (s)). The inclusion of the other GPs does not yield a clear improvement in performance. This is not surprising, since the GP for intercepts explicitly rewards predicting the mean and random realizations well in order to agree with the held-out values. However, the usefulness of the other GPs with regard to effectively capturing autocorrelations and variances at the observed sites will be seen in Section 4.2. Table S4 in the Supplementary Materials provides details, by site, for the metrics in Table 1. The locations with poorest fit for all of the models are Pamplona and Tornos, the only ones with CRPS greater than 3. They also show large RMSE and MAE as well as poor CVG. For the other locations, the CVG of all the models is closer to the nominal value 0.90. In particular, M 4 not only has the best CVG on average, but the variability of the CVG i 's with respect to the nominal 0.90 is the lowest of all the models.

Results for the full model
Here, we show fitted and prediction results for the full model, M 4 , and demonstrate the need to include the four GPs. The parameters α, β 1 , β 2 , β 3 , α(s), and σ α have been rescaled to interpret them in terms of the original measure of the covariates. Table 2 summarizes the posterior mean and credible intervals of the model parameters, including standard deviation of random effects. The harmonic coefficients β 1 and β 2 indicate the strong seasonality in the temperature series. The coefficient β 3 supplies the gradient of temperature corresponding to elevation, approximately −7 • C per 1,000 m. This value agrees with the exploratory analysis in Section 2, and the average environmental lapse rate (Navarro-Serrano et al., 2018). The linear trend coefficient, α, indicates that the average increase in temperature is 0.21 • C per decade. Peña-Angulo et al. (2021) found a similar trend (0.27 • C per decade) in the summer maximum temperature in Spain . The posterior mean of the autoregresive spatial process, ρ Y , confirms the strong serial correlation of daily temperatures.
The other parameters are standard deviations linked to the spatio-temporal effects of the model. The posterior mean of σ , the mean of the spatially varying standard deviations of the pure error process (Y ) t (s), is close to 3 • C. This value doubles the posterior mean of σ β 0 which represents the spatial variability of the mean level β 0 (s), and triples the posterior mean of σ λ , linked to the variability of the yearly random effects ψ t . The magnitude of the remaining standard deviation parameters is smaller.
With ρ ψ = 0, the yearly random effects, ψ t , are, a priori, distributed as N (0, σ 2 λ ). The posteriors are summarized using boxplots in Figure 4. It is observed that the effects may add or subtract in a given year up to roughly 2.5 • C, with a standard deviation close to 1 • C. These yearly random effects are able to capture historical events like the extremely cold summer of 1977 in Spain or the European heat wave in 2003 (Peña-Angulo et al., 2021).
The posterior distributions at the observed locations of the four spatial processes in M 4 , β 0 (s),α(s), ρ Y (s), and σ (s), are summarized in Figure 5 using boxplots. The boxplots of the locations are sorted from the lowest to the highest elevation in the horizontal axis. They confirm the need to consider the four GPs to represent the great climatic variability of the region under study. To show the spatial behavior of the spatial processes over the entire region, maps of their posterior means, obtained by a model-based Bayesian kriging, are presented in Figure 6. In Section S3.2 of the Supplementary Materials, the parameters of M 4 are compared with the parameters of the local models described in Section S1.1, and  Figures 5 and 6 correspond toβ 0 (s). The posterior distributions for most of the locations show remarkable differences. In particular,β 0 (s) has a clear climatic interpretation. The spatial adjustments provided by this GP help to improve the fit for the two areas with a similar elevation around 1,000 m but different climates. These areas are the southwest and the north of the region. The former has a warmer climate than the latter, whose climate is influenced by the proximity of the Atlantic Ocean.
With regard to the spatially varying yearly linear trend,α(s), the top-right plots in Figures 5 and 6 reveal clear spatial differences in the warming trend. The posterior distributions for higher locations and for the Central Valley are shifted with respect to others. Most of the area shows warming trends, except some areas in the northwest, e.g., Yesa or Ansó, whose posterior distributions are centered at zero.
The spatial process for the autoregressive term, ρ Y (s), is clearly necessary in the model. The bottom-left plot in Figure 5 shows that the posterior distributions for the 18 locations differ substantially. The posterior means of the ρ Y (s) are positive in all locations and their values seem to have an increasing relation with the elevation. According to the bottom-left plot in Figure 6, the posterior mean is also related to cierzo, a severe northwesterly cold wind that gives rise to a renewal of the atmospheric condition with less warm air masses. This wind reduces the persistence of the temperature and therefore the dependence with respect to the previous day. In the areas affected by cierzo, the mean is around 0.65, lower than the posterior mean of the mean of ρ Y (s), close to 0.7.
The need for the σ (s) process is also clear. The bottom-right plot in Figure 5 reveals strong differences among the posterior distributions of the standard deviations across locations. The high variability of Pamplona, Yesa and Tornos stands out. The bottom-right plot in Figure 6 confirms the spatial variability of the standard deviation and shows that higher standard deviations are observed in the western part of the region.

Prediction at unobserved locations
Now, we illustrate the use of the full model for prediction at three unobserved sites in the region: Longares (530 m), Olite (390 m), and Guara (800 m). The new sites are marked in red in Figure 1 and represent areas with different environmental and climatic characteristics. Longares is located in the southern half of the region in a rainfed agricultural area dedicated to the production of wine. Vines are seriously affected by global warming since high temperatures lead to both a decrease in production and a premature ripening of the grapes. Olite is located in a rural area in the northwest where smaller increases in the temperature have been observed; an incomplete series of observed values is available at this site. Guara is an uninhabited area in the Natural Park Sierra and Cañones de Guara. The prediction of the temperature evolution in this area is essential to better understand the changes that have been observed in the ecosystem of the Natural Park. We use the model to impute missing values in an observed series using the posterior predictive distribution. Daily temperatures in Olite are available in the AEMET database from 1968 to 2007, although with many missing observations. As an example, Figure 7 shows the plot of the observed series and the posterior predictive means with 90% credible intervals for MJJAS days in 1968 and, as a summary, the plot of the observed and the posterior yearly averages with 90% credible intervals. The 90% CVG in the observed data is 92.0%. The agreement between the observed and the predicted data confirms that M 4 can be used effectively to impute missing values in Olite.
The posterior distribution of the four spatial processesβ 0 (s),α(s), ρ Y (s), and σ (s) for the three predicted locations are shown in Figure S5     M 4 is also used to evaluate the change over time of the temperature in the three predicted sites, using the posterior predictive distribution of the difference between the average in the 30-year periods 1956-1985 and 1986-2015 (see Figure S6 in Supplementary Materials). Despite the difference in elevation, the posterior mean of the increment is similar in Longares and Guara, around 1.4 • C, while in Olite it is smaller, 0.5 • C and its 90% credible interval (−0.010, 1.028) contains zero. The posterior probability that the mean in 1986-2015 is higher than in 1956-1985 is 0.94 in Olite and essentially 1 in Longares and Guara.

Summary and future work
We have proposed a very rich space-time mean model for daily maximum temperatures, fitted over a sixty year period for a region in Spain. Our specification is continuous in space and autoregressive in time. In time, autoregression was examined annually and also daily for the summer season within each year. We find novel spatial structure including spatially varying intercepts and trend coefficients as well as spatially varying autoregression coefficients and variances.
The proposed modeling can be adapted to other regions, perhaps considering other geographical covariates such as latitude, longitude or distance to the sea. Also, the modeling can omit spatial processes that are not necessary, e.g., avoiding ρ Y (s) in a more homogeneous region with a lower variation in elevation. The modeling might also be adapted to other response variables in spatio-temporal problems, such as daily minimum temperature and other environmental variables including daily evapotranspiration or hourly temperature in the sea. The flexible autoregression terms can express behavior in series where serial correlation is an important source of variation.
A limitation of the present analysis is that we have only 18 monitoring stations so that learning about the spatial surfaces in our modeling is less than we would want. Despite this small number of sites, the model has been able to capture the climate variability of the region under study. The spatial random effects identify areas with a different mean temperature level, but also areas where the observed warming over time shows a different trend, areas where temperature is more persistent (i.e., with a stronger daily serial correlation) or with different variability. The capacity of the fitted model to impute temperature over the entire region allows us to obtain reliable predictions and credible intervals for daily temperature series at unobserved sites. This can be valuable for economical, agricultural or environmental reasons.
Future work will consider different regions providing more available spatial locations n. However, the O(n 3 ) computational complexity of inverting a n × n covariance matrix can be prohibitive for implementing the above model for data with large n. Reduced rank approximations to GPs may be used to address this computation bottleneck, e.g., Gaussian predictive process (Banerjee et al., 2008) or nearest-neighbour GP (Datta et al., 2016). As a different challenge, one may wonder whether the low trend values (blue region) in the top-right plot in Figure 6 are actually meaningful. Future work could implement a version of a spatially dependent multiple testing analysis (Risser et al., 2019) given the posterior draws ofα(s). A different future direction will move away from mean modeling to quantile modeling in order to investigate extremes of temperature, both hot and cold. This will lead to novel development for spatio-temporal quantile regression.  Table S1 shows the elevation in meters for each site. It also summarizes the differences in mean value and standard deviation of the daily maximum temperature in degrees Celsius for each site between both 30-year periods, 1956-1985 and 1986-2015.

S1.1 The local model
In order to motivate our spatial modeling decisions in Section 3 of the Main Manuscript, in this section we fit independent local models for each location following the steps in that section. However, here we do not center or scale the covariates. The local model for day , year t at any location simplifies the full model as where the fixed effects are µ t = β 0 + αt + β 1 sin(2π /365) + β 2 cos(2π /365).
We consider ψ t iid ∼ N (0, σ 2 λ ) and The interpretation of the model terms is equivalent to that given for the full model in the Main Manuscript, so we do not repeat it here.
The parameters are shown in Figure S1 and the seasonal pattern is summarized in S2. The first figure shows the posterior mean and 90% credible interval for β 0 , α, ρ Y , σ , for the independently fitted models at each location. Significant differences between locations are observed for the four parameters. The parameter σ λ did not show a remarkable difference between most locations (not shown).
It is clear that β 0 is related with elevation, i.e., the temperature is inversely proportional to the elevation of each location. However, this relationship shows considerable noise due to the specific climatic conditions in the southern region, which has stations with an elevation Table S1: Elevation, and mean value and standard deviation of the daily maximum temperature for each site in the months MJJAS for the 30-year periods, 1956-1985 and 1986-2015. Difference between means (∆ mean) and quotient of standard deviations (Q sd) of each period. 1956-19851986 Elevation close to 1,000 m with a significantly warmer temperature than stations in the north region with a similar elevation. The spatial variability of α suggests that there is a different warming between locations, that does not seem to be related to elevation. The ρ Y parameter and the elevation are clearly related, but the variability around the linear relationship cannot be captured by a fixed effect. Finally, the posterior mean of σ is quite different between locations with narrow credible intervals, showing a spatial variability of this parameter throughout the region without any relation to elevation. The left plot in Figure S2 shows the empirical seasonal pattern at each location. In particular, the mean value over the years is drawn for each day of MJJAS and each location. The right plot shows the posterior mean of β 1 sin(2π /365)+β 2 cos(2π /365) for = 1, . . . , L. In both plots the patterns have been centered to compare them between locations. In conclusion, the climate of Aragón shows a common and unimodal seasonal pattern in MJJAS. In particular, the seasonal component can be characterized by a single harmonic for the entire region.
In summary, this local modeling is useful to find spatial differences and similarities between the full model parameters for different points in space. Furthermore, the inclusion of spatial random effects in the model associated with the intercept, the linear trend, the autoregression coefficient and the variance is justified by these results.  S2 Sampling methods

S2.1 Gibbs sampling algorithm
The Bayesian spatio-temporal model can be represented in a hierarchical structure, following Gelfand (2012), we specify distributions for data, process and parameters in three stages, First stage: data | process, parameters Second stage: process | parameters Third stage: (hyper)parameters .
Although the hierarchical model can be flattened by suitable marginalization, the advantage of the hierarchical structure lies in convenience of specification, ease of interpretation and facilitation of model fitting. In particular, since the model is Gaussian and linear, the Gibbs sampler is expected to be well behaved and convergence to be fairly quick before 50,000 iterations. The hierarchical form leads to the following joint distribution for data, processes and parameters, S1) provided one starts any year t with the observed Y t1 (s).
Defining notation that will be used to shorten the expressions, we denote the elements of the correlation matrices by r , where here · is any of β 1 , β 2 , β 3 , γ t (s i ) and represents that µ t (s i ; θ f ) + γ t (s i ) does not have that component. Finally, we shorten sin = sin(2π /365) and cos = cos(2π /365). And a and b denote chosen hyperpriors in each case.
The Gibbs sampler algorithm for Equation S1 is initialized giving initial values to all the parameters. Then, updating from iteration b to b + 1 consists of drawing a sample from the following full conditional distributions: • The full conditional distributions of β 0 , α, β 1 , β 2 , β 3 , Z ρ Y , Z σ 2 are all Gaussian, in particular • The full conditional distribution of ρ ψ is a truncated Gaussian distribution within the interval (a, b).
• The full conditional distributions for σ 2 λ , σ 2 η , σ 2 β 0 , σ 2 α , σ 2 ρ Y , σ 2 σ 2 are all inverse gamma as follows, the simplest solution is to fix the parameter at some reasonable value. An alternative is to discretize the support to say m between 10 to 20 values, obtain and store the collection of n × n matrices, i.e., inverses and determinants, and then make discrete updates from the following full conditionals.
• The full conditionals for theβ 0 (s i )'s andα(s i )'s are Gaussian, coming from the joint multivariate Gaussian distributions ofβ 0 (s) andα(s) respectively, and the part of the random effects. Note that we consider the hierarchical centering of these random effects to improve convergence behavior. For i = 1, . . . , n, the full conditionals are • The full conditional distributions of the Z ρ Y (s i )'s and Z σ 2 (s i )'s are non-standard. To draw samples from them, we suggest a random walk Metropolis-Hastings algorithm with Gaussian distribution proposals with the mean at the current parameter value. According to Gelman et al. (1996), the variance of the proposals should be tuned until the acceptance rate is between 15% and 40%. For i = 1, . . . , n, the full conditionals are proportional to • We obtain the Gaussian full conditionals for the ψ's as follows. For identifiability, ψ 1 is fixed to zero. Then, two cases are considered: (i) when t = 2, . . . , T − 1, and (ii) when t = T . Then, respectively • Finally, the full conditionals for the γ t (s i )'s are all Gaussian. For i = 1, . . . , n and t = 1, . . . , T , Note in the expressions above that the product of Gaussian densities is proportional to a Gaussian density with parameters as follows

S2.2 Composition sampling algorithm and Bayesian kriging
Following the notation of Section 3.3 of the Main Manuscript. Once samples of the join posterior distribution have been obtained using the Gibbs sampler algorithm in Section S2.1, one may want to make spatial or space-time predictions. Formally, the posterior predictive distribution for Y t (s 0 ) is where Y denotes the observed data and θ = (θ f , θ r , Z ρ Y , Z σ 2 , σ 2 η , σ 2 ρ Y , σ 2 σ 2 , φ ρ Y , φ σ 2 ) all the parameters. Note that prediction at a new location would require additional modeling for Y t 1 (s 0 ), although no additional complications arise. The simplest solution is to obtain these values by ordinary kriging.
The samples from the Gibbs sampler are used to obtain samples from the posterior predictive distribution. First, a random sample is drawn from the posterior distribution using the details in the Gibbs sampler algorithm described in Section S2.1. Then, from each GP, say W (s), Bayesian kriging is used to draw a sample from the conditional distribution of W (s 0 ) given {W (s i )}, details are given in the paragraph below. A sample of γ t (s 0 ) is drawn from γ t (s 0 ) | · · · ∼ N β 0 (s 0 ) +α(s 0 )t + ψ t , σ 2 η .
Finally, a sample Y t (s 0 ), = 2, . . . , , is drawn sequentially from the top level model The details of the Bayesian kriging are as follows. In particular, we are interested in predicting the state of a GP, W (s), at a new location s 0 . The joint distribution for s ∈ {s 0 , s 1 , . . . , s n } is a multivariate Gaussian distribution arising from the GP for W (s), i.e., Therefore, the conditional distribution of the process at s 0 is from which we would draw a sample. In particular, to obtain a sample for ρ Y (s 0 ) or σ 2 (s 0 ), it is enough to apply to the samples of their associated GPs the inverse function of the transformation applied to them.

S2.3 MCMC convergence diagnostics
In the MCMC fitting we run 10 chains and 200,000 iterations on each chain to obtain samples from the joint posterior distribution. The first 100,000 samples were discarded as burn-in and the remaining 100,000 samples were thinned (i) to retain 1,000 samples from each chain for computing the estimated potential scale reduction factor (R; Gelman and Rubin, 1992), and (ii) to retain 100 samples from each chain for computing the effective sample size (ESS; Gong and Flegal, 2016) out of 1,000 samples and showing trace plots. The samples from (ii) were used for posterior inference. We check the convergence and mixing of the MCMC algorithm for the full model based on diagnostics and trace plots for all the parameters, although we do not show the individual results for ψ's and γ's (a total of T × (n + 1) parameters). Tables S2 and S3 show the ESS and theR for the main parameters and the GPs at the observed locations, respectively. The best ESS should be close to the actual sample size of 1,000, although an ESS of around 200 is considered sufficient. On the other hand, if the 10 chains have converged to the target posterior distribution, thenR should be close to 1. In particular, ifR < 1.2 for all model parameters, one can be confident that convergence has been reached. The ESS is around 1,000 in most parameters, but it is particularly small for β 3 and β 0 (s i )'s due to the high correlation between them, although their ESS is sufficient. TheR is below 1.2 for all the parameters in the tables, ψ's and γ's (not shown), which suggests the adequate convergence of the chains. Figure S3 shows the trace plots of all 1,000 samples of the parameters. 1,000 1.01 α 1,000 1.00 β 1 1,000 1.00 β 2 1,000 1.00 β 3 226 1.12 ρ Y 1,000 1.00 σ 2 1,000 1.00 σ 2 η 1,000 1.00 σ 2 λ 1,000 1.00 σ 2 β 0 1,000 1.00 σ 2 α 1,000 1.00 σ 2 ρ Y 1,000 1.00 σ 2 σ 2 1,049 1.00  Figure S3: Trace plots for the parameters (1 of 5).

S3.1 Leave-one-out cross-validation
The performance of the models is compared using the approach in Section 3.4 of the Main Manuscript based on LOOCV for the 18 locations available. Table S4 summarizes for each site the four considered metrics.
S3.2 Comparison between local models and the full model Figure S4 shows a comparison between the posterior distributions at the observed locations of the spatial processesα(s), ρ Y (s), and σ (s), in M 4 (black), and the posterior distribution of the corresponding parameters in the local models shown in Section S1.1 (red). The results for M 4 show a good agreement with results of the local models. This agreement shows that M 4 has no systematic bias in the estimation of the parameters related to time trends, autocorrelations, or variances. Note that β 0 expresses the baseline in local models, but M 4 also includes the term associated with elevation in the fixed effects, then the comparison of the posterior distribution of the intercept in local models andβ 0 (s) is not of interest. Figure S5 shows for each unobserved location (Longares, Olite and Guara) the posterior densities of the four spatial processes in M 4 . Figure S6 shows the posterior difference between average temperatures in both 30-year periods, 1956-1985 and 1986-2015. Figure S4: Boxplots of the posterior distributions of the spatial random effects in M 4 (black) and of the corresponding coefficients in the local models (red). Locations are sorted by elevation, from lowest to highest.  1956-1985 and 1986-2015, at the unobserved locations Longares, Olite and Guara. Right: Posterior mean and 90% credible intervals of the previous difference against posterior mean value of the entire period.