Prediction of spatial functional random processes: Comparing functional and spatio-temporal kriging approaches

In this paper, we present and compare functional and spatio-temporal (Sp.T.) kriging approaches to predict spatial functional random processes (which can also be viewed as Sp.T. random processes). Comparisons with respect to computational time and prediction performance via functional cross-validation is evaluated, mainly through a simulation study but also on two real data sets. We restrict comparisons to Sp.T. kriging versus ordinary kriging for functional data (OKFD), since the more flexible functional kriging approaches, pointwise functional kriging (PWFK) and functional kriging total model, coincide with OKFD in several situations. We contribute with new knowledge by proving that OKFD and PWFK coincide under certain conditions. From the simulation study, it is concluded that the prediction performance for the two kriging approaches in general is rather equal for stationary Sp.T. processes, with a tendency for functional kriging to work better for small sample sizes and Sp.T. kriging to work better for large sample sizes. For non-stationary Sp.T. processes, with a common deterministic time trend and/or time varying variances and dependence structure, OKFD performs better than Sp.T. kriging irrespective of sample size. For all simulated cases, the computational time for OKFD was considerably lower compared to those for the Sp.T. kriging methods.


Introduction
In many fields, such as environmental, forestry, climatology, meteorology and medical sciences, the spatial variation of objects in the form of curves are of interest to study.It could e.g.be ocean temperature, salinity or other variables measured over time (or at different depths) at a set of spatial locations.With today's modern technology and huge storage capability, it is in principle possible to observe entire curves by recording them over a dense raster of time (depth) points.In particular it may be of interest to predict a curve at a new spatial location given that such curves have been observed at n other locations by utilizing the information inherent in the spatial dependence between curves.
Kriging predictors have a long history of being used to predict objects at new locations based on information observed at a set of other locations, especially for objects that are real-or vector-valued, see e.g.Chiles and Delfiner (2009), Cressie (2015), Cressie and Wikle (2015), and references therein.Functional kriging predictors, used when the objects are functions with infinite dimension, was initially discussed by Goulard and Voltz (1993), and further proposed by e.g.Giraldo et al. (2010Giraldo et al. ( , 2011) ) and Nerini et al. (2010).In these papers, the expected value of the curves is assumed to be independent of the spatial location, so called ordinary functional kriging.More recently, e.g.Caballero et al. (2013), Menafoglio et al. (2013), Ignaccolo et al. (2014), and Reyes et al. (2015) has investigated functional kriging methods where the expected value of the curves may also depend on location.
A kriging predictor is a weighted sum of the objects observed at the n spatial locations, defined to be the best linear unbiased predictor (BLUP) minimizing the mean squared prediction error.The optimal kriging weights are functions of the (spatial) dependence structure of the objects, which in practice needs to be estimated.Typically estimators of the dependence structure rely on stationarity assumptions, unless parametric and distributional assumptions are made.
Here, two kriging approaches to predict spatial functional random processes are compared.A functional random process is a process with stochastic functional objects (curves) χ s = χ s (t), t ∈ T over the "time" domain T at each spatial location s ∈ D. Given that the process has been observed at n different locations, a curve at a new location s 0 , can be predicted by a functional kriging approach, i.e. as a linear combination of the n observed curves.A spatial functional process can also be viewed as a spatio-temporal (Sp.T.) random process {Z(s, t) = χ s (t), (s, t) ∈ D × T }, and hence, a Sp.T. kriging approach could also be used.The curve χ s0 (t), t ∈ T would then be predicted at a dense grid of values over T , based on linear combinations of a time-grid of values over the observed curves.The question of which approach, functional or Sp.T. kriging, should be used to analyze a particular data set is an important one (with no closed answer), as pointed out by Delicado et al. (2010).In this paper we compare the two approaches with respect to prediction performance and computational time, mainly by a simulation study and on two real data sets.Prediction performance is evaluated by functional cross-validation.
In Section 2 notation and definitions are given.Section 3 presents the functional and Sp.T. kriging approaches, including how to estimate the dependence structure.We also discuss how the functional kriging methods relate to each other, and under which circumstances they may coincide.In particular we state conditions under which the two functional kriging methods ordinary kriging for functional data and pointwise functional kriging coincide, with proofs given in Appendix A. A simulation study, comparing the two kriging approaches, is presented in Section 4, see also Appendix B. Both kriging approaches are applied to the Canadian temperature data (previously analyzed e.g. by Giraldo (2009), Giraldo et al. (2010) and Menafoglio et al. (2013)) and to salinity in seawater data (previously analyzed e.g. by Reyes et al. (2015) and Romano et al. (2015)), see Section 5. A discussion and concluding remarks are found in Section 6.

Preliminaries
A spatial functional random process {χ s : s ∈ D ⊂ R d } (Delicado et al., 2010;Giraldo et al., 2010), is a process where, for each given s ∈ D, the observed random element is a functional random variable, χ s , taking values in an infinite dimensional space (or function space).We will consider the case where χ s for every fixed s is a real-valued function, χ s (t), t ∈ T ⊂ R, from the compact set T to R and with s ∈ D ⊂ R 2 .It is usually assumed that the realizations of the curves (functions) χ s (t), t ∈ T, s ∈ D belong to a separable Hilbert space H of square integrable functions defined on T .
A spatial functional random process is second-order stationary if for each t ∈ T the corresponding spatial random process {χ s (t), s ∈ D} is second-order stationary.For (second-order) stationary functional processes, the covariance function (covariogram) satisfies Cov[χ s (r), χ v (t)] = C(s − v, r, t), which can be described by the variogram where σ 2 (t) = V [χ s (t)].Our main focus is on second-order isotropically stationary spatial functional random processes, satisfying (ii) Cov[χ s (r), χ v (t)] = C( s − v , r, t) ∀s, v ∈ D and ∀r, t ∈ T , where • denotes the (Euclidean) distance measure.For any given t ∈ T , γ t (h) := V [χ s (t) − χ v (t)]/2, h = s − v , is the semivariogram of the spatial random process {χ s (t) : s ∈ D}.In order to ensure that V [ n i=1 l i χ si (t)] ≥ 0 for any set of constants l 1 , . . ., l n ∈ R, n = 1, 2, ..., the variogram (as a function of h) needs to be a conditional negative definite function and the covariogram needs to be a positive definite function, see e.g.Cressie (2015).
A spatial functional random process can also be viewed as a Sp.T. process Z(s, t) = χ s (t), where Z(s, t) takes values in R, and is mapped from (s, t) ∈ D × T , cf.Cressie and Wikle (2015).A Sp.T. process is said to be second-order stationary and spatially isotropic if Note that the class of stationary Sp.T processes is a subset of the class of stationary functional random processes, since a stationary Sp.T. process implies that the corresponding functional random process also is stationary, while the opposite may not be true.

Kriging prediction
In this section two kriging approaches to predict spatial functional random processes are described.Section 3.1 presents different functional kriging methods, and under which circumstances they may coincide.Section 3.2 describes the Sp.T. kriging approach.A way to evaluate prediction performance using functional cross-validation is given in Section 3.3.

Functional kriging
For the presentation below, unless otherwise stated, we will assume that the spatial functional random process is second-order stationary and isotropic.Within the functional kriging framework, it is of interest to predict the complete random function χ s0 (t), t ∈ T , at a new location s 0 , given that a sample of random functions has been observed at n different locations, s 1 , . . ., s n .A functional kriging predictor, χs0 (t), t ∈ T , is defined to be the best linear unbiased predictor (BLUP) minimizing the mean integrated squared error (MISE) 3.1.1Ordinary kriging for functional data Goulard and Voltz (1993) proposed one of the first functional kriging predictors, the so called curve kriging predictor This predictor was further discussed by Giraldo et al. (2007Giraldo et al. ( , 2011) ) and there given the name of ordinary kriging for functional data (OKFD).The optimal kriging weights λ 1 , ..., λ n ∈ R are chosen such that MISE(s 0 ) is minimized given that the predictor is unbiased.It turns out that the optimal λ i 's only depend on the (isotropic) trace-semivariogram, being defined as where h ij = s i − s j .The second equality holds by Fubini's theorem under the assumption that the realizations of the random functions are square integrable.For a detailed derivation of the optimal weights, see Giraldo et al. (2011).The trace-semivariogram often satisfies the properties of a classical semivariogram, being a conditional negative definite function (Menafoglio et al., 2013).
The trace-variogram is in practice unknown and therefore needs to be estimated from the data.Under assumption (2), a (consistent) method of moments estimator of the trace-semivariogram (6) can be formed for a set of h-values as where N (h) = {(s i , s j ) : s i − s j = h}, and |N (h)| is the number of distinct elements in N (h).For irregularly spaced observations, it is rare to have several pairs of observations separated at exactly distance h and then N (h) is modified to {(s i , s j ) : s i − s j ∈ (h − , h + )}, with > 0 being some small positive value, in order to obtain a more stable estimate.To obtain a valid (variogram) estimate for any h, a parametric variogram model γ(h | θ), e.g. the spherical, exponential or stable model, is fitted to a set of estimated values {γ(h l ), h l }, l = 1, ..., L, by a least squares method, cf.Cressie (2015).Here, the ordinary least squares (OLS) method is used to estimate θ.
The random functions, χ si (t), are typically observed only at a finite number of time points t i1 , . . ., t imi , i = 1, . . ., n. Goulard and Voltz (1993) suggested to fit a parametric model χ si (•, α i ) to the observed values and replace χ si (t) by χ si (t, αi ) in ( 5) and (7).A non-parametric approach was suggested by Giraldo et al. (2011), where the observed random functions are represented (approximated) by linear combinations of p known basis functions, B(t) = (B 1 (t), . . ., B p (t)) , as The basis functions could e.g.be B-splines, Fourier basis, or wavelets.The coefficients (a i 's) can typically be determined by the least squares method.Giraldo et al. (2011) suggested to chose the number of basis functions p by cross-validation.In the final ordinary kriging predictor (5), the estimated trace-variogram values are plugged into the kriging weights (λ i 's), and the χsi (t)'s replacing the χ si (t)'s.et al. (2008, 2010) suggested the point-wise functional kriging predictor (PWFK), to allow more flexibility than the OKFD predictor (5).It allows the λ i 's to depend on t, and is defined as

Giraldo
The best linear unbiased predictor minimizing the mean squared integrated prediction error is found by choosing the λ i (t)-functions such that (4) is minimized subject to the unbiasedness constraint of the predictor, n i=1 λ i (t) = 1, for all t ∈ T. In order to solve the optimization problem, the λ i (t)-functions are represented by a linear combination of K known basis functions, where the b i 's are to be determined.Moreover, the χ si (t)'s are represented as in (8), implying that The optimization problem then reduces the infinite dimensional problem to a multivariate geostatistics problem.Given that the weights satisfy (9), the unbiasedness condition implies that where c = n i=1 b i .Hence, only basis functions B λ (t) that satisfy (10) for some constant vector c give admissable solutions to the kriging optimization problem.When B λ (t) are B-splines, ( 10) is fulfilled when c = 1, and for Fourier basis functions when c = (1, 0, . . ., 0) .In fact any set of basis functions where one (the first say) basis function is a constant, B λ1 (t) = k, satisfies (10) for c = (1/k, 0, . . ., 0) .The full derivation of the equation system to be solved in order to find the b i 's, for admissable choices of B λ (t) satisfying (10), is given by Giraldo et al. (2010) when B λ (t) = B(t), and for general B λ (t) in Appendix A.
The b i 's turn out to be functions of the covariances between the various a i 's, which in practice are not known and thus need to be estimated.If a i = a(s i ), and a(s) = [a 1 (s), . . ., a p (s)] is a pvariable second-order isotropically stationary spatial random field for all s ∈ D, with E[a(s)] = m a and Cov[a(s i ), a(s j )] = Σ( . Under this assumption Giraldo et al. (2010) suggest estimating the covariograms and crosscovariograms (the c kl (•)'s) via a linear model of coregionalization (Goulard and Voltz, 1992).This means that a(s) can be expressed as a(s) = Pr(s) where P ∈ R p×q and r(s) = (r 1 (s), . . ., r q (s)) are q latent univariate (second-order isotropically stationary) random fields, typically assumed to be independent.Given available data, a i = a(s i ), i = 1, . . .n, the c kl (•)'s (and P) can be estimated using the R-package gstat (Pebesma, 2004).In order to perform the estimation, the value of q and the variogram models of the r i (s)'s need to be specified.
The PWFK may have the potential to give better prediction performance than OKFD since it allows more flexible kriging weights.In which situations this could be true is still not completely known.In the following proposition (see Appendix A for the proof) we have confirmed situations in which PWFK and OKFD do coincide.
Proposition 3.1.Suppose that the χ si (t)'s can be represented by ( 8) and that a i = a(s i ) follows a linear model of coregionalization with q independent second-order stationary latent univariate spatial random fields with common variogram.Further assume that the B λ (t)'s satisfy (10) for some constant vector c, and that the inverse of the matrix G exists, where Then the optimal kriging weights (9) of PWFK that minimizes (4) satisfy λ i (t) = λ i , with b i = λ i c, for all i = 1, . . ., n, and thus coincide with those of OKFD.Giraldo et al. (2010) kindly permitted us to use their R-code to estimate and predict PWFK models.On all simulated and real data sets we considered, the computational time for PWFK turned out to always be substantially larger than for OKFD, estimated by the R-package geofd (Giraldo et al., 2012).Furthermore, we found a bug in their code, related to the numerical integration part.After correction of this bug, the estimated PWFK kriging weights always became constant (i.e.λ i (t) = λ i ) when B λ (t) = B(t) were cubic B-splines or Fourier basis functions.
3.1.3Functional kriging total model Giraldo (2009Giraldo ( , 2014)), and independently Nerini et al. (2010), proposed a third functional kriging method, that allows to use all time points of the observed functions in the prediction of χ s0 (t), t ∈ T .The method is called the functional kriging total model (FKTM), and the predictor is defined as This modeling approach is coherent with the functional linear model for functional responses (total model) introduced by Ramsay and Silverman (2005).Assuming that the random functions χ si (t)'s satisfy (8) and that the kriging weights satisfy Giraldo (2014) proposed a way to determine the λ i (t, v)'s (i.e. the C i 's) such that the predictor ( 11) is unbiased and minimizes (4).Also here, the C i 's turn out to be functions of the covariances between the various a i 's, which in practice are not known and can be estimated as proposed in Section 3.1.2.See Giraldo (2014) for more detailes.The FKTM method is computationally heavy compared to OKFD, just like the PWFK method (Giraldo, 2009).Moreover, Menafoglio and Petris (2016) showed that if the realizations of χ s (t) belong to the Hilbert space of square integrable functions on T , and the functional second-order stationary random process is Gaussian, then the kriging weights of FKTM and OKFD agree a.s.for any orthonormal base B(t).

Spatio-temporal kriging
Since a spatial functional process also can be viewed as a Sp.T. process, Z(s, t) = χ s (t), taking values in (s, t) ∈ D × T , it could also be predicted by Sp.T. kriging methods.Given the observed values Z(s i , t ij ), j = 1, . . ., m i , i = 1, . . ., n, the Sp.T. kriging predictor at location s 0 and time point t ∈ T , is defined to be the best linear unbiased predictor (BLUP), minimizing the mean squared prediction error (MSPE) Note that for each s 0 , the Sp.T. kriging weights (λ t ij 's) are allowed to change for each t ∈ T .When the mean value of the process is constant, the unbiasedness condition implies that n i=1 mi j=1 λ t ij = 1.Moreover, if the constant mean value of the process is unknown, the kriging weights depend on the Sp.T. covariance structure solely, and ( 12) is referred to as the so called Sp.T. ordinary kriging predictor, see e.g.Cressie and Wikle (2015).The dependence structure in practice needs to be estimated from the data and is then plugged into the kriging weights (λ t ij 's).For second-order stationary and spatially isotropic, Sp.T. processes satisfying (3), the dependence structure, given by the (spatially isotropic) Sp.T. variogram, is typically estimated via the following steps: First, an empirical (spatially isotropic) Sp.T. semivariogram is computed from lag classes as where N (h, u) = {(s i , t ik ), (s j , t jl ) : s i − s j ∈ (h − , h + ), and |t ik − t jl | ∈ (u − δ, u + δ)}, for some , δ > 0, and |N (h, u)| is the number of distinct elements in N (h, u).A parametric semivariogram model, γ(h, u|θ), is then fitted to a set of {γ(h l , u l ), (h l , u l )}, l = 1, . . ., L by a least squares method.Three commonly used types of stationary Sp.T. semivariogram (covariogram) models to estimate the Sp.T. dependence structure are the separable, product-sum and metric models.Gräler et al. (2016) show how Sp.T. ordinary kriging prediction can be performed with these three models using the R-package gstat.
The separable model assumes that the Sp.T. covariance function can be modeled by the product of the spatial and the temporal covariance functions, This model has the computational advantage of being able to express the covariance matrix as the Kronecker product between two covariance matrices (space and time) which simplifies and speeds up the computation of its determinant and inverse.The product-sum model is an extension of the separable model, where the covariance function is of the form, with k > 0. The metric Sp.T. covariance model is given by To treat the spatial and temporal distances equally, the spatial and temporal dimensions are matched by an anisotropy parameter κ.Note that when κ = 1, this covariance model corresponds to an isotropic second-order stationary random process in R 3 .More generally, in Sp.T. kriging modeling, the process is often described as where µ(s, t) is a deterministic trend, and (s, t) is a mean zero Sp.T. random field, usually assumed stationary.The trend is typically modeled by where x(s, t) ∈ R M is a set of M known covariates, often chosen to be polynomials of s and t, and β ∈ R M is an unknown parameter to be determined.When the Sp.T. process has a deterministic (unknown) non-constant trend of the form (15), then the BLUP (12) that minimizes (13) is called the Sp.T. universal kriging predictor, and the kriging weights are functions of both the dependence structure and the covariates evaluated at the observed and predicted locations, see e.g.Cressie and Wikle (2015) Section 4.1.2,page 148.An iterative weighted least squares method may be used to estimate β and the Sp.T. variogram parameter θ.Firstly, β would be estimated by the OLS method, minimizing Based on the resulting regression residuals, the Sp.T. semivariogram is then estimated by fitting a parametric Sp.T. semivariogram model to the corresponding empirical Sp.T. semivariogram by a least squares method.The parameter β is then re-estimated using a weighted least squares method, taking into account the estimated dependence structure of the residuals (Cressie, 2015).The dependence structure (variogram) is again estimated based on the updated residuals, and the whole procedure iterated until convergence.Note that if the deterministic trend only depends on time, such that µ(s, t) = m(t), the functional kriging methods do not need to specify and estimate the trend, whereas the Sp.T. kriging methods need to.

Evaluation of kriging methods
Functional cross-validation (FCV) is a common way of evaluating the prediction performance of prediction methods for functional data, as suggested by Giraldo et al. (2010Giraldo et al. ( , 2011)).In FCV, the data from each observed spatial location is removed, one at a time, and then predicted at all observed time points by the prediction method using the observed functional data at the remaining locations.The mean squared prediction error (MSPE) is computed as where Ẑ−i (s i , t ij ) denotes the predicted value at location (s i , t ij ) based on the functional data with the observations Z(s i , t ij ), j = 1, . . ., m i excluded.

A simulation study
Here we present a simulation study that aims to shed light over the relative merits of Sp.T. and functional kriging, with particular focus on Gaussian second-order stationary functional processes in R 2 .Since the functional kriging methods OKFD, PWFK, and FKTM often coincide for such processes (see Sections 3.1.2and 3.1.3)we restrict our comparisons to Sp.T. kriging versus OKFD.We simulate data from Gaussian processes with three main types of covariance structures.The first two scenarios have stationary isotropic separable and non-separable covariance functions, respectively.The third scenario corresponds to second-order stationary functional (but non-stationary Sp.T.) processes with constant mean.For all three scenarios, several different cases are simulated, with varying strengths of spatial and temporal dependence.All the (24) considered cases in the study are presented in Table 1, where the different parameters control the Sp.T. correlation structure in the three different main scenarios.For each case in Table 1, three different sample sizes were considered; small referring to n = 6 × 6 spatial locations and m = 12 time points, medium referring to n = 6 × 6 spatial locations and m = 50 time points, and large referring to n = 15 × 15 spatial locations and m = 50 time points.
For each sample size, the number of time points were equally distributed on [0, 1] and the spatial locations were located on a regular grid in [0, 1] × [0, 1].Moreover, for cases 1-18, the presence of a deterministic time trend, m(t) = 9 + 3 sin(2πt), was also investigated.For each case, sample size (and trend type for cases 1-18), 100 replicates were simulated.Figure 1 illustrates examples of simulated data for six of the cases, all with constant means.The three main scenarios are now presented in more detail, together with the simulated results.

Separable
The first nine cases in Table 1 were simulated (with and without the deterministic time trend) using the R-package RandomFields (Schlather et al., 2015), and are Gaussian Sp.T processes with separable covariance functions ( 14).The spatial covariance function C s (h) in ( 14) was chosen to be the exponential covariance function with nugget effect, The nugget effect ν was set to 0.04.For the parameter α, we considered the values 0.1, 0.5 and 2, corresponding to the effective ranges 30, 6 and 1.5 (very strong, medium and weak spatial correlation), respectively.The temporal covariance function C t (u) in ( 14) was given by the stable covariance function Here, γ was fixed to 0.5, while the values for β were 0.1, 1, and 10, corresponding to the effective ranges 90, 9 and 0.9 (very strong, medium and weak temporal correlation), respectively.

Without a deterministic time trend
Estimation of the OKFD model was performed using the R-package geofd (Giraldo et al., 2012).Given the generated data Z(s i , t j ), i = 1, . . ., n, j = 1, . . ., m, the OKFD model was estimated using different types (Fourier and cubic B-splines) and number (p) of basis functions, see Table 11 in Appendix B for a detailed specification.The (p) cubic B-splines were constructed based on p − 4 equally distributed interior knots on the interval [0, 1].For each number and type of basis function three different semivariogram models (spherical, exponential and stable) were fitted to the empirical tracesemivariogram.For each case (1-9), a total of 36, 42 and 42 OKFD models ( 2 types of basis functions × # different numbers of basis functions × #trace-semivariograms) for small, medium and large sample sizes, respectively, were estimated and fitted to the data, evaluated by FCV (functional crossvalidation) in terms of the MSPE ( 16), and the minimum MSPE over the models registered.The overall MSPE for each case and sample size was computed as the average minimum MSPEs over the 100 replicates.The Sp.T. kriging models were estimated using the R-packages gstat (Pebesma, 2004) and spacetime (Pebesma et al., 2012).The Sp.T. semivariogram models (separable, product-sum and metric) described in Section 3.2 were fitted to the empirical Sp.T. semivariogram, being all pairwise combinations of the exponential, spherical and stable variograms for the spatial (isotropic), temporal and joint variogram models.Hence, this resulted in 9 separable, 9 product-sum and 3 metric Sp.T. semivariogram models.All the Sp.T. kriging models were evaluated by FCV, minimum MSPE registered over the different models within each of the three groups of dependence structures, and the overall MSPE computed for each case (1-9) and sample size.The Sp.T. models with a product-sum and a metric covariance function were not evaluated for large size samples, due to the large computational time.The overall MSPEs for the OKFD and the different Sp.T. kriging models for cases 1-9 considering medium sample sizes without a deterministic time trend are presented in Table 2. Corresponding results for small and large sample sizes are reported in Appendix B, Tables 7 and 8.The numbers highlighted in red correspond to the smallest overall MSPE (per case) while the numbers in parentheses report the average computational time (for estimation and FCV) in seconds over all estimated models and replications (when run on a 3.5 GHz Intel Core i7 processor with 32 GB ram memory).The second last column presents the number of times, out of the 100 realisations, that OKFD had lower (minimum) MSPE than the Sp.T. separable model.The last column in Table 2 reports p-values from paired two-sided t-tests comparing the overall MSPEs between the OKFD and the Sp.T. separable models, and thus reflects for which cases significant differences occur.
For cases 1-9, the Sp.T. separable kriging models in general performed better (lower overall MSPEs) than the Sp.T. product-sum and metric models, which is natural since the simulated data were generated from Sp.T. models with separable covariance functions.Still, the overall MSPE was often Table 2: Prediction performance in terms of mean squared prediction errors (MSPEs) for the simulated cases 1-18 without a deterministic time trend, medium sample size.The smallest overall MSPE for each case is highlighted in red.The numbers in parentheses represent the average computational time in seconds over the corresponding estimated models and replications.The column #Times represents the number of times, out of the 100 realisations, that OKFD had lower (minimum) MSPE than the Sp.T. separable model.The last column shows p-values from two-sided paired t-tests comparing the overall MSPEs between the OKFD and the Sp.T. separable models.(significantly) lower for OKFD compared to the Sp.T. separable models, for small and medium sample sizes (Tables 2 and 7).For large sample sizes, the estimated Sp.T. (separable) models often performed better than OKFD (Table 8).In general, the larger the sample size, the more likely it is that the estimated Sp.T. (separable) models perform better than OKFD.Studying the overall MSPEs in more detail reveals that the weaker the spatial correlation and the stronger the temporal correlation, the better the OKFD performs and the worse the Sp.T. separable model performs, regardless of the sample size.Case 3 for example, with strong spatial and weak temporal correlation, has significantly lower overall MSPE for the Sp.T. separable model compared to the OKFD model for medium and large sample sizes (Tables 2 and 8).On the other hand, for case 7, with weak spatial and strong temporal correlation, the result is reversed.Moreover, from the registered computational times in Tables 2, 7, and 8, it can be concluded that prediction by and estimation of an OKFD model is substantially faster than the Sp.T. kriging models for cases 1-9, regardless of the sample size.The Sp.T. separable models had lower computational time compared to the Sp.T. product-sum and metric models, as expected (see, Section 3.2). Figure 2 presents how the type and number of basis functions used in the OKFD model affects the prediction performance (minimum MSPE over the three trace-semivariogram models, averaged over the 100 realisations) for cases 3 and 7 considering medium sample sizes.The number of basis functions turns out to be an important factor for prediction performance, in general with smaller prediction error the more basis functions are used.On the other hand, the type of basis functions, Fourier or cubic B-splines, is of less importance.These findings are consistent with all cases (1-9) and for all considered sample sizes.
To see how prediction performance may vary between replicates, Figure 3 presents box-plots of the differences in (minimum) MSPE between the two kriging approaches (MSPE(Sp.T)-MSPE(OKFD)) over the 100 replicates for cases 1-9 considering medium sample sizes.Here it becomes clear that OKFD produces more robust predictions.The Sp.T. kriging method (with estimated separable covariance q q q q q q q 10 20 30 40 50 function) produced much higher MSPEs than OKFD (casewise) for many realisations, especially for the cases with small and medium sample sizes.q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q Case 9 Case 8

With a deterministic time trend
Simulated data sets for cases 1-9 with a common deterministic (sinusoidal) time trend were predicted by the same OKFD models as in Section 4.1.1,since the OKFD models are designed to handle situations where a common deterministic time trend is present.Predictions were also made by universal Sp.T. kriging, using the same Sp.T. semivariogram models as in Section 4.1.1.The deterministic time trend in the universal Sp.T. kriging model was specified to be the same as the one simulated from.Table 3 summarizes the prediction performance of the two kriging approaches for cases 1-9 with deterministic time trend, for medium sample size.Corresponding results for small and large sample sizes are reported in Appendix B, Tables 9 and 10.Comparing these tables with the corresponding tables in Section 4.1.1,we see that the presence and estimation of a deterministic time trend did not have a large effect on the prediction performance, and more or less gave the same conclusions with respect to the relative performance of the two kriging approaches, regardless of the sample size.
However, for small sample sizes, a difference showed up indicating that not only the number of basis functions, but also the type of basis functions used in the OKFD models matter.Here we noticed that, for any given number of basis functions used in the OKFD model, B-splines had better prediction performance than if Fourier basis functions was used.This is illustrated in Figure 4, which presents how the type and number of basis functions used in the OKFD model affects the prediction performance (minimum MSPE over the three trace-semivariogram models, averaged over the 100 realisations) for cases 3 and 7 considering small sample sizes.We believe that a reason for the observed effect is connected to that the functional representations based on B-splines better fits the data.The effect is not observed for larger sample sizes.q q q q q q q q 5 6 7 B−splines Fourier Sp.T. separable q q q q q q q q 5 6 7

Non-separable
Cases 10-18 in Table 1 with and without the common deterministic time trend were simulated using the R-package RandomFields, and correspond to Gaussian Sp.T. processes with non-separable covariance functions of the form with parameters set to δ = 2, ν = 0.04, and α = 0.1, 0.5, and 2. The covariance function C t (u) was chosen to be the stable covariance function ( 17) with γ = 0.5 and β = 0.1, 1, and 10.The OKFD and the Sp.T. kriging models estimated in Section 4.1 were also fitted to the simulated data sets of cases 10-18 using the R-packages geofd, gstat, and spacetime, each with 100 realisations.Prediction performance of the two kriging approaches was evaluated in the same way as described in Section 4.1.1 and is summarized in Tables 2 and 3 for the non-separable cases 10-18 for medium sample sizes without and with a deterministic time trend, respectively.Corresponding results for small and large sample sizes are reported in Appendix B, Tables 7-10.In general, we draw similar conclusions as in Section 4.1 for the separable cases 1-9; the Sp.T. separable kriging models perform better than Table 3: Prediction performance in terms of mean squared prediction errors (MSPEs) for the simulated cases 1-18 with a deterministic time trend, medium sample size.The smallest overall MSPE for each case is highlighted in red.The numbers in parentheses represent the average computational time in seconds over the corresponding estimated models and replications.The column #Times represents the number of times, out of the 100 realisations, that OKFD had lower (minimum) MSPE than the Sp.T. separable model.The last column shows P-values from two-sided paired t-tests comparing the overall MSPEs between the OKFD and the Sp.T. separable models.the Sp.T. product-sum and metric models; the weaker the spatial correlation and the stronger the temporal correlation, the better the OKFD performs and the worse the Sp.T. (separable) model performs; OKFD works better for smaller sample sizes whereas fitted Sp.T. separable kriging models perform better for large sample sizes; more basis functions in OKFD generally improve prediction performance; computational times are much lower for OKFD; the presence of a deterministic time trend does not change the conclusions.
A more detailed comparison of the overall MSPEs (and p-values) in Tables 2-3, and Tables 7-10 reveals that prediction performance of OKFD in general improves in comparison to the Sp.T. separable kriging models for the simulated data sets with non-separable covariance functions (cases 10-18) compared to those simulated from separable covariance functions (cases 1-9).This result was to be expected, since none of the fitted (Sp.T.) kriging models coincide with the models that generated the data for cases 10-18.
To each generated data set we fitted the same OKFD models as those fitted in Section 4.1.1 using the R-package geofd.However, for medium and large sample sizes we extended the choices of number of basis functions (see Appendix B Table 12 for specification), yielding a total of 36, 90 and 90 different estimated OKFD models for small, medium and large sample sizes, respectively.For each case (19-24), sample size and realisation, predictions were made and evaluated by FCV for all models, and the minimum MSPE over the models registered.The overall MSPE for each case and sample size was finally computed as the average minimum MSPE over the 100 replicates.Furthermore, the same Sp.T. ordinary kriging models fitted to the data in Section 4.1.1,were also estimated for these data sets.Additionally, Sp.T. universal kriging models were fitted, with a deterministic time trend specified by a linear combination of the same basis functions that were used to generate the data set.Hence, a total of 18 separable, 18 product-sum and 6 metric Sp.T. kriging models were fitted to the data; predictions evaluated by FCV, the minimum MSPE registered over the models within the three groups of dependence structures, and the overall MSPE computed for each case (19-24) and sample size.As in Section 4.1 and 4.2, the Sp.T. models with a product-sum and a metric covariance function were not evaluated for large sample sizes due to the large computational times.
Table 4 summarizes the prediction performance of the two kriging approaches for cases 19-24 and all three sample sizes.Note that these simulated data sets have time varying variances and covariances, which the Sp.T. kriging approach is not designed to capture, whereas the OKFD model can handle such situations.We would therefore expect OKFD to perform better than the Sp.T. kriging approach, which is indeed the case.In fact, OKFD has significantly lower overall MSPE for all cases and sample sizes in Table 4 except for cases 22-23 considering small sample sizes.For these two cases the Sp.T. separable kriging model works better.This is probably coupled to the low number of observations ( 12) per location for small sample sizes.When the functional representations of the data at each location is formed for the OKFD models, we can thus at most fit a linear combination of 12 basis functions, whereas, the data are generated by 15 B-splines.The functional representations may thus fail to capture the full temporal time dynamics.The Sp.T. universal kriging models on the other hand, can fit a common deterministic time trend using all 15 B-splines.From Table 4, it is also noted that Sp.T. kriging models with fitted metric variograms sometimes had better prediction performance than the Sp.T. separable kriging models, but still worse than the best OKFD models.Moreover, we again note that the computational time for OKFD is much lower than for the Sp.T. models.
Figure 5 illustrates how the type and number of basis functions used in the fitted OKFD models affect the prediction performance (minimum MSPE over the three trace-semivariogram models, averaged over the 100 realisations) for cases 21 and 22 considering medium sample sizes.Case 21 corresponds to simulated data generated by 7 B-splines with weak spatial dependence, whereas case 22 corresponds to simulated data generated by 15 B-splines with strong spatial dependence.In contrast to the simulated stationary Sp.T. models (cases 1-18) where prediction performance typically increases with the number of basis functions used in the fitted OKFD models, here we observe this phenomena only when Fourier basis functions are used in the fitted OKFD models.For B-splines, the best prediction performance is (naturally) achieved using the same number of B-splines in the OKFD fitted models as used to generate the simulated data set (7 for case 21 and 15 for case 22).In fact, using too many B-splines may give substantially poorer predictions, especially when the spatial dependence is weak, as for case 21, cf. Figure 5.It can also be noted that the best OKFD model using B-splines has significantly smaller MSPE than the best OKFD model using Fourier basis functions.If the simulated data sets would have been generated by a set of Fourier basis functions instead, we would most likely see the opposite behaviour, i.e. that the same Fourier basis functions in the fitted OKFD model as in the data generation model probably would give the best prediction performance, and do better than the OKFD models using B-splines.
For the Sp.T. separable kriging models, it turned out (regardless of sample size) that it was advantageous to use universal kriging (estimating a deterministic time trend), especially for the cases with weak spatial dependence, whereas the prediction performance was about the same for cases with strong spatial dependence (Figure 5).For the Sp.T. metric model, we observed the opposite behaviour, i.e., for cases with weak spatial dependence it was more advantageous to use ordinary kriging instead of universal kriging.
q q q q q q q q q q q q q q q 10 20 30 40 50 0.20 0.24 0.28

Case 21
Number of basis functions MSPE q B−splines Fourier Sp.T. separable (with trend) Sp.T. separable (without trend) Sp.T. metric (with trend) Sp.T. metric (without trend) q q q q q q q q q q q q q q q 10 20 30 40 50 0.05 0.10 0.15 0.20 0.25

Case 22
Number of basis functions MSPE q B−splines Fourier Sp.T. separable (with trend) Sp.T. separable (without trend) Sp.T. metric (with trend) Sp.T. metric (without trend)

Applications
In this section we compare the prediction performance of the OKFD and the Sp.T. kriging models for two different data sets.The first data set consists of temperature curves recorded in the Maritimes Provinces of Canada, and the second corresponds to salinity curves obtained from the Caribbean coast of Colombia.

Spatial prediction of temperature curves in the Maritime Provinces of Canada
Here we analyse a meteorological data set, available in the R package geofd (Giraldo et al., 2012).
The data consists of temperature measurements recorded at n = 35 weather stations at Canada's Atlantic coast in the Maritime Provinces (Figure 6, top panel).At each station, the daily mean temperature averaged over the period 1960-1994 (February 29th (February 29th combined with February 28th) has been recorded.The resulting functional data are displayed in Figure 6 (bottom panel), connected by light grey lines.Using the R-package geofd, the data was first predicted by the OKFD model, which was estimated using 51, 101, 151, 201, 251, 301 and 351 Fourier basis functions.Three semivariogram models (exponential, spherical and stable) were fitted to the empirical trace-semivariogram by the OLS method.Thus, in total we estimated 7 × 3 = 21 OKFD models.Predictions were then made and evaluated by FCV in terms of their MSPEs ( 16).The best prediction performance was achieved using the stable trace-semivariogram (Figure 7, left panel) for all considered numbers of Fourier bases.Figure 7 (right panel) clearly reveals that the prediction error (minimum MSPE over the three trace-semivariogram models) decreases with the number of Fourier basis functions used in the fitted OKFD models.Thus, the best performance was attained with 351 Fourier basis functions and its MSPE was 0.5738.The average computational time for an estimated OKFD model based on 51 and 351 Fourier basis functions was less than one and three seconds, respectively.
The data was further predicted using Sp.T. kriging.Since the data show a clear time trend, universal Sp.T. kriging was first applied.The deterministic time trend was modelled by a linear combination of the 3 (and 7) first Fourier basis functions, and estimated by the OLS method.The dependence structure of the resulting residuals was then estimated by fitting Sp.T. second-order stationary and isotropic semivariogram models to the empirical Sp.T. semivariogram of the residuals.The Sp.T. semivariogram models (separable, product-sum and metric) described in Section 3.2 were estimated, letting their corresponding spatial, temporal and joint semivariogram models be altered between the exponential, spherical and stable semivariogram models.This resulted in 9 separable, 9 product-sum and 3 metric Sp.T. semivariogram models.As a comparison we also predicted the original data by Sp.T. ordinary kriging, using the same Sp.T. semivariogram models as for the universal Sp.T. kriging models.Thus, in total we investigated (9 + 9 + 3) × 3 = 63 Sp.T. kriging models.All models were fitted to the data and predictions evaluated by FCV.
Table 5 presents the best (smallest MSPE) Sp.T. models, within each of the three groups of dependence structure (separable, product-sum and metric), with and without an estimated trend.The numbers in brackets report the corresponding average computational time in seconds over the estimated models.Many of the Sp.T. models have about the same prediction performance, with the exceptions of the Sp.T. metric models with estimated trend, which worked less well.The best Sp.T. models have approximately the same magnitude of MSPE as the best OKFD model (MSPE being 0.5738), but in terms of computational time, an OKFD model (taking 1-3 seconds to compute) was 100-10000 times faster to compute compared to a Sp.T. kriging model.
Figure 8 presents the observed daily temperatures at locations Bertrand (the location with the largest prediction error) and Moncton, together with the corresponding predicted values using the best OKFD and Sp.T. kriging models.It emphasizes that there are very small differences between the best OKFD and Sp.T. models (in terms of prediction performance).q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q −68 −66  q q q q q q q q q q q q q 0 2 4 6 0 1000  This data set has previously been analysed by e.g.Giraldo (2009) with the objective to demonstrate and compare the functional kriging methods OKFD, PWFK and FKTM.Giraldo (2009) concluded that the three methods have similar FCV prediction performance when the first 65 Fourier basis functions are used in (8) to represent the χ si (t)'s.In Menafoglio et al. (2013) this data set was used to investigate the effect of using universal kriging for functional data (UKFD) instead of OKFD, also by representing the functional data with the first 65 Fourier basis functions.They concluded that UKFD performed better in terms of FCV compared to OKFD.The FCV performance was there computed with respect to the fitted data, thus differing from ours, where raw data has been used.

Spatial prediction of salinity curves on the Caribbean coast of Colombia
Here we analyse a data set consisting of salinity measurements recorded at 21 monitoring stations of the lagoonal-estuarine system comprised by Ciénaga Grande de Santa Marta (CGSM) and Complex of Pajarales (CP) located on the Caribbean coast of Colombia, see Figure 9 (top panel).The data for each station were recorded biweekly from October 1988 to March 1991 (connected by lines in Figure 9, bottom left panel).This data set has previously been used by Reyes et al. (2015) to illustrate, evaluate and compare the performance of the functional kriging approaches OKFD, PWFK and FKTM when applied on residual curves after estimating a deterministic trend, so called ROKFD, RPWFK and RFKTM, as well as directly applied on the data.They came to the conclusion that ROKFD was the best alternative for performing functional kriging prediction, although the difference (in prediction performance) to RPWFK and RFKTM were small.Here, we will redo the same analysis using OKFD and ROKFD, and add predictions made by Sp.T. kriging models, for comparison purposes.The salinity data is evidently not stationary, as there is a clear increasing trend from east to west, see Figure 9 (top and bottom left panel).For ROKFD, a deterministic trend was thus first estimated.We used the same trend model as Reyes et al. (2015), where X i (t), i = 1, ..., 21 are the salinity curves, and α(t), β 1 (t) and β 2 (t) are the functional parameters.
For convenience, since we are evaluating prediction performance by FCV, we restricted the estimation of the functional parameters to the observed time points and thus fitted the model by OLS for each observed time point using the raw salinity data.Reyes et al. (2015) fitted the trend based on smoothed salinity curves.Once the trend was estimated, the resulting residual data i (t j ), i = 1, ..., 21, j = 1, ..., 55 (connected by lines in Figure 9, right panel) were formed.The Salinity data was predicted by ROKFD using the estimated deterministic trend combined with estimated OKFD models applied to the residual data.As a comparison we also estimated OKFD models directly on the original raw data, and used them to predict the Salinity data.
In accordance with Reyes et al. (2015), we used B-splines basis functions to construct functional representations of both the original and the residual data.Specifically, we studied OKFD and ROKFD and their MSPEs using 5,6,7,8,9,10,15,20,30,40 and 50 B-splines.Moreover, the exponential, spherical and stable semivariogram models were fitted to the empirical trace semivariograms (of the residual and original data) by OLS.Predictions were made and evaluated by FCV in terms of their MSPE for a total of 11 × 3 = 33 estimated OKFD models on the original data as well as 33 estimated ROKFD models by using the R-package geofd.
The minimum MSPE, for each number of basis functions and trend used, was obtained by the stable trace-semivariogram (Figure 10, left panel).Figure 10 (right panel) presents how the trend and the number of B-splines used in the fitted OKFD and ROKFD models affect the prediction performance (minimum MSPE over the three trace-variogram models).The prediction errors of OKFD decreases with the number of basis functions used while the performance of ROKFD approximately is the same, irrespective of the number of basis functions used (Figure 10).We believe that the performance of the latter is due to that a large part of the dependence structure is captured by the estimated Sp.T. deterministic trend.It is also noted that prediction based on ROKFD yields lower prediction errors compared to OKFD.Moreover, the computational time for all the 66 OKFD and ROKFD models was about the same, taking approximately 0.2 seconds each.q q q q q q q q q q q q q q q q q q q q q 945000 950000 955000 960000 965000 970000 975000 Sp.T. universal kriging models were also estimated and used to predict the Salinity data, based on the estimated deterministic trend specified in (19).To compare, we also applied Sp.T. ordinary kriging models to the original raw data.The dependence structures were estimated by fitting the Sp.T. semivariograms (9 separable, 9 product-sum, 3 metric) to the empirical Sp.T. semivariograms (computed from both the residual and the original data).Thus, in total 2 × (9 + 9 + 3) = 42 Sp.T. models were fitted to the data and then evaluated by FCV.
Table 6 presents the best Sp.T. models, in terms of minimum MSPE, within the three groups of dependence structure (separable, product-sum and metric) with and without deterministic trend.The numbers in brackets report the corresponding average computational times in seconds over the estimated models.The lowest MSPEs, being approximately of the same magnitude as those for the ROKFD models, were obtained by the Sp.T. separable and the product-sum universal kriging models (cf. Figure 10, right panel).It is also noted that the best Sp.T. ordinary kriging models, when q q q q q q q q q q q q q 0 5000 10000 15000 20000 25000 30000 0 1000 2000 3000 4000 distance semivariance q q q q q q q q q q q 10 20 30  applied on the original data, gave about the same size of the MSPEs as the best OKFD model.Figure 11 illustrates the predictions together with the observed salinity data at locations LBA and CCG (corresponding to the locations with the largest and smallest prediction errors, respectively) using the best ROKFD and Sp.T. kriging models.The predictions obtained by the two methods (ROKFD and the Sp.T. product-sum universal kriging) are very similar with the predictions in CCG performing good whereas the predictions in LBA (the farthest considered station in our data set, see Figure 9, top panel) performing not as good.The computational times for the Sp.T. models (taking approximately 15-30 seconds per model, cf.Table 6) are much higher than the ones for OKFD and ROKFD (taking approximately 0.2 seconds per model).

Concluding remarks
In this paper we have presented and compared functional and Sp.T. kriging approaches to predict spatial functional random processes.Comparisons with respect to prediction performance and computational time has been performed, mainly through a simulation study and two real data sets.We restricted the comparison to Sp.T. kriging versus the functional kriging method OKFD, since the more flexible functional kriging approaches PWFK and FKTM coincide with OKFD in several situations (Sections 3.1.2 and 3.1.3).Here we also contribute with new knowledge by proving that OKFD and PWFK coincide under certain conditions.
Based on the simulation study and the analyses of the two data sets, we observed that the prediction performance (in terms of functional cross-validation) of OKFD normally was improved when the number of basis functions used to represent the functional data increased.Furthermore, OKFD typically performed similarly or better than the Sp.T. kriging models for small and medium sample sizes.This is likely due to the more complex task of finding good estimates of the Sp.T. variogram compared to the trace-variograms used in OKFD, since trace-variograms have one dimension less.The large number of choices of Sp.T. variogram models and parameters to estimate makes the Sp.T. estimation process more vulnerable, especially for small data sets.For larger sample sizes, the Sp.T. kriging starts to perform better for the stationary Sp.T. processes, whereas OKFD continues to work best for the non-stationary Sp.T. (but stationary functional) processes.We also noted a clear tendency for OKFD to perform better relative to Sp.T. kriging, the stronger the temporal-and the weaker the spatial dependence considered.
For all considered cases, OKFD was computationally considerably faster than the Sp.T. kriging models.The large matrices that need to be inverted in order to perform Sp.T. kriging prediction at each location, is the major reason for this fact.One way to reduce the computational time for the Sp.T. kriging models could be to use only the local neighbourhood (e.g. the k closest neighbouring locations) when prediction is made.This can often be done without much loss in prediction performance.
The purpose of this study has been to shed light on the relative merits of functional and Sp.T. kriging methods for prediction of spatial functional random processes.While functional kriging predicts complete curves on a given (time) domain, given observations on the same domain, the Sp.T. kriging methods make (a raster of) pointwise predictions of the curves and are not restricted to a given (time) domain.Experience from this study concludes that prediction performance of the two kriging approaches (functional and Sp.T.) in general is rather equal for stationary Sp.T. processes, with a tendency for functional kriging to work slightly better for small sample sizes and Sp.T. kriging to work slightly better for large sample sizes.For non-stationary Sp.T. processes, e.g. the presence of a common deterministic time trend and/or time varying variances and dependence structure, do not demand any extra modeling for functional kriging, whereas identification and modeling of trend and/or time varying dependence is necessary for Sp.T. kriging.From a modelers perspective, the Sp.T. kriging methods demands more work with a larger risk of choosing a suboptimal model.Moreover, from a computational perspective functional kriging is substantially faster than Sp.T. kriging.

A Proof of Proposition 3.1
Below we present a proof of Proposition 3.1.The proof relies on Lemma A.1, which we first state and prove.
Lemma A.1.Assume that we have a symmetric matrix of the form , where W and G are symmetric k × k matrices, I is a k × k identity matrix, and c ij = c ji for i = j, i, j = 1, ..., n are constants.Given that the inverse of G exists, the inverse of Q satisfies where the k ij 's, k i 's, and c are constants, determined by the c ij 's, such that k ij = k ji for i, j = 1, ..., n, = 0 for all j's (and i's), and Note that the k ij 's and the k i 's also change with n.However, for notational simplicity we suppress the dependence of n in the k ij 's and the k i 's.
Proof of Lemma A.1.The proof of ( 20) is done by induction.The proof uses the following, equivalent, block matrix inversion formulas where A and D are square matrices allowed to be of different size.
Assume now that (20) holds for n = m − 1.The proof is complete if we can show that (20) holds for n = m.For n = m we split the matrix Q in four blocks By assumption the inverse of D is of the form (20). Using this fact we have that where the second equality uses the induction hypothesis that m−1 i=1 k ij = 0 for all j = 1, ..., m − 1, and m−1 i=1 k i = 1.Further, it is relative straight forward, using the induction hypothesis, to verify that Hence, Combining ( 24) and ( 25), we obtain and due to the fact that D −1 and E are symmetric (using that the inverse of a symmetric matrix is again symmetric) and C = B we also have that Moreover, by combining ( 24), ( 26) and the inverse of D it can also be shown that the (i, j)th k × k block matrix entry in the km and Thus, the inverse Q −1 = E F H K is of the same form as in (20).It now remains to show that the k * -coefficients satisfy the same conditions as the k's in Lemma A.1.From ( 25) and ( 27) and the fact that k ij = k ji , i, j = 1, ..., m − 1, and that H = F , we have that k * ij = k * ji , i, j = 1, ..., m.Moreover, by the induction hypothesis, for j = 1 in k * ij , we obtain and for j = 2, ..., m we have Hence, formula (20) holds for n = m, and by the induction principle we have that ( 20) is true for all integers n larger than 1.We have thus proved Lemma A.1.
Proof of Proposition 3.1.Under the assumption in Proposition 3.1 we here show that the coefficients of the functional kriging weights (9) of PWFK, which are obtained by minmising (4) subject to the unbiasedness constraint of the predictor ( n i=1 λ i (t) = 1, for all t ∈ T ), yields weights that are constant over time, i.e., λ i (t) = b i B λ (t) = λ i , i = 1, ..., n.Giraldo et al. (2010) showed that the solution of the optimisation problem is given by the solution of the system Qβ = J =⇒ β = Q −1 J, where m = (m 1 , ..., m K ) are the K Lagrangian multipliers and c is an unbiasedness constraint vector satisfying c B λ (t)=1.
By the assumption we have that a i = Pr i with V (r i ) = D(0) = σ 2 I and C(r i , r j ) = D(h ij ) = (σ 2 − γ(h ij ))I, where γ(h) is the common semivariogram function depending on the distance h ij = h ji = s i − s j between two locations s i and s j .Therefore, σ i (t) and σ ij (t) in expressions ( 28), ( 29) and (30) equals where Hence, the system we want to solve Qβ = J may be expressed as .
By Lemma A.1 the inverse of Q is of the form (20) (as long as the inverse of G exists) and thus it follows that the solution of such system for any b i is of the form which can be rewritten as where the k ij 's and k i 's are constants determined by the γ(h ij )'s such that k ij = k ji for i, j = 1, ..., n, n j=1 k ij = 0 for all i's and n i=1 k i = 1.Using this fact together with ( 30) and ( 31), we may write the right hand side of (32) as (33) Thus, using (33) and the expression for G we may now express (32) as where we in the last expression used the fact that B λ (t)c must equal 1 for all values of t ∈ T (the unbiasedness constraint).We now see that the above equation holds if b i equals (k i + n j=1 k ij γ(h 0j ))c and from the assumption of G −1 's existence we also have that this solution is the only one.Thus, the weights must equal for all values of t ∈ T and i = 1, ..., n, and consequently, the PWFK predictor coincides with the OKFD predictor.

Figure 2 :
Figure 2: Prediction performance (minimum MSPE over the three trace-semivariogram models, averaged over the 100 realisations) for cases 3 and 7 considering medium sample sizes without a deterministic time trend when the estimated OKFD model is based on different numbers (p) of basis functions, being both Fourier and cubic B-spline bases.The solid black lines represent the corresponding overall MSPE of the Sp.T. separable model.

Figure 3 :
Figure 3: Box plots for cases 1-9 (considering medium sample sizes without a deterministic time trend) of the differences in (minimum) MSPE between the two kriging approaches (MSPE(Sp.T)-MSPE(OKFD)) for the 100 replicates.

Figure 4 :
Figure 4: Prediction performance (minimum MSPE over the three trace-semivariogram models, averaged over the 100 realisations) for cases 3 and 7 considering small sample sizes with a deterministic time trend when the estimated OKFD model is based on different numbers (p) of basis functions, being both Fourier and cubic B-spline bases.The solid black lines represent the corresponding overall MSPE of the Sp.T. separable universal kriging model.

Figure 5 :
Figure 5: Prediction performance (minimum MSPE over the three trace-semivariogram models, averaged over the 100 realisations) for cases 21 and 22 considering medium sample sizes when the estimated OKFD model is based on different numbers (p) of basis functions, being both Fourier and cubic Bspline bases.The solid and dashed black lines represent the corresponding overall MSPE of the Sp.T. separable model with and without an estimated deterministic time trend, respectively.The solid and dashed green lines represent the corresponding overall MSPE of the Sp.T. metric model with and without an estimated deterministic time trend, respectively.

Figure 6 :
Figure 6: Locations of the 36 weather stations in the Canadian Maritime provinces (top panel) where the average (over 30 years) daily temperature curves (bottom panel) were registered.The bottom panel also presents the estimated common time trend specified as linear combinations of the first 3 and 7 Fourier basis functions, respectively.

Figure 7 :
Figure 7: Left panel: Empirical trace-semivariogram and the best fitted stable model for the Canadian temperature curves, represented by 351 Fourier basis functions.Right panel: Minimum MSPE over the three trace-semivariogram models for OKFD, based on different numbers of Fourier basis functions.The solid black line represents the MSPE of the best Sp.T. model.

Figure 8 :
Figure 8: Predicted temperatures at locations Bertrand (top) and Moncton (bottom) obtained by the best OKFD model (solid grey line) and the best Sp.T. model (dashed black line) together with the observed (dotted) values.

Figure 9 :
Figure 9: Top panel: The 21 monitoring stations of the lagoonal-estuarine system comprised by CGSM (stations: BCG, BRF, BRS, CEN, LBA, PCO, PTA, RFU, RIN, RJA, RSE) and CP (stations: CBR, CCG, CCH, CCL, CCV, CDR, CLU, CPA, CRE, VCH).Bottom left panel: Biweekly salinity data recorded on the 21 monitoring stations.Bottom right panel: Residuals obtained from the fitted functional regression model.The salinity data, residuals and the corresponding locations at CGSM and CP are represented by grey and black lines/points, respectively.

Figure 10 :
Figure 10: Left panel: Empirical trace semivariogram and the best fitted stable model for residual curves represented by 6 B-spline basis functions.Right panel: MSPEs for OKFD and ROKFD over different number of B-splines and trends used.The solid and dotted black line represents the MSPE of the best Sp.T. model with a Sp.T. product-sum and separable semivariogram model applied on the residuals and the original data, respectively.

Figure 11 :
Figure 11: Predicted temperatures at the locations LBA (top panel) and CCG (bottom panel) obtained by the best ROKFD model (solid grey line) and the best Sp.T. model (dashed black line) together with the observed (dotted) values.

Table 1 :
The 24 different types (cases) of simulated Gaussian processes and their parameters: isotropic second-order stationary Sp.T. processes with separable (cases 1-9) and non-separable (cases 10-18) covariance functions, and second-order stationary functional (but non-stationary Sp.T.) processes (cases 19-24) with constant means.The larger the value of α and β the weaker the spatial and temporal correlation, respectively.

Table 4 :
Prediction performance in terms of mean squared prediction errors (MSPEs) for the simulated cases 19-24 over the different sample sizes.The smallest overall MSPE for each case is highlighted in red.The numbers in parentheses represent the average computational time in seconds over the corresponding estimated models and replications.The column #Times represents the number of times, out of the 100 realisations, that OKFD had lower (minimum) MSPE than the best Sp.T. model.The last column shows p-values from two-sided paired t-tests comparing the overall MSPEs between the OKFD and the best Sp.T. model.

Table 5 :
Prediction performance of different Sp.T. kriging models for the Canadian weather data.For each type of trend and Sp.T. variogram model, the (minimum) MSPE is reported.The numbers in parentheses represent the average computational time in seconds over the corresponding estimated models.

Table 6 :
Prediction performance of different Sp.T. kriging models for the salinity data.For each type of trend and Sp.T. variogram model, the (minimum) MSPE is reported.The numbers in parentheses represent the average computational time in seconds over the corresponding estimated models.

Table 7 :
Simulated data without deterministic time trend, small sample size.

Table 8 :
Simulated data without deterministic time trend, large sample size.

Table 9 :
Simulated data with deterministic time trend, small sample size.

Table 10 :
Simulated data with deterministic time trend, large sample size.