A parsimonious, computationally efficient machine learning method for spatial regression

We introduce the modified planar rotator method for scattered data (MPRS), a physically inspired machine learning method for spatial/temporal regression. MPRS is a non-parametric model which incorporates spatial or temporal correlations via short-range, distance-dependent ``interactions'' without assuming a specific form for the underlying probability distribution. Predictions are obtained by means of a fully autonomous learning algorithm which employs equilibrium conditional Monte Carlo simulations. MPRS is able to handle scattered data and arbitrary spatial dimensions. We report tests on various synthetic and real-world data in one, two and three dimensions which demonstrate that the MPRS prediction performance (without parameter tuning) is competitive with standard interpolation methods such as ordinary kriging and inverse distance weighting. Notably, MPRS is a particularly effective gap-filling method for rough and non-Gaussian data (e.g., daily precipitation time series). MPRS shows superior computational efficiency and scalability for large samples. Massive data sets involving millions of nodes can be processed in a few seconds on a standard personal computer.


Introduction
The spatial prediction (interpolation) problem arises in various fields of science and engineering that study spatially distributed variables. In the case of scattered data, filling gaps facilitates understanding of the spatial features, visualization of the observed process, and it is also necessary to obtain fully populated grids of spatially dependent parameters used in partial differential equations. Spatial prediction is highly relevant to many disciplines, such as environmental mapping, risk assessment (Christakos, 2012) and environmental health studies (Christakos and Hristopulos, 2013), subsurface hydrology (Kitanidis, 1997; Rubin, 2003), mining (Goovaerts, 1997), and oil reserves estimation (Hohn, 1988; Hamzehpour and Sahimi, 2006). In addition, remote sensing images often include gaps with missing data (e.g., clouds, snow, heavy precipitation, ground vegetation coverage, etc.) that need to be filled (Rossi et al, 1994). Spatial prediction is also useful in image analysis (Winkler, 2003; Gui and Wei, 2004) and signal processing (Unser and Blu, 2005; Ramani and Unser, 2006), including medical applications (Parrott et al, 1993; Cao and Worsley, 2001). Spatial interpolation methods in the literature include simple deterministic approaches, such as inverse distance weighting (Shepard, 1968) and minimum curvature (Sandwell, 1987), as well as the widely-used family of kriging estimators (Cressie, 1990). The latter are stochastic methods, with their popularity being due to favorable statistical properties (optimality, linearity, and unbiasedness under ideal conditions). Thus, kriging usually outperforms other interpolation methods in prediction accuracy. However, the computational complexity of kriging increases cubically with the sample size and thus becomes impractical or infeasible for large data sets. On the other hand, massive data are now ubiquitous due to modern sensing technologies such as radars, satellites, and lidar.
To improve computational efficiency, traditional methods can be modified, leading to some tolerable loss of prediction performance (Cressie and Johannesson, 2018; Furrer et al, 2006; Ingram et al, 2008; Kaufman et al, 2008; Marcotte and Allard, 2018; Zhong et al, 2016). With new developments in hardware architecture, another possibility is provided by parallel implementations using already rather affordable multi-core CPU and GPU hardware (Cheng, 2013; de Ravé et al, 2014; Hu and Shu, 2015; Pesquer et al, 2011). A third option is to propose new prediction methods that are inherently computationally efficient.
One such approach employs Boltzmann-Gibbs random fields to model spatial correlations by means of short-range "interactions" instead of the empirical variogram (or covariance) function used in geostatistics (Hristopulos, 2003; Hristopulos and Elogne, 2007; Hristopulos, 2015). This approach was later extended to non-Gaussian gridded data by using classical spin models (Žukovič and Hristopulos, 2009a,b, 2013; Žukovič and Hristopulos, 2018; Žukovič et al, 2020). The latter were shown to be computationally efficient and competitive in terms of prediction performance with respect to several other interpolation methods. Moreover, their ability to operate without user intervention makes them ideal candidates for automated processing of large data sets on regular spatial grids, typical in remote sensing. Furthermore, the short-range (nearest-neighbor) interactions between the variables allow parallelization and thus a further increase in computational efficiency. For example, a GPU implementation of the modified planar rotator (MPR) model led to impressive speed-ups (up to almost 500 times on large grids) compared to single CPU calculations (Žukovič et al, 2020).
The MPR method is limited to 2D grids, and its extension to scattered data is not straightforward. In the present paper we propose the modified planar rotator for scattered data (MPRS). This new machine learning method can be used for scattered or gridded data in spaces of different dimensions. MPRS achieves even higher computational efficiency than MPR due to full vectorization of the algorithm. The new approach does not rely on a particular structure or dimension of the data location grid; it only needs the distances between each prediction point and a predefined number of samples in its neighborhood. This feature makes MPRS applicable to scattered data in arbitrary dimensions.

The MPRS Model
The MPRS model exploits an idea initially used in the modified planar rotator (MPR) model (Žukovič and Hristopulos, 2018). The latter was introduced for filling data gaps of continuously-valued variables distributed on 2D rectangular grids. The key idea is to map the data to continuous "spins" (i.e., variables defined in the interval [−1, 1]), and then construct a model of spatial dependence by imposing interactions between spins. MPRS models such interactions even between scattered data and is thus applicable to both structured and unstructured (scattered) data over domains D ⊂ R d , where d is any integer.

Model definition
Let us consider the random field Z(s; ω) : R d → R which is defined over a complete probability space (Ω, F , P ). Assume that the data are sampled from the field Z(s; ω) at the N locations G s = {s i , i = 1, . . ., N }. The data set is denoted by Z s = {z(s i ), i = 1, . . ., N }, and the set of P prediction points by G p = {s̃ p , p = 1, . . ., P } (the sampling and prediction sets are disjoint, G s ∩ G p = ∅), and P + N = N G . The random field values at the prediction sites will be denoted by the set Z p .
A Boltzmann-Gibbs probability density function (PDF) can be defined for the configuration z(s) sampled over G s ⊂ R d . The PDF is governed by the Hamiltonian (energy functional) H(z Gs ) and is given by the exponential form f (z Gs ) = Z −1 exp [−H(z Gs )/(k B T )], (1) where z Gs = {z(s) : s ∈ G s } is the set of data values at the sampling points, and Z is a normalizing constant (known as the partition function). In statistical physics, T is the thermodynamic temperature and k B is the Boltzmann constant. In the case of the MPRS model, the product k B T represents a model parameter that controls the variance of the field.
A local-interaction Hamiltonian can in general be expressed as H(z Gs ) = −⟨J i,j Φ(z i , z j )⟩, (2) where J i,j is a location-dependent pair-coupling function and Φ(z i , z j ) is a nonlinear function of the interacting values z i , z j . The notation ⟨•⟩ implies a spatial average defined by means of ⟨A i,j ⟩ = Σ i Σ j∈neighb(i) A i,j , (3) where A i,j is a two-point function, and neighb(i) denotes all n b sampling points in the interaction neighborhood of the i-th point.
To define the local interactions in the MPRS model, the original data z i are mapped to continuously-valued "spin" variables represented by angles φ i using the linear transformation φ i = 2π (z i − z s,min )/(z s,max − z s,min ), (4) where z s,min = min i∈{1,2,...,N } Z s , z s,max = max i∈{1,2,...,N } Z s , and i = 1, . . ., N . The MPRS pairwise energy is given by the equation Φ i,j = cos φ i,j , where φ i,j = (φ i − φ j )/2 . (5) In order to fully determine interactions between scattered data, the coupling function J i,j needs to be defined. It is reasonable to assume that the strength of the interactions diminishes with increasing distance. Hence, we adopt an exponential decay of the interactions between two points i and j, i.e., J i,j = J 0 exp (−r i,j /b i ). (6) In the coupling function (6), the constant J 0 defines the maximum intensity of the interactions, r i,j = ∥s i − s j ∥ is the pair distance, and the locally adaptive bandwidth parameter b i is specific to each prediction point and reflects the local density of samples around it.

In MPRS, regression is accomplished by means of a conditional simulation approach which is described below. To predict the value of the field at the points in G p , the energy function (2) is extended to include the prediction points, i.e., we use H(z G ) = H(z Gs ∪ z Gp ). In H(z G ) we restrict interactions between each prediction point and its sample neighbors (i.e., we neglect interactions between prediction points) in order to allow vectorization of the algorithm, which enhances computational performance. In practice, omitting prediction-point interactions does not significantly impact the predictions. The Hamiltonian then comprises two parts: one that involves only sample-to-sample interactions and one that involves interactions of the prediction points with the samples in their respective neighborhoods. Since the sample values are fixed, the first part contributes an additive constant, while the important (for predictive purposes) contribution comes from the second part of the energy. The latter represents a summation of the contributions from all P prediction points.
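As a concrete illustration, the spin-angle transformation (4) and the exponentially decaying coupling (6) can be sketched in a few lines of Python. This is a hedged sketch, not code from a released MPRS implementation; the function names are ours, and we assume the transformation maps the data range onto angles in [0, 2π] as in the MPR model.

```python
import numpy as np

def to_spin_angles(z):
    """Map data values linearly onto spin angles in [0, 2*pi] (cf. Eq. 4)."""
    z = np.asarray(z, dtype=float)
    zmin, zmax = z.min(), z.max()
    return 2.0 * np.pi * (z - zmin) / (zmax - zmin)

def coupling(r, b, J0=1.0):
    """Exponentially decaying pair coupling J_ij = J0 * exp(-r_ij / b_i) (cf. Eq. 6)."""
    return J0 * np.exp(-np.asarray(r, dtype=float) / b)
```

The back-transformation to the original data scale is simply the inverse of the linear map, using the stored sample minimum and maximum.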
The optimal values of the spin angles φ p at s p ∈ G p can then be determined by finding the configurations which maximize the Boltzmann-Gibbs PDF (1), where the energy is now replaced with H(z G ). If T = 0, the PDF is maximized by the configuration which minimizes the total energy H(z G ), i.e., φ̂ p = arg min φp H(z G ), p = 1, . . ., P .
However, for T > 0, there can exist many configurations {φ p } P p=1 that lead to the same energy H(z G ) = E. Assuming that Ω(E) is the total number of configurations with energy E, the probability P (E) of observing the energy E is P (E) ∝ Ω(E) exp (−E/k B T ). (7) Equivalently, we can write this as follows P (E) ∝ exp [−E/k B T + ln Ω(E)]. (8) Taking into account that S(E) = k B ln Ω(E) is the entropy that corresponds to the energy E, the exponent of (8) becomes proportional to the free energy F (E) = E − T S(E). (9) The minimum free energy corresponds to the thermal equilibrium state. In practice, the latter can be achieved in the long-time limit by constructing a sequence (Markov chain) of states using one of the legitimate updating rules, such as the Metropolis algorithm (Metropolis et al, 1953), as shown in Section 2.3.
The MPRS model is fully defined in terms of the equations (1)-(10).

Setting the MPRS Model Parameters and Hyperparameters
The MPRS learning process involves the model parameters and a number of hyperparameters which control the approach of the model to an equilibrium probability distribution. The model parameters include the number of interacting neighbors per point, n b , the decay rate vector b = (b 1 , . . ., b P ) ⊤ used in the exponential coupling function (6), the prefactor J 0 , and the simulation temperature T ; the ratio of the latter two sets the interaction scale via the reduced coupling parameter J 0 /k B T . Thus, in the following we refer to the "simulation temperature" (T ) as shorthand for the dimensionless ratio k B T /J 0 . In addition, henceforward energy functions H are calculated with J 0 = 1.
In order to optimize computational performance, after experimentation with various data sets we set the model parameters to the default values n b = 8 (for all prediction points) and T = 10 −3 ; the decay rates {b p } P p=1 are estimated as the median distance between the p-th prediction point and its four nearest sample neighbors. These choices are supported by (i) the expectation of increased spatial continuity for low T and (ii) experience with the MPR method. In particular, MPR tends to perform better at very low T (i.e., for T ≈ 10 −3 ). In addition, using higher-order neighbor interactions (n b = 8 neighbors, i.e., nearest- and second-nearest neighbors on the square grid) improves the smoothness of the regression surface. The definition of the decay rate vector b enables it to adapt to a potentially uneven spatial distribution of samples around prediction points.
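The default bandwidth rule (median distance to the four nearest sample neighbors) is simple to compute. A minimal sketch in Python follows; the function name is ours, and the brute-force distance computation is for clarity only (a k-d tree would be preferable for large N).

```python
import numpy as np

def decay_rates(pred_pts, sample_pts, k=4):
    """Bandwidth b_p for each prediction point: the median distance to its
    k nearest sample neighbors (k = 4 is the default described in the text).
    pred_pts: (P, d) prediction coordinates; sample_pts: (N, d) sample coordinates."""
    pred_pts = np.atleast_2d(np.asarray(pred_pts, dtype=float))
    sample_pts = np.atleast_2d(np.asarray(sample_pts, dtype=float))
    # brute-force pairwise distance matrix, shape (P, N)
    d = np.linalg.norm(pred_pts[:, None, :] - sample_pts[None, :, :], axis=-1)
    # k smallest distances per prediction point
    knn = np.sort(d, axis=1)[:, :k]
    return np.median(knn, axis=1)
```

Because each b_p depends only on local sample geometry, the bandwidths adapt automatically to uneven sampling density, as noted above.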
Our exploratory tests showed that the prediction performance is not sensitive to the default values defined above. For example, setting n b = 4, or increasing (decreasing) T by one order of magnitude, we obtained results very similar to those for the default parameter choices. Nevertheless, we tested the default settings on various data sets and verified that even if they are not optimal, they still provide competitive performance.
The MPRS hyperparameters are used to control the learning process. The static hyperparameters are listed in Section 1.3.1 of Algorithm 1. Below, we discuss their definition, impact on prediction performance, and setting of default values. The number of equilibrium configurations, M , is arbitrarily set to 100.
Smaller (larger) values would increase (decrease) computational performance and decrease (increase) prediction accuracy and precision. The frequency of equilibrium state verification is controlled by n f , which is set to 5. Lower n f increases the frequency and thus slightly decreases the simulation speed, but it can lead to earlier detection of the equilibrium state. In order to test for equilibrium conditions, we need to check the slope of the energy evolution curve: in the equilibrium regime the curve is flat, while it has an overall negative slope in the relaxation (non-equilibrium) regime. However, the fluctuations present in equilibrium at T > 0 imply that the calculated slope will always quiver around zero. To compensate for the fluctuations, we fit the energy evolution curve with a Savitzky-Golay polynomial filter of degree equal to one using a window that contains n fit = 20 points. This produces a smoothed curve and a more robust estimate of the slope. Larger values of n fit are likely to cause undesired mixing of the relaxation and the equilibrium regimes.
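The slope test above can be sketched compactly: over a single window, a degree-1 Savitzky-Golay fit is the same as a linear least-squares fit, so the smoothed slope reduces to the slope of that fit. The helper below is our own illustrative implementation, not the published code.

```python
import numpy as np

def energy_slope(energy, n_fit=20):
    """Slope of the energy evolution over the last n_fit sweeps.
    A first-degree least-squares fit over the window is what a degree-1
    Savitzky-Golay filter computes; a slope close to zero signals the
    equilibrium regime, a clearly negative slope the relaxation regime."""
    y = np.asarray(energy[-n_fit:], dtype=float)
    x = np.arange(y.size, dtype=float)
    slope, _intercept = np.polyfit(x, y, 1)
    return slope
```

In practice one would call this every n_f sweeps and declare equilibrium once the slope stays within the fluctuation band around zero.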
The maximum number of Monte Carlo sweeps, i max , is optional and can be set to prevent very long equilibration times in case the convergence is very slow. Due to the efficient hybrid algorithm employed, its practical impact is minimal. The target acceptance ratio of the Metropolis update, A targ , and the variation rate of the perturbation control factor, k a , are set to A targ = 0.3 and k a = 3. Their role is to prevent the Metropolis acceptance rate (particularly at low T ) from dropping to very low values, which would lead to computational inefficiency.
Finally, the simulation starts from some initially selected state of the spin angle configuration. Our tests showed that different choices, such as uniform ("ferromagnetic") or random ("paramagnetic") initialization, produced similar results. Therefore, we use as default the random state comprising spin angles drawn from the uniform distribution in [0, 2π]. While it is in principle possible to adjust the hyperparameters for optimal prediction performance, using default values enables the autonomous operation of the algorithm and controls the computational efficiency. The dynamic hyperparameters, listed in Section 1.3.2 of Algorithm 1, increase the flexibility of the algorithm by automatically adapting to the current stage of the simulation process.

Learning "Data Gaps" by Means of Restricted Metropolis Monte Carlo
MPRS predictions of the values {ẑ p } P p=1 are based on conditional Monte Carlo simulation. Starting with initial guesses for the unknown values, the algorithm updates them continuously, aiming to approach an equilibrium state which minimizes the free energy (see Eq. 9). The key to the computational efficiency of the MPRS algorithm is fast relaxation to equilibrium. This is achieved using the restricted Metropolis algorithm, which is particularly efficient at very low temperatures, such as the presently considered T ≈ 10 −3 , where the standard Metropolis updating is inefficient (Loison et al, 2004).

[Algorithm 1: MPRS learning procedure. The algorithm involves the Update function which is described in Algorithm 2. Φ s is the vector of known spin values at the sample sites; Φ is the vector of estimated spin values at the prediction sites. G(•) is the transformation from the original field to the spin field and G −1 (•) is its inverse. Ẑ(j), j = 1, . . ., M is the j-th realization of the original field. U(0, 2π) denotes a vector of random numbers from the uniform probability distribution in [0, 2π]. SG stands for the Savitzky-Golay filter. After initialization of the equilibrium state, the main loop over j = 0, . . ., M − 1 calls Update(Φ eq (j + 1), Φ eq (j), 1, T ) to generate equilibrium realizations, back-transforms the spin states via Ẑ(j + 1) ← G −1 (Φ eq (j + 1)), and finally returns the statistics of the M realizations Ẑ(j), j = 1, . . ., M .]
The classical Metropolis algorithm (Metropolis et al, 1953; Robert et al, 1999; Hristopulos, 2020) proposes random changes of the spin angles at the prediction sites (starting from an arbitrary initial state). The proposals are accepted if they lower the energy H(z G; curr ); otherwise they are accepted with probability p = exp {− [H(z G; prop ) − H(z G; curr )] /k B T }, where z G; curr is the current state and z G; prop the proposed state. The restricted Metropolis scheme generates a proposal spin-angle state according to φ prop = φ curr + α (r − 0.5), where r is a uniformly distributed random number r ∈ [0, 1) and α = 2π/a ∈ (0, 2π].
The hyperparameter a ∈ [1, ∞) controls the spin-angle perturbations. The value of a is dynamically tuned during the equilibration process to maintain the acceptance rate close to the target set by the acceptance rate hyperparameter A targ . Values of a ≈ 1 allow bigger perturbations of the current state, while a ≫ 1 leads to proposals closer to the current state.
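One plausible adaptation rule consistent with this description is a multiplicative update of a by the factor k_a whenever the measured acceptance rate strays from A_targ. This is our own assumption for illustration; the exact tuning rule used in MPRS may differ.

```python
def adapt_control_factor(a, acceptance, A_targ=0.3, k_a=3.0):
    """Adjust the perturbation control factor a so the Metropolis acceptance
    rate stays near A_targ. Larger a shrinks the proposal width alpha = 2*pi/a.
    The multiplicative rule below is an assumption, not the exact MPRS rule."""
    if acceptance < A_targ:
        a = a * k_a   # too few acceptances at low T: make smaller moves
    else:
        a = a / k_a   # acceptance high enough: allow bigger moves
    return max(a, 1.0)  # a is restricted to [1, infinity)
```

The lower clamp at a = 1 keeps the proposal width within (0, 2π], matching the stated range of the hyperparameter.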
To achieve vectorization of the algorithm and high computational efficiency, we assume that interactions occur between prediction and sampling points in the vicinity of the former but not among prediction points. Moreover, perturbations can be performed simultaneously by means of a single sweep for all the prediction points, which increases computational efficiency (e.g., in the case of the MPR method two sweeps are required).
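Because each prediction spin interacts only with fixed sample spins, a whole sweep can be expressed with array operations. The following is a hedged Python sketch under our assumptions (precomputed neighbor indices and couplings, pairwise energy Φ i,j = cos[(φ i − φ j )/2], and k B = J 0 = 1); it illustrates the vectorized restricted-Metropolis idea rather than reproducing the authors' code.

```python
import numpy as np

def restricted_metropolis_sweep(phi_p, phi_s, nbr, J, T, a, rng):
    """One vectorized sweep: all P prediction spins are perturbed and
    accepted/rejected simultaneously, possible because prediction points
    interact only with fixed sample spins (no prediction-prediction bonds).
    phi_p: (P,) prediction spin angles; phi_s: (N,) sample spin angles;
    nbr: (P, n_b) sample-neighbor indices; J: (P, n_b) coupling strengths."""
    def local_energy(phi):
        # energy of each prediction point against its sample neighbors
        return -np.sum(J * np.cos((phi[:, None] - phi_s[nbr]) / 2.0), axis=1)

    e_curr = local_energy(phi_p)
    # restricted proposal: phi' = phi + (2*pi/a) * (r - 0.5)
    phi_prop = phi_p + (2.0 * np.pi / a) * (rng.random(phi_p.size) - 0.5)
    e_prop = local_energy(phi_prop)
    dE = e_prop - e_curr
    # elementwise Metropolis criterion
    accept = (dE <= 0.0) | (rng.random(phi_p.size) < np.exp(-np.maximum(dE, 0.0) / T))
    return np.where(accept, phi_prop, phi_p), accept.mean()
```

At the default T ≈ 10 −3 uphill moves are almost never accepted, so successive sweeps drive the restricted energy monotonically downwards in practice.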
The learning procedure begins at an initial state ascribed to the prediction points, while the sampling points retain their values throughout the simulation. The prediction points can be initially assigned random values drawn from the uniform distribution. It is also possible to assign values based on neighborhood relations, e.g., by means of nearest neighbor interpolation.
Our tests showed that the initialization has marginal impact on prediction performance, but opting for the latter option tends to shorten the relaxation process and thus increases computational efficiency. In Fig. 2 we illustrate the evolution of the energy (Hamiltonian) H(z G ) towards equilibrium using random and nearest-neighbor initial states. The curves represent interpolation on Gaussian synthetic data with Whittle-Matérn covariance (as described in Section 4.1). The initial energy under random initial conditions differs significantly from the equilibrium value; thus the relaxation time (measured in MC sweeps), during which the energy exhibits a decreasing trend, is somewhat longer (≈60 MCS) than for the nearest-neighbor initial conditions (≈40 MCS).
Nevertheless, the curves eventually merge and level off at the same equilibrium value. In order to automatically detect the crossover to equilibrium, we apply the Savitzky-Golay slope criterion described above.

Study Design for Validation of the MPRS Learning Method
The prediction performance of the MPRS learning algorithm is tested on various 1D, 2D, and 3D data sets. In 2D we use synthetic and real spatial data (gamma dose rates in Germany, heavy metal topsoil concentrations in the Swiss Jura mountains, Walker lake pollution, and atmospheric latent heat data over the Pacific ocean). For 1D data we use time series of temperature and precipitation. Finally, in 3D we use soil data. The MPRS performance in 1D and 2D is compared with ordinary kriging (OK), which under suitable conditions is an optimal spatial linear predictor (Kitanidis, 1997; Cressie, 1990; Wackernagel, 2003), and in 3D with inverse distance weighting (IDW).
We compare prediction performance using different validation measures; m̂ ẑ;v and σ̂ ẑ;v denote, respectively, the mean and standard deviation of the predictions for the v-th split. If V = 1 the initial "M" in the validation measures can be dropped.

[Table 1: Validation measures and their definitions, beginning with the mean absolute error.]
The spatial dependence is imposed by means of the Whittle-Matérn (WM) covariance given by C(h) = σ 2 [2 1−ν /Γ(ν)] (κh) ν K ν (κh), where h is the Euclidean two-point distance, σ 2 is the variance, ν is the smoothness parameter, κ is the inverse autocorrelation length, and K ν (•) is the modified Bessel function of the second kind of order ν. Hereafter, we use the abbreviation WM(κ, ν) for such data. We focus on data with ν ≤ 0.5, which corresponds to relatively rough random fields. The cross-validation measures obtained by MPRS and OK for tr = 0.10 are summarized in Table 2. OK produces smaller errors and larger M R than MPRS. However, the relative differences are typically around 3%. On the other hand, the CPU time of MPRS is about 8 times smaller than for OK.
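The WM covariance is straightforward to evaluate numerically. The sketch below uses SciPy's modified Bessel function; the function name and defaults are ours, chosen to match the WM(κ = 0.2, ν = 0.5) model used for the synthetic data.

```python
import numpy as np
from scipy.special import gamma, kv

def wm_covariance(h, sigma2=1.0, nu=0.5, kappa=0.2):
    """Whittle-Matern covariance
    C(h) = sigma2 * 2**(1-nu)/Gamma(nu) * (kappa*h)**nu * K_nu(kappa*h),
    with the limiting value C(0) = sigma2."""
    h = np.atleast_1d(np.asarray(h, dtype=float))
    c = np.full(h.shape, float(sigma2))
    pos = h > 0.0                      # handle h = 0 separately (K_nu diverges)
    x = kappa * h[pos]
    c[pos] = sigma2 * (2.0 ** (1.0 - nu) / gamma(nu)) * x ** nu * kv(nu, x)
    return c
```

For ν = 0.5 this reduces to the exponential covariance σ 2 exp(−κh), which is a convenient sanity check.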
In Fig. 3 we present the evolution of all the measures with increasing tr. As expected, for higher tr values both methods give smaller errors and larger M R.
The differences between MPRS and OK persist and seem to slightly increase with increasing tr. While the relative prediction performance of MPRS slightly decreases, its relative computational efficiency substantially increases: for tr = 0.8 MPRS is 33 times faster than OK. The computational complexity of OK increases cubically with sample size. In contrast, the MPRS computational cost only depends on P (the number of prediction points), which decreases with increasing tr. The fit in Fig. 3e indicates an approximately linear decrease. Regarding the dependence on the smoothness parameter (Fig. 4), for ν close to 0.5 the performances of the two methods are approximately identical, and for ν ≲ 0.3 MPRS outperforms OK. Thus, MPRS seems to be more appropriate than OK for the interpolation of rougher data.

Ambient gamma dose rates
The first two data sets represent radioactivity levels in the routine and the simulated emergency scenarios (Dubois and Galmarini, 2006). In particular, the routine data set represents daily mean gamma dose rates over Germany reported by the national automatic monitoring network at 1,008 monitoring locations. In the second data set an accidental release of radioactivity in the environment was simulated in the south-western corner of the monitored area.
These data were used in Spatial Interpolation Comparison (SIC) 2004 exercise to test the prediction performance of various methods.
The training set involved daily data from 200 randomly selected stations, while the validation set involved the remaining 808 stations. Data summary statistics and an extensive discussion of different interpolation approaches are found in Dubois and Galmarini (2006). In total, 31 algorithms were applied to the routine scenario, while several geostatistical techniques failed in the emergency scenario. The validation measures from MPRS and OK applied to both the routine and emergency data sets are presented in Table 3. The OK code that we used failed to fit the spatial dependence in the routine data; thus, we show the average measures of the two OK approaches reported in Dubois and Galmarini (2006). Comparing the results for OK and MPRS, for the routine data set OK gives slightly better results than MPRS. However, for the emergency data set the AE and ARE errors are much smaller for MPRS, while OK gives superior values for the RSE and R validation measures.
In Fig. 7 we compare the MPRS performance with the results obtained with the 31 different approaches reported in Dubois and Galmarini (2006).
This comparison shows that MPRS is competitive with geostatistical, neural network, support vector machine, and spline approaches. In particular, for the routine data set MPRS ranked 6th, 8th and 2nd for AE, RSE and R, respectively, and for the emergency data 6th, 13th and 11th.

Jura data set
This data set comprises topsoil heavy metal concentrations (in ppm) in the Jura Mountains (Switzerland) (Atteia et al, 1994; Goovaerts, 1997). In particular, the data set includes concentrations of the metals Cd, Co, Cr, Cu, Ni, Pb, and Zn. The relative differences between the MPRS and OK errors are on the order of a few percent. The largest differences appear for the mean R: the maximum relative difference, reaching over 20%, was recorded for Co. Nevertheless, due to relatively large sample-to-sample fluctuations, even in such cases MPRS shows better performance than OK for certain splits. Fig. 10 shows the R ratios per split for the two methods. In 13 instances MPRS gives larger values than OK. The execution times, presented in Fig. 9e, demonstrate that MPRS is about 18 times faster than OK.

Walker lake data set
This data set demonstrates the ability of MPRS to fill data gaps on rectangular grids. The data represent DEM-based chemical concentrations with units in parts per million (ppm) from the Walker lake area in Nevada (Isaaks and Srivastava, 1989). We use a subset of the full grid comprising a square of size L × L with L = 50; the summary statistics include N = 2,500 and z min = 0. The MPRS prediction standard deviations (Fig. 12) are smaller than the OK estimates. We recall that the MPRS prediction variance is obtained from M equilibrium realizations; thus, its small magnitude is attributed to the fact that at very low temperatures (e.g., the default value T ≈ 10 −3 ), large fluctuations are strongly suppressed.

Atmospheric latent heat release
This section focuses on monthly (January 2006) means of vertically averaged atmospheric latent heat release (measured in degrees Celsius per hour) (Tao et al, 2006; Anonymous, 2011). The data are given on an L × L grid (L = 50); their spatial distribution and histogram are shown in Fig. 13.

Time series (temperature and precipitation)
The MPRS method can be applied to data in any dimension d. We demonstrate that the MPRS method provides competitive predictive and computational performance for time series as well.
We consider two time series of daily data at Jökulsa Eystri River (Iceland), collected at the Hveravellir meteorological station, for the period between January 1, 1972 and December 31, 1974 (a total of N = 1,096 observations) (Tong, 1990). The first set represents daily temperatures (in degrees Celsius) and the second daily precipitation (in millimeters). The time series and the respective frequency histograms are shown in Fig. 15. The interpolation validation measures and computational times for MPRS and OK are listed in Table 6. The results are based on 100 randomly selected training-validation splits which include trN points. For the temperature data, the MPRS performance relative to OK is similar to that for the 2D spatial data.
However, in the case of precipitation MPRS returns a lower MAE than OK for tr = 0.33, while for tr = 0.66 MPRS is clearly better for all measures.
This observation agrees with the results for the synthetic spatial data, i.e., the relative performance of MPRS improves for strongly non-Gaussian data (cf. Fig. 5, which displays relative errors for lognormal data with gradually increasing s z ). For the 3D soil data we compare MPRS with the IDW method (Shepard, 1968) using a power exponent equal to 2 and an unrestricted search radius. As evidenced by the validation measures (Table 7), MPRS outperforms IDW in terms of prediction accuracy. The relative differences change from a few percent for tr = 0.33 to ∼ 12% for tr = 0.66. For this particular data set, IDW is computationally more efficient than MPRS. However, this is due to the limited data size. With increasing N the relative computational efficiency of MPRS will improve and eventually outperform IDW, since the computational time for the former scales as O(P ), while for the latter it scales as O(P N ) [e.g., see the comparison of MPR and IDW in (Žukovič and Hristopulos, 2018)].

Discussion and Conclusions
We proposed a machine learning method (MPRS) based on the modified planar rotator for spatial regression. The MPRS method is inspired by statistical physics spin models and is applicable to scattered and gridded data. The MPRS prediction performance is competitive with standard methods, such as ordinary kriging and inverse distance weighting. For data that are spatially smooth or close to the Gaussian distribution, standard prediction methods overall show better prediction performance. The relative MPRS prediction performance improves for data with rougher spatial or temporal variation, as well as for strongly non-Gaussian distributions. For example, MPRS performance is quite favorable for daily precipitation time series, which involve a large number of zeros.
The MPRS method is non-parametric: it does not assume a particular distribution, grid structure or dimension of the data support. Moreover, it can operate fully autonomously, without user input (expertise). A significant advantage of MPRS is its superior computational efficiency and scalability with respect to sample size. These features render it suitable for massive data sets.
The required CPU time increases only linearly with the size of the prediction set and does not depend on the sample size. The high computational efficiency also benefits from the full vectorization of the MPRS prediction algorithm. Thus, data sets involving millions of nodes can be processed in a matter of seconds on a typical personal computer.
Possible extensions include generalizations of the MPRS Hamiltonian (2).
For example, spatial anisotropy can be incorporated by introducing directional dependence in the exchange interaction form (6). Another extension could include an external polarizing field to generate spatial trends. Such a generalization would involve additional parameters that could be estimated by means of cross-validation, increasing the computational cost. This approach was applied to the MPR method on 2D regular grids, and it was shown to achieve substantial benefits in terms of improved prediction performance (Žukovič and Hristopulos, 2023). Finally, the training of MPRS can be extended to include the estimation of optimal values for the model parameters. This will improve the predictive performance at the expense of some computational cost.

Fig. 1 :
Fig. 1: Schematic illustration of the interactions of the i-th prediction point with (a) its four nearest neighbors (including sampling and prediction points) via the constant interaction parameter J in MPR and (b) its n b = 8 nearest neighbor (only sampling) points via the mutual distance-dependent interaction parameter J i,j in MPRS. Blue open and red filled circles denote sampling and prediction points, respectively, and the solid lines represent the bonds.

Fig. 2 :
Fig. 2: Energy evolution curves starting from random (red dashed curve) and nearest-neighbor interpolation (blue solid curve) states. The simulations are performed on Gaussian synthetic data with m = 150, σ = 25 and Whittle-Matérn covariance model WM(κ = 0.2, ν = 0.5), sampled at 346,030 and predicted at 702,546 scattered points (non-coinciding with the sampling points) inside a square domain of length L = 1,024. The inset shows a detailed view focusing on the nonequilibrium (relaxation) regime.
Data are generated at N = 10 3 random locations within a square domain of size L × L, where L = 50. Assuming tr represents the fraction of training points, from each realization we select ⌊trN ⌋ points to use as the training set. The predictions are cross-validated against the actual values at the remaining locations. For various tr values we generate V = 100 different sampling configurations.

Fig. 3 :
Fig. 3: Dependence of MPRS and OK validation measures on the ratio of training points tr. The measures are calculated from 100 realizations of a Gaussian random field with m = 150, σ = 25 and covariance model WM(κ = 0.2, ν = 0.5); the field is sampled at 1,000 scattered points inside a square domain of length L = 50.

Fig. 4: Dependence of the ratios of the MPRS and OK validation measures on the smoothness parameter ν. The measures are calculated based on 100 realizations of a Gaussian random field with m = 150, σ = 25 and the covariance model WM(κ = 0.2, ν), sampled at 1,000 scattered points inside a square domain of length L = 50. Panels (a) and (b) show the results for tr = 0.33 and tr = 0.66, respectively.

Fig. 5: Dependence of the ratios of the MPRS and OK validation measures on the random-field skewness (controlled by σ). The measures are calculated from 100 realizations of a lognormal random field with m = 0 and covariance model WM(κ = 0.2, ν = 0.5), sampled at 1,000 scattered points inside a square domain of length L = 50, for tr = 0.33. The inset shows, as an example, the data distribution for σ = 1.

Fig. 6:

Fig. 7: Values of AE, RSE and R obtained for the routine and emergency data sets by means of the 31 interpolation methods reported in the SIC2004 exercise (circles, squares and diamonds) and the MPRS approach (red crosses). The numbers in parentheses denote the ranking of the MPRS performance for the particular validation measure with respect to all 32 methods.

Fig. 8: Histogram of the cadmium soil-contamination concentrations from the Jura data set. The inset shows the measurement locations.

Fig. 9:

Fig. 10:

Fig. 12: Visual comparison of (a) MPRS and (b) OK interpolated maps, with the corresponding prediction standard deviations shown below in (c) and (d), respectively, for the Walker Lake data with tr = 33%. The points with zero standard deviation predominantly coincide with the sample locations.

Fig. 13: Monthly averaged latent heat release data measured in degrees Celsius per hour: (a) spatial distribution and (b) histogram.

Fig. 14: Visual comparison of (a) MPRS and (b) OK interpolated maps (latent heat data) and the corresponding prediction standard deviations (c, d) for a tr = 0.33 training-validation split.

Fig. 15: Time series (a, c) and the respective frequency histograms (b, d) of daily temperature (a, b) and precipitation (c, d) at the Jökulsa Eystri River (Iceland) for the period between January 1, 1972 and December 31, 1974.
Spatial correlations are captured via distance-dependent, short-range spatial interactions. The method is inherently nonlinear, as evidenced by the energy equations (2) and (5). The model parameters and hyperparameters are fixed to default values for increased computational performance. Training of the model is thus restricted to equilibrium relaxation, which is achieved by means of conditional Monte Carlo simulations.
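To illustrate where the nonlinearity comes from, the sketch below evaluates a schematic planar-rotator-type energy: a sum over bonds of -J_ij cos(φ_i - φ_j). The cosine coupling of spin angles is the nonlinear ingredient; the modified form actually used by MPRS is given by Eqs. (2) and (5) in the text, and the bond list here is purely illustrative.

```python
import numpy as np

def bond_energy(phi, bonds):
    """Schematic planar-rotator-type energy: a sum over bonds (i, j, J_ij)
    of -J_ij * cos(phi_i - phi_j).  The cosine makes the model nonlinear in
    the spin angles; the modified interaction actually used by MPRS is given
    by Eqs. (2) and (5) of the text."""
    return -sum(J_ij * np.cos(phi[i] - phi[j]) for i, j, J_ij in bonds)

phi = np.array([0.0, np.pi / 3, np.pi / 2])   # spin angles (illustrative)
bonds = [(0, 1, 1.0), (1, 2, 0.5)]            # (i, j, J_ij) bond list
e = bond_energy(phi, bonds)                   # lower for well-aligned spins
```

Aligned spins minimize each bond term at -J_ij, so relaxation toward low energy drives neighboring angles (and hence the transformed data values) toward local agreement.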
1: Setting of parameters and hyperparameters
1.2: Model and simulation parameters: n_b: number of bonds between each point and neighboring samples; T: simulation temperature; b: vector of decay rates.
1.3: Setting of static hyperparameters
1.3.1: Set the following to fixed values: M: number of equilibrium configurations (length of the averaging sequence) for statistics collection; i_max: maximum number of MC sweeps in the non-equilibrium stage; A_targ: target acceptance ratio of the Metropolis update; k_a: variation rate of the perturbation control factor a; n_f: verification frequency of the equilibrium conditions; n_fit: number of fitting points of the energy evolution function.
1.3.2: Initialize dynamic hyperparameters: slope of the energy evolution function k(0) ← −1; spin perturbation control factor a(0) ← 1; simulated state counter i ← 0.
2: Data transformation
2.1: Φ_s ← G(Z_s) using (4) ⊲ Set data spin angles
3: Interaction neighborhood
3.1: Identify n_b neighbors ⊲ For each prediction point find n_b neighbors from the sampling set

Algorithm 2: Restricted Metropolis updating algorithm (non-vectorized version). Φ_old is the initial spin state, and Φ_new is the new spin state. Φ_old−p is the initial spin state excluding the point labeled by p. U(0, 1) denotes the uniform probability distribution on [0, 1].
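The logic of the restricted Metropolis update can be sketched as follows: a new angle is proposed in a window around the current one whose width is controlled by the perturbation factor a, and is accepted with the standard Metropolis probability at temperature T. The `local_energy(phi, p)` callable is an assumption of this sketch, not the paper's exact interface.

```python
import math
import random

def restricted_metropolis_update(phi, p, local_energy, a=1.0, T=1e-4):
    """Single restricted Metropolis update at prediction point p (sketch).

    A new angle is proposed symmetrically around the current one, with the
    window width set by the perturbation control factor a; the move is
    accepted with the standard Metropolis probability at temperature T.
    `local_energy(phi, p)` is a user-supplied callable (an assumption of
    this sketch, not the paper's exact interface).
    """
    phi_old = phi[p]
    e_old = local_energy(phi, p)
    # Restricted proposal: symmetric perturbation scaled by the factor a
    phi[p] = phi_old + a * math.pi * (2.0 * random.random() - 1.0)
    e_new = local_energy(phi, p)
    delta_e = e_new - e_old
    if delta_e > 0.0 and random.random() >= math.exp(-delta_e / T):
        phi[p] = phi_old   # reject the proposal: restore the old spin state
        return False
    return True            # accept the proposed state
```

Shrinking a as the acceptance ratio drifts from its target A_targ (as in step 1.3.1 above) keeps proposals local near equilibrium, which is what makes the relaxation efficient at low simulation temperatures.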

To assess the computational efficiency of the tested methods, we record the CPU time t_cpu for each split. The mean computation time over all training-validation splits is then calculated. The complete data sets are randomly split into disjoint training and validation subsets. In most cases, we generate V = 100 different training-validation splits. Let z(s_p) denote the true value and ẑ^(v)(s_p) its estimate at s_p for the configuration v = 1, ..., V. The prediction error ε^(v)(s_p) = z(s_p) − ẑ^(v)(s_p) is used to define validation measures over all the training-validation splits, as described in Table 1.
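The double-averaging of the errors ε^(v)(s_p) can be sketched as below: errors are first averaged over the validation sites of each split, then over the V splits. MAE and RMSE are used here as common illustrative choices, not necessarily the exact measures defined in Table 1.

```python
import numpy as np

def double_averaged_measures(z_true, z_pred):
    """Validation measures averaged first over validation sites, then over
    the V splits, in the spirit of Table 1.  z_true and z_pred have shape
    (V, n_val): one row per training-validation split.  MAE and RMSE are
    illustrative choices, not necessarily the paper's exact definitions."""
    err = z_true - z_pred                              # eps^(v)(s_p)
    mae = np.mean(np.abs(err), axis=1).mean()          # mean absolute error
    rmse = np.sqrt(np.mean(err ** 2, axis=1)).mean()   # root mean square error
    return mae, rmse
```

Averaging within each split before averaging across splits weights every configuration equally, even if validation-set sizes were to differ between splits.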

Table 1: Validation measures used to assess the prediction performance of MPRS and other methods. The measures are defined as double averages: the first average is over the sites of the validation set, while the second average is over all V training-validation splits. The z̄_p and σ_z;p are the mean and standard deviation

Table 2: Cross-validation measures for MPRS and OK based on 100 realizations of a Gaussian random field with mean m = 150, standard deviation σ = 25 and covariance model WM(κ = 0.2, ν = 0.5), sampled at 1,000 scattered points inside a square of length L = 50. 100 points are randomly selected as the training set (tr = 0.10) and the values at the remaining 900 points are predicted.

Table 3: Validation measures (obtained from Table 1 for V = 1) for the MPRS and OK methods applied to the routine and emergency SIC2004 data sets. The OK* results for the routine data set are taken from Dubois and Galmarini (2006).

Table 4: Interpolation validation measures for MPRS and OK. The measures are based on 100 randomly chosen training sets that include trN of the N = 2,500 points (Walker Lake data set).

Table 5: Interpolation validation measures for MPRS and OK based on 100 randomly chosen training-validation splits; the training set includes trN points, where N = 2,500 (latent heat data set).

Table 6: Interpolation validation measures for MPRS and OK based on 100 randomly selected training-validation splits. The training sets include trN points (N = 1,096 and tr = 0.33, 0.66) from the daily temperature and precipitation time series.

Table 7: Interpolation validation measures for the MPRS and OK methods based on 100 randomly chosen training sets, each including tr% of the total number N = 178 of points measuring calcium and magnesium content in soil samples from the 0-20 cm soil layer.