1 Introduction

The Gravity Recovery and Climate Experiment (GRACE, 2002–2017) and its successor the GRACE Follow-On (GRACE-FO, since May 2018) missions observe temporal variations of the Earth’s gravity field. These variations are primarily caused by the redistribution of mass by the various branches of the global water cycle. Satellite gravimetry is the only remote sensing system that directly senses changes in water masses independently of its surface expose, such that, in combination with other in-situ or model data, even groundwater changes can be detected (Rodell et al. 2009). GRACE has also been used to monitor ice-sheet and glacier mass loss (Schrama et al. 2014), to quantify drought conditions (Jäggi et al. 2019; Boergens et al. 2020), flood risk (Reager et al. 2014), or to assess the reliability of in situ snow fall observations (Behrangi et al. 2018). Over the oceans, GRACE directly observes barystatic sea-level changes (Chambers 2006), temporal variations in the transport of major current systems like the Antarctic Circumpolar Current (Bergmann and Dobslaw 2012), or wind-driven changes in the Arctic circulation (Peralta-Ferriz et al. 2014). The most important accomplishments of GRACE have recently been summarised by Tapley et al. (2019).

GRACE data further allow to improve numerical models of individual aspects of geophysical fluid dynamics, such as global hydrology (e.g. Güntner 2008; Eicker et al. 2014; Kumar et al. 2016), ocean circulation (e.g. Köhl et al. 2012), or ocean tide models (e.g. Mayer-Gürr et al. 2012) by means of validation and subsequent parameter calibration. Increasingly GRACE data are assimilated into numerical models by directly updating state variables (Zaitchik et al. 2008; Schumacher et al. 2018). It is important to note that data assimilation techniques critically rely on realistic uncertainty estimates for both model and observations to arrive at an optimal combination between the two sources of information. These uncertainty estimates need to reflect any significant correlations in both time and space.

Inverting a global gravity field by means of a Gauss–Markov model from satellite sensor data of just 30 days does not only provide estimates for the parameters of a spherical harmonics expansion, but also the associated covariances. These so-called “formal uncertainties” are known to be too small. Kvas et al. (2019) emphasise that in particular uncertainties of the time-variable geophysical background models for tidal and non-tidal mass variability in both atmosphere and ocean need to be considered explicitly to avoid too optimistic uncertainty estimates. With prior information for both spatial and temporal correlations of these background model uncertainties (Dobslaw et al. 2016) and the method of variance component estimation (Koch and Kusche 2002), fairly realistic uncertainty covariances can be achieved (Kvas et al. 2019). Variances derived by this method are currently only available for the gravity field series processed by the team at the Technical University in Graz as presented by Kvas et al. (2019).

These depicted flaws of the formal uncertainties were already suspected shortly after the launch of GRACE, leading to alternative methods being developed to quantify uncertainties. Wahr et al. (2006) subtracted trend, annual, and semi-annual harmonics from the monthly sampled series of mass anomalies and used the variance of the residuals as an upper bound of the variance for a certain averaging region. The method considers all real deviations from the climatology as noise and is therefore somewhat pessimistic. This so-called empirical approach allows to calculate variances, but does not provide any means to characterise spatial dependencies which thus need to be estimated from other sources. For this purpose, Swenson et al. (2003) proposed to model spatial covariances for GRACE data with different azimuthally symmetric covariance functions. Subsequently, Landerer and Swenson (2012) applied a stationary isotropic squared exponential covariance model which has been also used later by Zhang et al. (2016).

It is important to recall that the covariance functions introduced above can be evaluated efficiently in the spatial domain so that variances for any arbitrarily shaped region can be derived from the combination with (spatially and/or temporally variable) variance information. The spatial covariance model by Landerer and Swenson (2012) is therefore employed by JPL’s TELLUS website which is a popular access point to GRACE and GRACE-FO data for non-geodetic users interested to utilise satellite gravimetry data for their specific area-of-interest. With this paper, we present a new covariance model that also properly reflects the changes in mission performance of GRACE and GRACE-FO throughout the years in order to derive realistic variances for user-defined averaging regions from gridded mass anomaly data and their associated variances. The covariance model consists of a time independent (stationary) correlation model and time dependent (non-stationarity) empirical point standard deviations. Thus, the time dependence of the latter relies on the data, i. e., in the here presented study on the simulated data.

We develop and test the new covariance model with the help of synthetic GRACE-FO TWS data from a full-scale simulation over a nominal mission lifetime of 5 years (Flechtner et al. 2016). The simulations start from a synthetic time-variable gravity field (Dobslaw et al. 2015) to generate high-resolution orbits for both spacecraft. Simulated observations of all relevant sensors are combined with realistic assumptions about the individual error characteristics and are subsequently used to calculate series of monthly-mean gravity fields. We consider gravity field retrievals using the microwave K-band range-rate data only. The processing choices of the simulation conform with the GFZ RL05 standards (Dahle et al. 2013). Within the simulation environment, the performance of the covariance model can be validated by contrasting estimated uncertainties against true uncertainties available from the differences of the retrieval with respect to the synthetic input gravity field. Thus, while our new covariance model is only evaluated for the simulated data, we assume that the results can later be transferred to real GRACE and GRACE-FO monthly global gravity fields.

This article is structured as follows: We will initially review various sources of uncertainty that contribute to deficits in time-series of monthly-mean gravity fields available from GRACE and GRACE-FO and will explain why spatial correlations should be treated as non-homogeneous, anisotropic, and non-stationary (Sect. 2). Subsequently, we design the functional model for a new spatial covariance model and discuss its mathematical properties (Sect. 3). Numerical parameters for this covariance model will be estimated from 5 years of simulated GRACE-FO data (Sect. 4) that allow for a direct validation of our approach to the true error available from that simulation. The paper closes with a discussion of the results and some conclusions for the application of the covariance model to real data and the implementation into GFZ’s GravIS portal (Sect. 5).

2 Contributions to the GRACE and GRACE-FO error budget

In a strict statistical sense, errors are a measure of accuracy, whereas uncertainties are a measure of precision. True errors are typically not readily accessible in the real world, and are therefore often approximated by (variance propagated) uncertainties which is the actual subject of our modelling efforts in this article. Still, the term error is often loosely used to describe also uncertainty. A number of sources contribute to the overall errors and uncertainties of GRACE and GRACE-FO observed monthly gravity fields. The highly precise satellite-to-satellite tracking (SST) data are only sampled along the near polar orbit, i. e., in longitudinal direction . Thus, the correlation length is shorter in cross-track than in along-track direction. We will refer to this direction-dependency of the spatial correlations as anisotropy in this paper.

Both missions are operating in non-repeat orbits with slowly decaying altitudes. For individual months, however, the orbit may degenerate into an approximate repeat configuration, e.g., a 2-day repeat in February 2015. For these months the spatial resolution of the resulting gravity field is much lower than for an average month. During the course of the mission, some sensors aboard GRACE and GRACE-FO exhibit time-variable uncertainties that eventually exceed the measurement uncertainty of the microwave ranging system. Towards the end of the GRACE mission, battery cell failed which led to a limited power availability so that thermal control was reduced step-by-step after 2010 affecting the precision of various instruments. More severely, accelerometer data from GRACE-B became unavailable after November 2016 (in order to reduce the load on the GRACE-B battery). For this reason non-gravitational accelerations measured aboard the other spacecraft have been transplanted to the position of GRACE-B (Bandikova et al. 2019). Note that a similar approach is also being applied to GRACE-FO data due to the under-performance of the accelerometer aboard GRACE-D. We refer to this time-dependency of the uncertainties as non-stationarity in this paper.

It is further important to recognise that the GRACE and GRACE-FO missions are subject to spatially and temporally variable environmental influences. The solar cycle of typically 12 years duration leads to variations in radiation pressure acting on the spacecraft, with consequences for the ratio between gravitational and non-gravitational forces as recorded by the accelerometers. Particularly quiet solar conditions led to exceptionally good gravity models during the years 2004–2008. Also geophysical background models utilised during gravity field inversion to remove high-frequency mass variations caused by tides or non-tidal processes in atmosphere and oceans are commonly treated as error-free a priori models which is obviously not the case (Dobslaw et al. 2016). It is fair to assume that in particular uncertainties in the background models are dependent on the geographic location, since tide models are typically less accurate at the coasts and in polar seas. We therefore expect that the uncertainties have a spatial variation which we denote as non-homogeneity in this article. Please note that in statistical literature, homogeneity is often used to describe the translational invariance of the statistical properties of a stochastic process. In geodesy, the term is widely used instead to describe stationarity in space (Rummel and Schwarz 1977; Meier and Keller 1990), so that we use it here in this way in order to distinguish more clearly between the spatial and the temporal domain.

All these different contributions lead to random and systematic uncertainties that have pronounced spatial correlations. It should be clear from the points above that the uncertainties and their spatial correlations cannot be expected to be Gaussian, but will have very specific structures. Most prominently, monthly GRACE and GRACE-FO fields synthesised from unconstrained spherical harmonics have a strong latitude-dependent North-South striping error pattern which needs to be removed with anisotropic filtering methods (Swenson and Wahr 2006; Kusche et al. 2009, e.g.). However, even after post-processing the GRACE and GRACE-FO data with such filters the correlation structure in the spatial domain remain both anisotropic and non-homogeneous, particular in the latitudinal direction. Additionally, the temporal variations of the uncertainty contributions listed above indicate a non-stationarity of the spatial covariances. Even the minor changes in instrumentation between GRACE and GRACE-FO will inevitably lead to non-stationarities in the data.

3 Design of a spatial covariance model

We design a covariance model for TWS gridded GRACE data that account for anisotropy, spatial non-homogeneity, and potentially temporal non-stationarity. We request the covariance model to be a valid covariance function on the sphere which does not hold for many well-known covariance models in the two dimensional vector space \({\mathbb {R}}^2\) (Huang et al. 2011). More information on the validity of the covariance model will be given in Sect. 3.1. We therefore combine different approaches to address all these objectives.

Jun and Stein (2007, (2008) have designed a class of covariance models for latitudinal non-homogeneous covariance functions on spheres by scaling homogeneous covariance models with latitude-dependent Legendre polynomials. With Legendre functions this concept can be extended to accommodate also longitudinal non-homogeneity. Different ideas for anisotropic covariance functions in \({\mathbb {R}}^2\) have been developed for homogeneous data in Ecker and Gelfand (2003), or for non-homogeneous data in Darbeheshti and Featherstone (2009). The approach of Mateu et al. (2008) has also included space-time relations but again only for homogeneous data. The anisotropy of our covariance model will be realised with a orientation-dependent shape parameter allowing for shorter correlation lengths in the latitudinal direction. The residual stripings in the correlation structure can be modelled with a wave- or hole-effect correlation function (Ma and Jones 2001). To this end, we utilise a correlation function based on a Bessel function of the first kind following Yaglom (1987).

It is important to recall that we focus in this study only on modelling of covariances between two arbitrary locations on the sphere, not covariances in time. The covariance between two values of a certain functional of the gravity field ij at their respective locations \((\lambda _i, \theta _i)\) and \((\lambda _j, \theta _j)\), consists of the correlation structure \(C_1\) which is time-independent, and the two time-dependent point uncertainties \(\sigma \):

$$\begin{aligned} C(\lambda _i, \theta _i, \lambda _j, \theta _j, t) = \sigma _i(t) \sigma _j(t) C_1(\lambda _i, \theta _i, \lambda _j, \theta _j). \end{aligned}$$
(1)

For the correlation model \(C_1\) we assume homogeneity with respect to longitude. For brevity, from now on we will write \(f(\cdot )=f(\lambda _1, \theta _1, \lambda _2, \theta _2) \).

Following Jun and Stein (2008), we construct the correlation model as the latitude-dependent rescaled version of a basis correlation function \(C_0\):

$$\begin{aligned} C_1(\cdot ) = P(\theta _1)P(\theta _2)C_0(\cdot ), \end{aligned}$$
(2)

with

$$\begin{aligned} P(\theta ) = \sum _{i=0}^4 k_{2i}P_{2i}(\theta ) \end{aligned}$$
(3)

and \(P_n(\theta )\) the Legendre polynomials of degree n. Here, we only use the even numbered Legendre polynomials (up to degree 8) in order to ensure symmetry of the correlation model at the equator. We found this degree to be sufficient for the modelling considering the stability of the subsequent numerical parameter estimation.

In the next step, we characterise \(C_0\). Even though the GRACE and GRACE-FO data are already filtered to remove the longitudinal stripes, residuals of these can still be detected in the correlation structure as ripples in East-West direction. To model this wave-like effect, we use the Bessel function of the first kind \(J_{\nu }\) of order \(\nu \):

$$\begin{aligned} C_0(\cdot ) = \frac{c_0}{a(\cdot )d(\cdot )}J_{\nu }(a(\cdot )d(\cdot ) , \end{aligned}$$
(4)

where \(c_0\) is a global scaling factor of the function, and \(d(\cdot )\) is the spherical distance between the locations. This function was developed by Yaglom (1987) for \({\mathbb {R}}^2\), but we will later show the validity of the covariance function on the sphere. The width parameter \(a(\cdot )\) is substituted with a function that allows for the anisotropy of the correlation model:

$$\begin{aligned} a(\cdot ) = a_0(\cdot )\sin (\phi (\cdot ))+a_1(\cdot ). \end{aligned}$$
(5)

Here, \(\phi (\cdot )\) is the azimuth angle of point i to point j counted clockwise from north, and \(a_0\), \(a_1\) latitude-dependent with

$$\begin{aligned} a_0(\cdot ) = P^a(mean\{\theta _1, \theta _2\})A_0 \quad \text { with } P^a(\theta ) = \sum _{i=0}^4 k^{a0}_{2i}P_{2i}(\theta ) \end{aligned}$$
(6)

and

$$\begin{aligned} a_1(\cdot ) = P^a(mean\{\theta _1, \theta _2\})A_1 \quad \text { with } P^a(\theta ) = \sum _{i=0}^4 k^{a1}_{2i}P_{2i}(\theta ). \end{aligned}$$
(7)

\(A_0\) and \(A_1\) represent the width parameter of the function before applying latitudinal scaling describing the anisotropic (\(A_0\)) and isotropic (\(A_1\)) part of the function, respectively. Note that this definition of the anisotropy degenerates to an isotropic function at the poles.

The parameters \(k_0\) and \(c_0\) are linearly dependent and thus not separable. The same applies for \(A_0\) and \(k^{a0}_0\) and \(A_1\) and \(k^{a1}_0\), so that we can set \(k_0=k^{a0}_0=k^{a1}_0=1\). Further, we fix \(\nu = 2\). All together, the correlation model is defined by 15 parameters

$$\begin{aligned} \left\{ A_0, k^{a0}_2, k^{a0}_4, k^{a0}_6, k^{a0}_8, A_1,k^{a1}_2, k^{a1}_4, k^{a1}_6, k^{a1}_8, c_0, k_2, k_4, k_6, k_8\right\} . \end{aligned}$$

Note that we assume that the spatial correlation does not change over time. The previously discussed non-stationarity of the overall GRACE observing system is introduced only by the point uncertainties \(\sigma \) in Eq. 1.

3.1 Validity of the covariance function on the sphere

A covariance function is valid if the function is positive (semi-)definite. In our case, it has to be valid on an arbitrary spherical surface \({\mathbb {S}}^3\) which does not hold for many covariance functions defined in \({\mathbb {R}}^2\) (Guinness and Fuentes 2016). According to Gneiting (2013) a function \(C:{\mathbb {S}}^3\times {\mathbb {S}}^3\rightarrow {\mathbb {R}}\) is positive semi-definite if the matrix

$$\begin{aligned} A= \left( a_{i,j}\right) _{i,j=1}^n , \quad a_{i,j} = C(\mathbf {x_i}, \mathbf {x_j}) \end{aligned}$$
(8)

is positive semi-definite for any n, locations \({\mathbf {x}}_1, \ldots , {\mathbf {x}}_n \in {\mathbb {S}}^3\).

Due to the complicated analytically expression of the covariance function, it is not possible for us to prove the positive definiteness. However, we may use the definition in Eq. 8 for an empirical test: To check any square matrix on its definiteness it needs to be decomposed with an eigenvalue decomposition. If all eigenvalues are larger or equal to zero the matrix is positive semi-definite. The validity of a covariance function on the sphere does not depend on the chosen coordinate system and, thus, the presented covariance could be transformed to any other coordinate system without loss of its properties.

To test all realisations of covariance models used in this study for their positive-definiteness, we set up Monte-Carlo simulations. For this, we build 10.000 times the matrix A with 100 randomly distributed global locations \({\mathbf {x}}_i\) and test in each case the positive definiteness of the matrix. If all realisations of the matrix are positive definite, we conclude that the covariance function is positive definite and the covariance function is valid on the sphere.

4 Synthetic GRACE-FO data

The numerical parameters of the covariance model introduced above could be directly estimated from GRACE and GRACE-FO monthly gravity information. In this paper, however, we revert to synthetic gravity fields from a full-scale satellite simulation where true errors are readily available from a comparison with the time-variable input gravity field, so that the new method presented here can be thoroughly validated. The simulated data set consists of a time series of 60 monthly solutions (covering 5 years) provided in terms of spherical harmonics up to degree and order 100 Flechtner et al. (2016). We will follow two distinct paths that lead to four different realisations of covariances as shown in Fig. 1. The first path uses the formal variances of the Stokes coefficients, while the second employs so-called empirical covariances obtained from the gridded TWS signals.

Fig. 1
figure 1

Flow-chart of the processing performed for this study. For each block, the corresponding section of the paper describing the particular analysis step is given in parentheses

4.1 Formal variances

The Stokes coefficients of the simulated data are provided together with calibrated formal standard deviations. These standard deviations are now propagated to the spatial domain to derive gridded variances and covariances.

4.1.1 Formal variance propagation

Standard deviations of the Level-2 Stokes coefficients are variance propagated through the same post-processing process as the signal coefficients themselves, namely through the anisotropic spatial filtering and the spherical harmonic synthesis (step F1 in Fig. 1). The relationship between unfiltered Stokes coefficients \(a_{nm}\) (\(a_{nm}=c_{nm}\) if \(m\ge 0\), \(a_{nm}=s_{n-m}\) if \(m< 0\)), with their covariance matrix \( \Sigma \{a_{nm}\} = diag(\sigma _{a_{nm}}^2) \), and filtered \(a_{nm}^\alpha \) instances is

$$\begin{aligned} a_{nm}^\alpha = W^\alpha a_{nm} \end{aligned}$$
(9)

with the filter matrix \(W^\alpha \), thus the filtered variances and covariances are, according to variance propagation law,

$$\begin{aligned} \Sigma \{a_{nm}^\alpha \} = W^\alpha \Sigma \{a_{nm}\} (W^\alpha )^T. \end{aligned}$$
(10)

We ignore the off-diagonals of the covariance matrix of the filtered coefficients \(\Sigma \{a_{nm}^\alpha \}\), since tests revealed that the variances are several orders of magnitude larger than the covariances, so that this simplification does not affect the subsequent results.

For the propagation of the filtered coefficient variances to the grid with k grid points and up to degree and order \(n_{max}\), again a variance propagation is applied:

$$\begin{aligned} \Sigma \{tws\} = A \Sigma \{a_{nm}^\alpha \} A^T, \end{aligned}$$
(11)

with

$$\begin{aligned} \begin{aligned} A&=\frac{1}{\rho _w}\frac{M}{4\pi R^2}\begin{pmatrix} A_{j}(\lambda _i, \theta _i) \end{pmatrix}_{i=1, j=(0,0)}^{k,(n_{max}, n_{max})}\\ A_{nm}(\lambda _i, \theta _i)&= {\left\{ \begin{array}{ll} \frac{2n+1}{1+k_n'} c_{nm}P_{nm}(\sin \theta _i)\cos m\lambda _i &{} m\ge 0\\ \frac{2n+1}{1+k_n'} s_{n-m}P_{n-m}(\sin \theta _i)\sin m\lambda _i &{}m<0 \end{array}\right. }. \end{aligned} \end{aligned}$$
(12)

These formal spatial covariances can directly be used to determine the covariance between two points. The resulting first candidate realisation will be called formal covariance, \(C_f\), in the following.

4.1.2 Estimated correlation model from formal variances

As an alternative to the direct approach facilitating the formal covariances, we utilise just the formal correlations to estimate a correlation model.

The parameter estimations of the correlation model (cf. Sect. 3) lead to several non-significant parameters as the formal correlations do neither reflect anisotropy nor non-homogeneity. Thus, the correlation model can be simplified to

$$\begin{aligned} C_1(\cdot ) = \frac{c_0}{a\cdot d(\cdot )}J_{\nu }(a \cdot d(\cdot )). \end{aligned}$$
(13)

Model parameters are fitted as \(c_0 = 4.01\times 10^{-3}\) and \(a = 7.98\) from the 5 years of simulated GRACE-FO data. The resulting covariance function is successfully tested for positive definiteness on the sphere following the method described in Sect. 3.1. Thus, the combination of the formal correlation model and formal uncertainty is a valid covariance model on the sphere. This second realisation will be called modelled formal covariance (\(C_{f,m}\)) in the following.

The spatial structure of this estimated correlation model reveals that for the formal uncertainties the wave effect is negligible. Only at distances above 2000 km the effect becomes visible (Fig. 2, first column).

Fig. 2
figure 2

Spatial correlation models derived for different latitudes (0\(^\circ \) 30\(^\circ \), 60\(^\circ \)) from formal uncertainties (left column) and empirical signal correlations (right column) as available from 60 months of simulated GRACE-FO gravity fields. Note that the formal uncertainties do not reflect any latitudinal dependency

4.2 Empirical variances

As an alternative to the formal variances, the gridded TWS signal itself is used to calculate empirical correlations from which the parameters of the spatial correlation model are estimated. This is depicted as the second path in Fig. 1. In order to employ the covariance model, point variances are necessary according to Eq. 1. Here, we test two empirical realisations of the variances that vary either spatially or temporally.

4.2.1 Empirical variance and correlation estimation

In line with Wahr et al. (2006), empirical correlations are estimated from the gridded simulated TWS data \(Z_{sim}(\lambda , \theta , t)\in {\mathcal {Z}}_{sim}\) from which the deterministic trend, annual, and semi-annual signals are removed. We assume the empirical correlations \({{\widehat{C}}}\) to vary only with latitude \(\theta \) and that the empirical correlations are anisotropic. Thus, we estimate them for each latitude depending on distance dist and direction dir. Further, we assume symmetry at the equator such that \({\widehat{C}}_{\theta } = {\widehat{C}}_{-\theta }\) and an axial symmetry along longitudinal direction. We estimate average empirical correlations from all 60 months available.

To this end we define for a fixed latitude \(\theta \) and fixed (distdir)

$$\begin{aligned} N_\theta (dist, dir)&:= \Big \{(\lambda _1, \theta _1, \lambda _2, \theta _2) \in ([0^\circ ,360^\circ ] \times [-90^\circ ,90^\circ ])^2 : \\&\quad Z_{sim}(\lambda _1, \theta _1) \in {\mathcal {Z}}_{sim}, Z_{sim}(\lambda _2, \theta _2) \in {\mathcal {Z}}_{sim},\\&\quad abs(\theta _1) =\theta , abs(\theta _2-\theta _1)\le 20^\circ , abs(\lambda _2-\lambda _1)\le 15^\circ ,\\&\quad d(\lambda _1, \theta _1,\lambda _2, \theta _2)=dist, \phi (\lambda _1, \theta _1,\lambda _2, \theta _2)=dir\Big \}, \end{aligned}$$

where \(d(\lambda _1, \theta _1,\lambda _2, \theta _2)\) is the spherical distance and \(\phi (\lambda _1, \theta _1,\lambda _2, \theta _2)\) the spherical azimuth between the points.

The empirical covariances for each latitude are then

$$\begin{aligned} {\widehat{C}}_{\theta }(dist, dir) = \frac{1}{{\widehat{Cov}}_{\theta }(0,0)} \frac{1}{|N_\theta (dist, dir)|} \sum _{\begin{array}{c} (\lambda _1, \theta _1, \lambda _2, \theta _2)\\ \in N_\theta (dist, dir) \end{array}} Z_{sim}(\lambda _1, \theta _1) Z_{sim}(\lambda _2, \theta _2) , \end{aligned}$$
(14)

where \(|N_\theta (dist, dir)|\) is the cardinality of \(N_\theta (dist, dir)\) and \( {\widehat{Cov}}_{\theta }(0,0)\) the empirical variance at latitude \(\theta \). We do not estimate \({\widehat{C}}_{\theta }(dist, dir)\) for \(\theta \ge 70^{\circ }\) as the area of land in the Arctic is too small for a reliable estimation.

4.2.2 Estimated correlation model from empirical correlations

The parameters of the correlation function are now fitted to the empirical correlations. This step is depicted in the flowchart of Fig. 1 as E22. For the estimation we apply the numerical Sequential Least SQuares Programming (SLSQP) solver which allows to introduce parameter bounds and constraints in the optimisation. The two parameters \(c_0\) and \(A_1\) are bound-limited to zero, meaning that they have to be positive in any case. Additionally, a constraint is introduced by the function \(a(\cdot )=a_0(\cdot )+a_1(\cdot )\ge 0\). We found that the latitude dependence of \(a_1(\cdot )\) is not significant for the simulated data, leading to \(k^{a1}_2=k^{a1}_4=k^{a1}_6=k^{a1}_8=0\) and \(a_1(\cdot ) = A_1\). All estimated parameters are given in Table 1.

Table 1 Numerical values for the parameters of the covariance model obtained from fitting against 60 months of synthetic GRACE-FO data

The correlation function is tested for its validity on the sphere and is found to be positive definite. We note that the spatial structure of the correlation model changes significantly with latitude, both for the maximum amplitude and the shape. At the equator and at \(60^{\circ }\) latitude the function is rather isotropic, while at \(30^{\circ }\) latitude the half value distance is 2000 km in longitude compared to 500 km in latitudinal direction. The wave effect of the model is also most pronounced at \(30^{\circ }\) latitude (Fig. 2, second column).

To model the covariance between two points, each point’s uncertainty is required according to Eq. 1. Note that both uncertainties originate from the residual gridded TWS data with all deterministic signals removed. A first option considered here, \(\sigma _s(\lambda , \theta )\), is the standard deviation of the residual data at each grid point over the time series:

$$\begin{aligned} \sigma _s(\lambda , \theta ) = \sqrt{\frac{1}{n-1}\sum _{i}^n \left( Z_{sim}^{red}(\lambda , \theta ,t_i)-\overline{Z_{sim}^{red}(\lambda , \theta )}\right) ^2}. \end{aligned}$$
(15)

Thus, this uncertainty varies in space but not in time. The resulting third realisation is called empirical-spatial covariance (\(C_{e,s}\)). For the other option, \(\sigma _t(t)\), is the standard deviation for all grid points at a given time step:

$$\begin{aligned} \sigma _t(t) = \sqrt{\frac{1}{n-1}\sum _{i}^n \left( Z_{sim}^{red}(\lambda _i, \theta _i,t)-\overline{Z_{sim}^{red}(t)}\right) ^2}. \end{aligned}$$
(16)

In this fourth realisation of the new model, the uncertainty is time-variable but spatially constant, and therefore will be called empirical-temporal covariance (\(C_{e,t}\)).

4.3 Uncertainty for river basins

The four different candidate realisations of the new covariance model (\(C_{f}\), \(C_{f,m}\), \(C_{e,s}\), \(C_{e,t}\)) will now be used to calculate uncertainties of TWS change in arbitrarily shaped region such as river basins (Landerer and Swenson 2012). The variance of the region-wide TWS change is calculated as

$$\begin{aligned} var(t) = \sum _{i=1}^m \sum _{j=1}^m w_i w_j C(\lambda _i, \theta _i, \lambda _j, \theta _j,t). \end{aligned}$$
(17)

Here, m is the number of grid points in the region with its individual area weights \(w_i\). Thus, the standard deviation of the regions is \(\sigma (t) = \sqrt{var(t)}\).

The four different covariances, \(C_f\), \(C_{f,m}\), \(C_{e,s}\), and \(C_{e,t}\) lead to four realisations of the regions’ standard deviations \(\sigma _{f}\), \(\sigma _{f,m}\), \(\sigma _{e,s}\), and \(\sigma _{e,t}\). Regional standard deviations are compared to true errors \(\sigma _{true}\) which are computed as the difference of regional means between the synthetic input gravity field and the finally retrieved monthly gravity fields of the full-scale satellite simulation. Please note that the deterministic signals have not been removed in this case so that estimated errors correspond to the full signal content of a monthly GRACE-FO gravity field solution.

Fig. 3
figure 3

Time series of both terrestrial water storage signal (upper panels) and associated standard deviations derived from four different realisations of the covariance model (\(\sigma _{f}\), \(\sigma _{f,m}\), \(\sigma _{e,s}\), \(\sigma _{e,t}\)) together with the true uncertainties (\(\sigma _{true}\)) as available from 60 months of simulated GRACE-FO gravity fields (lower panels) for the river basins of Congo in Central Africa (top), Ob in Siberia (middle) and Murray Darling in Australia (bottom)

To evaluate the temporal evolution of these standard deviations, we first present time-series for the Congo, Ob, and Murray–Darling river basins (Fig. 3). These examples represent catchments at different latitudes with different hydrometeorological conditions and thus signal magnitudes. In all three basins the two standard deviations based on formal covariances \(\sigma _{f}\) and \(\sigma _{f,m}\) feature two prominent peaks in 2003 and 2005 which are not seen in \(\sigma _{true}\) and which are not linked to extreme values in the TWS time series. In these months, the simulated orbits were in a short repeat configuration, leading to larger uncertainties in the Stokes coefficients and thus larger formal variances.

Moreover, \(\sigma _{f,m}\) seems to be an up-scaled version of \(\sigma _{f}\) in all three basins, albeit with an individual scale for each basin. For discharge basins with smaller TWS variations such as the Murray-Darling, the standard deviations are at the same order of magnitude as the signal itself.

To check whether temporal variations of \(\sigma _{true}\) are caused by random noise in the empirical realisation or rather represent a significant deviation, we employ a Bartlett’s test for equal variances (Snedecor and Cochran 1989). For all 50 basins considered here, the test rejects the assumption of equal variances with a confidence interval of 99%. We therefore conclude that \(\sigma _{true}\) indeed changes over time. However, none of the four covariance models is properly reflecting these temporal variations.

In order to evaluate the temporal similarity of the four candidate covariance models, we utilise the Pearson’s correlation coefficient. The correlation coefficient provides a measure of similarity which is scale invariant. Globally, \(\sigma _{e,t}\) represents the temporal behaviour of \(\sigma _{true}\) very well compared to the other three standard deviations. At basin scale, however, the correlation coefficient never exceeds 0.6 and is even negative for some basins.

The temporal progression of the standard deviations for all 50 basins does not exhibit any trend or other systematic effects. We therefore compute the average standard deviation for each basin to investigate the spatial patterns more thoroughly.

The mean true standard deviations of the basins show a general decline towards the poles from 1.2 cm in the tropics to 0.9 cm in Northern Canada and Siberia (Fig. 4). However, the two formal variance-based covariance models let basin standard deviations increase towards the poles (\(\sim \) 0.5–1 cm for \(\sigma _f\), \(\sim \) 1.5–2.5 cm for \(\sigma _{f,m}\)). The spatial empirical standard deviations \(\sigma _{e,s}\) instead display the poleward decline, but severely overestimate the magnitude especially in the tropics with up to 5 cm. In Siberia and Northern Canada \(\sigma _{e,s}\) has a magnitude around 2 cm. The other empirically derived standard deviation, \(\sigma _{e,t}\), instead exhibits only a slight decline towards the poles (\(\sim \) 2.3–1.7 cm).

Fig. 4
figure 4

Standard deviations for the 50 largest discharge basins in the world as derived from four different realisations of the covariance model (\(\sigma _{f}\), \(\sigma _{f,m}\), \(\sigma _{e,s}\), \(\sigma _{e,t}\)) together with the true uncertainties (\(\sigma _{true}\)) as available from 60 months of simulated GRACE-FO gravity fields

In addition to the visual comparison of the spatial patterns, we again employ the correlation coefficient to quantify the similarity between the true and the modelled uncertainties. With a correlation coefficient of 0.83, \(\sigma _{e,t}\) shows the largest similarity to \(\sigma _{true}\) while \(\sigma _{e,s}\) shows only a spatial correlation of 0.41 (see Table 2). Both uncertainty estimates based on formal variances, \(\sigma _{f}\) and \(\sigma _{f,m}\), have similarly small correlations to \(\sigma _{true}\) with 0.48 for \(\sigma _f\) and 0.44 for \(\sigma _{f,m}\).

Table 2 Correlation (R) and global scaling factor (S) between spatial maps of the mean basin standard deviation for four different realisations of the covariance model (\(\sigma _{f}\), \(\sigma _{f,m}\), \(\sigma _{e,s}\), \(\sigma _{e,t}\)) with respect to the true uncertainties available from the GRACE-FO satellite simulation (\(\sigma _{true}\))

We conclude at this point that the modelled standard deviations do not always fit the magnitude of the true standard deviations. Thus, we investigate the ratio between the modelled and the true standard deviations globally (Fig. 5). The most homogeneous ratios are found for \(\sigma _{e,t}\) with values between 1 and 2. For \(\sigma _{e,s}\), we see a vast overestimation in the tropics. However, please be reminded that empirical correlations used to estimate the correlation model form a conservative upper bound to the real uncertainties. It is therefore plausible that the empirically-based standard deviations are too large compared to the true ones. All in all, \(\sigma _{f}\) fits best on a global average.

Fig. 5
figure 5

Ratio between standard deviations derived from four different realisations of the covariance model (\(\sigma _{f}\), \(\sigma _{f,m}\), \(\sigma _{e,s}\), \(\sigma _{e,t}\)) and the true errors (\(\sigma _{true}\)) as available from 60 months of simulated GRACE-FO gravity fields for the 50 largest discharge basins in the world

Lastly, we want to investigate the statistical equivalence of the true standard deviations and the modelled ones. To this end, we employ a \(\chi ^2\)-test. The \(\chi ^2\)-test for the variance equality compares an empirical variance (\(s^2\)) with a modelled variance (\(\sigma ^2\)) and thus assess the probability that the empirical variance could originate from the covariance model under consideration of the degree of freedom (N) (Snedecor and Cochran 1989). The test statistic is

$$\begin{aligned} t = N \cdot \frac{s^2}{\sigma ^2} \sim \chi ^2_{N}. \end{aligned}$$
(18)

The test is carried out for each basin separately. However, naturally the \(\chi ^2\)-test is highly sensible to differences in magnitude discussed above. Thus, to be able to apply the test, we each scale the modelled standard deviations with a global factor (mean of the basin ratios in Fig. 5, given in Table 2) which will be referred to with \(\sigma _{x}^s\). Introducing this global scaling factor is equivalent to changing \(c_0\) in the estimated correlation models.

Rather than testing only if t is above or below the critical value of a confidence interval, we investigate the p value of t. The p value range between 0 and 1, with values around 0.5 indicating the best congruency between the true and modelled standard deviations. Globally, \(\sigma _{e,t}^s\) shows the best congruency with \(\sigma _{true}^2\) with 33 out of 50 basins with a p value between 5 and 95% (39 between 1 and 99%). Also, for this model these basins are found on all continents. \(\sigma _{f,m}^s\) agrees well with \(\sigma _{true}^2\), too, with 24 and 35 basins with a p value between 5 and 95% and 1 and 99%, respectively. In some European and Siberian basins also \(\sigma _{e,s}^s\) is fitting well, but overall only 11 basins have a p value inside the 5–95% confidence interval (13 for 1–99%). 23 or 29 basins, respectively, are fitting for \(\sigma _{f}^s\) (Fig. 6).

Fig. 6
figure 6

p value of the \(\chi ^2\) test between scaled standard deviations derived from four different realisations of the covariance model (\(\sigma _{f}\), \(\sigma _{f,m}\), \(\sigma _{e,s}\), \(\sigma _{e,t}\)) and the true uncertainties (\(\sigma _{true}\)) as available from 60 months of simulated GRACE-FO gravity fields for the 50 largest discharge basins in the world

4.4 Regular grid uncertainties

The covariance models are developed in a way to allow for the rapid calculation of uncertainty for regions of arbitrary shape. This is demonstrated here for a regular \(5 ^{\circ }\) grid over the continents including the ice-covered regions of Greenland and Antarctica. Note that for grid cells along the coast, the standard deviation is calculated only from land points in Eq. 17.

Over the continents, the true uncertainties range between \(\sim \) 2.5 cm in the tropics and \(\sim \) 1.5 cm at high latitudes (Fig. 7). Largest values are found in the grid cells covering Lake Nasser in Egypt, and the Ganges-Bramaputra delta in Bangladesh. These are caused by strong local signals in the input gravity field which are not properly resolved from the synthetic GRACE-FO observations. Both formal variance-based standard deviations \(\sigma _f\) and \(\sigma _{f,m}\) again show a reversed latitude dependence with increasing standard deviations towards the poles. Unlike for the basins above, \(\sigma _{e,s}\) is capturing best the spatial pattern, but again is overestimating the magnitude especially in the tropics. The poleward decline of \(\sigma _{e,t}\) is very well visible.

Fig. 7
figure 7

Standard deviations for regular 5\(^\circ \) latitude-longitude grid boxes as derived from four different realisations of the covariance model (\(\sigma _{f}\), \(\sigma _{f,m}\), \(\sigma _{e,s}\), \(\sigma _{e,t}\)) together with the true uncertainties (\(\sigma _{true}\)) as available from 60 months of simulated GRACE-FO gravity fields

We also evaluate the similarity between the modelled standard deviations and the true uncertainties with the correlation coefficient (Table 2). We find including the ice-covered regions change the results significantly, hence we provide both the global correlation coefficient with and without these regions. \(\sigma _{e,s}\) shows the largest similarity for ice-free regions (R \(=\) 0.43), and the correlation coefficient for \(\sigma _{e,t}\) is not as large as for the basins with only R = 0.20. Both \(\sigma _f\) and \(\sigma _{f,m}\) have a negative correlation coefficient. Including Greenland and Antarctica changes the picture. Now, \(\sigma _{f,m}\) has the highest spatial correlation with 0.35. However, the models were all not developed for the ice-covered regions as \(\theta \le 70^{\circ }\) in Eq. 14, and thus, are not reliable there.

The ratio between the true and the modelled standard deviations confirm the reversed latitude dependence for \(\sigma _f\) and \(\sigma _{f,m}\) (Fig. 8). Similar to the basins, \(\sigma _{e,t}\) has the most homogeneous ratio globally, while the ratio of \(\sigma _{e,s}\) reveals again the overestimation in the tropics.

Fig. 8
figure 8

Ratio between standard deviations derived from four different realisations of the covariance model (\(\sigma _{f}\), \(\sigma _{f,m}\), \(\sigma _{e,s}\), \(\sigma _{e,t}\)) and the true uncertainties (\(\sigma _{true}\)) as available from 60 months of simulated GRACE-FO gravity fields for regular 5\(^\circ \) latitude-longitude grid boxes

Similar to the basin analysis, we perform the \(\chi ^2\) testing for the grid standard deviations. As before, we scale the modelled standard deviations, where the scaling factor is calculated only for ice-free regions (Table 2). Excluding Greenland and Antarctica, 801 grid cells remain for further analysis. Out of these, 503 are inside the 5–95% interval for \(\sigma _{e,t}^s\) and even 616 inside the 1–99% interval. Only the Himalaya, Nile and central South-American regions are not fitting well (Fig. 9). In comparison, \(\sigma _{e,s}^s\) reaches only 186 and 247 grid cells, respectively. They are mostly located in Europe, Siberia, and the African coast. For \(\sigma _{f,m}\) 448 and 554 fall inside the 5–95% and 1–99% interval while \(\sigma _{f}\) can explain 414 grid cells for the confidence interval 5–95% and 532 grid cells for the confidence interval 1–99%. For both \(\sigma _{f}\) and \(\sigma _{f,m}\) the tropics are less fitting while mid and high latitude show a good agreement with \(\sigma _{true}\).

Fig. 9
figure 9

P-value of the \(\chi ^2\) test between scaled standard deviations derived from four different realisations of the covariance model (\(\sigma _{f}\), \(\sigma _{f,m}\), \(\sigma _{e,s}\), \(\sigma _{e,t}\)) and the true uncertainties (\(\sigma _{true}\)) as available from 60 months of simulated GRACE-FO gravity fields for regular 5\(^\circ \) latitude-longitude grid boxes

5 Conclusions and outlook

We developed a new anisotropic and non-homogeneous covariance model for globally gridded TWS from the GRACE and GRACE-FO satellite missions. The model accommodates latitudinal non-homogeneity via scaling based on equator-symmetric Legendre polynomial. The anisotropy is modelled with a latitude-dependent shape factor which both allows for shorter correlation lengths in East-West than in North-South direction. By using time-dependent point uncertainties in the model, the non-stationarity of the GRACE-FO data can also be taken into account.

Four different candidate realisations of the covariance model (\(C_f\), \(C_{f,m}\), \(C_{e,s}\), and \(C_{e,t}\)) were thoroughly tested with simulated GRACE-FO data. Both formal covariances and empirical correlations derived from the signal itself were contrasted against true uncertainties readily available from the simulations for both the 50 largest discharge basins as well as regular \(5^{\circ }\) latitude-longitude boxes. To augment the empirical correlations, also empirical point uncertainties are needed: we tested both temporally varying (i.e., one global uncertainty value for each monthly solution) and spatially varying (i.e, one average uncertainty value for each location). We found that the former clearly outperforms all other tested variants for the 50 largest discharge basins. In general, both empirically based covariance models overestimate the magnitude of the uncertainties. But, this could be met by a global scaling factor derived from the comparison between the estimated and the true uncertainties. Note that such a scaling factor is equivalent to adjusting just one of the model parameters.

The other three tested options also perform reasonably well for the largest basins. However, for increasingly smaller averaging areas, we note that modelled uncertainties based on the formal uncertainties become increasingly too optimistic which suggests that spatial leakage not considered in the formal uncertainties is gradually gaining importance. At these smaller scales, the spatially varying point uncertainties provide the best fit in terms of spatial correlation with the true errors. Nevertheless, the temporally varying uncertainties perform best in the statistical \(\chi ^2\) test after applying a global scaling calibration, so that we also recommend this option for use at the smaller spatial scales.

It is important to recall that the parameters for the correlation model were estimated in this study from synthetic GRACE-FO data only. Preliminary analysis of GFZ RL06 monthly data (Dahle et al. 2019) confirms the applicability of the new covariance model for the latest generation of spherical harmonics solutions from GRACE-FO. It appears that the latitude dependency of the uncertainties is somewhat higher than in the simulated data which might require to re-consider the significance of the parameters used in \( a(\cdot ) = a_0(\cdot )\sin (\phi (\cdot ))+a_1(\cdot )\) (Eq. 5). We also note that the most recent GRACE-FO data releases have more realistic uncertainties as compared to GFZ RL05 as used in the simulations.

We assumed in this study that the spatial correlation model is stationary in time, and consequently estimated only a single solution from all available monthly gravity fields. Any non-stationarity was therefore only modelled through the point standard deviations. This assumption needs to be critically questioned for the transition from GRACE to GRACE-FO: Whereas the observation geometry is similar for both missions, different correlation lengths might be easily introduced by slightly different instrument characteristics that not only include the satellite-to-satellite tracking, but also accelerometers and star trackers.

Based on simulated GRACE-FO data, this study demonstrates that realistic uncertainties for arbitrarily shaped regions can be derived from the combination of a spatial correlation model and globally gridded point uncertainties. With this information, it becomes possible to derive uncertainties for arbitrarily shaped averaging regions without consideration of data given in spherical harmonics, which can be difficult to handle for non-geodetic users. The realism of the simulated data as well as our first results obtain with real GRACE and GRACE-FO data indicate that this covariance model can be readily applied to real data. We therefore include the empirical-temporal covariance model (\(C_{e,t}\)) developed here into the version 2 of the GFZ RL06 Level-3 TWS data available at GFZ’s gravity information system GravIS (http://gravis.gfz-potsdam.de), where time-variable terrestrial water storage information from satellite gravimetry and its associated uncertainties is made available also for non-geodetic users.