Error propagation in regional geoid computation using spherical splines, least-squares collocation, and Stokes’s formula

Current International Association of Geodesy efforts within regional geoid determination include the comparison of different computation methods in the quest for the “1-cm geoid.” Internal (formal) and external (empirical) approaches to evaluate geoid errors exist, and ideally they should agree. Spherical radial base functions using the spline kernel (SK), least-squares collocation (LSC), and Stokes’s formula are three commonly used methods for regional geoid computation. The three methods have been shown to be theoretically equivalent, as well as to numerically agree on the millimeter level in a closed-loop environment using synthetic noise-free data (Ophaug and Gerlach in J Geod 91:1367–1382, 2017. https://doi.org/10.1007/s00190-017-1030-1). This companion paper extends the closed-loop method comparison using synthetic data, in that we investigate and compare the formal error propagation using the three methods. We use synthetic uncorrelated and correlated noise regimes, both on the 1-mGal (=10-5ms-2\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$=10^{-5}~{\mathrm {m s}}^{-2}$$\end{document}) level, applied to the input data. The estimated formal errors are validated by comparison with empirical errors, as determined from differences of the noisy geoid solutions to the noise-free solutions. We find that the error propagations of the methods are realistic in both uncorrelated and correlated noise regimes, albeit only when subjected to careful tuning, such as spectral band limitation and signal covariance adaptation. For the SKs, different implementations of the L-curve and generalized cross-validation methods did not provide an optimal regularization parameter. Although the obtained values led to a stabilized numerical system, this was not necessarily equivalent to obtaining the best solution. Using a regularization parameter governed by the agreement between formal and empirical error fields provided a solution of similar quality to the other methods. The errors in the uncorrelated regime are on the level of ∼\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim $$\end{document}5 mm and the method agreement within 1 mm, while the errors in the correlated regime are on the level of ∼\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sim $$\end{document}10 mm, and the method agreement within 5 mm. Stokes’s formula generally gives the smallest error, closely followed by LSC and the SKs. To this effect, we note that error estimates from integration and estimation techniques must be interpreted differently, because the latter also take the signal characteristics into account. The high level of agreement gives us confidence in the applicability and comparability of formal errors resulting from the three methods. Finally, we present the error characteristics of geoid height differences derived from the three methods and discuss them qualitatively in relation to GNSS leveling. If applied to real data, this would permit identification of spatial scales for which height information is preferably derived by spirit leveling or GNSS leveling.


Introduction
Regional gravity field modeling is an important task in physical geodesy (Sansò and Sideris 2013). Recently, different regional gravity modeling methods were investigated within an International Association of Geodesy (IAG) Inter-Commission Committee on Theory Joint Study Group (Schmidt et al. 2015). Ophaug and Gerlach (2017) reviewed three methods for regional geoid computation: Stokes's formula (Stokes 1849), least-squares collocation (LSC) (Krarup 1969;Moritz 1980), and radial base functions (RBFs) (e.g., Freeden et al. 1998), the latter modeled using least-squares estimation (e.g., Schmidt et al. 2007;Lieb et al. 2016;Liu et al. 2020). The methods were shown to be theoretically equivalent, and to agree on the mm level in a closed-loop environment using noise-free synthetic data. The ongoing IAG Joint Working Group 2.2.2: "The 1-cm geoid experiment" is based on a 500 km × 800 km test dataset in Colorado, USA. Its main objective is to unveil differences in regional geoid solutions due to the use of different computation methods, and several computation centers have already contributed with preliminary geoid solutions.
Nowadays, the external accuracy of regional geoid models (e.g., by comparison with geometrical geoid heights, determined at sites observed by both Global Navigation Satellite Systems (GNSS) and spirit leveling) is typically on the level of a few centimeters (e.g., Denker 2013; Szelachowska and Kryński 2014;Bucha et al. 2016;Ågren et al. 2016;Gerlach and Ophaug 2017;Janák et al. 2017), with a few sub-cm exceptions (e.g., Ellmann et al. 2019;Foroughi et al. 2019;Slobbe et al. 2019). On this level, however, it is not a trivial task to differentiate between the errors from each contributing element (geoid, GNSS, leveling). In order to further reduce geoid errors, it is favorable to set up a formal error budget, i.e., to investigate each contributing element to the observed discrepancies. Consequently, the estimation of reliable formal geoid errors has gained interest (e.g., Ågren and Sjöberg 2014;Farahani et al. 2017;Featherstone et al. 2018;Gerlach et al. 2019;Goli et al. 2019;McCubbine et al. 2019;Slobbe et al. 2019). With different regional geoid determination methods in use, and an increased focus on realistic formal errors, it is important to compare the formal error propagations of these methods, and point out subtleties to each method which could affect the estimated errors. Consequently, this companion paper extends the study of Ophaug and Gerlach (2017) by assessing comparability and applicability of the formal errors resulting from the regional geoid modeling approaches in their study. In turn, the estimated formal errors are validated by comparison with empirical errors, as determined from differences of noisy geoid solutions to noise-free solutions.
The regional gravity signal is typically split into a global long-wavelength part which is modeled using spherical harmonics, a regional medium-wavelength part, for which typically one of the modeling approaches mentioned above are applied, and a short-wavelength part, which is modeled using digital terrain models. By removal of the long-and short-wavelength parts from the gravity data, the regional geoid computation method is applied to band-limited residual gravity data in a limited area only, giving residual geoid heights. Finally, the long-and short-wavelength parts of the geoid are restored to obtain the final geoid. This approach is known as the remove-restore technique (Denker 2013).
Consequently, we consider the medium-wavelength residual gravity signal only and use band-limited data and functions within spherical harmonic degrees n 1 ≤ n ≤ n 2 . We deem this sharp spectral cutoff as uncritical in our synthetic study, while recognizing that in practical applications, a smooth transition (filtering) between the spectral bands could improve the geoid solution (e.g., Rülke et al. 2012;Featherstone 2013). Similar to the numerical setup of Ophaug and Gerlach (2017), we consider two regions, the North Sea coastal region of East Frisia and the mountains of the European Alps, with, respectively, smooth and moderate topography. Sect. 2 reviews the three modeling approaches and introduces the corresponding error propagation formulations. It also introduces the synthetic noise modeling approach. Numerical experiments comparing formal and empirical errors of the methods are analyzed in Sect. 3. It further discusses geoid error covariance functions and errors of geoid height differences, which are essential for error budgeting of GNSS leveling. Sect. 4 summarizes our results.

Modeling approaches
We reiterate the equivalent regional field transformations from gravity anomalies to geoid heights by SKs, LSC, and Stokes's formula, as presented by Ophaug and Gerlach (2017). By using the remove-restore technique, we can restrict the Stokes integration to a spherical cap, designated the inner zone, with radius ψ 0 around the computation point. Ophaug and Gerlach (2017) reviewed the theoretical equivalence of Stokes's formula and LSC in regional applications, shown by de Min (1995) to hold true only if the cross-covariance function of LSC is modified such that an unwanted extrapolation outside the spherical cap is avoided. As the SKs were shown to be equivalent to LSC, they were modified correspondingly. The present contribution additionally reviews the formal error propagation of gravity anomalies to geoid heights using the three methods and discusses the creation of synthetic noise.

Stokes's formula
Stokes's formula applied to the inner zone only is given by where σ denotes the unit sphere, R is the spherical Earth radius, γ is normal gravity evaluated on the surface of the reference ellipsoid, and ψ Pq is the spherical distance between computation point P and data point q. ḡ q are the residual block mean gravity anomalies, and the modified Stokes func-tionS is given bȳ where P n (cos ψ Pq ) are the Legendre polynomials, and Q n (ψ 0 ) are Molodensky's truncation coefficients. As opposed to the original Stokes function, Eq.
(2) does not show a singularity at ψ = 0 • . Thus, we do not need to treat the innermost zone around the computation point P separately (e.g., Roland 2005).
Equation (1) can be solved by numerical integration according to where ϕ and λ are the spacing of the computation grid in spherical latitude ϕ and longitude λ. Equation (3) has the following matrix notation, where the vector g contains all I × J = L residual block mean gravity anomalies. The matrix S contains the elements The error information on the residual block mean gravity anomalies is summarized in an error-variance-covariance matrix (ECM) gg qq with dimension L × L. The ECM of the geoid heights is given by applying the law of error propagation to Stokes's formula, giving (e.g., Rapp and Rummel 1975;Strang van Hees 1986;Hofmann-Wellenhof and Moritz 2006),

Least-squares collocation
In correspondence with Eq. (1), the LSC formula reads (de Min 1995) withC Ng Pq being a matrix containing the signal crosscovariances between the functionals N and g between P and q, and C gg qq is the auto-covariance matrix between all combinations of observations.
The LSC error propagation is given by withC N N P P being the auto-covariance matrix of N . The gravity anomaly auto-covariance function reads with γ = GM/R 2 (GM is the gravitational constant multiplied with the Earth's mass) and c n are the dimensionless signal degree variances. The truncated cross-covariance and geoid height autocovariance function equations read and respectively.

Spherical splines
Geoid computation with RBFs is done according to where B(ψ Pk ) are the RBFs which are placed on spatially distributed grid points k andd k are point-specific gravity field parameters in the form of dimensionless coefficients. The particular RBF known as the spherical spline kernel (SK) (Freeden et al. 1998;Jekeli 2005;Eicker 2008) is given by where ψ Pk is the spherical distance between computation point P and the origin of the SK at grid point k, λ n are the spectral eigenvalues including dimensioning, and the Legendre coefficients with σ n = √ c n . Equation (12) represents the synthesis of geoid heights from known coefficientsd k , which in matrix notation can be written as where d contains the spline coefficientsd k according to Eq. (12), and A N represents the design matrix according to Eq. (13), with elements where λ N n = GM/(Rγ ). The SKs are here derived from observed residual gravity anomalies in an analysis step, by inversion of the linear model where g is the observation vector and v is the error vector. A g is the design matrix, with elements with λ g n = GM/R 2 (n − 1). Equation (17) is typically ill-conditioned and needs regularization. Using Tikhonov regularization, and setting R = I (Eicker 2008), the solution is given bŷ with α being the regularization parameter. We determine α using the L-curve method (Hansen and O'Leary 1993), as implemented in the l_curve routine of the matlab regularization tools package of Hansen (1994). P is the weight matrix, related to the dispersion of the gravity anomaly input signal by D{ g} = gg qq = σ 2 0 P −1 , where σ 2 0 is the a priori variance of the unit weight, here equal to one.
To ensure that the weight matrix P is taken into account when determining α using the l_curve routine, we transform its input observation vector and singular valuedecomposed design matrix by using the upper-triangular Cholesky factor matrix U of P, following Bouman (1998): Note that the least-squares solution using the above transformation, is equivalent to Eq. (19). The regularized solution in Eq. (19) is a biased estimator (Xu 1992;Sneeuw 2000). Its dispersion is given by with Q dd kk being the cofactor matrix of the unknown parameters, and where is the resolution or contribution matrix, measuring the relative contribution of regularization or prior knowledge of the parameters to the solution. As our observation error level is known exactly (and equal to 1 mGal, cf. Sect. 2.4), we determine the covariance matrix of the estimated parameters by Finally, the error propagation from dimensionless spline coefficients to geoid heights is given by (26)

Synthetic noise modeling
We simulate noisy observations closely following Wolf (2007). We consider two noise scenarios: uncorrelated noise ε UC and correlated noise ε C , both on the 1-mGal (= 10 −5 ms −2 ) level. The uncorrelated noise ε UC is generated as normally distributed random numbers and thus represents a stochastic Gaussian process. The Gaussian random numbers are determined using the matlab function normrnd, with a mean value of 0 and a standard deviation of σ ε UC = 1 mGal. The statistics of the uncorrelated simulated noise is summarized in Table 1. The uncorrelated noise for the considered regions, as well as its corresponding empirical covariance function, are shown in Fig. 1. As evident in the error covariance function, only the covariance value for ψ = 0 is significantly different from zero, i.e., there are no correlations present. The residual synthetic uncorrelated gravity anomalies g UC are then computed according to where g NF are the noise-free residual gravity anomalies. The corresponding ECM is given by gg UC = σ 2 ε UC · I. The transformation of uncorrelated noise into correlated noise can be described by a method similar to a moving-average process (Davis 1987;Wolf 2007). Then, the correlated noise values are given by where L is the lower-triangular Cholesky factor matrix of the ECM of the correlated noise, gg C = LL T . The error correlations are based on the exponential error covariance function of Weber (1984), which is based on investigations of gravity anomaly error covariances in Europe, where ψ is given in degrees, and, as mentioned, we take σ ε C to be 1 mGal. Comparing Eq. (29) with the equivalent error covariance function of the geoid heights, Eq. (32), the factor of 4 implies a correlation length of 0.173 • (see Sect. 3.3 for further details). Note that the error covariance function in Eq. (29) is a function of linear distance, and its values drop quickly with increasing distance. By contrast, commonly suggested signal covariance functions are a function of the quadratic distance (Moritz 1980), which results in a horizontal tangent at the origin ψ = 0 • and a larger correlation length, thus representing a smoother function. An important requirement for the covariance function is that it is positive semidefinite (Schuh 2017). A positive semidefinite function shows a positive Fourier spectrum over its entire range. One arithmetic operator which does not change its positive definiteness is the convolution of any positive semidefinite function with itself. It is this latter rule that led to the construction of covariance functions with finite support, so-called finite covariance functions (Sansò and Schuh 1987;Schuh 1989). An important result of using positive semidefinite covariance functions is that it allows for the factorization of gg C . Consequently, we use a finite error covariance function to generate synthetic correlated noise based on the exponential error covariance function. This finite error covariance function is modeled by combining the positive semidefinite exponential error covariance function (Eq. (29)) with a positive semidefinite function of finite support. In the present study, we have obtained the positive semidefinite error covariance function following Wolf (2007), by multiplication of Eq. (29) with a positive semidefinite and finite Sansò-Schuh function (Schuh 2017, Eq. (154)).
The statistics of the correlated simulated noise is summarized in Table 1. We obtain a slightly larger (∼0.3 mGal) spread of the correlated noise values as compared to the uncorrelated noise values. The same phenomenon is also evident in previous studies using the present approach to synthetic noise modeling (Wolf 2007). The correlated noise for the considered regions, as well as its corresponding empirical and analytically prescribed finite covariance functions, are shown in Fig. 1. The residual synthetic correlated gravity anomalies g C are computed according to 3 Numerical experiments

Computational setup
Here, we aim to investigate to what extent the different regional geoid modeling approaches give similar formal geoid height errors. By comparing formal and empirical errors we also evaluate whether the formal error measures are realistic. Our numerical example setup closely follows Ophaug and Gerlach (2017). Residual noise-free gravity anomalies g NF are computed by spherical harmonic synthesis using the EGM2008 GGM (Pavlis et al. 2012), according to C nm cos mλ + S nm sin mλ P nm (cos θ), Fig. 1 1-mGal uncorrelated noise ε UC (a, c) and the corresponding empirical covariance functions of ε UC (b, d) in East Frisia (a, b) and the Alpine region (c, d), as well as 1-mGal correlated noise ε C (e, g) and the corresponding empirical and analytically prescribed finite covariance functions of ε C (f, h) in East Frisia (e, f) and the Alpine region (g, h) with n 1 = 251 ≤ n ≤ n 2 = 2190, to simulate the removerestore approach. C nm , S nm are the spherical harmonic coefficients andP nm (cos θ) are the fully normalized associated Legendre functions. The comparative assessment of the error propagations of SKs, LSC, and Stokes's formula is performed in a closedloop simulation using synthetic data and noise, see  Details regarding the East Frisia and Alpine test regions are summarized in Table 2. For practical computational reasons, the input and output grid resolutions are set to 5 arcmin (corresponding to the maximum resolution of EGM2008). A spherical cap radius of ψ 0 = 1 • is used, giving theoretical omission errors of ∼2 cm and ∼6 cm for East Frisia and the Alpine region, respectively. The spatial omission or truncation error results from the neglected remaining gravity signal outside the inner zone, but is not relevant to the present comparison as we do not evaluate the total error budget, but only compare different solutions for quantifying the error contribution from the inner zone. We consider enlarged data areas to ensure that each computation point in the geoid computation target areas are surrounded by a 1 • spherical cap containing data (Table 2).
For all methods, the same synthetic observational ECM of gravity anomalies, gg UC/C is used. Geoid errors by the Stokes approach were determined using Eq. (6). The LSC error propagation was computed using Eq. (8) with Eqs. (9)-(11). The error propagation from gravity anomalies to geoid heights using the SKs was computed via the spline coefficient errors, using Eqs. (24)-(26).
As mentioned in Sect. 1, all spherical harmonic expansions were evaluated for n 1 = 251 ≤ n ≤ n 2 = 2190, to ensure suppression of long-and short-wavelength signal and noise content. The signal is practically band-limited due to the simulated remove-restore approach (where we assume that the long-wavelength part of the gravity signal has been removed using a satellite-only GGM). However, because the noise of the terrestrial data and a satellite-only GGM are uncorrelated, we do not reduce the noise of the terrestrial data by removing the GGM signal, and it remains non-band-limited. This is the reason for band-limiting the kernel functions, which will then practically band-pass filter the noise. We investigated the impact of band-limiting the kernel functions on the formal error estimates in case of Stokes's formula. Figure 3a shows the modified Stokes function (Eq. (2)) evaluated for different spherical harmonic bands. We first note that indeed, none of the functions shows a singularity at ψ = 0 • . The Stokes function values decrease with decreasing width of the spectral band, and for 251 ≤ n ≤ 2190, all weights for ψ > ∼0.2 • ≈ 20 km are close to zero. We note that this fits well to the correlation lengths of the error covariance functions (Fig. 1). The effect of using the bandlimited Stokes function for 251 ≤ n ≤ 2190 as opposed to the near-exact Stokes function for 2 ≤ n ≤ 3000 is a reduction of the formal error estimates of ∼40% and ∼70% in the uncorrelated and correlated noise regimes, respectively. This underlines the importance of band-pass filtering the noise through band limitation of the kernel functions for obtaining consistent formal error estimates.
LSC and the SKs are both dependent on the signal characteristics, i.e., their formal errors will increase in areas with large signal variation, and decrease in areas with a smooth signal. The basic quantity for describing the signal characteristics is the signal degree variance, c n . It is derived from the GGM and thus represents the signal variability on global average. Applying it to areas with smoother signal variability than the global average will lead to formal errors that are too pessimistic, while applying it to areas with larger signal variations will lead to formal errors that are too optimistic. Thus, we investigated the validity of the global gravity anomaly signal covariance function by comparison with empirically determined gravity anomaly signal covariance functions in both East Frisia and Alpine region data areas, see Fig. 3b, c. We found the Alpine signal variance to be 162.9 mGal 2 and similar to the global signal variance of 198.7 mGal 2 , while the signal variance in East Frisia was found to be notably smaller, Bold numbers indicate SK results when α is tuned to the best fit of formal and empirical errors (for details, see the discussion in Sect. 3.4 and Fig. 8) and equal to 10.5 mGal 2 . Thus, we rescaled c n accordingly for both areas, which in turn affected both LSC and SK solutions. The effect of rescaling the signal degree variance hardly affects the Alpine region, but gives a reduction of the formal error estimates in East Frisia of ∼30% and ∼15% for LSC and the SKs, respectively.

Results
The results of the formal error propagation from gravity anomalies to geoid heights in both data and target areas using the three methods are shown in Table 3, and Figs. 4 and 5.
As expected, the data area reveals significant edge effects, and unless stated otherwise, we will focus on the target area results in the following. Table 3 shows that the results for both East Frisia and the Alpine region are similar. The uncorrelated formal errors are generally smaller than the correlated formal errors. This is due to the fact that the random uncorrelated errors tend to average out to zero, while signal and error are more difficult to separate in the case of correlated errors. Figs. 4 and 5 reveal a similar qualitative pattern for both East Frisia and the Alpine region.
Generally, the numerical integration of Stokes's formula gives a smaller error than both LSC and SK estimation techniques. This is expected, as Stokes's formula performs a pure propagation of data noise, while the estimation techniques also quantify to what extent the signal can be reconstructed from the given data. This notable difference in method interpretation is evident from Figs. 4 and 5, where the estimation techniques give increasing formal errors toward boundary areas, while the numerical integration technique errors decrease. This is prominent in the data areas, but also noticeable in the target areas, although with a smaller amplitude.
All error patterns are centrally symmetric with the exception of Stokes's formula with uncorrelated errors, where an insignificant (<<1 mm), albeit clear, latitudinal pattern is observed, with decreasing error amplitudes with increasing latitude. We attribute this effect to meridian convergence, causing the distance between neighboring grid cells to decrease with increasing latitude. In turn, this leads to a stronger averaging out of the random errors, as the Stokes function weights are more similar for smaller than for larger distances.
In the target areas, the uncorrelated formal error propagations of the methods agree within 1-2 mm, with Stokes's formula giving the smallest error (4 mm RMS in both East Frisia and the Alpine region), and the SKs giving the largest error (5-7 mm RMS in East Frisia and the Alpine region, respectively). All results in the uncorrelated regime roughly agree with the rule of thumb of Strang van Hees (1986), which is also based on the assumption of uncorrelated errors. It gives σ N ≈ 1.15 · 10 −3 · a · σ ḡ = 7 mm, where a is the block size The correlated formal error propagations of the methods agree within 8-14 mm in the target areas, again with Stokes's formula giving the smallest error (10 mm RMS), and with LSC showing slightly larger error estimates (12-15 mm RMS in East Frisia and the Alpine region, respectively), followed by the SKs (16-21 mm RMS in East Frisia and the Alpine region, respectively). Table 4 summarizes the differences in geoid solutions subjected to uncorrelated and correlated noise regimes to the noise-free solution, giving empirical geoid errors δ N which provide information on the validity of the formal error estimates. Again, the results are largely similar for East Frisia and the Alpine region.
Note that while Table 4 shows descriptive statistics of empirical errors, Table 3 shows descriptive statistics of a statistical quantity (i.e., the standard deviation). Thus, they should be interpreted with care. In Table 4, information on the quality of the solution is primarily given by the RMS, while the mean value gives information on whether any bias has been introduced into the solution. In Table 3, however, the spread around the mean value tends to be small in the target area. Consequently, the mean and RMS values are similar, although it is the mean value of σ N which should be compared to the RMS value of δ N . The maximum and minimum values also provide information on the homogeneity of the formal errors. Because our data points are regularly distributed, and our computation points coincide with the data points, we expect the errors to be homogeneous in the target area. This is indeed the case for Stokes's formula, while the LSC and We define to what extent the formal errors are realistic by considering the difference between the mean value of σ N and the RMS value of δ N in terms of percentages. The agreement is assessed for the target areas, to avoid edge-effect contamination. Positive and negative percentages indicate that σ N is, respectively, pessimistic or optimistic.
In East Frisia, the formal errors are realistic both in the uncorrelated and correlated noise regimes (within ±13%). Stokes's formula gives the smallest error, followed closely by LSC and the SKs. Although the methods perform slightly differently, the formal error associated with each method is realistic.
In the Alpine region, the formal errors of Stokes's formula are realistic in both uncorrelated and correlated noise regimes, and are also the smallest, followed by LSC and the SKs. The LSC formal errors are between 16% (UC) and 23% (C) too pessimistic, while the SK formal errors are between 30% (UC) and 44% (C) too pessimistic. We refer to Sect. 3.4 for a discussion of the interpretation of error measures derived by the three modeling techniques.

Geoid error covariance functions and error of geoid height differences
In practical applications, not only absolute geoid heights, but also the determination of geoid height differences is relevant (e.g., Kearsley 1988;Brown et al. 2018). Thus, we devote this section to the analysis of error correlations and to the comparison of the error behavior of geoid height differences computed from the different methods. We first determine empirical covariance functions as derived from the full geoid height ECM, N N , of the three methods. The empirical covariance values are averages of all covariances of individual point pairs grouped into the same distance class. We let our distance classes contain 100 or 300 point pairs in the East Frisia and Alpine region target areas, respectively. Note that the error standard deviations σ N (ψ = 0 • ) (the square root of the error variance σ 2 N = E N 0 ) correspond to the RMS values as given in Table 3. The correlation length ξ is determined as the smallest distance for which the empirical error covariance is smaller than half of the variance. In turn, using the values of error variance E N 0 and correlation length ξ , we determine analytical covariance models using the following two exponential models: The first function, E N l (ψ), depends on the linear spherical distance ψ, while the second function, E N q (ψ), depends on the quadratic distance ψ 2 . E N l (ψ) is equivalent to the error covariance model of Eq. (29), which we used to generate the synthetic correlated noise of the residual gravity anomalies ε C . Comparing Eq. (29) and Eq. (32), we find ln 2/ξ = 4, from which we may derive the correlation length of the correlated gravity errors, ξ = ln 2/4 = 0.173 • . Figure 6 shows the empirical (dots) and analytical (solid lines) geoid height error covariance functions for the uncorrelated (a, b) and correlated (c, d) noise regimes. We find that the geoid error covariance functions of the uncorrelated noise regime tend to follow E N l (ψ). In the correlated noise regime the functions tend to follow E N q (ψ). This holds for LSC as well as for the SKs. Stokes's formula performs slightly different, where the geoid error covariance models tend to follow E N q (ψ) in both uncorrelated and correlated noise regimes. Furthermore, Stokes's formula gives the geoid errors with the shortest correlation length (0.09 • -0.19 • or 10-20 km), while the SKs give the largest correlation lengths (0.13 • -0.34 • or 15-38 km). For Stokes's formula and LSC, the geoid error correlation lengths of either the uncorrelated or correlated noise regimes are almost identical in both test regions. This does not hold for the SKs, where correlation lengths in both test regions differ significantly. A more elaborate investigation into this behavior remains subject to further research.
We determine errors of geoid height differences, σ N 12 , between two points P 1 and P 2 from analytical or empirical error covariances according to 34) where σ N 1 and σ N 2 are the geoid error standard deviations at points P 1 and P 2 and σ N 1,N 2 is the error covariance between them. The right-hand side is a simplification based on the assumption that, on average, it holds σ N 1 = σ N 2 = σ N . Figure 7 shows the error standard deviations of geoid height differences. The empirical formal errors are derived from the original ECM of geoid heights ( N N ) using Eq. (34) without simplification. The analytical formal errors are derived from the analytical covariance functions shown in Fig. 6 using the simplified version of Eq. (34). For very short distances ψ → 0, the geoid height errors are strongly correlated and the error of geoid height differences converges toward zero. The geoid error covariance E N (ψ) decreases with increasing ψ, whereas the geoid height difference error increases. For larger distances, depending on the correlation length of E N (ψ), the error correlations become irrelevant and the errors of N stabilize around √ 2 E N 0 . This strictly holds for the analytical functions, as E N (ψ) does not contain negative correlations. By contrast, the empirical covariance functions do contain negative covariances, which results in the errors of the geoid height differences surpassing the plateau of the analytical functions. For most cases, the geoid height differences are smallest for Stokes's formula and largest for the SKs.
Geoid height differences are of particular interest for GNSS leveling, which may serve as an alternative to spirit leveling. In general, setting up a complete error budget is of relevance for the investigation and further reduction of any misclosure between height differences from leveling, GNSS, and a geoid model. Such a complete error budget would in principle contain not only the medium-wavelength contribution of geoid heights as discussed in this study, but also long-wavelength contributions from a GGM and short-scale contributions from digital terrain models as well as errors from GNSS and leveling. Most of the contributions depend on availability and data quality in the specific region of interest. In addition, we may ask whether the application is (i) leveling with GNSS (in which case GNSS heights are combined with a geoid model according to H = h − N ) or (ii) validation of a geoid model at GNSS leveling sites (in which case heights from GNSS and leveling are combined according to N = h − H ). Setting up a complete error budget is outside the scope of the present study, as it requires knowledge of the error level of the specific study area. However, we have included the qualitative behavior of leveling errors in Fig. 7 in order to discuss some aspects of the error budget in a qualitative way. Prior to our qualitative comparison, we wish to discuss its rationale in detail, considering the characteristics of the error budget components-geoid model errors, leveling errors, and GNSS-height errors: -Considering errors of geoid height differences, we argue that they can be limited to the medium-wavelength contribution which is treated in our study. On the one hand, the long-wavelength geoid height error contribution from a GGM will, due to its limited spatial resolution, hardly affect the errors of geoid height differences in regions as small as the present target areas (where the geoid heights are strongly correlated). On the other hand, our qualitative comparison also does not include short-scale effects from topographic errors, because these very much depend on the study region-for our regions, we hardly expect any topographic error contributions in the relatively flat area of East Frisia, while we expect larger contributions in the Alpine region. Thus, we find it reasonable to limit the relevant spectral band for the qualitative study of regional geoid height differences to the medium-wavelength range of our numerical experiment. -Errors of spirit leveling height differences are known to depend on the square root of the length of the leveling line. In Fig. 7, we have indicated this dependency (black dashed line) at a level of precision of 1 mm/ √ d, where d is the distance in kilometer. The chosen error level is typical for the precision of national leveling networks, where we again note, that the 1-mm level only serves as an indicative measure in our generic example, while in reality, the precision might be better or worse depending on the data situation of the specific study region.
-Errors of GNSS-derived heights are hardly distance dependent, but depend more on, e.g., the duration of observational sessions or elevation differences between stations of the GNSS network (implying residual tropospheric effects). Based on this assumption, GNSS errors would, if included in Fig. 7, merely be represented by a horizontal line. Depending on the application (leveling by GNSS or validation of a geoid model) the GNSS error increases the absolute error level of height differences from either (i) the geoid model or (ii) spirit leveling. This is why we do not include GNSS errors in Fig. 7, but rather keep in mind that they would increase the noise level of one of the curves, while leaving its relative error characteristics unchanged.
We now turn to the qualitative comparison of the different curves in Fig. 7. In general, leveling errors increase as function of the length of the leveling line. They are very precise for short distances, typically on the mm level. Over larger distances, they grow unboundedly. Errors of geoid height differences are also small for short distances, but are in theory limited to the maximum error plateau of √ 2E N 0 . In our numerical experiment, this level may be exceeded in correspondence with the negative correlations of geoid height errors as shown in Fig. 6. Comparing the error characteristics of geoid height differences in the uncorrelated (panels a and b) and correlated (panels c and d) noise regimes, we observe that the maximum error plateau is reached for shorter distances in the uncorrelated case. This is in correspondence with the shorter correlation length of geoid height errors in the uncorrelated compared to the correlated case, see Fig. 6. In other words, correlated geoid height errors are to a large extent eliminated by forming geoid height differences. However, this does not improve the error characteristics of geoid height differences in the absolute sense, because error propagation of correlated gravity errors increases the overall noise level of geoid heights and thus the maximum error plateau of geoid height differences (see panels c and d as compared to panels a and b of Figs. 6 and 7). Depending on the absolute error level, we may identify spatial scales for which spirit leveling provides more precise height differences than GNSS leveling and vice versa. For example, in the uncorrelated case of our experiment (Fig. 7,  panels a and b), leveling gives better height information than the geoid for distances between about 0.05 • -0.7 • , i.e., 5-70 km. For shorter distances, the methods are very similar (for such small distances leveling errors are still small, as they grow by distance, while geoid height differences are precise due to the large correlation of geoid errors over short distances). For larger distances, the geoid models provide better information, because the errors of geoid height differences are bounded. At the same time, we recognize that these distance measures are different for geoid heights derived from the different geoid modeling techniques. For example, the solution using Stokes's formula outperforms leveling already for distances of about 30-40 km, while the SK solution gives superior results only after distances of about 50-70 km (with the LSC solution again between Stokes's formula and the SKs). Even though such distance measures are relevant for practical applications, we must stress that the mentioned numbers are only valid for our specific numerical setup. This not only holds true for the overall error level, but also for the relative differences between Stokes's formula, LSC, and the SKs.

Discussion
In our study, Stokes's formula generally gives smaller error estimates than the estimation techniques. This might be expected as it is a numerically stable method, but it is also the only method which is independent of the signal characteristics (as given by the signal degree variances). In this context, we emphasize that the error measures from the integration and estimation techniques must be interpreted differently. While the error derived from Stokes's formula corresponds to the pure integration of the given gravity anomaly error field, the estimation techniques quantify to what extent the expected signal can be estimated at all given a certain distribution of data points as well as information on the signal characteristics. In other words, Stokes's formula would give identical formal geoid errors when using identical error fields in different regions, while the estimation techniques would for the same error field give larger formal errors for regions with rougher gravity field signal.
The formal errors in East Frisia are generally smaller than in the Alpine region, which we attribute to the smaller signal variance in East Frisia. Thus, the estimation techniques give a smaller error. Stokes's formula also shows slightly smaller formal errors in East Frisia than in the Alpine region, but this is likely due to meridian convergence, as mentioned in Sect. 3.2.
Furthermore, the formal errors of LSC and the SKs are more dependent on the signal variance than the empirical errors are. The empirical errors of East Frisia are slightly larger than in the Alpine region. Due to the fact that the signal variance is smaller in East Frisia, the amplitudes of signal and error are more similar than in the Alpine region. Thus, relatively speaking, the error makes up a larger part of the noisy gravity anomaly signal in East Frisia and makes it more challenging to separate signal and noise. Both LSC and SK estimation techniques aim to smooth the data so as to filter out errors and will inevitably smooth the signal as well in the process. This would lead to more signal being smoothed in East Frisia than in the Alpine region, which could explain the slightly larger empirical errors, contrasting the formal errors. As evident from Eq. (23), the SK formal error propagation is dependent on the regularization parameter α, as governed by the regularization contribution matrix. Considering the difference in formal error estimates in East Frisia and the Alpine region, we investigated the effect of regularization on these, see Fig. 8. In all cases, we observe that δ N is relatively stable for a range of α values, where we interpret the maximum and minimum of this range as limits for overand under-regularization. We also observe a larger impact of regularization on the estimated formal errors than on the empirical errors.
Considering again the argument that we expect the target area formal errors to be homogeneous, Fig. 8 suggests that an optimal solution may be found when the maximum and minimum curves of the formal errors converge. This, however, is not a strict measure: the slow convergence of the maximum and minimum curves makes it difficult to find an exact α value for which they converge, and in some cases the solution already starts to diverge before the maximum and minimum curves coincide. Still, homogeneity of formal errors in the target area could be indicative of a stable numerical computation, in addition to sufficient data (and grid) area sizes. Further investigations are necessary to confirm whether the behavior we observe in Fig. 8 also holds for heterogeneous data distributions.
We have argued that it is easier to separate signal and noise in the uncorrelated error regime, as the errors ideally tend to average out. Then, the solution is relatively stable, and a broad range of α values will give small empirical errors, as evident from Fig. 8a, b. A broad range of α values also gives small empirical errors in the Alpine correlated regime (Fig. 8d). By contrast, the East Frisia correlated regime shows a distinct empirical error minimum, revealing that the chosen α is not optimal. This might be related to the mentioned fact that the signal is smoother in East Frisia, complicating the separation of signal and error. Using Fig. 8 as a guide, we can choose α values that are optimal in the sense that the formal and empirical errors are similar and minimized. We have indicated the RMS values of solutions using the optimal α values in bold in Tables 3 and 4, which indeed increase the method agreement. Although choosing α through methods that essentially minimize the difference to a reference solution have shown promising results (e.g., Naeimi 2013; Klees et al. 2019), they might be difficult to use in practical applications, where a reference solution is typically unavailable. Thus, we hesitate to promote this type of regularization parameter choice rule in general, but merely demonstrate that in this study, if α is indeed optimally chosen, the SKs give solutions of similar quality to the other methods.
Generally, Fig. 8 underlines the challenge of finding an optimal α value. The L-curve method is a heuristic method which can fail regardless of how it is implemented, e.g., in cases where the optimal α is not located in the corner of the L-curve (Hansen et al. 2007). In this paper, we use the continuous L-curve method as implemented in the l_curve routine, where the corner of the L-curve is found by determining its maximum curvature. To get an idea of how the optimal α may vary depending on the regularization approach, Fig. 8 also includes optimal α values as determined by the discrete L-curve method (based on fitting a two-dimensional spline curve to the discrete L-curve), α * L−curve , as well as the generalized cross-validation regularization method (minimizing the residual norm using the "leave-out-one" approach), α GCV , both implemented in the regularization tools. The three methods generally agree, and no single method consistently outperforms the others. Still other methods for determining the regularization parameter exist, see, e.g., Naeimi et al. (2015), Wu et al. (2018), and Liu et al. (2020) for further comparisons in the context of gravity field modeling.

Summary and conclusions
This study extends the investigations of Ophaug and Gerlach (2017), wherein three methods for regional geoid computation-Stokes's formula, LSC, and splines-were found to be theoretically equivalent and to numerically agree on the mm level in a closed-loop environment using synthetic noise-free data, by comparing formal error propagations using the three methods.
Geoid solutions in East Frisia and the Alpine region, based on gravity anomalies subjected to uncorrelated and correlated noise on the 1-mGal level, on average, give empirical errors of 4-5 mm and 8-16 mm, respectively. The formal errors agree with the empirical errors, giving 4-7 mm in the uncorrelated regime, and 10-21 mm in the correlated regime. The uncorrelated formal errors are generally smaller than the correlated errors, as random errors tend to average out to zero.
We also compare the accuracy of geoid height differences derived from the three methods and discuss general error characteristics of geoid height differences in relation to GNSS leveling. If applied to real data, the comparison of height differences derived by means of spirit leveling or by means of GNSS leveling allows to identify typical point distances for which spirit leveling outperforms GNSS leveling. For larger distances, GNSS leveling typically gives better height information because the errors are bounded, as opposed to errors from spirit leveling, which grow unboundedly. Such a distance measure will depend on the data availability and quality in the specific study area. Realistic error budgeting for a specific area is outside the scope of the present study, leaving our discussion qualitative.
The LSC and SK estimation techniques give larger errors than numerical integration using Stokes's formula, which we attribute to the fact that the estimation techniques are dependent on the signal characteristics. While Stokes's formula gives similar formal errors in both East Frisia and the Alpine region, the estimation techniques give notably smaller errors in East Frisia, due to the smaller signal variance of that region. In East Frisia, all methods give formal errors that are realistic within 13%. In the Alpine region, the formal errors are realistic within 20% for Stokes's formula and LSC, while only within 44% for the SKs. Computing SK solutions for a range of α values revealed that the formal errors are highly dependent on the regularization parameter, and that α has a larger impact on the formal than on the empirical errors. Although the obtained α values using the L-curve and generalized cross-validation methods led to a stabilized numerical system, this was not necessarily equivalent to obtaining the best solution. Using a regularization parameter governed by the agreement between formal and empirical error fields provided a solution of similar quality to the other methods.
Due to our numerical setup with regularly distributed data points, and coinciding computation points, we expect the errors to be homogeneous within the target areas. This is clearly the case with Stokes's formula, and practically also with LSC. For the SKs, however, while a broader range of α values allowed for a meaningful solution, only a certain band of α values gave homogeneous formal errors in the target areas. If the latter is true, we conclude that the SK solution is numerically stable and that the data and grid areas are of sufficient size. We stress, however, that more research must go into this topic, including, in particular, cases of heterogeneous data distributions.
Generally, the present paper has demonstrated the importance of carefully tuning the applied functions with respect to band limitation, signal adaptation, and regularization, as they all significantly affect the estimated formal errors on the centimeter level. If care is taken, however, the three methods provide realistic formal errors under assumptions of either uncorrelated or correlated errors.
This gives us confidence in the applicability and comparability of formal errors resulting from the three methods. Reliable formal errors can provide valuable information for measurement campaign planning and are critical when setting up a formal error budget in the search for the "1-cm geoid."