1 Introduction

Precise measurements of vertical motions of Earth’s surface with a precision approaching 0.1 mm/yr within the realization of a well-defined global reference frame (e.g., the International Terrestrial Reference Frame (ITRF)) (Métivier et al. 2020; Altamimi et al. 2016) are required to study numerous physical phenomena (e.g., Plag and Pearlman 2009). Variations in vertical position consist of secular motions (e.g., postglacial rebound, plate boundary deformation, orogeny, sea-level rise) and transients, including natural (e.g., coseismic offsets, postseismic decay, slow slip events, hydrological mass loading, volcanic uplift, cryospheric motions) and anthropogenic (e.g., subsidence due to groundwater extraction, climate change) sources (Amos et al. 2014; Argus et al. 2014; Borsa et al. 2014; Smith-Konter et al. 2014; Hammond et al. 2016; Howell et al. 2016; Simon et al. 2021; Métivier et al. 2020; Wöppelmann and Marcos 2016; Bitharis et al. 2017). To estimate vertical motions from GNSS observations, we apply a functional linear trajectory model to the daily displacement time series, consisting of a linear trend (station velocity), sinusoidal parameters, offset parameters (coseismic and artifacts), and logarithmic/exponential terms under the presence of postseismic motions (e.g., Nikolaidis 2002; Bevis and Brown 2014; Bevis et al. 2020; He et al. 2017).

To obtain both unbiased and minimum-variance parameter estimates, along with realistic parameter uncertainties, the modeled covariance matrix of the GNSS observations (stochastic model) should best reflect the time series observation noise (e.g., Bos et al. 2020). This requires the characterization of the spatiotemporal noise processes inherent in GNSS observations. A proper characterization of temporal (colored) noise has a significant impact on the quality of the estimates of station velocities. Erroneous covariance model assumptions can lead to a drastic over- or underestimation of parameter uncertainties. In particular, the low-frequency noise portion and stochastic model assumptions have a significant influence on the uncertainty of station velocity (e.g., Santamaría-Gómez et al. 2011; Williams 2003a; Zhang et al. 1997). Besides unmodeled geophysical phenomena (He et al. 2020), the stochastic model should ideally account for stochastic-type variations in the position time series, including monument noise, receiver and antenna noise, and, implicitly, unmodeled multipath effects (e.g., King et al. 2012; Klos et al. 2015). Multipath is minimal in daily displacements, although it is treated as a signal in GNSS reflectometry (e.g., Larson 2016).

The presence of flicker noise (FN) in GNSS displacement time series, describing temporal correlations (“colored noise”) in addition to uncorrelated white noise (WN), was recognized by Zhang et al. (1997) and Mao et al. (1999), albeit with limited data sets; with longer data sets, some combination of WN, FN, and random walk (RW) noise has been reported (Langbein 2008). These three stochastic models are special cases of a power-law (PL) and white noise (PLWN) stochastic model (Agnew 1992). WN, FN, and RW are characterized by spectral magnitudes and spectral indices of 0, −1, and −2, respectively (the spectral index is the spectral slope of the noise in a double-logarithmic plot, see Bos et al. 2020), although in general a PLWN model can also have non-integer values. Multiple studies have demonstrated that the noise characteristics of GNSS displacement time series can be best described by some form of PLWN model (e.g., Williams 2003a; Williams et al. 2004; Bos et al. 2010; Klos et al. 2018b; Langbein and Svarc 2019; He et al. 2017; Montillet and Bos 2019). Within a global analysis, Klos et al. (2020a) showed that PL noise is dominant over Europe, with strongly increasing spectral amplitudes and indices toward Northern Europe (e.g., Bogusz et al. 2019). Using the EPN repro-2 dataset, Nistor et al. (2021) performed an analysis of GNSS noise over Europe and found FN to be dominant. When comparing PLWN (best-fitting spectral index) and FN model estimates with respect to the noise power spectral density (PSD) of the time series, they noted a bias in the PLWN estimates.

Significant portions of GNSS daily displacement signals can be explained by non-tidal atmospheric loading (NTAL), non-tidal oceanic loading (NTOL), and hydrological loading (HYDL) (e.g., van Dam et al. 1994; Mémin et al. 2020; Springer et al. 2019; Williams and Penna 2011; Tregoning and Watson 2009). Amplitudes can reach levels of centimeters and can affect all observed frequency ranges. If uncorrected, loadings can significantly contribute to estimates of vertical land motion (VLM) velocity (up to 1 mm/yr, see Santamaría-Gómez (2015), for example). Jiang et al. (2013), Wu et al. (2020), and Mémin et al. (2020) compared NTAL+NTOL+HYDL reduction rates using different models on a global scale. For Europe, typical reduction rates in scatter ranged from 20 to 50%. Martens et al. (2020) highlighted the importance of improved mapping functions to observe NTAL deformation, as otherwise the positioning errors induced by the zenith total delay tend to compensate NTAL-induced displacements. For the Western US, correcting for loading effects can reduce the daily GNSS displacement scatter by 5–30% and the mean amplitudes of seasonal oscillations by about 10–40% (Martens et al. 2020). Mémin et al. (2020) also observed a larger reduction in scatter when applying loading corrections that are based on high-resolution numerical weather models. Recent studies of vertical postseismic deformation (e.g., Ward et al. 2022) also correct displacement time series for the effects of NTAL, NTOL, and HYDL using published physical models (Dill and Dobslaw 2013), resulting in decreased scatter in daily displacement time series. Argus et al. (2017) corrected for the effects of NTAL and NTOL in estimating changes in total water storage at Earth’s surface and found significant differences with hydrological models. Regarding the effect of corrections on land motion parameter uncertainties, Klos et al. (2021) showed for GNSS stations in inner continental Eurasia that, by correcting for the effects of NTAL, the uncertainties in station velocity could be reduced by a factor of 2 (presuming PLWN models). NTAL and NTOL can contribute at all frequency bands, while HYDL mostly has contributions in the seasonal bands. Gobron et al. (2021) analyzed the joint effect of NTAL+NTOL on stochastic properties of long-term GNSS time series and found that the presence of NTAL results in an overestimate of PLWN coefficients, which in turn overestimates velocity uncertainty. They also pointed out differences in the loading-corrected displacement spectra with respect to those presented by Männel et al. (2019), who corrected NTAL at the observation level. However, the main differences underlying the positioning solutions in both studies were the use of different loading corrections, sets of stations, and mapping functions. Gobron et al. (2021) also mentioned the use of generalized Gauss Markov (GGM) models to calibrate NTAL+NTOL, since these can account for more complex spectra (e.g., bending effects) (He et al. 2019). However, Gobron et al. (2021) also concluded that this leads to a complex and nonlinear variance estimation problem.

Besides physical loading corrections, empirical approaches account for common-mode errors (CME) in the GNSS time series data, such as stacking and weighted stacking (Wdowinski et al. 1997; Nikolaidis 2002), generalized by principal component analysis (PCA) techniques (Dong et al. 2006), or median filtering approaches (Klein et al. 2019; Kreemer and Blewitt 2021). These approaches have in common that they account not only for environmental phenomena present in the data, but also for other errors arising from orbit or antenna phase center mismodeling, or software shortcomings. Thus, the CME accounts for spatiotemporally correlated errors, including geophysical loadings (if not corrected for). He et al. (2015) concluded that up to the first four PCA components should be considered for CME reconstruction, and that, for large-scale networks, the CME should be evaluated in smaller blocks. With a focus on the Mediterranean region, Serpelloni et al. (2013) demonstrated the reduction of GNSS velocity uncertainties by using CME filtering. Another application of CME for the investigation of tectonic signals from GNSS data was given by Pintori et al. (2021), who studied the impact on estimated rates and noise across the European Alps, and also demonstrated a reduction in PL noise. A thorough review of GNSS CME approaches has been given by He et al. (2017) and He et al. (2020).

Step discontinuities (offsets) play an important role in GNSS trajectory parameter estimation and strongly affect station velocity estimates along with their uncertainties. The presence of undetected offsets, such as coseismic displacements and artifacts (e.g., changes in GNSS antennas), will bias station velocities by introducing RW-type long-period noise. Also, many undetected offsets can mimic RW variations and inflate velocity uncertainties (e.g., Williams 2003b; Gazeaux et al. 2013). However, the separation of offset effects on the functional and stochastic models is a challenging task. Contrary to the case of undetected offsets, too many assigned offsets (false positives) will underestimate the low-frequency portions of the noise spectrum by absorbing noise variance, resulting in an underestimate of velocity uncertainty (Santamaría-Gómez and Ray 2021). Long-period noise has the most significant impact on station velocity uncertainty. The study of Wang and Herring (2019) documented a limited effect of offsets on velocity uncertainty when the noise is strongly time correlated. However, as Santamaría-Gómez and Ray (2021) pointed out, this study missed certain important aspects; most prominently, it did not account for the cumulative impact of offsets on the stochastic model caused by the reduction in long-period noise spectra. This favors the choice of GGM noise models over combinations of random walk, flicker, and white noise (RWFNWN) stochastic models (Santamaría-Gómez and Ray 2021). Additionally, Gobron et al. (2022) showed that, in the presence of offsets in the functional model, using conventional maximum likelihood estimation (MLE) for determining spectral amplitudes and indices systematically underestimates station velocity uncertainty (by up to 50% or more) and therefore proposed to use restricted maximum likelihood estimation (RMLE) to limit the magnitude of uncertainty biases. Unlike MLE, RMLE accounts for the loss of variance introduced by offsets in the estimation process and is therefore less biased for long-period noise.

In this work we investigate the achievable improvements in sensitivity of GNSS-derived vertical land motion parameters, in particular station velocity (“trend sensitivity”), after correcting the observations for loading effects and common-mode signals. Using the European data, we discuss (1) best-fitting stochastic models for the uncorrected and corrected cases, (2) the attainable sensitivities (statistically detectable velocity) of vertical land motion when using unbiased noise models, and (3) differences of the loading-corrected case (including HYDL) with respect to the CME-corrected case. Concerning (1) and (2), we show that for the uncorrected case, PLWN stochastic models should be augmented by autoregressive (AR) models of order 1, in order to yield optimal fits to GNSS noise. By using realistic stochastic models, improvements in velocity uncertainty/sensitivity can still be achieved, but are significantly lower than those reported by previous studies (e.g., Klos et al. 2021; Gobron et al. 2021). We report misfits between GNSS data and loading models at low frequencies, particularly for the Southern regions of Europe. Regarding (3), the comparison with the CME analysis reveals that even with the most basic PCA approach and a low number of PCA components, improvements in parameter uncertainty/sensitivity can be achieved with respect to the loading-corrected case. We demonstrate the added value of RMLE for obtaining better fitting stochastic models at the low frequencies when offsets are present. We then compare the trend sensitivities with the statistically detected trends. Finally, we discuss the sensitivity of GNSS in resolving seasonal VLM signals and highlight the sensitivity to offset detection.

2 Data, workflow, and models

2.1 Data and workflow

We obtained the GNSS height time series openly available from the Nevada Geodetic Laboratory (NGL) (Blewitt et al. 2018; Nevada Geodetic Laboratory 2024) for European Permanent Network (EPN) stations (Bruyninx et al. 2019; EPN data repository 2023). We chose EPN stations, since they are of reliable and well-known quality, and accurate metadata are available. The European stations are mostly on the stable Eurasian plate except on its periphery (Italy and the Aegean). Generally, vertical velocities are on the order of ± 2 mm/yr and can be well fit by a single linear trend, including in Scandinavia, which is subject to glacial isostatic adjustment (GIA) with secular uplift of up to 3–10 mm/yr (Peltier et al. 2015; Bogusz et al. 2019). Therefore, the European vertical time series provide a best-case scenario for evaluating colored noise processes and parameter sensitivity, in comparison to, for example, the U.S. West Coast heights, where the vertical displacements include nonlinear natural and anthropogenic motions (transients) such as crustal decay after earthquakes (postseismic deformation), large regions of irregular subsidence and uplift, volcanism, oil extraction, and drought (e.g., Klein et al. 2019). For the fitting of functional and stochastic GNSS time series models, we use the Hector software (Hector Software Repository 2024, version 1.9), distributed for Ubuntu 20.04 LTS, together with updated routines capable of RMLE. Besides Hector, other state-of-the-art MLE-based software packages to analyze geodetic time series include CATS (Williams 2008) and est_noise (Langbein 2004).

Fig. 1 Overview of the data processing. More details can be found in Sect. 1 of the supplementary material

An overview of the data processing workflow is shown in Fig. 1. We analyzed daily displacement time series starting in January 1996 for the oldest stations, up until January 1, 2022. In total, we processed time series data for 244 stations. On average, the length of the investigated time series is 18.2 years (median of 18.8); the minimum length is 3.5 years, and the maximum length is 26 years. We estimated offsets according to the catalog data provided by NGL. We also introduced additional offsets through visual inspection. After the first round of processing, we identified about half of the offsets to be insignificant (at the 95% confidence level), using statistical hypothesis testing (Sect. 2.3). Thus, from a total of 707 offsets, 344 were kept. The “uncorrected” time series (Fig. 1) refers to the case where no loading/CME corrections were applied to the GNSS data. To reduce the noise level, each time series was corrected in two ways. First, we corrected for the vertical component of physical loading models provided by Geoforschungszentrum Potsdam (GFZ) (Dill and Dobslaw 2013; GFZ loadings repository 2022), taking these models as given with no error. These include NTAL, NTOL, and HYDL loadings. GFZ provides these data on a regular 0.5° \(\times \) 0.5° grid, which we downloaded and bilinearly interpolated using the shell script extractlatlon_bilinintp_remote.sh, provided by GFZ. The time resolution is 24 h for HYDL (referring to 12 UTC) and 3 h for NTAL and NTOL, which were averaged over a day. The NGL time series are given in the IGS14 frame; loadings were obtained with respect to Earth’s center of figure (CF) (Dong et al. 2003). The tested scenario for reducing the noise in the data includes the sum of all three loading models, referred to as “AOH”. Separately, using an empirical approach, we corrected the time series residuals from a parametric fit to the data (Eq. 1) for common-mode errors (CME) derived from a three-component principal component analysis (PCA) (Dong et al. 2006) of the best-fit residuals. The uncertainties of the estimated trajectory parameters serve as input to the sensitivity analysis. The results of the Hector trajectory estimation are provided at Hohensinn (2024). The parameter sensitivity analysis is then carried out based on the chosen significance levels for false positives and false negatives.
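The two preprocessing steps applied to the gridded loading products (bilinear interpolation to the station location and daily averaging of the 3-hourly NTAL and NTOL values) can be illustrated with a minimal sketch. This is not the GFZ shell script; the function names, the argument layout, and the alignment of each daily window to eight consecutive samples are assumptions made for illustration only.

```python
from statistics import fmean


def bilinear(lon, lat, lon0, lat0, step, corners):
    """Bilinearly interpolate inside one grid cell.

    (lon0, lat0) is the lower-left corner of the cell, `step` the grid
    spacing in degrees (0.5 for the grid described in the text), and
    `corners` = (v00, v10, v01, v11) holds the values at
    (lon0, lat0), (lon0+step, lat0), (lon0, lat0+step), (lon0+step, lat0+step).
    """
    tx = (lon - lon0) / step  # fractional position in longitude, 0..1
    ty = (lat - lat0) / step  # fractional position in latitude, 0..1
    v00, v10, v01, v11 = corners
    bottom = v00 + tx * (v10 - v00)  # interpolate along the lower edge
    top = v01 + tx * (v11 - v01)     # interpolate along the upper edge
    return bottom + ty * (top - bottom)


def daily_means(samples_3h):
    """Average consecutive groups of eight 3-hourly values into daily values.

    Assumption: the series starts at the beginning of a day; an incomplete
    final day is dropped.
    """
    return [fmean(samples_3h[i:i + 8]) for i in range(0, len(samples_3h) - 7, 8)]


# Hypothetical cell values in mm; the station sits at the cell center.
value = bilinear(10.25, 47.25, 10.0, 47.0, 0.5, (0.0, 1.0, 2.0, 3.0))
```

The summed AOH correction would then be the epoch-wise sum of the three interpolated daily series, subtracted from the GNSS heights.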

2.2 Parameter estimation

2.2.1 Functional model

For the functional model, we consider each GNSS station’s vertical displacement time series of the form \({\textbf{y}}^T = [y(t_1), y(t_2),\ldots ,y(t_k),\ldots , y(t_n)]\), where the index \(k=1,2,\ldots ,n\) refers to the \(k\)-th of n samples in total, and t is the time (in days). The following trajectory model is considered (in units of millimeters, e.g., Bevis and Brown 2014; He et al. 2017):

$$\begin{aligned} y(t_k) &= a + b \cdot t_k + \sum _{j=1}^{11} \left[ U_{j,1} \cos (2 \pi f_j t_k ) + U_{j,2} \sin (2 \pi f_j t_k) \right] \nonumber \\ &\quad + \sum _{i=1}^{m} \Delta _i H(t_k - T_i) + z(t_k) \end{aligned}$$
(1)

a and b are the intercept and slope of the linear trend function (station velocity), \(U_{j,1} = {\bar{U}}_j \cos (\varphi _j), ~~ U_{j,2} = -{\bar{U}}_j \sin (\varphi _j)\), with \({\bar{U}}_j\) and \(\varphi _j\) being amplitude and phase of the constant-amplitude periodic signals. The first two are the annual and semi-annual signals, and 3 to 8 are GPS draconitic frequencies at 1.04, 2.08, 3.12, 4.16, 5.20, and 6.24 cpy (Ray et al. 2008); additionally, the periodic signals at the fortnightly frequencies (13.62, 14.17, and 14.76 cpy) were estimated. (Both draconitic and fortnightly signals were found to be present in the data.) \(\Delta _i\) is the \(i\)-th offset (of m offsets for a station) at time \(T_i\), H is the Heaviside step function, and \(z(t_k)\) is the noise term. The model of Eq. (1) is linear, and the parameter vector is defined by \({\textbf{x}} = [a, ~b, ~ {\bar{U}}_1 \cos (\varphi _1), ~ {-{\bar{U}}_1} \sin (\varphi _1),\ldots , ~ {\bar{U}}_j \cos (\varphi _j), {-{\bar{U}}_j \sin (\varphi _j)}, ~ \Delta _1,\ldots , \Delta _m]^{T}\). The least squares estimator for the parameters as well as the parameter covariance matrix are (Bos et al. 2020)

$$\begin{aligned} \hat{{\textbf{x}}}&= ({\textbf{A}}^{T} {\textbf{C}}^{-1}_{\textbf{yy}} {\textbf{A}})^{-1} {\textbf{A}}^{T} {\textbf{C}}^{-1}_{\textbf{yy}} {\textbf{y}} \end{aligned}$$
(2)
$$\begin{aligned} {\textbf{C}}_{{\hat{x}}{\hat{x}} }&= ({\textbf{A}}^{T} {\textbf{C}}^{-1}_{\textbf{yy}} {\textbf{A}})^{-1} \end{aligned}$$
(3)

where \({\textbf{A}}\) is the design matrix for the model of Eq. (1), and \(\mathbf {C_{yy}}\) is the covariance matrix of the observations. The 1-sigma standard deviation (uncertainty) for the i-th parameter can be computed by \(\sigma _{{\hat{x}}_i} = \sqrt{{\textbf{C}}_{{\hat{x}}_i{\hat{x}}_i}}\).
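To make the structure of the design matrix \({\textbf{A}}\) concrete, the following minimal sketch builds one row of \({\textbf{A}}\) for the trajectory model of Eq. (1) and evaluates the deterministic part of the model for a given parameter vector. The function names, the conversion of 365.25 days per year between the time axis (days) and the frequencies (cpy), and the convention \(H(0) = 1\) are assumptions for illustration; the actual estimation in this work is carried out with Hector.

```python
import math

DAYS_PER_YEAR = 365.25  # assumed conversion between days and cycles per year

# Frequencies in cpy as listed in the text: annual, semi-annual,
# six GPS draconitic harmonics, and three fortnightly lines.
FREQS_CPY = [1.0, 2.0, 1.04, 2.08, 3.12, 4.16, 5.20, 6.24, 13.62, 14.17, 14.76]


def design_row(t_days, offset_epochs, freqs_cpy=FREQS_CPY):
    """Return the row of A belonging to epoch t_days (time in days).

    Column order follows the parameter vector x:
    [a, b, U_11, U_12, ..., U_J1, U_J2, Delta_1, ..., Delta_m].
    """
    t_years = t_days / DAYS_PER_YEAR
    row = [1.0, t_days]  # intercept a and trend b (b is then in mm per day)
    for f in freqs_cpy:
        arg = 2.0 * math.pi * f * t_years
        row += [math.cos(arg), math.sin(arg)]
    # Heaviside step for each offset epoch T_i (assumed H(0) = 1)
    row += [1.0 if t_days >= T else 0.0 for T in offset_epochs]
    return row


def predict(row, x):
    """Deterministic part of Eq. (1): inner product of a design row with x."""
    return sum(r * p for r, p in zip(row, x))


# Hypothetical example: intercept 1 mm, trend 0.5 mm/day, no periodic signals,
# one 3 mm offset at day 10, evaluated at day 20.
x_example = [1.0, 0.5] + [0.0] * 22 + [3.0]
y_example = predict(design_row(20.0, [10.0]), x_example)
```

Stacking such rows for all epochs gives \({\textbf{A}}\), which enters Eqs. (2) and (3) together with \(\mathbf {C_{yy}}\).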

2.2.2 Stochastic model

The covariance matrix of the observations \(\mathbf {C_{yy}}\) is considered to be a combination of p different stochastic models (Williams 2008)

$$\begin{aligned} \mathbf {C_{yy}} = \sum _{i=1}^p \sigma _i^2 {\textbf{J}}_i \end{aligned}$$
(4)

where \(\sigma _i^2\) is the variance component of the i-th of p stochastic models, with its correlation structure represented by \({\textbf{J}}_i\). The matrices in Eq. (4) are of dimension \(n \times n\). In this work we consider three different combinations of stochastic models:

$$\begin{aligned} {\textbf{C}}_{\textbf{yy}}^{(1)}&= \sigma ^2_\textrm{PL} \cdot {\textbf{J}}_\textrm{PL}(\kappa ) + \sigma ^2_\textrm{WN} \cdot {\textbf{I}} \end{aligned}$$
(5)
$$\begin{aligned} {\textbf{C}}_{\textbf{yy}}^{(2)}&= \sigma ^2_\textrm{AR1} \cdot {\textbf{J}}_\textrm{AR1}(\phi ) + \sigma ^2_\textrm{PL} \cdot {\textbf{J}}_\textrm{PL}(\kappa ) + \sigma ^2_\textrm{WN} \cdot {\textbf{I}} \end{aligned}$$
(6)
$$\begin{aligned} {\textbf{C}}_{\textbf{yy}}^{(3)}&= \sigma ^2_\textrm{RW} \cdot {\textbf{J}}_\textrm{RW} + \sigma ^2_\textrm{FN} \cdot {\textbf{J}}_\textrm{FN} + \sigma ^2_\textrm{WN} \cdot {\textbf{I}} \end{aligned}$$
(7)

where the superscript (1) stands for the PL+WN (PLWN) model, (2) stands for AR1+PL+WN (AR1PLWN), and (3) stands for RW+FN+WN (RWFNWN). Each model consists of a combination of one or two colored noise part(s) plus a white noise part. \({\textbf{I}}\) is the identity matrix. \({\textbf{J}}_\textrm{PL}\) depends on the spectral index \(\kappa \) (Bos et al. 2020). The RW, the FN, and the WN models are special cases of the PL model: \(\kappa = -2\) is RW noise, \(\kappa = -1\) is FN, and \(\kappa = 0\) represents white noise. AR1 represents the autoregressive model of order 1 (e.g., Klos et al. 2018a), and \({\textbf{J}}_\textrm{AR1}\) depends on the lag-1 coefficient \(\phi \). The variance components for each model and the unknown parameters of the \({\textbf{J}}\) matrices (for the PL and AR1 models) (Eqs. 5 to 7) are estimated by the Hector software from the residuals of the parametric fit of Eq. (1) using numerical MLE methods (Bos et al. 2008; Williams 2008; Bos et al. 2021). The parameters include the spectral indices and coefficients of the PL and AR1 models.
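As an illustration of how \({\textbf{J}}_\textrm{PL}(\kappa )\) in Eq. (5) can be formed, the sketch below uses the common fractional-differencing representation of power-law noise, in which colored noise is white noise passed through a filter with coefficients \(\psi _0 = 1\), \(\psi _i = \psi _{i-1}(i - 1 + d)/i\) and \(d = -\kappa /2\), so that \({\textbf{J}}_\textrm{PL} = {\textbf{L}}{\textbf{L}}^T\) with a lower-triangular Toeplitz matrix \({\textbf{L}}\) of these coefficients. It assumes a unit sampling interval and omits any sampling-interval scaling factor; the function names are hypothetical, and this is not the Hector implementation.

```python
def powerlaw_coeffs(kappa, n):
    """Filter coefficients psi_0..psi_{n-1} for spectral index kappa."""
    d = -kappa / 2.0
    psi = [1.0]
    for i in range(1, n):
        psi.append(psi[-1] * (i - 1 + d) / i)
    return psi


def powerlaw_J(kappa, n):
    """J = L L^T, where L[i][j] = psi[i - j] for j <= i (unit sampling)."""
    psi = powerlaw_coeffs(kappa, n)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(psi[i - k] * psi[j - k] for k in range(j + 1))
            J[i][j] = J[j][i] = s
    return J


def plwn_covariance(sigma_pl, sigma_wn, kappa, n):
    """Eq. (5): sigma_PL^2 * J_PL(kappa) + sigma_WN^2 * I."""
    J = powerlaw_J(kappa, n)
    return [
        [sigma_pl ** 2 * J[i][j] + (sigma_wn ** 2 if i == j else 0.0) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical amplitudes (mm): flicker noise plus white noise, 5 epochs.
C_example = plwn_covariance(2.0, 1.0, -1.0, 5)
```

For \(\kappa = 0\) the construction reduces to the identity matrix, and for \(\kappa = -2\) the filter becomes a cumulative sum, which reproduces the special cases named above. The dense \(O(n^3)\) loop is only meant for small illustrative n.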

In contrast to MLE, RMLE accounts for the loss of variance in the likelihood function caused by the estimation of the parameters present in the functional model. For the estimation of the variance components in Eqs. (5) to (7), the loss of variance is accounted for by adding an additional term to the log-likelihood function (cf. Eqs. (15) and (20) from Gobron et al. (2022)), resulting in unbiased estimates of the variance components. In the case of a single variance to be estimated (different from this case), RMLE is equivalent to dividing the weighted sum of squared residuals by the degrees of freedom of the adjustment (instead of dividing by the number of observations, as in conventional MLE).
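The single-variance special case can be made tangible with a small sketch: a straight line (u = 2 parameters) is fitted to a few samples by ordinary least squares, and the residual variance is computed once with the MLE divisor n and once with the RMLE divisor n − u. Unit weights, the toy data, and the function names are assumptions for illustration; the multi-component case treated in this work requires the modified log-likelihood referenced above.

```python
def fit_line(t, y):
    """Ordinary least squares fit of y = a + b * t; returns (a, b)."""
    n = len(t)
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    b = sxy / sxx
    a = y_mean - b * t_mean
    return a, b


def variance_estimates(t, y):
    """Return (MLE, RMLE) estimates of a single residual variance."""
    a, b = fit_line(t, y)
    rss = sum((yi - a - b * ti) ** 2 for ti, yi in zip(t, y))
    n, u = len(t), 2                 # u = number of functional parameters
    return rss / n, rss / (n - u)    # MLE divides by n, RMLE by n - u


# Hypothetical toy data
var_mle, var_rmle = variance_estimates([0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0, 1.0])
```

The MLE value is smaller by the factor (n − u)/n, which illustrates why each additional functional parameter, such as an offset, pushes the conventional estimate further toward too small a variance.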

2.2.3 Model choice

Since we consider several stochastic models, our model choice for each station will be guided by the Bayesian information criterion (BIC) (Bos et al. 2020)

$$\begin{aligned} \textrm{BIC} = u \ln (n) - 2 \ln (L) \end{aligned}$$
(8)

where u is the total number of estimated parameters, n is the number of samples, and L is the maximized value of the likelihood function. The model with the lowest BIC is chosen. BIC is routinely provided by the Hector software (Bos et al. 2021).
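A minimal sketch of the selection rule of Eq. (8) is given below. The log-likelihood values and parameter counts are hypothetical placeholders, not results of this study, and the function names are chosen for illustration only.

```python
import math


def bic(log_likelihood, n_params, n_samples):
    """Eq. (8): BIC = u * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_samples) - 2.0 * log_likelihood


def select_model(candidates, n_samples):
    """Pick the candidate with the lowest BIC.

    `candidates` maps a model name to (maximized log-likelihood, number of
    estimated parameters).
    """
    scores = {name: bic(ll, u, n_samples) for name, (ll, u) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical values for one station with 6000 daily samples
best, scores = select_model(
    {"PLWN": (-9000.0, 30), "AR1PLWN": (-8990.0, 32)},
    n_samples=6000,
)
```

In this made-up example, the two extra parameters of the AR1PLWN model cost about 17.4 in the penalty term, which is outweighed by the gain of 20 in \(2\ln (L)\), so AR1PLWN is selected.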

2.3 Statistical analysis

2.3.1 Assumption of normal distributions

In the time domain, the stochastic models considered here are explained by linear filtering operations on independent and identically distributed driving noise (e.g., Bos et al. 2008). After accounting for colored noise in the observation covariance matrix to estimate realistic parameters and uncertainties (Eqs. 2 and 3), for further sensitivity analysis we assume a normal distribution for the noise term in Eq. (1), \({\textbf{z}} \sim {\mathcal {N}} (0, \mathbf {C_{zz}})\), and consequently \(\mathbf {C_{yy}} \equiv \mathbf {C_{zz}}\). So, despite time correlation, the observation noise in GNSS station position time series is considered to be normally distributed. Consequently, using the linear least squares expressions of Eqs. (2) and (3), the resulting set of parameters \(\hat{{\textbf{x}}}\) follows a normal distribution \(\hat{{\textbf{x}}} \sim {\mathcal {N}}({\textbf{x}},\, {\textbf{C}}_{{\hat{x}}{\hat{x}} } )\), and so a commonly used likelihood ratio test statistic (“signal-to-noise”), i.e., relating the estimated mean and standard deviation, follows a standard normal distribution. However, this assumes the coefficients of the covariance matrix of the observations \(\mathbf {C_{yy}}\) to be known. For this study, the stochastic model is reflected in a combination of at least two covariance matrices (Eqs. 5–7), for which the parameters are estimated along with the parameters of the functional model. In the case of estimating a single variance component under low degrees of freedom, a t-distributed test statistic is appropriate in order to test the significance of an estimated mean. As discussed for multiple stochastic model variance component estimation by Amiri-Simkooei (2007) and Teunissen (2004), a test statistic for deriving hypothesis tests for the functional and stochastic models generally follows complicated distributions (under the assumption of normally distributed observation noise) and cannot be expressed analytically. However, for large degrees of freedom \(n-u\), the central limit theorem holds, and the test statistics can be approximated by standard normal distributions. Under these assumptions, Amiri-Simkooei et al. (2019) also derived a framework for the significance testing of offsets detected from GPS time series data, where multiple parameters of both the functional and the stochastic model were estimated. In the following, these assumptions also serve as a basis for the use of a standard normally distributed test statistic, on which our sensitivity analysis relies.

To further justify the normal distribution assumptions, 1000 time series paths with 1826 samples (5 years) each, consisting of flicker noise + white noise, were realized using a Monte Carlo simulation. We used 5 years of data, since this is a reasonably short GPS time series, with about 1800 degrees of freedom. The noise fractions were set to 0.7 for the flicker noise and 0.3 for the white noise, with normally distributed white driving noise of an assumed standard deviation of 1 mm as input (Bos et al. 2021). For each path, we estimated the parameters of a functional model consisting of an intercept, linear velocity, 2 sinusoids (annual and semiannual), and 4 equally distributed offsets, along with the variance factors. The histogram for the estimated velocities is shown in Fig. S2, together with the sample standard deviation as well as the formal error (1 sigma) resulting from the adjustment. The estimated parameters clearly follow a normal distribution, and the sample standard deviation agrees with the standard deviation obtained by least squares \(\sigma _{{\hat{x}}_i}\).

2.3.2 Hypothesis test and sensitivity

In order to define a proper statistic for the sensitivity analysis, we characterize any estimable displacement parameter \({\hat{x}}_i\) of \(\hat{{\textbf{x}}}\) in the trajectory model of Eq. (1) by its univariate normal distribution, given by \({\hat{x}}_i \sim {\mathcal {N}}(x_i,\sigma ^{2}_{{\hat{x}}_i})\). Thus, by standardizing, i.e., relating the parameter to its standard deviation, \({\hat{x}}_i\) can be tested for significance. For testing whether a single estimated parameter \({\hat{x}}_i\) is significant, we set the constraint to 0 (Amiri-Simkooei 2007). The null and alternative hypotheses for such a univariate test are

$$\begin{aligned} H_0:&~~~~~ \text {E}({\hat{x}}_i) = 0 \end{aligned}$$
(9)
$$\begin{aligned} H_A:&~~~~~ \text {E}({\hat{x}}_i) \ne 0 \end{aligned}$$
(10)

The maximum likelihood ratio-based test quantity can then be formed by

$$\begin{aligned} T_v = {\hat{x}}_i /{\sigma _{{\hat{x}}_i}} \end{aligned}$$
(11)

Fig. 2 The relation of a null hypothesis and an alternative hypothesis of the statistical test. The non-centrality parameter \(\delta \) depends on the chosen level of significance \(\alpha \) and \(\beta \) (type I and type II errors)

This is a parameter significance test (v-test, Amiri-Simkooei 2007). In Eq. (11), \(\sigma _{{\hat{x}}_i}\) is the standard deviation of the \(i\)-th parameter, computed as the square root of the \(i\)-th diagonal entry of \({\textbf{C}}_{{\hat{x}}{\hat{x}}}\). Under \(H_0\), the test quantity follows a standard normal distribution, \(T_v \sim {\mathcal {N}} (0,1)\). Generally, the null hypothesis can be rejected if the following test criterion is true: \(|T_v |> {\mathcal {N}}_{\alpha / 2} (0,1)\), where \({\mathcal {N}}_{\alpha / 2} (0,1)\) denotes the critical value of the standard normal distribution for a two-sided test. The level of significance \(\alpha \) reflects the probability of a false positive error (type I error). Under a concrete alternative hypothesis \(H_A\), \(T_v\) is distributed as \({\mathcal {N}} (\delta ,1)\) with a non-centrality parameter \(\delta \) (e.g., Teunissen and Kleusberg 1998; Koch 1999). \(\delta \) is a measure of the standardized distance from the null to the alternative hypothesis (Fig. 2). Besides \(\alpha \), the sensitivity depends on the level of significance \(\beta \), which reflects the probability of a type II error (false negative). Given the null and alternative hypotheses, the relationship of \(\alpha \), \(\beta \), and \(\delta \) is shown in Fig. 2.

For a test of a specific estimated parameter \({\hat{x}}_i\), \(T_v\) would be an estimate of \(\delta \). However, in practice, \(\delta \) is mostly unknown, as a concrete alternative hypothesis cannot be specified. (This includes most geodetic and geophysical applications.) Instead, in Eq. (11), we can replace the test quantity \(T_v\) with \(\delta \), and as a consequence, we have to replace the parameter estimate by its expected value, \(E({\hat{x}}_i) = x_i\), which we will refer to as the sensitivity of the parameter, \(x^\textrm{min}_i\), in the following. This yields

$$\begin{aligned} \delta (\alpha ,\beta ) = \frac{x^\textrm{min}_i}{\sigma _{{\hat{x}}_i}}. \end{aligned}$$
(12)

Fig. 3 Root mean square (RMS) amplitudes of the individual loading corrections (after removal of seasonal signals), as well as their sum (A: NTAL, O: NTOL, H: HYDL, AOH: sum)

Thus, by specifying values for \(\alpha \) and \(\beta \) (false positive and false negative rates), the corresponding non-centrality parameter can be computed by inverting the cumulative distribution function of the standard normal distribution at \(1 - \alpha /2\) (two-sided test) and at \(1 - \beta \), followed by summing the values obtained at these quantiles (often called z-scores). Finally, a numerical value for \(x^\textrm{min}_i\) for each parameter can be computed from its standard deviation \(\sigma _{{\hat{x}}_i}\), which is obtained from the parameter estimation process, by

$$\begin{aligned} x^\textrm{min}_i = \delta (\alpha ,\beta ) \sigma _{{\hat{x}}_i} \end{aligned}$$
(13)

\(x^\textrm{min}_i\) in Eq. (13) refers to the sensitivity of the i-th parameter. It can be obtained for any quantity of the parameter vector (i.e., the linear trend, the amplitudes of the sinusoids, offsets and intercept/bias terms). For a given functional model, \(x^\textrm{min}_i\) depends on the chosen levels of significance, as well as on the standard deviation of the parameter, obtained by \(\sigma _{{\hat{x}}_i} = \sqrt{{\textbf{C}}_{{\hat{x}}_i{\hat{x}}_i}}\) of Eq. (3).
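The test of Eq. (11) and the sensitivity of Eqs. (12) and (13) can be sketched in a few lines using the inverse cumulative distribution function of the standard normal distribution. The sketch assumes a two-sided test, i.e., the quantile for \(\alpha \) is taken at \(1 - \alpha /2\), which is consistent with the test criterion \(|T_v |> {\mathcal {N}}_{\alpha / 2} (0,1)\); the function names and the example uncertainty of 0.1 mm/yr are hypothetical.

```python
from statistics import NormalDist


def noncentrality(alpha, beta):
    """delta(alpha, beta): sum of the z-scores at 1 - alpha/2 and 1 - beta."""
    z = NormalDist()  # standard normal distribution
    return z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(1.0 - beta)


def sensitivity(sigma, alpha=0.01, beta=0.20):
    """Eq. (13): x_min = delta(alpha, beta) * sigma."""
    return noncentrality(alpha, beta) * sigma


def is_significant(estimate, sigma, alpha=0.01):
    """Two-sided v-test of Eq. (11): |estimate / sigma| > critical value."""
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return abs(estimate / sigma) > critical


delta = noncentrality(0.01, 0.20)   # approximately 3.42
x_min = sensitivity(0.1)            # hypothetical sigma of 0.1 mm/yr
```

With \(\alpha \) = 1% and \(\beta \) = 20%, the sketch gives \(\delta \approx 3.42\), so a hypothetical velocity uncertainty of 0.1 mm/yr corresponds to a trend sensitivity of about 0.34 mm/yr.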

In Sect. 2 of the supplementary material, we also discuss the underlying framework for the sensitivity of the seasonal signal amplitudes, which are also approximated by standard normal distributions.

3 Results

Starting with the uncertainties from the parametric model from the linear least squares analysis (Eq. 1), we analyze the sensitivity of the estimated vertical velocity component in the presence of other fit parameters, including multiple offsets and periodic terms, taking into consideration spatiotemporal stochastic errors as represented in the covariance matrix of the daily observations (Eq. 4). For the sensitivity analysis, the type I (false positive) and type II (false negative) error rates are set to \(\alpha \) = 1% (false alarm rate) and \(\beta \) = 20% (missed detection rate), respectively. These are common values in geodetic testing theory, e.g., for the detection of land or object deformations (e.g., Heunecke et al. 2015). With a fixed false alarm rate of 1%, the choice of a relatively low \(\beta \) value ensures a reasonable separability of the null and the alternative hypothesis, and results in a high power of a test. The numeric choice of \(\alpha \) and \(\beta \) results in a non-centrality parameter of \(\delta \) = 3.42. Then, parameter sensitivity is computed station-wise by scaling the uncertainties \(\sigma _{{\hat{x}}_i}\) by \(\delta \).

Fig. 4

Results for the trend (velocity) sensitivity under the PLWN stochastic model (top left: uncorrected case, top right: loading-corrected case (AOH), bottom: histogram comparing both cases)

With the goal of obtaining most reliable trend sensitivities, the following three subsections compare the various stochastic models guided by the BIC (Eq. 8):

  1. The uncorrected versus the loading-corrected case, comparing the PLWN and AR1PLWN stochastic models;

  2. The loading versus the CME-corrected case, comparing the PLWN and RWFNWN stochastic models;

  3. Changes for 1. and 2. comparing MLE and RMLE.

Besides station velocity sensitivity, we briefly discuss the sensitivities of the offset and seasonal (annual) parameters. We will explore the rationale behind selecting specific combinations of stochastic models and the application of various correction methods.

3.1 The loading-corrected case

We first examine the root mean square (RMS) displacement amplitudes of the individual loading corrections (A: NTAL, O: NTOL, H: HYDL) from GFZ (Dill and Dobslaw 2013), as well as for the sum (AOH). For this analysis, we removed the 1 cycle-per-year (cpy) and 2 cpy seasonal constant-amplitude contributions from the loading products beforehand, in order to better reveal the non-seasonal signals (Fig. 3). The maps of the annual amplitudes of the loading products can be found in Fig. S4.

The dominance of NTAL in the overall correction budget is evident, especially for the Central and toward the Northern regions of Europe. There, RMS amplitudes can reach 5 mm or greater. For most regions, NTOL and HYDL only have minor contributions to the total loading budget. NTOL shows considerable RMS amplitudes (\(\sim \) 3–4 mm) for the stations at the North Sea (van Dam et al. 2012) (cf. top-right panel in Fig. 3). HYDL is expected to be more significant in the Eastern and Southeastern parts of Europe (Klos et al. 2020b; Springer et al. 2019). However, these HYDL effects are dominated by annual periodic seasonal contributions, which are basically captured by the time series displacement model. This is also true for HYDL trend effects in Central Spain, visible in the RMS amplitudes of Fig. 3 (bottom-left panel). In higher frequency bands, HYDL effects are considered to be small (Klos et al. 2021; Ruttner 2021). We also computed the average PSDs for the consecutive sum of the individual loading contributions (Fig. S5). There, it can be seen that NTAL follows an AR1 process, that NTOL contributes only marginally at the very low and the very high frequencies, and that HYDL contributes strongly in the low-frequency and annual frequency bands.

For the initial sensitivity evaluation, we fit PLWN stochastic models for both the uncorrected and the loading-corrected (“AOH-corrected”) cases and show the sensitivity for the velocity component. For the uncorrected case (Fig. 4, top left), there is a strong North–South dependence in the PL noise parameters (cf. Figs. S6 and S7, and, e.g., Klos et al. 2020a; Bogusz et al. 2019; Klos and Bogusz 2017). This can mostly be attributed to NTAL contributions. For the AOH-corrected case, this pattern is no longer visible (Fig. 4, top right).

The sensitivity can be improved by more than 50% in the median when loadings are corrected (Fig. 4, bottom panel). However, recent studies indicated that the PLWN model overestimates noise amplitudes toward the North (Nistor et al. 2021; Gobron et al. 2021). This bias results from higher NTAL contributions, which cause increased power in the middle frequency ranges of the data (e.g., Klos et al. 2021), resulting in an upward bias of the stochastic model at low frequencies (Gobron et al. 2021). Figure 5 shows average power spectral densities (PSDs) for the uncorrected and the AOH-corrected cases (averaged across all stations), both for the estimated PSDs from the post-fit residuals (dashed lines) and for the model PSDs, as reconstructed from the estimated noise parameters (solid lines). In the following, estimated PSD refers to the periodogram estimate of the post-fit residuals, and model PSD refers to the spectrum reconstructed from the stochastic noise model parameter estimates, which have been averaged across stations beforehand. The solid blue curve in Fig. 5 shows the average model PSD for the uncorrected case and the PLWN model. Compared to the estimated PSD (dashed blue line), the overestimation at low frequencies can clearly be seen. This leads to unrealistically high noise amplitudes and, subsequently, overestimated parameter uncertainties, as shown for the Northern European region (cf. Fig. 4). To account for this overestimation, we follow the recommendation of Amiri-Simkooei et al. (2007) and augment the PLWN model by an AR1 model, denoted as AR1PLWN (Fig. 5, orange lines).
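To make the notion of an estimated PSD concrete, the sketch below computes a one-sided periodogram of a residual series with NumPy. The normalization (power per cycle per year, integrating to the mean square of the series), the function name, and the synthetic white-noise input are assumptions for illustration only; the original processing may use tapering, gap handling, or a different estimator.

```python
import numpy as np


def periodogram_psd(residuals, dt_years=1.0 / 365.25):
    """One-sided periodogram in (unit of residuals)^2 per cycle-per-year."""
    x = np.asarray(residuals, dtype=float)
    n = x.size
    power = np.abs(np.fft.rfft(x)) ** 2 * dt_years / n
    # Fold negative frequencies onto positive ones (all bins except DC) ...
    power[1:] *= 2.0
    # ... but the Nyquist bin, present only for even n, has no mirror image.
    if n % 2 == 0:
        power[-1] /= 2.0
    freqs = np.fft.rfftfreq(n, d=dt_years)  # cycles per year
    return freqs, power


# Synthetic daily residuals (hypothetical): white noise with 2 mm scatter.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, 3650)
freqs, psd = periodogram_psd(x)
df = freqs[1] - freqs[0]
print(f"integrated PSD: {np.sum(psd) * df:.3f} mm^2")
print(f"mean square:    {np.mean(x ** 2):.3f} mm^2")
```

Averaging such periodograms over stations of equal length and sampling would give a station-averaged estimated PSD comparable to the dashed curves described above.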

Fig. 5

Results for the average power spectral densities (PSDs) of the GNSS noise, computed for all stations, comparing the performance of the PLWN and the AR1PLWN stochastic models for the uncorrected and the loading-corrected (AOH-corrected) cases. The solid lines represent the reconstructed modeled PSD (from the average amplitude of the noise coefficients), and the dashed lines represent the average periodogram PSD estimate

Fig. 6

Results for the PLWN and the AR1PLWN stochastic models (left panel: uncorrected case, right panel: AOH-corrected case)

When comparing the estimated PSD with the model PSD, we observe an improved fit with the AR1PLWN model. There is no overestimation at low frequencies, and the increased power in the frequency range from about 2 to 100 cycles per year (cpy) is reproduced. Finally, the green curves show the PSDs for the loading-corrected (AOH-corrected) case. Here, on average, a reasonable fit across all frequency bands is observed. Because the AR1PLWN model is found to be much more reliable than the PLWN model for the uncorrected case, its results serve as the baseline in the following investigations of trend (velocity) sensitivity, for the comparison against the loading-corrected as well as the CME-corrected cases. In addition to Fig. 5, Fig. S8 compares and discusses these findings in terms of the AR1 coefficient for the uncorrected and the AOH-corrected cases. The distribution and magnitude of AR1 coefficients show that NTAL is well captured by the AR1PLWN model.
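A model PSD of the kind compared here can be sketched by summing the textbook one-sided spectra of an AR1 process, fractionally integrated power-law noise, and white noise. This is an illustration under stated assumptions rather than the implementation used for the figures: the three components are taken to be independent, the power-law amplitude is scaled as \(\sigma ^2 = \sigma _\textrm{PL}^2 \Delta T^{-\kappa /2}\), and all parameter values below are hypothetical.

```python
import math


def psd_white(f, sigma, fs):
    """One-sided PSD of white noise with standard deviation sigma."""
    return 2.0 * sigma ** 2 / fs


def psd_power_law(f, sigma_pl, kappa, fs):
    """One-sided PSD of fractionally integrated power-law noise (f > 0)."""
    # Assumed amplitude scaling with the sampling interval dt = 1 / fs.
    sigma2 = sigma_pl ** 2 * (1.0 / fs) ** (-kappa / 2.0)
    return 2.0 * sigma2 / fs * (2.0 * math.sin(math.pi * f / fs)) ** kappa


def psd_ar1(f, sigma, phi, fs):
    """One-sided PSD of an AR1 process with coefficient phi."""
    denom = 1.0 - 2.0 * phi * math.cos(2.0 * math.pi * f / fs) + phi ** 2
    return 2.0 * sigma ** 2 / fs / denom


def psd_ar1plwn(f, params, fs=365.25):
    """Sum of the three components (assumed independent)."""
    return (psd_ar1(f, params["sigma_ar"], params["phi"], fs)
            + psd_power_law(f, params["sigma_pl"], params["kappa"], fs)
            + psd_white(f, params["sigma_wn"], fs))


# Hypothetical noise parameters for daily data (fs = 365.25 samples/year).
params = {"sigma_ar": 2.0, "phi": 0.6,
          "sigma_pl": 7.5, "kappa": -0.93, "sigma_wn": 1.0}
for f in (0.1, 1.0, 10.0, 100.0):
    print(f"{f:6.1f} cpy: {psd_ar1plwn(f, params):.4f} mm^2/cpy")
```

In this form, the AR1 term adds power at intermediate frequencies and flattens toward low frequencies, while the power-law term alone controls the low-frequency slope.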

Fig. 7

Results for the trend sensitivity under stochastic model choice shown in Fig. 6 (top left: uncorrected case, top right: loading-corrected case (AOH), bottom: histogram comparing both cases). Median improvement rates in sensitivity are about 14% when comparing the uncorrected and the loading-corrected cases

Next, we analyze the best-fitting models by station, selecting between AR1PLWN and PLWN for both the uncorrected and the loading-corrected cases, using the BIC (Eq. 8). The results are shown in Fig. 6, for the uncorrected case (left) and for the AOH-corrected case (right). In the uncorrected case, the AR1PLWN model is chosen for most stations (about 210 stations in total). As discussed, this is mostly a consequence of the presence of the NTAL loadings; we again observe a very strong correlation pattern with NTAL RMS, depicted in Fig. 3, top-left panel. The spectral characteristics observed here agree well with the findings presented by Klos et al. (2021). For about 35 stations, which are mostly located in Spain, PLWN is the preferred model choice both before and after AOH-correction. This is a result of the very limited NTAL contributions in that area (cf. Fig. 3). As can also be seen from Fig. 5 (green curves), the PLWN model is a reasonable choice after correcting for AOH loadings. This is clear when looking at the right panel of Fig. 6, confirming the findings by Gobron et al. (2021), who argued that the PLWN model should be chosen after loadings have been corrected.
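The station-wise model choice can be sketched as follows, assuming the standard form of the criterion, BIC = k ln(n) − 2 ln(L), with the model of lowest BIC being selected. The log-likelihood values and parameter counts below are hypothetical placeholders; in practice they would come from the (R)MLE fit of each candidate stochastic model.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion (standard form, lower is better)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_model(candidates, n_obs):
    """Return the name of the candidate with the smallest BIC and all scores.

    candidates maps a model name to (log_likelihood, n_params).
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical fit results for one station with 6664 daily observations.
# Only the noise parameters are counted here, since the functional model
# is the same for both candidates (an assumption of this sketch).
candidates = {
    "PLWN": (-9000.0, 3),     # sigma_pl, kappa, sigma_wn
    "AR1PLWN": (-8990.0, 5),  # adds sigma_ar and phi
}
best, scores = select_model(candidates, n_obs=6664)
print(best, {name: round(value, 1) for name, value in scores.items()})
```

With these placeholder numbers, the two extra parameters of AR1PLWN cost about 17.6 BIC units, so the model is preferred only if it raises the log-likelihood by more than about 8.8.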

Figure 7 is the counterpart of Fig. 4 (PLWN-only case) when selecting between the AR1PLWN and PLWN models, for the uncorrected case (left) and the AOH-corrected case (right), together with the corresponding histogram of sensitivity distributions. When comparing the top panels of Fig. 7 with Fig. 4, the strong North–South pattern in sensitivity almost completely disappears (comparing the left panels), and sensitivity has been significantly enhanced for the uncorrected case. This also leads to less improvement in the AOH-corrected case (Fig. 7, right panel). By using the more realistic stochastic modeling for the uncorrected case, the reduction in uncertainty (improvement in sensitivity) achieved by the loading corrections is only about 14% in median, in comparison to the case analyzed in Fig. 4 (about 55% improvement in median). This is much less than reported by Klos et al. (2021) or Gobron et al. (2021), where, comparing the uncorrected against loading-corrected cases (under the PLWN stochastic models), improvement rates of 50% or more have been reported. We conclude that most of the velocity uncertainty difference previously reported was caused by the NTAL-induced PLWN bias in the loading-uncorrected case rather than by the actual NTAL stochastic variations.

Fig. 8

Top panel: relative change in sensitivity from the uncorrected to the AOH-corrected case shown in Fig. 7 (selection between PLWN and AR1PLWN). Middle panel: PSDs for the stations with improvement (blue markers of Fig. 7, middle panel). Bottom panel: PSDs for the stations with a worsening in sensitivity (red markers of Fig. 7, bottom panel)

We further analyze the changes from the uncorrected to the AOH-corrected case (Fig. 7). Figure 8 (top panel) shows the relative change in sensitivity from the uncorrected case (Fig. 7, top-left panel) to the AOH-corrected case (Fig. 7, top-right panel). Applying the loading corrections improves sensitivity for about 2/3 of the stations (blue markers). For these stations, the overall improvement rate is about 24% on average. The stations that experience an apparent decrease in sensitivity are indicated by the red markers. For these stations, the decrease is about 23% on average. The middle and bottom panels of Fig. 8 depict the average PSDs of the residuals for both groups. The middle panel shows the stations with improvements, both for the uncorrected (orange curve) and the AOH-corrected case (green curve). By applying the loading corrections, a clear signal reduction across all frequency bands is observed. However, there is only a small reduction at the very low frequencies, especially for the stations in the Southwest. The AR1PLWN and PLWN models generally explain the observed noise, represented by the estimated PSDs, well. The bottom panel shows less noise reduction: the AOH-corrected PSD curves (green) show the largest reduction in the middle frequency bands, whereas at low frequencies (below 1 cpy) the estimated PSDs show little reduction, if any. The selected PLWN stochastic models give a reasonable fit to the data as well.

We discuss two aspects of the results depicted in Fig. 8: an underestimation of the estimated stochastic models (model PSDs) with respect to the estimated PSDs at the low frequencies (to be analyzed in Sect. 3.3), as well as the performance of low-frequency signal reduction, as seen in the estimated PSDs. For the latter, Fig. 9 shows representative vertical time series for two stations: YEBE, located in Central Spain (top panel), and ALAC, located in the Southeast of Spain (bottom panel). For both stations, we observe nonlinear long-period loading effects (green curves) resulting from HYDL, which predominantly cause the high RMS shown in Fig. 3 (bottom-left panel), together with strong annual periodic contributions. Interestingly, for YEBE, the long-period nonlinearity results in a change of slope from a negative to a positive trend (see corrected observations in blue, and fitted trajectory model in orange). In the residuals (red) we also observe remaining long-term transient signals that are not yet explained by the loading corrections (e.g., ALAC, from 2013 to 2018) and still contribute to the long-period noise. To further investigate the loading effects, we analyze the PSDs shown in Fig. 10. There, two distinctive effects can be seen: for YEBE, the loading corrections reduce the noise floor at the low frequencies (see estimated PSDs, dashed lines). For ALAC, this is not the case, and there is an excess in power at the low frequencies for the AOH-corrected case.

Fig. 9

Vertical GNSS and loading time series for 2 stations located in Spain: the observations (blue, uncorrected and loading-corrected case), the fitted model (orange, based on corrected observations), and the residuals (red, uncorrected and loading-corrected case), as well as the sum of the loadings (green)

Fig. 10

PSDs of the GNSS residuals (red), as well as PSDs of the loadings (green) for the example time series shown in Fig. 9

Examining the residuals of station ALAC, we detected a visible offset in 2018 that was not present in the offset catalog. We introduced an offset parameter and repeated the analysis. Figures S9 and S10 compare time series and PSDs before and after the offset is introduced. Although the fit in the low-frequency spectrum mostly improves (just by adding a single offset), sensitivity is still degraded. Furthermore, no catalog entry was reported for that period. Station VALE (not shown) shows a very similar transient in the range of 2013 to 2018, which further speaks against the presence of an offset. As indicated by Fig. 8 (bottom-right panel), we can deduce that loading-model-predicted displacements may not always contribute to the observed GNSS data, as observed particularly for the Southwest of Europe (Spain). However, even if there are noise reductions in the low-frequency band, the spectral index \(\kappa \) can still become lower than for the uncorrected case, resulting in higher uncertainties. This becomes evident when looking at the relative changes in \(\kappa \) (Fig. S11), from the uncorrected to the AOH-corrected case. These show basically the same pattern as the relative change in sensitivity, depicted in Fig. 8. Figure 11 relates the change in trend sensitivity to the change in \(\kappa \) in a scatter plot. The correlation for the slope fit is approximately 0.9. This means that changes in \(\kappa \) have the most significant impact on the trend uncertainty/sensitivity (rather than noise amplitude). Thus, small changes in \(\kappa \) that might result in under- or overestimation of low-frequency noise amplitudes will have a significant impact on the uncertainty/sensitivity of station velocity.

Additionally, in Fig. S13 we show the PSDs for three more example stations, with the time series depicted in Fig. S12 (VAAS: Southwestern Finland at the Baltic sea, SOFI: Sofia, Bulgaria, and SPT0: Southern Sweden), pointing out pertinent characteristics of stochastic models, in combination with loading corrections.

Fig. 11

Relative change in \(\kappa \) versus relative change in trend sensitivity (uncorrected to AOH-corrected case). The correlation for the slope fit is approximately 0.9

3.2 The CME-corrected case

In the previous section, we applied AOH model corrections to reduce the noise in the daily displacement time series. Here, we investigate an empirical noise reduction approach by considering the common-mode errors following Dong et al. (2006). We performed a principal component analysis (PCA) on the residuals of the parametric fit to the time series (Eq. 1), using the BIC to decide on an AR1PLWN or PLWN model (cf. Fig. 7). To deal with missing time series data (approx. 33%), we computed the ensemble mean of the residuals available at each epoch for all stations, to fill in the missing epochs. Then we performed the PCA (consisting of standardization, covariance matrix computation, and eigenvalue decomposition) and reconstructed the data by multiplying the matrix with the transposed matrix of the first \(n_\textrm{comp}\) eigenvectors up to the third component. At each step we filled the original gaps with reconstructed data from the PCA. Finally, the reconstructed observations were corrected with \(n_\textrm{comp}\) principal components to form the CME signal. We subtracted the CME, consisting of the principal components up to order 3, from the original uncorrected observations, and repeated the time series analysis to obtain the CME-corrected parameter estimates (“CME3-corrected”). We chose \(n_\textrm{comp}=3\) since it accounted for the dominant regional effects of our data set, without overfitting. He et al. (2015) recommended a maximum of \(n_\textrm{comp}\) = 4; the first three principal components were also used by Serpelloni et al. (2013).
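The gap-filling and PCA steps described above can be sketched with NumPy as follows. This is a simplified illustration and not the authors' implementation: the function name, the fixed number of iterations, and the synthetic test data are assumptions, and every epoch is assumed to have at least one observing station.

```python
import numpy as np


def cme_from_pca(residuals, n_comp=3, n_iter=10):
    """Estimate a common-mode signal from residuals (epochs x stations).

    Gaps are marked by NaN. Returns (cme, filled): the reconstruction from
    the first n_comp principal components and the gap-filled data.
    """
    r = np.asarray(residuals, dtype=float)
    gaps = np.isnan(r)
    # Initial fill: ensemble mean of the stations available at each epoch.
    filled = np.where(gaps, np.nanmean(r, axis=1, keepdims=True), r)
    for _ in range(n_iter):
        mean, std = filled.mean(axis=0), filled.std(axis=0)
        z = (filled - mean) / std                      # standardization
        eigval, eigvec = np.linalg.eigh(np.cov(z, rowvar=False))
        lead = eigvec[:, np.argsort(eigval)[::-1][:n_comp]]
        cme = (z @ lead) @ lead.T * std + mean         # rank-n_comp signal
        filled = np.where(gaps, cme, r)                # refill gaps only
    return cme, filled


# Synthetic example (hypothetical data): one common signal plus station noise.
rng = np.random.default_rng(42)
n_epochs, n_sta = 400, 10
t = np.arange(n_epochs) / 365.25
common = 3.0 * np.sin(2.0 * np.pi * 3.0 * t)
data = common[:, None] + rng.normal(0.0, 0.5, (n_epochs, n_sta))
mask = rng.random((n_epochs, n_sta)) < 0.2
mask[:, 0] = False            # keep one station complete
gappy = np.where(mask, np.nan, data)

cme, filled = cme_from_pca(gappy, n_comp=1)
corrected = gappy - cme       # CME-corrected residuals, gaps stay NaN
print(f"scatter before: {np.nanstd(gappy):.2f} mm, after: {np.nanstd(corrected):.2f} mm")
```

In this sketch the observed values are never altered during the iteration; only the originally missing epochs are replaced by the current reconstruction.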

Fig. 12

Results for the average power spectral densities (PSDs) of the GNSS noise, computed for all stations, comparing the performance of the PLWN and the AR1PLWN stochastic model of the uncorrected time series to the AOH- and CME-corrected approaches

Figure 12 (cf. Fig. 5) compares the average PSDs of the uncorrected case, the AOH-corrected case, and the CME3-corrected case. The CME time series for the first three principal components and their PSDs are shown in Figs. S14 and S15. We discuss implications of CME, as well as the choice of principal components, in Sect. 3.4.

Fig. 13

Top panels: BIC model choice between PLWN and RWFNWN for the AOH-corrected case (left) and the CME3-corrected case (right). Bottom panel: histogram comparing the trend sensitivity for the AOH-corrected and the CME3-corrected data (under BIC)

For the CME3-corrected case, we see a further reduction of noise across all frequency bands compared to the AOH-corrected time series. This implies that (1) the loading models still do not capture all common-mode signals present in the GNSS data, and/or (2) other significant common-mode signals (e.g., far field loadings, remaining common GNSS errors such as orbit errors, reference frame alignment issues, or residual tropospheric effects) are still present in the data. However, we also observe that at low and high frequencies there are smaller signal reductions than in the middle frequency band when compared to the AOH-corrected residual PSDs. This causes an increase of the spectrum’s curvature (concave upward bend) and makes the noise appear as a combination of random walk, flicker, and white noise (RWFNWN) processes (e.g., Langbein and Svarc 2019; Gobron et al. 2022). Compared to the estimated PSD, it can be seen that the model PSD of the PLWN model (solid purple line) underestimates at low frequencies and overestimates in the middle frequency range. Albeit to a lesser extent, such an effect can also be seen for the AOH-corrected case. To further localize this behavior spatially (i.e., station-wise), we also repeated the BIC analysis, but this time for the AOH-corrected and CME3-corrected data, and with the choice between the PLWN and the RWFNWN model combinations. By introducing the RWFNWN stochastic model, the upward bending toward low frequencies, visible in the AOH-corrected and CME3-corrected spectra, appears to be better captured. As a comparison, Langbein and Svarc (2019) evaluated monument noise of about 740 Western U.S. GNSS stations and also recommended the use of the RWFNWN stochastic model. The results for the AOH-corrected stochastic model choice are depicted in Fig. 13, top-left panel. For 21 stations, the RWFNWN model is preferred over the PLWN model (223 stations).
This choice does not significantly change median sensitivity for the AOH-corrected case (compare 0.51 mm/year in the histogram of Fig. 7 to 0.49 mm/year in Fig. 13); however, the RWFNWN contributions yield higher uncertainties for a small set of stations, skewing the histogram somewhat more to the right. This results from an increase in uncertainty as a consequence of the RW noise contribution.

For the AOH-corrected case, Fig. 14 (top panel) compares the stochastic model PSDs with the estimated PSDs, for both the PLWN (green curves) and the RWFNWN cases (purple curves). For the blue-marked stations shown in the top-left panel of Fig. 13, the PLWN model gives a very good fit. For the RWFNWN stations, low-frequency portions are somewhat underestimated (mostly in magnitude). Surprisingly, for the CME3-corrected case, BIC favors the PLWN model (Fig. 13, top-right panel).

Fig. 14

Resulting PSDs for the BIC model choice between the PLWN and RWFNWN model for the AOH-corrected (top panel) and the CME3-corrected cases (bottom panel)

The combined effect of noise reduction and PLWN model choice yields a median sensitivity of 0.37 mm/year for the CME3-corrected case. The PSDs shown in the bottom panel of Fig. 14 show adequately fitting PLWN models. However, it can also be seen that PLWN underestimates the low-frequency portions of the noise. When comparing PLWN and RWFNWN, we observe that the RWFNWN model, similar to the AOH-corrected case, underestimates low-frequency noise. However, as for the AOH-corrected case, the RWFNWN model was chosen for 16 stations only, so there is a limited significance when interpreting these spectra. Nevertheless, although some underestimation at the low frequencies is still present, both for the AOH- and CME3-corrected case the BIC decision between PLWN and RWFNWN models yields reasonable results, with a median sensitivity of 0.37 mm/year and 0.49 mm/year for the CME3- and the AOH-corrected case, respectively. In view of the noise underestimation of PLWN for the CME3-corrected case, the sensitivities might still be somewhat too optimistic. The issue of stochastic model underestimation is further analyzed in Sect. 3.3.

3.3 The effect of offsets and restricted maximum likelihood estimation (RMLE)

Williams (2003b) demonstrated that offsets that are not accounted for in the functional model bias both the other trajectory parameters (such as station velocity) and the stochastic parameters (spectral index and magnitude), resulting in a RW-type noise spectrum. On the other hand, if offsets are introduced and warranted, their presence in the functional model removes variance at the low-frequency parts of the noise spectrum, and thus tends to promote GGM+WN-type spectral characteristics (showing a flattening at low frequencies), instead of PLWN or PLWN+RW (Santamaría-Gómez and Ray 2021). In this case, the velocity uncertainties are potentially underestimated. Furthermore, the absorption of variance by the estimated functional model parameters is not accounted for by conventional MLE: the log-likelihood is a function of the estimated residuals \(\hat{{\textbf{z}}}\), and tends to promote the stochastic model under the influence of offsets (and other estimated parameters of the functional model), rather than the true one. This also results in a preference for GGM+WN instead of PLWN(+RW) noise behavior, and the effects become more striking the more offsets are present in the functional model (Gobron et al. 2022). Thus, we evaluate restricted maximum likelihood estimation (RMLE) (Patterson and Thompson 1971; Koch 1986) for GNSS trajectory model parameter estimation. RMLE accounts for the underestimation of stochastic model parameters and station velocity uncertainty, which is especially significant when offsets are present. Based on a linear transformation of the observations, an expression for the misclosure is introduced in the log-likelihood function, which accounts for the loss of variance introduced by conventional MLE, and thus tends to be closer to the true stochastic model.
RMLE more fully accounts for the correlations between the functional model and the low-frequency noise, thus resulting in a more realistic estimate of parameter uncertainties (Gobron et al. 2022). In our European data sets, we first set a total of 707 offsets based on the cataloged offsets and visual inspection of the time series. After applying the significance test mentioned in Sect. 2.3, the number of warranted offsets was reduced to 344. With 4452 years of displacement time series, this corresponds to 1 offset every 8 years. Using the AOH-corrected data, about 30% more offsets were detected to be significant than by using the uncorrected data. This clearly shows the benefit for offset detection, achieved by removing correlated noise.

Fig. 15

Results for the trend sensitivity of the uncorrected and the AOH-corrected case (under RMLE). Top panel: histogram comparing the results. Bottom panel: average PSDs for the stations with improvement in sensitivity. The semi-transparent line indicates the model PSDs obtained with MLE

For the station velocity sensitivity examples shown in the previous sections, we observe a certain degree of underfitting of the stochastic models at low frequencies, especially for the case of the CME3-corrected data, and, to a lesser extent, for the AOH-corrected case. Here we redo the BIC comparison of the uncorrected and AOH-corrected cases (Figs. 6, 7, and 8) using RMLE.

Figure 15 shows the resulting histogram (top panel). In comparison to the non-RMLE case (Fig. 7), we see a change in median sensitivity from 0.59 to 0.75 mm/year for the uncorrected case, and from 0.51 to 0.59 mm/year for the AOH-corrected case. Furthermore, when comparing the corresponding histograms, a stronger tail of the RMLE case sensitivities can be seen. The bottom panel shows the average PSDs for the stations that experience an improvement in sensitivity (2/3 of the stations in total). As a consequence of a significant drop in noise performance for a handful of stations, we computed the model PSD spectrum (orange) for the uncorrected case using the median of the noise parameters (also visible in the form of a tail in the histogram of Fig. 15). Unlike conventional MLE, RMLE considers the loss of variance at the low frequencies (caused by the offsets) in the stochastic model and avoids underestimation of the parameter uncertainties due to long-period noise. As a consequence, we obtain more realistic sensitivities, with a change of 21% in median for the uncorrected case, and about 14% for the AOH-corrected case.

Fig. 16

Top panels: model choice between PLWN and RWFNWN for the AOH-corrected case (left) and the CME3-corrected case (right), for the case of RMLE. Bottom panel: histogram comparing the trend sensitivity for the AOH-corrected and the CME3-corrected data, for the case of RMLE (under BIC)

Fig. 17

Resulting PSDs for the RMLE case: BIC model choice between the PLWN and RWFNWN model for both the AOH-corrected case (top panel) and the CME3-corrected case (bottom panel). The semi-transparent line indicates the model PSDs obtained with MLE

We also perform the RMLE analysis for the AOH- and CME3-corrected cases (Figs. 13 and 14). Again, the BIC was applied to choose between the PLWN and RWFNWN models (based on a pre-analysis of the average PSDs). The results for the station choice are depicted in Fig. 16: the top-left panel shows the results for the AOH-corrected dataset, and the top-right panel shows the results for the CME3-corrected dataset. Significantly more stations are now assigned the RWFNWN model than in the non-RMLE case. The histogram in the bottom panel shows the comparison of trend sensitivities. Under RMLE, the AOH-corrected case shows a median sensitivity of 0.62 mm/year (0.49 mm/year for the MLE case, sensitivity decrease of about 27% in median), compared to 0.48 mm/year for the CME3-corrected case (0.37 mm/year for the MLE case, decrease by about 30%). Again, the chosen stochastic models are validated by comparing against the observed noise PSDs, shown in Fig. 17. For both the AOH-corrected and the CME3-corrected cases, and for both PLWN and RWFNWN, we can see a clear improvement of the fit at the lower frequencies by using RMLE (the MLE case is shown by the semi-transparent curves). On the one hand, this is a result of the bias reduction from RMLE; on the other hand, more RWFNWN stations are available, giving more reliable results, on average.

Table 1 Summary of the median station velocity sensitivities (in mm/year) derived for the different BIC-guided stochastic model combinations, for the uncorrected, the loading-corrected, and the CME-corrected case
Table 2 Summary of the median station velocity uncertainties (1-sigma, in mm/year)

3.4 Discussion

Table 1 consolidates trend sensitivity medians derived for the different stochastic model combinations (rows), for the uncorrected, the loading-corrected, and the CME-corrected cases. It also summarizes the relative changes. It can be seen that the improvement in trend sensitivity from the uncorrected to the loading-corrected case is about 14%, and the improvement from the AOH to the CME-corrected case is about 24%. The latter may be a bit too optimistic, since we still observe some underestimation at long periods (cf. Fig. 14). Generally, we notice a shift toward decreasing trend sensitivities by switching from MLE to RMLE (about 21% to 30%). The reason is that, contrary to conventional MLE, RMLE better accounts for the offset-induced loss in long-period noise variance, which impacts station velocity uncertainty. Under the presence of offsets, RMLE gives a better fit at the lower frequencies, which is crucial for obtaining reliable velocity uncertainty estimates (Santamaría-Gómez and Ray 2021; Gobron et al. 2022). Compared to Gobron et al. (2021), we also add an HYDL correction, which tends to reduce on average the long-period noise (cf. Fig. S5).
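The two improvement rates quoted for the MLE case follow directly from the median sensitivities reported in Sects. 3.1 and 3.2. The short sketch below recomputes them; the helper name and the convention of expressing the change relative to the less-corrected case are assumptions of this illustration.

```python
def relative_improvement(reference, corrected):
    """Percentage reduction of the median sensitivity relative to reference."""
    return 100.0 * (reference - corrected) / reference


# Median trend sensitivities in mm/year under conventional MLE (from the text).
uncorrected_ar1plwn = 0.59   # uncorrected, AR1PLWN/PLWN selected by BIC
aoh_ar1plwn = 0.51           # AOH-corrected, AR1PLWN/PLWN selected by BIC
aoh_rwfnwn = 0.49            # AOH-corrected, PLWN/RWFNWN selected by BIC
cme3_rwfnwn = 0.37           # CME3-corrected, PLWN/RWFNWN selected by BIC

loading_gain = relative_improvement(uncorrected_ar1plwn, aoh_ar1plwn)
cme_gain = relative_improvement(aoh_rwfnwn, cme3_rwfnwn)
print(f"uncorrected -> AOH:  {loading_gain:.0f}%")
print(f"AOH -> CME3:         {cme_gain:.0f}%")
```

Because percentage changes depend on which value is taken as the reference, the baseline should be stated whenever such rates are compared across tables.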

Table 2 additionally shows 1-sigma uncertainties for the station velocities. Rates of change among the different model evaluations are the same as shown in Table 1. The difference between these and the velocity sensitivity is the scaling by the non-centrality parameter (3.42 in this case).

For the AOH-corrected case (under RMLE, AR1PLWN/PLWN), we obtain median values of spectral index \(\kappa = -\,0.93\) and \(\sigma _\textrm{PL}\) = 7.5 mm/year\(^{-\kappa /4}\). So, the spectral indices are close to flicker noise (\(\kappa \) = − 1). We use Eq. (32) of Bos et al. (2008) to obtain an analytical expression for trend uncertainty, given the sampling interval \(\Delta T\) (in years, 1/365.25 for our case) and the number of observations N (6664 samples for the average time series length of 18.2 years). Given these values, we obtain a trend uncertainty (1 sigma) of 0.15 mm/year, and a trend sensitivity of 0.52 mm/year. This is in agreement with the values obtained in our analysis (Tables 1 and 2). Given the same parameter values, an average time series length of about 27 years of daily observations (10,000 samples) would be necessary to reach a 1-sigma trend uncertainty of 0.1 mm/year (sensitivity of 0.34 mm/year).

Fig. 18

Estimated trends (in mm/year) for the uncorrected and the AOH-corrected case (top-left and top-right panel). The statistically significant trends are shown in the bottom panels

An effect that still needs to be investigated is nonlinear trends in the HYDL models over Spain. A potential reason for this could be long-term trends in the ECCO2 model, resulting from the Boussinesq approximation (Ponte et al. 2007). Scaling the uncertainties in parametric analysis of GNSS displacement time series by the RMS fit is often performed as another way to obtain more realistic estimates of uncertainty and to identify unrepresentative stations. To point toward potential shortcomings of models, we discuss the RMS fits and RMS reductions to the GNSS time series residuals for the uncorrected, AOH-corrected, and CME1- to CME4-corrected (4 principal components) cases (Fig. S16). To summarize, we observe strong spatial correlations in RMS amplitude in the uncorrected case that are reduced by each of the applied corrections. This is reminiscent of spatial patterns in trend sensitivity attributed to NTAL loadings (Fig. 3). Additionally, the CME1 to CME3 corrections indicate shortcomings of GNSS and physical models (such as troposphere, NTAL, and HYDL). It should also be mentioned that, although their dataset was about 10 years shorter, Serpelloni et al. (2013) reported very similar CME patterns.

We also analyzed the magnitude and significance of the estimated station velocities (linear trends) of Eq. (1), given the most realistic stochastic models (under RMLE), for the uncorrected and the AOH-corrected case (Fig. 18). Generally, linear trends range from about − 25 mm/year (Spain) to 35 mm/year in Scandinavia, where they are caused by Glacial Isostatic Adjustment (GIA). Comparing the top-left and top-right panels, we observe a change in the sign of the trend (from negative to positive) for many stations in Southwest Europe after loading corrections were introduced. This is mostly related to the nonlinear effects in the HYDL corrections. The bottom panels show the significant velocities (\(\alpha \) = 1%), computed by means of Eq. (11). While trends in Scandinavia are significant in both cases, we again see differences for Southwest Europe: after applying loading corrections, many station velocities become insignificant (statistically zero). This is attributable to increased long-period noise (cf. Fig. S11), which, in combination with the stochastic model choice (RWFNWN over PLWN) (cf. Fig. 16), leads to higher trend uncertainties. Other than that, we observe that the loading corrections have a positive effect on the detection of trends over Central and Eastern Europe. The smallest trend magnitude detected was 0.51 mm/year. Overall, significant trends are reported for 98 stations in the uncorrected case and for 107 stations in the loading-corrected case, an improvement of only about 9%. After applying the CME3 correction, 133 significant trends could be revealed (improvement of 35%) (Fig. S17).
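As a rough illustration of the significance screening and the station counts above, the sketch below applies a two-sided normal test at \(\alpha \) = 1% to a trend and its 1-sigma uncertainty, and recomputes the relative gains in the number of significant trends. The form of the test is an assumption of this sketch (Eq. (11) is not reproduced here), and the example trend/uncertainty pairs are hypothetical.

```python
from statistics import NormalDist

def is_significant(trend, sigma, alpha=0.01):
    """Two-sided normal test: is the trend distinguishable from zero?"""
    critical = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 2.576
    return abs(trend) / sigma > critical

def relative_gain(n_new, n_ref):
    """Relative increase in the number of detections, in percent."""
    return 100.0 * (n_new - n_ref) / n_ref

# Hypothetical (trend, sigma) pairs in mm/year, for illustration only.
print(is_significant(0.51, 0.15))   # ratio 3.4 exceeds the critical value
print(is_significant(0.30, 0.15))   # ratio 2.0 is below the critical value

# Station counts reported in the text (98, 107, 133).
print(f"{relative_gain(107, 98):.1f}")   # AOH-corrected vs. uncorrected
print(f"{relative_gain(133, 98):.1f}")   # CME3-corrected vs. uncorrected
```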

Fig. 19
Histogram for the sensitivities of the annual amplitudes and the offsets shown in Fig. S19

3.5 Beyond linear trends: sensitivity of other parameters

Besides velocity sensitivity, we also analyzed the sensitivity of other key parameters of the trajectory model, including the amplitude of the annual periodic signal and offsets (Eq. 1), with the same hypothesis test values for false positives and false negatives (Fig. 2). We also used BIC to choose between PLWN and RWFNWN models with RMLE (Fig. S18). When there were multiple offsets at a station, we only computed the sensitivity for the first one. We observe a 20–30% improvement in median sensitivities and significance levels for the annual amplitudes (from 1.11 to 0.86 mm) and offsets (from 5.46 to 4.33 mm) (Fig. 19). Furthermore, we observe similar results for the semi-annual and intercept terms (Eq. 1). The detectable amplitudes of GNSS-derived seasonal signals are of interest for climatological (sea level), hydrological and seismological studies (e.g., Nahmani et al. 2012; Argus et al. 2014; Amos et al. 2014). Chanard et al. (2020) also emphasized the over-estimation of such signals by GNSS. Avoiding over-application of (unwarranted) offsets in the trajectory model (Eq. 1) is an important consideration for achieving the highest and most realistic sensitivity for the velocity parameters (Santamaría-Gómez and Ray 2021). For our European dataset, by estimating only the significant offsets (344) rather than our full catalog of 707 potential offsets, the median sensitivity was improved by 10–15%, with the better performing stations achieving larger improvements (Fig. S19). Detection and application of offsets are also important for studies of their effects on functional and stochastic models (Williams 2003b; Gazeaux et al. 2013).
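The BIC-based choice between noise models can be sketched as follows. The log-likelihood values are placeholders and not results from this study; in practice they would come from the (R)MLE fit of each stochastic model to a station's time series. Counting only the noise parameters in `n_params` is also an assumption of this sketch.

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k * ln(n) - 2 * ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

def select_model(candidates, n_obs):
    """Return the name of the candidate with the lowest BIC, plus all scores.

    candidates maps a model name to (log_likelihood, n_params).
    """
    scores = {name: bic(ll, k, n_obs) for name, (ll, k) in candidates.items()}
    return min(scores, key=scores.get), scores

# Placeholder values for one station (hypothetical, for illustration only).
candidates = {
    "PLWN": (-9120.4, 3),     # power-law amplitude, spectral index, white noise
    "RWFNWN": (-9118.9, 3),   # random-walk, flicker, and white-noise amplitudes
}
best, scores = select_model(candidates, n_obs=6648)
print(best, {name: round(value, 1) for name, value in scores.items()})
```

Because both placeholder models have the same number of parameters, the lower BIC simply follows the higher log-likelihood; the penalty term matters once candidates differ in parameter count, such as AR1PLWN versus PLWN.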

3.6 Conclusions

The geodetic community through the Global Geodetic Observing System (GGOS) has set a goal of estimating vertical land motion (VLM) of Earth’s surface with a precision approaching 0.1 mm/yr with respect to a well-defined global reference frame to study a range of geophysical and hydrological processes (e.g., Plag and Pearlman 2009). This is important to reliably resolve geophysical phenomena, such as sea-level rise (e.g., Wöppelmann and Marcos 2016). In this work, we assess the uncertainty and sensitivity in vertical velocities from a long record (up to 25 years) of daily displacement time series for a representative network of 244 high-precision GNSS stations in Europe. This network presents a best-case scenario in a region with limited non-secular motions and no significant earthquakes. We performed a parametric time series analysis to estimate vertical velocities, periodic terms, and offsets in the presence of colored noise—we investigated the most appropriate stochastic model for the least squares inversion of the GNSS data. We examined two methods to reduce the noise in the displacement time series: (1) correct the observed time series for the effects of non-tidal atmospheric (NTAL), oceanic loading (NTOL), and hydrological loading (HYDL) (sum: AOH) models (Dill and Dobslaw 2013) and (2) perform a principal component analysis of the post-fit residuals (Dong et al. 2006) to correct for common-mode errors up to the first three principal components (CME3). We tested several stochastic models for both the observed and corrected time series (AOH or CME) using the Bayes information criterion (BIC) to select the most appropriate model, using both conventional maximum likelihood estimation (MLE) (Bos et al. 2008; Williams 2008; Bos et al. 2020) and restricted maximum likelihood estimation (RMLE) (Gobron et al. 2022) to reduce low-frequency noise biases. 
Assuming a normal probability distribution for the postfit residuals and hypothesis testing with typical levels of significance of 1% for false positives (Type I errors) and 20% for false negatives (Type II errors) (Teunissen and Kleusberg 1998; Koch 1999), we calculated parameter sensitivities (minimum detectable values). For the observed data (uncorrected), we augmented the commonly used power-law white noise (PLWN) model with an autoregressive model of first order (AR1) (Amiri-Simkooei et al. 2007) to account for the biasing effects of NTAL (Nistor et al. 2021; Klos et al. 2021; Gobron et al. 2021). The AR1PLWN model provides a realistic baseline to compare the effects of the AOH and CME corrections. Under MLE, the median velocity sensitivity for the uncorrected data is 0.59 mm/yr and changes to 0.75 mm/yr for RMLE. Under MLE, median velocity sensitivity is improved to 0.51 mm/yr for AOH corrections compared to 0.37 mm/yr for CME3 corrections (three principal components). However, under RMLE, this sensitivity is degraded to 0.62 mm/yr for AOH and 0.48 mm/yr for CME3. After applying the corrections, we report improvement rates ranging from 14 to 24% with respect to the uncorrected case. Overall, we recommend the use of RMLE, and the assessment of proper stochastic models by station. We find that the preferred models for assigning the most realistic uncertainties and sensitivities are RWFNWN and PLWN. Further, CME corrections (we chose CME3) tend to provide improved uncertainties compared to AOH corrections. The median value for the spectral indices (\(\kappa \) = \(-\) 0.9) is close to flicker noise (\(\kappa \) = \(-\) 1), agreeing with numerous earlier studies. Using an analytical expression for trend uncertainties (1-sigma) (Bos et al. 2008) yields a value of about 0.15–0.17 mm/yr for the European data (average length 18.2 years, 6648 samples), about 3 times smaller than the velocity sensitivity.
Roughly, given the same stochastic models and data corrections, it would take about 27 years of daily displacement time series (on average) to achieve the GGOS goal of 0.1 mm/yr (1-sigma) and 0.34 mm/yr for sensitivity.

Future work should address limitations in the AOH models, which are assumed to be correct. For example, the HYDL model has been studied (e.g., Michel et al. 2021) and comparison with GRACE observations should be further conducted (e.g., Tregoning and Watson 2009; Fu et al. 2012; White et al. 2022). The choice of the optimal functional model is of course critical. Based on the best available models, we also analyzed the significance of the estimated trends and found that, of the 244 analyzed stations, about 100 to 130 linear station trends are significant, with the highest number of detections for the CME3-corrected case. We conclude that, as a consequence of refined stochastic model calibration and the potential discrepancy between GNSS data and loading models, the application of loading corrections yields less improvement in GNSS sensitivity to VLM than reported in earlier studies. We investigated the incorporation of offsets due primarily to station-specific artifacts (rather than coseismic offsets for our data set), the topic of earlier studies (Santamaría-Gómez and Ray 2021; Gobron et al. 2022; Gazeaux et al. 2013). It is important to set only the significant offsets: often too many offsets are set, as indicated by the offset catalog that we adopted from other time series analyses. We reduced the number of offsets by about 50%, resulting in a 10–15% improvement in velocity uncertainty and sensitivity (besides an improvement in the offset detection rate of about 30%). We validated the chosen offsets through statistical tests and by visual inspection. In this regard, machine learning models could be developed to detect offsets, trained on the available 30-year displacement time series that have gone through extensive quality control and artifact validation.