1 Introduction

All realizations of the European Vertical Reference System (EVRS) (Ihde et al. 2002, 2008) computed so far are solely based on data from the Unified European Leveling Network (UELN). The UELN dataset comprises geopotential differences between height benchmarks (HBMs) obtained by spirit leveling and gravimetry. The data are provided by all participating countries to the Federal Agency for Cartography and Geodesy (BKG) which is in charge of realizing the EVRS. As discussed in Afrasteh et al. (2021), the exclusive use of spirit leveling/gravimetry imposes limitations on the coverage of the European Vertical Reference Frame (EVRF). Indeed, spirit leveling cannot be used to cross large water bodies. Consequently, the EVRF does neither cover islands not connected to the European mainland leveling network through bridges or tunnels (e.g., Ireland), nor offshore platforms. In addition, the inability of spirit leveling to cross large water bodies reduces the strength of the leveling network in coastal countries. This, in turn, limits the precision and reliability of the computed realization. Apart from this, it should be noted that spirit leveling is prone to systematic errors (Vaníček et al. 1980). Leveling errors accumulate over long distances and may introduce slopes in the height system realizations (e.g., Wang et al. 2012). As for the countries that participate in the UELN project, systematic errors are known to be present in the Belgian (Slobbe et al. 2019), British (Penna et al. 2013), and IGN69 French leveling networks (Sacher and Liebsch 2019).

One way to overcome the limitations in coverage and accuracy of the EVRF is offered by the so-called gravity field approach (Rülke et al. 2012), which has often been solved as a geodetic boundary value problem (e.g., Rummel and Teunissen 1988; Heck and Rummel 1990; Amos and Featherstone 2008; Gerlach and Rummel 2012; Amjadiparvar et al. 2015; Sánchez and Sideris 2017). Rülke et al. (2012) applied the approach to determine the height reference frame offsets among the European national height systems. In doing so, GNSS heights and different quasi-geoid models were exploited while the EVRF2007 was used as a reference. The offsets were estimated with an accuracy of about 5 cm for most countries. Wu et al. (2018) assessed the potential of using clock networks in height datum unification (Bjerhammar 1985) by simulations using the EUVN (European Unified Vertical Network) (Ihde et al. 2000) as a prior. Although their results are very promising, no clock network is operational yet.

Afrasteh et al. (2021) proposed to overcome the limitations in coverage and accuracy of the EVRF imposed by the exclusive use of spirit leveling/gravimetry by combining the UELN dataset with model-based hydrodynamic leveling data. Model-based hydrodynamic leveling is introduced by Slobbe et al. (2018) as an efficient and flexible alternative method to connect islands and offshore platforms to the height system on land. The method, which is independent from any other method, uses a regional, high-resolution hydrodynamic model to derive mean water level (MWL) differences between tide gauges. By converting the MWL differences to geopotential differences and adding them to the geopotential differences between the HBMs and the MWLs at the tide gauges at both ends of the link, we establish a so-called hydrodynamic leveling connection. Note that the period over which the water level is averaged is arbitrary. Slobbe et al. (2018) showed, however, that higher accuracy can be obtained by choosing a multi-year period and by only averaging over the summer months (the latter avoids the storm surges period). Therefore we use throughout the paper the term summer mean water levels (SMWLs), which refers to the average water level calculated over all May to September months of one or more years.

Assuming the availability of a series of hydrodynamic models each covering a part of the European waters and providing uncorrelated SMWLs at the tide gauges with a uniform variance of 4.5 \(\hbox {cm}^2\), Afrasteh et al. (2021) showed that combining UELN data with model-based hydrodynamic leveling data improves the median of the propagated height standard deviations by \(38\%\) compared to a solution that uses only spirit leveling/gravimetry. A variance of 4.5 \(\hbox {cm}^2\) for the model-derived SMWL corresponds to a standard deviation of 3 cm for each hydrodynamic leveling connection (to establish a connection, we take the difference between two model-derived SMWLs). If the variance is 12.5 \(\hbox {cm}^2\) (corresponding to a standard deviation of 5 cm for each hydrodynamic leveling connection), the reported improvement is still \(29\%\). They also found increased redundancy numbers for leveling observations close to the coastlines. Afrasteh et al. (2023) reassessed the potential impact of hydrodynamic leveling data on the quality of the EVRF using an empirical noise model for the model-derived SMWLs allowing to include error correlations. Their results suggest an improvement up to \(25\%\) if the water levels are averaged over three summer periods.

Both Afrasteh et al. (2021) and Afrasteh et al. (2023) assessed the impact of hydrodynamic leveling data on the quality of the EVRF through geodetic network analyzes, using only (assumed) stochastic information of the model-based hydrodynamic leveling data. This study goes further because it uses real model-based hydrodynamic leveling data between tide gauges in the Baltic Sea and the northwest European continental shelf, including the North Sea and Wadden Sea. The main objective is to demonstrate the impact of adding these data on the EVRF. The required model-derived SMWLs are obtained from the Nemo-Nordic hydrodynamic model (Hordoir et al. 2015, 2019) and the 3D DCSM-FM (Zijl et al. 2020), which cover the Baltic Sea and the northwest European continental shelf, respectively. The water levels are averaged over three successive summer periods that are the same per region. The reason for using three summer periods is that this is the maximum averaging period for which a noise model (needed to build the full noise variance-covariance matrix of the dataset) is available (Afrasteh et al. 2023).

We find it important to mention that the impact is a provisional impact and that we are not presenting a new official realization of the EVRS that replaces the latest release EVRF2019. The first follows from the fact that our study is subject to the following limitations. First, in establishing the hydrodynamic leveling connections no attempt was made to connect the tide gauges to the UELN by means of real levelings. Instead, we computed the potential differences from adjusted heights obtained during national height system realizations. Second, we made no attempt to reduce the potential differences to the reference epoch adopted for the EVRS (i.e., epoch 2000.0). Doing so requires information about the long-term sea level variability and vertical land motion at the locations of the involved tide gauges. For most tide gauges, these data are not available or there is a mismatch between the period for which the data are available and the period used in this study. Third, we took the used hydrodynamic models as is. That is, we made no attempt to improve their performance in representing the local MWL at the tide gauge locations (e.g., by improving the (local) spatial resolution of the hydrodynamic models). Finally, no validation is possible due to a lack of accurate control data. However, for those countries showing a large impact we assessed the agreement of the estimated heights with normal heights obtained by differencing GNSS heights and quasi-geoid heights obtained from the European gravimetric quasi-geoid model EGG2015 (Denker 2015).

The paper is organized as follows. Section 2 describes the methodology used to (i) compute the hydrodynamic leveling data, (ii) define the set of connections added to the adjustment, (iii) conduct the adjustment and variance component estimation, and iv) assess the impact. Section 3 introduces the data sets used in this study. In Sect. 4, we present, analyze, and discuss all results. Finally, we conclude by summarizing the main findings of the paper in Sect. 5.

2 Methodology

2.1 Functional and stochastic model of the hydrodynamic leveling dataset

In a rigorous implementation of model-based hydrodynamic leveling for the realization of the EVRS, the observation equation of the geopotential difference (\(\varDelta {W}\)) between two UELN height benchmarks (HBMs), one connected to tide gauge i and the other to tide gauge j, reads

$$\begin{aligned}&\varDelta {W}^{{\tiny \textrm{HBM}}_j}_{{\tiny \textrm{HBM}}_i} = \varDelta {W}^{{\tiny \textrm{TGZ}}_i}_{{\tiny \textrm{HBM}}_i} + \varDelta {W}^{\overline{{\tiny \textrm{SMWL}}}_i}_{{\tiny \textrm{TGZ}}_i} \nonumber \\&\quad + \varDelta {W}^{\widetilde{{\tiny \textrm{SMWL}}}_j}_{\widetilde{{\tiny \textrm{SMWL}}}_i} + \varDelta {W}^{{\tiny \textrm{TGZ}}_j}_{\overline{{\tiny \textrm{SMWL}}}_j} + \varDelta {W}^{{\tiny \textrm{HBM}}_j}_{{\tiny \textrm{TGZ}}_j}. \end{aligned}$$
(1)

Here, TGZ stands for tide gauge zero, and \(\overline{{\tiny \textrm{SMWL}}}\) and \(\widetilde{{\tiny \textrm{SMWL}}}\) are the observation- and model-derived SMWLs, respectively. Note that it is assumed that all data have been reduced to the reference epoch adopted for the EVRS (epoch 2000.0). Also note that the hydrodynamic models assume a constant gravity acceleration, which we also used to compute \(\varDelta {W}^{\widetilde{{\tiny \textrm{SMWL}}}_j}_{\widetilde{{\tiny \textrm{SMWL}}}_i}\) from the model-derived SMWLs. The adjective ‘rigorous’ refers to an implementation in which the first and last terms in Eq. (1) are determined using spirit leveling and gravimetry. In the context of this project, acquiring these measurements is not realistic and feasible. In addition, none of the available tide gauge records uses TGZ as the vertical reference. The water levels are typically expressed relative to the national height datum (NHD) or chart datum. The metadata to transform the water levels back into water levels with respect to TGZ is missing. Hence, we cannot compute the second and fourth terms in Eq. (1).

Therefore, we establish \(\varDelta {W}^{{\tiny \textrm{HBM}}_j}_{{\tiny \textrm{HBM}}_i}\) using

$$\begin{aligned}{} & {} \varDelta {W}^{{\tiny \textrm{HBM}}_j}_{{\tiny \textrm{HBM}}_i} = \left( \gamma _{{\tiny \textrm{HBM}}_i} {{\widehat{H}}}^{{\tiny \textrm{HBM}}_i}_{{\tiny \textrm{NHD}}_i} - \gamma _{{\tiny \textrm{TG}}_i}{{\widehat{H}}}^{\overline{{\tiny \textrm{SMWL}}}_i}_{{\tiny \textrm{NHD}}_i} \right) \nonumber \\{} & {} \quad + \varDelta {W}^{\widetilde{{\tiny \textrm{SMWL}}}_j}_{\widetilde{{\tiny \textrm{SMWL}}}_i} + \left( \gamma _{{\tiny \textrm{TG}}_j} {{\widehat{H}}}^{\overline{{\tiny \textrm{SMWL}}}_j}_{{\tiny \textrm{NHD}}_j} - \gamma _{{\tiny \textrm{HBM}}_j}{{\widehat{H}}}^{{\tiny \textrm{HBM}}_j}_{{\tiny \textrm{NHD}}_j} \right) , \end{aligned}$$
(2)

where \(\gamma _.\) is the GRS80 normal gravity (Moritz 2000), \({{\widehat{H}}}^{{\tiny \textrm{HBM}}_.}_{{\tiny \textrm{NHD}}_.}\) is the adjusted height of the nearest UELN benchmark with respect to the NHD, and \({{\widehat{H}}}^{\overline{{\tiny \textrm{SMWL}}}_.}_{{\tiny \textrm{NHD}}_.}\) is the height of the observation-derived SMWL expressed with respect to the NHD. Note that in computing the first and last term, we treat all adjusted heights as normal heights. However, Belgium and The Netherlands use an uncorrected leveled height system, Denmark an orthometric height system, and Great Britain an normal-orthometric height system (Federal Agency for Cartography and Geodesy 2022b). The ‘error’ introduced is believed to be insignificant; the coastal areas in Belgium, The Netherlands, and Denmark have minimal topographic variations, whereas according to Filmer et al. (2010) the difference between normal-orthometric and normal heights does not exceed the 2–3 cm level for Australia where the highest mountain is almost 900 m higher than the one in Great Britain. Apart from that, for a small distance between the HBM and tide gauge the error will partly cancel out in differencing \({{\widehat{H}}}^{{\tiny \textrm{HBM}}_.}_{{\tiny \textrm{NHD}}_.}\) and \({{\widehat{H}}}^{\overline{{\tiny \textrm{SMWL}}}_.}_{{\tiny \textrm{NHD}}_.}\) as the error is quite systematic. This also applies for any other systematic error present in the adjusted heights. Furthermore, note that \({{\widehat{H}}}^{\overline{{\tiny \textrm{SMWL}}}_.}_{{\tiny \textrm{NHD}}_.}\) is in fact the sum of three terms:

$$\begin{aligned} {{\widehat{H}}}^{\overline{{\tiny \textrm{SMWL}}}_.}_{{\tiny \textrm{NHD}}_.} = {{\widehat{H}}}^{{\tiny \textrm{TGBM}}_.}_{{\tiny \textrm{NHD}}_.} + {\varDelta {H}}^{{\tiny \textrm{TGZ}}_.}_{{\tiny \textrm{TGBM}}_.} + {H}^{\overline{{\tiny \textrm{SMWL}}}_.}_{{\tiny \textrm{TGZ}}_.}, \end{aligned}$$
(3)

where \({{\widehat{H}}}^{{\tiny \textrm{TGBM}}_.}_{{\tiny \textrm{NHD}}_.}\) is the adjusted height of the tide gauge benchmark (TGBM) with respect to the NHD, \({\varDelta {H}}^{{\tiny \textrm{TGZ}}_.}_{{\tiny \textrm{TGBM}}_.}\) is the leveled height difference between the TGBM and TGZ, and \({H}^{\overline{{\tiny \textrm{SMWL}}}_.}_{{\tiny \textrm{TGZ}}_.}\) the tide gauge derived SMWL expressed relative to TGZ. The TGBM is the HBM to which the TGZ is connected. Most of them are not part of the UELN.

The stochastic model of the hydrodynamic leveling dataset needs to account for the contributions of all three terms in Eq. (2). The contribution of the first and last term is mainly determined by the uncertainty in \({{\widehat{H}}}^{{\tiny \textrm{HBM}}_.}_{{\tiny \textrm{NHD}}_.}\) and \({{\widehat{H}}}^{{\tiny \textrm{TGBM}}_.}_{{\tiny \textrm{NHD}}_.}\) (Eq. (3)). Indeed, supposing the connections between the TGBMs and TGZs are established by spirit leveling with a precision of 0.5 mm/km (corresponding to the precision of first-order leveling (Bossler 1984)) and that the TGBM is typically close (within a few meters to a few kilometers) to the tide gauges, the contribution of \({\varDelta {H}}^{{\tiny \textrm{TGZ}}_.}_{{\tiny \textrm{TGBM}}_.}\) is likely below 1 mm in terms of standard deviation. Similarly, the uncertainty of the MWL computed over one month of sea level observations is already \(< 1\) mm based on a 10-minute sampling and assuming white noise with a standard deviation of 5 cm. As we lack the full variance-covariance matrices of the national height system realizations, we assume that the uncertainty in the first and last term of Eq. (2) is described by the precision of second-order leveling (Bossler 1984), i.e., 1.3 mm/km in terms of standard deviation. The distance between the HBM and TGBM is approximated by the ellipsoidal distance between the nearest UELN benchmark and the tide gauge location.

The full noise variance-covariance matrix associated with the middle term has been obtained using the noise models developed by Afrasteh et al. (2023). For the specific model being used, see Sect. 3.1.

2.2 Defining the set of hydrodynamic leveling connections

A key step in compiling the model-based hydrodynamic leveling dataset is to determine between which tide gauges connections should be established. Assuming N tide gauges are available, \(N-1\) independent hydrodynamic leveling connections (i.e., connections that do not form any closed circuit) can be established. For this, there are at most \(N^{N-2}\) possibilities (Afrasteh et al. 2021). Similar to Afrasteh et al. (2021, 2023), we use a heuristic search method that identifies the connections one-by-one. Each connection added to the final set is the one that provided the lowest median standard deviation of the adjusted heights. No connections are allowed between tide gauges i) located in different hydrodynamic model domains (i.e., one in the 3D DCSM-FM and the other in the Nemo-Nordic model domain), ii) located within the same country, and iii) located in neighboring countries for which the number of spirit leveling connections between the countries is larger than one. The first criterion stems from the fact that there may be a bias between the water levels obtained from the different hydrodynamic models. The last two criteria are intended to reduce the computational load of the search method. Since we are using real data in this study, there is one additional preparatory step that will be described in the remainder of this section.

As noted in Sect. 1, the SMWL differences will be determined over three consecutive summer periods that are the same per region. The latter reduces potential time-dependent, large-scale errors in the modeled water levels (they cancel out in computing the difference) and simplifies the required noise model (Afrasteh et al. 2023). Model-derived water levels and tide gauge records are available from 1997–2019 and 2017–2021 for 3D DCSM-FM and Nemo-Nordic, respectively. Hence, we need to select which three successive years for each of these time spans will be used. In this choice, the following considerations were taken into account:

  • The time difference between the center epoch of the 3-year period and the reference epoch of the EVRS (i.e., 2000.0)—Ideally, there is no time difference. The greater the difference, the greater the impact of relative differences in long-term sea level variations and vertical land motion at the two involved tide gauge locations.

  • Performance of the hydrodynamic model—Afrasteh et al. (2023) showed that for all three hydrodynamic models examined in their study, the performance to represent the SMWLs varies over space and time. As such, it makes sense to select a period in which the model shows a better performance.

  • Distribution of the tide gauges—The tide gauges are not homogeneously distributed along the coastline. In addition, the distribution is time-dependent as in many cases the tide gauge records are not full (some tide gauges expired or became available later, and the records may contain gaps). There is no objective way to define the ‘best’ possible distribution. In this study, we use the maximum sea distance (Afrasteh et al. 2023, Sect. 2.1.4) between two adjacent tide gauges. More specifically, we minimize the weighted sum of the maximum ‘sea distance’ between two adjacent tide gauges per country, where the weighting is determined by the length of the country’s coastline.

The criteria will be applied to the sets of tide gauges available per domain to identify what is referred to as the ‘most preferred’ and ‘least preferred’ time spans. These are subsequently used in Experiments I and II to assess the impact of model-based hydrodynamic leveling on the EVRF. The comparison of both experiments allows to evaluate the importance of the time span selection.

2.3 Height network adjustment and data weighting

The height network adjustment is conducted using weighted least-squares, see Afrasteh et al. (2021) for the equations. Similar to the approach followed in computing EVRF2019 (Sacher and Liebsch 2019), the datum defect is solved by adding the minimal constraint that for 12 datum points the sum of the height changes is zero. The height of the 12 datum points are obtained from the EVRF2019 adjustment. Hence, the datum of the computed EVRF is the same as the one for the EVRF2019.

After determining which hydrodynamic leveling connections will be added, we conduct the height network adjustment and estimate variance factors for the UELN dataset and each model-based hydrodynamic leveling dataset. In doing so, we used the iterative minimum norm quadratic unbiased estimator (Rao 1971). The iteration was terminated when the relative change of successive variance factors for all observation groups was smaller than \(10^{-4}\). Note that each of the observation groups that make up the UELN dataset are scaled with the variance factor estimated by the BKG in computing the EVRF2019 (Sacher and Liebsch 2019, Table 3).

2.4 Impact assessment

The impact of hydrodynamic leveling data on the EVRF is assessed by comparing the estimated geopotential numbers to the ones computed using UELN data only. If the differences for a country show a variability larger than 10 kgal mm, we will assess the agreement of the solution with normal heights obtained by differencing GNSS and quasi-geoid heights. In doing so, we interpolate the differences between the adjusted EVRF heights and the heights expressed relative to the NHD to the GNSS data points. By adding the differences to the physical height of the GNSS data points, we obtain their EVRF heights. These are compared with the differences between the GNSS and quasi-geoid heights. Note that the comparison is conducted in the zero-tide system. As the UELN data were provided in the mean-tide system (Sect. 3.3), we apply the following transformation (Federal Agency for Cartography and Geodesy 2022a):

$$\begin{aligned}{} & {} C_{{\tiny \textrm{zero}}} = C_{{\tiny \textrm{mean}}} - 0.28841\sin ^2(\phi ) \nonumber \\{} & {} \quad - 0.00195\sin ^4(\phi ) + 0.09722 + 0.08432, \end{aligned}$$
(4)

where \(C_{{\tiny \textrm{zero}}}\) and \(C_{{\tiny \textrm{mean}}}\) are the geopotential numbers in respectively the zero- and mean-tide system, and \(\phi \) is the geodetic latitude.

Apart from reporting statistics of the differences, including the median and standard deviation, we also assess the magnitude of trends in the differences in east–west and/or south-north directions. Here, the standard deviation was estimated as \(1.4826\ \times \) the median absolute deviation (Cook and Weisberg 1982; Rousseeuw and Croux 1993). In case we estimate the trend in both directions, the magnitudes are estimated by fitting a plane through the differences. When estimating the south-north slope only, we fit a linear trend and intercept term to the differences as a function of latitude.

Table 1 Number of observations and variance factor for each observation group of the UELN dataset

3 Data

3.1 Model-derived water level time series

In this study, two reanalysis products generated by two different hydrodynamic models have been used; one for the northwest European continental shelf generated with the 3D Dutch Continental Shelf Model – Flexible Mesh (3D DCSM-FM) (Zijl et al. 2020), and one for the Baltic Sea generated with the Nemo-Nordic model (Nemo-Nordic NS01) (Hordoir et al. 2019), which is based on version 3.6 of the Nucleus for European Modelling of the Ocean (NEMO) model code (Madec et al. 2016) and 3DEnVar data assimilation method (Axell, L. and Liu, Y. 2016). Some alternative products covering our area of interest are publicly available via https://data.marine.copernicus.eu/products. Key requirement is that the models used to generate them included all relevant physics contributing to the MWL variability.

The 3D DCSM-FM reanalysis is the same as used by Afrasteh et al. (2023). It is obtained using the Delft3D Flexible Mesh software framework that allows for the use of unstructured grids. For this study, we have used software version 2.17.05.72090. The 3D DCSM-FM is a three-dimensional hydrodynamic model with a grid resolution that varies between 0.5 and 4.0 nautical miles. It covers the area between \(15^\circ \) W to \(13^\circ \) E and \(43^\circ \) N to \(64^\circ \) N. The water level time series are generated for the period 1997–2019. For more details about the model setup and forcing data, we refer to Zijl et al. (2020) and (Afrasteh et al. 2023] [Table 1). Note that the model is actively maintained and improved. One known issue in the version of the model used in this study is an apparent strong vertical circulation between the bottom and the pycnocline in the deep ocean originating from instabilities close to the open boundaries. This results in a less accurate representation of the MWL in deep ocean waters. Detailed investigations by the model developers showed that the issue has no impact on the shelf. A thorough assessment of 3D DCSM-FM’s ability to represent the MWLs computed over one summer period (referred to as the ‘1-SMWLs’) is given by Afrasteh et al. (2023). Their results show that the performance varies over space and time. With regard to the latter, they noticed improved performance in the period 2004–2011, although they had no explanation for it.

The Nemo-Nordic model, developed by the Swedish Meteorological and Hydrological Institutes, is a three-dimensional coupled ocean-sea ice model. The model domain covers both the Baltic and the North Sea (i.e., it ranges from \(4.15278^\circ \) W to \(30.1802^\circ \) E and \(48.4917^\circ \) N to \(65.8914^\circ \) N). In this study, however, we only use the water level time series for the Baltic area. The model is based on the NEMO ocean engine version 3.6. For a detailed description of the model setup and forcing data, as well as a validation of the water level time series, we refer to Jahanmard et al. (2021, 2022). Note that the hourly water levels are exported on grids with a horizontal resolution of 1 nautical mile. The reanalysis period ranges from 2017 to 2021.

The method applied to compute the SMWLs over three successive summer periods, referred to as 3-SMWLs, is described by Afrasteh et al. (2023). For the 3D DCSM-FM dataset, a noise model was computed by Afrasteh et al. (2023), which has been exploited in this study. For the Nemo-Nordic dataset, we used the noise model developed for the Forecasting Ocean Assimilation Model 7 km Atlantic Margin model (AMM7) (Tonani and Ascione 2021). The only motivation we have for using this noise model is that the AMM7 hydrodynamic model, similar to the Nemo-Nordic model, relies on the Nucleus for European Modeling of the Ocean model code (Madec et al. 2016). Unfortunately, we lack the required long-time series of observation- and model-derived water levels to develop an empirical noise model specific to this dataset. The lack of a tailored noise model for the Nemo-Nordic dataset is among the reasons why variance component estimation is conducted.

3.2 Observation-derived water level time series

Tide gauge data were obtained from the national authorities in all coastal countries included in the two hydrodynamic model domains, except Spain and Ireland (3D DCSM-FM), and Lithuania and Russia (Nemo-Nordic). Spanish tide gauge data were not included in our database because we did not expect a good performance due to the model issue mentioned in Sect 3.1. For Ireland, we lack spirit leveling data. Regarding Lithuania and Russia, no data were available. The time span of the records is consistent with the time span of the reanalyses; 1997–2019 (3D DCSM-FM) and 2017–2021 (Nemo-Nordic). For the 3D DCSM-FM, the tide gauge dataset is similar to the one used and described by Afrasteh et al. (2023). It includes about 200 records. Note, however, that most of these records do not cover the entire reanalysis time span. For the Nemo-Nordic domain, about 50 tide gauge records were available. The records were used to compute monthly MWL time series by means of a harmonic analysis using UTide (Codiga 2020).

All water levels use the NHD as the vertical reference. For the tide gauges inside the Nemo-Nordic domain, a correction for the vertical land motion induced by glacial isostatic adjustment has been applied. The correction was computed using the regional land uplift model NKG2016LU_abs (Vestøl et al. 2019). It reduces all water levels to the reference epoch 2000.0. Note that this reference epoch is consistent with the one adopted for the EVRS.

A tide gauge is only considered as a potential candidate for a hydrodynamic leveling connection if it is i) outside the tidal flat areas, ii) connected to the NHD, and iii) within 40 km of the nearest UELN HBM. If multiple tide gauges were connected to the same UELN HBM, we used the one of which the median absolute deviation of the differences between the observation- and model-derived monthly MWLs was lowest. The 3-SMWLs were computed from the monthly MWL time series. Note that all monthly MWLs were excluded for which the value of the difference between the observation- and model-derived monthly MWLs exceeded the median of these differences plus/minus three times the standard deviation (estimated as before).

3.3 UELN data

The UELN data were provided by the BKG. The dataset comprises i) the geopotential differences between the UELN HBMs in the mean-tide system, ii) the a-priori variances for the geopotential differences, and iii) a variance factor for each observation group per country estimated by means of variance component estimation (see Table 1).

The dataset is, except for the following changes, the same as the one used to compute the EVRF2019 (release of September 2020). First, we included the data of the Third Geodetic Leveling in Great Britain acquired from 1951 to 1959. Second, we added a number of single connections (i.e., connections that do not form new loops) to HBMs for which GNSS data are available in Norway (61), Belgium (9), and Bulgaria (21). These were meant to facilitate a comparison with normal heights obtained by differencing GNSS heights and quasi-geoid heights obtained from the European gravimetric quasi-geoid model EGG2015. Note that the release of the EVRF2019 in September 2020 includes (i) a sign correction in the tidal correction for the Polish dataset and (ii) a number of minor updates to the data for Macedonia (error correction), Latvia, Lithuania, Italy (new nodal points), and Bulgaria (two border connections to Turkey have been included).

Fig. 1
figure 1

A zoom-in of the UELN, showing the connection between Great Britain and France through the Channel Tunnel

In computing the EVRF2019, only the datum shift between the Ordnance Datum Newlyn (ODN) and the EVRF was determined. This was done by adding four leveling connections between the HBMs ‘710137’ and ‘1300900’, ‘1300900’ and ‘1300385’, ‘1300385’ and ‘1300386’, and ‘1300386’ and ‘1300397’ to the French zero order leveling network (NIREF) dataset. With the exception of HBM ‘710137’, all mentioned HBMs are on British territory (see Fig. 1). The reason for determining only the datum shift stems from the fact that previous realizations of the EVRS showed a tilt relative to the ODN reference frame (Sacher and Liebsch 2019). This tilt is caused by systematic errors in the British leveling data (see Penna et al. (2013) and cited references). The ODN has been computed based on an adjustment of data from the Third Geodetic Leveling while fixing the results from the Second Geodetic Leveling (1912–1952). On the other hand, the BKG only had (and still has) access to data from the Third Geodetic Leveling. In the dataset we used, the four connections mentioned above are part of the British dataset. The variance factor for this data set has been set equal to 9.867 by the BKG, which means that these four connections are given a lower weight than in the computation of the EVRF2019 (the variance factor for the French NIREF data is 1.965).

Figure 2 shows the standard deviations of the UELN observations. Here, the variance factors presented in Table 1 are already applied.

Fig. 2
figure 2

Standard deviations of the UELN observations. Note that the variance factors presented in Table 1 have been applied

3.4 EUVN_DA data

The British and French GNSS data used in the impact assessment are part of the dataset assembled in the European Vertical Reference Network—Densification Action (EUVN_DA) project (Kenyeres et al. 2010). The British dataset comprises 181 data points and the French dataset 164 points. The dataset comprises ellipsoidal coordinates in the European Terrestrial Reference System 1989 (ETRS89) and leveled connections to UELN benchmarks. Kenyeres et al. (2010) did not provide details on when the specific datasets were acquired. It should be somewhere between 2003 and 2008. They also did not specify the length of the sessions over which GNSS data were acquired. They only stated that some countries, including France and Great Britain, “did not observe sessions of 24 hours, but submitted a denser database with a mean site separation close to 50 km.” As such, we expect that the accuracy of the ellipsoidal heights is slightly lower than the 1 cm target accuracy.

3.5 The European gravimetric quasi-geoid model EGG2015

The European gravimetric quasi-geoid model EGG2015 (Denker 2015) is the latest of a series of European quasi-geoid models computed by the Institute of Geodesy at the Leibniz University of Hanover. The model was computed from surface gravity data in combination with topographic information, as well as the GOCO05S geopotential model (Mayer-Gürr 2015). Here, the remove-compute-restore technique was exploited. The estimated uncertainty in terms of standard deviation is 1.9 cm (Denker et al. 2018). Further details about the datasets, computational method, and uncertainty are provided by Denker (2013); Denker et al. (2018). Note that EGG2015 is in the zero-tide system. The normal heights obtained by differencing the GNSS and EGG2015 heights are referred to as the ‘GNSS/EGG2015 normal heights’.

4 Results and discussion

4.1 Experiment 0: The UELN-only solution

To quantify the impact of model-based hydrodynamic leveling data on the EVRF, we used a solution computed using UELN data only as a reference. This solution is referred to as the ‘UELN-only solution’. As we included the British leveling data as well as a few minor updates to some other observation groups of the UELN dataset (Sect. 3.3), in Experiment 0 we quantify the differences between this solution and the official EVRF2019 (release September 2020, available at https://evrs.bkg.bund.de/Subsites/EVRS/EN/EVRF2019/evrf2019.html).

The differences are significant at three HBMs. The largest difference is observed for HBM ‘1706226’ (Poland); it reaches 489.5 kgal  mm. A detailed investigation shows that the geopotential number associated with the official EVRF2019 is incorrect. The other two HBMs, i.e., ‘1300386’ and ‘1300397’, are located in Great Britain. In the computation of the EVRF2019, the leveling connections to these HBMs were included in the France NIREF dataset. For both HBMs, the difference equals 6.527 kgal  mm.

A quality assessment of the UELN-only/EVRF2019 solutions is out of the scope of this study. The solutions are known to be contaminated by systematic errors in various observation groups, including but not limited to Belgium (Slobbe et al. 2019), France (Duquenne et al. 2015), and Great Britain (Hipkin et al. 2004; Ziebart et al. 2008; Penna et al. 2013). When evaluating the impact of adding hydrodynamic leveling data, though, we included the UELN solution in the comparison with GNSS/EGG2015 normal heights.

4.2 Experiment I: Adding hydrodynamic leveling data computed over the most preferred time span

In Experiment I, we complemented the UELN dataset with hydrodynamic leveling connections between tide gauges in both the Baltic Sea (Nemo-Nordic dataset) and the northwest European continental shelf (3D DCSM-FM dataset). The 3-SMWLs were computed over the periods 2017–2019 and 2004–2006, respectively. Based on the criteria outlined in Sect. 2.2, these time spans are considered the most preferred ones. The number of available tide gauges is 50 (Nemo-Nordic) and 76 (3D DCSM-FM). We refer to Fig. 3, for a map showing the locations of the tide gauges. In total 124 hydrodynamic leveling connections were added; 49 in the Baltic Sea and 75 in the northwest European continental shelf. For an overview of which connections are being added we refer to Appendix A.

Fig. 3
figure 3

The locations of the available tide gauges inside the 3D DCSM-FM (red dots) and the Nemo-Nordic (blue dots) model domains. The tide gauges marked with a black circle are those that are only available for the most preferred time span, while those marked with a black cross are only available for the least preferred one. No black circle or cross means the tide gauge is available for both time spans

Table 2 Variance factors estimated in Experiments I and II

Table 2 provides the estimated variance factors for the three observation groups. Figure 4 shows the differences between the geopotential numbers of the UELN-only solution and the ones estimated in Experiment I, as well as the contribution to these differences from both hydrodynamic leveling datasets separately. These contributions were obtained by realizing the EVRS based on the UELN dataset and the identified hydrodynamic leveling connections in the specific sea basin. The applied variance factors are the same as those estimated when using both hydrodynamic leveling datasets. The results show the following:

Fig. 4
figure 4

Differences between the geopotential numbers associated to the UELN-only solution (Experiment 0) and the ones estimated in Experiment I (a). The panels (b) and (c) show respectively the contributions of the 49 connections added in the Baltic Sea and the 75 in the northwest European continental shelf

  • Both hydrodynamic leveling datasets are downweighted compared to the UELN dataset; for the latter dataset, the estimated variance factor is 0.973, whereas for the 3D DCSM-FM and Nemo-Nordic hydrodynamic leveling datasets the variance factors are 2.156 and 2.273, respectively. This means that the standard deviations of the added hydrodynamic leveling connections increase from 33.8–41.5 kgal  mm to 49.6–61.0 kgal  mm (3D DCSM-FM) and 38.4–59.9 kgal  mm to 58.0–90.3 kgal  mm (Nemo-Nordic).

  • The differences between the geopotential numbers show a large-scale pattern, of negative (positive) values in the west (east). The median difference is \(-3.1\) kgal  mm and the standard deviation (estimated as \(1.4826\ \times \) the median absolute deviation (MAD)) is 6.1 kgal  mm.

  • Great Britain and France stand out in terms of the magnitude of the differences. In Great Britain, the values range between 23.8 and 444.2 kgal  mm. In France, the large differences are concentrated in the southwest. The values range between \(-96.7\) and 10.5 kgal  mm. For all other countries, the values range between \(-15.2\) (Spain) and 8.9 kgal  mm (Poland).

  • Only in the countries France and Great Britain the range of the differences exceed 10 kgal  mm.

  • The contribution of the Nemo-Nordic dataset (see Fig. 4b) is largest in Norway, Sweden, and northern Finland (between \(-16.5\) and \(-10.1\) kgal  mm) as well as along the Polish coastline (up to 8.3 kgal  mm). Likewise, the contribution of the 3D DCSM-FM dataset (see Fig. 4c) is the largest in Great Britain (between 27.9 and 450.8 kgal  mm), France (between \(-91.6\) and 13.8 kgal  mm), and Spain (between \(-12.4\) and \(-11.8\) kgal  mm).

The observed impact of the hydrodynamic leveling datasets on the EVRF is significant, especially in Great Britain and France where differences reach tens to hundreds of kgal  mm. In most other countries the impact is substantially lower; we observe spatially correlated differences of low magnitude. This result is easily explained when considering i) the domain in which model-based hydrodynamic leveling connections have been established, and ii) the weighting (i.e., quality) of the (individual observation groups within the) UELN dataset relative to the two hydrodynamic leveling datasets.

Indeed, on the former, no major impact was expected for non-coastal countries and the countries around the Mediterranean and the Black Seas because the leveling networks of these countries were not strengthened with model-based hydrodynamic leveling connections. Note that the impact observed in Spain has been propagated from elsewhere; no Spanish tide gauges were used to establish model-based hydrodynamic leveling connections. The difference in impact between Great Britain and France on the one hand and the other North Sea countries on the other hand, as well as the small impact for the Baltic Sea countries, follows directly from the weighting of the different observation groups. Most northwestern European and Baltic Sea countries have high-quality leveling datasets (see Fig. 2). As such, it would have been more useful to include hydrodynamic leveling connections in the Mediterranean and Black Seas. The downweighting of the hydrodynamic leveling datasets relative to the UELN dataset also suppresses the impact. While the downweighting is significant (see the increase in the standard deviations of the added hydrodynamic leveling connections), it is within reasonable bounds. Further discussion of the obtained variance factors follows in the discussion of the results of Experiment II (Sect. 4.3).

The fact that the differences in France are confined to the southwest is probably related to the difference in density of the old IGN69 leveling network versus that of the new NIREF network. As shown in Fig. 5, the two networks only overlap in the northwest. Outside the northwest, the IGN69 network is denser, i.e., contains smaller loops. Given the low weight assigned to this dataset in the adjustment (see Table 1), it is not surprising that the solution along the southwest coast is more impacted by the added hydrodynamic leveling connections.

Fig. 5
figure 5

The old French leveling network IGN69 (blue) and the new zero-order leveling network named NIREF (orange). The old network is suspected for a tilt of 23 cm in North–South direction (Sacher and Liebsch 2019)

Fig. 6
figure 6

Differences between the EVRF heights estimated in Experiments 0 (a), I (b), and II (c) and the GNSS/EGG2015 normal heights at the EUVN data points in Great Britain

Fig. 7
figure 7

The top panel shows the differences between the EVRF heights estimated in the three Experiments and the GNSS/EGG2015 normal heights (\(\delta H^{N}\)) at the EUVN data points in Great Britain as a function of latitude as well as the fitted linear model. The bottom panel shows the histograms of the differences

Fig. 8
figure 8

Top row shows the differences between the EVRF heights estimated in Experiments 0 (a), I (b), and II (c) and the GNSS/EGG2015 normal heights at the EUVN data points in France. The bottom panel shows the histogram of the differences

Fig. 9
figure 9

Panel a shows the plane fitted to the differences between the EVRF heights estimated in Experiments 0 and the GNSS/EGG2015 normal heights. Panels b and c show the change in the plane fitted in respectively Experiments I and II with respect to the one for Experiment 0. To enhance visibility, the mean values have been removed. These are 548.4, \(-110.8\), and \(-88.1\) in (a), (b), and (c), respectively

It is striking to observe that the impact of adding hydrodynamic leveling connections among Baltic Sea tide gauges on the EVRF in Norway and Sweden is larger than when adding leveling connections involving tide gauges along the coasts of Norway and western Sweden (cf. Figure 4b and Fig. 4c). In the latter case, there is hardly any impact. A conclusive explanation for this result is lacking. One explanation might be that the Norwegian and Swedish leveling networks fit well to the central part of the UELN (comprising The Netherlands, Germany, and Denmark) while the Nemo-Nordic dataset introduces some deformation of the network. Alternatively, the Nemo-Nordic dataset may correct some deformation of the network the 3D DCSM-FM dataset is not able to do.

Given the small ranges of the differences per country between the solutions obtained in Experiments 0 and I (they are \(<10\) kgal  mm, except for France and Great Britain), a comparison with GNSS/EGG2015 normal heights is only conducted for Great Britain and France. The results are summarized by difference maps (Fig. 6 and top panel of Fig. 8), plots of the differences as a function of latitude (top panel in Fig. 7), plots showing the plane fitted to the differences (Fig. 9) and the histograms of the differences (bottom panels in Figs. 7 and 8).

The comparison shows the following:

  • In Great Britain, the differences for the UELN-only solution show a south-north slope of 48 mm \(\hbox {deg}^{-1}\). In the solution including model-based hydrodynamic leveling data (Experiment I) this trend almost disappeared; the estimated slope is 2.2 mm \(\hbox {deg}^{-1}\).

  • In Great Britain, the median of the differences reduces from 304 mm to 57 mm whereas the standard deviation (estimated as before) reduces from 156 mm to 35 mm.

  • In France, the differences show a northwest-southeast slope in both solutions. In the southwest the differences are lower for Experiment I. The south-north slope in Experiment 1 reduces from 12.6 mm \(\hbox {deg}^{-1}\) to 10.1 mm \(\hbox {deg}^{-1}\). However the east–west slope increases from 5.3 mm \(\hbox {deg}^{-1}\) to 7.6 mm \(\hbox {deg}^{-1}\).

  • In France, the median difference reduces from \(-31\) mm to \(-21\) mm. The standard deviation (estimated as before) does not change.

The first conclusion to be drawn from the comparison is that the EVRF heights obtained in Experiment I show better agreement with the GNSS/EGG2015 normal heights, both in France and Great Britain. In Great Britain, there is hardly any slope left in the differences associated with Experiment I. In France, the better agreement is concentrated in the southwest. Given the uncertainty of the GNSS data (Sect. 3.4) and the EGG2015 (Sect. 3.5), we can conclude that for these two countries the impact of model-based hydrodynamic leveling data on the quality of the EVRF is positive. The fact that the impact is greater for Great Britain can again be explained by the fact that the border of the British leveling network is completely adjacent to the sea, so that it could be strengthened over the entire perimeter. For France, the strengthening is only on the west side. However, there are possibilities for further reinforcement; in the southwest, France borders the Mediterranean Sea.

Note that the magnitude of the estimated slope associated with the UELN-only solution is larger than the -(20–25) mm \(\hbox {deg}^{-1}\) reported in (Penna et al. 2013). This difference originates from the use of both the Second and Third Geodetic Leveling datasets in the realization of the British vertical datum (Ordnance Datum Newlyn), whereas we only used the latter one (see also Sect. 3.3). To what extent model-based hydrodynamic leveling data can help to identify the exact source of the systematic leveling errors in the British dataset is outside the scope of this study.

4.3 Experiment II: Adding hydrodynamic leveling data computed over the least preferred time span

In Experiment II, we repeat Experiment I while using time spans to compute the 3-SMWLs that can be considered as the least preferred choice based on the criteria outlined in Sect. 2.2. These are 2015–2017 for the 3D DCSM-FM dataset and 2018–2020 for the Nemo-Nordic dataset. Note that a different time span mainly affects the 3D DCSM-FM dataset; the time span over which the reanalysis is conducted is 22 years. For the Nemo-Nordic dataset, we only have 4 years of data available. Hence, there will be an overlap of 2 years. The choice of the time span impacts the quality of the hydrodynamic leveling connections (remember that the performance of the 3D DCSM-FM reanalysis varies over time (Afrasteh et al. 2023)) as well as the set of tide gauges available to establish connections. In Experiment II, we will assess to what extent the choice of the time span changes the impact of model-based hydrodynamic leveling on the EVRF observed in Experiment I.

For the Baltic Sea and northwest European continental shelf, we have 49 and 79 tide gauges available, respectively (see Fig. 3). Table 2 provides the estimated variance factors for the three observation groups. Figure 10 shows the differences between the geopotential numbers of the UELN-only solution and the ones estimated in Experiment II, as well as the differences with respect to the solution obtained in Experiment I. The comparison with the GNSS/EGG2015 normal heights are summarized in Figs 68. Moreover, Fig. 9 shows the changes in the plane fitted to the differences with the GNSS/EGG2015 normal heights with respect to the one fitted in Experiment 0. The results show the following:

  • The downweighting observed in Experiment I is lower. For the Nemo-Nordic dataset, the variance factor reduced from 2.273 to 2.174. For the 3D DCSM-FM, the value reduced from 2.156 to 1.210.

  • The overall pattern in the differences compared to the UELN-only solution is the same.

  • Changes compared to the solution obtained in Experiment I reach \(-0.4\) kgal  mm and 2.2 kgal  mm in terms of the median and standard deviation (estimated as before), respectively. They are largest in Great Britain (between \(-235.8\) and 16.5 kgal  mm), France (between \(-33.0\) and 79.7 kgal  mm), and the Scandinavian countries (between \(-12.5\) and 3.8 kgal  mm).

  • The agreement with the GNSS/EGG2015 normal heights reduced compared to the solution in Experiment I for both France and Great Britain. For Great Britain, we observe an increased offset and slope. The slope in this case increases to 12.6 mm \(\hbox {deg}^{-1}\). For France, we observe that the improved agreement in the southwest has gone. The tilt of the fitted plane in latitudinal direction does not show any significant change compared to Experiment I (i.e., it changed from \(-10.1\) mm \(\hbox {deg}^{-1}\) to \(-10.6\) mm \(\hbox {deg}^{-1}\)). In longitudinal direction, however, the tilt decreases from 7.6 mm \(\hbox {deg}^{-1}\) in Experiment I to 6.4 mm \(\hbox {deg}^{-1}\) in Experiment II.

Fig. 10
figure 10

Differences between the geopotential numbers associated to the UELN-only solution (Experiment 0) and the ones estimated in Experiment II (a) and the ones associated to Experiments I and II (b)

The impact of the different time spans used to compute the 3-SMWLs is significant and to be expected considering the differences between the sets of tide gauges available to establish the hydrodynamic leveling connections (see Fig. 3). Compared to the set available in Experiment I, in Experiment II we miss 12 tide gauges in Great Britain while only 2 new ones were added. In southwest France, the set in Experiment I includes two more tide gauges. Also, the set of Danish tide gauges is much less homogeneously distributed in Experiment II. At the same time, the number of Swedish tide gauges on the Kattegat side has increased substantially. In fact, given the small number of tide gauges along the British coast in Experiment II (4), it is surprising that there is still such a level of agreement with the GNSS/EGG2015 normal heights compared to what is observed for the UELN-only solution.

A striking difference between the two Experiments is observed in the estimated variance factors for the 3D DCSM-FM dataset. In Experiment II the number is significantly smaller. Compared to the variance factor of the UELN dataset, the downweighting observed in Experiment II is lower. This result can be understood by considering the regional variability in performance of the 3D DCSM-FM reported by Afrasteh et al. (2023); the best performance was obtained in the Kattegat-Skagerrak region. In British waters (i.e., Bristol Channel, Celtic Sea, Irish Sea and St. George’s Channel, and the Inner Seas off the West Coast of Scotland), the performance was less. Given the aforementioned differences between the two sets of tide gauges available for establishing hydrodynamic leveling connections, Experiment II added more connections from/to tide gauges in the Kattegat-Skagerrak than to tide gauges in British waters. With regard to the Nemo-Nordic dataset, we would like to remark that a tailored noise model is missing; the length of the available time series is not sufficient to develop such a model. Instead, we relied on a noise model developed for the AMM7 hydrodynamic model covering the northwest European continental shelf (Sect. 3.1). The downweighting of the Nemo-Nordic dataset observed in both experiments suggests that the quality of the Nemo-Nordic model in representing the SMWLs is lower than that of the AMM7 model.

5 Summary and conclusion

This study presents the first realization of a regional height reference system, namely the European Vertical Reference System (EVRS), based on the combination of geopotential differences determined with spirit leveling/gravimetry and model-based hydrodynamic leveling. The latter technique was introduced by Slobbe et al. (2018) as an efficient and flexible alternative method to connect islands and offshore platforms to the height system on land. The study built upon two previous studies on exploiting model-based hydrodynamic leveling data. In Afrasteh et al. (2021), we demonstrated the potential of the technique for the realization of the EVRS by means of geodetic network analyzes. That was, we assessed the potential impact on the quality of the European Vertical Reference Frame (EVRF). In Afrasteh et al. (2023), we presented empirical noise models for three different reanalysis products available for the northwest European continental shelf and reassessed the quality impact.

In the present study, we assessed the impact of model-based hydrodynamic leveling on the EVRF using real data. In doing so, we computed summer mean water levels (SMWLs) over three subsequent years (referred to as the 3-SMWLs) with the Nemo-Nordic and 3D DCSM-FM models covering the Baltic Sea and the northwest European continental shelf, respectively. About 250 coastal tide gauges, each located inside one of the two model domains but outside the tidal flat areas, were connected to the nearest Unified European Leveling Network (UELN) height benchmark (HBM). In establishing these connections, we relied on the adjusted heights obtained in the national height system realizations. Note that all tide gauges for which the distance to the nearest UELN HBM is \(>40\) km were excluded. Moreover, in case multiple tide gauges were connected to the same UELN HBM, we used the one in which the median absolute deviation of the differences between the observation- and model-derived monthly MWLs was lowest. For both reanalysis products, the 3-SMWLs were calculated over two different time spans. The first is considered the most preferred time span based on three criteria, while the second reflects the least preferred choice. The results obtained using these time spans are presented in Experiments I and II. Note that the choice of the time span is particularly relevant for the 3D DCSM-FM dataset; the reanalysis covers the period 1997–2019. The reanalysis based on the Nemo-Nordic model runs from 2017 to 2021. In determining the connections between the tide gauges, we made use of the heuristic search algorithm developed by Afrasteh et al. (2021). This algorithm identifies the set of hydrodynamic leveling connections that provide the lowest median of the propagated height standard deviations.

The impact of model-based hydrodynamic leveling on the EVRF was determined by comparing the solutions with the so-called UELN-only solution. Except for 3 HBMs, this solution is identical to the EVRF2019 (release September 2020). The differences include a mean difference of 6.527 kgal  mm at two British HBMs included in the EVRF2019.

Based on the comparison with the UELN-only solution, the impact of model-based hydrodynamic leveling in Experiment I has a long wave character and is significant for France and Great Britain where the differences range between \(-96.7\) and 10.5 kgal  mm, and 23.8 and 444.2 kgal  mm, respectively. The range of differences for the other countries is lower than 10 kgal  mm. In Experiment II, the large-scale pattern is the same. However, the impact for both France and Great Britain is lower.

A comparison with normal heights obtained by differencing GNSS and EGG2015 quasi-geoid heights showed that in Experiment I for both France and Great Britain the systematic differences between both height data sets decreased. In Great Britain, the south-north slope disappeared almost completely after adding model-based hydrodynamic leveling data. In France, the decrease is mainly visible in the southwest.

Based on the results, we conclude that model-based hydrodynamic leveling has a positive impact on the EVRF. At the same time, and not unexpectedly, this impact does depend on the number and locations of tide gauges available for establishing the connections.

The results presented here show a provisional impact. Indeed, as indicated in Sect. 1, this study is subject to a number of limitations. First, in connecting the tide gauges to the UELN we mostly relied on potential differences computed from adjusted heights rather than real levelings. Second, we made no attempt to reduce the potential differences to the reference epoch adopted for the EVRS (i.e., epoch 2000.0). Third, we made no attempt to improve the performance of the hydrodynamic models in representing the local MWL at the tide gauge locations. In addition, to reduce the computational load no connections were allowed between tide gauges located i) within the same country, or ii) in neighboring countries for which the number of spirit leveling connections between the countries is larger than one. In particular for countries having a long coastline, it could be useful to relax these constraints. Finally, no connections have been added in the Mediterranean and Black Seas. Given the quality of the UELN data in the southern European countries, this would be a valuable addition.

Fig. 11
figure 11

Location of the tide gauges (a), and the connectivity matrix showing which model-based hydrodynamic leveling connections are included in the Nemo-Nordic dataset (b)

Fig. 12
figure 12

Location of the tide gauges (a), and the connectivity matrix showing which model-based hydrodynamic leveling connections are included in the 3D DCSM-FM dataset (b)

All these aspects have to be addressed to achieve a rigorous implementation of model-based hydrodynamic leveling. However, this requires collaboration with experts who know which processes contribute to the (summer) mean water level variability at the tide gauge locations, and who have access to adequate models and datasets to model this variability. But also experts who can assess any vertical movement of the tide gauges and who can connect the tide gauges to the UELN. Undoubtedly, some things will not be realized overnight. Many data are not (publicly) available. A key dataset is leveling connections between the TGBMs and the nearest UELN HBMs. Likely, these connections have to be established which takes time and costs. At the same time, the prospect of having a technique that can connect islands and offshore platforms to the mainland height system and suppresses systematic errors in leveling networks makes the effort worthwhile.

Future research could also focus on using GNSS/leveling to establish (additional) connections between HBMs (separated by large water bodies). In addition to the required GNSS data, doing so requires the full noise variance-covariance matrix of the quasi-geoid model used. Obtaining the latter is probably the biggest challenge. Even if both are available, the question remains how to validate the results, as there is no dataset left to validate the solution.