How Different Analysis and Interpolation Methods Affect the Accuracy of Ice Surface Elevation Changes Inferred from Satellite Altimetry

Satellite altimetry has been widely used to determine surface elevation changes in polar ice sheets. The original height measurements are irregularly distributed in space and time. Gridded surface elevation changes are commonly derived by repeat altimetry analysis (RAA) and subsequent spatial interpolation of height change estimates. This article assesses how methodological choices related to those two steps affect the accuracy of surface elevation changes, and how well this accuracy is represented by formal uncertainties. In a simulation environment resembling CryoSat-2 measurements acquired over a region in northeast Greenland between December 2010 and January 2014, different local topography modeling approaches and cell sizes for RAA, and four interpolation approaches, are tested. Among the simulated cases, the choice of either favorable or unfavorable RAA affects the accuracy of results by about a factor of 6, and the different accuracy levels are propagated into the results of interpolation. For RAA, correcting local topography by an external digital elevation model (DEM) is best if a very precise DEM is available, which is not always the case. Yet the best DEM-independent local topography correction (nine-parameter model within a 3,000 m diameter cell) is comparable to the use of a perfect DEM, which exactly represents the ice sheet topography, on the same cell size. Interpolation by heterogeneous measurement-error-filtered kriging is significantly more accurate (on the order of 50% error reduction) than interpolation methods that do not account for heterogeneous errors.


Introduction
Satellite altimetry is one means of determining mass changes in ice sheets (Shepherd et al. 2018), which are affected by climate change and affect the global sea level. Mass changes are derived from estimations of volume changes combined with firn and ice densities (Shepherd et al. 2012; Hurkmans et al. 2014; Khan et al. 2015; McMillan et al. 2016).
Deriving height changes from satellite altimetry usually involves two steps: local height change determination from repeat altimetry, and subsequent spatial interpolation at unobserved areas, possibly involving smoothing. As the repeated altimeter measurements do not refer to exactly the same position, the local topography has to be accounted for. The repeat altimetry analysis (RAA) approach has been widely used to solve this problem (Legrésy et al. 2006;Flament and Rémy 2012b;Nilsson et al. 2016;Sørensen et al. 2018a;Schröder et al. 2019). However, it is subject to a number of methodological choices in the processing chain. They include the size and arrangement of RAA cells, modeling of seasonal height changes (Sørensen et al. 2011), the use of signal parameters such as leading-edge width or backscatter to model changing signal penetration (Simonsen and Sørensen 2017), and outlier elimination.
The spatial coverage of height change estimates from RAA depends on the orbit geometry and the success of RAA estimates, and is neither homogeneous nor complete. Therefore, subsequent interpolation and filtering are commonly applied. Hurkmans et al. (2012b, 2014) apply ordinary kriging (OK) with spatiotemporal modeling of the underlying process. They also introduce external data of higher resolution to improve the interpolation via external drift. OK is an exact interpolator (Cressie 1993). Exact interpolation is desired when the input values are free of error. This is not the case for height changes derived by RAA. They are the output of a fitting algorithm and are provided with individual uncertainties of the estimate. These uncertainties can be used to refine interpolation, for example, by using heterogeneous measurement-error-filtered kriging (HFK) (Christensen 2011).
Note that the preprocessing of altimetry data, which is not the subject of this investigation, can differ in slope correction and radar waveform retracking algorithms (Hurkmans et al. 2012a; Helm et al. 2014; Nilsson et al. 2016; Sørensen et al. 2018b). This affects positioning and height measurements as well as further derived values such as height change estimates. The irregular spatial coverage, however, remains unaffected by these choices, and coping with it is the focus of this study.
Various approaches in RAA and interpolation lead to differences in the final height change estimates and their correspondingly derived uncertainties. In RAA, altimetry measurements are jointly processed in a defined area (called a cell). The effect of cell size and related modeling of the underlying local topography on trends in height change is investigated. For the subsequent interpolation, OK, inverse distance weighting (IDW), filtered kriging (FK) and HFK are applied. These four interpolation approaches are investigated with respect to the accuracy of the interpolated height changes and the reliability of their uncertainty estimates.
The coastal areas show the highest changes in the elevation of the Greenland Ice Sheet (GIS) (Sørensen et al. 2018a). These areas are covered by CryoSat-2 measurements in interferometric synthetic aperture radar (SARIn) mode. While the low-resolution mode (LRM) has a beam-limited footprint of 20 km and a pulse-limited footprint of 1.5 km, SARIn mode enhances along-track resolution to 300 m and uses the across-track angle for measurement attribution (Wingham et al. 2006). Obtaining full spatial coverage of height change in an area with sloped topography in combination with the irregular spatial distribution of RAA height change estimates is challenging. This simulation study is conducted at the eastern margins of the Northeast Greenland Ice Stream (NEGIS), where these circumstances can be reproduced without incorporating additional difficulties such as narrow fjords.
Section 2 introduces the mathematics of RAA and the interpolation algorithms used. Section 3 describes the simulations, including the synthetic data set and processing details. The results are analyzed and discussed in Sect. 4. Section 5 summarizes the results and highlights the effect of RAA parameter choices and the benefits of HFK for subsequent interpolation.

Repeat Altimetry Analysis
RAA is a fitting method based on least-squares regression. It uses height measurements $h_i$ and the corresponding locations and times in a cell to estimate parameters that describe the underlying spatial and temporal variation in the measured elevations:

$$h_i = Ft_i + Fl_i + Fs_i + res_i \qquad (1)$$

The components $Ft_i$, $Fl_i$ and $Fs_i$ describe the dependence on time, location and radar signal return characteristics, respectively. Parameters of these components (as specified below) are estimated by RAA. The resulting residuals between model and observation are denoted $res_i$. The final choice of the RAA parameter set depends on the satellite track configuration and the measurement properties of the mission used.
The design of the cells can vary (Sørensen et al. 2018a). If the cells are arranged along the subsatellite tracks, rectangular cells that span several consecutive shots along track and the repeat corridor across track are well established (Legrésy et al. 2006;Ewert et al. 2012). In the case of regular grids, both rectangular (McMillan et al. 2014) and circular (Simonsen and Sørensen 2017) cell shapes are commonly used. The latter permit a constant maximum measurement distance to the cell center and are used in this article.
In this study the time dependence in Eq. (1) is characterized by a linear trend $dh/dt$.
Additionally, seasonal elevation variations modeled by a combination of sine and cosine terms can be introduced here (Sørensen et al. 2011). The location-dependent component models the local topography inside the analyzed cell. A common approach is the fit of a plane (Smith et al. 2009; Sørensen et al. 2015; Schröder et al. 2019),

$$Fl_i = a_0 + a_1 x_i + a_2 y_i \qquad (3)$$

where $x_i$, $y_i$ are horizontal Cartesian coordinates with their origin in the cell center, and $a_0$, $a_1$, $a_2$ are the parameters of the plane. Other local topography models exist, such as the biquadratic model with six parameters, used by Nilsson et al. (2016) and Simonsen and Sørensen (2017), or the nine-parameter model used by Ewert et al. (2012) and Wouters et al. (2015). In contrast, the local topography model is reduced to only one parameter, $a_0$, when a digital elevation model (DEM) is subtracted beforehand (Sørensen et al. 2011; Helm et al. 2014; Simonsen and Sørensen 2017). Additional parameters may be useful depending on specific characteristics of the altimeter return signal, which is affected by characteristics of the reflecting surface and volume of firn or ice. For CryoSat-2, the parameter $dBS$ describes the effects of time-variable signal penetration in terms of anomalies of backscattered power, $bs_i - \overline{bs}$.
This modeling can be further expanded by involving leading-edge width or trailing-edge slope of the signal waveform (Flament and Rémy 2012a; Simonsen and Sørensen 2017), or a bias between ascending and descending satellite tracks (McMillan et al. 2014; Simonsen and Sørensen 2017). However, such interactions between the radar measurements and the uppermost firn layer are complex (Simonsen and Sørensen 2017; Adodo et al. 2018) and are not yet fully understood. This study focuses on the analysis of different aspects of spatial sampling. In regions where the topography is sufficiently flat to allow for reliable analysis of such waveforms, the influence of the scattering characteristics acts on significantly larger scales. Therefore, we expect a negligible influence of such parameters on our results and do not analyze them further.
For each cell, all selected parameters are solved for simultaneously in a joint least-squares adjustment to all measurements $h_i$ lying inside this cell. The elevation change parameter $dh/dt$ is the target of this analysis. The RAA approach also provides an a posteriori standard error for $dh/dt$, based on the statistics of the residuals $res_i$. The choice of the local topography model depends on several considerations. The actual topography of Greenland, which is smooth in the ice sheet interior and rugged at the margins, has to be taken into account. The ability to properly model the topography is also linked to the cell size and the number of observations available. Because of the limited number of observations in smaller cells, the number of topography parameters is restricted, while in larger cells a simple model might not be able to depict the actual local topography. The effect of different cell sizes and local topography models is investigated in this study.
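The joint adjustment described above can be illustrated with a minimal sketch for a single cell, assuming the plane model for local topography, a linear trend, no backscatter term and equal weights for all measurements. The function name `raa_plane_trend` and the choice of the mean observation time as reference epoch are assumptions made for illustration and are not taken from the article.

```python
import numpy as np

def raa_plane_trend(x, y, t, h):
    """Fit h = a0 + a1*x + a2*y + dhdt*(t - t_ref) + res by least squares.

    x, y: coordinates relative to the cell center (m); t: time (years);
    h: measured heights (m). Returns dh/dt and its a posteriori standard error.
    """
    x, y, t, h = (np.asarray(v, dtype=float) for v in (x, y, t, h))
    t_ref = t.mean()  # assumed reference epoch (hypothetical choice)
    # Design matrix: three plane parameters plus the linear trend
    A = np.column_stack([np.ones_like(x), x, y, t - t_ref])
    params, *_ = np.linalg.lstsq(A, h, rcond=None)
    res = h - A @ params
    dof = len(h) - A.shape[1]              # degrees of freedom
    s0_sq = res @ res / dof                # a posteriori variance factor
    cov = s0_sq * np.linalg.inv(A.T @ A)   # parameter covariance matrix
    return params[3], np.sqrt(cov[3, 3])
```

With fewer than five measurements in a cell the degrees of freedom vanish, which mirrors the restriction on the number of topography parameters in small cells.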

Interpolation
The estimated height changes $dh/dt$ have to be interpolated to obtain values at places where data are missing or RAA could not be solved successfully. The interpolation methods used in this study are all based on the same principle: observations $Z$ at locations $r_i$ are used to calculate a new value $Z^*$ at a certain location $r_0$ by weighted summation (Myers 1991; Cressie 1993; Chilès and Delfiner 2012),

$$Z^*(r_0) = \sum_{i=1}^{n} \lambda_i Z(r_i) \qquad (7)$$

In this section, $r$ denotes a two-dimensional position vector.
To derive the weights $\lambda_i$, IDW uses only geometric information, while kriging uses a geostatistical approach. Different kriging methods have been developed, based on formulations by D. Krige and G. Matheron (Chilès and Delfiner 2012). This study focuses on OK, FK and HFK. Detailed information about the kriging methods used is given, for example, by Cressie (1993), Chilès and Delfiner (2012) and Christensen (2011). The number of points used for interpolation depends on the data set and the user's decision. Irregularly distributed observations may lead to distorted results, for example due to unwanted screening (Cressie 1988; Chilès and Delfiner 2012) or spatial biases. Therefore, the surroundings are often divided into a certain number of sectors, selecting observations in each of the sectors to obtain a more uniform distribution. The same applies for variograms (Stosius and Herzfeld 2004).
In the field of geodesy, besides kriging, different least-squares collocation methods are commonly applied, which may also model uncertainties (Nilsson et al. 2015;Sørensen et al. 2018a). These include different treatment of measurement and interpolation errors. This study focuses on interpolation error, although methods such as IDW and OK would need additional uncertainty assessment. Basic agreement between kriging and collocation methods is confirmed (Dermanis 1984), although the individual requirements can differ. Distinguishing these two statistical approaches is outside the scope of this article.
Previous studies prove the general suitability of the selected methods (Rühaak 2015; Christensen and Berrett 2016; Kang et al. 2017) and provide some comparison results (Chaplot et al. 2006; Li and Heap 2011). To the authors' knowledge, HFK has not yet been applied to height changes derived by satellite altimetry.

Inverse Distance Weighting
The weights of IDW depend on the normalized distances $d_i$ between the locations of the new point $r_0$ and the surrounding observations $r_i$.
The power $k$ of the distance can be adjusted to any positive value (Webster and Oliver 2007). In this study, a value of 1 is used to model linear dependence. The uncertainty of interpolation $\sigma_{IDW}$ can be estimated by error propagation. Points that have observations retain their observed value, and their interpolation error is set to zero.
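A minimal sketch of IDW with power $k = 1$ is given here, assuming the common form in which the weights are the inverse distances raised to $k$ and rescaled to sum to one. The function names are hypothetical, and the article's exact normalization may differ.

```python
import numpy as np

def idw_weights(r0, r_obs, k=1.0):
    """Inverse-distance weights for target r0 from observation positions r_obs."""
    r0 = np.asarray(r0, dtype=float)
    r_obs = np.asarray(r_obs, dtype=float)
    d = np.linalg.norm(r_obs - r0, axis=1)
    if np.any(d == 0):
        # Target coincides with an observation: that point keeps its value
        w = (d == 0).astype(float)
    else:
        w = d ** (-k)
    return w / w.sum()  # weights are rescaled to sum to one

def idw_interpolate(r0, r_obs, z, k=1.0):
    """Weighted sum of the observations z, as in the general interpolation rule."""
    return float(idw_weights(r0, r_obs, k) @ np.asarray(z, dtype=float))
```

For two observations at distances 1 and 2 from the target, the weights are 2/3 and 1/3, so the closer point dominates the result.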

Ordinary Kriging
Kriging uses a variogram for the calculation of the weights $\lambda_i$. Variograms describe the process under investigation in terms of the second moments of value differences as a function of distance $h$. The sample variogram value $\hat{\gamma}$ for distance $d$ can be calculated from the observations. These values are calculated for several distance classes, each representing an interval of discrete width. A specific variogram model $\gamma$ is fitted to this sample variogram $\hat{\gamma}$. The characteristic parameters describing it are sill (representing the variance of the process), range (corresponding to the maximum distance at which correlation between the values can be observed) and nugget (a discontinuity at the origin). This discontinuity at distances near zero is caused, for example, by limitations of the sampling density (Chilès and Delfiner 2012).
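The computation of a sample variogram in distance classes can be sketched as follows, assuming the classical Matheron estimator (half the mean squared value difference per class). The article's own formula may differ in detail, and the function name and arguments are hypothetical.

```python
import numpy as np

def sample_variogram(coords, z, max_dist, n_classes=10):
    """Sample variogram in n_classes equal-width distance classes.

    Returns class centers, variogram values (NaN for empty classes)
    and the number of point pairs per class.
    """
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(z, dtype=float)
    i, j = np.triu_indices(len(z), k=1)          # all unique point pairs
    d = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq = (z[i] - z[j]) ** 2
    edges = np.linspace(0.0, max_dist, n_classes + 1)
    idx = np.digitize(d, edges) - 1              # class index of each pair
    gamma = np.full(n_classes, np.nan)
    counts = np.zeros(n_classes, dtype=int)
    for c in range(n_classes):
        in_class = idx == c
        counts[c] = in_class.sum()
        if counts[c]:
            gamma[c] = 0.5 * sq[in_class].mean()
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, gamma, counts
```

Pairs at or beyond `max_dist` fall outside all classes and are ignored, so the maximum distance directly controls which spatial scales enter the fit.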
In the formulation of the kriging system (Eq. 11), this modeled function is applied to the distances $d$ between the points involved. This equation system is then solved for the weights $\lambda_i$; $m$ is the Lagrange parameter, which completes the system. Interpolation is then applied according to Eq. (7). The resulting kriging variance $\sigma^2_{OK}$ at a certain point equals the minimized mean squared error on which the kriging formulation is based (Cressie 1988).
For OK, the value of $\gamma(d_{ii})$, that is, the variogram value for zero distance, is zero, so that the main diagonal in the matrix of Eq. (11) consists of zeros.
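As an illustration, the OK system in its standard textbook variogram form (variogram matrix with zero main diagonal, bordered by ones for the Lagrange parameter) can be set up and solved as sketched here. The spherical model parameters and all function names are assumptions for illustration, and `sill` denotes the total sill including the nugget.

```python
import numpy as np

def spherical(d, nugget, sill, a_range):
    """Spherical variogram model with gamma(0) = 0 and total sill `sill`."""
    d = np.asarray(d, dtype=float)
    g = nugget + (sill - nugget) * (1.5 * d / a_range - 0.5 * (d / a_range) ** 3)
    g = np.where(d >= a_range, sill, g)
    return np.where(d == 0, 0.0, g)  # zero at zero distance

def ordinary_kriging(r0, r_obs, z, variogram):
    """Solve the OK system; return the interpolated value and kriging variance."""
    r_obs = np.asarray(r_obs, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    d = np.linalg.norm(r_obs[:, None] - r_obs[None, :], axis=2)
    K = np.ones((n + 1, n + 1))
    K[:n, :n] = variogram(d)   # main diagonal is zero
    K[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = variogram(np.linalg.norm(r_obs - np.asarray(r0, dtype=float), axis=1))
    sol = np.linalg.solve(K, rhs)
    lam, m = sol[:n], sol[n]   # weights and Lagrange parameter
    return lam @ z, lam @ rhs[:n] + m
```

Evaluating this sketch at an observed location returns the observed value with zero kriging variance, which is the exact-interpolation property of OK.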

Filtered Kriging
In the case of no errors, the nugget of the variogram consists only of microscale variation. For noisy data, an error component has to be considered. The aim is to derive values of the error-free component T from measurements Z corrupted with noise (Christensen 2011).
If the variance $\sigma^2$ of the error is known and assumed to be homogeneously distributed, for example due to known measurement errors, it can be introduced into kriging. Different notations can be found, for example, in Delhomme (1978), Cressie (1993) and Rühaak (2015). Based on Christensen (2011), FK is expressed as a modified kriging system (Eq. 14) with an associated kriging variance. This method leads to a filtering of the input data set, so that the values of the observed points are modified, depending on the error variances used. In contrast to OK, the kriging variance at observed points is no longer zero.

Heterogeneous Measurement-Error-Filtered Kriging
HFK, which incorporates heterogeneous measurement errors into kriging, was developed by Christensen (2011) and successfully applied, for example, by Christensen and Sain (2012), Christensen and Berrett (2016) and Kang et al. (2017). The error variances $\sigma^2_i$ of the observations are used individually in Eq. (18), after modifying the variogram. The values of the original variogram $\gamma$ are reduced by the mean of the individual error variances $\sigma^2_i$ of the observations.
The arithmetic mean of the RAA-derived a posteriori errors used in Eq. (17) for HFK variogram adjustment is used in this study as homogeneous error for FK in Eq. (14). It differs between the different RAA calculations.
The newly modeled variogram $\gamma^*$ is used in the HFK equation (Eq. 18).
According to Eq. (17), the variogram value for zero distance is zero. Therefore, the main diagonal of the matrix in Eq. (18) is zero, just as for OK. All other elements of the matrix incorporate the individual error variances.
The kriging variance for HFK is defined analogously to OK.
This method leads to a filtering of the input data set. In contrast to FK, the filtering considers the individual uncertainties at the observation points.
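The filtering effect of individual error variances can be illustrated with a closely related textbook formulation: ordinary kriging in covariance form, where each observation's error variance is added to the diagonal of the covariance matrix and the noise-free signal is predicted. This is a swapped-in sketch and not Christensen's (2011) exact HFK formulation. The exponential covariance model and all names are assumptions.

```python
import numpy as np

def exp_cov(d, sill=1.0, corr_len=5000.0):
    """Hypothetical exponential covariance model of the error-free signal."""
    return sill * np.exp(-np.asarray(d, dtype=float) / corr_len)

def error_filtered_kriging(r0, r_obs, z, err_var, cov=exp_cov):
    """Predict the noise-free value at r0 from observations with individual
    error variances. Returns value, kriging variance and weights."""
    r_obs = np.asarray(r_obs, dtype=float)
    z = np.asarray(z, dtype=float)
    n = len(z)
    d = np.linalg.norm(r_obs[:, None] - r_obs[None, :], axis=2)
    K = np.ones((n + 1, n + 1))
    # Signal covariance plus per-observation error variance on the diagonal
    K[:n, :n] = cov(d) + np.diag(np.asarray(err_var, dtype=float))
    K[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = cov(np.linalg.norm(r_obs - np.asarray(r0, dtype=float), axis=1))
    sol = np.linalg.solve(K, rhs)
    lam, mu = sol[:n], sol[n]   # weights and Lagrange parameter
    variance = cov(0.0) - lam @ rhs[:n] - mu
    return lam @ z, variance, lam
```

In this sketch, two observations at equal distance from the target receive unequal weights when their error variances differ, with the more precise observation weighted more strongly. A prediction at an observed location no longer reproduces the observation once its error variance is nonzero.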

Simulation Setup
In order to assess the different RAA models and interpolation methods, simulations are performed on synthetic data sets. Figure 1 introduces the area investigated in this study. The area of approximately 23,000 km 2 covers the lower part of the NEGIS drainage system, based on a slight modification of its delineation by Zwally et al. (2012) (Fig. 1).

Simulation Data
Several real data sets were combined to obtain an authentic simulated data set. A rate of elevation change (Fig. 1b) is simulated by summing up contributions related to position, flow velocity, elevation and surface mass balance patterns. For each position $i$, height change is simulated as

$$\left(\frac{dh}{dt}\right)_i = b_0 + b_1 x_i + b_2 y_i + b_3 v_i + b_4 h_i + b_5 s_i \qquad (20)$$

The terms $b_0 + b_1 x_i + b_2 y_i$ simulate a component with a simple linear dependence on position. This reflects height loss from southwest to northeast, based on topography and location of the outlet glaciers. The term $b_3 v_i$ creates a trend that depends on ice flow velocity provided by Joughin et al. (2010a, b), in order to mimic changes related to ice flow dynamics. $b_4 h_i$ denotes topography-related changes, using the TanDEM-X (TerraSAR-X add-on for Digital Elevation Measurements) DEM (Krieger et al. 2007; Rizzoli et al. 2017). The term $b_5 s_i$ introduces an additional spatial pattern. $s$ is taken as a temporal snapshot of the cumulative surface mass balance anomaly at a certain time from RACMO 2.3 (Regional Atmospheric Climate Model) (Noël et al. 2015). The factors $b_i$ balance the different components and adjust the annual height change to between ±2 m year⁻¹. The spatial resolution of this data set can be adjusted. It is

In RAA, these errors are introduced as a priori standard errors for the noisy data.

Simulation Procedure
The simulated height measurements are used as input for RAA, where different choices of RAA cell size and local topography model are assessed. Figure 2 emphasizes the role of local topography in RAA applications. For a selected cell with 2,000 m diameter, the topography of the TanDEM-X DEM and three differently parametrized local topography models are compared. As the local topography models are used to reduce the satellite observations to the cell center, model and reality should optimally match. RAA was applied to both the error-free and the noisy data set, with cell diameters of 500 m, 1,000 m, 2,000 m, 3,000 m, 4,000 m and 5,000 m. Because of the rather short time span and the main focus on the trend of height change, seasonal parameters were not included. The RAA cells are distributed on a regular grid (Helm et al. 2014; Wouters et al. 2015; Sørensen et al. 2018a). The local topography fit is parametrized using three, six and nine parameters (cf. Eqs. (3-5)), and DEMs are used to subtract the topography in each cell before the height change trend is estimated. The TanDEM-X DEM represents the true surface topography used for simulation, which is usually not available in real data applications. Therefore, additional DEMs were introduced, namely the Greenland Mapping Project (GIMP) DEM (Howat et al. 2014) and the ArcticDEM (Porter et al. 2018). They are likely to be used in actual applications because of their Greenland-wide coverage. The use of the TanDEM-X DEM is abbreviated with T in this article, the GIMP DEM with G and the ArcticDEM with A. The parametrized local topography models are distinguished by the number of parameters: three, six and nine. Figure 3 depicts the differences between the additional DEMs and the TanDEM-X DEM, which is introduced as the true topography. The DEMs differ in data source and nominal time, but are similar in spatial resolution (about 100 m). The mass loss between different height acquisitions leads to different surface heights.
After offset removal, the influence of fast-changing surface heights at the outlet glaciers on the different DEMs becomes apparent.
For RAA, the slope is of main interest. The slopes of the ArcticDEM match well those of the TanDEM-X DEM, except for some distinct features. Significant discrepancies occur for the GIMP DEM in some regions, which will influence the RAA results. The reasons for these differences, involving different data acquisition methods and time spans, are not a focus of this article and are therefore not further discussed.
After RAA, an outlier detection was applied to the resulting parameters. The simulated height changes vary between −2 m year⁻¹ and +2 m year⁻¹. Therefore, absolute height changes exceeding 10 m year⁻¹ are removed. Additionally, those with an a posteriori standard deviation of more than 1 m year⁻¹ are removed. This criterion is commonly applied to RAA results in Greenland (Simonsen and Sørensen 2017) and affects less than 0.1% of the results. Prior to interpolation, a bicubic function is removed from the original data, in order to reduce the influence of spatial trends on the variogram modeling. This bicubic function is re-added after interpolation.
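The two rejection criteria above translate into a simple mask, sketched here with the thresholds stated in the text. The function and argument names are hypothetical.

```python
import numpy as np

def screen_raa_results(dhdt, sigma, max_abs_rate=10.0, max_sigma=1.0):
    """Return a boolean mask of RAA results that pass both outlier criteria.

    dhdt: estimated height change rates (m/year);
    sigma: their a posteriori standard deviations (m/year).
    """
    dhdt = np.asarray(dhdt, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    # Keep rates within +/- max_abs_rate whose uncertainty is at most max_sigma
    return (np.abs(dhdt) <= max_abs_rate) & (sigma <= max_sigma)
```

Cells rejected by this mask become gaps that the subsequent interpolation has to fill.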
The gridded elevation trend $dh/dt$ (hereafter denoted $\hat{g}_i$), derived from synthetic data by applying RAA and further interpolation, is compared with the original synthetic ("true") elevation trend ($g_i$). The calculation of true values $g_i$ is adapted to the spatial resolution at which the estimates $\hat{g}_i$ are calculated. That is, for each RAA cell size a data set with true values is calculated based on the data of Eq. (20) in the respective resolution. The differences $g_i - \hat{g}_i$ are termed true errors.
The accuracy of the results is assessed by the root-mean-square error (RMSE) over all $n$ grid cells,

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(g_i - \hat{g}_i\right)^2}$$
In contrast, the standard uncertainties are defined as the square root of the kriging and interpolation variances.
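The accuracy measure can be computed as in the following minimal sketch, assuming the true and estimated trends are available as arrays over the same grid cells. The function name is hypothetical.

```python
import numpy as np

def rmse(g_true, g_est):
    """Root-mean-square of the true errors g_true - g_est over all grid cells."""
    diff = np.asarray(g_true, dtype=float) - np.asarray(g_est, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```

Because the differences are squared, the sign convention chosen for the true errors does not affect the RMSE.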

RAA Performance
The various cell size and local topography model combinations lead to different RAA results. Figure 4 shows the ratio of grid cells with a successful height change estimate by RAA, dependent on cell size and local topography model and restricted by outlier criteria. Cells without valid RAA results need to be filled by interpolation. The use of noisy versus error-free measurements leads to only a negligible difference with regard to the spatial coverage with RAA results. The size of the RAA cells strongly affects the coverage ratio, as larger cells cover the gaps between subsatellite tracks, while smaller cells adhere more closely to the tracks. The choice of the local topography model significantly affects the coverage ratio for the 500 m and 1,000 m cell sizes. For a diameter of 500 m, increasing the number of parameters from three to six and nine decreases the number of successful estimates dramatically. Here, the number of observations restricts the quality of parameter estimation. As cell size increases, the influence of the local topography model on the success of the height change estimates decreases. For cells with diameters of 3,000, 4,000 and 5,000 m, the three-parameter model of local topography yields slightly fewer valid RAA results than the six- and nine-parameter models. The use of DEM subtraction generally leads to more successful height change estimates, especially for the smallest cell size. The spatial pattern of a posteriori uncertainties reveals a reason for this local topography model-dependent coverage. Figure 5 shows the standard deviations for cells with 3,000 m diameter. The spatial pattern is related to slope, leading to larger uncertainties in the steep areas, for example near the steeply sloped zone leading into the glacier tongue of 79N. The more complex the topography modeled, the better the height change estimate and the lower its a posteriori uncertainty.
In this investigation, values with an a posteriori RAA-derived standard deviation of more than 1 m year⁻¹ are rejected (cf. Sect. 3.2). The rejections lead to gaps in the steep areas, depending on the local topography model.
The true errors, $\hat{g}_i - g_i$, for the selection of Fig. 5 are shown in Fig. 6. Comparison with Figs. 1d and 5 indicates that, similar to the formal uncertainties, the actual errors depend on slope. The case where the GIMP DEM is used is striking. There, the actual errors are comparably high, even in flat areas. This observation concerning the GIMP DEM subtraction is similar for all cell sizes. The use of the TanDEM-X DEM or ArcticDEM leads to both low uncertainties and small actual errors, except for the steepest areas. For the cells with a diameter of 1,000 m and less, no apparent slope-dependent pattern of true error occurs (not shown). These observations apply for both the error-free and the noisy data set.
The standard uncertainties from RAA estimation are compared with the true errors. Figure 7 illustrates a nearly linear relationship between them for RAA cells with at least 3,000 m diameter, which confirms the visual assessment by Figs. 5 and 6. The standard uncertainties slightly overestimate the true errors.
For the cells with diameters of 500 m to 2,000 m, small standard uncertainties are significantly exceeded by true errors. This is similar for all local topography models, and most pronounced for the GIMP DEM. The difference of this DEM from the simulated topography has a very strong effect when small RAA cells are used. Figure 8 represents the color-coded RMSE information for the different combinations of local topography models, cell sizes and the noisy and error-free data sets. The RMSE spans from 0.13 to 0.85 m year⁻¹, a difference of a factor of 6.5. The RMSE of the three-parameter local topography model increases with increasing cell size. The six- and nine-parameter models have the lowest RMSE for cell diameters of 2,000 m and 3,000 m. In contrast, the use of a DEM leads to the best RMSE with large cell diameters. It is striking that the GIMP DEM-reduced RAA shows the worst results among all local topography models. The pseudo-random errors of the noisy data set affect the results and lead to higher RMSE values compared with the error-free data set. The effect of the added noise is strongest for cells with diameters of 2,000 m and less. Aside from this effect, the noisy and error-free data sets show similar results. The best results are obtained with 5,000 m cell diameter and the TanDEM-X DEM.
In conclusion, different combinations of cell size and local topography models can lead to satisfactory height change estimates. In the specific configurations applied here, cells with diameters of less than 2,000 m do not cope well with perturbations, such as the additional random errors. Larger cells include more observations in the estimation process and are therefore more capable of determining the desired linear height change estimates. If no DEM is used, highly parametrized local topography models should be preferred over a plane-fit model, as long as a sufficient number of observations is available.
The DEM subtraction leads to good results, as long as the DEM is close to the actual topography. As shown in Fig. 3, the ArcticDEM is close to the assumed true topography of the TanDEM-X DEM. Therefore it leads to better results than the GIMP DEM. The importance of finding suitable DEMs concerning resolution and matching time span has already been addressed by Sørensen et al. (2011) and is well demonstrated here.
As the difference between noisy and error-free height change estimates is not significant, and the noisy data set is assumed to better reflect the actual situation, further analysis uses the noisy results only.

Impact of Variogram Models on Kriging Interpolation
Before interpolation is applied, the influence of different variograms is analyzed. This analysis is done based on an RAA data set with 2,000 m cell diameter and nine topography parameters. HFK with different variogram selections is applied. The selected RAA result leaves sufficient area for interpolation, so that the effect of different variograms on the kriging result can be properly studied.
Fitting a variogram is an essential part of kriging, as different spatial distances, class divisions, weighting schemes and variogram models have to be considered. The sample variograms were calculated in 30 distance classes (cf. Sect. 2.2.2) ranging from zero to the chosen maximum distance. To fit the models, the different classes were weighted with $p$ depending on sample distance $h$ and number of observations per class $n$ (Pardo-Igúzquiza 1999).

Variograms with three options for the maximum distance (10, 50 and 100 km) and two options for the analytical model (Gaussian and spherical) are considered, which results in a total of six variogram model options. The choice of parameters for variogram modeling depends on the assumptions on the underlying physical processes. Changes in ice heights proceed on small and large scales. However, due to the radar footprint and the coverage with satellite data, as well as the restrictions of RAA, the actual spatial and temporal resolution is limited. The sample variogram can provide different solutions depending on the scales considered. In the process of interpolation, points are selected out of eight sectors, with a maximum of three points per sector. Therefore, correct modeling of the variogram on short distances (depending on the cell size) is of great importance.
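The sector-based point selection mentioned above can be sketched as follows. Choosing the nearest points within each sector is an assumption, since the text specifies only the number of sectors and the maximum number of points per sector. The function name is hypothetical.

```python
import numpy as np

def select_by_sector(r0, r_obs, n_sectors=8, max_per_sector=3):
    """Return sorted indices of the nearest observations in each angular sector."""
    r_obs = np.asarray(r_obs, dtype=float)
    delta = r_obs - np.asarray(r0, dtype=float)
    dist = np.hypot(delta[:, 0], delta[:, 1])
    angle = np.arctan2(delta[:, 1], delta[:, 0]) % (2 * np.pi)
    # Sector index of each observation, clamped against rounding at 2*pi
    sector = np.minimum((angle / (2 * np.pi / n_sectors)).astype(int), n_sectors - 1)
    chosen = []
    for s in range(n_sectors):
        idx = np.flatnonzero(sector == s)
        nearest = idx[np.argsort(dist[idx])[:max_per_sector]]
        chosen.extend(nearest.tolist())
    return sorted(chosen)
```

With eight sectors and three points each, at most 24 observations enter a single interpolation, and nearby points dominate, which is why the short-distance part of the variogram matters most.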
The resulting variograms are illustrated in Fig. 9. The shapes of the sample variograms differ slightly according to the spatial scales. The sharp increase at distances up to 3 km (Fig. 9a) is less pronounced for large maximum distances (Fig. 9c). Further increase of variogram values at more than 70 km can be neglected, as it exceeds the maximum distance between observation points for interpolation. The largest data gap in the RAA results (at the heavily sloped region inland from the grounding line of ZAC) spans an area of approximately 15 km by 40 km.

Fig. 9 Variograms of the noisy results with 2,000 m cell diameter and nine topography parameters, calculated for a maximum distance of (a) 10 km, (b) 50 km and (c) 100 km (note the different distance scales). Sample (black) and fitted Gaussian (blue) and spherical (green) variograms are shown.

For further investigation, HFK was applied to the selected RAA data set using the six differently modeled variograms. Here, the focus is on the differences between the results induced by different variogram models. These analyses show that the effect of the choice of variograms on the final HFK result is negligible. The RMSE values differ marginally (e.g. 0.129 m year⁻¹ to 0.145 m year⁻¹ for the complete interpolated result).
In contrast, the kriging uncertainty is significantly affected by the choice of the variogram, which thus affects the realism of the uncertainty characterization for the interpolation result. The kriging uncertainty should reflect, in a statistical sense, the true error. In Fig. 10 the kriging uncertainty is plotted against statistics of the true error. Additionally, the underlying number of points per ratio is illustrated as relative density. With a maximum distance of 10 km, a nearly linear relationship is obtained. As an exception, for the lowest uncertainty bin, a strong discrepancy is observed between uncertainty and error. This is caused by just five cells, where the RAA standard uncertainty, that is, the uncertainty of the input to HFK, is significantly underestimated.
The kriging uncertainty is most meaningful for the fit with a maximum distance of 10 km. The performance of the spherical and Gaussian variogram models is comparable. This investigation shows that the variogram affects mainly the kriging uncertainty, and less the interpolation result. Therefore, the variogram modeling for interpolation of height changes should be focused on short distances. For the following kriging, a spherical variogram model is fitted to sample variograms spanning a maximum distance of 10 km.
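The variogram fitting discussed above can be illustrated with a minimal sketch. It assumes the textbook forms of the spherical and Gaussian models (with `sill` denoting the total sill and, for the Gaussian model, `rng` denoting the practical range), a simple equal-width binning for the sample variogram, and a grid search over candidate ranges instead of a full least-squares fit. All function and parameter names are hypothetical and are not taken from the software used in this study.

```python
import math

def spherical(h, nugget, sill, rng):
    """Spherical model: reaches the sill exactly at distance rng."""
    if h <= 0.0:
        return 0.0
    if h >= rng:
        return sill
    r = h / rng
    return nugget + (sill - nugget) * (1.5 * r - 0.5 * r ** 3)

def gaussian(h, nugget, sill, rng):
    """Gaussian model: approaches the sill asymptotically (rng = practical range)."""
    if h <= 0.0:
        return 0.0
    return nugget + (sill - nugget) * (1.0 - math.exp(-3.0 * (h / rng) ** 2))

def sample_variogram(points, values, max_dist, n_bins):
    """Half the mean squared value difference per distance bin, up to max_dist."""
    width = max_dist / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = math.dist(points[i], points[j])
            if h < max_dist:  # pairs beyond the maximum distance are ignored
                k = min(int(h / width), n_bins - 1)
                sums[k] += 0.5 * (values[i] - values[j]) ** 2
                counts[k] += 1
    centers = [(k + 0.5) * width for k in range(n_bins)]
    gamma = [s / c if c else None for s, c in zip(sums, counts)]
    return centers, gamma

def fit_range(model, centers, gamma, nugget, sill, candidates):
    """Pick the candidate range with the smallest squared misfit (grid search)."""
    def misfit(rng):
        return sum((model(h, nugget, sill, rng) - g) ** 2
                   for h, g in zip(centers, gamma) if g is not None)
    return min(candidates, key=misfit)
```

Calling `sample_variogram` with `max_dist` set to 10 km, 50 km or 100 km (in the units of the coordinates) reproduces the kind of scale dependence described above: a smaller maximum distance lets the fitted model follow the steep short-range increase rather than the long-range trend.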

Interpolation Performance
Interpolation was applied to the noisy data sets of the RAA results using the four interpolation methods IDW, OK, FK and HFK. The process included the calculation of the sample variogram, the fitting of the model variogram, the calculation of weights and finally the interpolation itself.
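For reference, the simplest of the four methods, IDW, can be sketched in a few lines. The sketch assumes a power of 2 and uses all observations as neighbors. Both are illustrative choices, not necessarily the settings used in this study, and the function name `idw` is hypothetical.

```python
import math

def idw(obs_xy, obs_val, target, power=2.0):
    """Inverse-distance-weighted estimate at target from scattered observations."""
    num = 0.0
    den = 0.0
    for (x, y), v in zip(obs_xy, obs_val):
        d = math.hypot(x - target[0], y - target[1])
        if d == 0.0:
            return v  # exact interpolator: observed values are kept unchanged
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den
```

Because the estimate at an observed location is the observation itself, IDW applies no filtering to the RAA results.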
The RMSE values for the interpolated height changes are illustrated in Fig. 11. In contrast to IDW and OK, FK and HFK change the values of grid cells that have valid observations. Therefore, grid cells that have no valid value before interpolation are marked as "interpolated" and the entirety of the grid cells is marked as "complete", while the statistics of the filtered values are marked as "filtered".
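The three RMSE categories can be computed as in the following sketch, which assumes flat lists of gridded estimates and simulated true values plus a boolean mask marking the cells that held a valid RAA value before interpolation. The function name and the data layout are hypothetical.

```python
import math

def rmse_by_category(estimates, truth, observed):
    """RMSE for 'filtered' (observed), 'interpolated' (gap) and 'complete' cells."""
    groups = {"filtered": [], "interpolated": [], "complete": []}
    for est, tru, obs in zip(estimates, truth, observed):
        err = est - tru
        groups["filtered" if obs else "interpolated"].append(err)
        groups["complete"].append(err)
    return {
        name: math.sqrt(sum(e * e for e in errs) / len(errs)) if errs else None
        for name, errs in groups.items()
    }
```

For IDW and OK, the "filtered" category simply reproduces the RMSE of the underlying RAA results, since these methods leave observed cells unchanged.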
For large cell sizes (4,000 m; 5,000 m diameter) the RMSE of the complete grid is determined predominantly by the RMSE of the filtered values, rather than by the RMSE of the interpolated values, because of the high coverage of RAA results (cf. Fig. 4).
For IDW, the RMSE of interpolation, shown in Fig. 11b, increases with increasing cell size for the parametrized local topography models. The use of DEMs does not lead to a clear advantage of certain cell sizes in interpolation. The complete result (Fig. 11c) shows increasing RMSE with cell size for the parametrized models and decreasing RMSE with cell size for the DEM subtraction. Compared with the underlying RAA result (see Fig. 11a), the RMSE values are comparable or slightly lower. Notably, for small cell sizes, where many grid cells are interpolated, the RMSE of interpolated values is smaller than that for the original RAA results. This indicates that the weighted averaging process of interpolation reduces noise in the RAA results. The best complete results are achieved with 5,000 m cell diameter and TanDEM-X DEM.
Similar to IDW, the interpolation by OK (Fig. 11d) performs best for small cells and parametrized models or with the TanDEM-X DEM or ArcticDEM and 4,000 m or 5,000 m cell size. But the RMSE values are higher than for IDW, especially when DEMs are used.
FK (Fig. 11f-h) interpolates similarly to OK, but considers a constant error that is used to filter the observation points. This leads to improved RMSE values at these points, as well as an improved complete result compared with OK and IDW. The pattern of RAA RMSE is thereby maintained, leading to the best complete results with TanDEM-X DEM and 4,000 m or 5,000 m cell diameters.
For HFK, the filtering of the RAA results improves the RMSE significantly, much more than for FK (cf. Fig. 11i, with Fig. 11a, f). In part, the RMSE decreases by more than 0.3 m year⁻¹ after filtering with HFK (e.g. 500 m cell diameter and GIMP DEM subtraction). The height change rates best reflecting the simulated truth are achieved with cell diameters of 2,000 m to 4,000 m and a local topography correction with six or nine topography parameters or the TanDEM-X DEM or the ArcticDEM.
Table 1 shows the RMSE values for the observation points, interpolation and complete results for a chosen example: the noisy data set with cell diameter of 3,000 m and nine topography parameters. The quality of IDW and OK interpolation is similar, with IDW performing slightly better. FK interpolation performance is similar to OK, but the filtering included at observation points improves the complete result. HFK not only filters better than FK, but in many cases is even able to perform better interpolation, leading to significantly improved complete height change results. The accuracy improvement for this example (3,000 m cell diameter and nine topography parameters) is 72% between OK and HFK.
An example of the spatial pattern of the interpolated height changes, differences and standard uncertainties can be seen in Fig. 12. The results of IDW and OK are very similar and show speckled patterns. Their standard uncertainties neglect uncertainties at observation points (value zero) and increase with distance to them. In the interpolated areas, the OK standard uncertainties are generally higher than for IDW, and with less variation. The standard uncertainties are further discussed in Sect. 4.4. FK application leads to a smoother height change result and less error compared with OK. The pattern of uncertainties is similar, but has a fixed value (not zero) at observation points. HFK leads to less error and a much smoother height change result than the other interpolation methods, which is due to the spatially varying filtering. The spatial pattern of the standard uncertainties does not simply reflect the existence of observations, but gives more reliable information about areas with higher uncertainties, which are mainly the sloped regions near the grounding lines of the two glaciers. Additionally, the uncertainties are no longer zero at observation points.
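The different behavior of OK, FK and HFK at observation points can be reproduced with a minimal sketch. It assumes the standard covariance form of the ordinary kriging system, in which the measurement-error variance of each observation is added to the diagonal of the data covariance matrix but not to the covariances with the target. It is not the formulation or implementation used in this study. The spherical covariance without nugget, the global neighborhood, the small pure-Python solver and all names are illustrative assumptions.

```python
import math

def cov(h, sill, rng):
    """Spherical covariance (no nugget) implied by a spherical variogram."""
    if h >= rng:
        return 0.0
    r = h / rng
    return sill * (1.0 - 1.5 * r + 0.5 * r ** 3)

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def krige(obs_xy, obs_val, err_var, target, sill, rng):
    """Ordinary kriging with per-observation error variances on the diagonal.

    err_var all zero      -> OK (exact at observation points)
    err_var constant      -> FK-like filtering
    err_var heterogeneous -> HFK-like filtering
    Returns the estimate and its kriging variance.
    """
    n = len(obs_val)
    a = [[cov(math.dist(obs_xy[i], obs_xy[j]), sill, rng) for j in range(n)] + [1.0]
         for i in range(n)]
    for i in range(n):
        a[i][i] += err_var[i]          # noise enters only the data covariance
    a.append([1.0] * n + [0.0])        # unbiasedness constraint: weights sum to 1
    b = [cov(math.dist(p, target), sill, rng) for p in obs_xy] + [1.0]
    sol = solve(a, b)
    w, mu = sol[:n], sol[n]
    est = sum(wi * zi for wi, zi in zip(w, obs_val))
    var = sill - sum(wi * bi for wi, bi in zip(w, b)) - mu
    return est, max(var, 0.0)
```

With `err_var` set to zero, the estimate at an observation point equals the observation and the kriging variance vanishes, as described for OK. With nonzero error variances, the observed value is replaced by a weighted combination with its neighbors and the variance stays positive, which corresponds to the filtering and the nonzero uncertainties at observation points noted for FK and HFK. Observations with larger error variance receive smaller weights.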
The southwestern corner of the study area is not observed by CryoSat-2 in SARIn mode, but only in LRM mode. The consequent data gap is filled via extrapolation by the different interpolation techniques. OK is not recommended for extrapolation because the extrapolated values approach the data mean (Chilès and Delfiner 2012). This is reflected in the corresponding higher standard uncertainties. IDW extrapolates slightly better than OK and FK. Although HFK is based on OK, it performs best in extrapolation as well.
Fig. 12 Resulting height changes (left), true error (center) and standard uncertainties (right) for the noisy data set with 2,000 m cell diameter and nine-parameter local topography model. The interpolation methods are indicated in the top left corner of the left-hand plots
The comparison of different interpolation methods shows that HFK is best suited for application to height changes derived from satellite altimetry. In particular, the filtering improves the results substantially. Simpler geostatistical methods such as OK do not necessarily outperform other approaches such as IDW.

Uncertainties
The kriging standard uncertainties of the four interpolation methods are investigated to obtain more information about their reliability. As in Sects. 4.1 and 4.2, the relationship between true error and the standard uncertainties is analyzed. In Fig. 13, this relationship is shown for all cell sizes and interpolation methods. The focus here is on the distinction between the different interpolation approaches. Therefore, the different topography models are not plotted separately.
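A comparison of this kind can be sketched as follows. The sketch assumes that the error statistic per uncertainty bin is the root mean square of the true errors and that bins have equal width. Both are illustrative choices and may differ from the statistics plotted in the figures, and the function name is hypothetical.

```python
import math

def binned_error_vs_uncertainty(uncert, true_err, bin_width):
    """Group cells by predicted standard uncertainty; return per-bin RMS true error.

    Each row is (bin center, RMS of true errors, number of cells in the bin).
    """
    bins = {}
    for u, e in zip(uncert, true_err):
        bins.setdefault(int(u / bin_width), []).append(e)
    rows = []
    for k in sorted(bins):
        errs = bins[k]
        rms = math.sqrt(sum(e * e for e in errs) / len(errs))
        rows.append(((k + 0.5) * bin_width, rms, len(errs)))
    return rows
```

For a well-calibrated uncertainty estimate, the RMS error in each row is close to the bin center. Rows with larger RMS error indicate underestimated uncertainties, and rows with smaller RMS error indicate overestimated ones. The cell count per row shows how well each bin is populated.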
IDW has a rather linear relationship, except for the observed points, where the RAA value is maintained and the uncertainty set to zero. The standard uncertainties underestimate the errors.
The inconsistency between error and uncertainties of RAA results observed in Fig. 7 for cells with a diameter of 2,000 m and less propagates to inconsistencies in the OK, FK and HFK uncertainty estimates. This can be seen in Fig. 13a-c, where no simple linear relationship between uncertainties and errors is visible. For cell diameters of 2,000 m and more, the plotted OK and FK relations are more scattered than those for HFK and IDW. While the uncertainties of OK and FK for cell diameters of 2,000 m and below overestimate the errors, the opposite happens for cell diameters of 3,000 m and larger. Similar to IDW, the observation points of OK are assigned an uncertainty value of zero. FK and OK show very similar behavior in the relationship of error and uncertainty analyzed here.
The HFK uncertainties estimated with data based on cell diameter of at least 3,000 m represent the actual errors very well up to approximately 0.25 m year⁻¹. Higher uncertainties underestimate the errors. The best accordance of errors and estimated uncertainties is achieved with HFK for 3,000 m and 4,000 m cell diameter.
This investigation shows that standard uncertainties should be handled with care. The values at observation points for OK and IDW in particular are not meaningful. HFK improves the reliability of the uncertainty estimate.

Conclusions
To obtain reliable information about the performance of different RAA configurations and interpolation methods, a synthetic data set was created in order to compare the derived height changes with known true values.
It was shown that the RAA results differ depending on the cell size and topography parametrization. In these investigations, the models with six and nine topography parameters lead to good results. The smallest analyzed cell size of 500 m does not cope well with the induced random errors, while cells with a diameter larger than 4,000 m can lead to larger errors than those with smaller cells. The best results using parametrized models are achieved with a cell diameter of 3,000 m and nine topography parameters. The results of DEM subtraction in RAA depend very much on the quality of the DEM; the better the DEM represents the actual sampled topography, the better the results. ArcticDEM and TanDEM-X DEM (assuming true topography), in conjunction with cells of 3,000 m and greater diameter, provide the most accurate RAA results among all options tested. As the agreement of the DEM with the topography is difficult to assess when real data are used, DEM subtraction should be applied carefully. The a posteriori standard errors of RAA are reliable for cells with at least 3,000 m diameter, and can be used for filtering with FK and HFK.
The variograms used for kriging focus on short distances, as this gives the best results for interpolation and reliable uncertainties. The subsequent interpolation was accomplished with IDW, OK, FK and HFK. OK and IDW performed comparably well. The resulting height changes are improved by the filtering included in the FK and HFK algorithm. The best results are achieved by incorporating heterogeneous errors with HFK. Additionally, the corresponding standard uncertainties are reliable, and their spatial patterns reflect actual errors.
In this study, linear height change is the parameter of interest. This is a simplification of the real process, as interannual signals are present. They can be resolved by RAA and included in spatiotemporal interpolation.
Further research based on other regions and satellite missions, especially pulse-limited radar data, could expand the applicability of these results. Additionally, the influence of outlier criteria and selection of points used for interpolation can be elaborated. Stacked variograms are another approach to cope with the different behavior apparent at different spatial scales that would be worth considering.
Based on these investigations, HFK can be recommended to achieve full spatial coverage of height changes from satellite altimetry measurements derived by RAA, as the results show the smallest error and least speckle, and provide meaningful and reliable uncertainties.
Agency for providing the altimetry data products. ArcticDEM was provided by the Polar Geospatial Center under NSF-OPP awards 1043681, 1559691, and 1542736. All figures were made with Generic Mapping Tools (GMT) (Wessel et al. 2013). We thank the two anonymous reviewers and the editor for their comments, which helped to improve the manuscript.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.