The use of B-splines to represent the topography of river networks

This work presents a new extension to B-Splines that enables them to model functions on directed tree graphs such as non-braided river networks. The main challenge of the application of B-splines to graphs is their definition in the neighbourhood of nodes with more than two incident edges. Achieving that the B-splines are continuous at these points is non-trivial. For both, simplification reasons and in view of our application, we limit the graphs to directed tree graphs. To fulfil the requirement of continuity, the knots defining the B-Splines need to be located symmetrically along the edges with the same direction. With such defined B-Splines, we approximate the topography of the Mekong River system from scattered height data along the river. To this end, we first test and validate successfully the method with synthetic water level data, with and without added annual signal. The quality of the resulting heights is assessed besides others by means of root mean square errors (RMSE) and mean absolute differences (MAD). The RMSE values are 0.26 m and 1.05 m without and with added annual variation respectively and the MAD values are even lower with 0.11 m and 0.60 m. For the second test, we use real water level observations measured by satellite altimetry. Again, we successfully estimate the river topography, but also discuss the short comings and problems with unevenly distributed data. The unevenly distributed data leads to some very large outliers close to the upstream ends of the rivers tributaries and in regions with rapidly changing topography such as the Mekong Falls. Without the outlier removal the standard deviation of the resulting heights can be as large as 50 m with a mean value of 15.73 m. After the outlier removal the mean standard deviation drops to 8.34 m.


Introduction
In geosciences, observations are often available either on the two-dimensional (2-D) surface of the Earth or as a one-dimensional (1-D) time series at a specific spatial location, or as a combination of both. B-splines (basic splines) are often used to describe the physical relationships underlying these observations in a purely mathematical model. B-Splines are piecewise polynomial basis functions with limited support and certain continuity properties (Prautzsch et al. 2002).
In the last decades a large number of modifications brought the 1-D B-spline modelling approach to higher dimensions. To be more specific, whereas so-called endpoint-interpolating B-splines allow for modelling a 1-D target function defined within a finite interval (Stollnitz et al. 1995a, b), trigonometrical B-splines are defined on a circle. Consequently, the above mentioned 2-D observations globally distributed on the Earth's surface can be represented by a series expansion in tensor products of endpoint-interpolating B-splines for describing the latitude dependency and trigonometric B-splines for describing the longitude dependency (Schumaker and Traas 1991;Jekeli 2005;Schmidt et al. 2015). As described e. g. by Schmidt (2012), the tensor product approach can even be extended to a representation of a multi-dimensional (n-D) signal. As two examples, the vertical total electron content and the electron density of the ionosphere are represented as a 3-D and a 4-D series expansion (e.g. Durmaz and Karslioglu 2011;Schmidt et al. 2015;Goss et al. 2019); other examples are rainfall interpolation (Gallo et al. 1998), air pollution (Leitenstorfer and Tutz 2007), and magnetic field modelling (Jiang and Zhang 2013) just to name a few.
In other applications continuous quantities are only observable along a network such as a river system or a road network embedded in a 2-D surface. Air pollution in cities, for example, is often only measurable along the street network; the continuously changing water temperature of a river can be monitored only along the same (Laaha et al. 2013). Another example is the topography along river networks which is the topic of this work. The height of the mean water level and the topography of the shore line change continuously along the path of the river, water falls excluded. At river junctions, the height in all branches are continuous albeit more or less independent from each other.
In this paper we expand endpoint-interpolating B-splines to facilitate modelling on a river network. We prove the concept with the estimation of the topography along the Mekong River system from water level observations of satellite altimetry. Also Villadsen et al. (2015) and Bercher et al. (2013) estimated river topography from altimetry observations but only with low degree polynomial functions which is not feasible for a more complicated topography or large river systems. Also, neither of them considered tributaries of the river network. Zakharova et al. (2020) also used B-Splines for the topographic modelling from altimetric observations, although only along a limited stretch of the main river. The here presented B-spline method has already been successfully applied to model the non-constant unknown mean in Universal Kriging (Boergens et al. 2019).
In the sequel, we will first provide an introduction to endpoint-interpolating Bsplines in Sect. 2 followed by our extension to B-splines on directed tree graphs Sect. 3. In Sect. 4 we apply the new method to estimate the river topography for the Mekong River system. The manuscript closes with a discussion and a conclusion in Sect. 5 where also the application of the method to other possible graphs is considered.

Endpoint-interpolating B-splines
The normalised B-spline functions B m k (x) of degree m ∈ N and shift k = 0, 1, . . . , K − 1 depending on the variable x ∈ R 1 are calculable recursively via the relation see e.g., Stollnitz et al. (1995a, b) or (Schmidt 2007). The positive integer value K defines the number of given non-decreasing values t 0 , t 1 , . . . , t K +m+1 , denoted as knots. A B-spline is compactly supported, i.e. its values are different from zero only in a finite range on the real axis. Since B m k (x) = 0 for t k ≤ x < t k+m+1 and B m k (x) = 0 otherwise, this finite range corresponds to an interval [t k , t k+m+1 ].
For regional modelling, i.e. x ∈ [x l , x r ], where x l and x r with x r > x l denote the left and right finite boundary values, we introduce the endpoint-interpolating B-splines of degree m ; see e.g. (Stollnitz et al. 1995a, b). For that purpose we set the first m + 1 knots to the value x l and the last m + 1 knots to the value x r . Hence, the knot sequence for endpoint-interpolating B-splines of degree m is given as where the distances x k = t k+1 − t k for k = m, . . . , K − 1 and m < K between two neighbouring knots could vary. Note that in Eq. (1), the factors are taken as zero if their denominators are zero. As can be seen from Fig. 1, the endpoint-interpolation modifies the first and the last m B-spline functions which have a different shape than the other B-spline functions. In the following we use the endpoint-interpolating B-splines of degree m = 2denoted as quadratic B-splines -for k = 0, . . . , K − 1. Each of the these functions is different from zero only in the sub-interval and thus, called is a localising function. Considering the set of the altogether K Bspline functions the condition holds. Figure 1 shows the K = 6 B-spline functions B 2 k (x) within the interval [x l = 0, x r = 10]. The black dots on the x-axis mark the altogether 9 knots: the first 3 knots t 0 = t 1 = t 2 = x l = 0 at the left boundary, the regular knots t 3 to t 5 as well as the last 3 knots t 6 = t 7 = t 8 = x r = 10. The red-coloured B-spline B 2 3 (x) is different from zero in the interval [t 3 , t 6 ] according to Eq. (4). As can be seen the distances x k = t k+1 − t k for k = 2, . . . , 5 are different, since the knots are non-equally spaced along the x-axis within the interval [0, 10].

B-splines on directed tree graphs
B-splines are well defined in R 1 as was shown in the previous section, as well as to higher dimensions R n for n > 1 as was mentioned in Sect. 1; see also Schmidt (2012). In this paper we show how to use B-splines on a directed tree graph G embedded in a metric space (R 2 , d). We can think of d as the Euclidean or spherical metric. We will demonstrate this using the example of river topologies.
In order to apply our method to the river topography, we model the geographical river shape as a directed tree graph denoted as G.
The river network is usually provided as polygonal chains describing the shore line, i.e. both the left and right river bank are given. In a first step, these polygonal chains are skeletonised (Jiang et al. 2011). The reaches between two confluences are discretised with less than 1 km distance. This yields a planar embedding of a graph G such that vertices of the skeletons correspond to vertices of G. In addition, we introduce vertices for both the upstream ends and the downstream ends of all tributaries as well as for the downstream end of the river. We direct edges in flow direction and obtain a directed, acyclic, planar graph in which each vertex has out-degree 0 or 1. Out-degree is the number of outgoing edges in a vertex, i. e. an out-degree of 0 or 1 means that we do not consider multi-channel and braided rivers. We call such graphs directed tree even though this term is usually used for digraphs which are directed in the opposite way. A vertex with more than one incoming edge is called junction vertex.
To realise B-splines on a directed tree G, the knots t k have to be placed on the branches of G, which is the part of G between two junction vertices. By identifying G with its image under the planar embedding, we define the B-Spline function f : where α k are the B-Spline coefficients. Figure 2 displays a schematic example of B-Splines in the neighbourhood of such a junction. As can be seen in the figure, a B-spline centred below the junction can reach up into the upper branches and vice versa.
The difficulty of applying B-splines on directed graphs arises at a junction as we aim at a continuity of the function f . As explained in Sect. 2 and shown in Fig. 1 the shape and width of a B-spline depends on the distance between the knots. Thus, to ensure a continuous function across a junction the knots have to be placed symmetrically on the two upper branches. As written before, a B-spline centred below the junction reaches in both upstream branches and the distance between the B-spline knots define the shape of the spline. Such a junction crossing B-Spline (e. g. the red spline in Fig. 2) can be either defined by the knots downstream of the junction together with the knots on the left upstream branch or with the knots on the right upstream branch. However, the shape of the B-spline downstream of the junction should be the same for both alternatives. This can only be ensured by placing the knots upstream of the junction symmetrically on both branches. Additionally, a function f (x) (see Eq. 6) modelled with such defined B-splines will be continuous both for the path downstream-left upstream branch and for the path downstream-right upstream branch.
These considerations lead to the following rules for placing the B-Spline knots t k : 1. At the sink, according to endpoint-interpolating B-splines (see Sect. 2); 2. At all sources, according to endpoint-interpolating B-splines as well; 3. At each junction vertex; 4. Along the branches equidistant to the next lower junction vertex with a fixed width.
The distances between each two knots have to be chosen carefully such that enough knots can be placed on each branch. The interval [t k , t k+m+1 ] of a B-spline B m k (x) of degree m = 2 covers m + 2 = 4 knots and thus, at least m = 2 knots need to be placed symmetrically on the two branches. Thus, the distance d k between the knots needs to obey The dense spatial resolution of G allows to place all knots on vertices with a reasonable accuracy.

B-splines for river topography modelling
In this section we will demonstrate the usage of B-Splines on graphs for modelling the river topography including tributaries. The mean river heights in a river basin cannot in general be described with any polynomial function due to a rapidly changing river slope. The topography of the main river and of the tributaries upstream of the confluence have to be modelled separately while ensuring continuity at the junctions.
In the here presented experiment we use satellite altimetry data of various missions to measure the river topography and to estimate the B-spline coefficient for the whole basin. As a test case we chose the Mekong River Basin in South-East-Asia.

Method
For this study the B-spline knots t are automatically placed along the main river and the major tributaries of the Mekong River following the rules above (c. f. Sect. 3). The maximum distance between two knots is 30 km, which we found an appropriate compromise between the river reaches with dense and sparse data coverage. We excluded river reaches upstream of dams, as they constitute a discontinuity of the river topography. This results in 221 knots in the whole river basin. Figure 3(a) displays the knot placement in the whole river basin which is zoomed in to show the placement around two river junctions in Fig. 3(b). Most notably is the smaller distance downstream of the left junction compared to the distance of the knots in the two upstream branches. Whereas, around the right junction the distances between the knots in the two upstream branches are nearly the same to the downstream knot. The such defined knots lead to the definition of B-Splines following Eq. 1, i. e. 221 B-Splines. With these B-Splines we can interpolate any function on the graph G by estimating all B-Spline coefficients α k with a sufficient number of observations of f (x) (c. f. Eq. 6). In the two following experiments, we have around 10 000 height observations for f (x) for which we assume equal uncertainty in a least-square-adjustment.

Synthetic data
For the synthetic data set we simulate water level heights along the Mekong River and its tributaries based on the HydroSHEDS topography model (Lehner et al. 2008). The simulation consists of three steps: First, we choose to simulate the water levels at the same location as the altimetry observations in the second experiment (see Fig. 6). However, the location of the river in HydroSHEDS and the altimetry observation do not always coincide. Thus, for each point we use the minimum height in HydroSHEDS in a 2' by 2' window around the location. These heights are mostly located on the river but still do not form a more or less continuous height profile along the river. Thus, secondly, we employ these heights to estimate a first set of B-Spline coefficients, with which we gain interpolated improved heights at the locations. The such derived heights are taken as the "true" river topography. Unfortunately, this method does not ensure monotonous rising heights along the river.  In the third and last step, we manufacture two synthetic data sets. For the first only noise is added to the true heights to simulate measurement uncertainties. We choose a constant normal distributed noise level with a standard deviation of 1 m. For the second synthetic data set, we add besides the noise an additional annual sinusoidal signal to the heights with a constant amplitude of 5 m. To determine the annual signal, every location gets a random date. It should be noted that the choice of seasonal amplitude has an influence on the quality of the results with higher amplitudes decreasing the quality. However, an amplitude of 5 m is for most parts of the Mekong River basin an upper bound to the amplitudes observed in reality.

Results
The two synthetic data sets, with and without annual signal, are used to estimate the B-Spline coefficients which are then employed to model the river topography. The resulting heights of the data set with annual signal are displayed in Fig. 4 (a). We refrain from additionally showing the results from the data set without annual signal, as differences are not visible at this resolution. In Fig. 4 (b) the differences between the estimated heights and the true heights are shown. Most differences are below 1 m but can be as large as 9 m. The differences tend to be larger in reaches with sparse data coverage and at the upstream ends of tributaries. Besides this, no systematic effect is observable.
Further, we employed quality measures to compare the estimated heights with the true heights: the root mean square error (RMSE), correlation coefficient (ρ), the mean absolute difference (MAD), and both the standard deviation and maximum value (std(AD) and max(AD)) of the absolute differences (AD). The results of this validation is summarised in Table 1. For both synthetic data sets the correlation is nearly perfect.
The RMSE is as low as only 26 cm for the case without added annual signal and about 1 m with added annual signal. The MAD values in comparison are lower showing that a few very large differences distort the RMSE to larger values. The std(AD) of the synthetic data without added annual signal is significantly smaller even while the maximum observed absolute differences are very similar for both synthetic cases. Lastly, we investigate in more detail the height estimation in two river reaches: The Tonle Srepok in the South-East, and the upstream main river reach South of Luang Prabang (see Fig. 3). For these two river reaches, we plot the true heights, the synthetic heights, and the resulting estimated topography for both data sets in Fig.  5. The resulting topography estimated with the synthetic data without added annual signal shows nearly no difference to the true heights in both investigated reaches. On the other hand, the results of the data with annual signal reveals an interesting point. In both reaches some observations close by each other seem to have a similar phasing (Tonle Srepong between 500 km and 530 km and between 750 km and 780 km, upstream between 1580 km and 1625 km) which leads to a uniform shift of the data and, thus, the resulting estimated topography. Further, we can see the continuity of the estimated topography across the river junctions (marked in grey in the figure). Only at the second junction of Tonle Srepok a small step is visible. However, around the junction only few observations are present which might lead to this feature.

Altimetry data
We use river water level height data of the satellite altimetry missions Envisat  -2 (2010-present). The missions Envisat, SARAL, Jason-2, and Jason-3 are so-called short-repeat orbit missions which lead to water level height time series at fixed locations with a temporal resolution of 10 to 35 days but suffering on a sparse spatial resolution. The other missions are either on non-repeat orbits (Envisat EM and SARAL DP) or on long-repeat orbits (CryoSat-2, repeat time 369 days). This leads to a very dense spatial coverage.
All data have been processed with DAHITI (Database for Hydrological Time Series of Inland Waters, (Schwatke et al. 2015)), except for the CryoSat-2 data which were processed following the method described by Boergens et al. (2017). The uncertainty, expressed as standard deviation, of the altimetric observations depends on the width of the river as well as the observing satellite mission and is estimated to be between 10 cm and 1.5 m. Figure 6 displays the spatial distribution of the data in the river basin. Here the dense spatial distribution of the long and non-repeat orbits missions (visualised by black symbols ) compared to the sparse distribution of the short-repeat orbit missions (visualised by red-coloured squares) is visible. Fig. 7(a) shows the resulting estimated topography heights at all points of the topology. The B-Spline modelled topography mostly fits well to the topography model. However, at some points the estimated heights do not correspond with the surrounding heights. Most prominently, many upstream end points of tributaries are not correctly modelled, which is likely caused by missing altimetry data in these reaches. Also in river reaches where only sparse altimetry data is available, especially with only a single observations, the quality of the estimated heights deteriorates (see Fig. 6) as the estimation of α k in Eq. 6 gets numerically unstable. In these parts a denser or more evenly distributed data coverage could improve the results. Additionally, the abrupt height change around the Mekong Falls (13.95°N, 105.94°E) cannot be modelled sufficiently with only the sparse data available. To correctly model such a reach a dense data coverage would be needed and possibly more B-spline knots to enable the modelling of small scale changes. Consequently, in these reaches also the estimated standard deviation of the heights increases ( Fig. 7(b)). The standard deviations are estimated from the a posteriori uncertainties, as no globally reliable data uncertainties are available. Based on these observations we removed points with a standard deviation larger than 25 m and points with less then 50 altimetry observations in their 50 km neighbourhood. However, this leads to gaps in the heights along the river. The result of this procedure  w.r.t. the estimated heights is shown in Fig. 7(c). The mean of all standard deviations is 15.73 m before the outlier removal and 8.34 m afterwards. As in the synthetic data experiment, we also investigate the estimated heights for two river reaches in more detail in Fig. 8. Two things are notably along the Tonle Srepok. First, the dip in river topography around 650 km which is not explained by the altimetric observations. As we saw the same decline in the synthetic data experiment this might be caused by the knot distribution in this region causing unexpected behaviour in the B-Splines. Second, the step around the second junction is visible in the estimated topography, which we also already observed in the synthetic data results. Again, this might be caused by either the knot placement or the missing altimetry data around the junction. It might also be possible, that the river topography in changing that rapidly in this region such that the B-Splines cannot model it sufficiently.

Discussion and conclusions
This study introduces a method which allows to apply B-splines on directed tree graphs. This procedure allows for modelling a continuous quantity along graphs. Most notably, it ensures the continuity of the B-splines around junction vertexes, i. e. vertices with more than one incoming edge. This is established by a symmetrical knot placement on the two incoming edges incident to the same vertex. To allow for a proper modelling of all sources and the sink of the graph we applied endpoint-interpolating B-splines.
We demonstrate the new method to the example of topographic modelling along a non-braided river network. Thus, the knots defining the B-splines have to be placed symmetrically on both river reaches upstream a confluence. The method was successfully tested to estimate mean water levels along the Mekong River system from both synthetic and altimetric water level observations. The synthetic data case allows a validation of the estimated river topography. The validation proves the suitability of the method for topographic modelling. Even if annual water level variations are not removed from the observation, the method is able to estimate an accurate topography. Further, we found in additional investigations that the noise level of synthetic altimetry observations if of minor influence to the quality of the resulting topography. This proves the stability of the proposed method. In the second investigation, we employed altimetric water level observations. Here the main challenge are outliers in the data set which needs to be removed. Further, for both the synthetic and real data set cases the unevenly distributed observation hinder the topography estimation in regions with sparse data coverage. Another problem of real altimetry data can be the systematic biases between different satellite altimetry missions. To this end, we used cross-calibrated corrections provided via OpenADB (https://openadb.dgfi.tum. de/en/) following the method of Bosch et al. (2014) to remove these biases from the altimetry data before hand. Alternatively, one could add an additional bias parameter for each satellite mission in the parameter estimation of the B-Splines coefficients. However, this would imply that the biases remain per mission constant over space and time, which is not necessarily the case.
In this study we only considered B-splines on directed tree graphs, but the question arises if it is possible to extend these ideas to more general graphs.
First, let us again consider a tree shaped graph (i.e. one or three edges in each vertex), but investigate the directionality of the whole graph. As mentioned in Sect. 3 the direction of the directed tree is of no importance for the proposed approach. In the next step, we can removeleave out the directionality altogether. Such an undirected tree graph might , for example, be used to model the density of biological specimens in a river network. So far, the directionality of the graph implies a hierarchy of the edges at each vertex with B-spline knots symmetrically distributed along two of the three edges. This is because B-Splines centred below the junction reach into both upstream reaches, but B-Splines centred upstream only reach into the downstream reach and not the second upstream reach. If the directionality is left out, B-splines centred on any of the three edges reach into both other edges. Thus, the knots need to be placed symmetrically around the vertex on all three edges. With this we know how to place the B-Spline knots locally around each junctionvertex but not how to combine the knot placement around several connected vertices. So far, in the case of the directed graph, GEM -International Journal on Geomathematics (2021) 12 :21 d Fig. 9 Knot placement around junctions in a non-directed graph the non-symmetry of the knot placement on the downstream branch allowed to place the B-Spline knots equidistant on each branch starting on its downstream end. This leads to a possible smaller distance between the last knot below the next junction and the junction (see the knots in Fig. 2). In the case of the non-directional graph this is no longer a suitable solution.
A possibility for a non-directional graph is shown in Fig. 9. With the aid of this example, we will provide you with an ideashow you how to proceed with the knot placement. Around each vertex (large coloured dots) we placed locally equidistant knots (small coloured dots). Let us assume, we choose the same distance d for the equidistant knot placement around all vertices (d will be later defined). In the case of an edge length larger than four times d (e. g. edge between red and yellow vertices) the B-Spline knots are simply placed symmetrically around the red and yellow vertex. The remaining space on the edge is then filled up with as many knots as needed ( Fig.  9 black dots). In the case of an edge length of exactly four time d (green and yellow vertices) we again place the knots symmetrically around each vertex. However, this time, one knot overlaps from each vertex (the green-yellow dot in the example). For even shorter edges, it is also possible to have two overlapping knots. This leads us to the definition of d which has to chosen such, that all edged can be divided in three or four d long pieces plus an optional remaining piece. Theoretically, it is not necessary to use the same d around each vertex although this simplifies the automatisation of the method.
The same idea can be even further expanded to cyclic graphs. As long as the knots are place symmetrically around the vertices the proposed method can be applied.

Conflict of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.