5.1 Introduction to River Networks

What do a tree and a river have in common structurally? The answer is rather obvious: a river’s skeleton on a map is a tree-like structure. But why are rivers tree-like in structure?

Suppose we need to send water from many places (the black dots in Fig. 5.1) lying within an extended area to a well (the red dot at the lower left corner). Different strategies may be devised to achieve this goal. The following example shows that a tree-like structure is an efficient structure, in fact the most efficient one, for conveying matter (or energy) from an extended source to a single outlet.

Fig. 5.1 Three structures with different efficiency in transferring water

Strategy 1 (Fig. 5.1a): each point sends its share to a selected nearest neighbor according to a given pattern (e.g. a Hamiltonian walk). This system is globally efficient, as the shipment follows an ordered path that does not allow the superposition that would lower the efficiency of the system. On the other hand, it is locally inefficient: for instance, water from the point immediately north of the destination could flow along a much shorter southward path instead of making a long detour.

Strategy 2 (Fig. 5.1b): each point individually chooses the shortest route, which makes the system locally efficient but globally very costly, as the total distance traveled by the entire system is much greater than in Strategy 1 because of the many superpositions (dashed as well as solid lines are used in the figure to enhance clarity).

Strategy 3 (Fig. 5.1c): a reasonable trade-off between the two opposite strategies is provided by the tree-like structure, in which each point attempts to find the shortest path but repetition is avoided through a hierarchical construction, so that global efficiency can be achieved. A quantitative mathematical proof of this statement can be obtained by showing that a tree-like structure is the one that minimizes both the average individual path and the total traveled distance. In this respect, a river network is a two-dimensional projection of the three-dimensional tree-like landscape morphology, composed of the collection of all the paths formed by every tributary of the main river in its drainage basin; it is not necessarily (although frequently) related to the actual river flowing on that landscape in real geographic space.

Rivers usually appear as irregular networks on maps (Fig. 5.2). They are viewed as the skeleton of the terrain (Chen et al. 2012) and play important roles in multiscale representations on maps (Li et al. 2010; Génevaux et al. 2013). Recently, attention has been paid to the geographical features of river networks in the process of map generalization (Ai et al. 2006; Stanislawski 2009; Buttenfield et al. 2010), following the idea that “generalization is not a mere reduction of information–the challenge is one of preserving the geographic meaning” (Bard and Ruas 2005). Therefore, this chapter focuses on approaches for describing and generalizing river networks, and leaves aside factors such as topography, soil and bedrock that influence the patterns of river networks, which have been studied in detail for decades (Twidale 2004; Charlton 2007; Ritter 2016).

Fig. 5.2 River system: irregular networks on maps

5.2 Descriptions of River Networks

Water bodies on the Earth include natural and man-made objects such as rivers, lakes, seas, wells, fountains, reservoirs, ponds and ditches. Because rivers are the terrain lines or character lines that “control” the other features on maps, only rivers are considered here, and this section addresses how to represent and describe river networks.

A river basin consists of its mainstream and tributaries. Rivers on maps can be classified into ten categories according to their geometric patterns, i.e. tree-like, grid-like, feather-like, parallel, radial, fan-like, comb-like, net-like, ring-like and centripetal (Fig. 5.3).

Fig. 5.3 Various patterns of river networks on maps

Tree-like river basin: the mainstream and its tributaries compose a tree-like graph. This is the most common kind of river basin; examples are the Guijiang River and the Yujiang River in Guangxi Province, China.

Grid-like river basin: the mainstream and the tributaries intersect each other perpendicularly and form a grid-like graph. A typical example of a grid-like river basin is the Minjiang River in China.

Feather-like river basin: the tributaries are evenly distributed on the two sides of the mainstream, and the graph of the river on the map looks like a feather.

Parallel river basin: the tributaries are approximately parallel and flow into the mainstream.

Fan-like river basin: the outline formed by the mainstream and the tributaries is like a fan, for example the Haihe River and the Yongdinghe River.

Comb-like river basin: most of the tributaries are on one side of the mainstream and few or none are on the other side, which makes the graph of the river basin look like a comb.

Radial river basin: such river basins usually appear around the volcanoes.

Centripetal river basin: they appear in depressional areas.

In the literature, river basins on maps are described using measures and parameters, and by special methods for specific purposes; both are addressed in the following sections.

5.2.1 Measures and Parameters

Many measures and parameters have been developed for geometrically describing rivers.

  • Density of the river network

It can be calculated by Formula 5.1.

$$ \mathrm{D}=\frac{\sum L}{F} $$
(5.1)

Where, D is the density of the river network; ∑L is the total length of the tributaries and the mainstream of the river network; and F is the area of the river basin.

  • Development coefficient of the river basin

It is a ratio between the total length of the tributaries and that of the mainstream.

$$ {M}_{coe}^i=\frac{\sum {l}_i}{L_i} $$
(5.2)

Where, \( {M}_{coe}^i \) is the development coefficient of the i-th order tributaries of the river network; \( {L}_i \) is the total length of the i-th order mainstreams of the river network; and \( \sum {l}_i \) is the total length of the i-th order tributaries of the river network.

The greater \( {M}_{coe}^i \), the greater the total length of the tributaries relative to that of the mainstream, and the more developed the river is.

  • Nonuniform coefficient of the river basin

It is the ratio of the total length of the tributaries on the left side of the mainstream and that on the right side of the mainstream. It can be calculated by Formula 5.3.

$$ {B}_{coe}=\frac{\sum L}{\sum R} $$
(5.3)

Where, \( {B}_{coe} \) is the nonuniform coefficient of the river basin; ∑L is the total length of the tributaries on the left side of the mainstream; and ∑R is the total length of the tributaries on the right side of the mainstream.

The greater the difference between \( {B}_{coe} \) and 1, the more unbalanced the tributaries on the two sides of the mainstream.

  • Ratio of the lakes/marshes

It is the ratio of the area of lakes/marshes and the total area of the river basin.

$$ {M}_{ratio}=\frac{\sum M}{F} $$
(5.4)

Where, \( {M}_{ratio} \) is the ratio of the lakes/marshes in the river basin; ∑M is the total area of the lakes/marshes; and F is the total area of the river basin.

  • Length of the river segment

Rivers on digital maps are usually expressed using coordinates, and therefore the length of each river segment can be calculated by Formula (5.5).

$$ \mathrm{d}=\sum \limits_{i=1}^n{\left[{\left({x}_{i+1}-{x}_i\right)}^2+{\left({y}_{i+1}-{y}_i\right)}^2\right]}^{1/2} $$
(5.5)

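where d is the length of the river segment; \( ({x}_i,{y}_i) \) are the coordinates of the i-th point on the river segment; and the river segment is represented by n + 1 points, so that the sum runs over its n straight-line pieces.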

  • Average length of the river segments of the river basin

It is a statistical value. If a river basin consists of m river segments, its average length of the river segments can be calculated by Formula 5.6.

$$ \overline{d}=\left(\sum \limits_{k=1}^m{d}_k\right)/\mathrm{m} $$
(5.6)

Where, \( \overline{d} \) is the average length of the river segments of the river basin; and \( {d}_k \) is the length of the k-th river segment.

  • Angle between two river segments

About 88% of the angles between river segments are acute (De Serres and Roy 1990), which statistically characterizes river segments at intersections. Such angle values can be used in river classification because the angles differ among classes of rivers; e.g. the angles of tree-like rivers are acute, whereas those of grid-like rivers are approximately right angles.

  • Meandering coefficient

It describes the degree of curvature of a linear river. In its simplest form, the meandering coefficient of a river is the ratio between the total length of the river and the straight-line distance from its start point to its endpoint.

  • Flow direction of the river

The flow direction of the river indicates the terrain of the river basin. The flow direction of the mainstream is the predominant direction, which indicates the tendency of the river network. Different categories of rivers can be differentiated according to their flow directions; for example, the flow directions of the mainstreams of a parallel river system are similar, and those of their first order tributaries are also similar.

  • The number of the classes of the river basin

It means the number of the classes of the tributaries and the mainstream of the river basin.

  • Interval of the river segments

It is the distance between the two adjacent river segments. This parameter indicates the density of river segments in an area. This value is usually obtained by measuring the distances of the river segments at the same side of a mainstream. It may be either the distance between two approximately parallel river segments, or the distance between two non-parallel river segments. Apparently, the latter is more common in the geographic space and more difficult to calculate.

  • Watershed and catchment area

The watershed of the river basin is the area that the tributaries flow through.

The catchment area is the area of the watershed.
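Several of the measures above translate directly into code. The following is a minimal sketch, assuming that each river segment is stored as a list of (x, y) coordinates in a planar coordinate system; the function names and the toy data are hypothetical and chosen only for illustration.

```python
import math

def segment_length(points):
    """Length of a polyline given as [(x, y), ...] (Formula 5.5)."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def mean_segment_length(segments):
    """Average length of several river segments (Formula 5.6)."""
    return sum(segment_length(s) for s in segments) / len(segments)

def network_density(segments, basin_area):
    """Total channel length divided by the basin area (Formula 5.1)."""
    return sum(segment_length(s) for s in segments) / basin_area

def development_coefficient(tributaries, mainstream):
    """Total tributary length over mainstream length (Formula 5.2)."""
    return sum(segment_length(s) for s in tributaries) / segment_length(mainstream)

def nonuniform_coefficient(left_tributaries, right_tributaries):
    """Left-side over right-side tributary length (Formula 5.3)."""
    left = sum(segment_length(s) for s in left_tributaries)
    right = sum(segment_length(s) for s in right_tributaries)
    return left / right

def meandering_coefficient(points):
    """River length over the straight-line distance between its end points."""
    return segment_length(points) / math.dist(points[0], points[-1])

# Hypothetical toy data in arbitrary map units.
mainstream = [(0, 0), (3, 4), (6, 0)]   # two straight pieces of length 5
left_trib = [(3, 7), (3, 4)]            # length 3
right_trib = [(3, -2), (3, 4)]          # length 6

print(segment_length(mainstream))
print(meandering_coefficient(mainstream))
print(nonuniform_coefficient([left_trib], [right_trib]))
```

Which tributaries count as left or right is taken as given here; the sketch only sums the lengths it is handed.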

5.2.2 Methods for Describing River Networks

A measure or a parameter gives only a simple description of a river network, indicating a single aspect of the river, whereas the methods addressed below can usually provide systematic and comprehensive descriptions of rivers.

5.2.2.1 Information Entropy of River Networks

The informational entropy of river networks, consistent with the definition given by Shannon (1948) in information theory, was defined by Fiorentino and Claps (1992) as in Formula 5.7.

$$ \mathrm{s}=-\sum \limits_{\delta =1}^{\varDelta }{P}_{\delta}\ln {P}_{\delta } $$
(5.7)

where s is the information entropy and \( {P}_{\delta } \) is the relative probability of a given state δ.

In the definition of the informational entropy of a river network, the network is considered as a system in which stream segments (links) are the elements whose placement characterizes the system configuration. A network link is the path connecting two junctions, and its topological distance from the outlet, i.e. the number of consecutive links forming the shortest path from its upstream node to the outlet, corresponds to the state δ in which the link is placed (see Fig. 5.4). The total number of states is the network topological diameter Δ, corresponding to the maximum topological distance from the outlet. If one disregards the length of links, the network configuration is completely determined by the topological width function, which is the diagram of the relative frequency \( {P}_{\delta } \) of the links as a function of the topological distance δ. The maximum entropy for a given topological diameter is attained with a uniform width function (i.e. the same number of links at each topological distance) and is expressed as \( {S}_{max}=\ln \varDelta \).
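Formula 5.7 can be evaluated once the topological distance of every link is known. The sketch below assumes that these distances are already available as a plain list of integers; the function names and the example network are hypothetical.

```python
import math
from collections import Counter

def width_function(topological_distances):
    """Relative frequency P_delta of links at each topological distance."""
    counts = Counter(topological_distances)
    total = len(topological_distances)
    return {delta: n / total for delta, n in sorted(counts.items())}

def informational_entropy(topological_distances):
    """S = -sum(P_delta * ln P_delta) over all states delta (Formula 5.7)."""
    probabilities = width_function(topological_distances).values()
    return -sum(p * math.log(p) for p in probabilities)

# Hypothetical network: one link at distance 1, two at distance 2, four at distance 3.
distances = [1, 2, 2, 3, 3, 3, 3]
diameter = max(distances)                    # topological diameter
entropy = informational_entropy(distances)
max_entropy = math.log(diameter)             # reached by a uniform width function

print(width_function(distances))
print(entropy, max_entropy)
```

Because the width function of this example is not uniform, its entropy stays below ln Δ.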

Fig. 5.4 Topological levels of river networks

Fiorentino et al. (1993) found that the informational entropy of a river network varies with the regularity of network characteristics, particularly Horton order, magnitude and topological diameter. They expressed the entropy using fractal plane trees. Fractal plane trees are fractal (self-similar) objects, which can be defined as “having a shape made of parts similar to the whole in some way” (Mandelbrot 1986). Fractal objects result from repeated generations, starting from an initiator (i.e. a set of segments) and using a generator (i.e. a different set of segments). In the construction of the fractal set, the initiator is first substituted with the generator, and then each segment of the generator becomes an initiator and is substituted again in a recursive way (e.g. Feder 1988, p. 16).

5.2.2.2 Fractal Dimensions of River Networks

Ever since Mandelbrot (1983) coined the term “fractal”, there has been speculation that river networks are fractals. Fractals provide a mathematical framework for the treatment of irregular, seemingly complex shapes that display similar patterns over a range of scales. Many objects in nature possess a property called statistical self-similarity. This may be defined as invariance of the probability distributions describing the object’s composition under simple geometric transformations or changes of scale. Tarboton et al. (1998) argue that river networks fall into this class of geometric objects and that the fractal dimension characterizing the self-similarity of river networks is close to 2.

Hydrologists and geomorphologists have speculated on the fractal nature of river networks on the basis of indirect empirical evidence. A typical example is Gray’s (1961) relationship between mainstream length (L) and basin area (A), presented in Formula 5.8.

$$ \mathrm{L}=1.4{A}^{0.568} $$
(5.8)

Mandelbrot (1983) speculated that river networks are fractal and obtained the fractal dimension D = 2 × 0.6 = 1.2 from Formula 5.8, taking the exponent as 0.6 rather than 0.568. If individual rivers rather than river networks are concerned, D = 0.568 × 2 ≈ 1.1.

Mandelbrot (1983) also studied some fractal geometric patterns that resemble river networks and found that the fractal dimension of their individual lines is also 1.1. However, for the complete network patterns, which can be viewed as models of river networks, he concluded that D = 2 by the following reasoning.

If a line’s shape (e.g. a coastal line or a stream) is considered, measure the length of the line (say, L) using a divider or a ruler whose length is r.

$$ \mathrm{L}=\underset{r\to 0}{\lim }N\cdot r\ \mathrm{or}\ \mathrm{N}\approx \mathrm{L}\cdot {r}^{-1} $$
(5.9)

Where, N is the number of divider steps.

It is apparent that the smaller r is, i.e. the shorter the ruler, the more accurate L and N are. However, Formula 5.9 does not converge, because the implied exponent of r is 1.

If the implied exponent can be a fraction, a measure F which is independent of r can be obtained,

$$ \mathrm{F}=\mathrm{N}\cdot {r}^D=\mathrm{constant} $$
(5.10)

Where, D > 1 is called the fractal dimension by Mandelbrot (1983).

This implies:

$$ \mathrm{N}\sim {r}^{-D} $$
(5.11)

or equivalently

$$ \mathrm{L}\sim {r}^{1-D} $$
(5.12)

Based on Mandelbrot’s work, Lovejoy et al. (1987) proposed another technique to estimate the fractal dimension of river networks, namely functional box counting. It works on a set of points embedded in a d = 2 dimensional space. Cover the space with a mesh of d-dimensional cubes of size \( {r}^d \). The number N(r) of cubes that contain elements of the set then scales as follows:

$$ \mathrm{N}\left(\mathrm{r}\right)\sim {r}^{-D} $$
(5.13)

This formula is based on a definition of fractal dimension given by Hentschel and Procaccia (1983) as

$$ \mathrm{D}=-\underset{r\to 0}{\lim}\underset{m\to \infty }{\lim}\frac{\log N(r)}{\log r} $$
(5.14)

Where, m is the number of points in the set. The evidence given by Tarboton et al. (1998) strongly supports the view that the river network is space filling, with D = 2.
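The scaling in Formula 5.13 can be checked numerically on simple point sets. The following sketch implements plain box counting on a finite set of points, not the full functional box counting of Lovejoy et al. (1987), and estimates D as the least-squares slope of log N(r) against log(1/r) over a few box sizes instead of taking the limits in Formula 5.14. The function names, test sets and box sizes are assumptions made for illustration.

```python
import math

def box_count(points, r):
    """Number of square boxes of side r that contain at least one point."""
    return len({(math.floor(x / r), math.floor(y / r)) for x, y in points})

def box_counting_dimension(points, sizes):
    """Least-squares slope of log N(r) against log(1 / r)."""
    xs = [math.log(1.0 / r) for r in sizes]
    ys = [math.log(box_count(points, r)) for r in sizes]
    n = len(sizes)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical test sets on the unit square, sampled on a 64 x 64 lattice.
n = 64
line = [((i + 0.5) / n, 0.5 / n) for i in range(n)]                      # a straight line
filled = [((i + 0.5) / n, (j + 0.5) / n) for i in range(n) for j in range(n)]
sizes = [1 / 4, 1 / 8, 1 / 16, 1 / 32]

print(box_counting_dimension(line, sizes))     # close to 1
print(box_counting_dimension(filled, sizes))   # close to 2
```

For a digitized river network the estimate depends on the range of box sizes, which should stay well above the sampling interval of the data.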

The above fractal theory is closely related to Horton’s (1945) laws of network composition.

In Horton’s laws, the length and bifurcation ratios are usually stated in terms of Strahler's (1952) ordering scheme. Source streams are of the first order. If two first-order streams join, they form a second-order stream; in general, if two streams of equal order k merge, a stream of order k+1 is formed. When a lower-order and a higher-order stream join, the resulting stream retains the order of the higher-order stream. The set of empirical laws collectively referred to as Horton’s laws includes two formulae regarding the bifurcations (i.e. Formula 5.15) and the lengths (i.e. Formula 5.16) of river networks:

$$ {R}_b=\frac{N_{\omega -1}}{N_{\omega }} $$
(5.15)
$$ {R}_l=\frac{L_{\omega }}{L_{\omega -1}} $$
(5.16)

Where, \( {N}_{\omega } \) is the number of streams of order ω, and \( {L}_{\omega } \) is the mean length of the streams of order ω.

Formulae 5.15 and 5.16 reveal the geometric-scaling relationships in river networks, since they hold no matter at what order or resolution the networks are viewed. If a channel network is regarded as the set of paths along which water flows, one may imagine that, as the resolution becomes higher and higher and the orders of the streams become lower and lower, one is literally looking at flows among the grass roots. Viewed this way, the limiting channel network is a fractal with properties governed by \( {R}_b \) and \( {R}_l \).

Starting from Horton’s laws, LaBarbera and Rosso (1987) found the fractal dimension of river networks may be calculated by

$$ \mathrm{D}=\max \left(\frac{\log {R}_b}{\log {R}_l},1\right) $$
(5.17)

This formula requires that Horton’s bifurcation and length ratios hold exactly at all scales in the network. The result is obtained by considering the limit of the series for the total channel length, which converges for \( {R}_b<{R}_l \), implying D = 1. However, if \( {R}_b>{R}_l \), the series diverges, and the total length of channel networks follows

$$ \mathrm{L}\sim {S}^{1-\left(\log {R}_b/\log {R}_l\right)} $$
(5.18)

Where, S, the resolution of observation of the networks, is taken as the length of first-order streams, and

$$ \mathrm{D}=\frac{\log {R}_b}{\log {R}_l} $$
(5.19)

To sum up, river networks can be viewed as fractal, and existing techniques for estimating fractal dimensions all tend to indicate that the fractal dimension of river networks is 2. This is consistent with the fact that rivers drain the entire catchment basin and are thus space filling, which provides a fundamental link between Horton’s ratios. It is also worth noting that the random topology model proposed by Shreve (1967) supports D = 2, because it gives the average values \( {R}_b=4 \) and \( {R}_l=2 \). The view of river networks as fractal with D = 2 therefore provides a description of the scaling of river networks that is consistent with classical fluvial geomorphology and the popular random topology model.
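The link between Horton’s ratios and the fractal dimension can be illustrated with a short calculation. The sketch below assumes that stream counts and mean lengths per Strahler order are already known, and it estimates each ratio as the plain average of the ratios between successive orders; this averaging is an assumption made here for simplicity, and the data are hypothetical.

```python
import math

def mean_ratio(values):
    """Average of values[k] / values[k + 1] over successive entries."""
    ratios = [a / b for a, b in zip(values, values[1:])]
    return sum(ratios) / len(ratios)

def horton_fractal_dimension(r_b, r_l):
    """D = max(log R_b / log R_l, 1), following Formula 5.17."""
    return max(math.log(r_b) / math.log(r_l), 1.0)

# Hypothetical data for orders 1 to 4 (index 0 is order 1).
stream_counts = [64, 16, 4, 1]        # N_1 .. N_4
mean_lengths = [1.0, 2.0, 4.0, 8.0]   # L_1 .. L_4

r_b = mean_ratio(stream_counts)        # N_(w-1) / N_w, Formula 5.15
r_l = mean_ratio(mean_lengths[::-1])   # L_w / L_(w-1), Formula 5.16

print(r_b, r_l, horton_fractal_dimension(r_b, r_l))
```

With the values reported for the random topology model, R b = 4 and R l = 2, the function returns a dimension of 2; when R b is smaller than R l it returns 1.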

5.2.2.3 The River Tree

It is common to view rivers on maps as trees because of their dendritic structures; thus, many scholars describe rivers using the tree structure from computer science (Zhang and Quan 2005; Zhang 2006). The following presents an algorithm for constructing river trees, which is an improved version of the method proposed by Zhang and Quan (2005).

  • Step 1: organize the river data to form a directed graph

The river segments are viewed as the edges of the graph, and the starting and ending points of the river segments are viewed as its vertices. In this way, the river can naturally be regarded as a directed graph. Each edge of the graph corresponds to two directed edges, one in the direction of the river flow and the other against it.

The river is divided into river segments and re-organized to form a directed graph. The relations among the vertices and the edges are also recorded (e.g. topologically adjacent relations among edges, joint relations among edges and their starting and ending points). Meanwhile, the estuary (i.e. river mouth) of the river is marked and saved.

  • Step 2: determine and re-organize the directions of the river segments

River data are usually collected from upstream to downstream; however, this is not always the case. Thus, it is necessary to check the directions of the river segments. If the stored direction of a river segment is inconsistent with the flow direction of the river, the order of its vertices should be reversed.

  • Step 3: Construct the river entities

The purpose of this step is to link the river segments to form a river entity.

Two rules that should be abided by in this step are: (1) the river segments with the same river name should be merged into the same river entity; and (2) two river segments with different flow directions may not be linked.

Nevertheless, in practice not all river data have river names. Hence, if the river segments have names, they are linked to form entities by their river names; otherwise, their geometric characteristics should be considered. Because the longer a branch is, the more likely it is to be viewed as the mainstream of the river, the lengths of the river segments need to be calculated first. Then:

(1) Search for the origin of the river among the river segments and mark the starting point and the original river segment (named the ‘current river segment’) of the river entity.

(2) Start from the current river segment to find the next river segment that can be linked. Check the degree of the ending vertex of the current river segment. If the degree is 1, the river entity is complete and this procedure ends. If the degree is 2, the current river segment has only one river segment downstream; link them and take the downstream river segment as the new ‘current river segment’. If the degree is greater than 2, the current river segment has one river segment downstream and more than one river segment upstream. In this case, if the river segments have names, they can be linked according to their names (i.e. the river segments with the same river name are merged); otherwise, link the current river segment with the longest of the upstream river segments and take the longest river segment as the new ‘current river segment’ (see Fig. 5.5).

(3) Recursively execute the procedure in (2) until all river segments are marked.

  • Step 4: Differentiate between right and left branches of the river tree

Fig. 5.5 Construction of the river entity from upstream to downstream

Suppose that the mainstream and its two tributaries are saved in a spatial database, including their vertices and flow directions. The right and the left tributaries can then be told apart by comparing three angles α, β and γ, each of which lies in [0°, 360°]. For one of the two tributaries, the angles are defined as follows:

  • α: the angle measured counterclockwise from the positive direction of the x axis to the opposite direction of the tributary.

  • β: the angle measured counterclockwise from the positive direction of the x axis to the opposite direction of the upstream of the mainstream.

  • γ: the angle measured counterclockwise from the positive direction of the x axis to the downstream of the mainstream.

If (β − γ) > (α − γ), the tributary is at the right of the mainstream; otherwise, the tributary is at the left of the mainstream.

Figure 5.6 presents the method for telling apart left and right tributaries of mainstreams.

Fig. 5.6 Four cases for demonstrating how to determine the left and right tributaries of the mainstream
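The side of a tributary can also be decided with a different and widely used geometric test, namely the sign of a two-dimensional cross product. The sketch below uses this swapped-in technique and is not the angle comparison described above. It assumes planar coordinates with x pointing east and y pointing north, and sides are those seen by an observer looking downstream along the mainstream; the function name and arguments are hypothetical.

```python
def tributary_side(junction, mainstream_downstream, tributary_upstream):
    """Return 'left', 'right' or 'collinear' for a tributary joining at `junction`.

    mainstream_downstream: a point on the mainstream just below the junction.
    tributary_upstream: a point on the tributary just above the junction.
    """
    # Vector pointing down the mainstream from the junction.
    mx = mainstream_downstream[0] - junction[0]
    my = mainstream_downstream[1] - junction[1]
    # Vector pointing up the tributary from the junction.
    tx = tributary_upstream[0] - junction[0]
    ty = tributary_upstream[1] - junction[1]
    # A positive cross product means the tributary lies counterclockwise
    # from the downstream direction, i.e. on the left.
    cross = mx * ty - my * tx
    if cross > 0:
        return "left"
    if cross < 0:
        return "right"
    return "collinear"

# Mainstream flowing east; one tributary arrives from the north, one from the south.
print(tributary_side((0, 0), (1, 0), (0, 1)))    # left
print(tributary_side((0, 0), (1, 0), (0, -1)))   # right
```

Only three points near the junction are needed, so the test can be applied directly to the stored vertices of the segments.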

5.2.2.4 Topological Order of Rivers

The topological order of rivers, also called stream order or waterbody order, is a positive whole number used in geomorphology and hydrology to indicate the level of branches in a river system. Various approaches to the topological ordering of rivers or sections of rivers have been proposed based on their distances from the source or from the confluence (the point where two rivers merge) or river mouth, and their hierarchical positions within the river system. Topological ordering of the hierarchical relations of river patterns is consistent with multi-scale representation of spatial phenomena and is useful in river trees/networks description and generalization (mainly selection); thus, the following paragraphs present a number of approaches for ordering rivers.

5.2.2.4.1 Hack’s Stream Order

Hack’s stream order (Hack 1957) is also called the classic stream order or Gravelius’ stream order. It is a bottom-up hierarchical ordering approach and works as follows (Fig. 5.7):

  • Step 1: the river segment (say, r) at the river’s mouth at the sea (the main stem) is allocated the number a = 1.

  • Step 2: each direct tributary of r is assigned the number a+1.

  • Step 3: take each direct tributary of r as a new r and repeat Step 2 until all tributaries of the river are assigned numbers.

Fig. 5.7 Hack’s stream order approach

This stream order, which starts at the river’s mouth, indicates the river’s place in the network. It is suitable for general cartographic purposes. It readily identifies the main stream of the river and the topological relations among the river segments and the main stream (Fig. 5.8).
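The three steps of Hack’s ordering can be sketched as follows, assuming that the river entities have already been constructed and that, for each entity, the entity it flows into is known. The dictionary layout and the river names are hypothetical.

```python
def hack_orders(flows_into):
    """Hack order of every river entity.

    flows_into maps a river name to the river it joins,
    or to None for the main stem that reaches the sea.
    """
    orders = {}

    def order_of(river):
        if river not in orders:
            parent = flows_into[river]
            # The main stem gets 1; a tributary gets its receiving river's order plus 1.
            orders[river] = 1 if parent is None else order_of(parent) + 1
        return orders[river]

    for river in flows_into:
        order_of(river)
    return orders

# Hypothetical network: 'a' and 'b' join the main stem, 'a1' and 'a2' join 'a'.
network = {"main": None, "a": "main", "b": "main", "a1": "a", "a2": "a", "a1x": "a1"}
print(hack_orders(network))
```

The order of an entity therefore equals the number of entities traversed on the way down to the river mouth.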

Fig. 5.8 Strahler’s stream order approach

5.2.2.4.2 Strahler’s Stream Order

This top-down order system was devised by Strahler (1952, 1957). It works as follows:

  • Step 1: each outermost tributary is assigned the order number 1;

  • Step 2: when two streams with the order numbers b and c merge, if b = c, the resulting stream is given the order number b+1; if the two order numbers are different, the resulting stream is given the higher of the two numbers.

  • Step 3: recursively execute Step 2 until all river segments are assigned order numbers.

Strahler’s order approach is designed for the morphology of a catchment and forms the basis of important hydrographical indicators of its structure, such as bifurcation ratio, drainage density and frequency. It is scale-dependent. The larger the map scale, the more orders of streams may be revealed. A general lower boundary for the definition of a “stream” may be set by defining its width at the mouth or, by reference to the map, by limiting its extent. The system itself is also usable for small-scale structures.
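The steps above can be sketched as a short recursive function. The sketch assumes that the network is stored as a dictionary mapping each link to the links that flow into it; this layout and the link names are hypothetical, and the rule is extended in the obvious way to junctions where more than two streams meet.

```python
def strahler_order(upstream, link):
    """Strahler order of `link`; `upstream` maps a link to the links feeding it."""
    feeders = upstream.get(link, [])
    if not feeders:
        return 1  # an outermost tributary
    orders = [strahler_order(upstream, f) for f in feeders]
    top = max(orders)
    # The order increases only when at least two feeders share the highest order.
    return top + 1 if orders.count(top) >= 2 else top

# Hypothetical network with four sources a, b, c and d.
upstream = {
    "outlet": ["x", "y"],
    "x": ["a", "b"],
    "y": ["c", "d"],
}
print(strahler_order(upstream, "x"))       # 2
print(strahler_order(upstream, "outlet"))  # 3
```

If link y were replaced by a single source, the outlet would keep order 2, because a first-order stream joining a second-order stream does not raise the order.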

5.2.2.4.3 Shreve’s Stream Order

The Shreve system is also a top-down order system proposed by Shreve (1966). It works as follows (Fig. 5.9):

  • Step 1: each outermost tributary is assigned the order number 1;

  • Step 2: when two streams with the order numbers b and c merge, the resulting stream is given an order number b+c.

  • Step 3: recursively execute Step 2 until all river segments are assigned order numbers.

Fig. 5.9 Shreve’s stream order approach

Shreve’s stream order approach is preferred in hydrodynamics: it sums the number of sources in each catchment above a stream gauge or outflow, and correlates roughly with the discharge volumes and pollution levels. Like Strahler’s method (Strahler 1952, 1957), this approach depends on the precision of the sources included, but it is less dependent on map scale.
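Shreve’s rule is purely additive, so it is short to express in code. The sketch assumes a network stored as a dictionary that maps each link to the links flowing into it; the layout and the link names are hypothetical.

```python
def shreve_order(upstream, link):
    """Shreve order of `link`; `upstream` maps a link to the links feeding it."""
    feeders = upstream.get(link, [])
    if not feeders:
        return 1  # an outermost tributary
    # The orders of all merging streams are added.
    return sum(shreve_order(upstream, f) for f in feeders)

# Hypothetical network with three sources a, b and c.
upstream = {
    "outlet": ["x", "c"],
    "x": ["a", "b"],
}
print(shreve_order(upstream, "x"))       # 2
print(shreve_order(upstream, "outlet"))  # 3
```

The value at the outlet equals the number of sources upstream of it, which is why the order tracks the amount of contributing headwater.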

5.2.2.4.4 Horton’s Stream Order

Horton presented his approach to stream ordering (1945) based on the idea that “the stream order is a measure of the position of a stream in the hierarchy of tributaries”. As shown in Fig. 5.10, on a given map of a certain scale, the first order streams are those which have no tributaries, the second order streams are those which have only first order streams as tributaries, the third order streams are those which have only first and second order streams as tributaries, and so on. Horton’s original system is somewhat more complex than this, in that the stream of maximum order in the drainage basin is extended headward to its furthest source. In other words, the largest stream of the basin is given the highest order number.

Fig. 5.10 Horton’s stream order approach

Horton’s scheme for ordering rivers is difficult, tedious and time consuming, because it involves two phases, classification and re-classification, which are repeated several times. During the second phase of renumbering, some finger-tip tributaries are upgraded and others are left unchanged.

To sum up, Hack’s ordering approach assigns a dimensionless numerical order (i.e. 1) starting at the mouth of a stream, which is the point of lowest elevation. The order then increases as one traces upstream and meets smaller streams, so that higher order numbers correlate with the higher elevation of the headwaters (Hack 1957). Horton’s idea, based on vector geometry (1945), is a reversal of Hack’s approach, but his approach is much more complex in practice; thus, Strahler (1952, 1957) and Shreve (1966) proposed two modified versions of Horton’s method. Both the Horton and Strahler methods assign the lowest order (i.e. number 1) at the river’s headwater, which is the point of highest elevation. Whereas Hack’s order numbers correlate with elevation and trace upstream, Horton’s, Strahler’s and Shreve’s stream ordering methods correlate with gravity flow and trace downstream. They rely on principles of vector point-line geometry, and their rules form the basis of the programming algorithms with which Geographic Information Systems interpret map data.

5.2.2.5 Describing Rivers by the Graph Theory

Rivers on maps can generally be represented by connected lines that form linear networks. Because of the complexity of such networks, which approximate drainage basins on the earth, their metric aspects such as the length and width of river segments can be disregarded, and more attention may be paid to the patterns of junctions in cartography and geographic information science.

If the metric aspects of the rivers on maps are ignored and only the connections and patterns are taken into account, one is focusing on the “topology” of rivers. In mathematical topology, the geometric figure of a drainage basin on the map, consisting of patterns of connected lines on a plane (i.e. the map), forms a plane graph; therefore, the following section presents a method for applying graph theory to represent drainage networks.

A plane graph consists of vertices and edges connecting the vertices. The edges can be either directed or nondirected. In a river network, the vertices are represented by the junction points and the initial points of the first-order streams. The edges are the river segments between junction points.

Assume that a plane graph (i.e. a river network) has n vertices \( ({v}_1,{v}_2,\dots ,{v}_n) \) and m edges \( ({e}_1,{e}_2,\dots ,{e}_m) \). It is possible to describe the graph by an associated matrix A with n × n elements. An element \( {a}_{ij} \) gives a quantitative expression of the connection from vertex \( {v}_i \) to vertex \( {v}_j \).

Instead of the associated matrix, the graph can also be described by an incidence matrix B with n × m elements. The matrix element \( {b}_{ij} \) is

  • +1, if vertex \( {v}_i \) is the beginning of edge \( {e}_j \);

  • −1, if vertex \( {v}_i \) is the end of edge \( {e}_j \);

  • 0, if vertex \( {v}_i \) is not on \( {e}_j \).

The above two matrices can be used to describe the topology of river networks.
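As an illustration, the incidence matrix of a small directed river graph can be built as follows. This is a minimal sketch that stores the matrix as a list of rows; the vertex names and the example network are hypothetical.

```python
def incidence_matrix(vertices, edges):
    """Rows are vertices, columns are edges.

    An entry is +1 where the vertex starts the edge, -1 where it ends it, else 0.
    """
    index = {v: i for i, v in enumerate(vertices)}
    matrix = [[0] * len(edges) for _ in vertices]
    for j, (start, end) in enumerate(edges):
        matrix[index[start]][j] = 1
        matrix[index[end]][j] = -1
    return matrix

# Hypothetical network: two sources meet at junction 'j', which drains to 'mouth'.
vertices = ["s1", "s2", "j", "mouth"]
edges = [("s1", "j"), ("s2", "j"), ("j", "mouth")]

for row in incidence_matrix(vertices, edges):
    print(row)
```

Every column contains exactly one +1 and one −1, so each column sums to zero; the row of the river mouth contains no +1 because no edge starts there.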

Except for islands and deltas, a river network is a special graph with no circuits. It has a “root” point (river mouth) and a number of pendant vertices (the beginning points of the lowest order river segments). Each edge is directed; from one pendant vertex to another there is one and only one path. Thus, such a river network, in mathematics, is an arborescence (or a tree). However, it should be noted that ‘trees’ are not necessarily representative of river systems because the flow direction is not specified on the edges and ‘aqueducts’ may occur (Scheidegger 1967).

5.3 Fundamental Principles for River Network Generalization

Besides a number of printed guidelines and regulations, the generalization of river networks/trees should abide by many rules and principles.

  1. Original and target map scales are two key factors in map generalization. They determine the extent to which the river networks should be simplified.

  2. The purpose of the target map should be considered in the process of map generalization. Target maps with different purposes serve different readers and/or users, so the rivers might be expressed differently.

  3. The characteristics of the mapped geographical region should be considered, because they sometimes determine the rules and criteria for river selection. For example, rivers in southern China are very common and are regarded as ordinary natural features in the process of map generalization. However, in northern China’s arid areas such as Gansu Province, rivers are rare and therefore precious, and they should be retained on the target maps as much as possible.

  4. For mapping areas with a higher density of river networks, the criteria for selecting river segments should be more rigorous than for areas with a lower density, so that the number of selected river segments is moderately reduced; conversely, for mapping areas with a lower density of river networks, the criteria should be relaxed so that the number of selected river segments is moderately increased.

  5. The density contrast among mapping areas should be preserved in the process of river segment selection: an area with a higher density of river segments before generalization should still have a higher density after generalization.

  6. Some special rivers should be retained on the target map even though they do not meet the given selection criteria (e.g. a small river that flows into or out of a river, a small river that connects ponds and/or lakes, a small river that flows directly into the sea, a river in an arid area).

  7. Even if a river meets the selection criteria, it is generally deleted when its distance from a larger river on the target map is less than 3 mm, i.e. when the two rivers are very close.

  8. When the lines of a river are simplified, the basic shapes of its bends should not be changed.

  9. Different criteria should be applied to generalize different types of drainage basins.

  10. A river network should be kept as similar as possible before and after generalization.

5.4 Approaches for Generalizing River Networks

5.4.1 Selecting River Segments by Indices

Several models for river segment selection have been proposed, mainly the linear regression model, the multiple regression model and the Radical Law (He 2004). Each model applies only under certain conditions and has its own advantages and disadvantages, so the model should be chosen in light of the map scale, the purpose of the resulting map, the characteristics of the mapping area, etc.

5.4.1.1 The Linear Regression Model

The basic index used in the linear regression model for river segment selection is the density of the river network. This index can be calculated by

$$ \mathrm{D}=\frac{\sum L}{F} $$
(5.20)

where D is the density of the river network; ∑L is the total length of the river segments in the river basin; and F is the area of the river basin.

It is apparent in the geographic space that the greater the density of a river network, the greater the number of the rivers in the river basin. In other words, the density of a river network is correlated to the number of rivers:

$$ {N}_0=\frac{N}{F} $$
(5.21)

where \( N_0 \) is the number of rivers per unit area of the river basin; and N is the number of rivers in the river basin.

The number of rivers in a river basin, the area of the river basin and the total length of its rivers are easy to obtain from the source map or the source database. D and \( N_0 \) can then be calculated, and the criteria for selecting river segments in map generalization may be determined from them.
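The two indices can be computed directly from a list of segment lengths and the basin area. The sketch below is only an illustration of Formulas 5.20 and 5.21; the function names, the units and the sample values are invented.

```python
def river_density(segment_lengths_km, basin_area_km2):
    """D = sum(L) / F  (Formula 5.20)."""
    return sum(segment_lengths_km) / basin_area_km2


def rivers_per_unit_area(n_rivers, basin_area_km2):
    """N0 = N / F  (Formula 5.21)."""
    return n_rivers / basin_area_km2


lengths = [12.0, 7.5, 4.5, 6.0]   # km, invented values
area = 60.0                       # km^2, invented value
D = river_density(lengths, area)                  # 30 km / 60 km^2
N0 = rivers_per_unit_area(len(lengths), area)     # 4 rivers / 60 km^2
```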

5.4.1.2 The Multiple Regression Model

The multiple regression model determines if a river segment can be retained on the resulting map by the number of rivers and the length of rivers in a unit area of the river basin. It may be expressed by a formula:

$$ \mathrm{y}={b}_0{x_1}^{b_1}{x_2}^{b_2} $$
(5.22)

where y is the criterion for selecting river segments; \( x_1 \) is the number of rivers in a unit area on the source map; \( x_2 \) is the length of rivers in a unit area on the source map; and \( b_0 \), \( b_1 \) and \( b_2 \) are undetermined coefficients.
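A short sketch of Formula 5.22 follows. The coefficient values are hypothetical; in practice they would have to be estimated from sample maps. The second half illustrates one common way to do this: taking logarithms turns the power-law model into a form that is linear in ln b0, b1 and b2, so ordinary multiple linear regression can be applied to the logarithms.

```python
import math


def selection_criterion(x1, x2, b0, b1, b2):
    """y = b0 * x1**b1 * x2**b2  (Formula 5.22)."""
    return b0 * x1 ** b1 * x2 ** b2


# Hypothetical coefficients, for illustration only.
b0, b1, b2 = 2.0, -0.5, 0.25
y = selection_criterion(4.0, 16.0, b0, b1, b2)

# Log-linear form: ln y = ln b0 + b1 * ln x1 + b2 * ln x2
log_y = math.log(b0) + b1 * math.log(4.0) + b2 * math.log(16.0)
```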

5.4.1.3 The Radical Law

The Radical Law, or the Law of Selection (Töpfer and Pillewizer 1966), is also called the Square Root Model. It gives a formula for calculating the number of features that should be retained on the resulting map:

$$ {n}_F={n}_A\sqrt{{\left(\frac{M_A}{M_F}\right)}^x} $$
(5.23)

where \( n_A \) is the number of rivers on the source map; \( n_F \) is the number of rivers on the resulting map; \( M_A \) is the denominator of the source map scale; \( M_F \) is the denominator of the resulting map scale; and x is a coefficient that can be calculated by

$$ x=\frac{2\ln \frac{n_F}{n_A}}{\ln \frac{M_A}{M_F}} $$
(5.24)
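The following sketch evaluates Formula 5.23 and solves the same formula for x from the river counts observed on two existing maps. The function names and the sample numbers are assumptions chosen so that the arithmetic is easy to follow.

```python
import math


def radical_law(n_source, m_source, m_target, x=1.0):
    """n_F = n_A * sqrt((M_A / M_F) ** x)  (Formula 5.23)."""
    return n_source * math.sqrt((m_source / m_target) ** x)


def fitted_exponent(n_source, n_target, m_source, m_target):
    """Solve Formula 5.23 for x, given the counts on a source map and a target map."""
    return 2.0 * math.log(n_target / n_source) / math.log(m_source / m_target)


n_f = radical_law(200, 1_000_000, 4_000_000, x=1.0)    # 200 * sqrt(1/4)
x = fitted_exponent(200, 100, 1_000_000, 4_000_000)    # recovers the exponent used above
```

Because both logarithms are negative when the target scale is smaller than the source scale, the fitted x is positive.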

5.4.2 Selecting River Segments by the River Tree

Zhang (2006) proposed a river selection method using the river tree that has been discussed in a previous section of this chapter. This method can preserve density relations among different parts of a river network before and after map generalization.

Zhang’s method (2006) is based on an improved rule for encoding river segments, which is called “branch rule”:

$$ \mathrm{G}={N}_b+1 $$
(5.25)

where G is the grade of a river R; and \( N_b \) is the total number of the branches of R.

Here, a river refers to a river entity, not a river segment. The branches include the branches of R at all grades, not only its direct branches. Thus, the more branches a river has, the higher its grade is. In this sense, the branch rule can tell the grades of rivers. On the other hand, for a river of a given length, the greater the number of branches, the denser the river network. In this sense, the branch rule can also tell the density of river networks.

The branch rule is used in river selection because it has advantages over other ordering rules, including those proposed by Horton (1945), Strahler (1952, 1957) and Shreve (1966) (Fig. 5.11). Horton’s and Strahler’s orders express the hierarchy of river entities or river segments and can tell the position of each river entity or river segment in the river tree. Horton’s approach takes river entities as the units of ordering, whereas Strahler’s approach takes river segments. Neither approach necessarily increases the grade of the new river segment or entity when two river segments or entities merge, so they can tell neither the differences in river network density nor the differences in branch numbers. Shreve’s approach constructs river trees and tells the differences in the numbers of branches belonging to each river segment, but the number is counted at the beginning of the river segment rather than between the origin of the river segment and the river mouth; thus, its ordering results do not show the river density. The branch rule takes river entities as the units of ordering (similar to Horton’s approach), and when two branches merge, the grade of their mainstream increases (similar to Shreve’s approach). The branch rule tells the total number of branches that merge into the mainstream from the river origin to the river mouth, and differences in the number of branches per unit distance indicate differences in the density of the river network.

Fig. 5.11
figure 11

Comparison among the four river ordering approaches (a) Horton (b) Strahler (c) Shreve (d) Branch

Traditionally, river segment selection is a complicated process that considers a number of factors such as river length, river density and river grade. These factors are usually used in combination, which makes the design and implementation of river selection algorithms difficult. Hence, if these factors can be combined into a single complex index, river selection algorithms might become simpler to design and easier to implement. In manual cartography, rivers are selected hierarchically, i.e. the mainstream of the river basin is selected first, then the mainstreams of the sub-basins, and so on. Here, river length is regarded as the most important factor, the density of the river network and river grade are regarded as auxiliary factors, and they are used interactively in river selection.

To simplify river selection algorithms, Zhang (2006) proposed a complex index that integrates the above factors into one formula:

$$ \mathrm{I}=\mathrm{L}{\left(\frac{G}{L}\right)}^{\alpha}\times {\left({H}_m-h+1\right)}^{\beta } $$
(5.26)

where I is the complex index of the river; L is the length of the river; G is the grade of the river; \( H_m \) is the maximum hierarchy of the river; h is the number indicating which hierarchy the river is at; and α and β are two parameters determined by the cartographer’s experience, with α ∈ [0, 1] and β ∈ [0, 1]. When α = β = 0, the formula becomes I = L.

This formula shows that a river with greater length, higher density and higher grade has priority in selection.

If L, G and h of a river are given and do not change, cartographers generally expect the complex index I to increase as α and β increase. In practice, however, if \( \frac{G}{L}<1 \), the complex index decreases as α increases. To remedy this, the complex index can be calculated by the following formula:

$$ \mathrm{I}=\mathrm{L}\times {G}^{\alpha}\times {\left({H}_m-h+1\right)}^{\beta } $$
(5.27)
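The difference between Formulas 5.26 and 5.27 can be checked numerically. In the sketch below the river parameters are invented and chosen so that G/L < 1; the function names are hypothetical.

```python
def index_5_26(L, G, h, H_m, alpha, beta):
    """I = L * (G / L)**alpha * (H_m - h + 1)**beta  (Formula 5.26)."""
    return L * (G / L) ** alpha * (H_m - h + 1) ** beta


def index_5_27(L, G, h, H_m, alpha, beta):
    """I = L * G**alpha * (H_m - h + 1)**beta  (Formula 5.27)."""
    return L * G ** alpha * (H_m - h + 1) ** beta


# Invented river: length 40, grade 5, first hierarchy level out of three (G / L = 0.125).
L, G, h, H_m = 40.0, 5, 1, 3
old = [index_5_26(L, G, h, H_m, a, 0.5) for a in (0.0, 0.5, 1.0)]
new = [index_5_27(L, G, h, H_m, a, 0.5) for a in (0.0, 0.5, 1.0)]
```

With these values `old` decreases as α grows, whereas `new` increases, because the grade G is never smaller than 1 under the branch rule.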

To obtain the complex index I of a river by Formula 5.27, the length, the grade and the hierarchy of the river must be obtained first. Because the length of the river is easy to obtain, only the method for obtaining the other two factors is discussed here. The method, proposed by Zhang (2006), consists of three steps.

  • Step 1: data preprocessing: first, the closed loops in the river network are marked and deleted so that the network becomes a pure tree structure; then the river mouth is marked (a river tree has only one river mouth, so the mouth can be easily found and marked).

  • Step 2: construction of river entities: search the river segments from the river origin downstream. When several river segments meet at a junction, the longest river segment is viewed as a part of the mainstream (i.e. a section of the river entity), and the mainstream is extended downstream until all segments belonging to the mainstream are found and connected, at which point the river entity is formed.

    However, for historical reasons the longest river segments are not always regarded as the mainstream in geographic space; thus, if the river segments have names, the names should be taken into consideration, and river segments with the same name can be connected to form a river entity.

  • Step 3: formation of the river tree: the mainstream and its tributaries are recognized and recorded in the process of constructing river entities, so the river tree is formed naturally. Meanwhile, the length, the grade and the hierarchical position of each river segment are calculated and saved.

After the above three steps, the complex indices of the rivers can be obtained, and the river entities that can appear on the resulting map may be selected according to them.
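Steps 2 and 3 can be sketched in Python as follows. This is not Zhang's implementation but a simplified illustration under several assumptions: the input is already a loop-free tree (Step 1 is done), each segment stores the identifier of its downstream segment, river names are ignored, "longest" is interpreted as the longest upstream path, and the tree is traced from the mouth upstream rather than from the origins downstream. All identifiers and lengths are invented.

```python
from collections import defaultdict


def build_river_entities(segments):
    """segments maps id -> (length, downstream_id); the mouth segment has downstream_id None."""
    upstream = defaultdict(list)
    mouth = None
    for sid, (_, down) in segments.items():
        if down is None:
            mouth = sid
        else:
            upstream[down].append(sid)

    longest = {}

    def path_len(sid):
        # Length of the longest path from this segment up to a river origin.
        if sid not in longest:
            longest[sid] = segments[sid][0] + max(
                (path_len(u) for u in upstream[sid]), default=0.0)
        return longest[sid]

    entities = []

    def trace(start, hierarchy, parent):
        idx = len(entities)
        ent = {"segments": [], "length": 0.0, "hierarchy": hierarchy,
               "parent": parent, "branches": 0}
        entities.append(ent)
        sid = start
        while sid is not None:
            ent["segments"].append(sid)
            ent["length"] += segments[sid][0]
            ups = sorted(upstream[sid], key=path_len, reverse=True)
            for trib in ups[1:]:          # shorter branches start new entities
                trace(trib, hierarchy + 1, idx)
            sid = ups[0] if ups else None  # the longest branch continues this entity

    trace(mouth, 1, None)

    # Branch rule (Formula 5.25): count branches at all levels, children before parents.
    for ent in reversed(entities):
        if ent["parent"] is not None:
            entities[ent["parent"]]["branches"] += ent["branches"] + 1
    for ent in entities:
        ent["grade"] = ent["branches"] + 1
    return entities


def complex_index(ent, h_max, alpha=0.5, beta=0.5):
    """Formula 5.27 applied to one river entity."""
    return ent["length"] * ent["grade"] ** alpha * (h_max - ent["hierarchy"] + 1) ** beta


# Invented network: mainstream m1-m2-m3, tributaries t1 and t2 (with sub-branches t2a, t2b).
segments = {
    "m1": (5.0, None), "m2": (4.0, "m1"), "m3": (6.0, "m2"),
    "t1": (2.0, "m1"), "t2": (2.5, "m2"), "t2a": (1.0, "t2"), "t2b": (0.5, "t2"),
}
entities = build_river_entities(segments)
h_max = max(e["hierarchy"] for e in entities)
ranking = sorted(entities, key=lambda e: complex_index(e, h_max), reverse=True)
```

The `ranking` list orders the river entities by decreasing complex index, so a selection routine could simply retain its first k entries.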

5.4.3 A Knowledge-Based Approach to River Network Generalization

Artificial intelligence has been used as a tool in map generalization by many researchers (Li et al. 2010). The basic idea is to extract and express the knowledge and rules used in the process of map generalization, and then to employ spatial reasoning methods to generalize (e.g. select, simplify, collapse, move) map features.

This section introduces a knowledge-based approach to river network generalization proposed by Wu et al. (2007). The method classifies the knowledge used in river network selection into three categories, i.e. spatial knowledge, attribute knowledge and river generalization rules; it studies how to obtain and formally represent such knowledge, and it constructs a rule library for river network selection. The rule library can be used to carry out river selection by knowledge inference, and it considers the spatial and attribute knowledge of the river network as well as the priorities and weights of the river generalization rules.

5.4.3.1 Acquiring Knowledge for River Network Selection

Three main kinds of knowledge are used in river generalization: the knowledge contained in the attributes of rivers, the knowledge regarding spatial relations among river segments, and the fuzzy knowledge contained in qualitative factors such as river grade and the hierarchical position of a river segment in the river network.

Firstly, the original river data are processed to get a river tree. Generally, the original river data are saved as separate individual lines in the spatial database. To get a river tree, they are processed by means of graph theory. The spatial topological relations among the river segments (i.e. the individual lines) are calculated and saved in the so-called RIVER-ARC and RIVER-NODE structures. The RIVER-ARC structure encapsulates the directions and lengths of the river segments and the topological relations (e.g. joint relation, neighbouring relation) among them; the RIVER-NODE structure encapsulates the coordinates of the vertices of each river segment. The two structures lay the data-model foundation for constructing river networks, because they allow the river data to be re-organized and the river entities, which contain the spatial knowledge of the river, to be constructed.
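The two structures might be represented as follows. The source does not list their fields, so every field name below is a hypothetical choice that merely mirrors the description above (direction, length and topological relations in RIVER-ARC, vertex coordinates in RIVER-NODE).

```python
import math
from dataclasses import dataclass, field


@dataclass
class RiverNode:
    """Coordinates of the vertices of one river segment (RIVER-NODE)."""
    arc_id: int
    coords: list                      # [(x, y), ...] ordered along the flow direction


@dataclass
class RiverArc:
    """Direction, length and topological relations of one river segment (RIVER-ARC)."""
    arc_id: int
    from_point: tuple                 # start of the segment; with to_point gives direction
    to_point: tuple
    length: float
    joined_arcs: list = field(default_factory=list)      # arcs sharing an end point
    neighbour_arcs: list = field(default_factory=list)   # arcs adjacent in space


def arc_from_node(node):
    """Derive the arc record (direction and length) from the vertex list."""
    length = sum(math.dist(p, q) for p, q in zip(node.coords, node.coords[1:]))
    return RiverArc(node.arc_id, node.coords[0], node.coords[-1], length)


node = RiverNode(arc_id=1, coords=[(0, 0), (3, 4), (3, 10)])
arc = arc_from_node(node)
```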

Secondly, the rules for river network generalization are represented. They consist of knowledge in three aspects: (1) the rules and indices for river selection from mapping specifications, (2) the graphic patterns and characteristics of the river, and (3) the weights of the river.

The rules and indices are easy to obtain from mapping specifications; however, using them in map generalization is difficult, because they are usually expressed as constant values and cannot be applied directly to maps of different geographic areas. The graphic patterns and characteristics of the river can usually be obtained by graph theory, pattern recognition and artificial intelligence, though this is difficult to achieve. As far as the weights of the rivers are concerned, rivers with the following three characteristics can be assigned greater weights and therefore have priority to be retained in map generalization:

  • the rivers at the edge of the river basin: after the river tree of the river basin is constructed, the tributaries at the first and the last level of the river tree are taken as the rivers at the edge of the river basin.

  • the rivers connecting with lakes or flowing into seas: they can be easily identified from the spatial relations among the rivers, lakes and seas.

  • the rivers with higher grade, flowing past more residential areas and/or lying on the border of political regions: the grades of the rivers can be obtained from the tree structure of the river basin; the residential areas along a river may be obtained by buffering the river; and whether a river is on the border of political regions can be judged from the topological relations between the river and the regions.

Lastly, the structural knowledge of the river basin is extracted. The river basin can be viewed as a river tree consisting of mainstreams at different levels, i.e. mainstreams at the first level, mainstreams at the second level, etc. The tributaries connecting with the river mouth are called the mainstreams at the last level. Organizing the structure of a river basin in this way makes the construction of river trees concise, simple and reasonable.

After the river trees are obtained, useful information regarding the rivers should be derived and attached to them, including river flow direction, river grade, river length, distance between adjacent tributaries, the relation of a tributary to its mainstream (i.e. whether the tributary is the left or the right child of the mainstream), etc.

5.4.3.2 Generalization of Rivers

The rule library for automated river generalization can be established from this knowledge. Each rule in the rule library is expressed by a function, and the procedure for using the rules is designed, expressed and saved in the library. This inference procedure determines the river selection results; therefore, the river knowledge, the river selection rules and their weights and priorities in the rule library should be considered jointly. Here, a forward reasoning method is designed, which starts from the facts and known data and derives the resulting map by means of the rule library. In the process of river selection, all of the rivers that can trigger the inference procedure are selected first so that the procedure can start, and then new facts and rules are selected to push the inference procedure forward until every river has been judged (Fig. 5.12).

Fig. 5.12
figure 12

River generalization by rules. (After Wu et al. 2007)

If a river basin is generalized using methods based on spatial knowledge reasoning, two points need attention: whether the knowledge is correct and whether the inference procedure is reasonable. An example of selecting rivers by reasoning is presented below (see Fig. 5.13).

Fig. 5.13
figure 13

River selection process by rules. (After Wu et al. 2007)

  • Step 1: if AB is the mainstream of the river and has at least one tributary that itself has no tributary (e.g. CD, EL), select such a tributary (CD in this example). To do so, search the river tree; if no tributary without tributaries can be found, end the procedure.

  • Step 2: judge whether the river (i.e. CD) can be retained according to some of its important attributes, such as whether it is a perennial river. Further, select some quantitative indices, such as river length and river interval, for judging whether the river can be retained.

  • Step 3: judge whether the river length and river interval of CD meet the indices for retaining it on the resulting map. If they do, retain this river and go to Step 1; otherwise go to Step 4.

  • Step 4: judge whether this is an important river by the three rules mentioned previously: (1) the river is at the edge of the river basin, or (2) the river connects with lakes or flows into seas, or (3) the river has a higher grade, flows past more residential areas or is on the border of political regions. If it meets one of the three rules, it can be retained on the resulting map. Otherwise, mark this river as “will-be-deleted” and go to Step 1.

A river marked as “will-be-deleted” is not necessarily deleted. Whether it will be physically deleted from the database depends on other factors, such as the graphic characteristics of the river basin. For example, in Fig. 5.13, tributary EL does not meet the basic rule for retaining it on the resulting map, nor is it an important river. However, it is the only right tributary of the river tree; therefore it should be retained in order to keep the graphic similarity of the river tree before and after map generalization.
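Steps 2 to 4 and the final graphic check can be sketched as a small rule chain. This is only an illustration and not the rule library of Wu et al. (2007): the attribute names, the thresholds (10 mm and 3 mm), the tributary GH and the way the graphic-similarity check is implemented (keep the longest tributary on a side that would otherwise become empty) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Tributary:
    name: str
    side: str                    # "left" or "right" of the mainstream
    length_mm: float             # length on the resulting map
    interval_mm: float           # distance to the neighbouring tributary
    perennial: bool = True
    at_basin_edge: bool = False
    joins_lake_or_sea: bool = False
    high_weight: bool = False    # higher grade, many settlements, or political border


def judge(t, min_length_mm, min_interval_mm):
    # Steps 2 and 3: attribute check plus quantitative indices.
    if t.perennial and t.length_mm >= min_length_mm and t.interval_mm >= min_interval_mm:
        return "retain"
    # Step 4: the three importance rules.
    if t.at_basin_edge or t.joins_lake_or_sea or t.high_weight:
        return "retain"
    return "will-be-deleted"


def select(tributaries, min_length_mm=10.0, min_interval_mm=3.0):
    status = {t.name: judge(t, min_length_mm, min_interval_mm) for t in tributaries}
    # Graphic-similarity pass: do not leave one side of the mainstream empty.
    for side in {t.side for t in tributaries}:
        on_side = [t for t in tributaries if t.side == side]
        if all(status[t.name] == "will-be-deleted" for t in on_side):
            keeper = max(on_side, key=lambda t: t.length_mm)
            status[keeper.name] = "retain"
    return status


leaves = [
    Tributary("CD", "left", length_mm=15.0, interval_mm=5.0),
    Tributary("GH", "left", length_mm=4.0, interval_mm=1.0),   # hypothetical tributary
    Tributary("EL", "right", length_mm=6.0, interval_mm=8.0),
]
status = select(leaves)
```

In this toy run EL fails the length index and is not important, yet it is kept because it is the only tributary on the right side, mirroring the example in Fig. 5.13.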

5.4.4 An Approach to Generalizing River Networks by Catchment Area

Grade, length, catchment area and other attributes of rivers are usually considered in river network generalization (Ai et al. 2007). These factors are correlated with each other, and the degree of correlation generally changes with the geographic characteristics of the mapped area; thus, it is difficult to use such factors in river network generalization.

If only one factor is considered, the result is usually wrong and/or unacceptable, so cartographers generally use multiple factors in river network generalization. The following three examples illustrate this issue, taking Fig. 5.14a as the original river network, in which 1 is the mainstream, 2, 3 and 4 are the direct tributaries of the mainstream, and 5 and 6 are the direct tributaries of 2.

Fig. 5.14
figure 14

Use of different factors generates different results (After Ai et al. 2007)

Example 1: if only river length is taken as the criterion and a threshold of river length is used in river selection, tributary 2 might be deleted while tributaries 5 and 6 are retained (Fig. 5.14b). This result is unacceptable.

Example 2: if only river grade is used in river selection, rivers 5 and 6 are deleted and the other ones are retained on the resulting map, as shown in Fig. 5.14c. This result is unreasonable, because tributary 5 is very long and is usually regarded as an important tributary; hence, it should be retained on the resulting map.

Example 3: if both river length and river grade are considered, the resulting map might be Fig. 5.14d: tributary 5, whose grade is low but whose length is much greater than that of many other tributaries, is retained, while tributary 3, whose grade is high but whose length is much shorter, is deleted. This result is more reasonable than the other two.

If river catchment area is used as a criterion in river selection, the results can be improved (Ai et al. 2007). On the one hand, it is reasonable that the greater the catchment area of a river, the more likely the river is to be retained on the resulting map; on the other hand, the hierarchical relations among the catchment polygons ensure that the catchment area of a river is greater than that of any of its tributaries. Hence, the case in Fig. 5.14b is impossible if river catchment area is taken into consideration in river selection.

In addition, the values of river catchment areas depend on the spatial distribution of the river segments. For example, suppose a river tree has two tributaries A and B of equal length. Tributary A has a higher grade and lies in an area of higher river network density, while tributary B has a lower grade and lies in an area of lower river network density. Under this condition, the catchment area of A is generally less than that of B. Thus, if river catchment area is used in river selection, a tributary with a lower grade but a greater catchment area is more likely to be retained on the resulting map than one with a higher grade but a smaller catchment area (see Fig. 5.14c as an example). Such cases are very common in regions where river basins are extremely unevenly developed. If river selection considers only river grade, the graphic characteristics of the river networks will be distorted.

The above analysis shows that the catchment area is a key index for evaluating the importance of rivers. However, in some cases the catchment areas of tributaries differ very little, and it is then inappropriate to use catchment areas to evaluate the importance of tributaries. For example, in Fig. 5.15, if catchment area is employed as the criterion, the six tributaries on the northern side of the mainstream will be either all deleted or all retained, which is obviously unreasonable. To overcome this disadvantage, a new method using a catchment area index has been proposed (Ai et al. 2007). This method includes two steps:

  • Step 1: take the interval between tributaries as the index and do an initial selection. To this end, the tributaries at the leaves of the river tree are triangulated, from which the distances between a tributary and the tributaries on its two sides are calculated. If a distance value is less than the minimum distance that the human eye can discriminate on the resulting map, the corresponding tributary is deleted.

  • Step 2: calculate the number of river segments that can be retained on the resulting map by the Radical Law (Töpfer and Pillewizer 1966), and then sort the tributaries in decreasing order of catchment area. Retain the tributaries one by one in this order until the number of retained tributaries equals the number calculated by the Radical Law.

Fig. 5.15
figure 15

Tributaries are unevenly distributed at the two sides of the mainstream

In this method, after Step 1 is executed, some tributaries in the areas of very high river network density have been deleted, and the catchment areas they occupied are divided among the adjacent tributaries, which naturally increases the probability that these adjacent tributaries are retained on the resulting map when Step 2 is executed.

The method takes into consideration the grades, lengths and catchment areas of tributaries and the intervals among tributaries, and it regards the catchment area as an integration of the grade and length of a tributary and its intervals to its neighbours, which makes the river selection process simple and the results reasonable.
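The two steps can be sketched as follows. The sketch simplifies the method of Ai et al. (2007) in several labelled ways: the triangulation is replaced by a one-dimensional position of each tributary along one side of the mainstream, the smaller-catchment tributary of a too-close pair is the one deleted, and the catchment areas are not redistributed after Step 1. All names and numbers are invented.

```python
import math


def select_by_catchment(tribs, min_gap, m_source, m_target, x=1.0):
    """tribs: list of (name, position, catchment_area), ordered along the mainstream."""
    # Step 1: initial selection by interval; of two tributaries closer than
    # min_gap, keep the one with the larger catchment area.
    kept = []
    for trib in tribs:
        if kept and trib[1] - kept[-1][1] < min_gap:
            if trib[2] > kept[-1][2]:
                kept[-1] = trib
        else:
            kept.append(trib)
    # Step 2: Radical Law quota, filled in decreasing order of catchment area.
    quota = round(len(tribs) * math.sqrt((m_source / m_target) ** x))
    ranked = sorted(kept, key=lambda t: t[2], reverse=True)
    return [t[0] for t in ranked[:quota]]


tribs = [("a", 0.0, 10), ("b", 0.3, 4), ("c", 2.0, 7),
         ("d", 4.0, 3), ("e", 6.0, 9), ("f", 6.2, 12)]
selected = select_by_catchment(tribs, min_gap=0.5,
                               m_source=1_000_000, m_target=4_000_000)
```

Here Step 1 removes b and e, the Radical Law allows three of the original six tributaries, and the three largest remaining catchments (f, a, c) are retained.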

5.4.5 River Selection by the BP Neural Network Algorithm

Back propagation (i.e. BP) is a method used in supervised learning (Ethem 2010) and artificial neural networks to calculate the gradient that is needed to compute the weights used in the network. It is commonly used to train deep neural networks, a term referring to neural networks with more than one hidden layer. Back propagation is a special case of a traditional and more general technique called automatic differentiation. In the context of learning, back propagation is commonly used by the gradient descent optimization algorithm to adjust the weights of neurons by calculating the gradient of the loss function. This technique is also sometimes called backward propagation of errors, because the error is calculated at the output and distributed back through the network layers. The BP learning process consists of two steps: (1) forward propagation of the operating signal, and (2) back propagation of the error signal.

The back propagation algorithm has been repeatedly rediscovered and is equivalent to automatic differentiation in reverse accumulation mode. Back propagation requires the derivative of the loss function with respect to the network output to be known, which typically means that a desired target value is known. For this reason, it is considered to be a supervised learning method, although it is used in some unsupervised networks such as autoencoders. Back propagation is also a generalization of the delta rule to multi-layered feed forward networks, made possible by using the chain rule to iteratively compute gradients for each layer. Because a river basin can be represented by trees or networks, and its generalization process is similar to the basic idea of the back propagation algorithm, some scholars have used it in the automated selection of river networks. This section presents an algorithm proposed by Shao et al. (2004), which consists of three steps: organizing river data, acquiring leaf nodes, and selecting rivers by the BP neural network technique.

5.4.5.1 Organizing River Data

The river is taken as the basic unit, and the river data are organized using a river data structure. Firstly, the mainstream of the river basin is constructed according to the methods described in the previous sections of this chapter. Secondly, the hierarchical relations of the tributaries are obtained and a river tree is formed. Lastly, the attributes of each river segment are calculated and assigned to the river segment, and the hierarchical relations between the mainstream and the tributaries, together with their attributes, are saved in a two-dimensional table.

5.4.5.2 Acquiring Leaf Nodes

At this step, the length of each river corresponding to a leaf node, the interval between two rivers corresponding to two leaf nodes, and the quantitative importance of each river corresponding to a leaf node are calculated. Because the length of each river is easy to obtain, only the other two aspects are discussed here.

5.4.5.3 Calculation of River Interval

Figure 5.16 shows an example of river interval calculation. Three cases should be considered.

Fig. 5.16
figure 16

River intervals

Case 1: if the river is the first leaf node in the river tree

$$ \mathrm{d}=\operatorname{Min}\ \mathrm{d}\left(\mathrm{a},{b}_i\right),{b}_i\in \mathrm{B} $$
(5.28)

where d(x, y) is the distance between x and y; a is the starting point of the current river; and B is the set of the direct downstream of the current river.

Case 2: if the river is at the middle of the leaf nodes of the river tree

$$ {d}_{front}=\operatorname{Min}\ \mathrm{d}\left({a}_i,{x}_i\right),{a}_i\in \mathrm{A},{x}_i\in \mathrm{X} $$
$$ {d}_{back}=\operatorname{Min}\ \mathrm{d}\left({a}_i,{y}_i\right),{a}_i\in \mathrm{A},{y}_i\in \mathrm{Y} $$
$$ \mathrm{d}=\operatorname{Min}\left({d}_{front},{d}_{back}\right) $$
(5.29)

where A is the set of the rivers corresponding to the leaf nodes of the river tree; X is the set of rivers that are at the same level as A, on the same side as A and in front of A, each of which corresponds to a leaf node of the river tree; and Y is the set of rivers that are at the same level as A, on the same side as A and behind A, each of which corresponds to a leaf node of the river tree.

Case 3: if the river is the last leaf node in the river tree

$$ \mathrm{d}=\operatorname{Min}\ \mathrm{d}\left({a}_i,{x}_i\right),{a}_i\in \mathrm{A},{x}_i\in \mathrm{X} $$
(5.30)

where A is the set of the rivers corresponding to the leaf nodes of the river tree; and X is the set of rivers that are at the same level as A, on the same side as A and neighbouring A, each of which corresponds to a leaf node of the river tree.
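The three cases share the same core operation: a minimum distance between two point sets. The sketch below assumes that each river is given as a list of (x, y) points and that the neighbouring rivers in front of and behind the current one have already been identified from the river tree; the function names are hypothetical.

```python
import math


def min_distance(points_a, points_b):
    """Smallest Euclidean distance between any point of A and any point of B."""
    return min(math.dist(p, q) for p in points_a for q in points_b)


def river_interval(river, front=None, back=None):
    """Interval of a leaf river; pass only the neighbour(s) that exist.

    One neighbour corresponds to Cases 1 and 3, two neighbours to Case 2.
    At least one neighbour must be given.
    """
    candidates = [min_distance(river, nb) for nb in (front, back) if nb]
    return min(candidates)


# Case 2: neighbours on both sides (d_front = 3, d_back = 1).
middle = river_interval([(0, 0), (0, 2)],
                        front=[(3, 0), (3, 2)], back=[(-1, 0), (-1, 5)])
# Case 1: only the starting point a of the river is compared with the set B.
first = river_interval([(0, 0)], front=[(3, 4)])
```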

5.4.5.4 Acquirement of Importance Degrees of Rivers

The importance degree of a river is a complex concept, because it is related to multiple factors. Here, geographically special rivers, i.e. those that connect with ponds, lakes and seas, those on the borders of political regions, and perennial ones in arid areas, are not discussed; only common rivers are considered, and a formula for calculating their importance degrees is given:

$$ \mathrm{p}=\upalpha {x}_1+\upbeta {x}_2 $$
(5.31)

where the residential areas are classified into two levels; p is the importance degree of the river; \( x_1 \) is the residential areas at the first level; \( x_2 \) is the residential areas at the second level; α is the weight of the residential areas at the first level; and β is the weight of the residential areas at the second level.

It should be noted that, to ensure p ∈ [0, 1], \( x_1 \), \( x_2 \), α and β need to be normalized before they are used in the calculation of p.
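One possible normalization is sketched below. It assumes that \( x_1 \) and \( x_2 \) are counts of residential areas along each river, scales each count by the maximum over all rivers, and rescales the weights so that α + β = 1; the source does not specify the normalization, and the weights 0.7 and 0.3 are invented.

```python
def importance_degrees(first_level, second_level, alpha=0.7, beta=0.3):
    """p = alpha * x1 + beta * x2  (Formula 5.31) with all inputs normalized."""
    total = alpha + beta
    alpha, beta = alpha / total, beta / total     # weights sum to 1
    max1 = max(first_level) or 1                  # avoid division by zero
    max2 = max(second_level) or 1
    return [alpha * a / max1 + beta * b / max2
            for a, b in zip(first_level, second_level)]


# Invented counts of first- and second-level residential areas for three rivers.
p = importance_degrees([2, 0, 1], [4, 8, 0])
```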

5.4.5.5 Selecting Rivers by the BP Neural Network Technique

The algorithm proposed by Shao et al. (2004) considers three factors in the process of river selection, i.e. river length, river interval and river importance degree, and the result is the selected rivers; thus, in this algorithm the BP neural network has three neurons at the input layer of the network and one neuron at the output layer of the network.

The data used in the machine learning are from a topographic map at scale 1:1 500 K. The rivers and buildings are extracted and re-organized, and the river lengths, river intervals and river importance degrees are calculated and saved in a database. The rivers and buildings on the map are generalized by experienced cartographers to get a map at smaller scale 1: 3 M. Then the original map and generalized map are used as sample data for training the BP neural network algorithm, and finally the weight matrix and improved algorithm can be achieved.

Figure 5.17 shows an example for testing the BP neural network algorithm proposed by Shao et al. (2004). The river network generated by this algorithm is better than that generated by the square root algorithm, because it takes the weights of the rivers into consideration and uses the BP neural network technique, which may make the algorithm smarter by accumulating map generalization knowledge through machine learning.

Fig. 5.17

Test of the BP neural network method: (a) original rivers; (b) generalized result by the BP neural network method; (c) generalized result by the square root method

5.4.6 River Network Selection Based on Structure and Pattern Recognition

Generalization aims at reducing the level of detail of a database in order to meet new specifications. It includes cartographic generalization and model generalization: the former aims at producing maps, and the latter aims at generating new databases from source data. As far as river network generalization is concerned, many achievements have been made over the years, and most of them concern cartographic generalization. Our attention therefore moves from the former to the latter. This section introduces an algorithm proposed by Touya (2006) which addresses database generalization, specifically river network selection. The algorithm uses the principle of “good continuation” to enrich the database by constructing so-called “river strokes” and selects river networks based on river structures and patterns. It mainly consists of the following steps.

5.4.6.1 Data Pre-processing

River data from a map database are not always correct. For example, the flow direction of the river segments is in some cases not completely reliable, so the initial dataset often has to be cleaned to correct the errors in flow direction. Two processes are successively applied to infer flow directions in the river network.

The first process is adapted to the initial data. It uses the elevation data of the stream segments (i.e. attribute data): each stream segment has an elevation value for its initial vertex and its final vertex. Obviously, if the elevation of the final vertex is greater than the elevation of the initial vertex, the stream segment geometry has to be reversed to be consistent with the flow direction.
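The elevation test can be sketched as below. The `StreamSegment` class and its field names are hypothetical; the only logic taken from the text is that a segment whose final vertex is higher than its initial vertex gets reversed.

```python
from dataclasses import dataclass


@dataclass
class StreamSegment:
    vertices: list   # ordered (x, y) tuples from initial to final vertex
    z_start: float   # elevation at the initial vertex
    z_end: float     # elevation at the final vertex


def orient_downstream(segment):
    """Reverse the geometry if the segment is stored running uphill."""
    if segment.z_end > segment.z_start:
        return StreamSegment(segment.vertices[::-1], segment.z_end, segment.z_start)
    return segment


# Hypothetical segment digitized from the low end to the high end.
uphill = StreamSegment([(0, 0), (1, 1), (2, 1)], z_start=120.0, z_end=135.0)
fixed = orient_downstream(uphill)
```

Segments with equal elevations at both ends are left untouched here; in flat areas the elevation attribute carries no information about direction, which is where the second process becomes useful.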

The second process depends less on the initial data, as it consists of a neighbourhood analysis. Locally analyzing the inflows and the outflows of streams may allow the flow direction to be inferred in simple cases such as those in Fig. 5.18. At sources or sinks, neighbourhood analysis may correct other errors of flow direction.

Fig. 5.18

Two examples of wrong flow directions: (a) the highlighted segment has a wrong flow direction, as it is inconsistent with its neighbours; and (b) the highlighted segment has a wrong flow direction, inconsistent with its neighbours, and it is the first river segment from a source
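One simple form of neighbourhood analysis can be sketched as follows. The text does not give the exact test used, so this sketch assumes a specific heuristic: a node that touches two or more segments but has only inflows, or only outflows, is flagged as suspicious. Segments are represented as hypothetical (from_node, to_node) pairs.

```python
from collections import defaultdict


def suspicious_nodes(segments):
    """Return nodes of degree >= 2 where every segment flows in, or every one flows out."""
    inflow, outflow = defaultdict(int), defaultdict(int)
    for start, end in segments:
        outflow[start] += 1
        inflow[end] += 1
    flagged = []
    for node in sorted(set(inflow) | set(outflow)):
        degree = inflow[node] + outflow[node]
        if degree >= 2 and (inflow[node] == 0 or outflow[node] == 0):
            flagged.append(node)
    return flagged


# True flow is a->b, c->b, b->d, d->e, but segment b->d was stored reversed.
stored = [("a", "b"), ("c", "b"), ("d", "b"), ("d", "e")]
flagged = suspicious_nodes(stored)   # both ends of the reversed segment
```

When two flagged nodes are joined by a single segment, as here, reversing that segment makes both nodes consistent again, which corresponds to the simple case of Fig. 5.18a.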

Furthermore, generalization processes often need topologically correct network data to provide meaningful results. Most real data contain errors, especially topological errors, so a check of the topological relations within the river network is necessary. After these pre-processing steps, the selection process itself can take place.

5.4.6.2 Enriched Data Schema for Generalization

Automatic selection of river networks is a complex process that often requires enriching the raw dataset to be generalized so that implicit structures and patterns can be recognized.

To implement this algorithm in an object-oriented programming language (e.g. C++), a “River stroke” class is added to store the strokes that are used for selection. The classes “Source” and “Sink” are added to store the beginning and the end of the strokes so that they can be used as database objects in further processes. Two classes are also added to manage the selection of river islands. In addition, a class to store irrigation zones is added. Indeed, the stroke-building procedure does not work in these zones, so they have to be detected beforehand.
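The enriched schema might look like the sketch below, written in Python rather than C++ for brevity. Only the class roles come from the text; every attribute name is hypothetical, and the choice of `Island` and `ComplexIsland` as the two island classes is an assumption based on Sect. 5.4.6.5.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Source:
    node_id: int
    kind: str = "natural"          # or "zone limit" for clipped data


@dataclass
class Sink:
    node_id: int


@dataclass
class RiverStroke:
    segment_ids: List[int]
    source: Optional[Source] = None
    sink: Optional[Sink] = None    # None when the stroke ends at a confluence
    stroke_type: str = "main channel"   # or "braided stream"
    horton_order: Optional[int] = None  # filled in after all strokes are built


@dataclass
class Island:
    outline_segment_ids: List[int]


@dataclass
class ComplexIsland:
    islands: List[Island] = field(default_factory=list)


@dataclass
class IrrigationZone:
    segment_ids: List[int] = field(default_factory=list)


# Hypothetical stroke made of three segments running from node 10 to node 42.
stroke = RiverStroke(segment_ids=[1, 2, 3], source=Source(10), sink=Sink(42))
```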

5.4.6.3 Extraction of Sources and Sinks

In order to build the “river strokes” and the irrigation zones, it is first necessary to extract the sources and sinks. A source is a node of the network that has a single output link and no input link. A sink is the opposite: a node with a single input link and no output link. Sources and sinks can usually be extracted from the river data by their attributes, such as “type”.
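When the “type” attribute is missing, the two definitions above can be applied directly to the network topology. The sketch below assumes that each segment is stored as a hypothetical (from_node, to_node) pair with a reliable flow direction.

```python
from collections import Counter


def find_sources_and_sinks(segments):
    """Apply the definitions: source = 0 in / 1 out, sink = 1 in / 0 out."""
    out_degree = Counter(start for start, _ in segments)
    in_degree = Counter(end for _, end in segments)
    nodes = set(out_degree) | set(in_degree)
    sources = sorted(n for n in nodes if in_degree[n] == 0 and out_degree[n] == 1)
    sinks = sorted(n for n in nodes if in_degree[n] == 1 and out_degree[n] == 0)
    return sources, sinks


# Hypothetical network: 1 and 2 join at 3, 5 joins at 4, and 6 is the outlet.
segments = [(1, 3), (2, 3), (3, 4), (5, 4), (4, 6)]
sources, sinks = find_sources_and_sinks(segments)
```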

5.4.6.4 Construction of River Strokes

The stroke-building algorithm is a downstream pass: it begins at the sources of the network and ends when all the nodes (sources, sinks and confluence points) have been treated. “River strokes” are built to correspond to the classic ordering of rivers: a river stroke begins at a source and ends at a sink or at a confluence point with a more important river stroke. The tricky part of creating such strokes is defining the main stream at a confluence point. To get more realistic results, the Gestalt principle of “good continuation” is constrained by rules, so the river strokes differ from purely geometric perceptual strokes. Many rules can be used to define the main path: Horton (1945) used the longest and straightest path, Thomson and Brooks (2000) used the river name and the longest and straightest path, and others used the largest drainage basin. The rules used here are a bit more complex:

  1. Strokes always follow a named stream.

  2. All other things being equal, a river with a “permanent” regime has priority over a river with an “intermittent” regime.

  3. All other things being equal and the sources of the streams being “natural”, the longest and straightest path has priority. Length is the major parameter because it is more relevant than straightness. Straightness is used instead when the confluence angle is greater than a threshold (60°) and the length difference is less than a threshold (500 m).

  4. All other things being equal and one of the sources of the streams being “zone limit”, the straightest path has priority.

To follow the good continuation principle, straightness means curvature continuity: the straightest path is the one that gives the greatest curvature continuity with the upstream stroke. The length of a river is measured from the source point to the confluence point.
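The four rules can be read as a priority cascade. The sketch below is one possible reading for a confluence of two upstream strokes, not the reference implementation: it assumes that rule 1 means a named stream beats an unnamed one, that the rules are tried in the listed order, and that straightness is summarized by a single deflection angle (smaller is straighter). The `Upstream` class and its fields are hypothetical.

```python
from dataclasses import dataclass

ANGLE_THRESHOLD_DEG = 60.0    # thresholds quoted in rule 3
LENGTH_THRESHOLD_M = 500.0


@dataclass
class Upstream:
    name: str              # empty string when the stream is unnamed
    regime: str            # "permanent" or "intermittent"
    source_kind: str       # "natural" or "zone limit"
    length_m: float        # length from the source to the confluence
    deflection_deg: float  # turn needed to continue downstream


def main_stream(a, b, confluence_angle_deg):
    """Pick which of two upstream strokes continues through the confluence."""
    # Rule 1 (assumed reading): a named stream wins over an unnamed one.
    if bool(a.name) != bool(b.name):
        return a if a.name else b
    # Rule 2: a permanent regime beats an intermittent one.
    if a.regime != b.regime:
        return a if a.regime == "permanent" else b
    straightest = a if a.deflection_deg <= b.deflection_deg else b
    # Rule 4: if either source is a "zone limit", the straightest path wins.
    if "zone limit" in (a.source_kind, b.source_kind):
        return straightest
    # Rule 3: length decides, except at wide confluences with similar lengths.
    if (confluence_angle_deg > ANGLE_THRESHOLD_DEG
            and abs(a.length_m - b.length_m) < LENGTH_THRESHOLD_M):
        return straightest
    return a if a.length_m >= b.length_m else b


# Hypothetical candidates: one longer, one straighter, 200 m apart in length.
longer = Upstream("", "permanent", "natural", 3000.0, 40.0)
straighter = Upstream("", "permanent", "natural", 2800.0, 10.0)
wide = main_stream(longer, straighter, confluence_angle_deg=75.0)
narrow = main_stream(longer, straighter, confluence_angle_deg=30.0)
```

With a 75° confluence and only 200 m of length difference, straightness decides; with a 30° confluence, the longer stroke continues.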

River networks sometimes contain deltas and islands, where a stream splits into two or more channels in the downstream direction. So, two types of river strokes are distinguished: “main channels” and “braided streams”. When a stream splits into two or more channels, one outflow is considered the main channel and continues the “main channel” stroke, and the others are used as the first segments of new “braided” river strokes. The chosen main channel is the straightest path (Fig. 5.19). The shortest path to the sink could have been used instead, but computing shortest paths is time-consuming and the improvement would not justify the slowdown. This rule could nevertheless be used for small datasets.

Fig. 5.19

The main stream at a confluence point with two inflow and two outflow streams
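Choosing the main channel at a split can be sketched as follows. As a simplification, straightness is approximated here by the angle between the incoming direction and each outgoing direction at the node, which is an assumption; a fuller implementation would compare curvature along the whole stroke. The channel identifiers and direction vectors are hypothetical.

```python
import math


def deflection_deg(incoming, outgoing):
    """Angle in degrees between two direction vectors (0 means perfectly straight)."""
    dot = incoming[0] * outgoing[0] + incoming[1] * outgoing[1]
    norm = math.hypot(*incoming) * math.hypot(*outgoing)
    cos_angle = max(-1.0, min(1.0, dot / norm))  # guard against rounding
    return math.degrees(math.acos(cos_angle))


def split_channels(incoming, outflows):
    """Return (main_channel_id, braided_ids) for the outflows at a split."""
    ranked = sorted(outflows, key=lambda cid: deflection_deg(incoming, outflows[cid]))
    return ranked[0], ranked[1:]


# Hypothetical split: the stream arrives heading east and divides in two.
main, braided = split_channels((1.0, 0.0), {"n": (1.0, 1.0), "s": (1.0, -0.2)})
```

The outflow with the smallest deflection continues the “main channel” stroke, and each remaining outflow starts a new “braided” stroke.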

5.4.6.5 Formation of Islands

In river networks, braided streams often correspond to islands in the rivers, and there are often several adjacent islands. These islands are significant data, and it is useful to be able to select the outlines of some islands in the generalized dataset. During data enrichment, the aim is to form islands and complex islands as database objects to make the selection of the outlines easier. A complex island is an aggregation of islands. In the enriched dataset, each island, simple or complex, is linked to the stream segments of its outline. If the selection specifications require a different level of abstraction for islands, and especially a smaller one, adjacent islands can be aggregated into several adjacent complex islands in order to keep the information about adjacent islands after the selection. In that case, the enriched data model would have to be slightly modified to allow adjacent complex islands.

To build such islands, the topological faces of the network, regarded as a graph, have to be built. Only the small faces, i.e. the ones that really represent islands, are kept to give the geometry to the new objects; the size threshold is determined based on the test dataset. Complex islands are built by clustering the simple islands. To build adjacent complex islands for a smaller level of abstraction, hierarchical clustering would be used instead.
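The two operations described above, filtering faces by size and clustering simple islands, can be sketched as follows. The sketch assumes that the faces have already been extracted as closed rings, that "small" is measured by area, and that two islands belong to the same complex island when they share at least one outline segment. The area threshold, the identifiers and the grouping rule are all hypothetical.

```python
def polygon_area(ring):
    """Shoelace area of a closed ring given as a list of (x, y) vertices."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def cluster_islands(outlines):
    """Group islands that share at least one outline segment id (union-find)."""
    parent = {name: name for name in outlines}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving
            name = parent[name]
        return name

    owner = {}
    for name, segment_ids in outlines.items():
        for seg in segment_ids:
            if seg in owner:
                parent[find(name)] = find(owner[seg])
            else:
                owner[seg] = name
    groups = {}
    for name in outlines:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


MAX_ISLAND_AREA = 100.0  # hypothetical threshold, tuned on a test dataset
faces = {
    "f1": [(0, 0), (4, 0), (4, 2), (0, 2)],              # small face: an island
    "f2": [(0, 0), (1000, 0), (1000, 1000), (0, 1000)],  # large face: not an island
}
islands = [fid for fid, ring in faces.items() if polygon_area(ring) <= MAX_ISLAND_AREA]

# Islands i1 and i2 share segment 3, so they form one complex island.
complex_islands = cluster_islands({"i1": {1, 2, 3}, "i2": {3, 4, 5}, "i3": {8, 9}})
```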

5.4.6.6 Detection of Streams Inside Irrigation Zones

Detecting irrigation zones is necessary because these zones are sources of errors during the creation of river strokes. As explained in previous sections of this chapter, the stroke creation procedure is based on flow directions. In irrigation zones most streams are artificial and flat and have no real flow direction, so the river segments are often given an arbitrary flow direction in the original data. The resulting inconsistency of flow directions may prevent the algorithm from proceeding.

Irrigation zones are characterized by flat ground, a high density of straight and short streams, and many sources and sinks. These characteristics are used to automatically detect irrigation areas and to build them as objects in the dataset. Then, compactness is used to remove over-detected areas.

The next step is the automatic recognition of artificial streams, as opposed to natural ones, within irrigation zones in order to remove them from the selection process. Artificial streams tend to be short and straight, whereas natural ones tend to be long and sinuous and to have many intersections with other streams. To translate this difference into geo-computational measures, strokes are used once again, but this time as purely geometrical strokes (only curvature variation is used). Strokes are computed in each irrigation area and then characterized by measures such as sinuosity and number of intersections with other strokes in order to differentiate them.

The detection of natural and artificial streams inside irrigation zones completes the data enrichment step of the proposed river network selection process.
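The sinuosity measure can be sketched as below. Sinuosity is taken here as path length divided by the straight-line distance between the end points, which is a common definition but is not stated in the text; the decision rule and its two thresholds are purely hypothetical.

```python
import math


def path_length(points):
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


def sinuosity(points):
    """Path length divided by the straight-line distance between the end points."""
    chord = math.dist(points[0], points[-1])
    return path_length(points) / chord if chord > 0 else float("inf")


def looks_natural(points, n_intersections, min_sinuosity=1.2, min_intersections=2):
    """Hypothetical rule: natural strokes are sinuous or cross many other strokes."""
    return sinuosity(points) >= min_sinuosity or n_intersections >= min_intersections


# Hypothetical strokes inside an irrigation zone.
canal = [(0, 0), (100, 0)]
stream = [(0, 0), (30, 40), (60, 0), (90, 40), (120, 0)]
```

A perfectly straight canal has a sinuosity of 1, while the zigzag stream above covers 200 units of path over a 120-unit chord.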

5.4.6.7 River Selection

After completion of the data enrichment step, the river selection step itself can take place. The major part is the selection of the river strokes. As strokes are built to represent whole rivers, selection operates on strokes rather than segments: either all segments of a stroke are selected or none is. To determine which strokes are to be selected, the main criterion used is a hierarchical organization of the strokes.

The Horton ordering (Horton 1945) is applied here to organize the strokes. To compute the Horton ordering, a Strahler ordering of the river segments is needed. The Strahler order is assigned to the river segments during the stroke-building procedure: when a segment extends a river stroke, its Strahler order is computed. The segments belonging to a “braided stream” stroke are not used to compute the Strahler order, and no Strahler order is assigned to them, as their selection is not based on their own hierarchy. The Horton order of each stroke is finally computed after all strokes in the network have been built: the Horton order of a river stroke is the maximum of the Strahler orders of the river segments that compose it.
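Both orderings can be sketched as below. The Strahler rule used is the standard one (a segment with no inflow has order 1; where the highest inflow order occurs at least twice, the order increases by one), and the Horton order of a stroke is taken as the maximum Strahler order of its segments, as stated above. The `upstream` mapping and the segment and stroke names are hypothetical, and braided segments are assumed to have been left out of the input.

```python
def strahler_orders(upstream):
    """upstream maps each segment id to the list of segments flowing into it."""
    orders = {}

    def order(seg):
        if seg not in orders:
            incoming = [order(u) for u in upstream.get(seg, [])]
            if not incoming:
                orders[seg] = 1
            else:
                top = max(incoming)
                # The order rises only when the top order arrives at least twice.
                orders[seg] = top + 1 if incoming.count(top) >= 2 else top
        return orders[seg]

    for seg in upstream:
        order(seg)
    return orders


def horton_order(stroke_segments, orders):
    """Horton order of a stroke: maximum Strahler order of its segments."""
    return max(orders[s] for s in stroke_segments)


# Hypothetical network: a+b -> c, c+d -> e, f+g -> h, e+h -> i.
upstream = {
    "a": [], "b": [], "d": [], "f": [], "g": [],
    "c": ["a", "b"], "e": ["c", "d"], "h": ["f", "g"], "i": ["e", "h"],
}
orders = strahler_orders(upstream)
strokes = {"main": ["a", "c", "e", "i"], "side": ["f", "h"], "tiny": ["d"]}
horton = {name: horton_order(segs, orders) for name, segs in strokes.items()}
```

The recursion is adequate for a small example; a large network would need an iterative traversal to avoid Python's recursion limit.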

The selection process is based on the river strokes using the Horton order to organize them in hierarchy. The less important ones in the hierarchy are not selected.

The selection criteria are a threshold on the Horton order of the strokes and a threshold on the stroke length. The river strokes are selected according to their type: “primary” or “braided”. High-order primary strokes are always selected, and low-order primary strokes are selected if they are long enough. Specifically, all primary strokes with a Horton order of three or more are selected, and strokes with an order of two or one are selected if their length exceeds specific values. Braided strokes are selected only if their primary stroke is selected and their length exceeds a specific value. Obviously, the parameters may differ from case to case.

To conclude, this algorithm builds on previous work on river network generalization, especially the work that introduced the notion of “strokes” and the hierarchical organization of the streams of the network. It adds the management of river islands and irrigation zones, and allows strokes to be built on a clipped area where some sources are not natural. A data correction step dealing with network topology and flow direction has also been developed. The complete selection process is composed of three steps: data correction, enrichment and selection.
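These criteria can be sketched as a two-pass filter. The order threshold of three comes from the text; the length thresholds, the dictionary layout and the stroke identifiers are hypothetical, and one length threshold per low order is assumed.

```python
def select_strokes(strokes, min_length_by_order, min_braided_length, keep_order=3):
    """Return the ids of the selected strokes.

    strokes maps an id to a dict with 'type' and 'length', plus 'horton' for
    primary strokes and 'primary' (id of the parent stroke) for braided ones.
    """
    selected = set()
    # Pass 1: primary strokes, by Horton order and then by length.
    for sid, s in strokes.items():
        if s["type"] != "primary":
            continue
        if s["horton"] >= keep_order or s["length"] > min_length_by_order[s["horton"]]:
            selected.add(sid)
    # Pass 2: braided strokes depend on their primary stroke being kept.
    for sid, s in strokes.items():
        if (s["type"] == "braided" and s["primary"] in selected
                and s["length"] > min_braided_length):
            selected.add(sid)
    return selected


# Hypothetical strokes with lengths in metres.
strokes = {
    "r1": {"type": "primary", "horton": 3, "length": 12000.0},
    "r2": {"type": "primary", "horton": 2, "length": 800.0},
    "r3": {"type": "primary", "horton": 1, "length": 2500.0},
    "b1": {"type": "braided", "primary": "r1", "length": 600.0},
    "b2": {"type": "braided", "primary": "r2", "length": 900.0},
}
kept = select_strokes(strokes, min_length_by_order={1: 2000.0, 2: 1000.0},
                      min_braided_length=500.0)
```

Here the braided stroke b2 is dropped even though it is long enough, because its primary stroke r2 was not selected.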

5.4.7 Comparison of the Algorithms for River Network Generalization

Automated map generalization has always been an important issue and a major challenge in cartography and Geographical Information System research. Regarded as the skeleton of the terrain, the drainage system is considered a means of preserving terrain features in generalization. So far, many researchers have paid attention to the generalization of river networks, and many approaches for river selection have been developed. Typical ones include the approach for selecting river segments by indices (He 2004), the approach for selecting river segments by the river tree (Zhang 2006), a knowledge-based approach to river network generalization (Wu et al. 2007), an approach to generalizing river networks by catchment area (Ai et al. 2007), an approach to river selection by the BP neural network algorithm (Shao et al. 2004), and an algorithm for river network selection based on structure and pattern recognition (Touya 2006). Each approach has its advantages and disadvantages, and each can be used in different cases. To help use them efficiently, these approaches are compared and the results are shown in Table 5.1.

Table 5.1 Comparison of river network generalization approaches

5.5 Summary of the Chapter

River networks are an important type of feature on maps and in map databases. Multi-scale representation of river networks is necessary in constructing spatial data infrastructures.

This chapter introduces two categories of methods for describing river networks that can be found in the literature. The first describes river networks using measures and parameters, and the second uses special methods for specific purposes, including the information entropy-based method, the fractal dimension-based method, the river tree method, the topological order method and the graph theory-based method. These methods lay the foundation for automated generalization of river networks on maps. The chapter then addresses the fundamental principles that should be followed in river network generalization. Finally, it presents six approaches for river network generalization: the approach for selecting river segments by indices, the approach for selecting river segments by the river tree, a knowledge-based approach to river network generalization, an approach to generalizing river networks by catchment area, an approach to river selection by the BP neural network algorithm, and an algorithm for river network selection based on structure and pattern recognition.