
Supervised and semi-supervised classifiers for the detection of flood-prone areas


Abstract

Supervised and semi-supervised machine-learning techniques are applied and compared for the recognition of flood hazard. The learning goal is to distinguish between flood-exposed and marginal-risk areas. Kernel-based binary classifiers using six quantitative morphological features, derived from data stored in digital elevation models, are trained to model the relationship between morphology and flood hazard. According to the experimental outcomes, such classifiers are appropriate tools for an initial low-cost detection of flood-exposed areas, to be possibly refined in successive steps by more time-consuming and costly investigations by experts. The use of these automatic classification techniques is valuable, e.g., in insurance applications, where one is interested in estimating the flood hazard of areas for which limited labeled information is available. The proposed machine-learning techniques are applied to the basin of the Italian Tanaro River. The experimental results show that for this case study, semi-supervised methods outperform supervised ones when only a few labeled examples are used (the same number in both cases), together with a much larger number of unlabeled ones.


References

  • Bates PD, Marks KJ, Horritt MS (2003) Optimal use of high resolution topographic data in flood inundation models. Hydrol Process 17:537–557

  • Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 7:2399–2434

  • Bishop CM (2006) Pattern recognition and machine learning. Springer, New York

  • Cristianini N, Shawe-Taylor J (2000) An introduction to support vector machines and other kernel-based learning methods. Cambridge University Press, Cambridge

  • Degiorgis M, Gnecco G, Gorni S, Roth G, Sanguineti M, Taramasso AC (2012) Classifiers for the detection of flood-prone areas using remote sensed elevation data. J Hydrol 470–471:302–315

  • Degiorgis M, Gnecco G, Gorni S, Roth G, Sanguineti M, Taramasso AC (2013) Flood hazard assessment via threshold binary classifiers: the case study of the Tanaro basin. Irrigation Drainage 62:1–10

  • Do Carmo MP (1976) Differential geometry of curves and surfaces. Prentice-Hall, Englewood Cliffs

  • Dodov BA, Foufoula-Georgiou E (2006) Floodplain morphometry extraction from a high-resolution digital elevation model: a simple algorithm for regional analysis studies. IEEE Geosci Remote Sens Lett 3:410–413

  • Gallant JC, Dowling TI (2003) A multiresolution index of valley bottom flatness for mapping depositional areas. Water Resour Res 39:1347–1360

  • Giannoni F, Roth G, Rudari R (2005) A procedure for drainage network identification from geomorphology and its application to the prediction of the hydrologic response. Adv Water Resour 28:567–581

  • Guzzetti F, Stark CP, Salvati P (2005) Evaluation of flood and landslide risk to the population of Italy. Environ Manag 36:15–36

  • Hjerdt KN, McDonnell JJ, Seibert J, Rodhe A (2004) A new topographic index to quantify downslope controls on local drainage. Water Resour Res 40. doi:10.1029/2004WR003130

  • Horritt MS, Bates PD (2002) Evaluation of 1D and 2D numerical models for predicting river flood inundation. J Hydrol 268:87–99

  • Hunter NM, Bates PD, Horritt MS, Wilson MD (2007) Simple spatially-distributed models for predicting flood inundation: a review. Geomorphology 90:208–225

  • von Luxburg U (2007) A tutorial on spectral clustering. Stat Comput 17:395–416

  • Manfreda S, Di Leo M, Sole A (2011) Detection of flood-prone areas using digital elevation models. J Hydrol Eng 16(10):781–790. doi:10.1061/(ASCE)HE.1943-5584.0000367

  • Melacci S, Belkin M (2012) Laplacian support vector machines trained in the primal. J Mach Learn Res 12:1149–1184

  • Nardi F, Vivoni ER, Grimaldi S (2006) Investigating a floodplain scaling relation using a hydrogeomorphic delineation method. Water Resour Res 42(9). doi:10.1029/2005WR004155

  • Nardi F, Grimaldi S, Santini M, Petroselli A, Ubertini L (2008) Hydrogeomorphic properties of simulated drainage patterns using digital elevation models: the flat area issue. Hydrol Sci J 53:1176–1193

  • Noman NS, Nelson EJ, Zundel AK (2001) Review of automated floodplain delineation from digital terrain models. J Water Resour Plan Manag 127(6):394–402

  • Santini M, Grimaldi S, Nardi F, Petroselli A, Rulli MC (2009) Preprocessing algorithms and landslide modelling on remotely sensed DEMs. Geomorphology 113:110–125

  • Theodoridis S, Koutroumbas K (2009) Pattern recognition, 4th edn. Academic Press, New York

  • Zhu X, Goldberg AB (2009) Introduction to semi-supervised learning. Morgan & Claypool Publishers, San Rafael


Acknowledgements

Marcello Sanguineti is a member of the Gruppo Nazionale per l’Analisi Matematica, la Probabilità e le loro Applicazioni (GNAMPA) of the Istituto Nazionale di Alta Matematica (INdAM).

Author information

Corresponding author

Correspondence to Marcello Sanguineti.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Communicated by V. Loia.

Appendices

Appendix 1: Support vector machines

Let a set made of a finite number l of labeled training data \(\{(x_{i}, y_{i}),\ i = 1,\ldots ,l\}\) be given, with \(x_{i}\in {\mathbb {R}}^{m}\) and \(y_{i}\in \{-1,+1\}\). Here, with a slight change of notation with respect to the previous sections, the label \(-1\) (instead of the label 0) is used to denote the “negative” class, while \(+1\) is the “positive” class label. Given a regularization parameter \(\gamma _{A}>0\) and a suitable function space \(H_\mathcal{K} \), more precisely, a reproducing kernel Hilbert space (Cristianini and Shawe-Taylor 2000), the (binary) support vector machine (SVM) training problem consists in searching for a classifier \(f^{*}\) that solves the following optimization problem: find

$$\begin{aligned} \text{ min }_{f \in H_\mathcal{K} } \left( {\frac{1}{l} \sum \limits _{i=1}^l ( {1-y_i f(x_i )})_+ +\gamma _A ||f||_{H_\mathcal{K} }^2 }\right) . \end{aligned}$$
(1)

By \(||\cdot ||_{H_\mathcal{K} }^2 \) we denote the square of the norm in the reproducing kernel Hilbert space \(H_\mathcal{K} \), and \(({1-y_i f(x_i )})_+\) is the so-called hinge-loss function, which is defined as

$$\begin{aligned} ( {1-y_i f(x_i )})_+ :=\max ( {0,1-y_i f(x_i )}). \end{aligned}$$
(2)

The term \(\frac{1}{l}\mathop \sum \nolimits _{i=1}^l ( {1-y_i f(x_i )})_+\) in (1) penalizes the classification error on the training set, whereas the term \(\gamma _A ||f||_{H_\mathcal{K} }^2 \) in (1) enforces a small norm of the optimal solution \(f^{*}\) in the reproducing kernel Hilbert space \(H_\mathcal{K} \) (i.e., typically, high smoothness for \(f^{*}\)). Given a (possibly unseen) data point \(x \in {\mathbb {R}}^{m}\), the optimal classifier \(f^{*}\) assigns to x the label +1 if \(f^*( x)\ge 0,\) otherwise it assigns to x the label \(-1\).
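For concreteness, the hinge loss (2) and the labeling rule just described can be written in a few lines of Python. This is our own minimal sketch, not code from the original paper; all names are illustrative.

```python
import numpy as np

def hinge_loss(y, f_x):
    """Hinge loss (1 - y*f(x))_+ for labels y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * f_x)

def predict_label(f_x):
    """Assign +1 when f*(x) >= 0, otherwise -1, as in the decision rule above."""
    return np.where(f_x >= 0.0, 1, -1)

# Toy check: a correctly classified point with margin >= 1 incurs zero loss.
print(hinge_loss(np.array([1, -1, 1]), np.array([1.5, 0.3, -0.2])))  # [0.  1.3 1.2]
print(predict_label(np.array([1.5, 0.3, -0.2])))                     # [ 1  1 -1]
```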

The optimization problem (1) can be rewritten in the following way: find

$$\begin{aligned} \text{ min }_{f \in H_\mathcal{K} ,\ \xi _i \in {\mathbb {R}} } \left( {\frac{1}{l}\mathop \sum \limits _{i=1}^l \xi _i + \gamma _A ||f||_{H_\mathcal{K} }^2 }\right) \end{aligned}$$
(3)
$$\begin{aligned} \text{ subject } \text{ to } \left\{ {{\begin{array}{ll} y_i f(x_i )\ge 1-\xi _i , &{} \quad \mathrm{for} \ i=1,\ldots ,l, \\ \xi _i \ge 0, &{}\quad \mathrm{for} \ i=1,\ldots ,l. \\ \end{array} }} \right. \end{aligned}$$

We denote by \(\mathcal{K}:{\mathbb {R}}^m\times {\mathbb {R}}^m\rightarrow {\mathbb {R}} \) the (uniquely determined) kernel function associated with the reproducing kernel Hilbert space \(H_\mathcal{K}\) (Cristianini and Shawe-Taylor 2000). The optimal solution \(f^{*}\) of the optimization problem (3) is provided by the representer theorem (Cristianini and Shawe-Taylor 2000) in the following form:

$$\begin{aligned} f^*( x)= \mathop \sum \limits _{i=1}^l \alpha _i^*\mathcal{K}( {x,x_i }), \end{aligned}$$
(4)

where the optimal coefficients \(\alpha _i^*\in {\mathbb {R}}\). Therefore, solving the optimization problem (3) reduces to determining the finite-dimensional coefficients \(\alpha _{i}\) that minimize its objective, when the function f is constrained to have the form (4). For a reproducing kernel Hilbert space \(H_\mathcal{K} \), the kernel \(\mathcal{K}\) often has a simple expression. This is the case, e.g., of the linear kernel

$$\begin{aligned} \mathcal{K}( {x,y}):= \langle x,y \rangle _{{\mathbb {R}}^m} , \end{aligned}$$
(5)

and of the Gaussian kernel

$$\begin{aligned} \mathcal{K}( {x,y}):=\text{ exp }\left\{ {-\frac{||x-y||_{{\mathbb {R}}^m}^2 }{2\sigma ^2}} \right\} , \end{aligned}$$
(6)

where \(\sigma > 0\) is a fixed width parameter. It often happens that only a small fraction of the l coefficients \(\alpha _{i}^{*}\) is different from 0; the input data points \(x_{i}\) associated with nonzero \(\alpha _{i}^{*}\) are called support vectors.
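As an illustration (not taken from the paper), the kernels (5)–(6) and the evaluation of a classifier in the representer form (4) can be sketched as follows; the coefficients `alpha` are assumed to be the output of some SVM solver, and only the support vectors need to be kept.

```python
import numpy as np

def linear_kernel(x, y):
    """Linear kernel (5): standard inner product in R^m."""
    return np.dot(x, y)

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel (6) with width parameter sigma > 0."""
    return np.exp(-np.sum((x - y) ** 2) / (2.0 * sigma ** 2))

def evaluate_classifier(x, support_vectors, alpha, kernel):
    """Representer form (4): f*(x) = sum_i alpha_i^* K(x, x_i).
    Only the support vectors (the points with nonzero alpha_i^*) matter."""
    return sum(a * kernel(x, x_i) for a, x_i in zip(alpha, support_vectors))
```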

In practice, a binary SVM classifier can be interpreted as a binary linear classifier in a (possibly infinite-dimensional) auxiliary feature space associated with the reproducing kernel Hilbert space \(H_\mathcal{K} \). The mapping between the original feature space \({\mathbb {R}}^{m}\) and the auxiliary feature space is typically nonlinear. A binary SVM classifier therefore often allows one to separate, through a nonlinear decision boundary, data points that are not linearly separable in the original feature space.
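In practice, one would typically rely on an off-the-shelf solver rather than implementing the optimization in (3) directly. Assuming scikit-learn is available, a Gaussian-kernel SVM of the kind described above could be trained roughly as in the following sketch; here `gamma` corresponds to \(1/(2\sigma ^2)\) in (6), and `C` is inversely related to the regularization parameter \(\gamma _A\) (the exact correspondence depends on the solver's convention).

```python
import numpy as np
from sklearn.svm import SVC

# Toy data: two noisy clusters, labels in {-1, +1}.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

# Gaussian (RBF) kernel SVM; gamma plays the role of 1/(2*sigma^2) in Eq. (6).
clf = SVC(kernel="rbf", gamma=0.5, C=1.0)
clf.fit(X, y)

print(clf.predict([[0.0, 0.0], [3.0, 3.0]]))   # expected: [-1  1]
print(len(clf.support_))                        # number of support vectors
```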

Appendix 2: Manifold regularization

Manifold regularization is a class of semi-supervised learning techniques, described in Belkin et al. (2006), whose goal is to exploit the information contained in the training objects to determine the underlying geometry of the input data and thereby improve the overall classification performance. Manifold regularization aims at capturing such a geometric structure and exploiting it to build a classifier with better classification performance than a fully supervised one, in situations where only a few labeled data are provided but a much larger set of unlabeled data is available. Indeed, label information is not needed to determine the underlying geometry of the input data; hence, both labeled and unlabeled data can be used to this aim.

A main assumption of manifold regularization is that the input data points are drawn from a probability distribution whose support resides on a Riemannian manifold embedded in the original feature space. A two-dimensional manifold can be thought of as a surface embedded in a higher-dimensional Euclidean space (Do Carmo 1976). The surface of the Earth, for instance, is approximately a two-dimensional manifold embedded in a three-dimensional space. Similar remarks hold for higher-dimensional manifolds. A Riemannian manifold is one on which the “intrinsic distance” between any two points can be defined as their geodesic distance, i.e., the length of the shortest path on the manifold connecting them. Manifold regularization assumes that similar labels are expected to be assigned to points \(x_{i}\) and \(x_{j}\) that are close with respect to the intrinsic distance on the Riemannian manifold they lie on. So, determining an approximation of the Riemannian manifold is expected to help the classification process.

In practice, an approximation of the Riemannian manifold can be obtained by using both the labeled and unlabeled input data, exploiting them to build an undirected graph G = (V, E) which provides a discrete model of the manifold itself and which is associated with a symmetric matrix W of suitable nonnegative weights. We recall that a graph is a representation of a set of objects where some of them are connected by links (“edges”); the graph is called undirected if no orientation on such links is defined, while it is weighted if one is given a measure of the strength of the links between pairs of objects (“vertices”). In the present work, the vertices in the set V correspond to the input data points (the feature vectors), while the links between different pairs of input data points form the edge set E. In the context of manifold regularization, one considers weighted undirected graphs; assigning a weight to an edge means defining a measure of similarity between the associated vertices (input data points). Once a similarity measure has been chosen, the larger the similarity between two input data points \(x_{i}\) and \(x_{j}\), the stronger their connection in the graph. Due to the basic assumption of manifold regularization reported above, the higher the weight \(W_{ij}=W_{ji}\) between \(x_{i}\) and \(x_{j}\), the higher the probability that they belong to the same class.

Determining a suitable similarity measure between every pair of input data points (hence, a suitable weight matrix W) is a challenging task, and several methods have been proposed in the literature to deal with this issue. Indeed, this measure is fundamental to build the graph that models the manifold the data lie on. Usually, two choices for the weights \(W_{ij}\) are considered: binary weights or Gaussian weights. In the first case, the weight between two points is set to 1 if they are sufficiently close in the original feature space, otherwise it is set to 0. In the second case, the weight between two points \(x_{i}\) and \(x_{j}\) that are connected according to the first method is set to \(W_{ij} := e^{-\vert \vert x_i -x_j \vert \vert ^2/4t^2}\), where \(t > 0\) is a suitable width parameter; all the other weights are set equal to 0. In this work, we have decided to use only the second method to define the weights of the edges. Note that the first choice can be considered a limiting case of the second, since it is recovered from it for \( t\rightarrow +\infty \).
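The two weight choices just described can be summarized in a short sketch (our illustration, with hypothetical names); the binary choice is recovered from the Gaussian one as \(t \rightarrow +\infty \).

```python
import numpy as np

def edge_weight(x_i, x_j, t=1.0, binary=False):
    """Weight of an existing edge between x_i and x_j.
    Gaussian choice: exp(-||x_i - x_j||^2 / (4 t^2)); binary choice: 1.
    (Pairs not connected by an edge keep weight 0 in the graph construction.)"""
    if binary:
        return 1.0
    return np.exp(-np.sum((x_i - x_j) ** 2) / (4.0 * t ** 2))
```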

Another way to build the graph approximating the Riemannian manifold is described in von Luxburg (2007). One of its main features is the application of a k-nearest-neighbor procedure to determine the edges of the graph, where k is a user-defined parameter. In a first stage, each input data point is connected to its k nearest input data points, using the Euclidean metric in the original feature space to define the set of k nearest neighbors. In general, this procedure leads to the definition of a directed graph; in order to obtain an undirected one (which is needed by manifold regularization), two possible methods are usually implemented:

  • the first method consists in connecting two input data points \(x_{i}\) and \(x_{j }\) if and only if either \(x_{i }\) is among the k-nearest neighbors of \(x_{j }\) or, vice versa, \(x_{j }\) is among the k-nearest neighbors of \(x_{i}\);

  • the second method, which leads to a less connected (i.e., sparser) graph, creates a link between the two nodes \(x_{i }\) and \(x_{j}\) if and only if \(x_{i }\) is among the k nearest neighbors of \(x_{j}\) and, vice versa, \(x_{j }\) is among the k nearest neighbors of \(x_{i}\). In this work, we have applied this second method, because we prefer to deal with less connected graphs.

Once the topology (i.e., the edge set E) of the graph has been fixed through one of the two k-nearest-neighbor procedures described above, the weight matrix W can be defined by assigning to the edges so determined either a binary weight or a Gaussian weight.
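The mutual k-nearest-neighbor construction and the weight assignment can be combined as in the following sketch (again an illustration under our own naming conventions, not the authors' code):

```python
import numpy as np

def mutual_knn_weights(X, k=5, t=1.0, binary=True):
    """Symmetric weight matrix W of the mutual k-nearest-neighbor graph:
    an edge (i, j) exists iff i is among the k nearest neighbors of j AND
    j is among the k nearest neighbors of i (the sparser of the two options)."""
    n = X.shape[0]
    # Pairwise squared Euclidean distances in the original feature space.
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)                # exclude self-neighbors
    knn = np.argsort(d2, axis=1)[:, :k]         # indices of the k nearest neighbors
    is_nn = np.zeros((n, n), dtype=bool)
    for i in range(n):
        is_nn[i, knn[i]] = True
    mutual = is_nn & is_nn.T                    # mutual k-NN condition
    if binary:
        return mutual.astype(float)
    return np.where(mutual, np.exp(-d2 / (4.0 * t ** 2)), 0.0)
```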

Figure 5 shows an example of construction of the graph. Note that in the figure, we consider only two features for each point in order to visualize the graph easily.

Fig. 5

Representation of the Riemannian manifold approximation by means of a graph. All the objects (negative, positive, unlabeled) were used to generate the graph. The links between the different points represent the edges of the graph that approximates the manifold the data lie on. The links were generated according to the second method detailed in the text, i.e., a link between the two nodes \(x_{i}\) and \(x_{j}\) has been created if and only if \(x_{i}\) is among the k-nearest neighbors of \(x_{j}\) and, vice versa, \(x_{j}\) is among the k-nearest neighbors of \(x_{i}\) (the value \(k= 5\) has been chosen). Finally, binary weights have been used for the links

Appendix 3: Laplacian support vector machines

As in “Appendix 1”, we assume that a set made of a finite number l of labeled training data \(\{(x_{i}, y_{i}),\ i = 1 ,\ldots ,l\}\), with \(x_{i}\in {\mathbb {R}}^{m}\) and \(y_{i} \in \{-1,+1\}\), is available. We also assume the presence of a second set made of a finite number u of unlabeled training data \(\{x_{j},\ j = l+1,\ldots ,l+u\}\), with \(x_{j}\in {\mathbb {R}}^{m}\). As in “Appendix 1”, \(H_\mathcal{K} \) denotes a reproducing kernel Hilbert space, whereas \( \gamma _A >0\) is a regularization parameter. We also assume that a second regularization parameter \({\gamma }_I >0\) is given. With these premises, the (binary) Laplacian support vector machine (LapSVM) (Belkin et al. 2006) extends the SVM formulation described in “Appendix 1” by solving the following optimization problem (which is inspired by the principle of manifold regularization, see “Appendix 2”): find

$$\begin{aligned} \text{ min }_{f \in H_\mathcal{K} } \left( \frac{1}{l}\mathop \sum \limits _{i=1}^l ( {1-y_i f(x_i )})_+ + \gamma _A ||f||_{H_\mathcal{K} }^2 +\frac{\gamma _I }{(u+l)^2}{\varvec{f}}^{T} {\varvec{L}}{\varvec{f}}\right) , \end{aligned}$$
(7)

where \(\varvec{f}:=[f( {x_1 }),\ldots ,f(x_{l+u} )]^T\) and L is the graph Laplacian matrix defined as \(L:=D-W\). Here, W denotes a suitable \((l + u)\times (l + u)\) symmetric matrix of weights (see “Appendix 2” for one of its possible constructions); its generic element \(W_{ij} = W_{ji}\) is the weight of the edge between the ith and the jth input data points. D is a diagonal matrix whose diagonal elements are defined as \(D_{ii} :=\mathop \sum \nolimits _{j=1}^{l+u} W_{ij} \). As in “Appendix 1”, the goal of the term \(\frac{1}{l}\mathop \sum \nolimits _{i=1}^l ( {1-y_i f(x_i )})_+ \) in (7) is to penalize the classification error on the training set, whereas the term \(\gamma _A ||f||_{H_\mathcal{K} }^2 \) in (7) enforces a small norm of the optimal solution \(f^{*}\) in the reproducing kernel Hilbert space \(H_\mathcal{K} \) (i.e., typically, high smoothness for \(f^{*}\)). Finally, the term \(\frac{\gamma _I}{(u+l)^2}\varvec{f}^{T} \varvec{L}\varvec{f}\) enforces smoothness of the optimal solution \(f^{*}\) also with respect to the graph approximation of the Riemannian manifold.
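To make the intrinsic penalty term concrete, the graph Laplacian and the quadratic form \(\varvec{f}^{T}\varvec{L}\varvec{f}\) in (7) can be computed as in the following sketch (our own naming; W may be built, e.g., as in “Appendix 2”):

```python
import numpy as np

def graph_laplacian(W):
    """Graph Laplacian L = D - W, where D is the diagonal degree matrix
    with entries D_ii = sum_j W_ij."""
    D = np.diag(W.sum(axis=1))
    return D - W

def intrinsic_penalty(f_vals, W, gamma_I):
    """Manifold-regularization term gamma_I / (l+u)^2 * f^T L f in (7),
    where f_vals = [f(x_1), ..., f(x_{l+u})] as a NumPy array."""
    L = graph_laplacian(W)
    n = len(f_vals)
    return gamma_I / n ** 2 * f_vals @ L @ f_vals
```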

The expression of the optimal solution \(f^{*}\) of problem (7) follows again from another form of the representer theorem and is given by

$$\begin{aligned} f^*(x)= \sum \limits _{i=1}^{l+u} \alpha _i^*\mathcal{K}( {x,x_i }), \end{aligned}$$
(8)

for suitable optimal coefficients \(\alpha _i^*\in {\mathbb {R}}\). Again, solving the optimization problem (7) reduces to determining the finite-dimensional coefficients \(\alpha _{i}\) that minimize its objective, when the function f is constrained to have the form (8).
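The only structural difference from the supervised representer form (4) is that the kernel expansion (8) now runs over all l+u training points, labeled and unlabeled alike; a minimal sketch (our illustration) of the corresponding evaluation:

```python
def evaluate_lapsvm(x, X_train, alpha, kernel):
    """Representer form (8): the expansion runs over ALL l+u training points
    (labeled and unlabeled), unlike the purely supervised case (4)."""
    return sum(a * kernel(x, x_i) for a, x_i in zip(alpha, X_train))
```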


Cite this article

Gnecco, G., Morisi, R., Roth, G. et al. Supervised and semi-supervised classifiers for the detection of flood-prone areas. Soft Comput 21, 3673–3685 (2017). https://doi.org/10.1007/s00500-015-1983-z
