1 Introduction

Embankment dams (earthfill or rockfill) are the most commonly built type of dam in many countries. Their construction is preferred under certain circumstances, especially when sufficient materials are available near the dam site, the foundation is pervious, and the ratio of dam length to height is high. The main problems facing this kind of dam are piping and overtopping, which may cause erosion of materials and ultimately breaching of the embankment. Accurate estimates of breach characteristics are needed as a basis for dam risk assessments. In an embankment failure analysis, the average breach width is one of the key parameters that should be accurately estimated because it influences the severity of failure and affects the magnitude of the peak discharge. Singh and Snorrason (1984) applied the DAMBRK and HEC-1 models to 8 hypothetical breached dams and found that changes in breach width were more significant for large dams, producing larger changes in peak outflow (35–87 %) than for smaller reservoirs (6–50 %).

The breach shape of an embankment dam is assumed to vary from triangular to trapezoidal as the breach progresses (Wahl 1998). Historic embankment failure data report either the breach width at the top and bottom of the breach section or simply the average breach width resulting from passage of the complete breach hydrograph. The average breach width (\( B_{av} \)) is one-half the sum of the trapezoid top and bottom widths. Methods of estimating \( B_{av} \) are based on either case study data from past dam failures or physically based numerical models. Case study methods include parametric models, regression equations (RE), and analysis by comparison. Most of these methods are based on small dams having heights of <15 m. Parametric models (e.g., NWS DAMBRK by Fread 1977) use empirical observations of previous dam failures to develop the outflow hydrograph. Physically based models such as BREACH (Fread 1988), BEED (Singh and Scarlatos 1985), and FLDWAV (Fread 1993) rely on sediment erosion and water flow formulas and generally suffer from an insufficient understanding of breach development (Wahl 1998). However, recent research considering the hydrodynamics and soil mechanics of embankment erosion offers opportunities to better understand and model breach mechanisms (e.g., Froehlich 2004; Hanson et al. 2005).

In practice, the most widely applied methods to predict \( B_{av} \) are based on regression analysis of recorded data from embankment dam failures, for example, the Bureau of Reclamation (1988), Von Thun and Gillette (1990), and Froehlich (1995). RE provide simple and convenient algorithms, especially when there are strong linear relationships between the input and output variables and detailed simulations are not required. The results of the available RE vary widely depending on the assumptions and subsets of data used in their formulation and on internal uncertainties that are not taken explicitly into consideration. The linear regression approach assumes that the scatter of points around the best-fit line is approximately Gaussian with the same standard deviation all along the line, that the data points are independent of one another, and that any imprecision in measuring the values of the independent variables is very small compared to the variability in the values of the dependent variable. If these assumptions are violated, the linear regression approach leads to biased relationships.
In practice, the breach extent depends on the embankment and reservoir geometry in addition to other factors such as the embankment material, the type of protective cover, and the mode of failure. Many of the available RE assume that the breach width is a linear function of only the dam (or breach) height and/or the reservoir volume. This assumption may be valid for small embankments having similar geometry and soil characteristics. As the material changes, more uncertainty enters the overall breaching process. In regression-line fitting, such distinctions are not considered, and each point in the scatter diagram is treated equally when fixing the best straight line. Uncertainty also arises in determining the reservoir water volume and the breach height at the time of failure, which are used to predict the breach width. Such variations in the coefficients cannot be accounted for through regression analysis. The fuzzy approach, on the other hand, provides an alternative methodology for considering such uncertainties through vaguely defined membership functions.

Fuzzy logic (FL) modeling has been applied effectively in nearly all branches of engineering. However, no method for predicting \( B_{av} \) with this technique is currently available. A FL model is a logical-mathematical procedure based on linguistic variables (e.g., low, high, wide) and a system of IF–THEN rules that mimics the human way of thinking in computational form, an overall process called fuzzy inference. The idea behind fuzzy inference is to interpret (fuzzify) the crisp values of the system variables so that they are expressed in linguistic terms and, based on a set of IF–THEN rules, to assign values to the output vector. For each rule, the inference engine looks up membership values according to the condition of the rule, and an implication method combines the IF and THEN parts. The outputs of all rules are aggregated to produce a single fuzzy set that must then be converted (defuzzified) to a crisp number representing the desired output. More details about the FL approach are given in subsequent sections.

The main objective of the present paper is the development of a FL model to predict \( B_{av} \) as an alternative to the RE. This general objective includes: (1) developing a Mamdani’s FL inference system for predicting \( B_{av} \), (2) comparing the results of the FL model with those of the best RE for predicting \( B_{av} \), (3) performing an uncertainty analysis for the results of the FL model and the RE, and (4) applying the FL model on some of the available case studies that have adequate data.

The available case studies of embankment dam failures presented by Froehlich (2008) constitute the basis for the development of this FL rule-based model. A FL system becomes more complex when many inputs and outputs are chosen for a single implementation, which adds to the difficulty of building the IF–THEN rules that control the system. For this reason, the present FL system is designed with two inputs and one output. Since the height of water above the breach invert (breach base) at the dam at the time of failure (\( h_{w} \)) and the volume of water stored above the breach invert at the time of failure (\( V_{w} \)) have been documented and used in a number of prediction equations, they are taken as the input parameters in the development of the present FL model for estimating \( B_{av} \). Even with these parameters, there is vagueness in the data (imprecision in discrete measurements and parameter uncertainty), and fuzzy set theory is especially well suited for handling such vagueness. The construction of the IF–THEN rules and the membership functions, based on the observed measurements and the relation between the input and output variables, is what makes FL useful and capable of representing the actual system behavior.

2 Review of available approaches

Several physically based models are available in the literature to simulate the breach of embankment dams. The majority of these models are based on different erosion and sediment transport formulas, which in turn assume different flow conditions (quasi-steady or unsteady flow, the latter of which may lead to numerical instability). Some researchers used 1-D cross-section-averaged flow models (e.g., Cristofano 1965; Brown and Rogers 1977; Ponce and Tsivoglou 1981; Fread 1984; Visser 1998; Hanson et al. 2005), while others used 2-D depth-averaged flow models (e.g., Froehlich 2004). Although physically based models can provide better understanding and extensive information about the breach, they are complex, require several assumptions and inputs, make use of empirical coefficients to describe material and flow resistance, and in some cases do not adequately reproduce observed case studies.

Seeking more practical and easily applied models, many researchers have gathered detailed case studies of breached embankment dams and developed expressions to predict the characteristics and consequences of the breach. Among those studies, Johnson and Illes (1976) were the first to predict the breach shapes for earthfill dams, assuming that the breach begins as a triangle and ends as a trapezoid. They recommended a range for the breach width as a linear function of the dam height (\( h_{d} \)). Similarly, Singh and Snorrason (1984) plotted breach widths versus dam heights for 20 case studies and stated a range for the breach width as a linear function of \( h_{d} \). MacDonald and Langridge-Monopolis (1984) used 42 case studies and suggested that the breach shape could be trapezoidal or triangular depending on whether or not the breach has reached the bottom of the dam. The Federal Energy Regulatory Commission (FERC 1987) also proposed a range for the breach width as a function of \( h_{d} \). Froehlich (1987) used nondimensional analysis and developed an equation that estimates the average breach width as a function of the nondimensional reservoir storage (\( S^{*} \)). Froehlich observed that overtopping causes the greatest breach enlargement because it erodes the embankment at a higher rate than any other mode of failure. The Bureau of Reclamation (1988) developed an equation for the breach width of earthen dams that depends on the height measured from the initial reservoir water level to the breach bottom elevation; this equation assumes a linear relationship between \( B_{av} \) and \( h_{w} \). Von Thun and Gillette (1990) used the data of MacDonald and Langridge-Monopolis (1984) and Froehlich (1987) and proposed a relation for estimating \( B_{av} \) from the depth of water at the dam at the time of failure (\( h_{w} \)) and a coefficient (\( C_{b} \)) that depends on the reservoir storage. Later, Froehlich (1995) published a revised equation with better estimated coefficients for predicting \( B_{av} \); its independent variables are the volume of water stored above the breach invert at the time of failure (\( V_{w} \)), the breach height (\( h_{b} \)), and a factor (\( K_{o} \)) that accounts for the failure mode. Wahl (1998, 2004) provided a summary of the available RE for predicting the breach width, performed an uncertainty analysis, and compared state-of-the-art prediction equations, concluding that Froehlich's (1995) equation had the best prediction performance for cases with observed breach widths <50 m.
In 2008, Froehlich proposed another equation, likely accurate enough for practical application, that estimates \( B_{av} \) as a function of \( V_{w}^{1/3} \) and \( K_{o} \). Most of the earlier RE relate \( B_{av} \) to one or more characteristics of the dam and reservoir at failure, such as \( h_{w} \) (some investigators used \( h_{d} \) or \( h_{b} \)), \( V_{w} \), or a combination of the two. Based on the studies cited above, it can be inferred that models based on conventional mathematical tools (e.g., regression) require several assumptions to deal with nonlinear and uncertain systems. Hence, FL modeling offers an alternative that allows the modeler to include imprecise data and parameters without the need for such assumptions.
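For orientation, the two Froehlich equations referred to above are often written in the metric forms sketched below. The coefficient and exponent values are those commonly reported in the literature for Froehlich (1995, 2008) and should be verified against the original publications before use; the function names are introduced here only for illustration.

```python
def froehlich_1995_width(V_w, h_b, overtopping):
    """Average breach width (m), Froehlich (1995) form as commonly reported:
    B_av = 0.1803 * K_o * V_w**0.32 * h_b**0.19,
    with V_w in m^3, h_b in m, and K_o = 1.4 for overtopping, 1.0 otherwise."""
    K_o = 1.4 if overtopping else 1.0
    return 0.1803 * K_o * V_w**0.32 * h_b**0.19


def froehlich_2008_width(V_w, overtopping):
    """Average breach width (m), Froehlich (2008) form as commonly reported:
    B_av = 0.27 * K_o * V_w**(1/3),
    with V_w in m^3 and K_o = 1.3 for overtopping, 1.0 otherwise."""
    K_o = 1.3 if overtopping else 1.0
    return 0.27 * K_o * V_w ** (1.0 / 3.0)
```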

3 Data used in developing the FL model

The ability of the FL rule-based model to accurately predict \( B_{av} \) depends on the amount of available historical data and on experience in constructing the IF–THEN rules. Generally, \( B_{av} \) is one of the most thoroughly documented breach parameters in dam failure case studies. Froehlich (2008) presented tabulated information on 74 breached embankment case studies compiled from numerous sources in the literature, and the development of the present FL model is mainly based on this information. Since the breach width is missing from four case studies and the reservoir volume is missing from another, the data set used in the FL model and the RE consists of the measured values of \( V_{w} \), \( h_{w} \), \( h_{b} \), and \( B_{av} \) given in the remaining 69 case studies. This data set is subdivided into two sets without any special selection process. The larger set, consisting of 51 case studies, is used in the training phase of the FL model; the smaller set, consisting of 18 case studies, is used in the testing phase. To save space, these sets are presented later, in the training and testing phases of the FL model. The estimated average breach widths are denoted by the symbol \( \hat{B}_{av} \). The developed FL model is also applied to two embankment dam failures cited in recent publications (i.e., the Jamestown and Big Bay dams).

4 Methodology (fuzzy sets, membership functions, and FL inference system)

As classical logic is based on classical (crisp) set theory, FL is based on fuzzy set theory. An element x in a crisp set A from a given universe X may be defined by a membership function \( \mu_{A} \) where \( \mathop \forall \limits_{x \in X} \;\mu_{A} (x) = 1 \Leftrightarrow x \in A \) and \( \mu_{A} (x) = 0 \Leftrightarrow x \notin A \). This notation shows that crisp sets allow only full membership or no membership at all. Zadeh (1965) stated that in many systems very precise numerical inputs are not always required, yet highly acceptable outputs are feasible. For this reason, Zadeh introduced the concept of the fuzzy set by defining partial membership, thus extending the degree of membership from {0, 1} to the continuous interval [0, 1]. A fuzzy set \( \underline{A} \) in a universe of discourse U may be represented by a set of ordered pairs consisting of a generic element x and its degree of membership in the fuzzy set \( \underline{A} \), that is, \( \underline{A} = \{ (x,\mu_{{\underline{A} }} (x))\;:\,x \in U,\;\mu_{{\underline{A} }} (x) \in [0,1]\} \). The notation \( \underline{A} = \{ (x_{1} ,0.8);\,(x_{2} ,0.25);\,(x_{3} ,0)\} \), for example, denotes that elements \( x_{1} \) and \( x_{2} \) belong to the fuzzy set \( \underline{A} \) to the corresponding degrees (0.8 and 0.25), while element \( x_{3} \) does not belong to \( \underline{A} \). The function that defines to what degree an element x belongs to the fuzzy set \( \underline{A} \) is called the membership function, which is essentially a curve that may be triangular, trapezoidal, Gaussian, etc. The shape is generally less important than the number of curves and their placement; three to seven curves are generally appropriate to cover the required range of an input or output variable (Ross 1995).

In order to process the inputs and obtain the output, three main steps are involved in a FL inference system: fuzzification, rules processing, and defuzzification. The fuzzification step allows the system inputs and outputs to be expressed in linguistic terms (e.g., low, medium, high). Using membership functions, crisp input data are converted (fuzzified) against the appropriate linguistic fuzzy terms, thus generating membership degrees for a fuzzy variable. The degree(s) of membership of an input parameter are determined by locating the value of that input on the horizontal axis (universe of discourse) of the desired variable, projecting vertically to the upper boundary of the membership function(s), and then reading the corresponding membership degree(s) from the vertical axis representing the continuous interval [0, 1]. The output also has a set of membership functions that define the possible responses of the system. The rules processing step calculates the response from the system inputs according to the constructed rule base under which the FL system operates. The rule base consists of IF–THEN conditional statements that relate the inputs to the desired output. The IF part of a rule is called the antecedent, while the THEN part is called the consequent. The fuzzified inputs are applied to the antecedents of the fuzzy rules. If a fuzzy rule has multiple antecedents, the fuzzy AND or OR logical operators are used to obtain a single number that represents the result of the antecedent for that rule.
For most applications, the fuzzy membership function for a given rule with the AND operator is obtained with the minimum implication proposed by Mamdani (1977), given as \( \mu_{A \cap B} (x) = \min [\mu_{A} (x),\mu_{B} (x)] \). This single number is then used in shaping (clipping) the consequent membership function (implication from the antecedent to the consequent). Implication occurs for each rule. The outputs of all rules are then unified (aggregated) into a single fuzzy set; thus, the input of the aggregation process is the list of clipped consequent membership functions and the output is a single fuzzy set. The defuzzification step then extracts the desired crisp output number from this aggregate fuzzy set. Several defuzzification methods exist (centroid, bisector, middle of maximum, etc.); the most popular is the centroid calculation, which returns the center of the area under the curve of the aggregate fuzzy set, thereby moving from a fuzzy set to a crisp number representing the desired output (Ross 1995).
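As a concrete illustration of these three steps, the following minimal Python sketch fuzzifies crisp inputs against triangular membership functions, applies Mamdani's minimum implication to clip two consequent sets, aggregates them by maximum, and defuzzifies by the centroid method. All membership functions and values here are invented for illustration; they are not those of the model developed later in this paper.

```python
import numpy as np

def trimf(x, a, b, c):
    """Triangular membership function with feet at a and c and peak at b."""
    return np.maximum(np.minimum((x - a) / (b - a), (c - x) / (c - b)), 0.0)

# Universe of discourse for a generic output variable (illustrative).
y = np.linspace(0.0, 100.0, 1001)
out_low = trimf(y, 0.0, 25.0, 50.0)
out_high = trimf(y, 40.0, 70.0, 100.0)

# Fuzzification: membership degrees of two crisp inputs (illustrative rules).
mu_rule1 = min(trimf(np.array([3.0]), 0.0, 2.0, 6.0)[0],     # antecedent part 1
               trimf(np.array([15.0]), 5.0, 20.0, 35.0)[0])  # antecedent part 2 (AND = min)
mu_rule2 = 0.3                                               # assumed firing strength of a second rule

# Implication (clipping) and aggregation (maximum of the clipped sets).
aggregate = np.maximum(np.minimum(out_low, mu_rule1),
                       np.minimum(out_high, mu_rule2))

# Defuzzification by the centroid of the aggregate fuzzy set.
centroid = np.trapz(aggregate * y, y) / np.trapz(aggregate, y)
print(round(centroid, 1))
```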

5 Development and implementation of the present FL inference system

In order to develop a FL inference system, it is essential to decide the inputs, the outputs, their domains, the input and output membership functions, the overlap between these functions, the fuzzy inference rules, the implication and aggregation methods, and the defuzzification method. The inputs to the present FL model are \( h_{w} \) and \( V_{w} \), while the output is the estimated \( B_{av} \). The ranges (universes of discourse) of the fuzzy input and output subsets in the present study are identified from the available data of 69 breached embankment dams taken from Froehlich (2008). Statistical analysis of these data shows that there are no obvious trends between the input variables and the output, and the correlation coefficient between each of \( h_{w} \), \( V_{w} \), and \( h_{b} \) and \( B_{av} \) is around 0.58. This supports the applicability of the FL model to the considered database and suggests that it may give better estimates of \( B_{av} \) than the RE, because FL modeling can handle nonlinear systems that are difficult to model mathematically, does not require any assumption of linearity, and is capable of handling imprecision in discrete measurements and parameter uncertainty. The numbers and ranges of the fuzzy subsets of the input and output variables are proposed after classification of their data (i.e., arranging the data of each variable in ascending order, keeping only one data point from values of similar magnitude, and drawing an ascending bar chart of the remaining values of that variable). Figure 1a, for example, is drawn after classification of the available heights of water (\( h_{w} \)); similarly, Fig. 1b, c is drawn using the volumes of water (\( V_{w} \)) and the average breach widths (\( B_{av} \)), respectively.
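The classification just described (sort each variable, drop near-duplicate values, and plot an ascending bar chart as in Fig. 1) can be reproduced with a short script such as the sketch below. The tolerance used to merge values of similar magnitude and the sample values are assumptions for illustration only; the paper works with the full 69-dam data set.

```python
import numpy as np
import matplotlib.pyplot as plt

def classify_variable(values, tol=0.05):
    """Sort values in ascending order and keep one representative of each
    group of values whose relative difference is within tol (assumed)."""
    values = np.sort(np.asarray(values, float))
    kept = [values[0]]
    for v in values[1:]:
        if (v - kept[-1]) / kept[-1] > tol:   # start of a new magnitude group
            kept.append(v)
    return np.array(kept)

# Placeholder h_w values (m), not the actual case study data.
h_w = [1.68, 3.0, 3.1, 6.5, 9.8, 10.0, 12.3, 21.0, 27.3, 61.0, 77.4]
kept = classify_variable(h_w)
plt.bar(range(len(kept)), kept)
plt.ylabel("h_w (m)")
plt.show()
```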

Fig. 1 Distribution of input and output variables: a for input variable \( h_{w} \), b for input variable \( V_{w} \), and c for output variable \( B_{av} \)

The horizontal axis in each of these figures represents the names of the embankments, which are omitted to save space. Such a data distribution helps the modeler get an overview of the data and identify obvious separations or clusters, which are then used to select the number of fuzzy subsets and their ranges for the input and output variables. The proposed fuzzy sets and their spans for the input and output variables are given in Table 1.

Table 1 Fuzzy terms and corresponding ranges of input and output variables

The term set of the first input variable (\( h_{w} \)) is selected as {short (SH), medium (M), high (H)} in the universe of discourse [1.68–77.4] in meters. The term set of the second input (\( V_{w} \)) is {very low (VL), low (L), low medium (LM), medium (M), high medium (HM), high (H)} in the universe of discourse [0.0139–310], which spans the minimum and maximum storage volumes above the breach invert at the time of failure in Mm³. Similarly, the term set of the output variable (\( B_{av} \)) is selected as {very tight (VT), tight (T), medium (M), wide (W), very wide (VW)} in the universe of discourse [2.29–183] in meters. The present FL model for estimating the average breach width is implemented in MATLAB 6.5.1 (ITU Computer Center) using Mamdani's (1977) inference method, which expects the output to be fuzzy sets. The MATLAB fuzzy toolbox allows the creation of input membership functions, fuzzy rules, and output membership functions through a graphical user interface. Based on the data of Table 1, the membership functions of the input and output variables are drawn using the Membership Function Editor. The membership functions should overlap to allow smooth mapping of the system. Straight-line functions (e.g., triangular membership functions) are commonly used because they are the simplest and easiest to implement. In the present model, triangular membership functions are selected for the input variables (\( h_{w} \) and \( V_{w} \)) and the output variable (\( B_{av} \)), as shown in Fig. 2a–c, respectively. To allow other users to repeat the process easily, these figures are shown as originally obtained from MATLAB. Note that the labels of some membership functions (e.g., VL, L, and LM in Fig. 2b) are written over each other by MATLAB because of their very small ranges.
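The fuzzy partition of the first input can be written down compactly as in the sketch below. Only the universe limits and subset ranges are stated explicitly in the text, so the triangular peak positions used here are assumptions for illustration rather than the exact values of the MATLAB implementation.

```python
import numpy as np

def trimf(x, abc):
    """Triangular membership of scalar x for breakpoints (a, b, c)."""
    a, b, c = abc
    left = (x - a) / (b - a) if b > a else 1.0
    right = (c - x) / (c - b) if c > b else 1.0
    return float(np.clip(min(left, right), 0.0, 1.0))

# Assumed triangles (left foot, peak, right foot) for h_w (m); the supports
# follow the SH/M/H ranges of the Excel classification formula in Sect. 5,
# with peaks placed arbitrarily for illustration.
HW_MFS = {
    "SH": (1.68, 1.68, 12.3),
    "M": (6.5, 17.0, 27.3),
    "H": (25.5, 77.4, 77.4),
}

def fuzzify(value, mfs):
    """Return the nonzero membership degrees of a crisp value."""
    degrees = {term: round(trimf(value, abc), 2) for term, abc in mfs.items()}
    return {term: d for term, d in degrees.items() if d > 0.0}

print(fuzzify(10.0, HW_MFS))   # h_w = 10 m activates both SH and M
```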

Fig. 2 Membership functions: a for input variable \( h_{w} \), b for input variable \( V_{w} \), and c for output variable \( B_{av} \)

Each particular input or output can be interpreted from such fuzzy sets and a degree of membership read off. For example, using Fig. 2a, an input of \( h_{w} \) = 10 m has about a 15 % membership in the SH function and about a 35 % membership in the M function. Once the input and output membership functions are defined, the fuzzy rules that control the system can be prepared using the Rule Editor. The fuzzy rules are IF–THEN statements that look at both inputs (\( h_{w} \) and \( V_{w} \)) and determine the desired output (\( \hat{B}_{av} \)). For each output, several rules are usually triggered because the inputs usually fall within more than one membership function. For the present FL model, the rule base contains all linguistic rules extracted from the available numeric data (processed in Excel worksheets) and supplemented by experience, as follows:

  1. Type the values of \( h_{w} \), \( V_{w} \), and \( B_{av} \) in an Excel worksheet in vertical order in three separate columns (say columns B, D, and F).

  2. Enter a classification formula in the intermediate columns (i.e., columns C, E, and G) alongside the values of each variable in order to produce a linguistic label for each value, depending on the boundaries of the previously selected membership functions (a Python sketch of this classification is given after the list). For example, if the values of \( h_{w} \) are entered in cells B2:B52, place the following formula in cell C2 and fill it down to cell C52:

=IF(AND(B2>=1.68, B2<=12.3), "SH", IF(AND(B2>6.5, B2<=27.3), "M", IF(AND(B2>25.5, B2<=77.4), "H")))
  3. Repeat step 2 for \( V_{w} \) and \( B_{av} \) in order to produce linguistic labels for each of them. Note that the formulas for \( V_{w} \) and \( B_{av} \) have different forms depending on the selected boundaries and number of membership functions for each.
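A minimal Python equivalent of the worksheet classification in steps 1–3, referred to in step 2 above, is sketched here. The \( h_{w} \) boundaries reproduce the Excel formula; the \( V_{w} \) and \( B_{av} \) boundaries and the sample rows are placeholders and should be replaced with the ranges of Table 1 and the actual case study data.

```python
def classify(value, bins):
    """Return the first linguistic label whose [lower, upper] range contains value."""
    for label, lo, hi in bins:
        if lo <= value <= hi:
            return label
    return None

HW_BINS = [("SH", 1.68, 12.3), ("M", 6.5, 27.3), ("H", 25.5, 77.4)]   # as in the formula
VW_BINS = [("VL", 0.0139, 1.0), ("L", 1.0, 5.0)]                      # placeholder ranges
BAV_BINS = [("VT", 2.29, 15.0), ("T", 15.0, 40.0)]                    # placeholder ranges

# Label every case study, then tally (h_w label, V_w label) -> B_av label
# combinations; the most frequent combinations suggest candidate IF-THEN rules.
records = [(9.5, 0.5, 13.0), (20.0, 3.2, 35.0)]   # placeholder (h_w, V_w, B_av) rows
for h_w, V_w, B_av in records:
    print(classify(h_w, HW_BINS), classify(V_w, VW_BINS), "->", classify(B_av, BAV_BINS))
```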

By doing so, and looking at the resulting linguistic labels line by line (i.e., for each embankment dam), one can build an idea of the possible IF–THEN rules that may govern the available numeric data. This is achieved by selecting the rules that are most often repeated for each category of linguistic labels. However, some of the rules contradict each other; that is, although the linguistic variables in the ‘IF’ part of some rules are identical, the output linguistic variables in the ‘THEN’ part of the same rules are very different. For such reasons, the modeler has to modify some of the rules and exclude others from the whole list, taking into consideration the overlap between the membership functions, the possibility that some data may not trigger the rules, and the strength of some rules. This process depends on trial and error and experience and becomes tedious and difficult when the number of rules is very large. The full set of rules in the present FL model is given in Table 2; it consists of 18 rules, whose antecedents are formed with the ‘AND’ fuzzy operator.

The fuzzy membership function for a given rule using the ‘AND’ fuzzy operator is obtained with the minimum implication proposed by Mamdani (1977); it determines the degree to which the antecedent is satisfied for each rule. This is illustrated in Fig. 3, which assumes that rules 5 and 6 are activated when the two inputs \( h_{w} \) = 5 m and \( V_{w} \) = 80 Mm³ are considered. If fuzzification of these inputs against the antecedent of rule 5 (\( h_{w} \) is SH and \( V_{w} \) is HM) yields membership degrees of 0.6 and 0.4, respectively, then the fuzzy AND operator (minimum at work) simply selects the smaller of the two values, 0.4, and the fuzzy operation for that rule is complete. This minimum value, 0.4, is then used to clip the output membership function (\( B_{av} \) is W), as shown in Fig. 3.
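The rule-5 evaluation described above can be reproduced in a few lines: the clipped consequent is simply the output membership curve capped at the rule's firing strength. The triangle used for the W ("wide") output set below is an assumption, since its exact breakpoints are not stated in the text.

```python
import numpy as np

# Firing strength of rule 5: AND of the two antecedent degrees (Mamdani minimum).
mu_SH, mu_HM = 0.6, 0.4                      # degrees quoted for h_w = 5 m, V_w = 80 Mm^3
firing_strength = min(mu_SH, mu_HM)          # = 0.4

# Clip (implication) the consequent membership function "B_av is W".
B = np.linspace(2.29, 183.0, 1000)           # universe of discourse of B_av (m)
W = np.maximum(np.minimum((B - 80.0) / 50.0, (180.0 - B) / 50.0), 0.0)  # assumed triangle for W
W_clipped = np.minimum(W, firing_strength)   # clipped output set contributed by rule 5
```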

Table 2 List of fuzzy rules
Fig. 3 Schematic representation showing the FL inference process

Each activated rule provides a particular clipped membership function for the output variable; see also the output of rule 6. The clipped membership functions of all the activated rules are combined (aggregated) into a single fuzzy set. In Fig. 3, the process proceeds from the inputs in the upper left, across each rule line by line, and then down the rule outputs to finish with a single fuzzy set in the lower right, which must be defuzzified, or resolved to a single number representing the desired output. This can be done in many ways, but the most common method is the center of gravity of that single set. If one value (singleton position, \( B_{{av_{i} }} \)) is associated with each level (\( i = 1, \ldots ,n \)) in the single output fuzzy set instead of a range of values, the centroid (\( \hat{B}_{av} \)) can be approximated by \( \hat{B}_{av} = \sum\nolimits_{i = 1}^{n} {B_{{av_{i} }} \mu (B_{{av_{i} }} )} /\sum\nolimits_{i = 1}^{n} {\mu (B_{{av_{i} }} )} \).
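The singleton approximation of the centroid given above translates directly into code. The singleton positions and aggregate membership degrees in the sketch below are invented purely for illustration.

```python
def centroid(singletons, memberships):
    """Weighted-average (centroid) defuzzification:
    B_hat = sum(B_i * mu(B_i)) / sum(mu(B_i))."""
    num = sum(b * m for b, m in zip(singletons, memberships))
    den = sum(memberships)
    return num / den if den > 0 else float("nan")

# Illustrative singleton positions (m) and aggregate membership degrees.
B_i = [20.0, 45.0, 70.0, 95.0]
mu_i = [0.1, 0.4, 0.4, 0.2]
print(round(centroid(B_i, mu_i), 1))   # crisp estimate of B_av
```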

6 Results of the training phase of the FL model

In the training phase, further work is needed before the FL model reaches its final form for estimating the average breach width. The model is trained with the measured data through several simulations; in each simulation, the results are evaluated and the model is tuned until satisfactory outputs are obtained. Tuning can be done by changing some of the rule consequents or strengths, changing the centers of the input and/or output membership functions, or slightly adjusting the ranges of the input and/or output membership functions. Table 3 shows the estimated and calculated average breach widths resulting from the final accepted training phase of the present FL model and from some of the available RE. Table 3 shows that, in general, the observed (measured) average breach widths are successfully estimated by the present FL approach. Moreover, the FL estimates are close either to the observed values or to those calculated by Froehlich's (1995) regression equation. Since breach formation depends on numerous factors (material properties, construction conditions, protective cover, etc.), the available data show contradictory outputs at some dams even though their heights and reservoir volumes, used as inputs, are approximately the same.

Table 3 Estimated average breach widths (training phase)

7 Results of the testing phase of the FL model

For the testing phase of the developed FL model, a subset of eighteen dams is randomly selected from the database (Table 4) and used to assess the performance of the FL model. Data from the same subset are also applied to some of the available RE for predicting the average breach width. Table 4 summarizes the simulation results of this phase. The values in Table 4 show that the FL model gives estimates comparable to the observed average breach widths. The other notable feature of Table 4 is that, in most cases, the fuzzy estimates are approximately equal to those calculated by Froehlich's (1995) regression equation.

Table 4 Estimated average breach widths (testing phase)

Figure 4a, b allows visual comparison of the FL model and Froehlich (1995) estimates with the case study data obtained from the training and testing phases, respectively. While there is a reasonable match between the estimates of the FL model and the Froehlich (1995) equation at some dam sites (e.g., Hatchtown-Utah, Lake Avalon-N.M., Prospect-Colo., and Wheatland No. 1-Wyo. in the training phase and Butler-Ariz., Castlewood-Colo., Coedty-UK, and Fogelman-Tenn. in the testing phase), these estimates differ from the observed values. For the FL model, this can be attributed to the rules being built to suit the larger number of data within a group of dams.

Fig. 4 a Comparison of FL and Froehlich (1995) estimates with measured data in the training phase. b Comparison of FL and Froehlich (1995) estimates with measured data in the testing phase

8 Uncertainty analysis of FL and RE estimates (testing phase)

Using the available equations, Wahl (2004) prepared log–log plots of the observed versus predicted breach widths and noticed that the data are scattered approximately evenly above and below the lines of perfect prediction, suggesting that uncertainties are best expressed as a number of log cycles on either side of the predicted value. He stated that the available RE for predicting breach width had absolute mean prediction errors of less than one-tenth of an order of magnitude, indicating that on average their predictions are on target, with the Froehlich (1995) equation having the smallest uncertainty.

An uncertainty analysis of the FL estimates (\( \hat{B}_{av} \)) resulting from the testing phase is performed following a procedure similar to that of Wahl (2004). For the same database considered in this phase, the uncertainty analysis is also applied to the breach widths calculated by the Froehlich (1995), Bureau of Reclamation (1988), and Von Thun and Gillette (1990) equations. The uncertainty analysis method consists of the following steps (a Python sketch implementing them is given after the list):

  1. Compute \( e_{i} = \log_{10} (\hat{B}_{av} /B_{av} ) \), where the \( e_{i} \) are the individual prediction errors in terms of the number of log cycles separating the estimated and observed values, and \( \hat{B}_{av} \) and \( B_{av} \) are the estimated and observed average breach widths, respectively.

  2. Apply the outlier-exclusion algorithm to the series of prediction errors computed in step 1. The algorithm is described by Rousseeuw (1998):

    • Determine the estimator of location, \( T = {\text{median}}(e_{i} ) \).

    • Compute the deviations from the median and determine the median of these absolute deviations, \( {\text{MAD}} = {\text{median}}\left| {T - e_{i} } \right| \).

    • Compute an estimator of scale, \( S_{\text{MAD}} = 1.483 \times ({\text{MAD}}) \). The 1.483 factor makes \( S_{\text{MAD}} \) comparable to the standard deviation, which is the usual scale parameter of a normal distribution. Compute a Z score for each observation, \( Z_{i} = (e_{i} - T)/S_{\text{MAD}} \), then reject any observations for which \( \left| Z \right| > 2.5. \) If the samples are from a perfect normal distribution, this method rejects at the 98.7 % probability level.

  3. Compute the mean, \( \bar{e} \), and the standard deviation, \( S_{e} \), of the remaining prediction errors. A negative mean indicates that the prediction equation underestimated the observed values; a positive mean indicates overestimation. A confidence band around the predicted value is expressed using \( \bar{e} \) and \( S_{e} \) as \( \left\{ {\hat{B}_{av} \cdot 10^{{ - \bar{e} - 2S_{e} }} ,\;\hat{B}_{av} \cdot 10^{{ - \bar{e} + 2S_{e} }} } \right\} \). The use of \( \pm 2S_{e} \) yields approximately a 95 % confidence band.
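A compact implementation of steps 1–3, referenced before the list, might look as follows. The observed and predicted widths in the usage example are placeholder values, not data from the paper's tables.

```python
import numpy as np

def uncertainty_band(predicted, observed, z_reject=2.5):
    """Wahl (2004)-style uncertainty analysis of breach-width predictions.

    Returns the mean prediction error e_bar, the standard deviation S_e of the
    retained errors, and the +/- 2*S_e confidence-band multipliers."""
    predicted, observed = np.asarray(predicted, float), np.asarray(observed, float)
    e = np.log10(predicted / observed)            # step 1: errors in log cycles

    # Step 2: MAD-based outlier exclusion (Rousseeuw 1998).
    T = np.median(e)
    S_mad = 1.483 * np.median(np.abs(e - T))
    keep = np.abs((e - T) / S_mad) <= z_reject if S_mad > 0 else np.ones_like(e, bool)

    # Step 3: mean, standard deviation, and ~95 % band multipliers.
    e_bar, S_e = e[keep].mean(), e[keep].std(ddof=1)
    band = (10.0 ** (-e_bar - 2.0 * S_e), 10.0 ** (-e_bar + 2.0 * S_e))
    return e_bar, S_e, band

# Placeholder example: the band multipliers are applied to each predicted width.
e_bar, S_e, (lo, hi) = uncertainty_band([30.0, 55.0, 80.0, 18.0],
                                        [28.0, 60.0, 95.0, 15.0])
print(round(e_bar, 3), round(S_e, 3), round(lo, 2), round(hi, 2))
```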

Table 5 summarizes the results of the uncertainty analysis performed on the FL and RE estimates from the testing phase. It can be seen that the FL model and the Froehlich (1995) equation predicted the average breach widths with an equal mean prediction error of \( \bar{e} \) = +0.03; however, the fuzzy uncertainty band of ±2\( S_{e} \) = ±0.51 order of magnitude is slightly narrower than the ±2\( S_{e} \) = ±0.56 order of magnitude obtained by the Froehlich (1995) equation. The results suggest that the FL model is a good tool for estimating the average breach width, not replacing the best available regression equation of Froehlich (1995) but complementing it with extra information that may help the modeler. The FL approach provides the modeler with the capability to express the uncertainty inherent in such real systems, especially when confidence intervals concerning the data and the parameters are unknown. It also allows the modeler to use intelligence, understanding, and descriptive capabilities to make decisions for nonlinear systems without complex mathematical formulation. Moreover, the FL model provides a comparatively effective tool for estimating the breach width.

Table 5 Uncertainty estimates of average breach widths from FL model and some RE (testing phase)

Moreover, the uncertainty analysis showed that the FL approach and the Froehlich (1995) equation yielded average breach widths with comparatively smaller mean prediction errors than the other RE. In light of the results presented in Tables 3, 4, and 5, one can conclude that the FL model gives reliable estimates of the average breach widths of embankment dams in comparison with the available RE. Linear regression methods perform well when there is a strong linear relationship between the inputs and the output; however, when the data and their functional relationships exhibit nonlinear behavior, modifications to the regression analysis become necessary. In contrast, the fuzzy approach offers an alternative for handling such nonlinearity through vaguely defined membership functions.

9 Application

For validation of the FL model, two embankment failure case studies cited in recent publications are used: the risk assessment study for the Jamestown embankment dam, conducted in January 2001 by the Bureau of Reclamation, and the Big Bay dam failure that occurred on March 12, 2004. The Jamestown dam is a zoned earthfill dam with a height of 24.7 m above the original streambed. The crest length is 432 m at an elevation of 448.36 m, the crest width is 9.14 m, and the assumed initial elevation of piping failure is 423.7 m (Wahl 2004). One potential reservoir water surface elevation at failure is considered in this study, namely the top of the flood space at elevation 443.18 m with a reservoir capacity of about 273.3 × 10⁶ m³. The Big Bay embankment is approximately 576 m long and 15.6 m high. A pool elevation of 84.89 m, corresponding to a storage of 17.5 × 10⁶ m³, was used in the analysis. A site survey indicated an average breach width of 83.2 m, an original ground elevation of 71.3 m, and an assumed initial piping elevation of 71.3 m (Yochum et al. 2008).

Application of the FL model for estimating the breach width is illustrated using the data of these case studies. Two simulation runs of the FL model were made using the interactive View Rule option. The FL model requires as input the water height and the reservoir storage for each case study. Table 6 presents the results obtained with the present FL model and some of the available RE.

Table 6 Estimated breach widths for Jamestown and Big Bay dams using the FL model and some RE

Error criteria such as the mean absolute error \( {\text{MAE}} = \frac{1}{N}\sum\nolimits_{i = 1}^{N} {\left| {\hat{B}_{{av_{i} }} - B_{{av_{i} }} } \right|} \), the mean relative error \( {\text{MRE}} = \frac{1}{N}\sum\nolimits_{i = 1}^{N} {100\frac{{(\hat{B}_{{av_{i} }} - B_{{av_{i} }} )}}{{B_{{av_{i} }} }}} \), and the mean square error \( {\text{MSE}} = \frac{1}{N}\sum\nolimits_{i = 1}^{N} {(\hat{B}_{{av_{i} }} - B_{{av_{i} }} )^{2} } \), where N is the total number of predicted outputs, \( \hat{B}_{{av_{i} }} \) is the ith average breach width estimated by the FL model or an RE, and \( B_{{av_{i} }} \) is the corresponding ith observation, help in comparing measured and estimated average breach widths obtained using the FL model and the RE (Table 7). In Table 7, the error values from the RE are higher than those obtained from the FL model. The results show that the FL approach provides quite reasonable estimates of the observed average breach widths of these case studies compared with the applied RE. Moreover, the FL model and the Froehlich (1995) equation gave more accurate estimates of the average breach widths of these case studies than the other available RE.
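The three error criteria can be computed in a few lines, as sketched below; the arrays in the example are placeholders, not the values reported in Table 7.

```python
import numpy as np

def error_criteria(estimated, observed):
    """Return MAE (m), MRE (%), and MSE (m^2) between estimated and observed widths."""
    est, obs = np.asarray(estimated, float), np.asarray(observed, float)
    mae = np.mean(np.abs(est - obs))
    mre = np.mean(100.0 * (est - obs) / obs)
    mse = np.mean((est - obs) ** 2)
    return mae, mre, mse

# Placeholder example with two hypothetical case studies.
print(error_criteria([70.0, 85.0], [66.0, 83.2]))
```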

Table 7 Error criteria using the estimates of the FL model and some RE (application phase)

10 Conclusions

This paper presented a FL rule-based model to estimate the average breach widths of embankment dams. The model provides an alternative way to treat uncertainty and imprecision in breach data through vaguely defined membership functions. It is based on Mamdani's inference system, which expects the output to be a fuzzy set and is well suited to representing human reasoning, describing systems in linguistic terms rather than in terms of complex relationships. Two data sets of 51 and 18 dams were used in the training and testing phases of the FL model, respectively. After an uncertainty analysis of its estimates, the developed FL model was applied through simulation of two case studies. Simulation results of the testing and application phases indicated that the proposed FL model exhibits reasonable accuracy and that its estimates are in good agreement with the observed measurements and with the results of the best available regression equation. The FL model thus offers a tool that can be effectively used to predict the average breach width. Although the application of FL for estimating the average breach width is promising, the proposed FL model can be further improved by searching for optimum key parameters and membership functions and modifying the rules accordingly.