The usefulness of any model is in part dependent on the accuracy and reliability of its output data. Yet, because all models are abstractions of reality, and because precise input data are rarely if ever available, all output values are subject to imprecision. The input data and modeling uncertainties are not independent of each other. They can interact in various ways. The end result is imprecision and uncertainty associated with model output. This chapter focuses on ways of identifying, quantifying, and communicating the uncertainties in model outputs.

## 8.1 Introduction

Models are the primary way we have to estimate the multiple impacts of alternative water resource system design and operating policies. Models are used to estimate the values of various system performance indicators resulting from specific design and/or operating policy decisions. Model outputs are based on model structure, hydrologic and other time series inputs, and a host of parameters whose values characterize the system being simulated. Even if these assumptions and input data reflect, or are at least representative of, conditions believed to be true, we know the model outputs or results will be wrong. Our models are always simplifications of the real systems we are analyzing. Furthermore, we simply cannot forecast the future with precision. So we know the model outputs defining future conditions are uncertain estimates, at best.

Some input data uncertainties can be reduced by additional research and further data collection and analysis. Before spending money and time to gather and analyze additional data, it is reasonable to ask what improvement in estimates of system performance, or what reduction in the uncertainty associated with those estimates, would result if all data and model uncertainties could be reduced, if not eliminated. Such information helps determine how much one would be willing to “pay” to reduce model output uncertainty. If the uncertainty on average is costing a lot, it may pay to invest in additional data collection, in more studies, or in developing better models, all aimed at reducing that uncertainty. If that uncertainty has only a very modest impact on the likely decision that is to be made, one should find other issues to worry about.

If it appears that reducing uncertainty is worthwhile, then the question is how best to do it. If doing this involves obtaining additional information, then it is clear that the value of this additional information, however measured, should exceed the cost of obtaining it. The value of such information will be the benefits of more precise estimates of system performance, or the reduction of the uncertainty, that one can expect from obtaining such information. If additional information is to be obtained, it should be focused on that which reduces the uncertainties considered important, not the unimportant ones.

This chapter reviews some methods for identifying and communicating model output uncertainty. The discussion begins with a review of the causes of risk and uncertainty in model output. It then examines ways of measuring or quantifying uncertainty and model output sensitivity to model input imprecision, concentrating on methods that seem most relevant or practical for analyses of large-scale regional systems. It builds on some of the statistical and stochastic modeling methods reviewed in the previous two chapters.

## 8.2 Issues, Concerns, and Terminology

Outcomes or events that cannot be predicted with certainty are often called risky or uncertain. Some individuals draw a special and interesting distinction between risk and uncertainty. In particular, the term risk is often reserved to describe situations for which probabilities are available to describe the likelihood of various possible events or outcomes. Often risk refers to these probabilities times the magnitude of the consequences of these events or outcomes. If probabilities of various events or outcomes cannot be quantified, or if the events themselves are unpredictable, some would say the problem is then one of uncertainty, and not of risk. In this chapter what is not certain is considered uncertain, and uncertainty is often estimated or described using probability distributions. When the ranges of possible events are known and their probabilities are measurable, risk is called objective risk. If the probabilities are based solely on human judgment, the risk is called subjective risk.
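The notion of risk as probability times consequence can be illustrated with a short calculation. The sketch below is only an illustration: the three flood events, their annual probabilities, and their damages are hypothetical numbers chosen for the example, and `expected_consequence` is a name introduced here.

```python
# Hypothetical flood events: (annual probability, damage in million dollars).
# These numbers are illustrative assumptions, not data from the text.
events = [
    (0.10, 2.0),
    (0.02, 15.0),
    (0.005, 80.0),
]

def expected_consequence(events):
    """Sum of probability times consequence over the listed events."""
    return sum(p * damage for p, damage in events)

risk = expected_consequence(events)
print(f"Expected annual damage: {risk:.2f} million")
```

Note that the rare but severe event contributes as much to this measure of risk as the frequent minor one, which is one reason a single expected value may hide information a decision-maker would want to see.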

Such distinctions between objective and subjective risk, and between risk and uncertainty, rarely serve any useful purpose to those developing and using models. Likewise the distinctions are often unimportant to those who should be aware of the risks or uncertainties associated with system performance indicator values. If the probabilities associated with possible events or outcomes are unknown, and especially if the events themselves are unknown, then the approaches for performing sensitivity and uncertainty analyses will differ from those that are based on assumed known events and their probabilities.

Uncertainty in information is inherent in future-oriented planning efforts. Uncertainty stems from inadequate information and incorrect assumptions, as well as from the variability and possibly the nonstationarity of natural processes. Water managers often need to identify both the uncertainty in system performance and its sensitivity to possible changes in input data. They are often obligated to reduce any uncertainty to the extent practicable. Finally, they need to communicate the residual uncertainties clearly so that decisions can be made with this knowledge and understanding.

Sensitivity analysis can be distinguished from uncertainty analysis. Sensitivity analysis procedures explore and quantify the impact of possible changes (errors) in input data on predicted model outputs and system performance indices. Simple sensitivity analysis procedures can be used to illustrate either graphically or numerically the consequences of alternative assumptions about the future. Uncertainty analyses employing probabilistic descriptions of model inputs can be used to derive probability distributions of model outputs and system performance indices. Figure 8.1 illustrates the impact of both input data sensitivity and input data uncertainty on model output uncertainty.

It is worthwhile to explore the transformation of uncertainties in model inputs and parameters into uncertainty in model outputs when conditions differ from those reflected by the model inputs. Historical records of system characteristics are typically used as a basis for model inputs. Yet conditions in the future may change. There may be changes in the frequency and amounts of precipitation, changes in land cover and topography, and changes in the design and operation of control structures, all resulting in changes of water stages and flows, and their qualities, and consequently changes in the impacted ecosystems.

If asked how the system would operate with inputs similar to those observed in the past, the model should be able to provide a fairly precise estimate. Still that estimate will not be perfect. This is because our ability to reproduce current and recent operations is not perfect, though it should be fairly good. If asked to predict system performance for situations very different from those in the past, or when the historical data are not considered representative of what might happen in the future, say due to climate or technology change, such predictions become much less precise. There are two reasons why. First, our description of the characteristics of those different situations or conditions may be imprecise. Second, our knowledge base may not be sufficient for calibrating model parameters in ways that would enable us to reliably predict how the system will operate under conditions unlike those that have been experienced historically. The more conditions of interest are unlike those in the past, the less confidence we have that the model is providing a reliable description of systems operation. Figure 8.2 illustrates this issue.

Clearly a sensitivity analysis needs to consider how well a model can replicate current operations, and how similar the target conditions or scenarios are to those that existed in the past. The greater the required extrapolation from what has been observed, the greater will be the importance of parameter and model uncertainties.

The relative and absolute importance of different parameters will depend on the system performance indicators of interest. Seepage rates may have a very large local effect, but a small global effect. Changes in system-wide evapotranspiration rates will likely impact system-wide flows. The precision of model projections and the relative importance of errors in different parameters will depend upon the:

1. precision with which the model can reproduce observed conditions,
2. difference between the conditions predicted in the future and those that occurred in the past, and the
3. system performance characteristics of interest.

Errors and approximations in input data measurement, parameter values, model structure, and model solution algorithms are all sources of uncertainty. While there are reasonable ways of quantifying and reducing these errors and the resulting range of uncertainty of various system performance indicator values, they are impossible to eliminate. Decisions will still have to be made in the face of a risky and uncertain future. Some decisions may be able to be modified as new data and knowledge are obtained in a process of adaptive management.

There is also uncertainty with respect to human behavior and reaction related to particular outcomes and their likelihoods, i.e., to their risks and uncertainties. As important as the risks and uncertainties associated with human reactions to particular outcomes are, they are not usually part of the models themselves. Social uncertainty may often be the most significant component of the total uncertainty associated with just how a water resource system will perform. For this reason, we should seek designs and operating policies that are flexible and adaptable.

When uncertainties associated with system operation under a new operating regime are large, one should anticipate the need to make changes and improvements as experience is gained and new information accumulates. When predictions are highly unreliable, responsible managers should favor actions that are robust (e.g., good under a wide range of situations), gain information through research and experimentation, monitor results to provide feedback for the next decision, update assessments and modify policies in the light of new information, and avoid irreversible actions and commitments.

## 8.3 Variability and Uncertainty in Model Output

Differences between model output and observed values can result from natural variability, caused for example by unpredictable rainfall, evapotranspiration, water consumption, and the like, and/or from both known and unknown errors in the input data, the model parameters, or the model itself. The latter is sometimes called knowledge uncertainty, but it is not always due to a lack of knowledge. Models are always simplifications of reality and hence “imprecision” can result. Sometimes imprecision occurs because of a lack of knowledge, such as just how much rainfall, evapotranspiration and consumption will occur, or just how a particular species will react to various environmental and other habitat conditions. Other times known errors are introduced simply for practical reasons.

Imperfect representation of processes in a model constitutes model structural uncertainty. Imperfect knowledge of the values of parameters associated with these processes constitutes model parameter uncertainty. Natural variability includes both temporal variability and spatial variability, to which model input values may be subject.

Figure 8.3 illustrates these different types of uncertainty. For example, the rainfall measured at a weather station within a particular model grid cell may be used as an input value for that cell, but the rainfall may actually vary at different points within that cell and its mean value will vary across the landscape. Knowledge uncertainty can be reduced through further measurement and/or research. Natural variability is a property of the natural system, and is usually not reducible. Decision uncertainty is simply an acknowledgement that we cannot predict ahead of time just what decisions individuals and organizations will make, or even just what particular set of goals or objectives will be considered in the future and the relative importance of each of them.

Rather than contrasting “knowledge” uncertainty versus natural variability versus decision uncertainty, one can classify uncertainty in another way based on specific sources of uncertainty, such as those listed below, and address ways of identifying and dealing with each source of uncertainty.

Informational Uncertainties:

• imprecision in specifying the boundary and initial conditions that impact the output variable values

• imprecision in measuring observed output variable values

Model Uncertainties:

• uncertain model structure and parameter values

• variability of observed input and output values over a region smaller than the spatial scale of the model

• variability of observed model input and output values within a time smaller than the temporal scale of the model (e.g., rainfall, depths, and flows within a day)

• errors in linking models of different spatial and temporal scales

Numerical Errors:

• errors in the model solution algorithm

### 8.3.1 Natural Variability

The main source of hydrologic model output value variability is the natural variability in hydrological and meteorological input series. Periods of normal precipitation and temperature can be interrupted by periods of extended drought and intense meteorological events such as hurricanes and tornadoes. There is reason to think such events will continue to occur and become even more frequent and extreme. Research has demonstrated that climate has been variable in the past and concerns about anthropogenic activities that may increase that variability increase each year. Sensitivity analysis can help assess the effect of errors in predictions if those predictions are based only on past records of historical time series data describing precipitation, temperature, and other exogenous forces in and on the border of the regions being studied.

Time series input data are often actual, or at least based on, historical data. The time series values typically describe historical conditions including droughts and wet periods. What is distinctive about natural uncertainty, as opposed to errors and uncertainty due to modeling limitations, is that natural variability in meteorological forces cannot be reduced by improving the model’s structure, increasing the resolution of the simulation, or by better calibration of model parameters.

Errors result if meteorological values are not measured or recorded accurately, or if mistakes are made when creating computer data files. Furthermore, there is no assurance the statistical properties of historical data will accurately represent the statistical properties of future data. Actual future precipitation and temperature scenarios will be different from those in the past, and this difference in many cases may have a larger effect than the uncertainty due to incorrect parameter values. However, the effects of uncertainties in the parameter values used in stochastic generation models are often much more significant than the effects of using different stochastic generation models (Stedinger and Taylor 1982).

While variability of model output is a direct result of variability of model input (e.g., hydrologic and meteorological data), the extent of the variability, and the lower and upper limits of that variability, may also be affected by errors in the inputs, the values of parameters, initial boundary conditions, model structure, processes, and solution algorithms.

Figure 8.4 illustrates the distinction between the variability of a system performance indicator due to input data variability, and the extended range of variability due to the total uncertainty associated with any combination of the causes listed in the previous section. This extended range is what is of interest to water resource planners and managers.

In practice a time series of system performance indicator values can range anywhere within or even outside the extended range, assuming the confidence level of that extended range is less than 100%. The confidence one can have that some future value of a time series will be within a given range depends on two factors. The first is the number of measurements used to compute the confidence limits. The second is the assumption that those measurements are representative of future measurements, that is, that they come from the same statistical or stochastic process that will yield those future measurements. Figure 8.5 illustrates this point. Note that the time series may even contain values outside the range “b” defined in Fig. 8.4 if the confidence level of that range is less than 100%. Confidence intervals associated with less than 100% certainty will not include every possible value that might occur. Furthermore, it is unlikely one will ever know the 100% confidence interval that includes all values that could ever occur.
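Both factors can be made concrete with a simple case. If n past values and one future value are independent draws from the same continuous distribution, the future value is equally likely to take any of the n + 1 rank positions, so it falls between the observed minimum and maximum with probability (n − 1)/(n + 1). The sketch below evaluates this expression and checks it by simulation; the standard normal distribution, the function names, and the sample sizes are assumptions made only for illustration.

```python
import random

def range_coverage(n):
    """Probability that one future value lies between the min and max of n
    past values, assuming all values are independent draws from the same
    continuous distribution."""
    return (n - 1) / (n + 1)

def simulated_coverage(n, trials=5000, seed=1):
    """Monte Carlo check of range_coverage using standard normal draws."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        future = rng.gauss(0.0, 1.0)
        if min(sample) <= future <= max(sample):
            hits += 1
    return hits / trials

for n in (5, 20, 100):
    print(n, round(range_coverage(n), 3), round(simulated_coverage(n), 3))
```

The coverage approaches but never reaches 100% as n grows, and the result holds only as long as the future value comes from the same process as the record. If that process changes, the computed confidence level no longer applies.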

### 8.3.2 Knowledge Uncertainty

Referring to Fig. 8.3, knowledge uncertainty includes model structure and parameter value uncertainties. First, we consider parameter value uncertainty including boundary condition uncertainty, and then model and solution algorithm uncertainty.

#### 8.3.2.1 Parameter Value Uncertainty

A possible source of uncertainty in model output results from uncertain estimates of various model parameter values. If the model calibration procedure were repeated using different data sets, it would result in different parameter values. Those values would yield different simulated system behavior, and thus different predictions. We can call this parameter uncertainty in the predictions because it is caused by imprecise parameter values. If such parameter value imprecision were eliminated, then the prediction would always be the same and so the parameter value uncertainty in the predictions would be zero. But this does not mean that predictions would be perfectly accurate.
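One way to see how repeated calibration produces different parameter values is to recalibrate a model on resampled versions of a single data set (a bootstrap). The sketch below is a minimal illustration under stated assumptions: the one-parameter model y = a·x, the synthetic observations, and the function names are hypothetical and are not taken from the text.

```python
import random
import statistics

def calibrate_slope(pairs):
    """Least-squares slope for the model y = a * x (no intercept)."""
    sxy = sum(x * y for x, y in pairs)
    sxx = sum(x * x for x, _ in pairs)
    return sxy / sxx

def bootstrap_slopes(pairs, n_resamples=500, seed=0):
    """Recalibrate on data sets resampled with replacement from pairs."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_resamples):
        resample = [rng.choice(pairs) for _ in pairs]
        slopes.append(calibrate_slope(resample))
    return slopes

# Synthetic observations: true slope 2.0 plus noise (hypothetical data).
rng = random.Random(42)
data = [(x, 2.0 * x + rng.gauss(0.0, 1.0)) for x in range(1, 21)]

slopes = bootstrap_slopes(data)
print("calibrated slope:", round(calibrate_slope(data), 3))
print("spread across resamples (std dev):", round(statistics.stdev(slopes), 4))
```

The spread of the recalibrated slopes is one rough indication of parameter value uncertainty. It says nothing about whether the assumed model form is itself correct.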

In addition to parameter value imprecision, uncertainty in model output can result from imprecise specification of boundary conditions. These boundary conditions can be either fixed or variable. However, because they are not being computed based on the state of the system, their values can be uncertain. These uncertainties can affect the model output, especially in the vicinity of the boundary, in each time step of the simulation.

#### 8.3.2.2 Model Structural and Computational Errors

Uncertainty in model output can also result from errors in the model structure compared to the real system, and approximations made by numerical methods employed in the simulation. No matter how good our parameter value estimates, our models are not perfect and there is a residual model error. Increasing model complexity to more closely represent the complexity of the real system may not only add to the cost of data collection, but also introduce even more parameters, and thus even more potential sources of error in model output. It is not an easy task to judge the appropriate level of model complexity, and to estimate the resulting levels of uncertainty associated with various assumptions regarding model structure and solution methods. Kuczera (1988) provides an example of a conceptual hydrologic modeling exercise with daily time steps where model uncertainty dominated parameter value uncertainty.

### 8.3.3 Decision Uncertainty

Uncertainty in model predictions can result from unanticipated changes in what is being modeled. These can include changes in nature, human goals, interests, activities, demands, and impacts. An example of this is the deviation from standard or published operating policies by operators of infrastructure such as canal gates, pumps, and reservoirs in the field, as compared to what is specified in documents and incorporated into the water systems models. Comparing field data with model data for model calibration may yield incorrect calibrations if operating policies actually implemented in the field differ significantly from those built into the models. What do operators do in times of stress? And can anyone identify a place where deviations from published policies do not occur? Policies implemented in practice tend to address short-term changes in policy objectives.

What humans will want to achieve in the future may not be the same as what they want today. Predictions of what people will want in the future are clearly sources of uncertainty. A perfect example of this is in the very flat Greater Everglades region of south Florida in the US. Sixty years ago, folks wanted the swampy region protected from floods and drained for agricultural and urban development. Today, many want just the opposite, at least where there are no human settlements. They want a return to a more natural hydrologic system with more wetlands and unobstructed flows, but now for ecological restoration objectives that were not a major concern or much appreciated half a century ago. Once the mosquitoes return and if the sea level continues to rise, future populations who live there may want more flood control and drainage again. Who knows? Complex changing social and economic processes influence human activities and their demands for water resources and environmental amenities over time.

Sensitivity scenarios that include human activities can help define the effects of those human activities within an area. It is important that these alternative scenarios realistically capture the forces or stresses that the system may face. The history of systems studies is full of examples where the issues studied were rapidly overwhelmed by much larger social forces resulting from, for example, the relocation of major economic activities, an oil embargo, changes in national demand for natural resources, economic recession, sea-level rise, an act of terrorism, or even war. One thing is sure: the future will be different than the past, and no one can be certain just how.

#### 8.3.3.1 Surprises

Water resource managers may also want to consider how vulnerable a system is to undesirable environmental surprises. What havoc might an introduced species like the zebra mussel invading the Great Lakes of North America have in a particular watershed? Might some introduced disease suddenly threaten key plant or animal species? Might management plans have to be restructured to address the survival of some species such as salmon in the Rhine River in Europe or in the Columbia River in North America? Such uncertainties are hard to anticipate when by their nature they are truly surprises. But surprises should be expected. Hence system flexibility and adaptability should be sought to deal with changing management demands, objectives, and constraints.

## 8.4 Sensitivity and Uncertainty Analyses

An uncertainty analysis is not the same as a sensitivity analysis. An uncertainty analysis attempts to describe the entire set of possible outcomes, together with their associated probabilities of occurrence. A sensitivity analysis attempts to determine the relative change in model output values given modest changes in model input values. A sensitivity analysis thus measures the change in the model output in a localized region of the space of inputs. However, one can often use the same set of model runs for both uncertainty analyses and sensitivity analyses. It is possible to carry out a sensitivity analysis of the model around a current solution and then use it as part of a first-order uncertainty analysis.
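The link between a local sensitivity analysis and a first-order uncertainty analysis can be sketched in a few lines. The example below estimates the sensitivities by central differences around a current solution and then combines them with assumed input standard deviations, treating the input errors as independent. The two-input model (an inflow multiplied by a runoff coefficient), the numerical values, and the function names are hypothetical and serve only to illustrate the idea.

```python
import math

def sensitivities(model, x0, rel_step=1e-4):
    """Central-difference estimates of d(model)/d(x_i) at the point x0."""
    grads = []
    for i, xi in enumerate(x0):
        h = rel_step * (abs(xi) if xi != 0 else 1.0)
        up = list(x0)
        up[i] = xi + h
        dn = list(x0)
        dn[i] = xi - h
        grads.append((model(up) - model(dn)) / (2 * h))
    return grads

def first_order_std(model, x0, stds):
    """First-order output std dev, assuming independent input errors."""
    grads = sensitivities(model, x0)
    return math.sqrt(sum((g * s) ** 2 for g, s in zip(grads, stds)))

# Hypothetical model: output = inflow * runoff coefficient.
def model(x):
    inflow, coeff = x
    return inflow * coeff

x0 = [100.0, 0.4]      # current solution (assumed values)
stds = [10.0, 0.05]    # assumed standard deviations of the two inputs

print("sensitivities:", sensitivities(model, x0))
print("first-order output std dev:", first_order_std(model, x0, stds))
```

Because the sensitivities are evaluated only at the current solution, the resulting standard deviation is a local approximation. It becomes less reliable as the model becomes more nonlinear or the input uncertainties become larger.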

This discussion begins by focusing on some methods of uncertainty analysis. Then various ways of performing and displaying sensitivity analyses are reviewed.

### 8.4.1 Uncertainty Analyses

Recall that uncertainty involves the notion of randomness. If a value of a performance indicator or performance measure, or in fact any variable (like the phosphorus concentration or the depth of water at a particular location) varies and this variation over space and time cannot be predicted with certainty, it is called a random variable. One cannot say with certainty what the value of a random variable will be but only the likelihood or probability that it will be within some specified range of values. The probabilities of observing particular ranges of values of a random variable are described or defined by a probability distribution. Here we are assuming we know, or can compute, or can estimate, this distribution.

Suppose the random variable is X. If the observed values of this random variable can be only discrete values, the probability distribution of X can be expressed as a histogram, as shown in Fig. 8.6a. The sum of the probabilities for all possible outcomes must equal 1. If the random variable is a continuous variable that can assume any real value over a range of values, the probability distribution of X can be expressed as a continuous distribution as shown in Fig. 8.6b. The shaded area under the density function for the continuous distribution is 1. The area between two values of the continuous random variable, such as between u and v in Fig. 8.6c, represents the probability that the observed value x of the random variable X will be within that range of values.
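These two properties are easy to verify numerically. The sketch below sums a hypothetical discrete distribution and integrates a density between two values u and v using the trapezoidal rule. The particular probabilities, the choice of a standard normal density, and the function names are assumptions made for illustration only.

```python
import math

# Discrete case: a hypothetical probability mass function.
pmf = {0: 0.1, 1: 0.2, 2: 0.4, 3: 0.2, 4: 0.1}
total = sum(pmf.values())

def normal_pdf(x, mean=0.0, std=1.0):
    """Density of a normal distribution, used here as an example pdf."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def area(pdf, u, v, n=1000):
    """Trapezoidal estimate of the area under pdf between u and v."""
    h = (v - u) / n
    inner = sum(pdf(u + i * h) for i in range(1, n))
    return h * (0.5 * pdf(u) + inner + 0.5 * pdf(v))

print("sum of discrete probabilities:", total)
print("Pr[-1 <= X <= 1]:", area(normal_pdf, -1.0, 1.0))
print("area over a very wide range:", area(normal_pdf, -8.0, 8.0))
```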

The probability distribution $P_X(x)$ shown in Fig. 8.6a is called a probability mass function. The probability distributions shown in Fig. 8.6b, c are called probability density functions (pdf) and are denoted by $f_X(x)$. The subscript X of $P_X$ and $f_X$ represents the random variable, and the variable x (on the horizontal axes in Fig. 8.6) is some value of that random variable X.

Uncertainty analyses involve identifying characteristics of various probability distributions of model input and output variables, and subsequently functions of those random output variables that are performance indicators or measures. Often targets associated with these indicators or measures are themselves uncertain.

A complete uncertainty analysis would involve a comprehensive identification of all sources of uncertainty that contribute to the joint probability distributions of each input or output variable. Assume such analyses were performed for two alternative project plans, A and B, and that the resulting probability density distributions for a specified performance measure were as shown in Fig. 8.7. Figure 8.7 also identifies the costs of these two projects. The introduction of two performance criteria, cost and probability of exceeding a performance measure target (e.g., a pollutant concentration standard), introduces a conflict where a tradeoff must be made.
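The tradeoff can be illustrated with a small calculation. The sketch below does not reproduce Fig. 8.7: the costs, the normal distributions assumed for the performance measure under each plan, and the target value are all hypothetical.

```python
from statistics import NormalDist

# Hypothetical plans: cost and a normal distribution for the performance
# measure (e.g., pollutant concentration in mg/l). Values are illustrative.
plans = {
    "A": {"cost": 10.0, "dist": NormalDist(mu=8.0, sigma=2.0)},
    "B": {"cost": 25.0, "dist": NormalDist(mu=5.0, sigma=1.5)},
}
target = 10.0  # assumed maximum acceptable value of the performance measure

def prob_exceed(dist, target):
    """Probability that the performance measure exceeds the target."""
    return 1.0 - dist.cdf(target)

for name, plan in plans.items():
    p = prob_exceed(plan["dist"], target)
    print(f"Plan {name}: cost {plan['cost']}, Pr[exceed target] = {p:.4f}")
```

With these assumed numbers the cheaper plan has the higher probability of exceeding the target, so neither plan dominates the other and a choice between them requires a judgment about how much a reduction in that probability is worth.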

#### 8.4.1.1 Model and Model Parameter Uncertainties

Consider a situation as shown in Fig. 8.8, in which for a specific set of model inputs, the model outputs differ from the observed values, and for those model inputs, the observed values are always the same. Here nothing randomly occurs. The model parameter values or model structure needs to be changed. This is typically done in a model calibration process.

Given specific inputs, the outputs of deterministic models are always going to be the same each time those inputs are simulated. If for specified inputs to any simulation model the predicted output does not agree with the observed value, as shown in Fig. 8.8, this could result from imprecision in the measurement of observed data. It could also result from imprecision in the model parameter values, the model structure, or the algorithm used to solve the model.

Next consider the same deterministic simulation model but now assume at least some of the inputs are random, i.e., not predictable, as may be the case when random outputs of one model are used as inputs into another model. Random inputs will yield random outputs. The model input and output values can be described by probability distributions. If the uncertainty in the output is due only to the uncertainty in the input, the situation is similar to that shown in Fig. 8.8. If the distribution of performance measure output values does not fit or is not identical to the distribution of observed performance measure values, then calibration of model parameter values or modification of model structure may be needed.

If a model calibration or “identification” exercise finds the “best” values of the parameters to be outside reasonable ranges of values based on scientific knowledge, then the model structure or algorithm might be in error. Assuming the algorithms used to solve the models are correct and observed measurements of system performance vary for the same model inputs, as shown in Fig. 8.9, it can be assumed that the model structure does not capture all the processes that are taking place and that impact the value of the performance measures. This is often the case when relatively simple and low-resolution models are used to estimate the hydrological and ecological impacts of water and land management policies. However, even large and complex models can fail to include or adequately describe important phenomena.

In the presence of informational uncertainties, there may be considerable uncertainty about the values of the “best” parameters during calibration. This problem becomes even more pronounced with increases in model complexity.

An example: Consider the prediction of a pollutant concentration at some site downstream of a pollutant discharge site. Given a streamflow Q (in units of 1000 m³/day), the distance between the discharge site and the monitoring site, X (m), the pollutant decay rate constant k (day⁻¹), and the pollutant discharge W (kg/day), we can use the following simplified model to predict the concentration of the pollutant C (g/m³ = mg/l) at the downstream monitoring site:

$$C = \left( {W/Q} \right){ \exp }\left\{ { - k\left( {X/U} \right)} \right\}$$

In the above equation assume the velocity U (m/day) is a known function of the streamflow Q.
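The model is simple enough to evaluate directly. In the sketch below the relation between velocity and streamflow is a hypothetical power function with made-up coefficients, since the text states only that U is a known function of Q. The input values are likewise illustrative.

```python
import math

def velocity(Q, a=2000.0, b=0.4):
    """Hypothetical velocity U (m/day) as a power function of flow Q.
    The coefficients a and b are assumptions, not values from the text."""
    return a * Q ** b

def concentration(W, Q, k, X):
    """C = (W/Q) exp(-k X / U), with W in kg/day and Q in 1000 m3/day,
    so that C is in g/m3 (mg/l). X is in m and k in 1/day."""
    U = velocity(Q)
    return (W / Q) * math.exp(-k * X / U)

# Illustrative inputs: W = 500 kg/day, Q = 100 (1000 m3/day),
# k = 0.3 per day, X = 20 km.
C = concentration(W=500.0, Q=100.0, k=0.3, X=20000.0)
print(f"Predicted concentration: {C:.3f} mg/l")
```

For fixed inputs the function always returns the same value, which is the sense in which the model is deterministic.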

In this case the observed value of the pollutant concentration C may differ from the computed value of C even for the same inputs of W, Q, k, X, and U. Furthermore, this difference varies in different time periods. This apparent variability, as illustrated in Fig. 8.9, can be simulated using the same model but by assuming a distribution of values for the decay rate constant k. Alternatively the model structure can be modified to include the impact of streamflow temperature T on the prediction of C.

$$C = \left( {W/Q} \right){ \exp }\{ - k\theta^{T - 20} \left( {X/U} \right)\}$$

Now there are two model parameters, the decay rate constant k and the dimensionless temperature correction factor θ, and an additional model input, the streamflow temperature T. It could be that the variation in streamflow temperature was the sole cause of the first equation’s “uncertainty” and that the assumed parameter distribution of k was simply the result of the distribution of streamflow temperatures on the term $k\theta^{T-20}$.
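This possibility can be explored with a small simulation. The sketch below generates concentrations from the temperature-dependent equation with fixed k and θ and randomly varying temperature, and then computes the decay rate that the first, temperature-free equation would need in order to reproduce each concentration. All numerical values (the inputs, k = 0.3 per day, θ = 1.047, and the uniform temperature range) are assumptions for illustration, and `travel_time` stands for the ratio X/U.

```python
import math
import random
import statistics

def concentration(W, Q, k, travel_time, theta=1.0, T=20.0):
    """C = (W/Q) exp(-k theta**(T-20) (X/U)); travel_time stands for X/U."""
    return (W / Q) * math.exp(-k * theta ** (T - 20.0) * travel_time)

def implied_k(C, W, Q, travel_time):
    """Decay rate the first (temperature-free) equation needs to match C."""
    return -math.log(C * Q / W) / travel_time

rng = random.Random(7)
W, Q, travel_time = 500.0, 100.0, 1.5   # assumed inputs
k20, theta = 0.3, 1.047                 # assumed parameter values

k_values = []
for _ in range(2000):
    T = rng.uniform(10.0, 25.0)         # assumed temperature range, deg C
    C = concentration(W, Q, k20, travel_time, theta, T)
    k_values.append(implied_k(C, W, Q, travel_time))

print("implied k: min %.3f, mean %.3f, max %.3f"
      % (min(k_values), statistics.mean(k_values), max(k_values)))
```

Although k and θ are held constant, the implied decay rate spreads over a range of values. A modeler using only the first equation would see this spread as parameter uncertainty, when under these assumptions it is entirely explained by the omitted temperature input.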

If the output were still random given constant values of all the inputs, then another source of uncertainty exists. This uncertainty might be due to additional random loadings of the pollutant, possibly from nonpoint sources. Once again the model could be modified to include these additional loadings if they are knowable. Assuming these additional loadings are not known, a new random parameter could be added to the input variable W or to the right hand side of the equations above that would attempt to capture the impact on C of these additional loadings. A potential problem, however, might be the likely correlation between those additional loadings and the streamflow Q.

While adding model detail removed some “uncertainty” in the above example, increasing model complexity will not always eliminate or reduce uncertainty in model output . Adding complexity is generally not a good idea when the increased complexity is based on processes whose parameters are difficult to measure, the right equations are not known at the scale of application, or the amount of data for calibration is small compared to the number of parameters.

Even if more detailed models requiring more input data and more parameter values were to be developed, the likelihood of capturing all the processes occurring in a complex system is small. Hence those involved will have to make decisions taking this uncertainty into account. Imprecision will always exist due to less than a complete understanding of the system and the hydrologic processes being modeled. A number of studies have addressed model simplification, but only in some simple cases have statisticians been able to identify just how one might minimize model output uncertainty due to model structure .

The problem of determining the “optimal” level of modeling detail is particularly important when simulating the hydrologic events at many sites over large areas. Perhaps the best approach for these simulations is to establish confidence levels for alternative sets of models and then statistically compare simulation results. But even this is not a trivial or costless task. Increases in the temporal or spatial resolution typically require considerable data collection and/or processing, model recalibrations, and possibly the solution of stability problems resulting from the numerical methods used in the models. Obtaining and implementing alternative hydrologic simulation models will typically involve considerable investments of money and time for data preparation and model calibration .

What is needed is a way to predict the variability evident in the system shown in Fig. 8.9. Instead of a fixed output vector for each fixed input vector, a distribution of outputs is needed for each performance measure based on fixed inputs (Fig. 8.9) or a distribution of inputs (Fig. 8.10). Furthermore, the model output distribution for each performance measure should “match” as well as possible the observed distribution of that performance measure .

#### 8.4.1.2 What Uncertainty Analysis Can Provide

An uncertainty analysis takes a set of randomly chosen input values (which can include parameter values) and passes them through a model (or transfer function) to obtain the distributions (or statistical measures of the distributions) of the resulting outputs. As illustrated in Fig. 8.11, the output distributions can be used to

• Describe the range of potential outputs of the system at some probability level.

• Estimate the probability that the output will exceed a specific threshold or performance measure target value.

Common uses for uncertainty analyses are to make general inferences, such as the following:

• Estimating the mean and standard deviation of the outputs.

• Estimating the probability the performance measure will exceed a specific threshold.

• Putting a reliability level on a function of the outputs, e.g., the range of function values that is likely to occur with some probability.

• Describing the likelihood of different potential outputs of the system.

Implicit in any uncertainty analysis are the assumptions that statistical distributions for the input values are correct and that the model is a sufficiently realistic description of the processes taking place in the system. Neither of these assumptions is likely to be entirely correct.
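The inferences listed above can all be computed from a sample of model outputs. The sketch below is one possible way to do so, under stated assumptions: the function name `summarize_outputs`, the nearest-rank percentile rule, and the sample itself are hypothetical stand-ins for the outputs of repeated model runs.

```python
import statistics

def summarize_outputs(outputs, threshold, lower_pct=5, upper_pct=95):
    """Summarize a sample of model outputs from an uncertainty analysis."""
    ordered = sorted(outputs)
    n = len(ordered)

    def percentile(p):
        # Nearest-rank percentile; p is an integer percentage
        rank = max(1, -(-p * n // 100))  # ceiling of p*n/100
        return ordered[rank - 1]

    return {
        "mean": statistics.mean(ordered),
        "std": statistics.stdev(ordered),
        # Range of outputs expected at the chosen probability level
        "range": (percentile(lower_pct), percentile(upper_pct)),
        # Estimated probability that the output exceeds the threshold
        "p_exceed": sum(1 for y in ordered if y > threshold) / n,
    }

# Hypothetical output sample standing in for repeated model runs
sample = [float(v) for v in range(1, 101)]
summary = summarize_outputs(sample, threshold=90.0)
print(summary)
```

The quality of these summaries depends on the two assumptions noted above: the input distributions and the model itself.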

### 8.4.2 Sensitivity Analyses

“Sensitivity analysis” is aimed at describing how much model output values are affected by changes in model input values. It is the investigation of the importance of imprecision or uncertainty in model inputs in a decision-making or modeling process. The exact character of sensitivity analysis depends upon the particular context and the questions of concern. Sensitivity studies can provide a general assessment of model precision when used to assess system performance for alternative scenarios, as well as detailed information addressing the relative significance of errors in various parameters. As a result, sensitivity results should be of interest to the general public , federal and state management agencies, local watershed planners and managers, model users, and model developers.

Clearly, upper level management and the public may be interested in more general statements of model precision, and should be provided such information along with model predictions. On the other hand, detailed studies addressing the significance and interactions among individual parameters would likely be meaningful to model developers and some model users. They can use such data to interpret model results and to identify where efforts to improve models and their input values should be directed.

Initial sensitivity analysis studies could focus on two products:

1. detailed results to guide research and assist model development efforts, and
2. calculation of general descriptions of uncertainty associated with model predictions, so that policy decisions can reflect both the predicted system performance and the precision of such predictions.

In the first case, knowing the relative uncertainty in model projections due to possible errors in different sets of parameters and input data should assist in efforts to improve the precision of model projections. This knowledge should also contribute to a better understanding of the relationships between model assumptions, parameters, data and model predictions.

For the second case, knowing the relative precision associated with model predictions should have a significant effect on policy development. For example, the analysis may show that, given data inadequacies, there are very large error bands associated with some model variable values. When such large uncertainties exist, predictions should be used with appropriate skepticism. Incremental strategies should be explored along with monitoring so that greater experience can accumulate to resolve some of those uncertainties.

Sensitivity analysis features are available in many linear and nonlinear programming (optimization ) packages. They identify the changes in the values of the objective function and unknown decision variables given a change in the model input values, and a change in levels set for various constraints (Chap. 4). Thus sensitivity analysis addresses the change in “optimal” system performance associated with changes in various parameter values, and also how “optimal” decisions would change with changes in resource constraint levels, or target output requirements. This kind of sensitivity analysis provides estimates of how much another unit of resource would be worth, or what “cost” a proposed change in a constraint places on the optimal solution. This information should be of value to those making investment decisions.

Various techniques have been developed to determine how sensitive model outputs are to changes in model inputs. Most approaches examine the effects of changes in a single parameter value or input variable assuming no changes in all the other inputs. Sensitivity analyses can be extended to examine the combined effects of multiple sources of error as well.

Changes in particular model input values can affect model output values in different ways. It is generally true that only a relatively few input variables dominate or substantially influence the values of a particular output variable or performance indicator at a particular location and time. If the range of uncertainty of only some of the output data is of interest, then undoubtedly only those input data that significantly impact the values of those output data need be included in the sensitivity analysis.

If input data estimates are based on repeated measurements, a frequency distribution can be estimated that characterizes input data variability . The shorter the record of measurements, the greater will be the uncertainty regarding the long-term statistical characteristics of that variability. If obtaining a sufficient number of replicate measurements is not possible, subjective estimates of input data ranges and probability distributions are often made. Using a mixture of subjective estimates and actual measurements does not affect the application of various sensitivity analysis methods that can use these sets or distributions of input values, but it may affect the conclusions that can be drawn from the results of these analyses.

It would be nice to have available accurate and easy-to-use analytical methods for relating errors in input data to errors in model outputs , and to errors in system performance indicator values that are derived from model outputs. Such analytical methods do not exist for complex simulation models . However, methods based on simplifying assumptions and approximations can be used to yield useful sensitivity information. Some of these are reviewed in the remainder of this chapter.

#### 8.4.2.1 Sensitivity Coefficients

One measure of sensitivity is the sensitivity coefficient. This is the derivative of a model output variable with respect to an input variable or parameter. A number of sensitivity analysis methods use these coefficients. First-order and approximate first-order sensitivity analyses are two such methods that will be discussed later. The difficulties of

1. obtaining the derivatives for many models,
2. needing to assume mathematical (usually linear) relationships when obtaining estimates of derivatives by making small changes of input data values near their nominal or most likely values, and
3. having large variances associated with most hydrologic process models

have motivated the replacement of analytical methods by numerical and statistical approaches to sensitivity analysis.

By varying the input probability distributions, one can determine the sensitivity of the output distributions to the specification of the input distributions. If the output distributions vary significantly, then the output is sensitive to the specification of the input distributions, and hence they should be defined with care. A relatively simple deterministic sensitivity analysis can be of value here (Benaman 2002). A sensitivity coefficient can be used to measure the magnitude of change in an output variable Q per unit change in the magnitude of an input parameter value P from its base value $P_o$. Let $\text{SI}_{PQ}$ be the sensitivity index for an output variable Q with respect to a change ∆P in the value of the input variable P from its base value $P_o$. Noting that the value of the output Q(P) is a function of P, a sensitivity index could be defined as

$$\text{SI}_{PQ} = [Q(P_{o} + {\Delta} P)-Q(P_{o} - {\Delta} P)]/{2}{\Delta} P$$
(8.1)

Other sensitivity indices could be defined (McCuen 1973). Letting the index i represent a decrease and j represent an increase in the parameter value from its base value $P_o$, the sensitivity index $\text{SI}_{PQ}$ for parameter P and output variable Q could be defined as

$$\text{SI}_{PQ} = \left\{ {\left| {\left( {Q_{o} - Q_{i} } \right)/\left( {P_{o} - P_{i} } \right)} \right| + \left| {\left( {Q_{o} - Q_{j} } \right)/\left( {P_{o} - P_{j} } \right)} \right|} \right\}/2$$
(8.2)

or

$${\text{SI}}_{PQ} = \max \left\{ {\left| {\left( {Q_{o} - Q_{i} } \right)/(P_{o} - P_{i} )} \right|,\left| {\left( {Q_{o} - Q_{j} } \right)/\left( {P_{o} - P_{j} } \right)} \right|} \right\}$$
(8.3)

A dimensionless expression of sensitivity is the elasticity index, $\text{EI}_{PQ}$, which measures the relative change in output Q for a relative change in input P.

$$\text{EI}_{PQ} = \left[ {P_{o} /Q\left( {P_{o} } \right)} \right] {\text{SI}}_{PQ}$$
(8.4)
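Equations 8.1 to 8.4 translate directly into code. The following is a minimal sketch; the function names and the quadratic `example_model` are hypothetical and serve only to exercise the formulas.

```python
def si_central(model, p0, dp):
    """Eq. 8.1: central-difference sensitivity index around the base value p0."""
    return (model(p0 + dp) - model(p0 - dp)) / (2.0 * dp)

def _one_sided_slopes(model, p0, p_low, p_high):
    """Absolute slopes for a decrease (index i) and an increase (index j)."""
    q0 = model(p0)
    down = abs((q0 - model(p_low)) / (p0 - p_low))
    up = abs((q0 - model(p_high)) / (p0 - p_high))
    return down, up

def si_mean_abs(model, p0, p_low, p_high):
    """Eq. 8.2: mean of the two absolute one-sided slopes."""
    down, up = _one_sided_slopes(model, p0, p_low, p_high)
    return (down + up) / 2.0

def si_max_abs(model, p0, p_low, p_high):
    """Eq. 8.3: larger of the two absolute one-sided slopes."""
    return max(_one_sided_slopes(model, p0, p_low, p_high))

def elasticity(model, p0, dp):
    """Eq. 8.4: dimensionless elasticity index based on Eq. 8.1."""
    return (p0 / model(p0)) * si_central(model, p0, dp)

def example_model(p):
    """Hypothetical output Q(P) = P^2, used only for illustration."""
    return p ** 2

print(si_central(example_model, 3.0, 0.1))
print(si_mean_abs(example_model, 3.0, 2.0, 4.0))
print(si_max_abs(example_model, 3.0, 2.0, 4.0))
print(elasticity(example_model, 3.0, 0.1))
```

For this quadratic example the elasticity is close to 2, meaning a 1% change in P produces roughly a 2% change in Q near the base value.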

#### 8.4.2.2 A Simple Deterministic Sensitivity Analysis Procedure

This deterministic sensitivity analysis approach is very similar to those most often employed in the engineering economics literature. It is based on the idea of varying one uncertain parameter value, or set of parameter values, at a time, and observing the results.

The output variable of interest can be any performance measure or indicator. Thus one does not know if more or less of a given variable is better or worse. Perhaps too much and/or too little is undesirable. The key idea is that, whether employing physical measures or economic metrics of performance, various parameters (or sets of associated parameters) are assigned high and low values. Such ranges may reflect either the differences between the minimum and maximum values for each parameter, the 5 and 95 percentiles of a parameter’s distribution, or points corresponding to some other criteria. The system model is then run with the various alternatives , one at a time, to evaluate the impact of those errors in various sets of parameter values on the output variable.

Table 8.1 illustrates the character of the results that one would obtain. Here $Y_0$ is the nominal value of the model output when all parameters assume the estimated best values, and $Y_{i,L}$ and $Y_{i,H}$ are the values obtained by decreasing or increasing, respectively, the values of the ith set of parameters.

A simple water quality example is employed to illustrate this deterministic approach to sensitivity analysis. The analysis techniques illustrated here are just as applicable to complex models. The primary difference is that more work would be required to evaluate the various alternatives with a more complex model, and the model responses might be more complicated.

The simple water quality model is provided by Vollenweider’s empirical relationship for the average phosphorus concentration in lakes (Vollenweider 1976). He found that the phosphorus concentration, P (mg/m³), is a function of the annual phosphorus loading rate, L (milligrams per square meter per year, mg/m² a), the annual hydraulic loading, q (m/a or more exactly m³/m² a), and the mean water depth, z (m).

$$P = \left( {L/q} \right)/\left[ { 1+ \left( {z/q} \right)^{0. 5} } \right]$$
(8.5)

L/q and P have the same units; the denominator is an empirical factor that compensates for nutrient recycling and elimination within the aquatic lake environment.

Data for Lake Ontario in North America would suggest that reasonable values of the parameters are L = 680 mg/m² a; q = 10.6 m/a; and z = 84 m, yielding P = 16.8 mg/m³. Values of phosphorus concentrations less than 10 mg/m³ are considered oligotrophic, whereas values greater than 20 mg/m³ generally correspond to eutrophic conditions. Reasonable ranges reflecting possible errors in the three parameters yield the values in Table 8.2.
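The nominal value can be checked with a few lines of code. This is a minimal sketch of Eq. 8.5; the function names are hypothetical, and the label "intermediate" for concentrations between the two thresholds is an assumption made here for convenience.

```python
def phosphorus(L, q, z):
    """Eq. 8.5: P = (L/q) / [1 + (z/q)^0.5], with P in mg/m3."""
    return (L / q) / (1.0 + (z / q) ** 0.5)

def trophic_state(P):
    """Classify a concentration using the 10 and 20 mg/m3 thresholds in the text."""
    if P < 10.0:
        return "oligotrophic"
    if P > 20.0:
        return "eutrophic"
    return "intermediate"  # label assumed here for values between the thresholds

# Nominal Lake Ontario values quoted in the text
P_nominal = phosphorus(L=680.0, q=10.6, z=84.0)
print(round(P_nominal, 1), trophic_state(P_nominal))
```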

One may want to display these results so they can be readily visualized and understood. A tornado diagram (Eschenback 1992) would show the lower and upper values of P obtained from variation of each parameter, with the parameter with the widest limits displayed on top, and the parameter having smallest limits on the bottom. Tornado diagrams (Fig. 8.12) are easy to construct and can include a large number of parameters without becoming crowded.

These error bars shown in Fig. 8.12 indicate there is substantial uncertainty associated with the phosphorus concentration P, primarily due to uncertainty in the loading rate L.

An alternative to tornado diagrams is a Pareto chart showing the width of the uncertainty range associated with each variable, ordered from largest to smallest. A Pareto chart is illustrated in Fig. 8.13.
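The data behind a tornado diagram or Pareto chart come from varying one parameter at a time and sorting the results by the width of the output range. The sketch below shows one way to do this for the lake model. The low and high parameter values are assumptions chosen for illustration and should not be read as the entries of Table 8.2; the function names are likewise hypothetical.

```python
def phosphorus(L, q, z):
    """Vollenweider relationship, Eq. 8.5 (P in mg/m3)."""
    return (L / q) / (1.0 + (z / q) ** 0.5)

nominal = {"L": 680.0, "q": 10.6, "z": 84.0}

# Assumed (low, high) values for each parameter, for illustration only
ranges = {"L": (500.0, 900.0), "q": (8.0, 13.5), "z": (81.0, 87.0)}

def one_at_a_time(model, nominal, ranges):
    """Vary each parameter alone; return rows sorted by output range width."""
    rows = []
    for name, (low, high) in ranges.items():
        out_low = model(**{**nominal, name: low})
        out_high = model(**{**nominal, name: high})
        rows.append((name, out_low, out_high, abs(out_high - out_low)))
    # Widest range first: top-to-bottom order of a tornado diagram
    return sorted(rows, key=lambda row: row[3], reverse=True)

rows = one_at_a_time(phosphorus, nominal, ranges)
for name, out_low, out_high, width in rows:
    print(f"{name}: P = {out_low:.1f} (low) to {out_high:.1f} (high), width {width:.1f}")
```

The sorted widths are the bar lengths of a Pareto chart, and the low and high outputs are the bar ends of a tornado diagram.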

Another visual presentation is a spider plot showing the impact of uncertainty in each parameter on the variable in question, all on the same graph (Eschenback 1992; DeGarmo 1993, p. 401). A spider plot, Fig. 8.14, shows the particular functional response of the output to each parameter on a common scale, so one needs a common metric to represent changes in all of the parameters. Here we use percentage change from the nominal or best values.

Spider plots are a little harder to construct than tornado diagrams, and can generally include only 4–5 variables without becoming crowded. However, they provide a more complete view of the relationships between each parameter and the performance measure . In particular, a spider plot reveals nonlinear relationships and the relative sensitivity of the performance measure to (percentage) changes in each variable.

In the spider plot, the linear relationship between P and L and the gentle nonlinear relationship between P and q are illustrated. The range for z has been kept small given the limited uncertainty associated with that parameter.

#### 8.4.2.3 Multiple Errors and Interactions

An important issue that should not be ignored is the impact of simultaneous errors in more than one parameter. Probabilistic methods directly address the occurrence of simultaneous errors, but the correct joint distribution needs to be employed. With simple sensitivity analysis procedures, errors in parameters are generally investigated one at a time, or in groups. The idea of considering pairs or sets of parameters is discussed here.

Groups of factors. It is often the case that reasonable error scenarios would have several parameters changing together. For example, possible errors in water depth would be accompanied with corresponding variations in aquatic vegetation and chemical parameters. Likewise, alternatives related to changes in model structure might be accompanied with variations in several parameters. In other cases, there may be no causal relationship among possible errors (such as model structure versus inflows at the boundary of the modeled region), but they might still interact to affect the precision of model predictions.

Combinations. If one or more non-grouped parameters interact in significant ways, then combinations of one or more errors should be investigated. However, one immediately runs into a combinatorial problem. If each of m parameters can have three values (high, nominal, and low), there are $3^m$ combinations, as opposed to 2m + 1 if each parameter is varied separately. [For m = 5, the numbers are $3^5$ = 243 versus 2(5) + 1 = 11.] These numbers can be reduced by considering instead only combinations of extremes so that only $2^m$ + 1 cases need be considered [$2^5$ + 1 = 33], which is a more manageable number. However, all of the parameters would be at one extreme or the other, and such situations would be unlikely.

Two factors at a time. A compromise is to consider all pairs of two parameters at a time. There are m(m − 1)/2 possible pairs of m parameters. Each parameter has a high and low value. Since there are four combinations of high and low values for each pair, there are a total of 2m(m − 1) combinations. [For m = 5 there are 40 combinations of two parameters each having two values.]
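The case counts quoted above are easy to verify by enumeration. The following sketch uses hypothetical function and parameter names; it counts the cases for each strategy and lists the two-at-a-time combinations explicitly.

```python
from itertools import combinations, product

def case_counts(m):
    """Number of model runs needed under each strategy for m parameters."""
    return {
        "all_three_level_combinations": 3 ** m,
        "one_at_a_time": 2 * m + 1,
        "extremes_only": 2 ** m + 1,
        "two_at_a_time": 2 * m * (m - 1),
    }

def pairwise_cases(names):
    """Yield every pair of parameters with each high/low combination."""
    for first, second in combinations(names, 2):
        for sign_first, sign_second in product("+-", repeat=2):
            yield (first, sign_first, second, sign_second)

counts = case_counts(5)
cases = list(pairwise_cases(["a", "b", "c", "d", "e"]))
print(counts)
print(len(cases), "pairwise cases, for example", cases[0])
```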

The presentation of these results could be simplified by displaying for each case only the maximum error, which would result in m(m − 1)/2 cases that might be displayed in a Pareto diagram. This would allow identification of those combinations of two parameters that might yield the largest errors and thus are of most concern.

For the water quality example, if one plots the absolute value of the error for all four combinations of high (+) and low (−) values for each pair of parameters, one obtains Fig. 8.15.

Considering only the worst error for each pair of variables yields Fig. 8.16.

Here we see, as is no surprise, the worst error results from the most unfavorable combination of L and q values. If both parameters have their most unfavorable values, the predicted phosphorus concentration would be 27 mg/m³.

Looking for nonlinearities. One might also display in a Pareto diagram the maximum error for each pair as a percentage of the sum of the absolute values of the maximum error from each parameter separately. The ratio of the joint error to the individual errors would illustrate potentially important nonlinear interactions. If the model of the system and the physical measure or economic metric were strictly linear, then the individual ratios should add to one.

#### 8.4.2.4 First-Order Sensitivity Analysis

The above deterministic analysis has trouble representing reasonable combinations of errors in several parameter sets. If the errors are independent, it is highly unlikely that any two sets would actually be at their extreme ranges at the same time. By defining probability distributions of the values of the various parameter sets, and specifying their joint distributions, a probabilistic error analysis can be conducted. In particular, for a given performance indicator, one can use multivariate linear analyses to evaluate the approximate impact on the performance indices of uncertainty in various parameters. As shown below, the impact depends upon the square of the sensitivity coefficients (partial derivatives) and the variances and covariances of the parameter sets.

For a performance indicator I = F(Y), which is a function F(•) of model outputs Y, that are in turn a function g(P) of input parameters P, one can use a multivariate Taylor series approximation of F to obtain the expected value and variance of the indicator:

\begin{aligned} {\text{E}}\left[ I \right] & = F\left( {{\text{based}}\;{\text{on}}\;{\text{mean}}\;{\text{values}}\;{\text{of}}\;{\text{input}}\;{\text{parameters}}} \right) \\ & \quad + \left( {1/2} \right)\sum\limits_{i} {\sum\limits_{j} {\left[ {\partial^{2} F/\partial P_{i} \partial P_{j} } \right]{\text{Cov}}\left[ {P_{i} ,P_{j} } \right]} } \\ \end{aligned}
(8.6)

and

$${\text{Var}}\left[ I \right] = \sum\limits_{i} {\sum\limits_{j} {\left( {\partial F/\partial P_{i} } \right)\left( {\partial F/\partial P_{j} } \right){\text{Cov}}\left[ {P_{i} ,P_{j} } \right]} }$$
(8.7)

where $\partial F/\partial P_i$ are the partial derivatives of the function F with respect to $P_i$, evaluated at the mean values of the input parameters, and $\partial^2 F/\partial P_i \partial P_j$ are the second partial derivatives. The covariance of two random input parameters $P_i$ and $P_j$ is the expected value of the product of the differences between their values and their means.

$${\text{Cov}}\left[ {P_{i} ,P_{j} } \right] = {\text{E}}\left[ {\left( {P_{i} - {\text{E}}\left[ {P_{i} } \right]} \right)\left( {P_{j} - {\text{E}}\left[ {P_{j} } \right]} \right)} \right]$$
(8.8)

If all the parameters are independent of each other, and the second-order terms in the expression for the mean E[I] are neglected, one obtains

$${E}\left[ I \right] =F\left( {\text{based}\;{\text{on}}\;{\text{mean}}\;{\text{values}}\;{\text{of}}\;{\text{input}}\;{\text{parameters}}} \right)$$
(8.9)

and

$${\text{Var}}\left[ I \right] = \sum\limits_{i} {\left[ {\partial F/\partial P_{i} } \right]^{2} {\text{Var}}\left[ {P_{i} } \right]}$$
(8.10)

See Benjamin and Cornell (1970). Equation 8.6 for E[I] shows that in the presence of substantial uncertainty, the mean of the output from nonlinear systems is not simply the system output corresponding to the mean of the parameters (Gaven and Burges 1981, p. 1523). This is true for any nonlinear function.

Of interest in the analysis of uncertainty is the approximation for the variance Var[I] of indicator I. In Eq. 8.10 the contribution of $P_i$ to the variance of I equals Var[$P_i$] times $[\partial F/\partial P_i]^2$, the square of the sensitivity coefficient for indicator I with respect to input parameter $P_i$.
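The variance approximation of Eq. 8.7, and its special case Eq. 8.10, can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the partial derivatives are approximated by central differences at the parameter means, and the function names, step size, and linear test function are all hypothetical.

```python
def gradient(F, means, rel_step=1e-4):
    """Central-difference estimates of dF/dP_i evaluated at the parameter means."""
    grads = []
    for i, mean_i in enumerate(means):
        h = rel_step * max(abs(mean_i), 1.0)
        up = list(means)
        down = list(means)
        up[i] = mean_i + h
        down[i] = mean_i - h
        grads.append((F(up) - F(down)) / (2.0 * h))
    return grads

def first_order_variance(F, means, cov):
    """Eq. 8.7: Var[I] = sum_i sum_j (dF/dP_i)(dF/dP_j) Cov[P_i, P_j]."""
    g = gradient(F, means)
    n = len(means)
    return sum(g[i] * g[j] * cov[i][j] for i in range(n) for j in range(n))

def linear_indicator(p):
    """Hypothetical indicator I = 2*P1 + 3*P2, used only to check the formula."""
    return 2.0 * p[0] + 3.0 * p[1]

means = [1.0, 2.0]
independent = [[1.0, 0.0], [0.0, 4.0]]  # Eq. 8.10 case: zero covariances
correlated = [[1.0, 1.0], [1.0, 4.0]]   # Cov[P1, P2] = 1

print(first_order_variance(linear_indicator, means, independent))
print(first_order_variance(linear_indicator, means, correlated))
```

For a linear indicator the first-order result is exact, which makes it a convenient check; for nonlinear indicators it is only an approximation.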

##### 8.4.2.4.1 An Example of First-Order Sensitivity Analysis

It may appear that first-order analysis is difficult because the partial derivatives of the performance indicator I are needed with respect to the various parameters. However, reasonable approximations of these sensitivity coefficients can be obtained from the simple sensitivity analysis described in Table 8.3. In that table, three different parameter sets, $P_i$, are defined in which one parameter of the set is at its high value, $P_{iH}$, and one is at its low value, $P_{iL}$, to produce corresponding values (called high, $I_{iH}$, and low, $I_{iL}$) of a system performance indicator I.

It is then necessary to estimate some representation of the variances of the various parameters with some consistent procedure. For a normal distribution, the distance between the 5 and 95 percentiles is 1.645 standard deviations on each side of the mean , or 2(1.645) = 3.3 standard deviations. Thus, if the high/low range is thought of as approximately a 5–95 percentile range for a normally distributed variate, a reasonable approximation of the variance might be

$${\text{Var}}\left[ {P_{i} } \right] = \left\{ {\left[ {P_{{i{\text{H}}}} -P_{{i{\text{L}}}} } \right]/ 3. 3 { }} \right\}^{ 2} .$$
(8.11)

This is all that is needed. Use of these average sensitivity coefficients is very reasonable for modeling the behavior of the system performance indicator I over the indicated ranges.

As an illustration of the method of first-order uncertainty analysis , consider the lake quality problem described above. The “system performance indicator” in this case is the model output , the phosphorus concentration P, and the input parameters, now denoted as X = L, q, and z. The standard deviation of each parameter is assumed to be the specified range divided by 3.3. Average sensitivity coefficients ∂P/∂X were calculated. The results are reported in Table 8.4.

Assuming the parameter errors are independent:

$${\text{Var}}\left[ P \right] = 9. 1 8+ 2. 9 2+ 0.0 2= 1 2. 1 2$$
(8.12)

The square root of 12.12 is the standard deviation and equals 3.48. This agrees well with a Monte Carlo analysis reported below.

Note that 100 × (9.18/12.12), or about 76%, of the total parameter error variance in the phosphorus concentration P is associated with the phosphorus loading rate L, and the remaining 24% is associated with the hydraulic loading q. Eliminating the uncertainty in z would have a negligible impact on the overall model error. Likewise, reducing the error in q would at best have a modest impact on the total error.

Due to these uncertainties, the estimated phosphorus concentration has a standard deviation of 3.48. Assuming the errors are normally distributed, and recalling that ±1.645 standard deviations around the mean define a 5–95 percentile interval, the 5–95 percentile interval would be about

$$1 6. 8\pm 1. 6 4 5\left( { 3. 4 8} \right){\text{mg}}/{\text{m}}^{ 3} = 16.8 \pm 5.7\,{\text{mg}}/{\text{m}}^{ 3} = 11.1\, \text{to} \,22.5\,{\text{mg}}/{\text{m}}^{ 3} .$$
(8.13)

The upper bound of 22.5 mg/m³ is considerably less than the 27 mg/m³ that would be obtained if both L and q had their most unfavorable values. In a probabilistic analysis with independent errors, such a combination is highly unlikely.
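The arithmetic of this example can be reproduced in a few lines. The sketch below takes the three variance contributions from Eq. 8.12 and the nominal concentration of 16.8 mg/m³ as given; the variable names and the helper `variance_from_range`, which implements Eq. 8.11, are hypothetical.

```python
def variance_from_range(high, low):
    """Eq. 8.11: treat the high/low range as a 5-95 percentile normal range."""
    return ((high - low) / 3.3) ** 2

# Variance contributions of each parameter, as reported in Eq. 8.12
contributions = {"L": 9.18, "q": 2.92, "z": 0.02}

var_P = sum(contributions.values())
sd_P = var_P ** 0.5

# Percentage of the total variance associated with each parameter
shares = {name: 100.0 * value / var_P for name, value in contributions.items()}

# 5-95 percentile interval around the nominal value, assuming normal errors
nominal_P = 16.8
half_width = 1.645 * sd_P
interval = (nominal_P - half_width, nominal_P + half_width)

print(f"Var[P] = {var_P:.2f}, standard deviation = {sd_P:.2f}")
print({name: round(share, 1) for name, share in shares.items()})
print(f"5-95 percentile interval: {interval[0]:.1f} to {interval[1]:.1f} mg/m3")
```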

##### 8.4.2.4.2 Warning on Accuracy

First-order uncertainty analysis is indeed an approximate method based upon a linearization of the response function represented by the full simulation model . It may provide inaccurate estimates of the variance of the response variable for nonlinear systems with large uncertainty in the parameters. In such cases, Monte Carlo simulation (discussed below and in the previous chapter) or the use of higher order approximation may be required. Beck (1987, p. 1426) cites studies that found that Monte Carlo and first-order variances were not appreciably different, and a few studies that found specific differences. Differences are likely to arise when the distributions used for the parameters are bimodal (or otherwise unusual), or some rejection algorithm is used in the Monte Carlo analysis to exclude some parameter combinations. Such errors can result in a distortion in the ranking of predominant sources of uncertainty. However, in most cases very similar results were obtained.

#### 8.4.2.5 Fractional Factorial Design Method

An extension of first-order sensitivity analysis would be a more complete exploration of the response surface using a careful statistical design. First consider a complete factorial design. Input data are divided into discrete “levels.” The simplest case is two levels. These two levels can be defined as a nominal value, and a high (low) value. Simulation runs are made for all combinations of parameter levels. For n different inputs, this would require $2^n$ simulation runs. Hence for a three-input variable or parameter problem, 8 runs would be required. If four discrete levels of each input variable or parameter were allowed to provide a more reasonable description of a continuous variable, the three-input data problem would require $4^3$ or 64 simulation runs. Clearly, this is not a useful tool for large regional water resources simulation models.

A fractional factorial design involves simulating only a fraction of what is required from a full factorial design method. The loss of information prevents a complete analysis of the impacts of each input variable or parameter on the output.

To illustrate the fractional factorial design method, consider the two-level, three-input variable or parameter problem. Table 8.5 shows the 8 simulations required for a full factorial design method. The “+” and the “−” show the upper and lower levels of each input variable or parameter $P_i$ where i = 1, 2, 3. If all eight simulations were performed, seven possible effects could be estimated. These are the individual effects of the three inputs $P_1$, $P_2$, and $P_3$, the three two-input variable or parameter interactions, $(P_1)(P_2)$, $(P_1)(P_3)$, and $(P_2)(P_3)$, and the one three-input variable or parameter interaction $(P_1)(P_2)(P_3)$.

Consider an output variable Y, where $Y_j$ is the value of Y in the jth simulation run. Then an estimate of the effect, denoted $\delta(Y|P_i)$, that input variable or parameter $P_i$ has on the output variable Y is the average of the four separate effects of varying $P_i$:

For i = 1:

\begin{aligned} {\delta}\left( {Y|P_{1} } \right) & = 0.25\left[ \left({Y_{2} - Y_{1}}\right) + \left({Y_{4} - Y_{3}}\right) \right. \\ & \left.+ \left({Y_{6} - Y_{5}}\right) + \left({Y_{8} - Y_{7}}\right) \right] \end{aligned}
(8.14)

Each difference in parentheses is the difference between a run in which $P_1$ is at its upper level and a run in which $P_1$ is at its lower level, but the other two parameter values, $P_2$ and $P_3$, are unchanged. If the effect is equal to 0, then, in this case, $P_1$ has no impact on the output variable Y.

Similarly, the effects of $P_2$ and $P_3$ on variable Y can be estimated as:

$${\delta }\left( {Y|P_{ 2} } \right) = 0. 2 5\left\{ {\left( {Y_{ 3} - Y_{ 1} } \right) + \left( {Y_{ 4} - Y_{ 2} } \right) + \left( {Y_{ 7} - Y_{ 5} } \right) + \left( {Y_{ 8} - Y_{ 6} } \right)} \right\}$$
(8.15)

and

$${\delta }\left( {Y|P_{ 3} } \right) = 0. 2 5\left\{ {\left( {Y_{ 5} - Y_{ 1} } \right) + \left( {Y_{ 6} - Y_{ 2} } \right) + \left( {Y_{ 7} - Y_{ 3} } \right) + \left( {Y_{ 8} - Y_{ 4} } \right)} \right\}$$
(8.16)

Consider next the interaction effects between $P_1$ and $P_2$. This is estimated as one half of the difference between the average $P_1$ effect at the upper level of $P_2$ and the average $P_1$ effect at the lower level of $P_2$. This is the same as one half of the difference between the average $P_2$ effect at the upper level of $P_1$ and the average $P_2$ effect at the lower level of $P_1$:

\begin{aligned} \delta \left( {Y|P_{1} ,P_{2} } \right) & = \left( {1/2} \right)\left\{ {\left[ {\left( {Y_{8} - Y_{7} } \right) + \left( {Y_{4} - Y_{3} } \right)} \right]/2} \right. \\ & \quad \left. { - \left[ {\left( {Y_{2} - Y_{1} } \right) + \left( {Y_{6} - Y_{5} } \right)} \right]/2} \right\} = \left( {1/4} \right)\left\{ {\left[ {\left( {Y_{8} - Y_{6} } \right)} \right.} \right. \\ & \quad \left. {\left. { + \left( {Y_{4} - Y_{2} } \right)} \right] - \left[ {\left( {Y_{3} - Y_{1} } \right) + \left( {Y_{7} - Y_{5} } \right)} \right]} \right\} \\ \end{aligned}
(8.17)

Similar equations can be derived for looking at the interaction effects between $P_1$ and $P_3$, and between $P_2$ and $P_3$, and the interaction effects among all three inputs $P_1$, $P_2$, and $P_3$.

Now assume only half of the simulation runs were performed, perhaps runs 2, 3, 5, and 8 in this example. If only outputs $Y_2$, $Y_3$, $Y_5$, and $Y_8$ are available, for our example:

$$\delta\left(Y|P_{3}\right) = \delta\left(Y|P_{1},P_{2}\right) = 0.5\left\{ \left(Y_{8} - Y_{3}\right) - \left(Y_{2} - Y_{5}\right) \right\}$$
(8.18)

The separate effects of P 3 and of the interaction between P 1 and P 2 are not available from the output. This is the loss of information that results from using a fractional instead of a complete factorial design.
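As a rough illustration of Eqs. 8.15 to 8.18, the Python sketch below computes main and interaction effects for the eight-run design. It assumes the run numbering implied by those equations (P 1 at its upper level in runs 2, 4, 6, and 8; P 2 in runs 3, 4, 7, and 8; P 3 in runs 5 to 8). The function names and the `response` function with its coefficients are hypothetical, chosen only to generate example outputs.

```python
def levels(run):
    """Return the -1/+1 levels of (P1, P2, P3) for run number 1..8."""
    k = run - 1
    return tuple(1 if (k >> j) & 1 else -1 for j in range(3))


def main_effect(y, j):
    """Average change in Y when parameter j (0, 1 or 2) goes from lower to upper level."""
    return sum(levels(r)[j] * y[r] for r in range(1, 9)) / 4.0


def interaction_12(y):
    """Interaction effect between P1 and P2, as in Eq. 8.17."""
    return sum(levels(r)[0] * levels(r)[1] * y[r] for r in range(1, 9)) / 4.0


def half_fraction_estimate(y):
    """Eq. 8.18: the only estimate available from runs 2, 3, 5 and 8."""
    return 0.5 * ((y[8] - y[3]) - (y[2] - y[5]))


def response(p1, p2, p3):
    """Hypothetical model output with a P1-P2 interaction (illustration only)."""
    return 10.0 + 2.0 * p1 + 3.0 * p2 - 1.0 * p3 + 0.5 * p1 * p2


# Outputs Y1..Y8 keyed by run number
y = {r: response(*levels(r)) for r in range(1, 9)}

effect_p1 = main_effect(y, 0)
effect_p2 = main_effect(y, 1)
effect_p3 = main_effect(y, 2)
effect_p1p2 = interaction_12(y)
aliased = half_fraction_estimate(y)
```

For this made-up response the full design separates a P 3 effect of -2 from a P 1, P 2 interaction of 1, while the half-fraction estimate returns -1, their sum. The two contributions cannot be told apart from runs 2, 3, 5, and 8 alone.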

#### 8.4.2.6 Monte Carlo Sampling Methods

The Monte Carlo method of performing sensitivity analyses, illustrated in Fig. 8.17, first selects a random set of input data values drawn from their individual probability distributions. These values are then used in the simulation model to obtain some model output variable values. This process is repeated many times, each time making sure the model calibration is valid for the input data values chosen. The end result is a probability distribution of model output variables and system performance indices that results from variations and possible errors in all of the input values.

Using a simple Monte Carlo analysis, values of all of the parameter sets are selected randomly from distributions describing the individual and joint uncertainty in each, and then the modeled system is simulated to obtain estimates of the selected performance indices. This must be done many times (often well over 100) to obtain a statistically significant description of system performance variability. The number of replications needed is generally not dependent on the number of parameters whose errors are to be analyzed. One can include in the simulation the uncertainty in parameters as well as natural variability. This method can evaluate the impact of single or multiple uncertain parameters.

A significant problem that arises in such simulations is that some combinations of parameter values may result in unreasonable models. For example, model output based on calibrated data sets might be inconsistent with available data sets. The calibration process places interesting constraints on different sets of parameter values. Thus, such Monte Carlo experiments often contain checks that exclude combinations of parameter values that are unreasonable. In these cases the generated results are conditioned on this validity check.

Whenever sampling methods are used, one must consider possible correlations among input data values. Sampling methods can handle spatial and temporal correlations that may exist among input data values, but the existence of correlation requires defining appropriate conditional distributions.

One major limitation of applying Monte Carlo methods to estimate ranges of risk and uncertainty for model output variable values, and system performance indicator values based on these output variable values, is the computing time required. To reduce the computing times needed to perform sensitivity analyses using sampling methods, some tricks as well as stratified sampling methods are available. The discussion below illustrates the idea of a simple modification (or trick) using a “standardized” Monte Carlo analysis. The more general Latin Hypercube Sampling procedure is also discussed.

##### 8.4.2.6.1 Simple Monte Carlo Sampling

To illustrate the use of Monte Carlo sampling methods consider again Vollenweider’s empirical relationship, Eq. 8.5, for the average phosphorus concentration in lakes (Vollenweider 1976). Two hundred values of each parameter were generated independently from normal distributions with the means and variances as shown in Table 8.6.

The table contains the specified means and variances for the generated values of L, q, and z, and also the actual values of the means and variances of the 200 generated values of L, q, z and also of the 200 corresponding generated output phosphorus concentrations, P. Figure 8.18 displays the distribution of the generated values of P.

One can see that given the estimated levels of uncertainty, phosphorus levels could reasonably range from below 10 to above 25 mg/m³. The probability of generating a value greater than 20 mg/m³ was 12.5%. The 5th to 95th percentile range was 11.1–23.4 mg/m³. In the figure, the cumulative probability curve is rough because only 200 values of the phosphorus concentration were generated, but these are clearly enough to give a good impression of the overall impact of the errors.
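The experiment described above can be sketched in a few lines of Python. This is only a sketch under stated assumptions: Eq. 8.5 is assumed to have the form P = (L/q)/[1 + (z/q)^0.5], and the means and standard deviations used here are placeholders for illustration, not the values in Table 8.6. The names `vollenweider_p` and `simulate` are hypothetical.

```python
import random
import statistics


def vollenweider_p(L, q, z):
    """Assumed form of Eq. 8.5: phosphorus concentration from loading L,
    hydraulic loading q and mean depth z."""
    return (L / q) / (1.0 + (z / q) ** 0.5)


def simulate(n=200, seed=1):
    """Draw n independent normal parameter sets and return the n outputs P."""
    rng = random.Random(seed)
    p_values = []
    for _ in range(n):
        # Placeholder means and standard deviations (NOT taken from Table 8.6)
        L = rng.gauss(680.0, 120.0)
        q = rng.gauss(10.6, 1.5)
        z = rng.gauss(84.0, 10.0)
        p_values.append(vollenweider_p(L, q, z))
    return p_values


p = simulate()
mean_p = statistics.mean(p)
sd_p = statistics.stdev(p)

# Fraction of generated values above a threshold of 20
frac_above_20 = sum(v > 20.0 for v in p) / len(p)

# statistics.quantiles with n=20 returns the 5th, 10th, ..., 95th percentiles
cuts = statistics.quantiles(p, n=20)
p05, p95 = cuts[0], cuts[-1]
```

Changing `seed` produces a different set of 200 values and therefore slightly different summary statistics, which is the sampling uncertainty taken up next.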

##### 8.4.2.6.2 Sampling Uncertainty

In this example, the mean of the 200 generated values of the phosphorus concentration, P, was 17.07. However, a different set of random values would have generated a different set of P values as well. Thus it is appropriate to estimate the standard error, SE, of this average. The standard error equals the standard deviation σ of the P values divided by the square root of the sample size n:

$$\text{SE} =\upsigma/\left( n \right)^{0.5} = 3.61/\left( {200} \right)^{0.5} = 0.25.$$
(8.19)

From the central limit theorem of mathematical statistics, the average of a large number of independent values should have very nearly a normal distribution. Thus, 95% of the time, the true mean of P should be in the interval 17.1 ± 1.96 (0.25), or 16.6–17.6 mg/m³. This level of uncertainty reflects the observed variability of P and the fact that only 200 values were generated.
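The standard error and interval calculation of Eq. 8.19 can be written as a small helper. The function name `mean_confidence_interval` and the sample values are hypothetical; the multiplier 1.96 is the usual normal-distribution value for a 95% interval.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Return (mean, standard error, (lower, upper)) for the mean of values.

    The interval relies on the normal approximation, so it is only
    appropriate when the number of values is reasonably large.
    """
    n = len(values)
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(n)  # Eq. 8.19
    return mean, se, (mean - z * se, mean + z * se)


# Tiny made-up sample, used only to show the call
sample = [1.0, 2.0, 3.0, 4.0, 5.0]
mean, se, (lower, upper) = mean_confidence_interval(sample)
```

Because the standard error shrinks with the square root of n, halving the width of the interval requires roughly four times as many simulation runs.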

##### 8.4.2.6.3 Making Sense of the Results

A significant challenge with complex models is to determine from the Monte Carlo simulation which parameter errors are important. Calculating the correlation between each generated input parameter value and the output variable value is one way of doing this. As Table 8.7 shows, based upon the magnitudes of the correlation coefficients, errors in L were most important, and those in q second in importance.

One can also use regression to develop a linear model defining variations in the output based on errors in the various parameters. The results are shown in Table 8.8. The fit is very good, with R² = 98%. If the model for P had been linear, an R² value of 100% would have resulted. All of the coefficients are significantly different from zero.

Note that the correlation between P and z was positive in Table 8.7, but the regression coefficient for z is negative. This occurred because there is a modest negative correlation between the generated z and q values. Use of partial correlation coefficients can also correct for such spurious correlations among input parameters.

Finally we display a plot, Fig. 8.19, based on this regression model illustrating the reduction in the variance of P that is due to dropping each variable individually. Clearly L has the biggest impact on the uncertainty in P, and z the least.
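The correlation and regression analyses described in this section can be sketched as follows. The sketch assumes the same form of Eq. 8.5 and the same placeholder input statistics as any other illustration (they are not the values of Table 8.6), and the helper names `pearson`, `solve`, and `fit_linear` are hypothetical. Only the standard library is used, so the least-squares fit solves the normal equations directly.

```python
import random


def vollenweider_p(L, q, z):
    """Assumed form of Eq. 8.5."""
    return (L / q) / (1.0 + (z / q) ** 0.5)


def pearson(x, y):
    """Sample correlation coefficient between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def fit_linear(columns, y):
    """Least-squares fit of y on centered input columns; return (slopes, R^2)."""
    n, k = len(y), len(columns)
    means = [sum(col) / n for col in columns]
    xc = [[v - m for v in col] for col, m in zip(columns, means)]
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    xtx = [[sum(a * b for a, b in zip(xc[i], xc[j])) for j in range(k)] for i in range(k)]
    xty = [sum(a * b for a, b in zip(xc[i], yc)) for i in range(k)]
    slopes = solve(xtx, xty)
    fitted = [sum(s * xc[j][i] for j, s in enumerate(slopes)) for i in range(n)]
    ss_res = sum((yc[i] - fitted[i]) ** 2 for i in range(n))
    ss_tot = sum(v * v for v in yc)
    return slopes, 1.0 - ss_res / ss_tot


rng = random.Random(1)
n = 200
# Placeholder input statistics, not those of Table 8.6
L = [rng.gauss(680.0, 120.0) for _ in range(n)]
q = [rng.gauss(10.6, 1.5) for _ in range(n)]
z = [rng.gauss(84.0, 10.0) for _ in range(n)]
P = [vollenweider_p(a, b, c) for a, b, c in zip(L, q, z)]

correlations = {name: pearson(col, P) for name, col in (("L", L), ("q", q), ("z", z))}
slopes, r_squared = fit_linear([L, q, z], P)
```

The simple correlations in `correlations` can be distorted by chance correlation among the generated inputs, whereas the regression slopes account for all three inputs at once. That difference is the reason a correlation and a regression coefficient can disagree in sign.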

##### 8.4.2.6.4 Standardized Monte Carlo Analysis

Using a “standardized” Monte Carlo analysis, one could adjust the generated values of L, q, and z above so that the generated samples actually have the desired mean and variance. While making that correction, one can also shuffle their values so that the correlations among the generated values for the different parameters are near zero, as is desired. This was done for the 200 generated values to obtain the statistics shown in Table 8.9.

Repeating the correlation analysis from before (shown in Table 8.10) now yields much clearer results that are in agreement with the regression analysis. The correlations between P and both q and z are now negative, as they should be. Because the generated values of the three parameters have been adjusted to be uncorrelated, the signal from one is not confused with the signal from another.

The mean phosphorus concentration changed very little. It is now 17.0 instead of 17.1 mg/m³.

Using control variates with a linear predictive model in conjunction with the standardized Monte Carlo variates, the standard deviation of the errors associated with the 200 observations is only 0.45. Thus the standard error for this estimate of the mean of P is 0.45/(200)^0.5, or just 0.03. This is a highly accurate result. The regressions were also repeated and yielded very similar results. The only real difference was that the parameter estimates had smaller standard errors and were more significant because of the elimination of correlation between the generated parameters.
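One possible way to carry out the adjustment is sketched below. Note the substitution: instead of shuffling values to push the sample correlations toward zero, this sketch removes them exactly with a Gram-Schmidt orthogonalization of the centered samples, and then rescales each parameter to its target mean and standard deviation. The function names and target statistics are hypothetical, and the control-variate step is not shown.

```python
import random
import statistics


def center(x):
    """Subtract the sample mean from every value."""
    m = sum(x) / len(x)
    return [v - m for v in x]


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def sample_correlation(a, b):
    ca, cb = center(a), center(b)
    return dot(ca, cb) / (dot(ca, ca) * dot(cb, cb)) ** 0.5


def standardize(samples, targets):
    """Adjust samples (a list of columns) so that each column has exactly the
    target (mean, standard deviation) and all sample correlations are zero."""
    ortho = []
    for col in samples:
        c = center(col)
        # Gram-Schmidt: remove the part of c explained by earlier columns
        for prev in ortho:
            coef = dot(c, prev) / dot(prev, prev)
            c = [u - coef * v for u, v in zip(c, prev)]
        ortho.append(c)
    n = len(samples[0])
    adjusted = []
    for c, (mean, sd) in zip(ortho, targets):
        s = (dot(c, c) / (n - 1)) ** 0.5  # current sample standard deviation
        adjusted.append([mean + sd * v / s for v in c])
    return adjusted


# Placeholder (mean, standard deviation) pairs for L, q and z
targets = [(680.0, 120.0), (10.6, 1.5), (84.0, 10.0)]
rng = random.Random(2)
raw = [[rng.gauss(m, s) for _ in range(200)] for m, s in targets]
adjusted = standardize(raw, targets)
```

Linear combinations of normal variates remain normal, so this adjustment is reasonable for normally distributed inputs. For other distributions, a rank-based shuffling as described in the text would preserve the marginal shapes better.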

##### 8.4.2.6.5 Generalized Likelihood Estimation

Beven (1993) and Binley and Beven (1991) suggest a Generalized Likelihood Uncertainty Estimation (GLUE) technique for assessment of parameter error uncertainty using Monte Carlo simulation. It is described as a “formal methodology for some of the subjective elements of model calibration” (Beven 1989, p. 47). The basic idea is to begin by assigning reasonable ranges for the various parameters and then to draw parameter sets from those ranges using a uniform or some similar (and flat) distribution. These generated parameter sets are then used on a calibration data set so that unreasonable combinations can be rejected, while reasonable values are assigned a posterior probability based upon a likelihood measure which may reflect several dimensions and characteristics of model performance.

Let L(P i ) > 0 be the value of the likelihood measure assigned to the ith parameter set’s calibration sequence. Then the model predictions generated with parameter set/combination P i are assigned posterior probability, p(P i ).

$$p\left({P_{i} } \right) = L\left({P_{i} } \right)/\sum\limits_{j} L\left({P_{j} } \right)$$
(8.20)

These probabilities reflect the form of Bayes’ theorem, which is well supported by probability theory (Devore 1991). This procedure should capture reasonably well the dependence or correlation among parameters, because reasonable sequences will all be assigned larger probabilities, whereas sequences that are unable to reproduce the system response over the calibration period will be rejected or assigned small probabilities.

However, in a rigorous probabilistic framework, the L would be the likelihood function for the calibration series for particular error distributions. (This could be checked with available goodness-of-fit procedures; for example, Kuczera 1988.) When relatively ad hoc measures are adopted for the likelihood measure with little statistical validity, the p(P i ) probabilities are best described as pseudo-probabilities or “likelihood” weights.

Another concern with this method is its potential inefficiency. If the parameter ranges are too wide, a large number of unreasonable or very unlikely parameter combinations will be generated. These will either be rejected or else will have small probabilities and thus little effect on the analysis. In this case the associated processing would be a waste of effort. A compromise is to use some data to calibrate the model and to generate a prior or initial distribution for the parameters that is at least centered in the best range (Beven 1993, p. 48). Then use of a different calibration period to generate the p(P i ) allows an updating of those initial probabilities to reflect the information provided by the additional calibration period with the adopted likelihood measures.

After the accepted sequences are used to generate sets of predictions, the likelihood weights would be used in the calculation of means, variances, and quantiles, rather than the customary procedure of giving all the generated realizations equal weight. The resulting conditional distribution of system output reflects the initial probability distributions assigned to parameters, the rejection criteria, and the likelihood measure adopted to assign “likelihood” weights.
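The weighting of Eq. 8.20 and its use in summary statistics can be sketched as follows. The rejection rule (a simple threshold on the likelihood measure), the function names, and the example numbers are assumptions made for illustration; GLUE applications choose their own likelihood measures and rejection criteria.

```python
def glue_weights(likelihoods, threshold=0.0):
    """Eq. 8.20: normalize likelihood measures into weights.

    Parameter sets whose likelihood does not exceed the threshold are
    rejected, which here means they receive a weight of zero.
    """
    kept = [value if value > threshold else 0.0 for value in likelihoods]
    total = sum(kept)
    if total <= 0.0:
        raise ValueError("all parameter sets were rejected")
    return [value / total for value in kept]


def weighted_mean(values, weights):
    return sum(v * w for v, w in zip(values, weights))


def weighted_variance(values, weights):
    m = weighted_mean(values, weights)
    return sum(w * (v - m) ** 2 for v, w in zip(values, weights))


def weighted_quantile(values, weights, prob):
    """Smallest value whose cumulative weight reaches prob."""
    cumulative = 0.0
    pairs = sorted(zip(values, weights))
    for v, w in pairs:
        cumulative += w
        if cumulative >= prob:
            return v
    return pairs[-1][0]


# Made-up predictions and likelihood measures for four parameter sets
predictions = [12.0, 15.0, 18.0, 21.0]
likelihoods = [0.1, 0.4, 0.4, 0.05]

weights = glue_weights(likelihoods, threshold=0.08)
mean_prediction = weighted_mean(predictions, weights)
var_prediction = weighted_variance(predictions, weights)
median_prediction = weighted_quantile(predictions, weights, 0.5)
```

Here the fourth parameter set is rejected, and the remaining three contribute in proportion to their likelihood measures rather than equally.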

#### 8.4.2.7 Latin Hypercube Sampling

For the simple Monte Carlo simulations described above, with independent errors , a probability distribution is assumed for each input parameter or variable. In each simulation run, values of all input data are obtained from sampling those individual and independent distributions. The value generated for an input parameter or variable is usually independent of what that value was in any previous run, or what other input parameter or variable values are in the same run. This simple sampling approach can result in a clustering of parameter values and hence a redundancy of information from repeated sampling in the same regions of a distribution and a lack of information from no sampling in other regions of the distributions.

A stratified sampling approach ensures more even coverage of the range of input parameter or variable values with the same number of simulation runs. This can be accomplished by dividing the input parameter or variable space into sections and sampling from each section with the appropriate probability.

One such approach, Latin hypercube sampling (LHS), divides each input distribution into sections of equal probability for the specified probability distribution, and draws one observation randomly from each section. Hence the ranges of input values within each section actually occur with equal frequency in the experiment. These values from each interval for each distribution are randomly assigned to those from other intervals to construct sets of input values for the simulation analysis. Figure 8.20 shows the steps in constructing an LHS for six simulations involving three inputs P j (P 1, P 2, and P 3) and six intervals of their respective normal, uniform and triangular probability distributions.
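The construction just described can be sketched with the standard library alone. Each input is first sampled as a cumulative probability, one per equal-probability interval, then shuffled and mapped through the inverse cumulative distribution of that input. The distribution parameters below are placeholders, and the function names are hypothetical.

```python
import random
from statistics import NormalDist


def lhs_probabilities(n_runs, n_inputs, rng):
    """Return n_runs rows of cumulative probabilities, with exactly one value
    in each of the n_runs equal-probability intervals for every input."""
    columns = []
    for _ in range(n_inputs):
        col = [(k + rng.random()) / n_runs for k in range(n_runs)]
        rng.shuffle(col)  # random pairing with the other inputs
        columns.append(col)
    return [list(row) for row in zip(*columns)]


def triangular_inv_cdf(u, low, mode, high):
    """Inverse cumulative distribution of a triangular distribution."""
    if u < (mode - low) / (high - low):
        return low + ((high - low) * (mode - low) * u) ** 0.5
    return high - ((high - low) * (high - mode) * (1.0 - u)) ** 0.5


rng = random.Random(0)
n_runs = 6
probs = lhs_probabilities(n_runs, 3, rng)

normal = NormalDist(mu=0.0, sigma=1.0)  # placeholder parameters
samples = [
    (
        # Clamp to avoid an error in the extremely unlikely case u == 0
        normal.inv_cdf(min(max(u1, 1e-12), 1.0 - 1e-12)),
        2.0 + (5.0 - 2.0) * u2,                 # uniform on [2, 5]
        triangular_inv_cdf(u3, 0.0, 1.0, 4.0),  # triangular on [0, 4], mode 1
    )
    for u1, u2, u3 in probs
]
```

Each row of `samples` is one set of input values for a simulation run. With six runs, every sixth of each input distribution is represented exactly once.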

## 8.5 Performance Indicator Uncertainties

### 8.5.1 Performance Measure Target Uncertainty

Another possible source of uncertainty is the selection of performance measure target values. For example, consider a target value for a pollutant concentration based on the effect of exceeding it in an ecosystem. Which target value is best or correct? When this is not clear, there are various ways of expressing the uncertainty associated with any target value. One such method is the use of qualitative approaches involving membership functions (Chap. 5). Use of “grey” numbers or intervals instead of “white” or fixed target values is another. When some uncertainty or disagreement exists over the selection of the best target value for a particular performance measure , it seems to us the most direct and transparent way to do this is to subjectively assume a distribution over a range of possible target values. Then this subjective probability distribution can be factored into the tradeoff analysis, as outlined in Fig. 8.21.

One of the challenges associated with defining and including in an analysis the uncertainty associated with a target or threshold value for a performance measure is that of communicating just what the result of such an analysis means. Referring to Fig. 8.21, suppose the target value represents some maximum limit of a pollutant, say phosphorus, concentration in the flow during a given period of time at a given site or region, and it is not certain just what that maximum limit should be. Subjectively defining the distribution of that maximum limit, and considering that uncertainty along with the uncertainty (probability of exceedance function) of pollutant concentrations, which are the performance measure, one can attach a confidence to any probability of exceeding the maximum desired concentration value.

The 95% probability of exceedance shown on Fig. 8.21, say P 0.95, should be interpreted as “we can be 95% confident that the probability of the maximum desired pollutant concentration being exceeded will be no greater than P 0.95.” We can be only 5% confident that the probability of exceeding the desired maximum concentration will be no greater than the lower P 0.05 value. Depending on whether the middle line through the subjective distribution of target values in Fig. 8.21 represents the most likely or median target value, the associated probability of exceedance is either the most likely, as indicated in Fig. 8.21, or that for which we are only 50% confident.
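One way to make this interpretation concrete is to evaluate the exceedance probability for each value in a sample drawn from the subjective target distribution, and then read off quantiles of those probabilities. The sketch below does this for a maximum-limit target. The function names are hypothetical, and the concentration and target values are deterministic placeholders rather than model output.

```python
def exceedance_probability(values, target):
    """Fraction of simulated performance values that exceed a given target."""
    return sum(v > target for v in values) / len(values)


def exceedance_at_confidence(values, target_samples, confidence):
    """Exceedance probability that, with the stated confidence, will not be
    surpassed, given a sample of equally likely maximum-limit target values."""
    probs = sorted(exceedance_probability(values, t) for t in target_samples)
    index = max(0, min(len(probs) - 1, round(confidence * len(probs)) - 1))
    return probs[index]


# Placeholder data: 100 simulated concentrations and 20 equally likely targets
concentrations = list(range(1, 101))
targets = list(range(71, 91))

p_95 = exceedance_at_confidence(concentrations, targets, 0.95)
p_50 = exceedance_at_confidence(concentrations, targets, 0.50)
p_05 = exceedance_at_confidence(concentrations, targets, 0.05)
```

With these placeholder numbers, one would be 95% confident that the exceedance probability is no greater than `p_95`, but only 5% confident that it is no greater than the smaller `p_05`.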

Figure 8.21 attempts to show how to interpret the reliabilities when the uncertain performance targets are

• minimum acceptable levels that are to be maximized,

• maximum acceptable levels that are to be minimized or

• optimum levels.

An example of a minimum acceptable target level might be the population of wading birds in an area. An example of a maximum acceptable target level might be, again, the phosphorus concentration of the flow in a specific wetland or lake. An example of an optimum target level might be the depth of water most suitable for selected species of aquatic vegetation during a particular period of the year.

For performance measure targets that are not expressed as minimum or maximum limits but that are the “best” values, referring to Fig. 8.22, one can state that one is 90% confident that the probability of achieving the desired target is no more than B. The 90% confidence level probability of not achieving the desired target is at least A + C. The probability of the performance measure being too low is at least A and the probability of the performance measure being too high is at least C, again at the 90% confidence levels. As the confidence level decreases the bandwidth decreases, and the probability of not meeting the target increases.

Now, clearly there is uncertainty associated with each of these uncertainty estimations, and this raises the question of how valuable is the quantification of the uncertainty of each additional component of the plan in an evaluation process. Will plan evaluators and decision makers benefit from this additional information, and just how much additional uncertainty information is useful?

Now consider again the tradeoffs that need to be made as illustrated in Fig. 8.7. Instead of considering a single target value as shown on Fig. 8.7, assume there is a 90% confidence range associated with that single performance measure target value. Also assume that the target is a maximum desired upper limit (e.g., of some pollutant concentration).

In the case shown in Fig. 8.23, the tradeoff is clearly between cost and reliability. In this example, no matter what confidence one chooses, Plan A is preferred to Plan B with respect to reliability, but Plan B is preferred to Plan A with respect to cost. The tradeoff is only between these two performance indicators or measures.

Consider, however, a third plan, as shown in Fig. 8.24. This situation adds to the complexity of making appropriate tradeoffs. Now there are three criteria: cost, probability of exceedance (reliability), and the confidence in those reliabilities or probabilities. Add to this the fact that there will be multiple performance measure targets, each expressed in terms of their maximum probabilities of exceedance and the confidence in those probabilities.

In Fig. 8.24, in terms of cost the plans are ranked, from best to worst, B, C, and A. In terms of reliability at the 95% confidence level, they are ranked A, B, and C, but at the 5% confidence level the ranking is A, C, and B.

If the plan evaluation process has difficulty handling all this it may indicate the need to focus the uncertainty analysis effort on just what is deemed important, achievable, and beneficial. Then when the number of alternatives has been narrowed down to only a few that appear to be the better ones, a more complete uncertainty analysis can be performed. There is no need nor benefit in performing sensitivity and uncertainty analyses on all possible management alternatives . Rather one can focus on those alternatives that look the most promising, and then carry out additional uncertainty and sensitivity analyses only when important uncertain performance indicator values demand more scrutiny. Otherwise the work is not likely to affect the decision anyway.

### 8.5.2 Distinguishing Differences Between Performance Indicator Distributions

Simulations of alternative water management infrastructure designs and operating policies require a comparison of the simulation outputs—the performance measures or indicators—associated with each alternative. Now the question is whether or not the observed differences are statistically significant. Can one really tell if one alternative is better than another or are the observed differences explainable by random variations attributable to variations in the inputs and how the system responds?

This is a common statistical issue that is addressed by standard hypothesis tests (Devore 1991; Benjamin and Cornell 1970). Selection of an appropriate test requires that one first resolve what type of change one expects in the variables. To illustrate, consider the comparison of two different operating policies. Let Y 1 denote the set of output performance variable values with the first policy, and Y 2 the set of output performance variable values of the second policy. In many cases, one would expect one policy to be better than the other. One measure might be the difference in the mean of the variables. For example, is E[Y 1] < E[Y 2]? Alternatively one could check the difference in the median (50 percentile) of the two distributions.

In addition, one could look for a change in the variability or variance, or a shift in both the mean and the variance. Changes described by a difference in the mean or median often make the most sense, and many statistical tests are available that are sensitive to such changes. For such investigations parametric and nonparametric tests for paired and unpaired data can be employed.

Consider the differences between “paired” and “unpaired” data. Suppose that the meteorological data for 1941–1990 is used to drive a simulation model generating data as described in Table 8.11.

Here there is one sample, Y 1(1) through Y 1(50), for policy 1, and another sample, Y 2(1) through Y 2(50), for policy 2. However, the two sets of observations are not independent. For example, if 1943 was a very dry year, then we would expect both Y 1(3) for policy 1 in that year and Y 2(3) for policy 2 to be unusually small. With such paired data, one can use a paired hypothesis test to check for differences. Paired tests are usually easier than the corresponding unpaired tests that are appropriate in other cases. (For example, if one were checking for a difference in average rainfall depth between 1941–1970, and 1971–2000, they would have two sets of independent measurements for the two periods. With such data, one should use a two-sample unpaired test.)

Paired tests are generally based on the differences between the two sets of output, Y 1(i) − Y 2(i). These are viewed as a single independent sample. The question is then: are the differences positive (say Y 1 tends to be larger than Y 2), or negative (Y 1 tends to be smaller), or are positive and negative differences equally likely (there is no difference between Y 1 and Y 2)?

Both parametric and nonparametric families of statistical tests are available for paired data. The common parametric test for paired data (a one-sample t test) assumes that the mean of the differences

$$X\left(i \right) = Y_{ 1} \left(i \right)-Y_{ 2} \left(i \right)$$
(8.21)

is normally distributed. Then the hypothesis of no difference is rejected if the t statistic is sufficiently large in magnitude, given the sample size n.

Alternatively, one can employ a nonparametric test and avoid the assumption that the differences X(i) are normally distributed. In such a case, one can use the Wilcoxon Signed Rank test. This nonparametric test ranks the absolute values |X(i)| of the differences. If the sum S of the ranks of the positive differences deviates sufficiently from its expected value, n(n + 1)/4 (were there no difference between the two distributions), one can conclude that there is a statistically significant difference between the Y 1(i) and Y 2(i) series. Standard statistical texts have tables of the distribution of the sum S as a function of the sample size n, and provide a good analytical approximation for n > 20 (for example, Devore 1991). Both the parametric t test and the nonparametric Wilcoxon Signed Rank test require that the differences between the simulated values for each year be computed.
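Both test statistics can be computed directly from the paired differences of Eq. 8.21, as in the sketch below. The function names and the five-value example are hypothetical. The sketch drops zero differences and gives tied absolute differences their average rank, which are common conventions assumed here rather than taken from the text. The standardized value uses the normal approximation, in which S has variance n(n + 1)(2n + 1)/24 when there is no difference.

```python
import math
import statistics


def paired_t_statistic(y1, y2):
    """One-sample t statistic of the differences X(i) = Y1(i) - Y2(i)."""
    x = [a - b for a, b in zip(y1, y2)]
    return statistics.mean(x) / (statistics.stdev(x) / math.sqrt(len(x)))


def wilcoxon_signed_rank(y1, y2):
    """Return (S, expected S, standardized S) for the signed rank test."""
    x = [a - b for a, b in zip(y1, y2) if a != b]  # drop zero differences
    n = len(x)
    order = sorted(range(n), key=lambda i: abs(x[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences
        while j + 1 < n and abs(x[order[j + 1]]) == abs(x[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    s = sum(r for r, d in zip(ranks, x) if d > 0)
    expected = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return s, expected, (s - expected) / sd


# Made-up outputs for two policies over the same five years
policy_1 = [5.0, 7.0, 9.0, 6.0, 8.0]
policy_2 = [4.0, 5.0, 6.0, 7.0, 3.0]

t_stat = paired_t_statistic(policy_1, policy_2)
s_stat, s_expected, s_standardized = wilcoxon_signed_rank(policy_1, policy_2)
```

A sample of five is far too small for the normal approximation; it is used here only so the ranks can be checked by hand. The t statistic would be compared with a t distribution having n − 1 degrees of freedom.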

## 8.6 Communicating Model Output Uncertainty

Spending money on reducing uncertainty would seem preferable to spending it on ways of calculating and describing it better. Yet attention to uncertainty communication is critically important if uncertainty analyses and characterizations are to be of value in a decision-making process. In spite of considerable efforts by those involved in risk assessment and management, we know very little about how to ensure effective risk communication to gain the confidence of stakeholders, incorporate their views and knowledge, and influence favorably the acceptability of risk assessments and risk management decisions.

The best way to communicate concepts of uncertainty may well depend on what the audiences already know about risk and the various types of probability distributions (e.g., density, cumulative, exceedance) based on objective and subjective data, and the distinction between mean or average values and the most likely values. Undoubtedly graphical representations of these ways of describing uncertainty considerably facilitate communication.

The National Research Council (NRC 1994) addressed the extensive uncertainty and variability associated with estimating risk and concluded that risk characterizations should not be reduced to a single number or even to a range of numbers intended to portray uncertainty. Instead, the report recommended managers and the interested public should be given risk characterizations that are both qualitative and quantitative and both verbal and mathematical.

In some cases, communicating qualitative information about uncertainty to stakeholders and the public in general may be more effective than quantitative information. There are, of course, situations in which quantitative uncertainty analyses are likely to provide information that is useful in a decision-making process. How else can tradeoffs such as illustrated in Figs. 8.10 and 8.27 be identified? Quantitative uncertainty analysis often can be used as the basis of qualitative information about uncertainty, even if the quantitative information is not what is communicated to the public.

One should acknowledge to the public the widespread confusion regarding the differences between variability and uncertainty. Variability does not change through further measurement or study, although better sampling can improve our knowledge about variability. Uncertainty reflects gaps in information about scientifically observable phenomena.

While it is important to communicate uncertainties and confidence in predictions, it is equally important to clarify who or what is at risk, possible consequences, and the severity and irreversibility of an adverse effect should a target value, for example, not be met. This qualitative information is often critical to informed decision-making. Risk and uncertainty communication is always complicated by the reliability and amounts of available relevant information as well as how that information is presented. Effective communication between people receiving information about who or what is at risk, or what might happen and just how severe and irreversible an adverse effect might be should a target value not be met, is just as important as the level of uncertainty and the confidence associated with such predictions. A two-way dialog between those receiving such information and those giving it can help identify just what seems best for a particular audience.

Risk and uncertainty communication is a two-way street. It involves learning and teaching. Communicators dealing with uncertainty should learn about the concerns and values of their audience, their relevant knowledge, and their experience with uncertainty issues. Stakeholders’ knowledge of the sources and reasons for uncertainty needs to be incorporated into assessment and management and communication decisions. By listening, communicators can craft risk messages that better reflect the perspectives, technical knowledge, and concerns of the audience.

Effective communication should begin before important decisions have been made. It can be facilitated in communities by citizen advisory panels. Citizen advisory panels can give planners and decision-makers a better understanding of the questions and concerns of the community and an opportunity to test their effectiveness in communicating concepts and specific issues regarding uncertainty.

One approach to making uncertainty more meaningful is to make risk comparisons. For example, a ten-parts-per-billion target for a particular pollutant concentration is equivalent to 10 seconds in over 31 years. If this is an average daily concentration target that is to be satisfied 99% of the time, this is equivalent to an expected violation of less than one day every three months.
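The arithmetic behind these two comparisons is short, taking a year as roughly 3.156 × 10⁷ seconds:

$$\frac{10}{10^{9}} = \frac{10\ \text{s}}{10^{9}\ \text{s}}, \qquad 10^{9}\ \text{s} \approx \frac{10^{9}\ \text{s}}{3.156 \times 10^{7}\ \text{s/yr}} \approx 31.7\ \text{yr}$$

$$\left(1 - 0.99\right) \times 365\ \text{days/yr} = 3.65\ \text{days/yr} \approx 0.9\ \text{days per three months}$$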

Many perceive the reduction of risk by an order of magnitude as though it were a linear reduction. An alternative way to illustrate orders of magnitude of risk reduction is shown in Fig. 8.25, in which a bar graph depicts better than words that a reduction in risk from one in 1000 (10⁻³) to one in 10,000 (10⁻⁴) is a reduction of 90% and that a further reduction to one in 100,000 (10⁻⁵) is a reduction 10-fold less than the first reduction of 90%. The percent of the risk that is reduced by whatever measures is an easier concept to communicate than reductions expressed in terms of estimated absolute risk levels, such as 10⁻⁵.
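The percentages quoted above follow from expressing each reduction relative to the original risk level, as this small sketch shows. The function name is hypothetical.

```python
def percent_of_original_risk_removed(risk_before, risk_after, original_risk):
    """Reduction in risk expressed as a percentage of the original risk."""
    return 100.0 * (risk_before - risk_after) / original_risk


original = 1e-3

# First order-of-magnitude reduction: 1 in 1000 to 1 in 10,000
first_step = percent_of_original_risk_removed(1e-3, 1e-4, original)

# Second order-of-magnitude reduction: 1 in 10,000 to 1 in 100,000
second_step = percent_of_original_risk_removed(1e-4, 1e-5, original)
```

The first step removes about 90% of the original risk and the second only about 9%, so equal steps on a logarithmic scale give sharply diminishing absolute reductions.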

Risk comparisons can be helpful, but they should be used cautiously and tested if possible. There are dangers in comparing risks of diverse character, especially when the intent of the comparison is seen as minimizing a risk (NRC 1989). One difficulty in using risk comparisons is that it is not always easy to find risks that are sufficiently similar to make a comparison meaningful. How is someone able to compare two alternatives having two different costs and two different risk levels, for example, as is shown in Fig. 8.7? One way is to perform an indifference analysis (as discussed in the next chapter), but that can lead to different results depending on who performs it. Another way is to develop utility functions using weights, where, for example, reducing the phosphorus load by half is equivalent to a 25% shorter hydroperiod in that area, but again each person’s utility or preferred tradeoff may differ.

At a minimum, graphical displays of uncertainty can be helpful. Consider the common system performance indicators that include:

• Time series plots for continuous time-dependent indicators (Fig. 8.26 upper left),

• Probability exceedance distributions for continuous indicators (Fig. 8.26 upper right),

• Histograms for discrete event indicators (Fig. 8.26 lower left), and

• Overlays on maps for space-dependent discrete events (Fig. 8.26 lower right).

The first three graphs in Fig. 8.26 could show, in addition to the single curve or bar that represents the most likely output, a range of outcomes associated with a given confidence interval. For overlays of information on maps, different colors could represent the spatial extents of events associated with different ranges of risk or uncertainty. Figure 8.27, corresponding to Fig. 8.26, illustrates these approaches for displaying these ranges.

## 8.7 Conclusions

This chapter provides an overview of uncertainty and sensitivity analyses in the context of hydrologic or water resources systems simulation modeling. A broad range of tools is available to explore, display, and quantify the sensitivity and uncertainty in predictions of key output variables and system performance indices with respect to imprecise and random model inputs and to assumptions concerning model structure. They range from relatively simple deterministic sensitivity analysis methods to more involved first-order analyses and Monte Carlo sampling methods.

Because of the complexity of many watersheds or river basins, Monte Carlo methods for uncertainty analyses may be a major and unattractive undertaking. Therefore it is often prudent to begin with the relatively simple deterministic procedures. These, coupled with a probabilistically based first-order uncertainty analysis method, can help quantify the uncertainty in key output variables and system performance indices, and the relative contributions of uncertainty in different input variables to the uncertainty in different output variables and system performance indices. These relative contributions may differ depending upon which output variables and indices are of interest.

A sensitivity analysis can provide a systematic assessment of the impact of parameter value imprecision on output variable values and performance indices, and of the relative contribution of errors in different parameter values to that output uncertainty. Once the key variables are identified, it should be possible to determine the extent to which parameter value uncertainty can be reduced through field investigations, development of better models, and other efforts.

Model calibration procedures can be applied to individual catchments and subsystems, as well as to composite systems. Automated calibration procedures have several advantages, including the explicit use of an appropriate statistical objective function, identification of those parameters that best reproduce the calibration data set with the given objective function, and estimation of the statistical precision of the estimated parameters.

All of these tasks together can represent a formidable effort. However, knowledge of the uncertainty associated with model predictions can be as important to management decision and policy formulation as are the predictions themselves.

No matter how much attention is given to quantifying and reducing uncertainties in model outputs, uncertainties will remain. Professionals who analyze risk, managers and decision-makers who must manage risk, and the public who must live with risk and uncertainty, have different information needs and attitudes regarding risk and uncertainty. It is clear that information needs differ among those who model or use models, those who make substantial investment or social decisions, and those who are likely to be impacted by those decisions. Meeting those needs should result in more informed decision-making. But it comes at a cost that should be considered along with the benefits of having this sensitivity and uncertainty information.