1 Introduction

Water management requires decisions to be taken under uncertainty. Under uncertainty, the outcome of decisions can be only partially predicted: the future is never completely predictable, nor can uncertainty be eliminated. Uncertainty reduces efficiency and jeopardizes robustness and reliability. Risk-based decisions, in which uncertainty is explicitly taken into account, reduce the cost due to uncertainty (Weijs 2011; Verkade and Werner 2011) and increase the system's resilience against possible negative outcomes. Nonetheless, uncertainty always poses a risk.

In the face of uncertainty, a decision maker has two options: either accept the current level of uncertainty, or reduce it by obtaining more information. Information is an intangible but valuable good, because it leads to better decisions. Obtaining information, however, comes at a cost. The secondary problem of i) deciding whether to get additional information and ii) selecting the most valuable new observation is referred to as the Optimal Design (OD) problem (DeGroot 1962; Raiffa 1974).

Uncertainty can be measured by selected “indicators”, typically related to variance or entropy. Uncertainty reduction is then measured as an improvement of the indicator value, such as a reduction in squared error or Kullback-Leibler divergence (Weijs et al., 2010; Nearing and Gupta 2015). If these criteria are employed to select new information in an OD problem, the solution indicates which observation is the most informative.

The most informative observation, however, is not necessarily the most valuable one (Alfonso et al., 2016; Bode et al., 2016). In the context of a defined decision problem, OD can be employed to quantify the expected usefulness of information, in terms of the performance improvement due to better actions, balancing it against the expected cost of gathering that new information. In this case, solving the OD problem indicates i) which, among all possible observations, is the most valuable, and ii) whether to keep gathering information, or to stop and take action, accepting the present level of uncertainty. OD has been extensively applied in various domains, including water management (Yokota and Thompson 2004; Davis 1971; Maddock 1973; Bhattacharjya et al., 2010; Reed and Kollat 2012), where applications focus mostly on remediation problems in groundwater systems (Ben-Zvi et al., 1988; James and Gorelick 1994; Nowak et al., 2012; Bierkens 2006; Kim and Lee 2007; Mantoglou and Kourakos 2007; Chadalavada and Datta 2008). When the decision problem is a model selection problem, information-based cost functions are most appropriate (Nowak and Guthke 2016; Weijs et al., 2010), while in economically important decisions, where actions are taken, a value-of-information approach can optimize benefits through the OD problem (Eidsvik and Ellefmo 2013; Trainor-Guitton et al., 2014).

The application of OD is limited by two factors: i) the OD problem requires a stochastic hydro-economic model, i.e. a model that explicitly represents the most relevant uncertainties; currently, however, most existing models in this domain are deterministic, and therefore unusable for solving the OD problem; ii) solving the OD problem requires a large computational effort, limiting its application to small systems. In this paper we introduce an innovative methodology that makes deterministic hydro-economic models, with parameters identified using Least Squares Estimation (LSE), directly usable in the OD problem. The methodology uses properties of LSE to obtain a leaner approximation that can be solved in a shorter time and, potentially, applied to larger systems.

The proposed technique is applied to a test case on the White Cart River, in Scotland. In this test case, the primary decision is whether to issue a warning when predicted water levels exceed a defined threshold. The uncertainty of the rating curve can be reduced by new gaugings, which can be selected on a scale between observations at low flow, i.e. low-cost but also less informative, and observations at high flow, i.e. costly and more informative.

This paper is structured as follows: in Section 2 we introduce the decision problem, the OD problem, and the innovative solution of the OD problem using deterministic models based on LSE; in Section 3 the proposed solution is applied to the test case; the conclusions are discussed in Section 4.

2 Methodology

We start from the primary decision problem that the decision maker is faced with. In the utility theory framework (Neumann and Morgenstern 1947; Raiffa 1974), a decision problem under uncertainty can be defined as in Eq. 1a.

$$\begin{array}{@{}rcl@{}}&& \text{find } \mathbf{u}^{*} \text{ such that} \end{array} $$
(1a)
$$\begin{array}{@{}rcl@{}} && \mathcal{J}\left(\mathbf{u}^{*},f(\boldsymbol{\lambda})\right) = \min\limits_{\mathbf{u}} \mathcal{J}\left(\mathbf{u},f(\boldsymbol{\lambda})\right) \end{array} $$
(1b)
$$\begin{array}{@{}rcl@{}} &&\text{where } \\ && \mathcal{J}\left(\mathbf{u},f(\boldsymbol{\lambda})\right) = \underset{\mathbf{y}_{\mathcal{M}} \sim f(\mathbf{y}_{\mathcal{M}}) }{ \mathbb{E}} \left[ J (\mathbf{y}_{\mathcal{M}} ) \right] \end{array} $$
(1c)
$$\begin{array}{@{}rcl@{}} &&\mathbf{y}_{\mathcal{M}} = \mathcal{M} \left(\mathbf{u},\mathbf{x}_{\mathcal{M}},\boldsymbol{\lambda} \right) \end{array} $$
(1d)

In Eq. 1b, \(\mathbf{u}^{*}\) is the optimal decision, selected from the bounded set of alternatives \(\mathbf{u}\in \mathbb{U}\), where \(\mathbb{U} \subseteq \mathbb{R}^{N_{u}}\) and \(N_{u}\) is the number of decision variables.

In Eq. 1c, J(⋅) is the loss function and \(\mathcal{M}(\mathbf{u},\mathbf{x}_{\mathcal{M}},\boldsymbol{\lambda})\) is the system model. The loss function is defined here as a cost to be minimized; alternatively, Expression (1c) can be written as a benefit to be maximized. \(\mathbf{y}_{\mathcal{M}}\) is the model output, \(\mathbf{x}_{\mathcal{M}}\) is the model input, and λ the model parameters. Model \(\mathcal{M}(\cdot)\) produces an output \(\mathbf{y}_{\mathcal{M}}\) given inputs \(\mathbf{x}_{\mathcal{M}}, \mathbf{u}\) and parameters λ. Both model inputs and parameters can be uncertain; in this case inputs \(\mathbf{x}_{\mathcal{M}}\), parameters λ and outputs \(\mathbf{y}_{\mathcal{M}}\) are stochastic variables, such that \(\mathbf{x}_{\mathcal{M}} \sim f(\mathbf{x}_{\mathcal{M}})\), \(\boldsymbol{\lambda} \sim f(\boldsymbol{\lambda})\), and \(\mathbf{y}_{\mathcal{M}}\sim f(\mathbf{y}_{\mathcal{M}})\), where f(⋅) is the probability density function (pdf). Output uncertainty can be estimated by integrating input and parameter uncertainty. \(\mathbb{E}(\cdot)\) is the expectation operator, and \(\mathcal{J}\left(\mathbf{u},f(\boldsymbol\lambda)\right)\) is the expected cost of actions u, given the present state of knowledge f(λ); f(λ) contains the present state of information about λ after all available information has been assimilated.
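
As an illustration of Eqs. 1a–1d, the following minimal sketch approximates the expected cost of Eq. 1c by Monte Carlo sampling of the uncertain parameters and picks the action with the lowest expected cost over a bounded set of alternatives. The model, loss function, and numerical values are hypothetical placeholders, not those of the test case.

```python
import numpy as np

rng = np.random.default_rng(0)

def expected_cost(u, loss, model, x, lam_samples):
    """Eq. 1c: average the loss J(.) over model outputs generated from
    samples of the uncertain parameters lambda."""
    return np.mean([loss(model(u, x, lam)) for lam in lam_samples])

# Hypothetical illustration: a linear model with an uncertain gain and a
# quadratic loss around a target output of 1.0.
model = lambda u, x, lam: lam * x + u
loss = lambda y: (y - 1.0) ** 2
lam_samples = rng.normal(loc=0.8, scale=0.2, size=1000)   # draws from f(lambda)

# Eq. 1b: choose u* from a bounded set of alternatives by enumeration.
candidates = np.linspace(-1.0, 1.0, 41)
u_star = min(candidates, key=lambda u: expected_cost(u, loss, model, 1.0, lam_samples))
```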

The problem defined in Eq. 1a provides a firm theoretical framework that has been used for risk-based decision making in water management applications (van Overloop et al., 2008; Verkade and Werner 2011; Raso et al., 2014; Vogel 2017).

2.1 Optimal Design Problem

The OD problem, defined in Eq. 2a, identifies the next observation that has the highest expected marginal value.

In Problem (2a), metadata and data together make up the new observation. B and C are the benefit and cost conditional on the data, and \(\mathcal{B}\) and \(\mathcal{C}\) are the expected benefit and cost of obtaining the information. Metadata frame the new data within the available model, giving them meaning: they specify where and when an observation has been realized, the observational uncertainty of the measuring instrument, etcetera. Metadata must univocally identify the variable to which the data are related, defined on a bounded set: the space of observable new information. The data are the numerical values registered once the observation has been obtained.

The conditional benefit B of the data depends on both the presently available information f(λ) and the new information y. B quantifies the benefit of new information conditional on the new data, as described in Eq. 3.

$$ {B}(\mathbf{y})= \mathcal{J}\left(\mathbf{u}^{*}_{0},f(\boldsymbol{\lambda}|\mathbf{y})\right)- \mathcal{J}\left(\mathbf{u}^{*},f(\boldsymbol{\lambda}|\mathbf{y})\right) $$
(3)

In Eq. 3, f(λ|y) is the state of information after the new information y has been taken into account; \(\mathbf{u}^{*}_{0}\) is the optimal action conditional on the information available before the new data are obtained, such that \(\mathbf{u}^{*}_{0} = \arg \min \mathcal{J}\left(\mathbf{u},f(\boldsymbol{\lambda})\right)\); \(\mathcal{J}\left(\mathbf{u}^{*}_{0},f(\boldsymbol{\lambda}|\mathbf{y})\right)\) is the expected cost if the action did not adapt, calculated a posteriori, once y is given; \(\mathcal{J}\left(\mathbf{u}^{*},f(\boldsymbol{\lambda}|\mathbf{y})\right)\) is the cost after the new information has been assimilated and the action adapts accordingly; \(\mathbf{u}^{*}\) is the optimal decision after the new information has been assimilated. The benefit of information is thus the performance improvement obtained from the better actions that better information allows.
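
The conditional benefit of Eq. 3 can be written compactly in terms of Monte Carlo samples. The sketch below is a hedged illustration: `expected_cost` stands for any routine evaluating Eq. 1c (such as the one sketched above), and the sample arrays are assumed to represent f(λ) and f(λ|y).

```python
def conditional_benefit(prior_samples, posterior_samples, candidates, expected_cost):
    """Eq. 3, evaluated from samples.

    prior_samples     : draws from f(lambda), before the new data point
    posterior_samples : draws from f(lambda | y), after assimilating y
    candidates        : bounded set of alternative actions
    expected_cost     : callable (u, lam_samples) -> scalar, as in Eq. 1c
    """
    # u0*: optimal action given the information available before the new data
    u0_star = min(candidates, key=lambda u: expected_cost(u, prior_samples))
    # u*: optimal action after the new information has been assimilated
    u_star = min(candidates, key=lambda u: expected_cost(u, posterior_samples))
    # Both actions are scored under the posterior; the difference is the cost
    # avoided because the action could adapt to the new information.
    return (expected_cost(u0_star, posterior_samples)
            - expected_cost(u_star, posterior_samples))
```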

The cost depends on how the new piece of information is collected, and includes only the marginal costs of obtaining it. Costs that do not depend directly on the specific decision to gather additional data are “sunk costs” (Menger 1981), and must be excluded. Costs can include “generalized” costs, such as the cost related to the time to wait before the data become available. Equation 4 defines the conditional cost.

In Eq. 4, costs are decomposed into a deterministic component \(C_{D}\) and a stochastic component \(C_{S}\). The stochastic component depends on the data outcome, and additional information reduces its uncertainty. According to the certainty equivalence principle (Philbrick and Kitanidis 1999; Van de Water and Willems 1981), any other randomness independent of y or of other problem components can be treated as deterministic, using its expected value only.

In Problem (2a), however, the variable y is yet to be observed. Nonetheless, the pdf of y can be estimated using the best present knowledge, as in Eq. 5.

$$ f(\mathbf{y})=\underset{\boldsymbol{\lambda}} {\mathbb{E}} [ f(\mathbf{y}|\boldsymbol{\lambda}) ] = {\int}_{\Lambda} f(\mathbf{y}|\boldsymbol{\lambda})\cdot f(\boldsymbol{\lambda}) \cdot d\boldsymbol{\lambda} $$
(5)

In Eq. 5, f(y) is the pdf of y; f(y) depends on both the present information f(λ) and the observational uncertainty f(y|λ). Equation 5 can then be used to estimate the first term in Eq. 2c.
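
The preposterior evaluation implied by Eqs. 3–5 can be sketched as follows. The callables `observe` (drawing y from f(y|λ)), `assimilate` (returning samples from f(λ|y)), `expected_cost` (Eq. 1c) and `cost` (the observation cost) are hypothetical placeholders for problem-specific routines.

```python
import numpy as np

rng = np.random.default_rng(1)

def expected_net_value(lam_samples, observe, assimilate, candidates, expected_cost, cost):
    """Expected benefit minus expected cost of one candidate observation.

    For each lambda drawn from f(lambda), a synthetic observation
    y ~ f(y | lambda) is generated (Eq. 5), the posterior f(lambda | y) is
    formed, and the conditional benefit of Eq. 3 is computed; averaging
    over y gives the expected benefit, from which the average cost is
    subtracted."""
    u0_star = min(candidates, key=lambda u: expected_cost(u, lam_samples))
    benefits, costs = [], []
    for lam in lam_samples:
        y = observe(lam, rng)                        # y ~ f(y | lambda)
        post = assimilate(lam_samples, y)            # samples from f(lambda | y)
        u_star = min(candidates, key=lambda u: expected_cost(u, post))
        benefits.append(expected_cost(u0_star, post) - expected_cost(u_star, post))
        costs.append(cost(y))
    return np.mean(benefits) - np.mean(costs)
```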

When, on average, \(\mathcal{J}\left(\mathbf{u}^{*}_{0},f(\boldsymbol{\lambda}|\mathbf{y})\right) < \mathcal{J}\left(\mathbf{u}^{*},f(\boldsymbol{\lambda})\right)\), new information is expected to improve the actions and deliver better results. \(\mathcal{B}\) is always non-negative (Parmigiani and Inoue 2009). However, if \(\mathbf{u}^{*}\) does not change for any possible value of y, then the two terms on the right-hand side of Eq. 3 are equal and \(\mathcal{B}=0\). In such a case, additional information has no added benefit.

OD can also be used as a stopping criterion: when the expected benefit is lower than the expected cost for all possible observations, the next piece of information is not expected to be worth its cost, and a level of “rational ignorance” (Simon 1990) is reached: uncertainty can still be reduced, but doing so is no longer effective from a cost-benefit point of view.

The dependency of f(y) on f(λ) makes OD an iterative problem. Every time a new piece of information is obtained, it is processed and assimilated into the present information, and the uncertainty represented by f(λ) is reduced. The problem in Eq. 2a is then reformulated using the new state of information.
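
Conceptually, the iterative OD procedure combines the selection step and the stopping rule above. The following schematic loop is a hedged sketch: `value_of`, `cost_of`, `take_measurement`, and `assimilate` are placeholders for the expected benefit, expected cost, data collection, and updating routines of a specific application.

```python
def optimal_design_loop(design_space, value_of, cost_of, take_measurement, assimilate, state):
    """Iterative OD: score every candidate observation by expected benefit
    minus expected cost; stop when no candidate is worth its cost
    ("rational ignorance"), otherwise gather the best one, assimilate it,
    and reformulate the problem with the new state of information."""
    while True:
        net = {d: value_of(d, state) - cost_of(d, state) for d in design_space}
        best = max(net, key=net.get)
        if net[best] <= 0:                   # expected benefit below expected cost everywhere
            return state                     # stop and act under the current uncertainty
        y = take_measurement(best)           # obtain the selected observation
        state = assimilate(state, best, y)   # update f(lambda), and hence f(y)
```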

2.2 OD Using LSE

The OD problem requires: 1) f(y|λ) for all observable variables, 2) f(λ), 3) f(y), and 4) f(λ|y). In the following we show how to define these distributions when data are assimilated using Least Squares Estimation (LSE).

LSE can be used to identify the model parameters given the data and a deterministic model structure. LSE is extensively used in many fields (Kariya and Kurata 2004), including hydrology (Weijs et al., 2013). Equation 6a presents the rule for parameter estimation using LSE.

$$\begin{array}{@{}rcl@{}}&&\min\limits_{\boldsymbol \lambda} \sum\limits_{i=1}^{k} {\mathbf{e}_{i}}^{2} \end{array} $$
(6a)
$$\begin{array}{@{}rcl@{}} &&\text{where } \\ &&\mathbf{e}_{i}= \mathbf{y}_{i} - \mathcal{R} (\mathbf{x}_{i}, \boldsymbol \lambda) \end{array} $$
(6b)

In Eq. 6a, k is the number of data points and \(\mathbf{e}_{i} \in \mathbb{R}^{N_{y}}\) are the residuals, defined in Eq. 6b as the difference between the observed data and the output of relation \(\mathcal{R}(\cdot)\); \(\mathcal{R}\) is a sub-component of the whole system model \(\mathcal{M}\), intended to predict y; in \(\mathcal{R}(\cdot)\), x is the input and λ the parameters. Uncertainty on the effects of actions can be included by considering the actions u as part of the inputs of model \(\mathcal{R}(\cdot)\), i.e. by including u in x.

If relation \(\mathcal{R}\) has the form of Eq. 7, then the LSE solution exists in closed form.

$$ g_{y}(\mathbf{y}) = \sum\limits_{j=1}^{N_{\lambda}} \lambda_{j} \cdot g_{j}(\mathbf{x}) $$
(7)

In Eq. 7, \(g_{j}(\cdot)\) and \(g_{y}(\cdot)\) are functions, possibly non-linear, of x and y respectively.

In LSE, when the residuals e are normal, independent, and identically distributed, i.e. f(y|λ) ∼ N(0, Σ), the parameters λ are also normally distributed (Leon 1980), as defined in Eq. 8a.

$$\begin{array}{@{}rcl@{}} f(\boldsymbol{\lambda}) &\sim& \mathcal{N}(\boldsymbol \mu_{\lambda},\mathbf{\Sigma}_{\lambda}) \end{array} $$
(8a)
$$\begin{array}{@{}rcl@{}} \boldsymbol{\mu}_{\lambda} &=& ({X}^{T} X)^{-1} {X}^{T} Y \end{array} $$
(8b)
$$\begin{array}{@{}rcl@{}} \mathbf{\Sigma}_{\lambda} &=&\boldsymbol {\Sigma} \left({X}^{T} X\right)^{-1} \end{array} $$
(8c)

In Eqs. 8b and 8c, \(X=[\underline{g}(\mathbf{x}_{1}),\ldots, \underline{g}(\mathbf{x}_{k})]\) is a matrix of dimension \(k \times N_{\lambda}\), made of the k data-points of dimension \(N_{\lambda}\); the regressors are \(\underline{g}(\mathbf{x}_{i})=[g_{1}(\mathbf{x}_{i}), \ldots, g_{N_{\lambda}}(\mathbf{x}_{i})]\); \(Y = [g_{y}(\mathbf{y}_{1}),\ldots, g_{y}(\mathbf{y}_{k})]\) is a matrix of dimension \(k \times N_{y}\), made of the observed outputs. Σ is estimated as in Eq. 9.

Under the same conditions on the residuals, LSE parameter estimation is equivalent to Maximum Likelihood Estimation (Charnes et al., 1976). Maximum Likelihood estimation is in turn equivalent to applying Bayes' rule starting from a condition of no a-priori information and optimal learning from data (Jaynes and Bretthorst 2003). f(y|λ) is in fact the likelihood in Bayes' rule, for which the conjugate distributions f(λ) and f(λ|y) are both normal. When Σ is a known parameter, the conjugate prior is also normal (Diaconis et al., 1979). When the observational uncertainty variance is estimated from data, the conjugate prior is Inverse-Gamma distributed, or Inverse-Wishart distributed in the multidimensional case (Diaconis et al., 1979). The Normal and Inverse-Gamma distributions converge when k is sufficiently large (Leemis and McQueston 2008).

f(y|λ) is the distribution of the data once observed; the uncertainty of this stochastic variable is considered irreducible. In f(y|λ), the average is zero and the covariance matrix is Σ. When the average is not zero, any knowledge about the presence of a bias can be used to correct the estimation and bring it back to zero average (Sorooshian and Dracup 1980). Σ is a measure of the observational uncertainty: it can either be a given parameter, when, for example, the observational uncertainty is known from the characteristics of the observation technique, or be estimated from data. When the observational uncertainty is to be estimated from data, the parameter set is extended to include the variance parameters, i.e. Σ is included in λ. In LSE, the covariance Σ can be estimated as in Eq. 9.

$$ \boldsymbol {\Sigma} = \frac{E^{*}}{k-N_{\lambda}} $$
(9)

In Eq. 9, E* is the minimum of the sum of squared errors, as defined in Eq. 6a. For models as in Eq. 7, \(E^{*} = \left(Y - X(X^{T}X)^{-1}X^{T}Y\right)^{T}\left(Y - X(X^{T}X)^{-1}X^{T}Y\right)\), and \(k - N_{\lambda}\) is the number of degrees of freedom.
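
For the scalar-output case (\(N_{y}=1\)), Eqs. 8a–8c and 9 translate directly into a few lines of linear algebra. The sketch below assumes X is the \(k \times N_{\lambda}\) regressor matrix and Y the k-vector of transformed observations; the function name is illustrative.

```python
import numpy as np

def lse_posterior(X, Y):
    """Closed-form LSE for a model of the form of Eq. 7 (scalar output).

    Returns the parameter mean (Eq. 8b), the residual variance estimated
    from the data (Eq. 9) and the parameter covariance (Eq. 8c)."""
    k, n_lam = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    mu_lam = XtX_inv @ X.T @ Y                 # Eq. 8b
    resid = Y - X @ mu_lam                     # residuals e_i of Eq. 6b
    sigma2 = (resid @ resid) / (k - n_lam)     # Eq. 9
    cov_lam = sigma2 * XtX_inv                 # Eq. 8c
    return mu_lam, sigma2, cov_lam
```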

The distribution of the next data-point can be derived from Eq. 5. Given an observable x and the related data \(\mathbf{y}_{\mathbf{x}}\), Eq. 10a defines the distribution of the next data-point.

$$\begin{array}{@{}rcl@{}}f(\mathbf{y}_{\mathbf{x}}|\mathbf{x})&\sim& \mathcal{N}(\boldsymbol \mu_{y}, \boldsymbol {\Sigma}_{y}) \end{array} $$
(10a)
$$\begin{array}{@{}rcl@{}} \boldsymbol \mu_{y}&=& \boldsymbol \mu_{\lambda} \mathbf{x} \end{array} $$
(10b)
$$\begin{array}{@{}rcl@{}} \boldsymbol {\Sigma}_{y} &=& \mathbf{x} \boldsymbol {\Sigma}_{\lambda} \mathbf{x}^{T} \end{array} $$
(10c)

Once a new data-point is available, it is assimilated into the set of available data, such that \(X \leftarrow [X, \underline{g}(\mathbf{x}_{k+1})]\) and \(Y\leftarrow [Y, g_{y}(\mathbf{y}_{k+1})]\), where \([\mathbf{x}_{k+1}, \mathbf{y}_{k+1}]\) is the new data-point.
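
A hedged sketch of Eq. 10 and of this assimilation step is given below; `x_row` denotes the regressor vector \(\underline{g}(\mathbf{x})\) of the candidate observation, and the parameter distribution can then be re-estimated with the extended X and Y (e.g. via the LSE sketch above).

```python
import numpy as np

def predictive(x_row, mu_lam, cov_lam):
    """Eq. 10: mean and variance of the next (transformed) observation
    at regressor vector x_row = g(x)."""
    mu_y = x_row @ mu_lam               # Eq. 10b
    var_y = x_row @ cov_lam @ x_row     # Eq. 10c
    return mu_y, var_y

def assimilate_point(X, Y, x_row_new, y_new):
    """Append the new data-point to the design matrix and observations;
    the parameter distribution is then re-estimated from the extended set."""
    return np.vstack([X, x_row_new]), np.append(Y, y_new)
```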

Once f(y|λ) has been defined for all observable variables, and f(λ) and f(λ|y) have been derived using the LSE properties above, these distributions can be used within the OD problem.

3 Application

The methodology outlined in the previous section is applied to a test case on the White Cart River in Scotland. This case has been selected because it offers a simple but non-trivial monitoring problem related to a well defined decision problem. Monitoring based on OD is compared to monitoring based solely on uncertainty reduction; in the latter, new data are selected according to an information criterion, as in Alfonso et al. (2010) and Mogheir and Singh (2002).

In this river a flood warning scheme has been established. A hydrological model of the upstream catchment is used to predict the discharge at the gauging station at Overlee. The rating curve available at Overlee is used to transform the predicted discharge to a predicted water level, which is then compared to the flood warning threshold. If the threshold is exceeded, a warning may be issued and response to mitigate flood losses initiated. A more complete description of the system can be found in Werner et al. (2009) and in Verkade and Werner (2011). The flood warning scheme considered here is simplified for the purpose of this study, when compared to the actual operational scheme.

In predicting the levels, we consider two sources of uncertainty: i) the discharge prediction and ii) the rating curve. In this study, we focus only on the rating curve uncertainty and its reduction by new data. The parameters of the rating curve are estimated using the gauging data: a set of observed water level and discharge pairs, referred to as gaugings, collected through in-situ discharge measurements by a team of hydrometrists. A complete rating curve estimation requires gaugings over a range of levels/discharges. The rating curve model used at Overlee is a power law, with its parameters established through regression on the available set of gaugings. The uncertainty in the rating curve affects the uncertainty of water level predictions, especially in flood situations (Tomkins 2014; Sikorska et al., 2013; Domeneghetti et al., 2012; Di Baldassarre and Montanari 2009; Pappenberger et al., 2004; Le Coz et al., 2014; McMillan et al., 2012). The uncertainty in the rating curve can be reduced by adding data points to the regression, obtained through extra gaugings. These gaugings are laborious and, when carried out manually as at Overlee, labour intensive, particularly at higher flows, when gauging may even become too dangerous to carry out. The timing of a new gauging must therefore be chosen between low-flow conditions, which are low-cost but less informative, and high-flow conditions, which are costly but more informative, particularly for flood events.

In the primary decision problem the decision maker must decide whether to issue a flood warning. In the secondary monitoring problem the decision maker must efficiently select the additional gaugings to be carried out, and at what flow conditions.

3.1 Decision Problem

Equations 11 and 12 define the optimal decision problem for this system.

$$ \mathcal{J}^{*} = \min\limits_{\{u_{t}\}_{t=1}^{H}} \sum\limits_{t=1}^{H} \mathcal{J}_{t} $$
(11)

where

$$\begin{array}{@{}rcl@{}} \mathcal{J}_{t} &=& u_{t} \times L \times y_{F,t} \\ &=& \left[ \begin{array}{c} u_{t} \\ 1-u_{t} \end{array} \right]^{T} \times \left[ \begin{array}{cc} C_{w} + L_{u} & C_{w} \\ L_{a} + L_{u} & 0 \end{array} \right] \times \left[ \begin{array}{c} p_{t} \\ 1-p_{t} \end{array} \right] \\ &=& \left[ \begin{array}{c} u_{t} \\ 1-u_{t} \end{array} \right]^{T} \times \left[ \begin{array}{c} p_{t} \cdot (C_{w} + L_{u})+ (1-p_{t}) \cdot C_{w} \\ p_{t} \cdot (L_{a} +L_{u}) \end{array} \right] \\ &=& \left[ \begin{array}{c} u_{t} \\ 1-u_{t} \end{array} \right]^{T} \times \left[ \begin{array}{c} C_{w} \\ p_{t} \cdot L_{a} \end{array} \right] \end{array} $$
(12)

In Eq. 11, \(\mathcal{J}_{t}\) is the time-step cost function; \(u_{t} \in \{0,1\}\) is the daily decision on whether a flood warning is to be issued. This decision must be taken at every time-step \(t \in \{1,\ldots, H\}\), where H is the length of the problem horizon.

In Eq. 12, \(y_{F,t} \in \{0,1\}\) is a stochastic variable defining the occurrence of a flood event on day t. A flood event occurs when the water level exceeds the flood threshold, as defined in Expression (14). \(p_{t}\) is the flood probability, i.e. \(P(y_{F,t} = 1) = p_{t}\). L is a 2 × 2 matrix representing the cost structure for all possible combinations of events and decisions. In matrix L, \(C_{w}\) is the cost of warning and of the response to the warning, \(L_{u}\) is the unavoidable loss in case of flood, and \(L_{a}\) is the loss that can be avoided thanks to the response to a flood warning, while \(L_{a} + L_{u}\) is the total loss in case of a flood occurring without warning. All costs are positive, with \(L_{a} > C_{w}\). Maintenance costs are not considered, being irrelevant to this decision problem.

The no-flood no-warning situation, in the lower-right quadrant of L, is the “business as usual” case. The flood warning situation, upper-left quadrant, is the “hit” case, where the warning is correctly issued before a flood event. The no-flood warning situation, upper-right quadrant, is the “false alarm” case, where the warning issued turned out to be unnecessary. The flood no-warning situation, lower-left quadrant, is the “miss” case, in which the warning system fails to anticipate the event. Costs due to uncertainty arise from the occurrence of the last two cases. In this experiment, \(C_{w} = 1000\) $, \(L_{a} = 4000\) $, and \(L_{u} = 1000\) $.

The action is selected according to Rule (13).

$$ u_{t}= \left\{\begin{array}{l}1 \text{ if } p_{t}>C_{w}/L_{a} \\ \text{0 otherwise} \end{array}\right. $$
(13)

This rule is derived from Expression (12) and simplifies its application: the action \(u_{t}\) can be selected according to this simple rule rather than by solving the corresponding optimization problem.
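
For reference, the time-step cost of Eq. 12 and Rule (13) can be written as the following short sketch, using the cost figures given above (\(C_{w} = 1000\) $, \(L_{a} = 4000\) $, \(L_{u} = 1000\) $); the function names are illustrative.

```python
import numpy as np

C_W, L_A, L_U = 1000.0, 4000.0, 1000.0   # warning cost, avoidable loss, unavoidable loss ($)

def timestep_cost(u_t, p_t):
    """Eq. 12: expected cost of warning (u_t = 1) or not warning (u_t = 0),
    given the flood probability p_t."""
    L = np.array([[C_W + L_U, C_W],      # warn:     flood / no flood
                  [L_A + L_U, 0.0]])     # no warn:  flood / no flood
    return np.array([u_t, 1.0 - u_t]) @ L @ np.array([p_t, 1.0 - p_t])

def warning_rule(p_t):
    """Rule (13): warn when the expected avoidable loss exceeds the warning cost."""
    return 1 if p_t > C_W / L_A else 0
```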

The flood event is defined by Condition (14).

$$ y_{F,t}=\left\{\begin{array}{l}1 \text{ if } h_{t}>\bar{h} \\ \text{0 otherwise} \end{array}\right. $$
(14)

Condition (14) states that a flood occurs when the water level \(h_{t}\) exceeds a given threshold \(\bar{h}\), or equivalently, when \(y_{t} > \bar{y}\), where \(y_{t}= \log h_{t}\) and \(\bar{y}= \log \bar{h}\). In the numerical experiment, the flood warning threshold is set at \(\bar{h}= 1.1\) m, corresponding to a 1/2-year return period. This threshold is lower than the true warning threshold at Overlee, but it ensures a sufficient number of flood events in the experiment. The discharge uncertainty is represented by adding an artificial noise that accounts for the upstream rating curve uncertainty and the presence of an unknown lateral discharge. The forecasted discharge is considered log-normally distributed, with sigma parameter equal to 0.05.

Water level can be estimated from river discharge using the rating curve. Equation 15 defines the rating curve, relating water level h and discharge q, with the parameters a, b and c determined through fitting the curve to the available set of gauged discharge-level pairs.

$$ q= a \cdot (h+b)^{c} $$
(15)

The rating curve in Eq. 15 is a deterministic power law. Rating curve estimations generally use multiple parameter sets for different river stages (Reitan and Petersen-Øverleir 2009; Petersen-Øverleir and Reitan 2005). Two parameter sets are used at Overlee, for h below or above 0.5 m. Since we are interested in floods, we consider only the upper part of the rating curve.

The identification of the rating curve parameters can be made easier by transforming Eq. 15 into a linear regression. Equation 16 is obtained by log-transforming Eq. 15.

$$ x = \lambda_{0} + \lambda_{1} \cdot y $$
(16)

In Eq. 16, \(x= \log q\), \(y= \log (h+ b)\), \(\lambda_{0} = \log a\), and \(\lambda_{1} = c\); b has little influence on the high-flow part of the rating curve and is therefore considered known (or zero); λ = [λ_0, λ_1] are the parameters of Relation (16).
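
As a small illustration, the regression form of Eq. 16 can be assembled from the gaugings as follows (a sketch; the function name and the default b = 0 are illustrative). Note that in this test case the log-discharge is the regression target and the log-level the regressor.

```python
import numpy as np

def rating_curve_design(h, q, b=0.0):
    """Build the linear-regression form of Eq. 16 from gaugings (h, q):
    regressor rows [1, log(h + b)] and targets log(q)."""
    y = np.log(np.asarray(h) + b)                  # y = log(h + b)
    X = np.column_stack([np.ones_like(y), y])      # columns for lambda_0 and lambda_1
    return X, np.log(np.asarray(q))                # targets x = log(q)
```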

The system model is made of i) Relation (15), ii) Condition (14), iii) Decision Rule (13), and iv) Loss matrix L. The causal chain of model \(\mathcal {M}\) is \(x_{t} \rightarrow y_{t} \rightarrow y_{F,t} \rightarrow p_{t} \rightarrow u^{*}_{t} \rightarrow {\mathcal {J}^{*}}_{t}\) (Fig. 1).
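
A hedged numerical sketch of this chain, up to the flood probability \(p_{t}\), is given below. It assumes a normally distributed log-discharge forecast \(\hat{x}_{t}\) with spread sigma_x (0.05 in the experiment), jointly normal rating curve parameters [λ_0, λ_1] as obtained from the LSE, and inverts Eq. 16 to map discharge to level; all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

def flood_probability(x_hat, sigma_x, mu_lam, cov_lam, y_bar, n=2000):
    """Causal chain x_t -> y_t -> y_F,t -> p_t by Monte Carlo sampling."""
    x = rng.normal(x_hat, sigma_x, size=n)                    # discharge uncertainty f(x)
    lam = rng.multivariate_normal(mu_lam, cov_lam, size=n)    # rating curve uncertainty f(lambda)
    y = (x - lam[:, 0]) / lam[:, 1]                           # invert Eq. 16: x = lambda_0 + lambda_1 * y
    return float(np.mean(y > y_bar))                          # p_t = P(y_t > y_bar), cf. Condition (14)
```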

Fig. 1

Test case model schema. At each timestep t, the discharge \(\hat{x}_{t}\) (squares) is forecasted with uncertainty f(x). The rating curve parameters λ = [λ_0, λ_1] are estimated from data-points [y, x] (circles). The rating curve (bold dash-dot line) is also uncertain, with f(λ) (grey dashed lines). The water level uncertainty f(y) results from the discharge and rating curve uncertainty. A flood event \(y_{F,t}\) happens if \(y_{t} > \bar{y}\) (grey dotted line), with probability \(p_{t}\) (dashed area)

3.2 Optimal Design problem

The uncertainty of the rating curve, Eq. 16, can be reduced by additional data. Setting up the OD problem requires defining the space of possible observations and, for each possible observation, the associated data, cost, and observational uncertainty.

Each observation is defined by the water level at which the gauging is realized. The associated data are the water level–discharge pair [y, x]. The space of possible observations is the set of water levels at which a gauging can be taken; in this experiment, however, it is limited to the available data-points, made of 173 pairs. The observational uncertainty is f(y|λ) ∼ N(0, σ_RC), where σ_RC is either a given parameter, derived from the instrument characteristics, or estimated from data; here we consider the case in which σ_RC is estimated from data, as in Eq. 9. Observation costs are made up of the direct gauging costs and indirect costs, or the “cost-to-wait”. The gauging cost has a fixed component, required for all gaugings, and a variable one, which increases with increasing flows. The cost-to-wait is related to the lower frequency of higher water levels: waiting for the occurrence of higher water levels has a cost, because, as the waiting time increases, costly flood events become more likely to happen. Equation 17a defines the cost of gauging. Discharge dynamics and its uncertainty (Angulo et al., 2000) are not considered.

$$\begin{array}{@{}rcl@{}}C(\mathbf{y}) &=& C_{g}(y) + C_{w} (x) \end{array} $$
(17a)
$$\begin{array}{@{}rcl@{}} &=& c_{g} + {c_{q}} \cdot \exp(x) \end{array} $$
(17b)
$$\begin{array}{@{}rcl@{}} &&+ \mathcal{J}^{*}/H\cdot \frac{1}{P(Y\geq y)} \end{array} $$
(17c)

In Eq. 17a, Line (17b) defines the direct cost of gauging: \(c_{g}\) is the fixed cost component and \(c_{q}\) the part proportional to discharge, such that \(C_{g}(q) = c_{g} + c_{q} \cdot q\), written in Eq. 17b as a function of x. In the experiment, \(c_{g} = 500\) $ and \(c_{q} = 7\) $/(m\(^{3}\)/s). Line (17c) defines the indirect cost of gauging, linked to the cost-to-wait. The cost-to-wait is the expected cost per day due to possible floods, \(\mathcal{J}^{*}/H\), where \(\mathcal{J}^{*}\) is the expected cost over the entire horizon H as defined in Expression (11), multiplied by the expected waiting time in days, \(1/P(Y\geq y)\). This implies that additional information reduces the observation cost. The waiting time is inversely proportional to the frequency of events larger than y, \(P(Y\geq y)\), with \(P(Y\geq y) = 1 - P(Y\leq y) = 1 - F(y)\), where F(⋅) is the cumulative distribution function of y, estimated here using the empirical distribution of the available data y.
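
The observation cost of Eq. 17 can be sketched as follows, with the direct cost of Line (17b) and the cost-to-wait of Line (17c) built from the empirical exceedance frequency; the function and argument names are illustrative.

```python
import numpy as np

def observation_cost(y_cand, x_cand, y_hist, J_star, H, c_g=500.0, c_q=7.0):
    """Eq. 17: cost of gauging at (log-)level y_cand with log-discharge x_cand.

    y_hist : historical (log-)levels, giving the empirical frequency P(Y >= y)
    J_star : expected cost over the horizon (Eq. 11); H : horizon length in days
    """
    direct = c_g + c_q * np.exp(x_cand)                 # Line (17b): fixed plus discharge-proportional cost
    p_exceed = np.mean(np.asarray(y_hist) >= y_cand)    # empirical P(Y >= y)
    wait = (J_star / H) / max(p_exceed, 1e-6)           # Line (17c): expected daily cost times waiting time
    return direct + wait
```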

Table 1 shows the main variables and parameters of the decision and design problem.

Table 1 List of main variables and parameters

3.3 Results

Starting from a rating curve with three data-points taken at low-flow conditions, new observations are selected by iteratively solving the OD problem. The OD problem is solved numerically using a sample average approximation. The sample sizes are: 60 from f(λ), 20 from \(f(x_{t})\) for every t, and 30 from f(x|y).

Figure 2, plot a, shows the expected observation benefit and cost at different water levels, for the selection of the first additional gauging. Observations become more informative and more costly as the water level increases. The cost starts rising rapidly from about y > 0, exceeding the benefit of information around y = 0.1, corresponding to the water level h = 1.1 m. The most valuable observation is where the difference between benefit and cost is largest: in this case the OD selects a gauging at y = −0.21, corresponding to a water level h = 0.81 m.

Fig. 2

Above: Plot a. Expected Benefit and Cost for observations at different water levels, at first gauging. Below: Plot b. Expected Benefit and Cost for observations at different water levels, at first gauging, with different initial information: low-flow data-points only (black lines) and evenly distributed data-points (grey lines)

Figure 2, plot b, shows the effect of different initial states of knowledge on observation benefit and cost. The black lines refer to an initial condition where the three available gauging pairs are all sampled at a low water level, in the neighbourhood of y = −0.52763. The grey lines refer to an initial condition where the three data-points are evenly distributed over the water level distribution, namely y = [−0.53,−0.49,−0.30]. The vertical scale is logarithmic.

In the case with evenly distributed data-points, the initial condition contains more information than in the case with low-flow data-points: despite having the same number of data-points, the distributed data allow a better estimation of the parameters. When starting from a more informative situation, the added value of one additional data-point is lower; consequently, in Fig. 2, plot b, the continuous grey line is below the continuous black line for all water levels. More information allows a better prediction of flood events, reducing type I and type II errors and, consequently, the average cost. The observation cost depends on the expected cost \(\mathcal{J}^{*}\) through the cost-to-wait term, so new information reduces the expected cost-to-wait, and hence the cost of observation, on average: in Fig. 2b, in fact, the dashed grey line is below the dashed black line. Data-points taken at high water levels increase the relative benefit of observations at lower water levels. In the evenly distributed case the OD suggests an optimal trade-off at y = −0.56, corresponding to a water level h = 0.57 m, lower than the optimal trade-off in the low-flow data case.

Monitoring based on OD is compared to monitoring based solely on uncertainty reduction. In the latter case, the criterion is to select the observation that reduces the uncertainty of the water level estimation the most. Uncertainty is quantified by the total variance, i.e. \(\min VAR(y)\), as defined in Appendix I, Eq. 20a.
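
A hedged stand-in for this information-based selection is sketched below: among candidate gaugings (regressor rows), it picks the one whose addition most reduces the total predictive variance of the level estimate over a set of evaluation points, keeping the residual variance fixed at its current estimate so that only the \((X^{T}X)^{-1}\) factor of Eq. 8c changes. This is an illustrative approximation, not the exact criterion of Appendix I.

```python
import numpy as np

def pick_most_informative(X, Y, candidate_rows, eval_rows):
    """Select the candidate regressor row that minimizes the total
    predictive variance over eval_rows once added to the design."""
    k, n_lam = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ Y)
    sigma2 = ((Y - X @ beta) @ (Y - X @ beta)) / (k - n_lam)      # Eq. 9

    def total_var(X_design):
        cov_lam = sigma2 * np.linalg.inv(X_design.T @ X_design)   # Eq. 8c with fixed sigma2
        return sum(r @ cov_lam @ r for r in eval_rows)

    return min(candidate_rows, key=lambda row: total_var(np.vstack([X, row])))
```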

Figure 3 shows the relative frequency of observations at different levels of y for the OD-based and the information-based criteria. The relative frequency at which the actual gaugings were taken is also shown. The OD criterion selects mostly low-flow values, due to their relatively low cost. The information-based criterion, on the other hand, distributes observations approximately equally between very high flow and low flow conditions, in order to reduce the degrees of freedom of the rating curve relation: if data-points are selected considering an information criterion only, it is preferable to explore the entire domain of the relation and, in the linear case, its extremes. The actual observations are evenly distributed; in this case, criteria other than collecting data for flood protection are likely to have guided the choice of the discharge/level at which gaugings were taken. Moreover, the present OD problem setting considers the average observation cost, without taking into account current information about the river stage dynamics: during high flow, one could take immediate advantage of the present condition to obtain a highly informative data-point, without having to wait for it.

Fig. 3

Observations frequency at different water levels

In the efficiency-based case, for the present cost structure, the average benefit per observation is about 270 $. Compared to the problem costs, this implies that about 4 gaugings make up the cost of a false alarm, and about 15 gaugings the extra cost of failing to forecast a flood.

In the information-based case the average observation value is −13 × 10³ $. This large negative value means that the cost of observation is much larger than its benefit, showing the inconvenience of selecting additional data using an information-based criterion only. In practice, if observations are costly, there will be a balance between informativeness and efficiency; in this case the weights to be assigned to information and cost are subjective. In the efficiency-based approach this duality is resolved by evaluating the benefit of information, which can be compared directly to the cost of obtaining it.

The information-based approach, despite its inefficiency, is highly effective in reducing uncertainty. Figure 4 shows how the rating curve uncertainty is reduced as a function of the number of gauging data pairs when data are selected using the efficiency-based and information-based criteria, taking the variance of the parameters λ_0 and λ_1 as the indicator of uncertainty.

Fig. 4

Rating curve parameters variance in the information-based (black line) and the efficiency-based (grey line) cases

Figure 4 shows the evolution of the rating curve parameter variances with the number of observations. In the long run, uncertainty is reduced in both cases, but in the information-based case the variance is smaller, resulting in better defined parameters.

4 Conclusions

In this paper we propose and test an innovative methodology to make deterministic hydro-economic models calibrated with Least Squares Estimation (LSE) usable for the solution of the Optimal Design (OD) problem.

The application of the OD problem has been limited by its large computational burden and by the need for stochastic models, whereas most existing hydro-economic models are deterministic. The solution we present uses some of the properties of LSE to handle deterministic models and obtain a leaner approximation of the OD problem. Thanks to this innovation, LSE can be used to identify the relation parameters given the data and a deterministic model structure. Under some mild assumptions on the distribution of residuals, the parameter distribution family can be derived and used in the OD problem, which then boils down to a parametric estimation. Using this approach, the OD problem can be solved in a shorter time and, potentially, applied to larger systems.

Despite the proven effectiveness of the methodology, OD remains a computationally intensive method: scaling it up to large hydro-economic models is not explored here and requires further research. Moreover, the methodology demands some assumptions on the residual distribution to be valid. The test case is well characterised, with residuals sufficiently “well” distributed. The validity of the method when these assumptions on the residual distribution do not hold remains to be explored.

Solving the OD problem indicates the piece of information that has the highest net benefit, i.e. the difference between the added value of that piece of information in terms of decision improvement and the cost of obtaining it. We presented the procedure to set up an OD problem. The required elements of an OD problem are: i) a defined decision problem, ii) the space of possible observations, and, for all possible observations, ii-a) the data uncertainty and ii-b) the cost structure.

The proposed methodology is applied to a simplified though non-trivial decision problem under uncertainty: the flood warning system on the White Cart River, Scotland. The decision maker must decide whether to issue a flood warning. The river discharge forecast and the rating curve are uncertain. The rating curve parameter estimation can be improved by new gaugings, which can be selected on a scale between low-flow observations, which are low-cost but less informative, and high-flow ones, which are costly but more informative. The OD finds a balance between these two, suggesting the most efficient data collection. Results from the OD problem using an efficiency-based approach are compared to results from an information-based approach, where the criterion is reducing uncertainty as such. The two criteria lead to completely different results: the information-based criterion leads to effective, but very costly, data-point selection, while the economic criterion leads to economic efficiency. We suggest including an economic evaluation in data selection, in order to maximize monitoring efficiency.

The methodology can be applied to a large set of monitoring problems related to a well defined decision problem. The monitoring problem regards the decision on which piece of information to get, when getting that information has some cost: the purchase of remote sensing data, the optimal number and position of rain-gauges, and the trajectory of drones are a few examples of monitoring problems. Examples of defined decision problems are, at the operational level, a flood warning system, as seen in the test case, or reservoir operation with defined objectives, e.g. electricity production. At the planning level, some examples of possible applications are: levee dimensioning, reservoir location, or even non-structural policies, such as new rules, the size of incentives, or Pigouvian taxes.

The methodology presented here can open up new potential applications for efficient monitoring. Efficient monitoring, guided by the value of information, is closely related to the needs of the decision maker. The value of information could be used as a means to inform the decision maker on priorities and, being closer to the decision maker's cognitive framework, it is probably more convincing than uncertainty reduction alone.

To become fully mature, the methodology presented must prove its applicability to larger systems, and future research will tackle the computational issues related to more complex applications.