1 Introduction

In well-maintained water distribution systems (WDS), only 3 to 7 % of water is lost, whereas in developing countries this figure can reach 50 % or more (Colombo et al. 2009). Most of the lost water originates from leaks.

Therefore, water loss reduction through locating and repairing leaks in WDS is becoming increasingly important.

For leak localization, water utilities follow one of two approaches: active or passive leakage control (Puust et al. 2010). In the passive approach, the utility waits until customers report leaks, either because they experience supply problems or because leaking water reaches the surface and becomes visible. However, not all leaks become visible, which leads to a large number of unreported leaks and hence to high total water losses.

The better strategy to reduce water losses is active leakage control (Farley and Trow 2003). One central task of an active leakage control strategy is searching for leaks on a regular basis. Different methods are applied to find the position of leaks, e.g. step testing or temporarily placing acoustic loggers. These methods are referred to as leak localization, which aims to narrow the possible location of a leak down to several hundred meters. Leak localization has to be followed by pinpointing to obtain the exact position of the leak, e.g. with ground microphones, leak-noise correlators or gas leak detection.

Model-based leak localization methods have been introduced over the past few decades using measurement data from hydraulic sensors along with hydraulic models. In general, two different techniques are available: steady-state analysis based techniques and transient based techniques (Colombo et al. 2009).

This paper is based on steady-state leak localization, which was first proposed by Pudar and Liggett (1992). They formulated an inverse problem in which the discrepancy between real-world measurements and data obtained from hydraulic simulations is minimized. Since then, leak localization by solving model-based inverse problems has been extensively investigated (e.g. Wu et al. 2010; Pérez et al. 2011).

In fact, the success of model-based leak localization depends to a great extent on the choice of the measurement positions within a WDS (Kang and Lansey 2010). For that reason optimal sensor placement (OSP) has the potential for significantly improving the localization of leaks.

While much research has been devoted to OSP for contaminant detection and WDS model calibration (see for example Savic et al. 2009), only a few studies have focused on the same problem for leak localization.

The first discussion and analysis of OSP for leak localization emerged in 2008 with the work of Farley et al. (2008, 2010, 2013), followed in 2009 by Pérez et al. (2009). Both OSP approaches are based on the leak sensitivity matrix. The approach of Pérez et al. (2009) binarizes this matrix which leads to a loss of information (Quevedo et al. 2011).

To overcome this information loss, Casillas et al. (2013) formulated the OSP as an integer optimization problem based on projections from a non-binarized leak sensitivity matrix solved with a semi-exhaustive search and a genetic algorithm (GA).

Subsequently, Pérez et al. (2014) defined a greedy-search algorithm, which minimizes the maximum distance of leak scenarios to the gravity center of the nodes with projections over 99 % of the maximum value of the projections. This resulted in more robust sensor placements.

Recently, Cugueró-Escofet et al. (2015) extended the approach of Casillas et al. (2013) with a relaxed isolation index taking practical applications like the acceptable isolation distance into account.

Alternatively, Sarrate et al. (2012) proposed a strategy maximizing a "leak isolability index" based on structural analysis of a WDS, solved by a depth-first search algorithm. Since this strategy can only handle medium-sized networks, it is combined in Sarrate et al. (2013) with clustering techniques to reduce the size and complexity of the OSP problem. However, because structural analysis relies on a simple description of the system, it cannot ensure that the diagnosis performance obtained from such models holds for real-world systems. Therefore, in Sarrate et al. (2014) the approach is combined with projections calculated again from sensitivity matrices.

Nejjari et al. (2015) proposed an algorithm based on pressure sensitivity matrix analysis using an exhaustive search strategy with average worst leak expansion distances as the sensor placement performance measure.

Most recently, Casillas et al. (2015) placed sensors optimally by minimizing the overlap of signatures in the leak signature space.

A completely different approach was introduced by Christodoulou et al. (2013), who proposed an entropy-based OSP algorithm that maximizes the total entropy in the network.

In hydraulic modeling as well as in measuring hydraulic parameters one has to face several sources of uncertainty (Hutton et al. 2014). We therefore assume that these effects should be taken into account in OSP algorithms. Blesa et al. (2014) studied the robustness of the methodology introduced by Sarrate et al. (2014) against sensitivity matrix uncertainties. The main result was that the sensor positions are not sensitive to the size of the leaks, but to the working point of the WDS. Furthermore, Blesa et al. (2015) used multi-objective optimization to place sensors. The objectives were to determine the mean and the worst locatability index.

Both papers investigate the consequences of uncertainties on the OSP algorithms, but neither of them incorporates these uncertainties in the OSP itself. To the best of our knowledge, Steffelbauer et al. (2014) was the first study incorporating uncertainties in the OSP problem. Unfortunately, the problem was only solved for four sensors.

The aim of this paper is to extend the work described in Steffelbauer et al. (2014) by providing a methodology that uses cost-benefit functions to predict the quality of sensor placements for leak localization depending on the number of sensors as well as on the strength of general uncertainties, not only demand uncertainties. It thus provides water utilities (WU) with a methodology that helps them decide how many sensors should be placed in their system for leak localization.

2 Methodology

2.1 Efficient Sensor Placement - General Approach

Figure 1 depicts an overview of the methods used in this paper and their chronological order.

Fig. 1 Flowchart of the methodology

First, hydraulic simulations are used to calculate the sensitivity matrices s and the residuals r (see Section 2.2). s and r are later used in the OSP algorithm.

Additionally, the hydraulic model is used to calculate the effect of uncertain input parameters on the outputs of the hydraulic model (σ), called model output uncertainty (MOU) throughout this paper. The MOU calculations are performed through Monte Carlo simulations (MCS) (Section 2.3).

The MOU obtained from MCS are integrated into the sensor placement algorithm of Casillas et al. (2013), by extending this approach to punish measurement points with high uncertainties. The effect of uncertain demands on the pressures of the hydraulic model serves as an example of application throughout this paper.

Together with the number N of sensors and a weighting factor ω to adjust the strength of the uncertain demands on the sensor placement, s, r and σ are used with a GA (see Section 2.5) for placing the sensors in an optimal manner (Section 2.4) resulting in a leak localization quality parameter 𝜖.

The calculation of 𝜖 for different N and ω allows the derivation of sensor placement cost-benefit functions. Different fit-functions are tested to describe the behavior of 𝜖 as a function of N. Goodness-of-fit (GoF) statistics (Section 2.6) are then used to find the cost-benefit function which fits the simulation data best.

Sensitivities, residuals, MCS and the OSP are calculated by means of OOPNET, an object-oriented Python network analysis tool for WDS (see Steffelbauer and Fuchs-Hanusch 2015). Leaks are simulated using the power leak law (Ferrante et al. 2014) outflow equation

$$ Q = c_{e} \cdot p^{e_{e}} \qquad , $$

where Q is the leak outflow, \(c_{e}\) and \(e_{e}\) are the emitter coefficient and exponent, and p is the pressure at the leak position.
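As a minimal illustration of the emitter equation above, the following Python sketch evaluates the leak outflow for a given pressure. The function name `leak_outflow`, the default exponent of 0.5 and the example pressure are assumptions made for illustration only and are not taken from the case study.

```python
def leak_outflow(c_e: float, pressure: float, e_e: float = 0.5) -> float:
    """Leak outflow Q = c_e * p**e_e for a non-negative pressure p.

    The default exponent e_e = 0.5 is an assumption of this sketch.
    """
    if pressure < 0:
        raise ValueError("pressure must be non-negative")
    return c_e * pressure ** e_e


# Hypothetical example: emitter coefficient 0.5 at a pressure of 36 (model units)
q_leak = leak_outflow(0.5, 36.0)
```

Because the outflow depends on the pressure at the leak, the same emitter coefficient produces different leak flows at different nodes.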

2.2 Calculating Sensitivity Matrices and Residual Vectors

The sensitivity matrix S is the key to deciding where to install pressure measurements in the system (Pudar and Liggett 1992)

$$ S = \left( \begin{array}{cccccccccc} s_{11} & {\ldots} & s_{1n} \\ {\vdots} & {\ddots} & {\vdots} \\ s_{m1} & {\ldots} & s_{mn} \end{array}\right) \qquad . $$

m is the number of leak scenarios and n is the number of possible measurement positions. Leak scenarios are the output of hydraulic simulations in which a leak of a specific size is introduced at a specific position in the system. The individual elements of S are calculated by subtracting the pressures of the simulation with no leak \(\hat {p}_{j}\) from the pressures \(p_{j}^{f_{i}}\) calculated at the same position under leak scenario i with a leak of size \(f_{i}\), and normalizing by the leak size

$$ s_{ij} = \frac{\partial p_{j}}{\partial f_{i}}= \frac{p_{j}^{f_{i}} - \hat{p_{j}}}{f_{i}} \qquad . $$

In order to estimate the general sensitivity of a potential measurement point with respect to all possible leak scenarios, the mean over the column of S is also calculated

$$ \overline{s}_{j} = \frac{1}{m} \sum\limits_{i = 1}^{m} s_{ij} \qquad . $$

This is done for the graphical representation of the sensitivity calculation results (see for example Fig. 2b).
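The construction of S and of its column means can be sketched in a few lines of Python. The pressure values below are hypothetical toy numbers, not results from the case study, and the function names are chosen for illustration only; in practice the pressures would come from hydraulic simulations.

```python
def sensitivity_matrix(p_leak, p_noleak, leak_sizes):
    """s_ij = (p_j^{f_i} - p_hat_j) / f_i for m leak scenarios and n positions."""
    return [
        [(p_ij - p_hat_j) / f_i for p_ij, p_hat_j in zip(row, p_noleak)]
        for row, f_i in zip(p_leak, leak_sizes)
    ]


def mean_sensitivity(S):
    """Column means of S: one overall sensitivity per measurement position."""
    m = len(S)
    return [sum(col) / m for col in zip(*S)]


# Hypothetical toy data: 2 leak scenarios (rows), 3 measurement positions (columns)
p_noleak = [40.0, 38.0, 35.0]
p_leak = [[39.0, 37.5, 35.0],
          [39.5, 37.0, 34.0]]
leak_sizes = [0.5, 0.5]

S = sensitivity_matrix(p_leak, p_noleak, leak_sizes)
s_bar = mean_sensitivity(S)
```

Leaks lower the pressure, so the entries of S are negative or zero in this toy case.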

Fig. 2 (a) Effects of demand uncertainties on nodal pressure and (b) overall sensitivity of nodal pressures to leaks

The leak localization also relies on the computation of residuals, which are expressed as the difference between real-world measurements and the expected behavior of the system described by the hydraulic model. The residuals \(r_{ij}\) are thus built in a similar manner to the sensitivity matrices, but in general with a different leak size \(f_{i}\) and without normalization

$$ r_{ij} = p_{j}^{f_{i}} - \hat{p_{j}} \qquad . $$

This results in a matrix R which represents measurements in the system. Furthermore, the i-th column of R is called the i-th residual vector \(\mathbf{r}_{i}\).

2.3 Model Output Uncertainty Calculation with Monte Carlo Simulations

The quality of a hydraulic model relies heavily on the model input parameters. Generally, these input parameters are fraught with uncertainty. Consequently, the uncertain input parameters will lead to MOU.

Hydraulic solvers like EPANET (Rossman et al. 2000) merely simulate the deterministic forward model without taking uncertainties into account. To compute the MOU, the hydraulic solver must be combined with uncertainty propagation methods such as first order second moment, MCS or Latin hypercube sampling (Kang et al. 2009). In this paper MCS is chosen to calculate the MOU, since MCS converges to the exact uncertainty estimates as the number of simulated parameter sets grows.

MCS can in general be applied to every combination of model input and model output parameters in a hydraulic model and therefore every kind of input uncertainty can be incorporated in the enhanced sensor placement algorithm in Section 2.4. However, in order to make the method more tangible, it is described on the basis of calculating the effect of uncertain demands on the pressures in the system.

For this example, the parameter set is generated such that each demand in the system is randomly drawn from a normal distribution with mean \(\mu _{q}\) and standard deviation \(\sigma _{q}\). Subsequently, EPANET is called with these random demands to perform a hydraulic simulation and obtain the resulting pressures \(p_{j}\) at every node j in the system. This procedure is repeated mcsteps times to obtain the approximate probability distributions \(P(p_{j})\) of \(p_{j}\) for every possible measurement position in the system.

Once \(P(p_{j})\) is calculated, the standard deviation \(\sigma _{p_{j}}\) of this distribution is taken as the MOU of the pressure at point j. The higher \(\sigma _{p_{j}}\), the noisier the signal at this point and the less suitable this location is for pressure measurements.
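The MCS procedure described above can be sketched as follows. Since a hydraulic solver is beyond the scope of a short example, the function `toy_pressures` is a hypothetical stand-in for the EPANET call (a line of nodes with quadratic head loss). The clipping of negative sampled demands to zero and all numerical values are likewise assumptions of this sketch.

```python
import random
import statistics


def toy_pressures(demands, head=50.0, k=0.02):
    """Hypothetical stand-in for a hydraulic solver: nodes in series, with a
    head loss in each link that grows with the square of the flow through it."""
    pressures = []
    p = head
    downstream = sum(demands)
    for d in demands:
        p -= k * downstream ** 2   # loss in the link feeding this node
        pressures.append(p)
        downstream -= d            # this node's demand leaves the line
    return pressures


def model_output_uncertainty(mu_q, rel_sigma, mcsteps, seed=0):
    """Standard deviation of the pressure at every node from mcsteps samples."""
    rng = random.Random(seed)
    samples = [[] for _ in mu_q]
    for _ in range(mcsteps):
        # Draw every demand from a normal distribution around its mean;
        # negative draws are clipped to zero (assumption of this sketch).
        demands = [max(rng.gauss(mu, rel_sigma * mu), 0.0) for mu in mu_q]
        for j, p_j in enumerate(toy_pressures(demands)):
            samples[j].append(p_j)
    return [statistics.stdev(s) for s in samples]


sigma_p = model_output_uncertainty([1.0, 2.0, 1.5], rel_sigma=0.10, mcsteps=2000)
```

In this toy line network the pressure uncertainty grows towards the downstream end, because the head losses of all upstream links accumulate.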

2.4 Enhanced Sensor Placement Considering Uncertainties

The sensor placement algorithm described in Casillas et al. (2013) is based on calculating projections \(\psi _{ij}\) between \(\mathbf{r}_{i}\) and \(\mathbf{s}_{j}\)

$$ \psi_{ij}(\mathbf{q}) = \frac{\mathbf{r}_{i}^{T} Q(\mathbf{q}) \mathbf{s}_{j}} {|\mathbf{r}_{i}^{T} Q(\mathbf{q})^{T}||Q(\mathbf{q}) \mathbf{s}_{j}|} \qquad, $$

where Q(q) is a diagonal matrix constructed from a binary vector q whose length equals the number of possible sensor positions, with \(q_{i} = 1\) if a sensor is placed at node i and \(q_{i} = 0\) otherwise. The largest projection value indicates the presumed leak location. If this value lies on the diagonal, the correct leak location is found by the sensor configuration q. Therefore, an error index \(\epsilon _{i}(\mathbf {q})\) is defined for every leak scenario i

$$ \epsilon_{i}(\mathbf{q}) = \left\{\begin{array}{ll} 0 & \text{if} \ \psi_{ii}(\mathbf{q}) = \max (\psi_{i1}(\mathbf{q}), \ldots, \psi_{im}(\mathbf{q})) \\ 1 & \text{otherwise} \end{array}\right. \qquad . $$

The mean over all leak scenarios takes every possible leak into account

$$ \overline{\epsilon}(\mathbf{q}) = \sum\limits_{i=1}^{m} \frac{\epsilon_{i}(\mathbf{q})}{m} \quad \rightarrow \quad \min_{\mathbf{q}} \overline{\epsilon}(\mathbf{q}) \qquad . $$

This value must be minimized to find the optimal sensor configuration q. The lower bound of \(\overline {\epsilon }(\mathbf {q})\) is 0, denoting that every leak in the system is found correctly by the sensor configuration q. If no leak scenario is identified correctly, \(\overline {\epsilon }(\mathbf {q})\) is 1.
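A minimal sketch of the projections and of the mean error index is given below, assuming that the residual and sensitivity vectors are the rows of R and S belonging to leak scenarios i and j. The matrices are hypothetical toy values, chosen so that one leak is confused with another once the third sensor is removed.

```python
import math


def projection(r_i, s_j, q):
    """Cosine between residual r_i and sensitivity s_j restricted to sensors q."""
    r = [x * qk for x, qk in zip(r_i, q)]
    s = [x * qk for x, qk in zip(s_j, q)]
    norm = math.sqrt(sum(x * x for x in r)) * math.sqrt(sum(x * x for x in s))
    if norm == 0.0:
        return 0.0  # no information at the selected sensors
    return sum(a * b for a, b in zip(r, s)) / norm


def mean_error_index(R, S, q):
    """Fraction of leak scenarios whose own projection is not the row maximum.

    A scenario counts as found when psi_ii equals the maximum, as in the
    definition above, so ties count as found.
    """
    errors = 0
    for i, r_i in enumerate(R):
        psi = [projection(r_i, s_j, q) for s_j in S]
        if psi[i] < max(psi):
            errors += 1
    return errors / len(R)


# Hypothetical toy data: 3 leak scenarios (rows), 3 candidate positions (columns)
S = [[-2.0, -1.0, 0.0],
     [-1.0, -2.0, 0.0],
     [-2.0, -1.2, -3.0]]
R = [[-1.0, -0.5, 0.0],
     [-0.5, -1.0, 0.0],
     [-1.0, -0.5, -1.5]]

eps_all = mean_error_index(R, S, [1, 1, 1])   # sensors at all three positions
eps_two = mean_error_index(R, S, [1, 1, 0])   # third sensor removed
```

With all three sensors every toy leak is identified. Without the third sensor, leak 3 looks like leak 1 at the remaining positions, so one of the three scenarios is missed.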

To incorporate MOU in Ψ(q), the pressure uncertainties \(\mathbf {\sigma }_{p_{i}}\) (see Section 2.3) are subtracted from \(\mathbf{r}_{i}\). This penalizes points with high MOU, since high MOU strongly distorts the residual vectors \(\mathbf{r}_{i}\), resulting in smaller \(\psi _{ij}(\mathbf {q})\). Therefore, sensor configurations that include these positions will be less optimal. Extending Eq. 6 to account for demand uncertainties results in

$$ \psi_{ij}(\mathbf{q}) = \frac{\left( \mathbf{r}_{i}^{T} - \omega \sigma_{p_{i}}\right) Q(\mathbf{q}) \mathbf{s}_{j}} {| \left( \mathbf{r}_{i}^{T} - \omega \sigma_{p_{i}}\right) Q(\mathbf{q})^{T}||Q(\mathbf{q}) \mathbf{s}_{j}|} \qquad , $$

where ω represents a weighting factor controlling the strength of the influence of MOU on the OSP.
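The effect of the weighting factor can be illustrated with a small sketch. It assumes that the pressure uncertainties form a vector with one standard deviation per candidate measurement position, which is subtracted element-wise from the residual vector; the function name and all numbers are hypothetical.

```python
import math


def weighted_projection(r_i, s_j, q, sigma_p, omega):
    """Projection of (r_i - omega * sigma_p) onto s_j, restricted to sensors q."""
    r = [(x - omega * sg) * qk for x, sg, qk in zip(r_i, sigma_p, q)]
    s = [x * qk for x, qk in zip(s_j, q)]
    norm = math.sqrt(sum(x * x for x in r)) * math.sqrt(sum(x * x for x in s))
    if norm == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(r, s)) / norm


# Hypothetical toy case: residual parallel to the sensitivity vector,
# second measurement position with a high pressure uncertainty
r_i = [-1.0, -0.5]
s_j = [-2.0, -1.0]
sigma_p = [0.0, 0.4]
q = [1, 1]

psi_0 = weighted_projection(r_i, s_j, q, sigma_p, omega=0.0)  # no uncertainty
psi_1 = weighted_projection(r_i, s_j, q, sigma_p, omega=1.0)  # full uncertainty
```

For ω = 0 the expression reduces to the original projection. In this toy case the distortion at the uncertain position lowers the projection for ω = 1.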

Furthermore, in a District Metering Area (DMA) flow and pressure at the inlet and outlet of the system are assumed to be known. For this reason fixed pressure sensor positions are incorporated into the sensor placement algorithm. This is achieved by setting \(q_{j} = 1\) if j is the inlet or the outlet of the system.

2.5 Solving the Optimal Sensor Placement Problem with a Genetic Algorithm

GAs are widely used to obtain optimal solutions to a wide range of problems in water-related fields (Nicklow et al. 2010). Meta-heuristics like GA are often the only practical option for finding near-optimal solutions in reasonable time for problems with enormous search spaces.

For example, in the OSP problem, calculating \(\overline {\epsilon }(\mathbf {q})\) for every possible combination of sensors results in a very large solution space. If there are M possible measurement positions for pressure sensors in a network, there are \(\mathcal {C}\) possible combinations for placing N sensors

$$ \mathcal{C} = {M \choose N} \qquad . $$

This results in \(\mathcal {C}\) independent calculations of \(\overline {\epsilon }(\mathbf {q})\), which grows exponentially with N.
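The size of the search space can be checked directly with Python's `math.comb` (available from Python 3.8). The sketch uses M = 392, the number of nodes of the case-study network described in Section 2.7; the selected values of N are chosen for illustration.

```python
import math

M = 392  # candidate measurement positions (nodes of the case-study network)

# Number of possible sensor combinations for a few sensor counts N
combinations = {N: math.comb(M, N) for N in (2, 5, 10)}

for N, count in combinations.items():
    print(f"N = {N:2d}: {count:.3e} combinations")
```

Even for this moderately sized network, an exhaustive evaluation of all combinations quickly becomes infeasible as N grows.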

The GA used in this paper is a simple GA as described in Bäck et al. (2000). The genome of the individuals in the GA consists of N integers, where N is the number of sensors. The population size is set to 100 and the evolution runs for 100 generations. Single crossover is used with a probability of \(p_{c}\) = 80 %, followed by uniform integer mutation with a probability of \(p_{m}\) = 20 %. Tournament selection with a tournament size of k = 3 is chosen.
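The following pure-Python sketch shows one way such a GA could look. It is not the OOPNET implementation used in this paper: interpreting the crossover as single-point crossover, applying the mutation to one randomly chosen gene per individual, and the toy fitness function are all assumptions made for illustration.

```python
import random


def tournament(pop, fits, k, rng):
    """Return the fittest (lowest error) of k randomly drawn individuals."""
    picks = rng.sample(range(len(pop)), k)
    return pop[min(picks, key=lambda i: fits[i])]


def simple_ga(fitness, n_positions, n_sensors, pop_size=100, generations=100,
              p_c=0.8, p_m=0.2, k=3, seed=0):
    """Minimize fitness over genomes of n_sensors integer node indices."""
    rng = random.Random(seed)
    pop = [[rng.randrange(n_positions) for _ in range(n_sensors)]
           for _ in range(pop_size)]
    best, best_fit = None, float("inf")
    for _ in range(generations):
        fits = [fitness(ind) for ind in pop]
        for ind, f in zip(pop, fits):       # remember the best genome seen
            if f < best_fit:
                best, best_fit = list(ind), f
        children = []
        while len(children) < pop_size:
            a = list(tournament(pop, fits, k, rng))
            b = list(tournament(pop, fits, k, rng))
            if n_sensors > 1 and rng.random() < p_c:   # single-point crossover
                cut = rng.randrange(1, n_sensors)
                a[cut:], b[cut:] = b[cut:], a[cut:]
            for child in (a, b):
                if rng.random() < p_m:                 # uniform integer mutation
                    child[rng.randrange(n_sensors)] = rng.randrange(n_positions)
                children.append(child)
        pop = children[:pop_size]
    return best, best_fit


# Hypothetical toy fitness: fraction of three "ideal" positions not covered
target = {3, 17, 42}

def toy_fitness(genome):
    return len(target - set(genome)) / len(target)


best, best_fit = simple_ga(toy_fitness, n_positions=50, n_sensors=3,
                           pop_size=40, generations=40)
```

In the OSP problem, `toy_fitness` would be replaced by the mean error index of the sensor configuration encoded in the genome.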

2.6 Finding and Fitting an Ideal Cost-Benefit Function

Running the GA described in Section 2.5 on the OSP problem defined in Section 2.4 for a fixed ω and different numbers of sensors N results in different \(\overline {\epsilon }(\mathbf {q})\) values. These values serve as a sensor placement quality parameter, since they give the fraction of leak scenarios that have not been located correctly. The percentage \(L_{\%}\) of correctly found leak scenarios is

$$ L_{\%} = 100 \cdot \left( 1 - \overline{\epsilon}(\mathbf{q}) \right) \qquad . $$

As N increases, \(\overline {\epsilon }(\mathbf {q})\) has to decrease since more sensors lead to better leak localization. The behavior of \(\overline {\epsilon }(\mathbf {q})\) as a function of N can be interpreted as a cost-benefit analysis answering the following questions:

  • How many sensors are needed to identify a specific number of leak scenarios correctly?

  • How much does the OSP improve leak localization if additional sensors are placed?

For simplicity, the cost is taken here to be the number of sensors, whereas the benefit is the quality of the leak localization in terms of leak scenarios found.

However, carrying out this analysis is computationally expensive and yields only single points as a function of N. Finding a cost-benefit function which describes the behavior of \(\overline {\epsilon }(\mathbf {q})\) as a function of N provides a computational shortcut, since the GA does not have to be run for every N, and might also lead to better insights into the sensor placement problem itself. Two questions arise: first, how can a function be fitted optimally to the \(\overline {\epsilon }(\mathbf {q})\) values? Second, how can one decide which function best describes the data?

To answer the first question, the Levenberg-Marquardt algorithm is applied.

To answer the second question, GoF statistics are generally used. In this paper four different GoF measures are applied to find the best cost-benefit function. The first one is the \(\chi ^{2}\) statistic

$$ \chi^{2} = \sum\limits_{i=1}^{n} \left( y_{i} - f(x_{i}) \right)^{2} \quad , $$

where \(y_{i}\) is the data point at position \(x_{i}\), \(f(x_{i})\) is the value calculated from the fit-function and n is the total number of data points. \(\chi ^{2}\) alone does not guard against over-fitting. Therefore, the degrees of freedom ν, and thus the number of function parameters (n − ν), must be taken into account. This is done by using the reduced \(\chi ^{2}\) statistic (\(\chi _{\nu }^{2}\))

$$ \chi_{\nu}^{2} = \frac{\chi^{2}}{\nu} \quad . $$

The third GoF measure is Akaike’s information criterion (aic), which penalizes the number of parameters of the fitting function more than \(\chi _{\nu }^{2}\)

$$ aic = n \ln \left( \chi^{2} / n \right) + 2 \left( n - \nu \right) \quad . $$

The fourth measure, the Bayesian information criterion (bic), penalizes the number of parameters even more than aic

$$ bic = n \ln\left( \chi^{2} / n\right) + \left( n - \nu \right)\ln \left( n\right) \quad . $$

For each of the four criteria, the smaller the value compared to other fitting functions, the better the chosen model describes the data.
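The four criteria can be computed with a few lines of Python. The sketch follows the definitions above, with the number of function parameters equal to n − ν; the function name `gof` and the data values are hypothetical.

```python
import math


def gof(y, y_fit, n_params):
    """chi2, reduced chi2, aic and bic for data y and fitted values y_fit."""
    n = len(y)
    nu = n - n_params  # degrees of freedom
    chi2 = sum((yi - fi) ** 2 for yi, fi in zip(y, y_fit))
    return {
        "chi2": chi2,
        "chi2_red": chi2 / nu,
        "aic": n * math.log(chi2 / n) + 2 * n_params,
        "bic": n * math.log(chi2 / n) + n_params * math.log(n),
    }


# Hypothetical data points and fitted values from a two-parameter function
y = [1.0, 2.0, 3.0, 4.0]
y_fit = [1.1, 1.9, 3.2, 3.9]

stats = gof(y, y_fit, n_params=2)
```

Note that aic and bic require \(\chi ^{2} > 0\), and that the reduced statistic requires more data points than parameters.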

2.7 Case Study

The hydraulic model of the real-world case study WDS consists of one tank at the inflow point, 392 nodes and 452 pipes with a total length of approximately 37 km. The pipe diameters vary from 70 to 400 mm. The pressure at minimum night flow varies between 29.31 and 43.46 m. Flow and pressure are continuously measured at the inflow and outflow point. The hydraulic model was calibrated in terms of pipe roughness several years ago. The nodal base demands were allocated from the billing information of the individual customers. The area is mainly a residential zone with a few small industries. For the purpose of leak localization, and hence for the application of the introduced sensor placement approach, the nodal minimum night consumption is of interest. This was derived for every node from the assigned demand patterns, which were also available from the hydraulic model. These patterns were derived during the model calibration process and hence are based on real-world measurements. The uncertainties resulting from these very general demand allocations are taken into account with the method described in Section 2.3.

3 Results

First, the effect of demand uncertainties on possible pressure measurement points is investigated according to Section 2.3. MCS with 10000 mcsteps are performed with random demands drawn from a normal distribution, using the nodal demand at minimum night flow as mean \(\mu _{q}\) and a standard deviation \(\sigma _{q}\) of 10 %. The resulting \(\sigma _{p_{i}}\) of the nodal pressures are depicted in Fig. 2a.

The sensitivity matrix S is calculated as described in Section 2.2. Leakage scenarios were generated with an emitter coefficient of \(c_{e} = 0.5\) in EPANET for every node in the system. Only one leak size (\(c_{e} = 0.5\)) was chosen to build the sensitivity matrices, since Blesa et al. (2014) found that the sensor positions are not sensitive to leak sizes. The overall sensitivity \(\overline {s_{j}}\) at every possible measurement position in the system is calculated by Eq. 4 and plotted in Fig. 2b.

In the upper region of the case study area there are points that are sensitive to demand uncertainties (Fig. 2a). Therefore, these points should be penalized by an OSP algorithm. Solving the OSP problem also requires the residuals R. In this paper R is calculated with the same emitter coefficient \(c_{e} = 0.5\) as S.

\(\sigma _{p_{i}}\), S and R are then used for solving the enhanced OSP problem described in Section 2.4. The OSP problem is solved for N ranging from 2 to 10 and for different ω (with ω = 0.0, 0.125, 0.25, 0.5, 1.0).

This range of N was chosen because for N = 1 the ideal sensor position is trivially at the point with the highest overall sensitivity, and N > 10 was viewed as uneconomical for the size of the investigated WDS.

ω was chosen between ω = 0.0, which corresponds to the case with no uncertainties, and ω = 1.0, which takes the uncertainties into account with full strength. Simulations with ω = 1.0 have shown that the sensors cluster too much in regions with low demand uncertainties and are no longer spread over the whole system (e.g. see Fig. 3h). Therefore, ω was repeatedly halved \((\omega =1.0\xrightarrow {1/2}0.5\xrightarrow {1/2}0.25\xrightarrow {1/2}0.125)\) until sensors appeared again in the upper part of the system with high demand uncertainties.

Fig. 3 Sensor positions (green triangles) resulting from the enhanced sensor placement algorithm described in Section 2.4 for different numbers of sensors N and different weighting factors ω. Already installed sensors at the inflow and outflow point of the DMA are depicted as purple triangles

For each combination of N and ω the GA described in Section 2.5 is applied 10 times solving the enhanced OSP. Due to the stochastic behavior of the GA, the resulting sensor placement quality parameters \(\overline {\epsilon }(\mathbf {q})\) may vary for different optimization runs with same N and ω parameters. Table 1 thus illustrates the statistics of \(\overline {\epsilon }(\mathbf {q})\) found during 10 optimization runs and contains the mean, the minimum and the standard deviation of \(\overline {\epsilon }(\mathbf {q})\) of all runs.

Table 1 Results for enhanced sensor placement algorithm for different numbers of sensors N and different ω parameters

In Fig. 3 the optimal sensor positions for selected values of N and ω are presented. The first four plots (Fig. 3a to d) show the OSP results for N = 2 to N = 5 with ω = 0.0, whereas the last four plots (Fig. 3e to h) show results for five sensors with uncertainties incorporated for varying ω (between 0.125 and 1.0).

When uncertainties are not taken into account (ω = 0.0), the algorithm tends to place sensors in regions with high demand uncertainties (upper part in Fig. 2). This can be seen in Fig. 3a to d. The higher ω, the more the sensors move to regions with low demand uncertainties (see Fig. 3d to h). Therefore, the extension of the OSP algorithm by incorporating uncertainties shows the desired behavior of penalizing measurement points with high uncertainties. Nevertheless, this reduces the spread of the sensors over the whole system, since sensors tend to cluster in areas with low uncertainties.

Additionally, in Fig. 3a to d it can be seen that the optimal set for N − 1 sensor positions \({\mathcal {P}(N-1)}\) is not a subset of the optimal placement \(\mathcal {P}(N)\) of N sensors (\(\mathcal {P}(N-1) \nsubseteq \mathcal {P}(N)\)). The algorithm leads to different sensor positions for different N. The same results are also found by Kapelan et al. (2005) for sensor placement for model calibration.

Finding the mathematical law behind the cost-benefit function describing the OSP is a main objective of this paper, since it provides water utilities with a methodology to answer how many sensors should be placed in a WDS for leak localization. To predict the dependence of \(\overline {\epsilon }(\mathbf {q})\) on N for different ω, the resulting mean values were fitted with different functions and subsequently tested with GoF statistics to determine which mathematical law the cost-benefit functions follow.

Different functions were considered to describe the decreasing behavior of the GA results visible in Fig. 4.

Fig. 4 Cost-benefit functions of OSP for different ω

Table 2 shows the results of the fits. The first column lists the different functions which were fitted to the \(\overline {\epsilon }(\mathbf {q})\) values as a function of N. \(N_{p}\) is the number of parameters and ν the number of degrees of freedom. \(\chi ^{2}\) is the value resulting from the \(\chi ^{2}\) statistic calculated by Eq. 12, \({\chi _{r}^{2}}\) is the value of the reduced \(\chi ^{2}\) statistic (Eq. 13), the value for Akaike’s information criterion aic is calculated using Eq. 14 and the value for the Bayesian information criterion bic using Eq. 15. All values are the means over the simulations with different ω. For example, \(\chi ^{2}\) in Table 2 is calculated through

$$ \chi^{2} = \frac{1}{5} \left( \chi_{\omega=0.0}^{2} + \chi_{\omega=0.125}^{2} + \chi_{\omega=0.25}^{2} + \chi_{\omega=0.5}^{2} + \chi_{\omega=1.0}^{2}\right) \quad . $$

According to Table 2, function \(f_{6}(N)\), an extended power law equation, leads to the best \(\chi ^{2}\) and \({\chi _{r}^{2}}\), whereas the simple power law equation \(f_{7}(N)\) leads to the best aic and bic values. Table 3 shows the best-fit values of the parameters of functions \(f_{6}(N)\) and \(f_{7}(N)\) together with the estimated standard error for these values. The best-fit values for equation \(f_{6}(N)\) show high standard errors, especially for parameters a and b, whereas the best-fit values of \(f_{7}(N)\) show much lower error bounds. Therefore \(f_{7}(N)\) is chosen as the cost-benefit function for the values in Fig. 4.

Table 2 Goodness-of-Fit table for different fit-functions \(f_{i}(N)\) containing the number of parameters \(N_{p}\) for every function, the degrees of freedom ν, the resulting values for \(\chi ^{2}\) and reduced \({\chi _{r}^{2}}\) statistics, Akaike’s (aic) and Bayes information criterion (bic)
Table 3 Resulting fit parameters and error limits for fit-functions \(f_{6}(N)\) and \(f_{7}(N)\) from Table 2

The power-law behavior of \(f_{7}(N)\) has practical consequences for water utilities. This is shown through an example for ω = 0.0. Placing 3 sensors instead of 2 improves the quality of the sensor placement by

$$ \epsilon_{N=2}^{\omega=0} (\mathbf{q} ) - \epsilon_{N=3}^{\omega=0} (\mathbf{q} ) = 0.327-0.265 \approx 0.06 \quad . $$

Values are taken from Table 1. To obtain the same improvement again starting from 3 sensors, the water utility has to add twice as many sensors as before, going from 3 to 5 sensors

$$ \epsilon_{N=3}^{\omega=0} (\mathbf{q} ) - \epsilon_{N=5}^{\omega=0} (\mathbf{q} ) = 0.265-0.203 \approx 0.06 \quad . $$

To obtain the same improvement starting from 5 sensors, 4 additional sensors, again twice as many as before, have to be placed

$$ \epsilon_{N=5}^{\omega=0} (\mathbf{q} ) - \epsilon_{N=9}^{\omega=0} (\mathbf{q} ) = 0.203-0.144 \approx 0.06 \quad . $$

Therefore, each further improvement of the quality by the same amount requires doubling the number of additional sensors.
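The arithmetic of this example can be reproduced with a short sketch that uses only the four mean values quoted above for ω = 0.0. The final step fits a hypothetical form \(a \cdot N^{b}\) by least squares in log-log space; this is a simplification chosen for illustration and is neither the Levenberg-Marquardt fit nor necessarily the functional form of \(f_{7}(N)\) used in this paper.

```python
import math

# Mean error indices for omega = 0.0 as quoted in the text (from Table 1)
eps = {2: 0.327, 3: 0.265, 5: 0.203, 9: 0.144}

# Improvement and number of added sensors for each step of the example
steps = [(2, 3), (3, 5), (5, 9)]
gains = [round(eps[a] - eps[b], 3) for a, b in steps]
added = [b - a for a, b in steps]

# Least-squares line through (ln N, ln eps): ln eps = ln a + b * ln N
# (hypothetical power-law form, log-log fit chosen for simplicity)
xs = [math.log(n) for n in eps]
ys = [math.log(e) for e in eps.values()]
x_mean = sum(xs) / len(xs)
y_mean = sum(ys) / len(ys)
b = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
     / sum((x - x_mean) ** 2 for x in xs))
a = math.exp(y_mean - b * x_mean)
```

The three gains are nearly identical while the number of added sensors doubles from step to step, and the fitted exponent `b` is negative, reflecting the decreasing error index.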

Furthermore, the power-law behavior also holds when uncertainties are incorporated, hence the preceding example applies in general. The only difference to simulations without uncertainties (ω = 0.0) is that the quality parameter of a sensor placement with a specific N becomes worse if ω is increased. This makes sense, since leaks are harder to find in real-world networks with high uncertainties than in theoretical, perfectly known systems. For example, to achieve the same quality parameter as in simulations with no uncertainties (ω = 0.0) for N = 2 sensors, N = 4 sensors are needed for ω = 0.5 and N = 6 are needed if the uncertainty is fully taken into account (ω = 1.0). This can also be seen in Fig. 4.

4 Discussion

Incorporating uncertainties in the OSP problem for leak localization is of great importance, since points which are sensitive to demand variations can also be points which are sensitive to leaks (upper part of Fig. 2). Hence, leaks can be hidden by demand fluctuations for sensors placed at these points. Consequently, it may be harder to locate leaks with sensors at these points.

Nevertheless, nodes from this region are always chosen by the sensor placement algorithm if the uncertainty effects are not taken into account (see Fig. 3a to d for example). The enhanced algorithm in Section 2.4 offers a way out of this dilemma. Increasing the ω value leads to placements avoiding these regions of high uncertainty.

Additionally, it is apparent from Fig. 3d to h that the higher ω is, the more sensors are placed in regions with low uncertainties. Unfortunately, this leads to a clustering of sensors. However, a good sensor placement should also lead to positions that are spread over the whole WDS, since sensors are more sensitive to changes in their close proximity. Therefore, the OSP was solved not only for ω = 1.0, taking the full uncertainty into account, but also for smaller ω values. Nevertheless, this paper does not provide a methodology for choosing this value.

Another important finding is that the optimal set for N−1 sensor positions \({\mathcal {P}(N-1)}\) is not a subset of the optimal placement \(\mathcal {P}(N)\) of N sensors. Therefore, OSP algorithms that solve the problem by placing one sensor and find the next sensor position through incorporation of the previous one (greedy sensor placement algorithms) may not find global optimal solutions of the problem.

It has to be mentioned that for a high number of sensors the results in Table 1 must be treated with caution, since the search space becomes very large. Placing 10 sensors on 392 measurement locations (network nodes of the case study) results in \(\approx 2 \cdot 10^{19}\) possible combinations. Since the GA evaluates the fitness function approximately 8000 times in one optimization run, only a very small part of the total solution space is explored.

Another interesting finding is that fitting the functions from Table 2 to the simulated sensor placement cost-benefit values leads to two equations which both show power law behavior (\(f_{6}(N)\) and \(f_{7}(N)\)). \(f_{6}(N)\) has lower \(\chi ^{2}\) and \({\chi _{r}^{2}}\) values, while \(f_{7}(N)\) results in better Akaike and Bayes information criteria values. A closer look at the confidence intervals of the fit coefficients favors equation \(f_{7}(N)\). This curve is therefore chosen as the cost-benefit curve of the OSP for all ω parameters (see Fig. 4). It is surprising that higher ω parameters do not change the form of the cost-benefit curve but merely lead to higher values of the cost-benefit function. The form of the function is thus robust against uncertainties.

The limitations of this study are that it can only be applied to WDS where a hydraulic model already exists since it is necessary to compute the sensitivity matrix. Additionally, the OSP algorithm is only as good as the model. For example, an unknown closed valve can have strong influence on the hydraulics and therefore the OSP result might not be optimal in reality.

Furthermore, for bigger systems the computation time increases both for calculating the MOU and for solving the OSP problem, making the method not directly applicable to big real-world systems with thousands of nodes. Additionally, the OSP algorithm can only take single leaks into account, not multiple leaks.

5 Conclusion and Outlook

This paper set out to determine the effect of MOUs (e.g. the effect of uncertain demands on pressures) on OSP for leak localization in a real-world network. The research has shown that pressure measurement points, which are sensitive to leaks may also be points which are sensitive to uncertain demand and hence less ideal to place sensors on. Incorporating MOU in the OSP algorithm leads to placements avoiding regions of high uncertainty.

The simulations confirmed that the optimal sensor positions for N − 1 sensors are not a subset of the optimal locations of N sensors. As a consequence, greedy sensor placement algorithms may fail to find optimal measurement locations. Therefore, using a GA to solve the OSP problem in this paper was an appropriate choice.

Furthermore, this paper showed that the effect of the number of sensors on the sensor placement quality has a power law behavior. This even stays true under incorporation of uncertainties.

Although the work in this paper focuses on the effect of demand uncertainties on the placement of pressure sensors, it should be mentioned that this method is more general in scope. The method is also applicable to other sensors (e.g. flow or quality sensors) and can handle different sources of uncertainty as well (e.g. roughness, elevation, bulk reactions).

A limitation of this study is the high computational complexity, since the search space of the OSP problem can become very large even for small WDS. Clustering algorithms as in Sarrate et al. (2014) may offer a way forward in future work to reduce the complexity of the problem.

Additionally, further investigations using different networks are needed to confirm that the power law behavior of optimal sensor positions holds in general. Finally, which value should be chosen for the weighting factor ω in the sensor placement algorithm is still an open question. This will be investigated in future work.