Bayesian optimization of pump operations in water distribution systems
Abstract
Bayesian optimization has become a widely used tool in the optimization and machine learning communities. It is suitable for simulation–optimization problems and/or problems whose objective function is computationally expensive to evaluate. Bayesian optimization is based on a surrogate probabilistic model of the objective, whose mean and variance are sequentially updated using the observations, and on an “acquisition” function based on the model, which sets the next observation at the most “promising” point. The most widely used surrogate model is the Gaussian Process, which is the basis of the well-known Kriging algorithms. In this paper, the authors consider the pump scheduling optimization problem in a Water Distribution Network with both ON/OFF and variable speed pumps. In a global optimization model, accounting for the time patterns of demand and energy price allows significant cost savings. Nonlinearities, and binary decisions in the case of ON/OFF pumps, make pump scheduling optimization computationally challenging, even for small Water Distribution Networks. The well-known EPANET simulator is used to compute the energy cost associated with a pump schedule and to verify that hydraulic constraints are not violated and demand is met. Two Bayesian Optimization approaches are proposed in this paper, in which the surrogate model is based on a Gaussian Process and a Random Forest, respectively. Both approaches are tested with different acquisition functions on a set of test functions, a benchmark Water Distribution Network from the literature and a large-scale real-life Water Distribution Network in Milan, Italy.
Keywords
Global optimization, Bayesian optimization, Pump scheduling optimization, Simulation optimization
1 Introduction
Optimization of Water Distribution Network (WDN) operations has been a very important field for the Operations Research (OR) community for at least the last 40 years [1], and many tools from mathematical programming as well as metaheuristics have been proposed. An updated review on optimal water distribution is given in [2], where several classes of existing solutions, including linear programming, nonlinear programming, dynamic programming, metamodeling, heuristics and metaheuristics, are analyzed in depth and referenced. Different issues are considered, such as sensor placement and leakage detection and localization [3] and, more specifically, the optimization of pumping operations.
In this paper the authors are concerned with pump scheduling optimization (PSO): deciding which pumps are to be operated, and with which settings, at different periods of the day, so that the energy cost, the largest operational cost for water utilities, is minimized. Nonlinearities, and binary decisions in the case of ON/OFF pumps, make PSO computationally challenging, even for small WDNs [4]. While mathematical programming approaches linearize/convexify the equations regulating the flow distribution in the WDN, more recent optimization strategies use hydraulic simulation software which solves all the equations and provides the values of the objective function (e.g. energy costs) and of hydraulic feasibility indicators (e.g. satisfaction of the demand, pressures within a given range, tank levels within a min–max range, etc.).
The decision variables can be formulated in two ways: the ON/OFF pump state during fixed time intervals, or the start/end run times of the pumps [5]. Although the latter reduces the number of decision variables, the former is the most widely used and is the one considered in this paper.
If there are storage tanks in the network, costs can be reduced by concentrating most of the pumping in the time windows when electricity is least expensive and by using the stored water during the remaining part of the day to meet consumer demand. To this end, an accurate demand forecasting system, such as the one proposed in [6, 7, 8], can be used to set the demand, which basically drives every simulation run, in order to optimize the pump schedule in advance. The cost is generally associated with the energy used for pumping, while system performance requires satisfying operational constraints such as: ensuring that pumps can satisfy user demand, keeping pressures within certain bounds to reduce leakage and the risk of pipe bursts, and keeping reservoir levels within bounds to avoid overflow.
As the size of the tested networks increases, from simplified hypothetical examples to large real networks, the number of variables and objectives grows, leading to an increase in the number of function evaluations, in computational resources (simulation time increases with the number of hydraulic components of the WDN) and in memory requirements. Moreover, the headloss equations used to model water flow in pipes lead to nonlinear optimization problems [9].
Linear approaches have been considered extensively in the past, owing to their ability to obtain fast solutions; however, they require a linear objective function and linear constraints, and their performance depends on finding a suitable linearization of the problem at hand. To linearize the problem, Pasha and Lansey [10] rely on the relationship between energy, pump flow, user demand and tank water levels, while in [11] the convex headloss equation of water supply systems was iteratively linearized and incorporated into linear programming optimization models. Since the number of decision variables and constraints increases considerably with the number of time-control intervals, computational and memory requirements increase as well. Nonlinear programming has also been used; however, since the problem is nonconvex, there is no guarantee that the global optimum will be found [12].
Since the 1990s, metaheuristics such as Genetic Algorithms and Simulated Annealing, among others, have been increasingly applied, given their potential to solve nonlinear, nonconvex and even black-box problems [13] where traditional mathematical programming methods would face difficulties.
In [14] a hybrid approach was applied in which a genetic algorithm was coupled with two hill-climber search algorithms to improve local search; even though this approach was more efficient than classical genetic algorithms, it still could not be applied in near-real-time conditions. Similarly, [12] applied hydraulic network linearization within a two-stage simulated annealing approach and, while the approach finds a near-optimal global solution, it is still limited to offline optimization problems. Overall, metaheuristic optimization methods offer versatility and a problem-independent global approach that can be applied to complex problems, but over large search spaces the coupling with the network simulator results in lengthy computations when searching for the optimum. In general, these methods are applied to pump scheduling less often, since solutions need to be produced both rapidly and reliably. For these reasons, more efficient deterministic methods have been applied, but the simulation runs may still restrict their application to mid-size networks. Thus, all the optimization approaches need to reduce the computational load of the simulator (usually the open source EPANET 2.0 [15]). One way to do this is “internal”, working on the mathematical model of the hydraulic components; the other is “external” and takes the form of fitting a surrogate function, or metamodel, which approximates the simulation environment and can be used to drive global optimization strategies.
In this paper we propose Bayesian Optimization (BO) for the solution of PSO, as it is suitable for simulation–optimization problems and for cases in which the objective function is a black box and/or computationally expensive to evaluate.
The diagonal approach [16] offers global convergence properties and guaranteed accuracy estimation of the global solutions, e.g., in the case of Lipschitz global optimization [17, 18]. Significant applications of these methods to hyperparameter optimization are given, e.g., in [19] for SVM regression and in [20] for signal processing. Random Search [21, 22] and related metaheuristics, like simulated annealing, evolutionary/genetic algorithms and multistart and clustering, have recently received fresh attention from the machine learning community and are increasingly considered as baselines for global optimization, as in Hyperband [23]. A further approach relies on probabilistic branch and bound for level set estimation [24]. Sequential Model-Based Optimization (SMBO), and in particular Bayesian Optimization, has come to dominate the landscape of hyperparameter optimization in the machine learning community [25, 26, 27]. BO’s key ingredients are a probabilistic model, which captures our beliefs given the evaluations already performed, and an acquisition function, which computes the “utility” of each candidate point for the next evaluation of f. The Bayesian framework is particularly useful when evaluations of f are costly, no derivatives are available and/or f is nonconvex/multimodal.
Two alternative BO approaches are proposed in this paper: one using Gaussian Processes (GP) to fit the surrogate model and another using Random Forest (RF), an ensemble of Decision Trees. Although GP is the most widely adopted approach, RF has recently turned out to be more computationally efficient [28], in particular when some decision variables are discrete and/or conditional. A similar framework has been proposed to address Kernel-based Clustering in a leakage localization application [29]. Both GP and RF are presented in Sect. 3.1.
The remainder of the paper is organized as follows. Section 2 presents the pump scheduling optimization (PSO) problem in Water Distribution Networks (WDN), addressing both ON/OFF pumps and Variable Speed Pumps (VSP) and considering, in the optimization model, the time patterns of demand and energy price. The well-known EPANET 2.0 simulator is used to check, for every pump schedule, that hydraulic constraints are not violated and demand is met, and to compute the associated energy cost. Section 3 introduces some background on BO, focusing on the surrogate models and acquisition functions considered in the implementation of the proposed BO framework. Section 4 presents the computational results obtained on both test functions and the PSO problem, comparing different alternatives in the set-up of the BO framework. Results on WDNs refer to two case studies: a benchmark WDN widely adopted in the literature (namely, Anytown) and a real-life large-scale WDN in Milan, Italy. The proposed BO approaches are compared to two commercial products: an “optimization-as-a-service” cloud-based platform (SigOpt^{1}) and the combination of ALAMO^{2} with BARON^{3}, used to generate an accurate surrogate model of the analyzed system and then optimize it. Finally, Sect. 5 summarizes relevant findings.
2 Pump scheduling optimization
The goal of PSO is to identify the pump schedule, consisting of the status of each pump over time, associated with the lowest energy cost while satisfying hydraulic feasibility (i.e., water demand satisfaction, pressure levels within a given operational range, etc.). The status of a pump is its activation (for an ON/OFF pump) or its speed (for a VSP), leading to discrete or continuous decision variables, respectively. Thus, a general formulation of the PSO involving both types of decision variables (i.e. a WDN with both ON/OFF pumps and VSPs) is reported as follows:
The time horizon usually adopted in PSO is 1 day with hourly time steps, leading to T = 24 time steps overall. This choice is mainly motivated by energy tariffs, which typically set different energy prices depending on the hour of the day.
Although the PSO objective function (1) has an apparently closed-form expression, solving it exactly through mathematical programming requires approximating (by linearization or convexification) the flow equations which regulate the hydraulic behaviour of a pressurized WDN. More recent approaches treat the objective function as a “black box”, computing the value associated with a given schedule through hydraulic simulation software which solves all the flow equations. Naturally, hydraulic simulation is performed at the selected time resolution (usually hourly, over 24 h), which is adopted both for the input data (i.e. demand at consumption points) and for the outputs obtained at the end of a simulation run (i.e. pressure at nodes, flow/velocity in the pipes).
Depending on the size and complexity of the WDN, the time required for a complete simulation can vary significantly. Thus, for very large and complex WDNs, evaluating a schedule can become expensive in terms of time.
Although not expressed in the formulation above, there exists a set of hydraulic constraints which are checked by the hydraulic simulation software. Even in papers reporting the analytical formulation of the constraints, such as [30], they are usually verified by running a simulation. The software usually issues a violation message when a given pump schedule does not satisfy at least one of the hydraulic constraints. Some examples are: a negative pressure at some point of the WDN, the impossibility of supplying the demand at some consumption point, an imbalance of the overall hydraulic system, etc. The hydraulic simulation software used in this study is EPANET 2.0, and a detailed list of the errors/warnings provided by EPANET with respect to the hydraulic infeasibility of a pump schedule is reported in [15].
Again, \( P_{i}^{t} \) and \( L_{l}^{t} \) are computed through hydraulic software simulation.
PSO is computationally hard to solve through exhaustive search, due to the large number of possible pump schedules even for a very small WDN. A simple WDN consisting of just 1 ON/OFF pump, with an hourly resolution and a 24-hour horizon, leads to an optimization problem with 24 binary decision variables and, consequently, 2^{24} (i.e. more than 16 million) possible pump schedules.
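As a quick check of this arithmetic, the Python sketch below counts the ON/OFF schedules for a given number of pumps and hourly steps; the function name `num_onoff_schedules` is illustrative only.

```python
def num_onoff_schedules(n_pumps: int, n_steps: int = 24) -> int:
    """Number of distinct ON/OFF schedules: each of the n_pumps * n_steps
    binary decision variables can independently be 0 (OFF) or 1 (ON)."""
    return 2 ** (n_pumps * n_steps)


one_pump = num_onoff_schedules(1)    # a single ON/OFF pump, hourly steps, 24 h
four_pumps = num_onoff_schedules(4)  # four ON/OFF pumps, as in the Anytown WDN

print(f"1 pump : {one_pump:,} schedules")
print(f"4 pumps: {four_pumps:.2e} schedules")
```

For one pump this gives 16,777,216 schedules; for four ON/OFF pumps it gives 2^{96}, about 7.92 × 10^{28}, which rules out exhaustive search.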
An efficient search method to identify promising pump schedules is therefore crucial to implement a practical decision support for PSO, with the aim of identifying an optimal schedule within a limited number of objective function evaluations. This is the motivation for the BO framework for PSO proposed in this paper.
3 Bayesian optimization
BO relies on two key components.
The first is a probabilistic model of the objective function (also known as the “surrogate function”), whose mean and variance are sequentially updated according to the observations of the objective function. Two such models are considered in Sect. 3.1: a Gaussian Process (GP) and a Random Forest (RF).
The second key component is the acquisition function based on the surrogate model, whose optimization drives the querying process and sets the next observation at the most “promising” point. Several acquisition functions are considered in Sect. 3.2.
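How the two components interact can be illustrated with a minimal sequential loop. The Python sketch below is not the mlrMBO implementation used in this paper; it assumes minimization over [0, 1]^d, a uniformly random initial design, a crude RF-like surrogate (a bootstrap ensemble of 1-nearest-neighbour regressors, whose spread across members stands in for the predictive uncertainty), LCB as acquisition function optimized by plain random search, and a maximum number of evaluations as termination criterion. All function names are hypothetical.

```python
import math
import random
import statistics


def fit_bagged_nn(X, y, n_members=10, rng=None):
    """Crude RF-like surrogate: each member is a 1-nearest-neighbour regressor
    fitted on a bootstrap resample of the observations (X, y)."""
    rng = rng or random.Random(0)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        members.append([(X[i], y[i]) for i in idx])
    return members


def predict(members, x):
    """Mean and standard deviation of the member predictions at x."""
    preds = [min(sample, key=lambda p: math.dist(p[0], x))[1] for sample in members]
    return statistics.fmean(preds), statistics.pstdev(preds)


def smbo(f, dim, n_init=10, max_evals=40, kappa=2.0, n_candidates=100, seed=0):
    """Minimize f over [0, 1]^dim with a sequential model-based loop."""
    rng = random.Random(seed)
    # 1) initial design (uniform random here; LHS is a common alternative)
    X = [[rng.random() for _ in range(dim)] for _ in range(n_init)]
    y = [f(x) for x in X]
    # 2) sequential phase, stopped when the evaluation budget is exhausted
    while len(y) < max_evals:
        model = fit_bagged_nn(X, y, rng=rng)

        def lcb(x):
            mu, sd = predict(model, x)
            return mu - kappa * sd  # optimistic estimate for minimization

        # 3) optimize the acquisition function by random search over candidates
        candidates = [[rng.random() for _ in range(dim)] for _ in range(n_candidates)]
        x_next = min(candidates, key=lcb)
        # 4) evaluate the expensive objective and add the observation
        X.append(x_next)
        y.append(f(x_next))
    best = min(range(len(y)), key=y.__getitem__)
    return X[best], y[best]


if __name__ == "__main__":
    sphere = lambda x: sum((xi - 0.3) ** 2 for xi in x)
    print(smbo(sphere, dim=2))
```

In a real PSO setting, `f` would wrap an EPANET run and the surrogate would be a proper GP or RF; the loop structure (initial design, refit, optimize the acquisition, evaluate, repeat until the budget is spent) is the part this sketch is meant to show.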
BO has become one of the most promising approaches even when no information about derivatives is available and \( f\left( x \right) \) is nonconvex or multimodal. However, when gradients of \( f\left( x \right) \) can be inferred from the surrogate, they can be incorporated into the BO framework to improve the current solution through a local search [31].
With respect to the PSO problem, \( f\left( x \right) \) is the energy cost C reported in (1), while the dimension d of the search space is \( T \cdot \left( {N_{b} + N_{v} } \right) \), where \( T \) is the number of time steps and \( N_{b} \) and \( N_{v} \) are the numbers of ON/OFF and variable speed pumps, respectively. Each component of \( x \) refers to a specific time step and a specific pump, so \( x_{i} \in \left\{ {0,1} \right\} \) for an ON/OFF pump and \( x_{i} \in \left[ {0,1} \right] \) for a VSP.
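To make the indexing of x concrete, the Python sketch below reshapes the flat decision vector into a T × (N_b + N_v) schedule and checks the domain of each component. The time-major ordering, with ON/OFF pumps listed before VSPs, is an assumption made for illustration; the text does not fix a particular ordering, and the function name is hypothetical.

```python
from typing import List, Sequence


def decode_schedule(x: Sequence[float], n_steps: int, n_onoff: int,
                    n_vsp: int) -> List[List[float]]:
    """Reshape the flat decision vector x, of length T * (N_b + N_v), into a
    T x (N_b + N_v) schedule, validating the domain of every component.

    Assumed (hypothetical) ordering: x[t * (N_b + N_v) + k] is the setting of
    pump k at time step t, with the N_b ON/OFF pumps listed before the N_v VSPs.
    """
    n_pumps = n_onoff + n_vsp
    if len(x) != n_steps * n_pumps:
        raise ValueError("x must have T * (N_b + N_v) components")
    schedule = []
    for t in range(n_steps):
        row = list(x[t * n_pumps:(t + 1) * n_pumps])
        for k, value in enumerate(row):
            if k < n_onoff and value not in (0, 1):
                raise ValueError(f"ON/OFF pump {k} at step {t} must be 0 or 1")
            if k >= n_onoff and not 0.0 <= value <= 1.0:
                raise ValueError(f"VSP {k} at step {t} must have speed in [0, 1]")
        schedule.append(row)
    return schedule


# Example: T = 24 hourly steps, 2 ON/OFF pumps and 1 VSP, so d = 24 * 3 = 72.
x = [1, 0, 0.75] * 24
schedule = decode_schedule(x, n_steps=24, n_onoff=2, n_vsp=1)
print(len(x), len(schedule), schedule[0])
```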
This notation is used throughout the following sections on the probabilistic surrogate models and acquisition functions adopted in this study.
3.1 The probabilistic model
Thus, the variance of the RF estimator decreases as \( \sigma^{2} \) and \( \rho \) decrease (i.e. if the number m of randomly selected features decreases) and as the size \( S \) of the forest increases.
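Assuming this conclusion rests on the standard identity for the average of S identically distributed trees, each with variance \( \sigma^{2} \) and pairwise correlation \( \rho \), namely \( \rho\sigma^{2} + \frac{1-\rho}{S}\sigma^{2} \), the short Python sketch below evaluates it. The function name `rf_variance` is hypothetical.

```python
def rf_variance(sigma2: float, rho: float, n_trees: int) -> float:
    """Variance of the average of n_trees identically distributed tree
    predictions, each with variance sigma2 and pairwise correlation rho:
    rho * sigma2 + (1 - rho) * sigma2 / n_trees."""
    return rho * sigma2 + (1.0 - rho) * sigma2 / n_trees


for S in (1, 10, 100, 1000):
    print(S, rf_variance(sigma2=1.0, rho=0.3, n_trees=S))
```

As S grows, the second term vanishes and the variance tends to \( \rho\sigma^{2} \): beyond a certain forest size, reducing the correlation between trees (e.g., through a smaller m) matters more than adding trees.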
With respect to PSO, it is important to highlight that using GP or RF as surrogate model may significantly affect the effectiveness and efficiency of the overall BO framework. More specifically, an RF-based surrogate model should be, at least ideally, well suited for WDNs with only ON/OFF pumps or with mixed (both ON/OFF and VSP) pumps. Furthermore, an RF-based surrogate scales better with the number of evaluations of the objective function, leading to a lower wall-clock time than a GP-based surrogate for the same number of function evaluations. Based on these considerations, using RF as the probabilistic model of the BO framework should be the most profitable choice.
3.2 Acquisition functions
The acquisition function is the mechanism that implements the trade-off between exploration and exploitation in BO. The basic idea is to improve over the best solution observed so far (the “best seen”), even if, due to exploration, an improvement is not guaranteed from one iteration to the next. In particular, any acquisition function aims to guide the search of the optimum towards points with potentially good values of the objective function (high values when maximizing, low values when minimizing as in PSO), either because the prediction of \( f\left( x \right) \) based on the surrogate model is good or because its uncertainty is high (or both). Exploiting means focusing on the areas with the best chance of improving the current solution (according to the current surrogate model), while exploring means moving towards less explored regions of the search space.
However, one of the drawbacks of the Probability of Improvement (PI) is that it is biased towards exploitation.
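Assuming a Gaussian predictive distribution \( N(\mu(x), \sigma^{2}(x)) \) from the surrogate (for an RF, mean and standard deviation across trees are commonly used this way) and a minimization problem such as PSO, the standard expressions of PI, EI and LCB can be sketched in Python as follows; the margin `xi` and all function names are illustrative assumptions.

```python
import math


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_of_improvement(mu, sigma, f_best, xi=0.0):
    """P(f(x) < f_best - xi) when f(x) ~ N(mu, sigma^2) (minimization)."""
    if sigma <= 0.0:
        return 1.0 if mu < f_best - xi else 0.0
    return norm_cdf((f_best - xi - mu) / sigma)


def expected_improvement(mu, sigma, f_best, xi=0.0):
    """E[max(f_best - xi - f(x), 0)] when f(x) ~ N(mu, sigma^2) (minimization)."""
    improvement = f_best - xi - mu
    if sigma <= 0.0:
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * norm_cdf(z) + sigma * norm_pdf(z)


def lower_confidence_bound(mu, sigma, kappa=2.0):
    """Optimistic estimate of f(x): the next point is the one minimizing it."""
    return mu - kappa * sigma


# Best seen equal to 1.0: a "safe" candidate versus an uncertain one.
for mu, sigma in [(0.99, 0.01), (1.1, 0.5)]:
    print(mu, sigma,
          round(probability_of_improvement(mu, sigma, 1.0), 3),
          round(expected_improvement(mu, sigma, 1.0), 4))
```

With a best seen of 1.0, the candidate predicted at μ = 0.99 with σ = 0.01 has a higher PI than the one predicted at μ = 1.1 with σ = 0.5, whereas EI prefers the second, more uncertain candidate: a small illustration of PI's bias towards exploitation.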
Solving the black-box optimization problem requires optimizing the acquisition function at each iteration, but this is usually cheaper and easier than optimizing the objective function itself. The methods used are mostly deterministic, such as the derivative-free DIRECT [41] or multistart BFGS (Broyden–Fletcher–Goldfarb–Shanno) [42]. Another relevant approach is the so-called “focus search”, proposed in the mlrMBO software to handle numeric, categorical, mixed and hierarchical/conditional decision variable spaces, and therefore well suited to an RF-based surrogate model.
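As a rough illustration of optimizing an acquisition function over a mixed binary/continuous domain, the Python sketch below samples candidates, keeps the incumbent and progressively shrinks the search region around it. It is only loosely inspired by the idea behind focus search and is not mlrMBO's algorithm; the shrinking rule, the bit-flip probability and all names are assumptions.

```python
import random


def focus_like_search(acq, dim, binary, n_points=100, n_rounds=5, shrink=0.5, seed=0):
    """Minimize acq(x) over a mixed domain: x[i] in {0, 1} for i in the set
    `binary`, x[i] in [0, 1] otherwise. Each round samples n_points candidates
    in the current region and keeps the best point seen so far (the incumbent);
    it then shrinks the continuous ranges around the incumbent and lowers the
    probability of flipping the incumbent's binary components."""
    rng = random.Random(seed)
    lower, upper = [0.0] * dim, [1.0] * dim
    flip_prob = 0.5  # with p = 0.5, binary components are sampled uniformly
    best_x, best_val = None, float("inf")
    for _ in range(n_rounds):
        for _ in range(n_points):
            x = []
            for i in range(dim):
                if i in binary:
                    if best_x is None:
                        x.append(rng.randint(0, 1))
                    else:
                        flip = rng.random() < flip_prob
                        x.append(1 - best_x[i] if flip else best_x[i])
                else:
                    x.append(rng.uniform(lower[i], upper[i]))
            val = acq(x)
            if val < best_val:
                best_x, best_val = x, val
        for i in range(dim):
            if i not in binary:
                half_width = shrink * (upper[i] - lower[i]) / 2.0
                lower[i] = max(0.0, best_x[i] - half_width)
                upper[i] = min(1.0, best_x[i] + half_width)
        flip_prob *= shrink
    return best_x, best_val
```

In a PSO setting, `binary` would hold the indices of the ON/OFF pump variables and `acq` would be, for instance, the LCB of the current surrogate.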
3.3 Software environment
To implement the proposed BO framework for the pump scheduling optimization problem, the R package “mlrMBO” has been adopted: a flexible and comprehensive toolbox for model-based optimization (MBO). It has been designed for both single- and multi-objective optimization with mixed continuous, categorical and conditional decision variables, and it is implemented in a modular fashion, so that single components can easily be replaced or adapted by the user for specific use cases. For example, any regression learner from the R package “mlr” for machine learning can be used (more than 60 regression algorithms), and infill criteria and infill optimizers are easily exchangeable.
The mlrMBO toolbox implements the SMBO framework, allowing each step of the overall sequential optimization process to be customized. An initial set of evaluations has to be performed to create the first instance of the surrogate model, which is then updated over the iterations of the SMBO process. The user can manually specify the initial set of points to be evaluated, or generate them completely at random, on a coarse grid, or with space-filling Latin Hypercube Sampling (LHS) [21].
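For reference, a basic Latin Hypercube design can be generated as in the Python sketch below: each dimension is split into n equal-width strata, each stratum receives exactly one point, and strata are paired at random across dimensions. This is a plain LHS without any additional space-filling optimization (e.g., a maximin criterion), and it is not the generator used by mlrMBO; the function name is hypothetical.

```python
import random


def latin_hypercube(n, dim, seed=None):
    """Return n points in [0, 1)^dim such that, in every dimension, each of the
    n equal-width strata [k/n, (k+1)/n) contains exactly one point."""
    rng = random.Random(seed)
    columns = []
    for _ in range(dim):
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across dimensions
        columns.append([(k + rng.random()) / n for k in strata])
    return [[columns[j][i] for j in range(dim)] for i in range(n)]


design = latin_hypercube(n=5, dim=2, seed=42)
for point in design:
    print([round(v, 3) for v in point])
```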
As for the surrogate model, two default choices are available: GP for continuous unconditional decision variables and RF for all other cases. However, any other regression algorithm (named “learner” in mlrMBO) can be used to fit the surrogate model. Three acquisition functions, named infill criteria in mlrMBO, are available: EI, Augmented EI and LCB/UCB. The toolbox is extensible, so users can implement their own acquisition functions.
Finally, different termination criteria can be defined depending on the specific optimization problem: reaching the maximum number of function evaluations, finding the optimum (when the optimal value of the objective function is known), or exhausting a “budget” (e.g., overall wall-clock time or cloud computing costs).
4 Experimental results
This section presents the results obtained on a set of test functions and on two PSO case studies: a well-known benchmark from the PSO literature and a real-life WDN in Milan, Italy. All the experiments, with the exception of SigOpt (a cloud-based platform), were carried out on a machine with an Intel i7 (3.4 GHz) processor with 4 cores and 16 GB of RAM.
4.1 Experiments on test functions
Parameter configuration for the generation of the GKLS test functions
GKLS test function | Number of local minima | Radius of local minima | Distance from the paraboloid vertex to the global minimum | Global minimum value
GKLS3D, GKLS3DND, GKLS6D, GKLS6DND | 5 | 0.10 | 0.20 | -1.50
GKLS2D, GKLS2DND, GKLS20D, GKLS20DND | 20 | 0.10 | 0.20 | -1.50
Experiments on test functions have been performed to gain a preliminary understanding of the benefits provided by BO and its different alternatives, in particular regarding the adoption of GP or RF for the surrogate model and the selection of EI or LCB as acquisition function. Since the optimum of each test function is known (both x* and f(x*)), the difference between the optimum f(x*) and the “best seen” f(x^{+}) after t evaluations of the objective function is computed. More specifically, a budget of 1500 evaluations was fixed for each experiment (i.e. t = 1500).
The following tables report, separately, the difference between f(x*) and f(x^{+}) and the function evaluation (also referred to as “iteration”) at which the best seen occurred for the first time. Latin Hypercube Sampling (LHS) is used to initialize the design, with only 5 initial evaluations, 7 for the 6D test functions and 21 for the 20D test functions, since LHS requires a number of evaluations greater than the number of dimensions.
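Both quantities reported in these tables can be computed from the sequence of objective values, as in the Python sketch below. It assumes minimization, so that the gap f(x^{+}) − f(x*) is non-negative; the function name is hypothetical.

```python
from typing import Sequence, Tuple


def best_seen_summary(f_values: Sequence[float], f_opt: float) -> Tuple[float, int]:
    """Return (gap, iteration) for a minimization run, where gap = f(x+) - f(x*)
    is the distance of the best seen from the known optimum and iteration is the
    1-based index of the evaluation at which the best seen first occurred."""
    best = min(f_values)
    first_iteration = list(f_values).index(best) + 1  # index() finds the first match
    return best - f_opt, first_iteration


gap, iteration = best_seen_summary([3.0, 1.2, 0.5, 0.5, 0.9], f_opt=0.4)
print(round(gap, 4), iteration)
```

Ties are resolved in favour of the earliest evaluation, which matches the notion of the first occurrence of the best seen.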
Difference between known optimum and optimal solution (i.e. best seen)
Test function | GP-LCB | GP-EI | RF-LCB | RF-EI | SigOpt | ALAMO + BARON
Branin2D | 0.0000 | 0.0000 | 0.0000 | 0.0006 | 0.0001 | 0.0231
Hartmann3D | 0.0002 | 0.0002 | 0.7731 | 0.0004 | 0.0002 | 0.0528
Hartmann6D | 0.2800 | 0.2803 | 0.3426 | 0.2851 | 0.0026 | 3.3224
Schwefel20D | 5418.6950 | 4766.8740 | 3788.6270 | 5046.0320 | 5330.7000 | 8372.7920
GKLS2D | – | 0.0000 | 1.3162 | 0.0000 | 0.0001 | 1.0740
GKLS2DND | – | – | 0.0000 | 0.0000 | 0.0003 | 0.6180
GKLS3D | – | 0.4906 | 0.0006 | 0.4909 | 0.4906 | 1.5140
GKLS3DND | – | 0.4905 | 0.4905 | 0.4907 | 0.4906 | 1.4460
GKLS6D | 1.5000 | 1.5001 | 0.0263 | 1.5183 | 1.5001 | 1.9990
GKLS6DND | 0.0215 | 1.1774 | 0.0270 | 1.5117 | 0.6118 | 1.9990
GKLS20D | 1.8163 | 1.7083 | 2.7499 | 2.9320 | 1.8491 | 6.5759
GKLS20DND | 1.7852 | 1.8415 | 2.4220 | 3.4086 | 1.8042 | 6.5759
Function evaluation corresponding to the first occurrence of the best seen
Test function | GP-LCB | GP-EI | RF-LCB | RF-EI | SigOpt | ALAMO + BARON
Branin2D | 44 | 153 | 624 | 1093 | 704 | 868
Hartmann3D | 28 | 42 | 860 | 181 | 62 | 500
Hartmann6D | 1218 | 917 | 1211 | 657 | 447 | 500
Schwefel20D | 1316 | 860 | 1024 | 1060 | 653 | 500
GKLS2D | – | 142 | 1445 | 1145 | 174 | 516
GKLS2DND | – | – | 1065 | 1431 | 305 | 564
GKLS3D | – | 417 | 1038 | 1201 | 185 | 500
GKLS3DND | – | 503 | 1101 | 1287 | 234 | 500
GKLS6D | 777 | 1411 | 968 | 1295 | 974 | 500
GKLS6DND | 1480 | 57 | 1352 | 455 | 724 | 500
GKLS20D | 1498 | 1349 | 1434 | 1015 | 1404 | 500
GKLS20DND | 815 | 1013 | 823 | 1407 | 1475 | 500
- All the approaches are effective in low dimensions, in particular on the Branin2D test function;
- BO with RF as surrogate model and LCB as acquisition function is the only approach providing the best solution on 50% of the test functions considered;
- BO with GP as surrogate model, in some cases, is not able to perform all the function evaluations, resulting in a premature interruption of the SMBO process. This issue arises more frequently when the acquisition function is LCB;
- ALAMO with BARON does not use the whole budget (1500 function evaluations) because the error between the surrogate model and the preliminary function evaluations is very low, so the surrogate is considered accurate. However, when optimization is performed on the generated surrogate model, the value of the optimal solution is quite different from the known optimum of the related test function.
4.2 Experiments on the Anytown WDN
A recent work has addressed PSO on the Anytown network through Harmony Search [44], a metaheuristic search method. It is important to underline that we do not compare directly with Harmony Search, which has recently been proposed to solve the PSO problem on Anytown, like the plethora of metaheuristics (mostly Genetic Algorithms) proposed in the last decade. Our aim is to propose a more mathematically sound approach to the PSO problem, exploiting black-box optimization to obtain an effective and efficient tool for practical/operational problems such as PSO in WDNs. However, to our knowledge, the most recent, detailed and fully replicable PSO setting on Anytown is the one reported in [44].
We have applied a penalty to the objective function whenever warnings occur during the corresponding simulation run (i.e. some hydraulic constraint is not satisfied). However, unlike [44], which used a default cost of 9999 as penalty, we have decided to adopt as penalty value the cost of the “naïve” schedule in which all 4 pumps are switched ON for all the hours of the day. For the rest, we have adopted the same setting reported in [44]: minimum and maximum tank heads are set to 67.67 and 76.20 m, respectively. Hourly demand factors range from 0.4 to 1.2, and the base demand is doubled with respect to the Anytown test case to allow longer pump operation and tank storage, defining a total inflow between 161.51 and 484.53 L/s. Finally, an electricity tariff with a price of 0.0244 $/kWh between 0:00 and 7:00 and 0.1194 $/kWh between 8:00 and 24:00 was considered for the simulations.
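The penalty scheme described above can be sketched as a wrapper around the simulator. In the Python sketch below, `simulate` is a hypothetical interface (not the EPANET API) that returns the energy cost of a schedule, or None when the simulation reports a warning; the wrapper returns the cost of the all-ON “naïve” schedule for every infeasible schedule. The toy simulator is invented purely for illustration.

```python
from typing import Callable, Optional, Sequence

# Hypothetical simulator interface (not the EPANET API): returns the energy
# cost of a schedule, or None if the simulation reports a warning/error.
Simulator = Callable[[Sequence[float]], Optional[float]]


def make_penalized_objective(simulate: Simulator, n_pumps: int,
                             n_steps: int = 24) -> Callable[[Sequence[float]], float]:
    """Wrap `simulate` so that infeasible schedules receive a constant penalty:
    the cost of the naive schedule with all pumps ON at every time step."""
    naive_schedule = [1.0] * (n_pumps * n_steps)
    penalty = simulate(naive_schedule)
    if penalty is None:
        raise ValueError("the all-ON schedule must be feasible to define the penalty")

    def objective(x: Sequence[float]) -> float:
        cost = simulate(x)
        return penalty if cost is None else cost

    return objective


# Toy stand-in for a hydraulic simulator: a schedule is "feasible" only if
# pumps are ON for at least 5 pump-hours, and each pump-hour costs 10.
def toy_simulate(x):
    total = sum(x)
    return None if total < 5 else 10.0 * total


objective = make_penalized_objective(toy_simulate, n_pumps=4)
print(objective([0.0] * 96), objective([1.0] * 10 + [0.0] * 86))
```

Because every infeasible schedule receives the same value, the resulting objective is flat over the infeasible region, a limitation discussed in Sect. 4.3.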
The objective function f(x) is given in (1) as the energy cost associated with the pump schedule, and \( x_{t} \) represents the state of the pumps at a specific time step (i.e. hour). For an ON/OFF pump the domain of the decision variable is {0, 1}, while for a VSP it is a continuous variable in [0, 1]. As previously mentioned, f(x) is not available in closed form but only as a black box whose evaluation requires running an EPANET simulation. Furthermore, we assume that each evaluation produces an exact (noise-free) pointwise value of f(x), because both the WDN structure and the demand do not change from one simulation to another.
PSO on this WDN has usually been addressed considering ON/OFF pumps, leading to 4 × 24 = 96 variables. Thus, with an hourly step and a 24 h horizon, the total number of possible schedules is 2^{4×24} = 2^{96}, i.e. the search space contains about 7.92 × 10^{28} schedules.
The overall number of evaluations has been fixed at 800: 400 are used for the initial design (i.e. the generation of the first surrogate model) and the other 400 for the iterations of the SMBO process. The 400 initial points were generated with LHS.
Results on the case study related to 4 ON/OFF pumps
Acquisition function | Learner | Energy cost ($) | Best seen iteration | Overall clock time
LCB | GP | 1281.01 | 360 | 22,785.44
LCB | RF | 1054.98 | 183 | 2698.45
EI | GP | 1219.11 | 292 | 25,096.23
EI | RF | 1117.90 | 149 | 3505.28
Results on the case study related to 4 VSPs
Acquisition function | Learner | Energy cost ($) | Best seen iteration | Overall clock time
LCB | GP | 653.55 | 356 | 21,256.48
LCB | RF | 687.43 | 43 | 3412.79
EI | GP | 609.17 | 290 | 23,594.36
EI | RF | 719.36 | 286 | 4342.30
ALAMO + BARON | – | 1087.54 | NA | 4005.54
The iteration at which the best seen occurred and the overall wall-clock time are also reported. When a GP-based surrogate is used, the overall clock time is one order of magnitude higher. This confirms the most relevant drawback of GPs: they scale poorly with the number of observations, since the computational complexity of fitting a GP is O(n^{3}), with n the number of function evaluations.
Results on the case study related to 3 ON/OFF pumps
Acquisition function | Learner | Energy cost ($) | Best seen iteration | Overall clock time
LCB | GP | 900.54 | 142 | 16,091.20
LCB | RF | 870.84 | 3 | 1887.06
EI | GP | 842.25 | 390 | 15,879.03
EI | RF | 859.23 | 306 | 2828.00
Results on the case study related to 3 VSPs
Acquisition function | Learner | Energy cost ($) | Best seen iteration | Overall clock time
LCB | GP | 659.27 | 301 | 17,588.74
LCB | RF | 520.42 | 240 | 3118.83
EI | GP | 549.77 | 47 | 16,028.29
EI | RF | 667.48 | 385 | 3828.93
SigOpt | – | 540.27 | 705 | 4223.78
ALAMO + BARON | – | 875.44 | NA | 3997.41
4.3 Experiments on a real-life large-scale WDN
After the experiments on the benchmark WDN, we addressed a real-life large-scale WDN in Milan, Italy. The water distribution service in the urban area of Milan is managed by Metropolitana Milanese (MM) through a cyber-physical system consisting of a transmission network (550 wells and pipelines bringing water to 29 storage and treatment facilities located inside the city) and a distribution network (26 pump stations spread over the city, with 95 pumps overall, which take water from the storage and treatment facilities and supply it to the final customers, about 1,350,000 inhabitants). MM's energy costs amount to approximately €16,000,000, 45% of which are due to pumping operations in the distribution network [45]. During the ICeWater project, co-funded by the European Union, a Pressure Management Zone (PMZ), namely “Abbiategrasso”, was identified in the south of the city and isolated from the rest of the network.
As previously reported in [45], preliminary results showed that two pumps are sufficient to match supply and demand within the PMZ. This allows for limiting the number of decision variables (at most two pumps are active simultaneously) and for quantifying the economic return for MM of using VSPs instead of ON/OFF pumps.
The following energy tariff was considered:
- (low price) 0.0626 €/kWh between 00:00 and 07:00 and between 23:00 and 24:00;
- (mid price) 0.0786 €/kWh between 08:00 and 19:00;
- (high price) 0.0806 €/kWh between 07:00 and 08:00 and between 19:00 and 23:00.
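A minimal Python sketch of how this tariff enters the energy cost, assuming that hour h covers the interval [h:00, h+1:00) and that the simulator provides the average power absorbed by the pumps in each hourly step (a hypothetical input); with 1 h steps, energy in kWh equals average power in kW. Function names are hypothetical.

```python
def hourly_tariff():
    """Hourly energy price (EUR/kWh) for hours 0..23 under the three-band tariff
    above, assuming hour h covers the interval [h:00, h+1:00)."""
    prices = []
    for h in range(24):
        if h < 7 or h == 23:
            prices.append(0.0626)  # low price: 00:00 to 07:00 and 23:00 to 24:00
        elif h == 7 or 19 <= h < 23:
            prices.append(0.0806)  # high price: 07:00 to 08:00 and 19:00 to 23:00
        else:
            prices.append(0.0786)  # mid price: 08:00 to 19:00
    return prices


def energy_cost(power_kw, prices):
    """Energy cost of a schedule given the average pump power (kW) in each
    hourly step; with 1 h steps, kWh per step equals the average kW."""
    if len(power_kw) != len(prices):
        raise ValueError("one power value per hourly price is required")
    return sum(p * c for p, c in zip(power_kw, prices))


prices = hourly_tariff()
print(round(energy_cost([100.0] * 24, prices), 2))
```

With this reading of the bands, 8 hours fall in the low-price band, 11 in the mid-price band and 5 in the high-price band.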
All the available data for the study, including the reported energy tariff, are related to the year 2014.
Optimal energy costs identified for the real-life large-scale WDN, the Abbiategrasso PMZ in Milan, with two ON/OFF pumps
Acquisition function | Learner | Energy cost (€) | Best seen iteration | Overall clock time
LCB | GP | 427.37* | 596 | 3986.43
LCB | RF | 271.47 | 524 | 4018.76
EI | GP | 427.37* | 168 | 3990.26
EI | RF | 271.47 | 473 | 4049.37
* Penalty value: no feasible schedule was found.
Optimal energy costs identified for the real-life large-scale WDN, the Abbiategrasso PMZ in Milan, with two VSPs
Acquisition function | Learner | Energy cost (€) | Best seen iteration | Overall clock time
LCB | GP | 100.44 | 1 | 82,563.94
LCB | RF | 192.46 | 703 | 6945.39
EI | GP | 195.60 | 76 | 50,228.34
EI | RF | 160.08 | 557 | 7390.14
SigOpt | – | 200.30 | 590 | 7998.23
ALAMO + BARON | – | 427.37* | 500 | 3045.56
* Penalty value: no feasible schedule was found.
With respect to the ON/OFF pumps case, only BO with RF as surrogate model was able to identify a feasible solution. Moreover, the value of the best seen is independent on the acquisition function.

Energy costs are lower than in the ON/OFF pumps case, as expected from the nature of the pumps and, therefore, of the decision variables;

ALAMO in combination with BARON was not able to identify a feasible solution;

Although BO with GP as the surrogate model and LCB as the acquisition function provided the lowest energy cost, GP is too computationally expensive: its overall wall clock time is about 10 times that required by RF. This makes BO with GP unsuitable from an operational point of view, since generating a suggestion and turning it into a timely decision takes too much time.
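The timing comparison can be checked directly against the VSP table above. The short Python sketch below is a hypothetical helper, not part of the authors' code. It computes the GP/RF wall clock ratio for each acquisition function and the average time per evaluation. It assumes that the reported times include all 800 evaluations (400 for the initial design and 400 BO iterations, as stated in the conclusions).

```python
# Overall wall clock times (s) for the VSPs case, copied from the table above.
WALL_CLOCK_S = {
    ("LCB", "GP"): 82_563.94,
    ("LCB", "RF"): 6_945.39,
    ("EI", "GP"): 50_228.34,
    ("EI", "RF"): 7_390.14,
}
EVALUATIONS = 800  # assumption: 400 initial design + 400 BO iterations


def gp_over_rf(acquisition: str) -> float:
    """How many times longer GP-based BO ran than RF-based BO."""
    return WALL_CLOCK_S[(acquisition, "GP")] / WALL_CLOCK_S[(acquisition, "RF")]


for acq in ("LCB", "EI"):
    print(f"{acq}: GP/RF wall clock ratio = {gp_over_rf(acq):.1f}")
for (acq, learner), seconds in WALL_CLOCK_S.items():
    print(f"{acq}-{learner}: {seconds / 3600:.1f} h in total, "
          f"{seconds / EVALUATIONS:.1f} s per evaluation on average")
```

The ratios come out at about 11.9 with LCB and 6.8 with EI. The total times are about 22.9 h for GP with LCB and 1.9 h for RF with LCB.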
Finally, the real-life large-scale case study highlighted a further, important difficulty in solving the PSO problem. Not all points in the search space are feasible, and the feasible region is not known a priori. Feasibility, like the objective function, is checked through EPANET. EPANET returns a numeric value (i.e., the energy cost) when the considered point in the search space (i.e., a pump schedule) is feasible, and a specific error code otherwise.
As previously mentioned, we assigned a penalty to unfeasible points in order to move the search towards feasible points and regions. This approach is the one usually adopted in the PSO literature, but it is quite naïve, because the penalty does not depend on the extent of the violation. It leads to almost “flat” objective functions, especially when the unknown feasible region is extremely narrow compared with the overall search space. Both WDN use cases seem to be characterized by this situation, and the real-life WDN in particular. The following figures show the extremely narrow feasible region for the real-life WDN, where x and y are the speeds of the first and second VSP, respectively. To obtain a 2D representation, only the first hour of the day is considered.
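The effect of a constant penalty can be reproduced on a toy problem. In the Python sketch below, `simulate` is a hypothetical stand-in for EPANET with an arbitrary narrow feasible band, which is not the real Abbiategrasso feasible region. The penalty value and the normalization of speeds to [0, 1] are also assumptions. The sketch only shows how much of the search space collapses onto a single value.

```python
import random

PENALTY = 1_000.0  # hypothetical constant returned for every infeasible schedule


def simulate(x: float, y: float):
    """Hypothetical stand-in for one EPANET run on a 2-VSP, 1-hour schedule.

    x, y: relative speeds of the two pumps in [0, 1] (assumed normalization).
    Returns an energy cost if the schedule is feasible, else None (playing the
    role of EPANET's error code). The band |x + y - 1| <= 0.02 is a toy region.
    """
    if abs(x + y - 1.0) > 0.02:
        return None
    return 100.0 + 50.0 * (x - 0.3) ** 2


def penalized_cost(x: float, y: float) -> float:
    """Constant-penalty objective: the same value for any infeasible point."""
    cost = simulate(x, y)
    return PENALTY if cost is None else cost


# Sample the unit square uniformly and measure how much of it is "flat".
rng = random.Random(42)
points = [(rng.random(), rng.random()) for _ in range(10_000)]
values = [penalized_cost(x, y) for x, y in points]
flat_share = sum(v == PENALTY for v in values) / len(values)
print(f"{flat_share:.1%} of uniformly sampled points lie on the penalty plateau")
```

In this toy band about 4% of the unit square is feasible. Roughly 96% of the samples therefore return exactly the same value, which produces the nearly flat landscape described above.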
Along the sequential optimization process, BO has some chance to explore unseen regions of the search space, and so some opportunity to reach the feasible region. In contrast, ALAMO in combination with BARON only tries to generate a surrogate model that is as accurate as possible. It is therefore misled by the apparent flatness of the objective function, which is induced by the penalty and by the narrow extent of the feasible region. Future work could exploit the potential of ALAMO in combination with BARON by using it instead of GP or RF within the SMBO process. Using ALAMO and BARON to create and update the surrogate model within the BO framework would require modelling the uncertainty of the surrogate, as is done for GP and RF.
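The exploration mechanism mentioned above can be made concrete for the RF surrogate. One common way to give a random forest an uncertainty estimate, used for instance in SMAC [28], is to take the mean and spread of the individual trees' predictions. The Lower Confidence Bound then favours points where the trees disagree. The Python sketch below assumes that per-tree predictions are already available. The function names, the value of `kappa` and the example predictions are hypothetical and do not come from the authors' implementation.

```python
from statistics import mean, pstdev
from typing import Callable, Hashable, Iterable, Sequence


def rf_lcb(tree_predictions: Sequence[float], kappa: float = 2.0) -> float:
    """Lower Confidence Bound (for minimization) from per-tree RF predictions.

    The forest prediction is the mean over trees; the population standard
    deviation across trees serves as the uncertainty estimate.
    """
    mu = mean(tree_predictions)
    sigma = pstdev(tree_predictions)
    return mu - kappa * sigma


def next_candidate(candidates: Iterable[Hashable],
                   predict_per_tree: Callable[[Hashable], Sequence[float]],
                   kappa: float = 2.0) -> Hashable:
    """Return the candidate with the lowest LCB, i.e. the next point to evaluate."""
    return min(candidates, key=lambda c: rf_lcb(predict_per_tree(c), kappa))


# Hypothetical per-tree predictions (energy cost) for two candidate schedules.
predictions = {
    "A": [200.0, 201.0, 199.0],  # trees agree: well-explored region
    "B": [210.0, 150.0, 260.0],  # trees disagree: poorly explored region
}
print(next_candidate(predictions, predictions.__getitem__))
```

Candidate A has the lower mean prediction, so a purely exploitative rule would choose it. LCB selects B instead, because its uncertainty is large. This kind of exploration gives BO a chance to reach a narrow feasible region.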
5 Conclusions
Energy saving in WDNs is one of the most challenging issues of smart urban water management, and PSO is critical to reducing energy costs while guaranteeing a satisfactory water supply service. Simulation–optimization is usually adopted to solve the PSO problem. Optimization is mostly performed via metaheuristics, such as Genetic Algorithms or Harmony Search, while simulation is widely performed via EPANET.
Given the emerging interest in Sequential Model Based Optimization (SMBO), and in particular Bayesian Optimization (BO), for solving black-box and expensive optimization problems, we proposed SMBO as the optimization engine of the simulation–optimization PSO problem and compared several alternative approaches. A set of test functions was first used to make preliminary considerations about the alternative schemes. Then two PSO case studies were addressed: a widely studied benchmark WDN, namely Anytown, and a real-life large-scale WDN in Milan, Italy. The alternatives were chosen according to the type of decision variables, which are discrete in the ON/OFF pumps case and continuous in the VSPs case.
Although no “always winning” approach can be identified, some relevant conclusions can be drawn. BO with RF as the surrogate model and LCB as the acquisition function was the only approach that provided the best solution on 50% of the test functions considered.
On the two PSO case studies, SMBO proved effective even with a limited number of function evaluations relative to the size of the overall search space (400 for the initial design and 400 for the BO process). However, GP-based and RF-based surrogate models alternate in providing the best result. This could appear inconclusive, but the most relevant conclusion concerns the time needed to obtain a solution. GP-based BO requires from 5 to 10 times the wall clock time of RF-based BO, whatever the acquisition function. Time-to-decision could therefore be the key factor in choosing the SMBO configuration. From an operational point of view, wall clock time is acceptable for the benchmark WDN only. There, a suitable pump schedule can be identified in a few hours (from 1 to 5, depending on the setup of the SMBO framework). This is a reasonable time, in particular when a reliable urban water demand forecasting solution is available. Wall clock time for RF-based SMBO was 5 times lower than for GP-based SMBO (1 vs. 5 h). If the time required by GP-based SMBO is acceptable anyway, the number of evaluations for RF-based SMBO could be increased, raising the chance of finding a better pump schedule while keeping wall clock time acceptable.
The results on the real-life WDN lead to a different conclusion. In the VSPs case, RF-based BO requires about 2 h to perform 800 iterations of the SMBO process, while GP-based BO may require about 22 h, which makes it unsuitable for optimizing pump schedules in a real-life setting. Therefore, RF-based BO is the most effective and efficient choice.
Finally, the real-life WDN highlighted that the PSO problem has a feasible region that is unknown a priori. As with the value of the objective function, the feasibility of a solution (i.e., a pump schedule) can only be verified by querying the black box (i.e., EPANET). A naïve strategy that assigns a constant penalty value to unfeasible points could be inefficient in BO. It could also be completely ineffective when an accurate surrogate model is generated and then optimized (i.e., ALAMO in combination with BARON). Future work will address how to include an estimate of the feasible region, as well as a more sophisticated and effective quantification of penalties for unfeasible points.
Acknowledgements
We would like to acknowledge SigOpt for use of their optimization service and complimentary academic license, under the SigOpt for Education program, as part of this research.
References
1. Fantozzi, M., Popescu, I., Farnham, T., Archetti, F., Mogre, P., Tsouchnika, E.: ICT for efficient water resources management: the ICeWater energy management and control approach. Procedia Eng. 70, 633–640 (2014)
2. Mala-Jetmarova, H., Sultanova, N., Savic, D.: Lost in optimisation of water distribution systems? A literature review of system operation. Environ. Model. Softw. 93, 209–254 (2017)
3. Candelieri, A., Soldi, D., Archetti, F.: Cost-effective sensors placement and leak localization – The Neptun pilot of the ICeWater project. J. Water Supply Res. Technol. AQUA 64(5), 567–582 (2015)
4. Naoum-Sawaya, J., Ghaddar, B., Arandia, E., Eck, B.: Simulation-optimization approaches for water pump scheduling and pipe replacement problems. Eur. J. Oper. Res. 246(1), 293–306 (2015)
5. Bagirov, A.M., Barton, A.F., Mala-Jetmarova, H., Al Nuaimat, A., Ahmed, S.T., Sultanova, N., Yearwood, J.: An algorithm for minimization of pumping costs in water distribution systems using a novel approach to pump scheduling. Math. Comput. Model. 57(3–4), 873–886 (2013)
6. Candelieri, A.: Clustering and support vector regression for water demand forecasting and anomaly detection. Water 9(3), 224 (2017)
7. Candelieri, A., Soldi, D., Archetti, F.: Short-term forecasting of hourly water consumption by using automatic metering readers data. Procedia Eng. 119(1), 844–853 (2015)
8. Candelieri, A., Archetti, F.: Identifying typical urban water demand patterns for a reliable short-term forecasting – The icewater project approach. Procedia Eng. 89, 1004–1012 (2014)
9. D’Ambrosio, C., Lodi, A., Wiese, S., Bragalli, C.: Mathematical programming techniques in water network optimization. Eur. J. Oper. Res. 243(3), 774–788 (2015)
10. Pasha, M.F.K., Lansey, K.: Optimal pump scheduling by linear programming. In: World Environmental and Water Resources Congress, pp. 395–404 (2009)
11. Price, E., Ostfeld, A.: Iterative linearization scheme for convex nonlinear equations: application to optimal operation of water distribution systems. J. Water Resour. Plan. Manag. 139(3), 299–312 (2013)
12. McCormick, G., Powell, R.S.: Derivation of near-optimal pump schedules for water distribution by simulated annealing. J. Oper. Res. Soc. 55, 728–736 (2004)
13. Wu, W., Dandy, G., Maier, H.: Optimal control of total chlorine and free ammonia levels in a water transmission pipeline using artificial neural networks and genetic algorithms. J. Water Resour. Plan. Manag. 141(7), 04014085 (2014)
14. Van Zyl, J.E., Savic, D.A., Walters, G.A.: Operational optimization of water distribution systems using a hybrid genetic algorithm. J. Water Resour. Plan. Manag. 130(2), 160–170 (2004)
15. Rossman, L.A.: EPANET 2 User’s Manual. U.S. Environmental Protection Agency, Washington, DC (2000)
16. Sergeyev, Y.D., Kvasov, D.E.: Deterministic Global Optimization: An Introduction to the Diagonal Approach. Springer, Berlin (2017)
17. Sergeyev, Y.D., Kvasov, D.E.: Global search based on efficient diagonal partitions and a set of Lipschitz constants. SIAM J. Optim. 16(3), 910–937 (2006)
18. Sergeyev, Y.D., Kvasov, D.E.: A deterministic global optimization using smooth diagonal auxiliary functions. Commun. Nonlinear Sci. Numer. Simul. 21(1–3), 99–111 (2015)
19. Barkalov, K., Polovinkin, A., Meyerov, I., Sidorov, S., Zolotykh, N.: SVM regression parameters optimization using parallel global search algorithm. In: International Conference on Parallel Computing Technologies, pp. 154–166 (2013)
20. Gillard, J.W., Kvasov, D.E.: Lipschitz optimization methods for fitting a sum of damped sinusoids to a series of observations. Stat. Interface 10, 59–70 (2017)
21. Zabinsky, Z.B.: Stochastic Adaptive Search for Global Optimization, 1st edn. Springer, US (2003)
22. Steponavičė, I., Shirazi-Manesh, M., Hyndman, R.J., Smith-Miles, K., Villanova, L.: On sampling methods for costly multi-objective black-box optimization. Springer Optim. Its Appl. 107, 273–296 (2016)
23. Li, L., Jamieson, K., DeSalvo, G., Rostamizadeh, A., Talwalkar, A.: Hyperband: a novel bandit-based approach to hyperparameter optimization, pp. 1–48 (2016). arXiv:1603.06560
24. Huang, H., Zabinsky, Z.B.: Adaptive probabilistic branch and bound with confidence intervals for level set approximation. In: Winter Simulation Conference (2013)
25. Shahriari, B., Swersky, K., Wang, Z., Adams, R.P., De Freitas, N.: Taking the human out of the loop: a review of Bayesian optimization. Proc. IEEE 104(1), 148–175 (2016)
26. Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms, pp. 1–12 (2012)
27. Archetti, F., Betrò, B.: A probabilistic algorithm for global optimization. Calcolo 16(3), 335–343 (1979)
28. Hutter, F., Hoos, H.H., Leyton-Brown, K.: Sequential model-based optimization for general algorithm configuration. In: International Conference on Learning and Intelligent Optimization (LION 5), pp. 507–523 (2011)
29. Candelieri, A., Giordani, I., Archetti, F.: Automatic configuration of kernel-based clustering: an optimization approach. In: International Conference on Learning and Intelligent Optimization (LION), pp. 34–49. Springer, Cham (2017)
30. Castro Gama, M., Pan, Q., Salman, M.A., Jonoski, A.: Multivariate optimization to decrease total energy consumption in the water supply system of Abbiategrasso (Milan, Italy). Environ. Eng. Manag. J. 14(9), 2019–2029 (2015)
31. Wu, J., Poloczek, M., Wilson, A.G., Frazier, P.I.: Bayesian optimization with gradients, pp. 1–16 (2017)
32. Mockus, J.: Application of Bayesian approach to numerical methods of global and stochastic optimization. J. Glob. Optim. 4, 347–365 (1994)
33. Wang, Z., Zoghi, M., Hutter, F., Matheson, D., De Freitas, N.: Bayesian optimization in high dimensions via random embeddings. In: Proceedings of the International Joint Conference on Artificial Intelligence, pp. 1778–1784 (2013)
34. Kandasamy, K., Schneider, J., Póczos, B.: High dimensional Bayesian optimisation and bandits via additive models. Int. Conf. Mach. Learn. 37, 295–304 (2015)
35. Ho, T.K.: Random decision forests. In: Conference on Document Analysis and Recognition, pp. 278–282 (1995)
36. Breiman and Liaw: U.S. trademark registration number 3185828 (2006)
37. Kushner, H.J.: A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise. J. Basic Eng. 86, 97–106 (1964)
38. Mockus, J., Tiesis, V., Zilinskas, A.: The application of Bayesian methods for seeking the extremum. In: Dixon, L., Szego, G. (eds.) Towards Global Optimisation 2, pp. 117–130. Elsevier, Amsterdam (1978)
39. Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. J. Mach. Learn. Res. 3(3), 397–422 (2002)
40. Srinivas, N., Krause, A., Kakade, S.M., Seeger, M.: Gaussian process optimization in the bandit setting: no regret and experimental design, pp. 1–8 (2009)
41. Jones, D.R., Perttunen, C.D., Stuckman, B.E.: Lipschitzian optimization without the Lipschitz constant. J. Optim. Theory Appl. 79(1), 157–181 (1993)
42. Fletcher, R.: Practical Methods of Optimization. John Wiley & Sons, Hoboken (2013)
43. Gaviano, M., Kvasov, D.E., Lera, D., Sergeyev, Y.D.: Software for generation of classes of test functions with known local and global minima for global optimization. ACM Trans. Math. Softw. 29(4), 469–480 (2003)
44. De Paola, F., Fontana, N., Giugni, M., Marini, G., Pugliese, F.: An application of the harmony-search multi-objective (HSMO) optimization algorithm for the solution of pump scheduling problem. Procedia Eng. 162, 494–502 (2016)
45. Castro-Gama, M., Pan, Q., Lanfranchi, E.A., Jonoski, A., Solomatine, D.P.: Pump scheduling for a large water distribution network. Milan, Italy. Procedia Eng. 186, 436–443 (2017)
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.