1 Introduction

The planning of a water distribution system may include topology design and component sizing, together with an evaluation of the hydraulic properties of the system. Since failures may occur due to, for example, deterioration of the pipe material over time or sudden increases in pressure, the system's reliability is also worth considering. With regard to the topology, branched systems are suitable for small, low-density rural areas, while fully or partially looped systems are appropriate for urban areas (Swamee and Sharma 2008). Branched systems have the disadvantage that a break in any pipe puts all downstream consumers out of service. In fully looped systems, each demand node can be supplied from the source(s) through at least two independent paths. Two supply paths are said to be independent if they do not have a pipe in common. In the literature, topology and pipe-size optimization have typically been handled in two separate stages, with topology design followed by pipe sizing (Rowel and Barnes 1982; Morgan and Goulter 1982; Kessler et al. 1990; Cembrowicz 1992). However, such methods neglect, to varying degrees, the strong coupling between topology and component design.

Also, the relationship between topology, pipe sizes and hydraulic reliability is strong. However, previous studies that included reliability generally did not optimize the topology. Various reliability measures that are easy to calculate have been suggested including statistical entropy (Tanyimboh and Templeman 1993), resilience index (Todini 2000), network resilience (Prasad and Park 2004), modified resilience index (Jayaram and Srinivasan 2008) and surplus power factor (Vaabel et al. 2006). Among these measures, statistical entropy has been shown to be the most consistent (Reca et al. 2008; Raad et al. 2010; Baños et al. 2011; Tanyimboh et al. 2011; Saleh et al. 2012). For water distribution systems, the statistical entropy may be considered a measure of the uniformity of the pipe flow rates (Tanyimboh and Templeman 1993).

Awumah et al. (1989) developed a two-stage model for optimizing the pipe sizes and topology. In the first stage, a topology model uses integer programming to determine whether a link is to be included. In the second stage, pipe diameters are adjusted. Awumah and Goulter (1992) proposed an alternative approach based on statistical entropy theory. Tanyimboh and Sheahan (2002) also used statistical entropy, in an approach in which the topology, pipe sizing, reliability and redundancy were considered in successive stages.

Evolutionary optimization algorithms have also been used (Davidson and Goulter 1995; Walters and Smith 1995; Geem et al. 2000; Afshar and Jabbari 2007). Evolutionary algorithms often generate infeasible solutions when solving problems that involve constraints. Case-specific constraint-violation penalties (Kougias and Theodossiou 2013) that require calibration are frequently introduced to address this issue. Saleh and Tanyimboh (2013) introduced an approach that optimizes both the topology and pipe sizes. However, that algorithm provides a single optimal solution, and reliability aspects beyond the topology were not addressed.

This paper describes a new multi-objective evolutionary approach for the simultaneous topology, pipe size and entropy-based optimization of water distribution systems. Unlike previous entropy-based approaches such as Tanyimboh and Sheahan (2002), the pipe flow directions and candidate topologies are not specified in advance. Also, the algorithm promotes full exploitation of all feasible and infeasible solutions generated to guide the search. Our algorithm includes a robust measure for the infeasibility of any solution and a seamless generic procedure for redundant binary codes. Results for two test problems in the literature are included.

2 Optimization Approach

The difficulties associated with constraint-violation penalties that are commonly used in evolutionary algorithms include time-consuming trial runs and parameter calibration (Dridi et al. 2008). On the other hand, penalty-free methods eliminate the need to design penalty functions and are relatively straightforward to implement without sacrificing the computational efficiency (Siew and Tanyimboh 2012). Also, penalty-free methods can maintain infeasible solutions that may have useful properties that may not be common in feasible solutions in successive generations of the optimization. Other constraint handling methods have been proposed (Deb et al. 2002). For example, Ray et al. (2001) suggested three stages of nondomination ranking using different combinations of the objective and constraint functions. Constraint handling in Deb et al. (2002) involves a binary tournament in which feasible solutions automatically dominate infeasible solutions. We developed a penalty-free strategy that exploits all efficient solutions generated, without introducing additional measures aimed at reducing the propagation of infeasible solutions.

2.1 Details of the Optimization Model

We used the EPANET 2 hydraulic simulation model (Rossman 2000) to determine the hydraulic properties of all solutions generated in the optimization process and to ensure that the solutions satisfy conservation of mass and energy. The optimization model minimizes the initial construction cost, $f_1$, the infeasibility measure, $f_2$, and the number of pipes, $f_3$, as explained below.

$$ {f}_1={\displaystyle \sum_{ij}f\left({L}_{ij},{D}_{ij}\right)} $$
(1)
$$ f_2 = l + h + \left(S^{*} - S\right) + \left(S_g^{*} - S^{*}\right); \quad l = \sum_{i=1}^{N} \max\left(0, R_i^{req} - R_i\right), \quad h = \sum_{i} \max\left(0, H_i^{req} - H_i\right) $$
(2)
$$ {f}_3={\displaystyle \sum_{ij}{p}_{ij}} $$
(3)

in which $N$ = number of nodes; for pipe $ij$, $L_{ij}$ = length; $D_{ij}$ = diameter; $p_{ij} = 1$ if pipe $ij$ is included in the topology and $p_{ij} = 0$ otherwise; $H_i$ and $H_i^{req}$ = available and required residual head at demand node $i$, respectively; $R_i$ and $R_i^{req}$ = actual and required number of independent supply paths to node $i$, respectively; $S$ = entropy; $S^{*}$ = maximum entropy; and $S_g^{*}$ = global maximum entropy.

The function $l$ in Eq. 2 represents the total topological infeasibility of a candidate solution. The topological infeasibility at node $i$ was taken as the shortfall in the number of independent supply paths $R_i$. The required number of independent supply paths, $R_i^{req}$, is typically 1 for branched configurations and 2 for fully looped configurations. The function $h$ in Eq. 2 represents the residual head infeasibility. If $H_i \ge H_i^{req}$ for all demand nodes, then the solution is hydraulically feasible. The required residual head $H_i^{req}$ is the head at a node above which demands are satisfied in full. $H_i^{req}$ is typically not less than a minimum of about 7 m (OFWAT 2008).

For any feasible topology that has loops, there are multiple feasible sets of flow directions, each of which has a maximum entropy value. $S^{*}$ is the theoretical maximum value of entropy for a particular feasible set of flow directions, while $S_g^{*}$ is the global maximum entropy value considering all permissible topologies. The global maximum entropy value $S_g^{*}$ is not known a priori; our algorithm evolves the global maximum entropy solution by assuming it corresponds to the largest entropy value identified so far. The infeasibility measure $f_2$ seeks feasible solutions that have high values of entropy (a proxy for hydraulic reliability and redundancy). Minimizing the infeasibility measure $f_2$ promotes the inclusion in the nondominated set of a range of maximum entropy solutions for which, by definition, $S = S^{*}$, in addition to $S_g^{*}$.
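The infeasibility measure of Eq. 2 can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation; the function name and argument layout are assumptions, and the entropy values are taken as already computed.

```python
def infeasibility(R, R_req, H, H_req, S, S_max, S_gmax):
    """Entropy-augmented infeasibility measure f2 of Eq. 2 (illustrative sketch).

    R, R_req : actual and required independent supply paths, one entry per node
    H, H_req : available and required residual heads, one entry per demand node
    S, S_max, S_gmax : entropy, maximum entropy S*, and best-so-far global maximum S*_g
    """
    # l: total shortfall in the number of independent supply paths
    l = sum(max(0, req - act) for act, req in zip(R, R_req))
    # h: total shortfall in residual head
    h = sum(max(0.0, req - act) for act, req in zip(H, H_req))
    # The two entropy terms sum to (S*_g - S), the unrealized entropy potential
    return l + h + (S_max - S) + (S_gmax - S_max)


# A 12-node network with no supply paths and 11 demand nodes with zero head,
# each requiring 30 m; the entropy terms are set equal here to isolate l + h.
print(infeasibility([0] * 12, [2] * 12, [0.0] * 11, [30.0] * 11, 3.0, 3.0, 3.0))
```

A solution that meets every path and head requirement and attains the largest entropy found so far gives a value of zero.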

To complete the characterization of the infeasibility function $f_2$, the entropy functions are described here briefly (Tanyimboh and Templeman 1993).

$$ S={S}_0+{\displaystyle {\sum}_{i=1}^N{P}_i{S}_i}; $$
(4)

$S$ = entropy; $S_0$ = entropy of source supplies; $S_i$ = entropy of node $i$; $P_i = T_i/T$ = fraction of the total flow through the network that reaches node $i$; $T_i$ = total flow that reaches node $i$; $T$ = total demand;

$$ {S}_0=-{\displaystyle \sum_{i\in I}\frac{Q_{0i}}{T} \ln \left(\frac{Q_{0i}}{T}\right)}; $$
(5)

$Q_{0i}$ = inflow rate at source node $i$; $I$ = the set of source supply nodes;

$$ S_i = -\frac{Q_{i0}}{T_i} \ln\left(\frac{Q_{i0}}{T_i}\right) - \sum_{ij \in out\left(N_i\right)} \frac{Q_{ij}}{T_i} \ln\left(\frac{Q_{ij}}{T_i}\right), \quad i = 1, \ldots, N; $$
(6)

$Q_{i0}$ = demand at node $i$; $Q_{ij}$ = flow rate in pipe $ij$; and $out(N_i)$ = set of all pipe flows from node $i$.
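Eqs. 4 to 6 can be evaluated directly from a set of pipe flows. The sketch below is illustrative only: the dictionary-based data layout and the function name are assumptions, and the total flow reaching a node is obtained from flow conservation as its demand plus its pipe outflows.

```python
import math


def network_entropy(source_inflows, demands, pipe_flows):
    """Network entropy S of Eq. 4 (illustrative sketch).

    source_inflows : {source node: Q_0i}
    demands        : {node: Q_i0}
    pipe_flows     : {(i, j): Q_ij} with flow directed from i to j, Q_ij > 0
    """
    T = sum(demands.values())  # total demand

    # Eq. 5: entropy of the source supplies
    S0 = -sum((q / T) * math.log(q / T) for q in source_inflows.values() if q > 0)

    nodes = set(source_inflows) | set(demands)
    for i, j in pipe_flows:
        nodes.update((i, j))

    S = S0
    for i in nodes:
        outflows = [q for (a, _), q in pipe_flows.items() if a == i and q > 0]
        demand = demands.get(i, 0.0)
        T_i = demand + sum(outflows)  # flow reaching node i, by conservation
        if T_i <= 0:
            continue
        # Eq. 6: entropy of the outflows (demand included) at node i
        S_i = -sum((q / T_i) * math.log(q / T_i) for q in [demand] + outflows if q > 0)
        S += (T_i / T) * S_i  # Eq. 4 with P_i = T_i / T
    return S


# One source splitting its supply equally between two demand nodes.
print(network_entropy({"s": 2.0}, {"a": 1.0, "b": 1.0}, {("s", "a"): 1.0, ("s", "b"): 1.0}))
```

In this small example the only uncertainty is the equal split at the source, so the result equals ln 2.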

For a typical node with, say, two incident pipes downstream, it can be shown that $S_i \le \ln(3) \approx 1.1$ (Shannon 1948). Given that $P_i = T_i/T \le 1.0$, it is expected that the value of the network entropy $S$ in Eq. 4 will be relatively small for the typical water distribution system. Therefore, it is expected that the contributions of the entropy terms $(S^{*} - S)$ and $(S_g^{*} - S^{*})$ to the infeasibility measure $f_2$ in Eq. 2 will be relatively small. The objective function $f_2$ may be considered an entropy-augmented infeasibility measure. Minimizing $f_2$ aims simultaneously to satisfy residual head and topology requirements and to maximize entropy. Eqs. 4 to 6 are an extension of the statistical entropy function that is a measure of uncertainty (Shannon 1948). In a probabilistic system the uncertainty is a maximum if all possible system states or outcomes are equally likely. Conversely, the uncertainty decreases as the probabilities associated with the states or outcomes become more unequal. The term $[(S^{*} - S) + (S_g^{*} - S^{*})] = (S_g^{*} - S)$ in the infeasibility measure $f_2$ may be considered an estimate of the unrealized entropy potential; by definition its value is zero for $S = S^{*} = S_g^{*}$.
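The bound on the nodal entropy quoted above can be checked directly. A node with two downstream pipes has three outflow fractions in Eq. 6 (the demand and the two pipe flows), which sum to one; the entropy is largest when the three fractions are equal, so that

$$ S_i^{\max} = -3 \times \frac{1}{3} \ln\left(\frac{1}{3}\right) = \ln 3 \approx 1.0986 $$

Any unequal split of the flow reaching the node gives a smaller value.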

2.2 Practical Topology Confirmation and Redundant Binary Codes

We developed a topology confirmation algorithm, coded in C, to enable a consistent and bias-free fitness assessment of all feasible and infeasible solutions. The total number of paths $NP_i$ supplying demand node $i$ from all sources collectively was determined with regard to the pipe flow directions obtained from EPANET 2. We used an efficient path enumeration algorithm proposed by Yassin-Kassab et al. (1999). If $NP_i = 0$, the node cannot be supplied. If $NP_i = 1$, the node can be supplied. If $NP_i \ge 2$ for all nodes, a path inter-dependency investigation is carried out to check whether the network is fully looped. We adopted a practical procedure that does not involve an exhaustive enumeration of all the paths supplying each node. For a pair of independent supply paths, removing a pipe from one path does not affect the other path. Therefore, the procedure entails removing each pipe in turn and, in each case, observing whether all nodes can still be reached. If all nodes can be supplied from one or more sources after the removal of each pipe, one at a time with replacement, then all nodes have at least two independent supply paths. It is worth observing that EPANET 2 sets default values of node pressures and pipe flows within parts of a network that are not connected to a source. We addressed this by assigning zero flows and pressures, respectively, to such pipes and nodes.
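The one-pipe-at-a-time check described above can be sketched as follows. This is an illustration in Python, not the C routine used in the study; the function names are hypothetical, and for simplicity reachability is tested on the undirected layout, ignoring the flow directions that the study obtains from EPANET 2.

```python
from collections import deque


def reachable(nodes, pipes, sources):
    """Return the set of nodes connected to at least one source (breadth-first search)."""
    adjacency = {n: [] for n in nodes}
    for a, b in pipes:
        adjacency[a].append(b)
        adjacency[b].append(a)
    seen = set(sources)
    queue = deque(sources)
    while queue:
        u = queue.popleft()
        for v in adjacency[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def is_fully_looped(nodes, pipes, sources):
    """True if every node stays reachable when any single pipe is removed."""
    everything = set(nodes)
    if reachable(nodes, pipes, sources) != everything:
        return False  # some node cannot be supplied even with all pipes present
    for k in range(len(pipes)):
        remaining = pipes[:k] + pipes[k + 1:]  # remove pipe k, then put it back
        if reachable(nodes, remaining, sources) != everything:
            return False
    return True


ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
tree = [(0, 1), (1, 2), (2, 3)]
print(is_fully_looped([0, 1, 2, 3], ring, [0]), is_fully_looped([0, 1, 2, 3], tree, [0]))
```

The check costs one connectivity search per pipe, which avoids enumerating every supply path to every node.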

In order to represent the vector of decision variables in a genetic algorithm, an $n$-bit binary string gives rise to $2^n$ different $n$-bit codes and, depending on the number of decision variables, some codes may be redundant. We assumed redundant codes represent closed pipes whose flow-carrying capacity is zero. The closed pipes are allocated pipe sizes taken from just above the upper end of the real set of available pipe diameters. The data required to implement the procedure are the unit costs for the fictitious or assumed diameters. As the fictitious diameters have no functional value, it is anticipated they will become extinct through evolution and natural selection. The benefits of this novel approach are that it is entirely generic and very practical; additional parameters that require special calibration are not introduced and pre-optimization trial runs are not required. The premature loss of potentially useful genes is thus avoided, and the genetic code that is transmitted in successive generations is not degraded (Herrera et al. 1998).
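A decoding step of this kind might look as follows for a 4-bit scheme with 12 real diameters, one pipe-omission option and three fictitious diameters. The assignment of code values to options (code 0 for an omitted pipe, then real sizes in ascending order, then fictitious sizes) is an assumption made for illustration; the text does not specify the ordering.

```python
REAL_DIAMETERS_MM = [100, 125, 150, 200, 250, 300, 350, 400, 450, 500, 550, 600]
FICTITIOUS_DIAMETERS_MM = [650, 700, 750]  # closed pipes: costed, but carry no flow


def decode(chromosome, bits_per_pipe=4):
    """Map a bit string to a list of (diameter_mm, carries_flow) pairs, one per pipe."""
    pipes = []
    for start in range(0, len(chromosome), bits_per_pipe):
        code = int(chromosome[start:start + bits_per_pipe], 2)
        if code == 0:
            pipes.append((None, False))  # assumed code for an omitted pipe
        elif code <= len(REAL_DIAMETERS_MM):
            pipes.append((REAL_DIAMETERS_MM[code - 1], True))
        else:
            # Redundant code: a closed pipe with a fictitious diameter
            index = code - 1 - len(REAL_DIAMETERS_MM)
            pipes.append((FICTITIOUS_DIAMETERS_MM[index], False))
    return pipes


print(decode("0000" + "0001" + "1100" + "1101" + "1111"))
```

Every one of the 16 codes decodes to a valid design choice, so no repair or rejection step is needed for redundant codes.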

3 Computational Solution

We used the Nondominated Sorting Genetic Algorithm II (NSGA II), which has been used extensively and whose merits have been reported elsewhere (Deb et al. 2002; Dridi et al. 2008). Selection for crossover was carried out with a binary tournament. Single-point crossover was used to produce two offspring from two parents. Once the offspring population was created, the mutation operator reversed the selected bits. The optimization problem was posed as:

$$ \mathrm{Minimize}\ \mathbf{f}={\left({f}_1,{f}_2,{f}_3\right)}^{\mathrm{T}} $$
(7)
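The crossover and mutation operators mentioned above are standard and can be sketched as below. The function names and the list-of-bits representation are illustrative assumptions, not details taken from the study.

```python
import random


def single_point_crossover(parent1, parent2, rng):
    """Swap the tails of two equal-length bit lists at a random cut point."""
    cut = rng.randint(1, len(parent1) - 1)  # cut strictly inside the string
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2


def bit_flip_mutation(bits, probability, rng):
    """Reverse each bit independently with the given probability."""
    return [1 - b if rng.random() < probability else b for b in bits]


rng = random.Random(1)  # seeded only to make the sketch repeatable
a, b = single_point_crossover([0] * 8, [1] * 8, rng)
print(a, b, bit_flip_mutation(a, 1 / 8, rng))
```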

The decision variables are the pipe diameters $D_{ij}$ and link selection variables $p_{ij}$ for the entire network. To make all three objectives in Eq. 7 roughly similar in magnitude, each $f_i^m$, i.e. the value of objective $m$ for solution $i$, was normalized as

$$ f{n}_i^m=\left({f}_i^m-{f}_{\min}^m\right)/\left({f}_{\max}^m-{f}_{\min}^m\right);\forall i,\forall m $$
(8)

In the generation in question, $f_{\min}^m$ and $f_{\max}^m$ = minimum and maximum value of objective $m$, respectively; and $fn_i^m$ = normalized value of objective $m$ for solution $i$.
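The normalization of Eq. 8 amounts to a per-objective min-max scaling over the current population. In the sketch below the function name is hypothetical, and returning zero when an objective has the same value for every solution is an added assumption to avoid division by zero.

```python
def normalize(objectives):
    """Min-max scale each objective over the population (Eq. 8).

    objectives : list of per-solution lists, e.g. [[f1, f2, f3], ...]
    """
    n_obj = len(objectives[0])
    lo = [min(f[m] for f in objectives) for m in range(n_obj)]
    hi = [max(f[m] for f in objectives) for m in range(n_obj)]
    return [
        [
            (f[m] - lo[m]) / (hi[m] - lo[m]) if hi[m] > lo[m] else 0.0  # guard: assumption
            for m in range(n_obj)
        ]
        for f in objectives
    ]


print(normalize([[1.0, 10.0, 5.0], [3.0, 20.0, 5.0], [2.0, 15.0, 5.0]]))
```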

In each generation of the optimization algorithm, each solution in the population is analysed using EPANET 2. The resulting pipe flow rates are used to calculate the entropy (Eq. 4). In general, numerical nonlinear optimization is required to calculate the maximum value of the entropy S *. However, computationally efficient path entropy methods that do not involve numerical optimization directly are available. We used the “simplified path entropy method” developed by Ang and Jowitt (2005) for the single-source network example (Section 4.1) and an algorithm known as the “α-method” developed by Yassin-Kassab et al. (1999) for the multiple-source network example (Section 4.2). Application of the α-method involves solving a non-linear system of equations and, for a two-source network, it reduces to the solution of a single nonlinear equation for which we used the bisection method (Press et al. 2003).
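For reference, the bisection method mentioned above can be sketched generically. The nonlinear equation of the α-method is not reproduced in this section, so the function `g` below is only a stand-in chosen for illustration.

```python
def bisect(g, a, b, tol=1e-10, max_iter=200):
    """Find a root of g in [a, b], given that g(a) and g(b) differ in sign."""
    ga, gb = g(a), g(b)
    if ga == 0:
        return a
    if gb == 0:
        return b
    if ga * gb > 0:
        raise ValueError("g(a) and g(b) must have opposite signs")
    for _ in range(max_iter):
        mid = 0.5 * (a + b)
        gm = g(mid)
        if gm == 0 or 0.5 * (b - a) < tol:
            return mid
        if ga * gm < 0:
            b, gb = mid, gm  # root lies in the left half
        else:
            a, ga = mid, gm  # root lies in the right half
    return 0.5 * (a + b)


# Stand-in equation (not the alpha-method equation): x**3 - 2 = 0 on [0, 2]
print(bisect(lambda x: x ** 3 - 2.0, 0.0, 2.0))
```

Bisection halves the bracketing interval at each step, so it converges reliably whenever a sign change is bracketed.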

4 Results and Discussion

Two networks from the literature were considered. The Hazen-Williams roughness coefficient for all pipes is 130. For each network, the optimization algorithm was executed 30 times on a desktop personal computer (Processor: Intel Core 2 Duo, CPU: 2.99 GHz, RAM: 3.21 GB). The population size, crossover probability and stopping criterion were 100, 1.0 and $10^6$ hydraulic simulations, respectively. The 100 solutions in each of the 30 nondominated sets achieved were then merged. Out of the 30 × 100, i.e. 3,000, solutions, the final set of 100 nondominated solutions was obtained by a screening procedure that considers the Pareto-optimality and diversity (i.e. crowding distance) of the solutions in the objective space (as in NSGA II). The convergence point in the optimization was taken as the point after which there was no further improvement in either the entropy or the cost of the feasible solution with the highest entropy value.
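The first part of the screening of the merged runs, retaining only the nondominated solutions, can be sketched as below for minimization objectives. The function names are hypothetical, and the subsequent crowding-distance step that trims the set to 100 solutions is not shown.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def nondominated(points):
    """Return the points that no other point dominates (all objectives minimized)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


merged = [(1, 5, 3), (2, 2, 3), (3, 3, 3), (1, 5, 4)]
print(nondominated(merged))
```

This pairwise comparison is quadratic in the number of solutions, which is acceptable for a merged set of a few thousand points.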

Given a set of nondominated solutions, the hypervolume is a measure of the fraction of the objective space dominated by the said solutions. Its value increases as the achieved solutions approach the real Pareto-optimal front. The value increases also as the range of solutions in the nondominated set increases or their distribution becomes more uniform. Larger hypervolume values are thus preferred (Knowles 2005). The hypervolume was calculated after normalizing the objectives according to Eq. 8, for each optimization run and the union of all the 30 runs.
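For three minimized objectives normalized to the range 0 to 1, the hypervolume can be computed exactly by slicing along one objective. The sketch below assumes a reference point of (1, 1, 1); the reference point and the function names are assumptions, as the text does not state how the hypervolume was computed.

```python
def hypervolume_2d(points, ref):
    """Area dominated by 2-D points (minimization) and bounded by ref."""
    area = 0.0
    best_y = ref[1]
    for x, y in sorted(points):
        if y < best_y:  # this point adds a new strip below the previous ones
            area += (ref[0] - x) * (best_y - y)
            best_y = y
    return area


def hypervolume_3d(points, ref=(1.0, 1.0, 1.0)):
    """Volume dominated by 3-D points (minimization) and bounded by ref."""
    pts = sorted(points, key=lambda p: p[2])
    volume = 0.0
    for k, p in enumerate(pts):
        z_next = pts[k + 1][2] if k + 1 < len(pts) else ref[2]
        # Between p[2] and z_next, exactly the first k + 1 points are active
        active = [(q[0], q[1]) for q in pts[: k + 1]]
        volume += hypervolume_2d(active, ref[:2]) * (z_next - p[2])
    return volume


print(hypervolume_3d([(0.5, 0.5, 0.5), (0.25, 0.75, 0.75)]))
```

With a unit reference box, the result is directly the fraction of the normalized objective space that the set dominates.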

4.1 Sample Network 1

The network shown in Fig. 1a (Awumah et al. 1990) has one supply node, 17 pipes and 11 demand nodes. The elevation of the nodes is 0 m. The head at the supply node is 100 m. $H_i^{req}$ = 30 m for all demand nodes. All pipes have a length of 1,000 m. $R_i^{req} = 2$ specifies a fully looped topology. We used 12 pipe diameters (100, 125, 150, 200, 250, 300, 350, 400, 450, 500, 550 and 600 mm), i.e. $13^{17} = 8.65 \times 10^{18}$ solutions including pipe omission. Given $10^6$ hydraulic simulations, the sampling ratio was $10^6 / (8.65 \times 10^{18}) = 1.16 \times 10^{-13}$. Each solution was represented by a 68-bit chromosome based on a 4-bit pipe-size representation scheme. A 4-bit binary string produces $2^4 = 16$ codes, three of which are redundant as there are 13 pipe-size alternatives. We allocated three assumed pipe diameters of 650, 700 and 750 mm to the three redundant codes. The pipe costs were taken as $800 D^{1.5}$ (£/m), where $D$ is the pipe diameter (in metres). The absolute probability of bit mutation was 1/68 ≈ 0.015.
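The size of the search space, the sampling ratio and the pipe costs for Network 1 can be reproduced with a few lines of arithmetic. The variable and function names below are illustrative.

```python
# Network 1: 12 real diameters plus pipe omission, 17 candidate pipes
N_OPTIONS = 12 + 1
N_PIPES = 17
BITS_PER_PIPE = 4

search_space = N_OPTIONS ** N_PIPES         # exact number of candidate designs
sampling_ratio = 1e6 / search_space         # fraction covered by 10^6 simulations
chromosome_bits = N_PIPES * BITS_PER_PIPE   # length of the binary chromosome
mutation_probability = 1 / chromosome_bits  # one expected bit flip per chromosome


def pipe_cost(diameter_mm, length_m=1000.0):
    """Cost in pounds of one pipe: 800 * D**1.5 per metre, with D in metres."""
    return 800.0 * (diameter_mm / 1000.0) ** 1.5 * length_m


print(f"{search_space:.3e}", f"{sampling_ratio:.3e}", chromosome_bits)
print(round(mutation_probability, 3), round(pipe_cost(100)), round(pipe_cost(600)))
```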

Fig. 1 Topologies of Networks 1 and 2 with all the candidate pipes

Table 1 shows the general characteristics of the optimization algorithm. The minimum cost achieved for the global maximum entropy (GME) solution was £2,177,413. The maximum value of entropy for the GME solution was 3.592494. The mean number of function evaluations and CPU time required to achieve convergence were 733,413 and about 64 min, respectively. There is a multiplicity of maximum entropy values and one of the aims of the optimization is to provide a wide range of maximum entropy solutions. The maximum entropy value that is the smallest gives rise to the smallest Maximum Entropy (SME) solution. The minimum cost of the SME solution was £1,181,715. The maximum value of entropy for the SME solution was 2.660135. The minimum surplus head at the critical node was 0.007 m. The optimization model includes multiple conflicting objectives. Therefore, it is not guaranteed that any minimum node pressure constraints will be active. Furthermore, the slack for a limiting minimum node pressure constraint need not be exactly zero, due to the discrete pipe diameters.

Table 1 Results and convergence statistics for 30 optimization runs

Figure 2 shows the frontier-optimal solutions achieved of which the most infeasible solution has cost = 0; entropy = 0; topological infeasibility = 24 (i.e. 2 independent paths per node × 12 nodes); and residual head infeasibility = 330 m (i.e. 11 demand nodes × 30 m of residual head for each demand node). This solution survives until the end of the optimization because the algorithm is bias-free with respect to constraint violations. Any crossover between this solution and another solution will likely create new layouts. Also, the hypervolume value for the final merged Pareto-optimal front was 0.676. This is similar to the values in Table 1 for the individual optimization runs.

Fig. 2 Pareto-optimal fronts for Network 1 showing 30 optimization runs

Tables 2 and 3 (in the appendix) illustrate the range of feasible solutions achieved. The final Pareto-optimal set has 23 hydraulically feasible fully looped solutions and 11 different fully looped topologies (see Fig. 3a). All infeasible solutions in the final Pareto-optimal set were found to be topologically infeasible (i.e. $\exists\, i : R_i < R_i^{req} = 2$), of which only three were hydraulically feasible (i.e. $H_i \ge H_i^{req} = 30$ m; $\forall i$) (see Fig. 3b). Fig. 4 provides further confirmation that the solutions achieved are essentially maximum entropy solutions.

Fig. 3 a Topologies and flow directions of fully looped hydraulically feasible maximum entropy families for Network 1. The solid circles represent nodes with the smallest residual heads. b Topologies and flow directions of branched and partially looped hydraulically feasible maximum entropy families for Network 1. The solid circles represent nodes with the smallest residual heads. c Topologies of looped and partially looped feasible solutions for Network 2. The rectangles represent sources. d Topologies of branched and partially branched feasible solutions for Network 2. The rectangles represent sources

Fig. 4 Achieved vs theoretical maximum entropy values of Networks 1 and 2

Figure 5 shows the progress of the optimization. The fictitious pipe diameters were eliminated in the early stages consistently (Fig. 5b-c). Prior to their complete elimination, fictitious pipe diameters were present in both hydraulically feasible and infeasible solutions. Also, the observed rates of elimination reflected the pipe sizes and costs (Table 1 and Fig. 5c). On average the larger more expensive assumed diameters were eliminated more quickly. These results suggest convergence of the algorithm is very quick and the proposed procedure for handling redundant binary strings is highly effective.

Fig. 5 Illustration of convergence characteristics with Network 1. (a) and (b) show 30 individual optimization runs while (c) and (d) show averages based on the 30 optimization runs. AIM abbreviates the entropy-augmented infeasibility measure

4.2 Sample Network 2

The network shown in Fig. 1b has two supply nodes, 18 demand nodes and 37 pipes. The node demands, required residual heads, pipe lengths and costs are available in Morgan and Goulter (1985). There are 13 pipe sizes, i.e. $14^{37} = 2.55 \times 10^{42}$ solutions including pipe omission. Given $10^6$ hydraulic simulations, the sampling ratio was $10^6 / (2.55 \times 10^{42}) = 3.92 \times 10^{-37}$. A 4-bit binary substring for each pipe size gave a chromosome with a length of 148 bits. The absolute probability of bit mutation was 1/148. With 14 options for each pipe, two codes (out of $2^4 = 16$) were redundant. Two fictitious pipe diameters of 750 mm and 800 mm, with costs of 520.9/m and 591.7/m respectively, were allocated to the two redundant codes by extending the cost function of the real pipe diameters. The costs are in generic currency units (CU).

Table 1 summarizes the results achieved. The final Pareto-optimal front had 31 feasible solutions based on 26 layouts (Fig. 3c) that are fully non-dendritic (i.e. layouts with no dead ends). Additionally, seven branched and partially-branched feasible solutions were achieved (Fig. 3d). The cheapest fully-looped feasible solution (Layout 24 in Fig. 3c) with a cost of 2,374,070 CU (Solution 29 in Table 4) had 12 pipes removed. The most expensive fully-looped feasible solution (Layout 2 in Fig. 3c) with a cost of 7,738,914 CU (Solution 4 in Table 4) had one pipe removed. Figure 6 shows the relationship between the cost, entropy and infeasibility.

Fig. 6 Pareto-optimal fronts for Network 2 based on the union of the results of 30 optimization runs

5 Conclusions

A new approach to the simultaneous topology and reliability-based pipe-size optimization of water distribution systems has been developed. The method provides a multiplicity of cost-effective candidate solutions distributed among a diverse range of optimal topologies. We used statistical entropy as a computationally efficient surrogate measure of the hydraulic reliability/redundancy and reduced the computational complexity by introducing a new entropy-augmented infeasibility measure. Our optimization model includes the following essential features: (a) entropy maximization within individual feasible sets of flow directions; (b) entropy maximization across all feasible sets of flow directions within individual topologies; (c) entropy maximization across all topologies; (d) minimization of initial construction cost; (e) promotion of a wide variety of alternative solutions; (f) satisfaction of minimum topological adequacy (i.e. supply node and demand node reachability); (g) satisfaction of minimum topological redundancy (i.e. alternative independent supply paths); and (h) adequacy of nodal flows and pressures.

Clearly, many complex objectives and constraints are involved; the optimization problem addressed has six objectives. Sinha et al. (2013) emphasize that the computational solution of a six-objective optimization problem is a ‘formidable task’ for most evolutionary multi-objective optimization algorithms that aim to generate the entire Pareto-optimal front. Some of the challenges include: difficulties in achieving at once both diversity of solutions and convergence on the true Pareto-optimal front; and difficulties arising from the inability to visualize the Pareto-optimal front geometrically. The entropy-augmented infeasibility measure introduced here simplifies the optimization and reduces the computational complexity considerably, as the objectives have been reduced to only three (Saxena et al. 2013; Deb et al. 2002).

The genetic algorithm approach proposed allows full exploitation of all the efficient feasible and infeasible solutions generated in the optimization. Any redundant binary codes created are eliminated in a seamless and generic way through natural selection. This avoids arbitrary loss of potentially useful genetic material and preserves the quality of the information that is transmitted from one generation to the next. The results for the two test problems considered are sufficiently encouraging to suggest further research to improve and extend the algorithms proposed may be beneficial.