A Multi-step Data Assimilation framework to investigate the effect of measurement uncertainty in the reduction of water distribution network model errors

.


Introduction
WDNs are critical infrastructures that deliver potable water to consumers.Proper design and operation of these networks is essential to guarantee a reliable and safe water supply, guaranteeing public health and economic growth Adedeji et al (2018).
Hydraulic models are a commonly used decision support tool that water operators use to simulate and analyze WDNs.The results of simulations help optimize their design, operation, and maintenance to ensure reliability and efficiency.Traditionally, these models have been built and operated using historical data from sensors.However, digital transformation of the water sector are promoting Advanced Metering Infrastructure (AMI) such as smart water meters and Supervisory Control and Data Acquisition (SCADA) systems that open the door to real-time modelling of WDNs, which is becoming increasingly important (Johnson (2022)).The use of real-time sensor data in models can help capturing the complexity and variability of real-world systems, leading to improved and timely decision-making (Rossman (1993), Antonowicz et al (2018)).Although there are different ways to use real-time data in modelling, the integration of technology and digitalization has given rise to new approaches to updating model states, such as Data Assimilation (DA), with improved accuracy in real-time by utilizing long-term measurement data (Hill et al (2014)).DA synthesizes prior knowledge of model states with available measurements to provide an optimized estimate of current model states and reduces uncertainties.However, these measurements can be unstable and contain larger errors, and analyzing the data can be challenging.The ability to address measurement errors using calibration methods and efficiently utilizing a large amount of data is a challenge Zhou et al (2018).
The use of Kalman Filters (KF) for WDNs was first introduced by (Todini (1999)) for calibrating pipe roughness coefficients in WDNs with a simple linear structure.
As KF can only be used for linear systems, the Extended Kalman Filter (EKF) was applied by Shang et al (2006); Shang et al (2008) to estimate nodal demands in a small hypothetical network by approximating nonlinear systems with tangent linear operators.These studies showed good results with KF and EKF in cases of limited nonlinearity and uncertainty, but their efficacy may be limited in highly looped networks (van den Bossche ( 2013)) or the presence of large measurement errors Shang et al (2006); Shang et al (2008).
The effectiveness of the EnKF was proven in updating water demands and water demand model parameters for a Water Demand Forecasting Model under the assumption of known pipe roughness values and no leakage in the system (Okeya et al (2014)).Okeya et al (2014) explored the possibility of burst detection using Kalman filtering of flow observations and forecasts from the hydraulic model, and an extension of this study by Okeya et al (2014) showed that the applied methodology was effective in detecting bursts in real-time and estimating the leak flow.Ruzza (2017) carried out a similar leak detection study in WDNs using KF, EnKF, Ensemble Smoothing, and Normal-Score EnKF to identify nodal leakages.Ensemble-based methods are also effective in providing stable calibration results to ensure the long-term accuracy of models as demonstrated by Zhou et al (2018Zhou et al ( , 2022)).
EnKF avoids model linearization by simulating model states using an ensemble of parameters derived from Monte Carlo perturbations.PF, which extends the use of the ensemble to non-Gaussian models and increases the ensemble size, was successfully used by Do et al (2017a,b) to estimate nodal demand patterns in WDN models using measurements with specific errors.A recent study by Bragalli et al. Bragalli et al (2016) tested the use of EnKF in WDNs using an innovative 3-step EnKF for a small WDN which showed promising results for the capabilities of a multi-step DA in WDNs.
EnKF is an ideal and optimal method for applying DA for WDN as EnKF is stable with large nonlinear systems and a low probability of divergence from the true value.
Despite the previous research efforts, the application of DA techniques in WDNs is still limited.In particular, extended-period simulations have not been carried out for a multi-step DA algorithm.Also, in previous studies performed by Bragalli et al. (2016) and Okeya et al. (2014a), Demand Driven Analysis (DDA) methods were used for the hydraulic modelling of the WDNs.
In this paper, a three-step Ensemble Kalman Filter-based DA for WDNs (3-EnKF-WDN) approach is presented, within a hydraulic modelling approach that involves extended period simulation and Pressure-Dependant Demand (PDD).The objective is to understand to which extent model errors can be reduced under measurement uncertainty, in particular due to sensor precision and noise, when incrementally assimilating the system states of pressure (step 1), flow (step 2) and demand (step 3).We also propose a new evaluation metric, Combined Total Variance Ratio, to quantify the overall effectiveness of this multi-step DA.Additional analyses include the effect of the number of ensembles in the EnKFs, and the computational demand of 3-EnKF-WDN.
The remainder of the paper presents the methodology section, outlining the approach used in this study.Two case studies are used to demonstrate the application of the multi-step DA method.Afterwards, the results are presented and discussed.Some conclusions and findings are drawn in the last section.

Methodology
The methodology consists of three parts.The first part details the implementation of the improved multi-step DA algorithm, which starts with an initialization, and moves incrementally by assimilating pressure, flow and demand data.The second part presents the new evaluation metric, Combined Total Variance Ratio, to quantify the overall effectiveness of the multi-step DA.Finally, the third part includes an experimental setup to evaluate the effect of the measurement uncertainties on the effectiveness of the multi-step DA, the effect of different numbers of ensembles in the EnKF and the computational demand of the multi-step DA.
2.1 Three-step Ensemble Kalman Filter-based DA for WDNs (3-EnKF-WDN) The structure of the 3-EnKF-WDN algorithm is shown in Figure 1, and further detailed in the sections below.
The multi-step EnKF for WDNs involves initializing the ensemble of state estimates and updating the ensembles with measurements of head, flow, and demands.This process is repeated at each time step of the simulation to estimate the hydraulic state of the network over time.The q j after the state symbol refers to the "known" demand which is used to initialize the 3-step DA.
Fig. 1 Step-by-step implementation of the 3-Step DA Algorithm

Initialization
Before proceeding with the 3-steps, it is necessary to generate the initial ensemble of states describing our prior knowledge, using the following procedure: (a) Generate an ensemble of demands (q) with a mean µ qj (base demand of each node) and variance σ 2 qj (b) Compute matrices of pressure (H qj ) and flowrate (Q qj ) initialized in the network with the ensembles of demands (q meas ) and their averages H |qj , Q |qj , with |, denoting the average of the respective state being calculated.This is done by means of the EPANET 2.2 modelling system and the WNTR Python library (Klise et al (2017b,a)), (c) The number of ensembles "m" must be large enough for the estimated co-variance matrices to be inverted.
Once initialised, data assimilation is carried out for up to 3-steps depending on the available type of measurements, as follows.

Step One -Assimilation of Pressure Head
Update the ensemble of state estimates with head measurements by calculating the Kalman gain, assimilating these measurements and estimating the flow and demand, as follows: (a) Calculate the ensemble mean µ H and ensemble prior variance of Head P H , using Eq. ( 1) and Eq. ( 2).
(b) Calculate the Kalman Gain K H for the head using the error in the estimate and the errors in the measurement of the head (Eq 3) where R z H is the precision of head sensors and v z H is the noise in head sensors.
(c) Assimilate the measurements of Head (Z H ) and update the Head values (H qj z H ), using Eq (4): (d) Estimate Flow (Q qj z H ) using hydraulic head losses (e) Estimate Demand (q qj z H ) using the Pipe-Node Incidence Matrix (A 21 ) as defined by Todini and Pilati (1988), Eq (5).

Step Two -Assimilation of Flow
Update the ensemble of state estimates with measurements of flow by assimilating the measurements, estimating the head and demand, and calculating the Kalman gain.
(a) Calculate the ensemble mean µ Q and ensemble prior variance of the Flow P Q . Where; (b) Calculating the Kalman Gain K F for flow using the error in the estimate and the errors in the measurement of flow (Precision of flow sensors; R zQ , noise in flow sensors; v zQ ) (c) Assimilate the measurements of Flow (Z Q ) and update the Flow values (b) Calculate the Kalman Gain K ′ Q for flow prime using the error in the estimate and the errors in the measurement of demands (Precision of demand sensors; R zq , noise in demand sensors; v zq ) (c) Assimilate the measurements of demands(z q ) and update flow values

Evaluation Metric
The effectiveness of the DA can be estimated using the Total Variance (TV), Eq (16), as suggested by Bragalli et al (2016).
where T V is the Total Variance, ⊗ is the state variable (either H, Q or q), ⊗ is the ensemble mean, S is the number of state variables (i.e., number of nodes or pipes), m is the number of ensembles, i is the iterator for the state variable and j is the iterator for the ensembles.
However, for extended period simulation, we use the daily average T V value, obtained by dividing T V by the number of time steps used for the DA.
where T V R {⊗} is the Total Variance Ratio of the system state, T V {⊗} is the posterior system state assimilation (either 1 step, 2 steps assimilated), and T V ⊗ is the prior system state ⊗ without the assimilation of measurements from the current step.
To quantify the overall effectiveness of the multi-step DA the T V values for each system state are normalized to obtain a Total Variance Ratio (T V R), which are averaged to obtain a Combined Total Variance Ratio (CT V R), which indicates the overall effectiveness of all system states (head, demand and flow) of all assimilation steps.
where CT V R is the Combined Total Variance Ratio, N is the number of system states assimilated and being combined, T V R⊗ k is the Total Variance Ratio for System State, and k is either head (H), flow(Q) or demand (q).

Evaluating the effect of measurement uncertainty
Measurements are always affected by a degree of uncertainty.In the case of WDNs, measurement uncertainty depends on the sensors used for measuring the system's states.The precision and noise of these sensors are important in determining how well the sensors can capture the true states of the system.
Therefore, it is important to identify the limit of applicability of the proposed 3step DA algorithm under uncertain observations To this end, we propose a number of experiments to test the effect of uncertainty due to sensor precision and uncertainty due to sensor noise, applied to the measurements of head, flow and demand.On the one hand, to investigate the effect of the uncertainty due to noise, six different levels of noise were applied to each state measurement.The selected noise values were varied using a normal distribution with a 5% standard deviation.In total 600 simulations were carried out for each sensor type.On the other hand, the effect of the uncertainty due to sensor precision was investigated applying six different precision values for each state of sensor, and for all the possible combinations of sensor precision values.

Case Studies
Two networks of different sizes which are representative of real-world WDNs are taken for this study.
The first case study is the Modena network which is the same WDN used by Bragalli et al ( 2016), Han et al (2020) Bhave and Gupta (2006) and in many other similar studies.The network consists of 317 pipes, 268 nodes and 4 reservoirs with a fixed head between 72.0 m and 74.5 m.The network has a total length of 71. 8 km of pipes with diameters between 100 mm and 400 mm.Although the network of Modena is small, the topology and distribution of the network make it suitable for the proposed research as the network is comparable to real world small WDNs as seen in Figure 2. The second case study is the Five Reservoir network (F iveRes), which is much larger than Modena.The network consists of 1278 pipes, 935 nodes and 5 reservoirs Zheng and Zecchin (2014).The layout of the network is given in Figure 3.The network has a total length of 253.7 km of pipes with a diameter of 600 mm.The F iveRes network provides a suitable comparison of how the DA algorithm can handle larger and more complex WDN models.The monitoring network in Modena is more distributed compared to the F iveRes Network as seen in Figure 4.The number of sensors is also much less in F iveRes compared to the size of the network as seen in Table 1.Hence, it may not provide a good representation of the hydraulic states within the WDN for F iveRes.As such the experiments for measurement uncertainty were repeated for the F iveRes network with sensors located at all the nodes and links.

Results and Discussion
The methodology in Figure 1 was applied to the networks of Modena and FiveRes, modifying precision and noise of the measurements and evaluating their effect on the models' error.

Uncertainty due to noise
In the case of Modena, as seen in Figure 5 the multi-step DA is most sensitive to noise in the flow measurements, and any noise beyond one litre per second results in the DA algorithm being ineffective, as CT V R exceeds one.The threshold of noise for the head is between 0.1 and 0.2 meters of noise.Noise in the measurement of demand, on the other hand, is resilient to an increase in noise up to 0.5 litres per second.Therefore, in the case of Modena, both flow and head sensors must be calibrated regularly, to ensure that the accuracy of sensors does not exceed the threshold within which the DA is effective.However, demand sensors require less maintenance and calibration as they can be effective to a higher threshold of noise compared to the other state measurement sensors.The results for the F iveRes network, also show that increasing noise in the measurement of the systems states results in an increase in CTVR.However, as seen from the results for F iveRes in Figure 6, both head and demand show a big variation of results, where the DA is effective for the entire range of heads tested based on the minimum CTVR, and at the same time, the CTVR exceeds one even at the lowest noise levels based on the maximum CTVR.In the case of demand, the DA is found to be completely ineffective beyond a noise level of 0.8 lps based on the minimum CTVR.Similar to the head, the demand also shows the maximum CTVR exceeding the threshold of one at very low noise levels.The noise in flow shows that the DA is effective until about 2 lps based on the minimum CTVR, however, the maximum CTVR shows that the DA is not effective at all ranges similar to head and demand.
The main reason for this result is believed to be since the monitoring network in the F iveRes WDN is sub-optimal, as the number of sensors in the WDN is also relatively less compared to the size of the WDN as shown in Table 1.
In the case of the fully monitored F iveRes WDN results show that noise is not acceptable for successful DA when the network is fully monitored for both head and flow.As the results go beyond the threshold of CTVR being one, for all noise levels beyond 0. The results also show that when the network is being fully monitored by head and flow sensors, the DA is less dependent on the sensors of demand and the noise in demand does not have a significant impact on the effectiveness of the DA In general, it is observed that an increase in the noise in state measurements results in reduced effectiveness of the 3-EnKF-WDN, as the CT V R increases when noise increases.Recall that CT V R > 1 indicates that the prior state yields less error compared to the assimilated states, and therefore 3-EnKF-WDN is ineffective.

Uncertainty due to sensor precision
Figure 7 shows how the average T V of flow varies with varying precision of sensors for Modena.It can be seen from the sub-plots that, in general, an increase in the precision value of flow sensors results in an increase in the average T V of flow.Figure 8 shows a close-up of one of the sub-plots from Figure 7, when the precision of demand sensors (R zq ) is fixed at 0.1 litres/second and the precision of head sensors (R z H ) is fixed at 0.1 meters.It shows that the average total variance of flow states increases with the increased precision value of flow sensors.
These findings suggest that an additional step of DA has a greater impact on decreasing the average T V of flow (i.e., on reducing the model error), than increasing the precision of flow sensors.In practical cases, this implies that operators may opt for less precise sensors, but a variety of sensors, to perform a multi-step DA and achieve better results.As seen in Figure 8, carrying out a multi-step DA seems to be much more effective to reduce model error than assimilating just one system state.Also, that the precision of the sensor does not improve the results more considerably than the improvement obtained by an additional DA step.Similar results were obtained for both F iveRes and Modena WDN's with slight variations which may be due to factors such as the network topology, and hydraulic state of the WDN, among others.

Demand
As each member of the ensembles in the EnKF is an independent realization of the model, we discuss how the number of ensembles affect the performance of the proposed 3-step EnKF.Generally, the higher the number of ensembles, the better the estimation of uncertainty, and computational demand is greater Mulder (2014).Hence the implemented 3-step EnKF was simulated with ensembles between 5 members up to 100 members.
Figure 9 and Figure 10 show that the increase in the number of ensembles improves the results of the DA as the average T V decreases in all cases with the increase in the number of ensembles used by the EnKF.It can also be seen that the consecutive steps of assimilation result in a reduction in the model error as well.The asymptotic behaviour also indicates that few ensembles yield high average T V , but that it reduced rapidly as more ensembles are added.However, the rate of reduction of T V starts to be marginal after 30 to 50 ensembles, indicating that more ensembles are not necessary.
This behaviour can be seen for both Modena and F iveRes networks.The simulation time was compared for the different number of ensembles using various configurations of computer systems.It is observed that an increase in the number of ensembles from 5 to 100 results in an increase in simulation time from 37 seconds to 593 seconds for Modena and 269 seconds to 4464 seconds for F iveRes.
With the increase in the size of the network from Modena (268 nodes, 317 links) to F iveRes (935 Nodes, 1278 Links) the increase in simulation time is exponential.This can be seen by the increase in the gradient of the graph in Figure 11.Although the increase in the size of the network is ≈ 3.5 times, the increase in simulation time is by ≈ 7.5 times.In addition, 3-EnKF-WDN was tested on three different computer systems with different computational resources.The specifications of these systems are given in Table 2.
From the three different computer systems tested, the processors and their respective clock speeds show the most significant effect on the computational time.The current implementation of the algorithm runs serially without any parallel components, as such the computation time depends on the single-core clock speeds of the processors.Hence the results seen from Figure 12 are representative of the base and boosted clock speeds of the processors used in the systems in Table 2 If the ensembles are generated in parallel, it will bring about a significant improvement in the computation time of the 3-EnKF-WDN.This will allow for the use of the 3-EnKF-WDN for larger WDNs, for running it for more time-steps without a significant computational burden.The study demonstrated the importance of considering the effect of measurement uncertainty when using the 3-step DA algorithm.Two sources of uncertainty in the measurements were explored, namely precision and noise.It was found that the precision of sensors and the noise in measurements affect the efficacy of the 3-step DA.
When noise is added to the measurements, 3-EnKF-WDN became generally ineffective, within a small range of variation.The effect of the noise is significant in extensively monitored WDN.The findings also confirm the importance of maintaining the sensors with noise as small as possible.This could be achieved by carrying out regular maintenance and calibration of sensors.In practical applications, it is recommended to carry out simulations like the experiments with noise-in-state measurements used in this study to determine the respective thresholds of noise up to which the 3-step DA is still effective for the respective WDN.
It was also found that having high-precision sensors measuring one variable brings less reduction in model error than having less precise sensors measuring more variables.
The study also demonstrated that 30 to 50 ensembles are enough for the 3-EnKF-WDN to perform well, on the two studied networks, and that increasing ensembles beyond this number only introduces unnecessary computational burden.
It was also found that sensor data of demand do not improve the model error when applying 3-EnKF-WDN, when the WDN is fully monitored (i.e., with head sensors in all the nodes and flow sensors in all the links).This is similar to the results obtained by Bragalli et al (2016) where the T V R(q) of demand was found to be the least sensitive to reduction in the T V Rs for the multi-objective optimization carried out in their study.
The proposed method has the potential to be applied to diverse WDN problems such as leak detection, anomaly detection, demand estimation, and water quality evaluation.This can be achieved by adapting the multi-step DA algorithm for the required purpose.
Some limitations of the study include the heavy computational time required.
Parallelization of the algorithm using a method that can run the hydraulic simulations in parallel is a solution to be explored in future research.In addition, the effect of the order and synchronicity of the assimilated data needs to be established.Other explorations to be made include the effect of the standard deviation or variation of the ensembles of demands and the effect of measurement uncertainty on the Kalman Gain.

Fig. 2
Fig. 2 Layout of the Modena network

Fig. 3
Fig. 3 Layout of the F iveRes network

Fig. 4
Fig. 4 Monitoring Networks for Head, Demand and Flow Sensors in Modena (Left) and F iveRes WDN (Right)

Fig. 7
Fig. 7 Average Total Variance of Flow against precision of Sensors for Modena Network

Fig. 9
Fig. 9 Average TV for different numbers of ensembles for M odena WDN

Fig. 11
Fig. 11 Simulation time against the number of ensembles for Modena and F iveRes Network

Fig. 12
Fig. 12 Simulation time against the number of ensembles for Modena and F iveRes WDNs using different computer systems Estimation of Head (H qj z H z Q ) using hydraulic head losses and Pipe-Node Incidence Matrices (A 11 , A 12 and A 21 ) as detailed in Bragalli et al (2016) 2.1.4Step Three -Assimilation of Demand Update the ensemble of state estimates with measurements of demands by assimilating the measurements, estimating the flow and head, and calculating the Kalman gain.(a) Calculate the ensemble mean µ ′ Q and ensemble prior variance of the Q qj z H z Q Bragalli et al (2016)H qj z H z Q zq ) using hydraulic head losses and Pipe-Node Incidence Matrices (A 11 , A 12 and A 21 ) as detailed inBragalli et al (2016)

Table 1
Configuration of sensors in the tested case studies