
Contamination source detection in water distribution networks using belief propagation

  • Original Paper
Stochastic Environmental Research and Risk Assessment

Abstract

We present a Bayesian approach to the Contamination Source Detection problem in water distribution networks. Assuming that contamination is a rare event (in space and time), we try to locate the most probable source of such events after reading contamination patterns at a few sensed nodes. The method relies on strong simplifications, considering binary clean/contaminated states for nodes in discrete time, and therefore focuses on the time structure of the sensed patterns rather than on the concentration levels. As a result, a posterior probability over discrete variables is written, and posterior marginals are computed using the belief propagation algorithm. The resulting algorithm runs once on a given observation and reports probabilities for each node being the source and for the contamination patterns altogether. We test it on the Anytown model, showing its efficacy even when only a single sensed node is known.


Notes

  1. It is customary in factor graph representations to label the interaction nodes with the first letters \(a,b,c,\ldots \) and the variable nodes with \(i,j,k,\ldots \).

References

  • Altarelli F, Braunstein A, Dall’Asta L, Lage-Castellanos A, Zecchina R (2014) Bayesian inference of epidemics on networks via belief propagation. Phys Rev Lett 112(11):118701

  • Banik B, Di Cristo C, Leopardi A (2015) A pre-screening procedure for pollution source identification in sewer systems. Procedia Eng 119:360–369

  • Banik BK, Di Cristo C, Leopardi A, de Marinis G (2017) Illicit intrusion characterization in sewer systems. Urban Water J 14(4):416–426

  • Barandouzi M, Kerachian R (2016) Probabilistic contaminant source identification in water distribution infrastructure systems. Civil Eng Infrastruct J 49(2):311–326

  • Braunstein A, Lage-Castellanos A, Ortega E (2017) Contamination source inference in water distribution networks. Rev Cub Fís 34(2):100–107

  • Cristo CD, Leopardi A (2008) Pollution source identification of accidental contamination in water distribution networks. J Water Resour Plan Manag 134(2):197–202

  • De Sanctis AE, Shang F, Uber JG (2008) Determining possible contaminant sources through flow path analysis. Water Distrib Syst Anal Symp 2006:1–12

  • Donoho DL (2006) Compressed sensing. IEEE Trans Inform Theory 52:1289–1306

  • Guan J, Aral MM, Maslia ML, Grayman WM (2006) Identification of contaminant sources in water distribution systems using simulation-optimization method: case study. J Water Resour Plan Manag 132(4):252–262

  • Hu C, Zhao J, Yan X, Zeng D, Guo S (2015) A MapReduce based parallel niche genetic algorithm for contaminant source identification in water distribution network. Ad Hoc Netw 35:116–126

  • Huang JJ, McBean EA (2009) Data mining to identify contaminant event locations in water distribution systems. J Water Resour Plan Manag 135(6):466–474

  • Khan MAI, Banik BK (2017) Contamination source characterization in water distribution network. Global Sci Technol J 5(1):44–55

  • Kumar J, Brill ED, Mahinthakumar G, Ranjithan SR (2012) Contaminant source characterization in water distribution systems using binary signals. J Hydroinform 14(3):585–602

  • Laird CD, Biegler LT, van Bloemen Waanders BG (2006) Mixed-integer approach for obtaining unique solutions in source inversion of water networks. J Water Resour Plan Manag 132(4):242–251

  • Laird CD, Biegler LT, van Bloemen Waanders BG (2007) Real-time, large-scale optimization of water network systems using a subdomain approach. In: Real-time PDE-constrained optimization, SIAM, pp 289–306

  • Liu L, Sankarasubramanian A, Ranjithan SR (2011) Logistic regression analysis to estimate contaminant sources in water distribution systems. J Hydroinform 13(3):545–557

  • Lokhov AY, Mézard M, Ohta H, Zdeborová L (2014) Inferring the origin of an epidemic with a dynamic message-passing algorithm. Phys Rev E 90(1):012801

  • Luo W, Tay WP, Leng M (2014) How to identify an infection source with limited observations. IEEE J Sel Top Signal Process 8(4):586–597

  • Mezard M, Montanari A (2009) Information, physics, and computation. Oxford University Press Inc., New York

  • Perelman L, Ostfeld A (2010) Bayesian networks for estimating contaminant source and propagation in a water distribution system using cluster structure. In: Water distribution systems analysis 2010, pp 426–435

  • Preis A, Ostfeld A (2006) Contamination source identification in water systems: a hybrid model trees-linear programming scheme. J Water Resour Plan Manag 132(4):263–273

  • Propato M, Sarrazy F, Tryby M (2009) Linear algebra and minimum relative entropy to investigate contamination events in drinking water systems. J Water Resour Plan Manag 136(4):483–492

  • Salomons E, Ostfeld A (2010) Identification of possible contamination sources using reverse hydraulic simulation. Water Distrib Syst Anal 2010:447–453

  • Sambito M, Di Cristo C, Freni G, Leopardi A (2019) Optimal water quality sensor positioning in urban drainage systems for illicit intrusion identification. J Hydroinform 22(1):46–60. https://doi.org/10.2166/hydro.2019.036

  • Takhar D, Laska JN, Wakin MB, Duarte MF, Baron D, Sarvotham S, Kelly KF, Baraniuk RG (2006) A new compressive imaging camera architecture using optical-domain compression. In: Proceeding of computational imaging IV, vol 6065, pp 43–52

  • Tao T, Lu Yj, Fu X, Xin Kl (2012) Identification of sources of pollution and contamination in water distribution networks based on pattern recognition. J Zhejiang Univ Sci A 13(7):559–570

  • Todini E, Pilati S (1988) A gradient algorithm for the analysis of pipe networks. In: Computer applications in water supply: vol. 1: systems analysis and simulation. Research Studies Press Ltd., pp 1–20. ISBN 0471917834

  • Wang H, Harrison KW (2011) Bayesian update method for contaminant source characterization in water distribution systems. J Water Resour Plan Manag 139(1):13–22

  • Wang H, Harrison KW (2012) Improving efficiency of the Bayesian approach to water distribution contaminant source characterization with support vector regression. J Water Resour Plan Manag 140(1):3–11

  • Wang H, Harrison KW (2013) Bayesian approach to contaminant source characterization in water distribution systems: adaptive sampling framework. Stoch Environ Res Risk Assess 27(8):1921–1928

  • Yedidia JS, Freeman WT, Weiss Y (2005) Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Trans Inf Theory 51(7):2282–2312

  • Zhu K, Ying L (2016) Information source detection in the SIR model: a sample-path-based approach. IEEE/ACM Trans Netw 24(1):408–421

Acknowledgements

Work supported by the European Union Horizon 2020 research and innovation program MSCA-RISE-2016 under Grant agreement No 734439 INFERNET.

Author information

Corresponding author

Correspondence to Alejandro Lage-Castellanos.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix 1: factor graph representation

Algorithm 1 details the construction of a factor graph from the structure of the water distribution network and the transport times along its pipes. An example of the resulting factor graph can be found in Fig. 2 of the main text.

[Algorithm 1: construction of the factor graph (figure not reproduced here)]
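As an illustration of the kind of construction Algorithm 1 describes, the following Python sketch builds the upstream neighborhoods \(\partial ^+_{i,t}\) of the time-extended graph \(G'\) from a set of directed pipes with integer transport times, following the definitions listed in Appendix 3. It is not the authors' algorithm: the function name `time_extended_upstream`, the dictionary layout, the three-node example network and the choice to drop edges that leave the window \([1,T]\) are all assumptions made for this example.

```python
from collections import defaultdict


def time_extended_upstream(edges, T):
    """Return {(i, t): [(j, t_prev), ...]} for the time-extended graph G'.

    edges: dict mapping a directed pipe (j, i) to its integer transport time.
    T:     number of discrete time steps, with t running over 1..T.
    """
    upstream = defaultdict(list)
    for (j, i), delta in edges.items():
        for t_prev in range(1, T + 1):
            t = t_prev + delta
            if t <= T:  # assumption: edges leaving the time window are dropped
                upstream[(i, t)].append((j, t_prev))
    return dict(upstream)


# Hypothetical network: 1 -> 2 takes 1 step, 1 -> 3 takes 2 steps, 2 -> 3 takes 1 step.
edges = {(1, 2): 1, (1, 3): 2, (2, 3): 1}
upstream = time_extended_upstream(edges, T=4)
print(sorted(upstream[(3, 3)]))
```

In a factor graph built this way, each pair \((i,t)\) would carry one interaction over \(s_i^t\), \(\upsilon _i\), \(y_i^t\) and the upstream states returned here, which matches the argument list of \(E_{i,t}\) in the list of symbols.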

Appendix 2: belief propagation

Formally speaking, a consistent description of the full measure in terms of single-variable and interacting-variable distributions is achieved through a variational representation of the free energy \(F = -\log Z\) of the problem in Eq. (10):

$$\begin{aligned} F[b_i(x_i), b_{a}(x_{\partial a})] = \sum _a F_a[b_a(x_{\partial a})] - \sum _i (d_i-1) F_i[b_i(x_i)] \end{aligned}$$

The free energy approximation sums the free energy contribution of each factor node and subtracts the over-counted free energies of the individual variables (\(d_i\) is the number of factors that variable i interacts with). Each free energy term \(F = U - T S\) consists, as usual, of an energetic part \(U=\sum _{x_{\partial a}} b_a(x_{\partial a}) E_a(x_{\partial a})\) and an entropic part \(S = -\sum _{x_{\partial a}} b_a(x_{\partial a}) \log b_a(x_{\partial a})\), where the energy \(E_a(x_{\partial a})\) contains only the corresponding additive term in the exponential of Eq. (10).
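The structure of this approximation can be checked numerically on a toy problem. The Python sketch below evaluates the expression above on a hypothetical three-variable chain (not the model of Eq. (10)), assuming \(T=1\), \(E_a = -\log \psi _a\) for arbitrary positive factor tables \(\psi _a\), and no single-variable energies, so that \(F_i = -S_i\). On a tree, inserting the exact marginals as beliefs should reproduce \(-\log Z\); all names and numbers are illustrative.

```python
import itertools
import math

# Hypothetical chain x0 - a - x1 - b - x2 of binary variables (a tree).
factors = {
    "a": ((0, 1), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}),
    "b": ((1, 2), {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}),
}
n_vars = 3
states = list(itertools.product((0, 1), repeat=n_vars))


def weight(x):
    """Unnormalised probability: product of all factor tables."""
    w = 1.0
    for scope, table in factors.values():
        w *= table[tuple(x[i] for i in scope)]
    return w


Z = sum(weight(x) for x in states)


def marginal(scope):
    """Exact marginal over the variables in `scope`, by brute force."""
    m = {}
    for x in states:
        key = tuple(x[i] for i in scope)
        m[key] = m.get(key, 0.0) + weight(x) / Z
    return m


def bethe_free_energy():
    F = 0.0
    for scope, table in factors.values():
        b_a = marginal(scope)
        # F_a = U_a - S_a with E_a = -log(psi_a) and T = 1
        F += sum(b * (math.log(b) - math.log(table[k])) for k, b in b_a.items())
    for i in range(n_vars):
        d_i = sum(i in scope for scope, _ in factors.values())
        b_i = marginal((i,))
        F_i = sum(b * math.log(b) for b in b_i.values())  # F_i = -S_i
        F -= (d_i - 1) * F_i  # remove the over-counted variable terms
    return F


print(bethe_free_energy(), -math.log(Z))
```

Only the middle variable has \(d_i = 2\) here, so it is the only one whose free energy is subtracted; on graphs with loops the same expression is an approximation rather than an identity.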

Minimization of this free energy over the distributions \(b(\cdot )\) needs to respect the consistency among them: if two distributions share a common variable, the marginals should agree:

$$\begin{aligned} \forall i,\ \forall a\in \partial i: \quad b_i(x_i) = \sum _{x_{\partial a {\setminus } i}} b_a(x_{\partial a}) \end{aligned}$$
(15)

To enforce these constraints, we introduce a set of Lagrange multipliers \(m_{a \rightarrow i}(x_i)\), which are ultimately interpreted as messages flowing from every interaction towards each of its variables (Yedidia et al. 2005). The single-variable beliefs are then expressed in terms of these multipliers as in Eq. (12), while the beliefs over every set of interacting variables in \( E_{i,t}\) are given by Eq. (13).
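To make the message-passing picture concrete, here is a minimal Python sketch of a sum-product fixed-point iteration on a small hypothetical factor graph. Equations (12) and (13) are not reproduced in this appendix, so the sketch assumes the textbook form of the update, in which \(m_{a \rightarrow i}(x_i)\) sums the factor over the other variables weighted by their incoming messages and \(b_i(x_i)\) is proportional to the product of incoming messages. The factor tables, names and iteration count are illustrative, not taken from the paper or from the authors' C++ code.

```python
import itertools

# Hypothetical tree factor graph over binary variables 0, 1, 2.
factors = {
    "a": ((0, 1), {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}),
    "b": ((1, 2), {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}),
    "c": ((2,), {(0,): 1.0, (1,): 0.5}),
}
n_vars = 3


def normalize(v):
    s = sum(v)
    return [x / s for x in v]


def run_bp(n_iter=20):
    # m[(a, i)] stores the message m_{a -> i}(x_i), initialised uniform.
    m = {(a, i): [0.5, 0.5] for a, (scope, _) in factors.items() for i in scope}
    for _ in range(n_iter):
        new = {}
        for a, (scope, table) in factors.items():
            for i in scope:
                out = [0.0, 0.0]
                for key, psi in table.items():
                    w = psi
                    for j, xj in zip(scope, key):
                        if j == i:
                            continue
                        # variable-to-factor part: product of m_{b -> j}, b != a
                        for (b, k), msg in m.items():
                            if k == j and b != a:
                                w *= msg[xj]
                    out[key[scope.index(i)]] += w
                new[(a, i)] = normalize(out)
        m = new  # parallel update of all messages
    beliefs = []
    for i in range(n_vars):
        b = [1.0, 1.0]
        for (a, k), msg in m.items():
            if k == i:
                b = [b[0] * msg[0], b[1] * msg[1]]
        beliefs.append(normalize(b))
    return beliefs


def exact_marginals():
    """Brute-force single-variable marginals, for comparison."""
    marg = [[0.0, 0.0] for _ in range(n_vars)]
    for x in itertools.product((0, 1), repeat=n_vars):
        w = 1.0
        for scope, table in factors.values():
            w *= table[tuple(x[i] for i in scope)]
        for i in range(n_vars):
            marg[i][x[i]] += w
    return [normalize(v) for v in marg]


print(run_bp())
print(exact_marginals())
```

Because this toy graph is a tree, the beliefs coincide with the exact marginals once the messages have converged; on loopy graphs such as a time-extended network the fixed point is only an approximation.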

In Algorithm 2 we outline the main steps of the whole procedure presented in this paper, starting from the WDN graph up to the Belief Propagation fixed point iteration.

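[Algorithm 2: main steps of the procedure, from the WDN graph to the Belief Propagation fixed-point iteration (figure not reproduced here)]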

Appendix 3: list of symbols

Water network symbols

\(C_i(t)\) :

Actual concentration of contaminants at time t in node i of the WDN

\(L_{i,j}\) :

Length of the pipe connecting node i and node j in the WDN

\(v_{i,j}\) :

Velocity of the fluid in the pipe connecting node i and node j in the WDN (considered constant in time)

\(\Delta _{i,j}\) :

Transport time between node i and node j of the WDN, given as an integer number of some time unit

\(c_{{\mathrm{th}}}\) :

Concentration threshold for the sensitivity of the sensors placed in the network
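The water network symbols above can be tied together with a short Python sketch that turns pipe data into integer transport times and raw concentrations into binary readings. The transit time \(L_{i,j}/v_{i,j}\) follows from the definitions; the rounding rule, the minimum of one time step, the strict threshold comparison and both function names are assumptions made for this example and are not stated in the text.

```python
def transport_steps(length_m, velocity_m_s, dt_s):
    """Integer transport time: L / v expressed in units of dt.

    Rounding to the nearest integer and enforcing at least one step
    are assumptions of this sketch.
    """
    return max(1, round(length_m / (velocity_m_s * dt_s)))


def binarize(concentrations, c_th):
    """Binary readings: 1 where the concentration exceeds c_th (assumed rule)."""
    return [1 if c > c_th else 0 for c in concentrations]


# Hypothetical pipe: 1200 m long, 0.5 m/s, 10-minute time unit.
print(transport_steps(1200.0, 0.5, 600.0))
# Hypothetical concentration series sensed at one node.
print(binarize([0.0, 0.02, 0.3, 0.05], c_th=0.05))
```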

Graph symbols

\(G(V,E,\Delta )\) :

Weighted directed graph, representing the WDN, where \(V=\{i \arrowvert i\in [1,N]\}\) is the set of nodes, E is the set of edges \((i\rightarrow j)\) and \(\Delta = \{ \Delta _{i,j} \arrowvert (i\rightarrow j)\in E\}\) is the set of transport times along the edges (pipes)

\(G'(V',E')\) :

Time-extended graph representation of the WDN, with a set \(V'=\{s_i^t\arrowvert i\in V , t\in [1,T]\}\) of nodes and a set \(E'=\{((i,t)-(j,t')) \arrowvert (i\rightarrow j)\in E, t' = t+\Delta _{i, j}\}\) of edges

\(\partial ^+_{i,t}\) :

Upstream neighborhood of node \(s_i^t\in V'\): subset of nodes \(s_j^{t'} \in V'\) of graph \(G'\) such that \(((j,t')-(i,t)) \in E'\)

Model symbols

\(o_i^t\) :

Binary (0/1) sensor report for the state of node i at discrete time t, given as input for the inference

\(s_i^t\) :

Binary (0/1) state of node i at discrete time t, to be inferred

\(y_i^t\) :

Binary (0/1) contaminant-pouring patterns, representing the external addition of contaminants at time t on node i, to be inferred

\(\upsilon _i\) :

Binary (1/0) contamination exposure, indicating whether node i will or will not incorporate contamination coming from the exterior, to be inferred

\(\eta \) :

A field penalizing discrepancies between the observed state \(o_i^t\) and the real state \(s_i^t\) of the observed nodes

\(\beta \) :

A temperature-like parameter enforcing the dynamics [Eq. (2)] in the network [see Eqs. (6) and (7)]

\(\lambda \) :

A field forcing the contamination to be a rare event (in space), by penalizing the non zero values of \(\upsilon _i\) [see Eq. (8)]

\(\gamma \) :

A field forcing the contamination to be a short event (in time), by penalizing the non zero values of \(y_i^t\) [see Eq. (9)]

\(\varphi (s_i^t , \upsilon _i , y_i^t , s_{\partial ^+_i}^{t-\Delta })\) :

Function expressing the soft satisfaction of Eq. (6)

\(E_{i,t}(s_i^t,\upsilon _i, y_i^t, s_{\partial ^+_i}^{t-\Delta } )\) :

Binary function (7) checking the consistency of the variables with the proposed dynamics (2)

\(P(S,V, Y \arrowvert O)\) :

Probability of the sources V, the patterns Y and the states of the nodes in time S, given the observation O

\(P(O \arrowvert S,V, Y)\) :

Probability of an observation O given the active sources V, the pouring patterns Y and the set of states of the nodes S (it is actually only a function \(P(O \arrowvert S)\) of O and S)

\(P(S,V,Y)\) :

A priori probability of the sources V, the patterns Y and the states of contaminants S in the system

\(Z=P(O)\) :

Normalization constant, corresponding in the Bayesian setting to the probability of the observation O

F :

Free Energy

b(x):

Inferred probability distribution, also called belief, over any binary variable x, approximating the marginal probability over that variable of the full distribution \(P(S,V,Y \arrowvert O)\)

\(m_{a \rightarrow i}(x_i)\) :

Message sent from the interaction a in the factor graph representation to one of its variables i, equivalent to the probability distribution of that variable according to the interaction a and the probabilities of the other variables involved in it. It can also be seen as the Lagrange multiplier enforcing the consistency between the single-variable distribution \(b_i(x_i)\) and the multivariate factor node distribution \(b_a(x_{\partial a})\)
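For orientation, the probability symbols listed above are related by Bayes' rule. The display below only restates that relation in the notation of this list, using the remark that the likelihood depends on S alone; the factorised form of the prior is given in the main text and is not reproduced here.

$$\begin{aligned} P(S,V,Y \arrowvert O) = \frac{P(O \arrowvert S)\, P(S,V,Y)}{Z}, \qquad Z = P(O) = \sum _{S,V,Y} P(O \arrowvert S)\, P(S,V,Y) \end{aligned}$$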

Appendix 4: supplementary data

The following data, models, or code generated or used during the study are available as supplemental material of the submitted manuscript

  • File with the directed weighted graph structure of the Anytown network,

  • Concentration pattern sensed at node 9, which is all the information required by our inference procedure.

The following data, models, or code generated or used during the study are available from the corresponding author by request

  • Belief Propagation code in C++, available from the authors upon request.

  • All codes required to generate the input files for the BP code.


Cite this article

Ortega, E., Braunstein, A. & Lage-Castellanos, A. Contamination source detection in water distribution networks using belief propagation. Stoch Environ Res Risk Assess 34, 493–511 (2020). https://doi.org/10.1007/s00477-020-01788-y


  • DOI: https://doi.org/10.1007/s00477-020-01788-y
