Using data mining techniques to isolate chemical intrusion in water distribution systems

Environmental Monitoring and Assessment

Abstract

The security of water distribution systems has been the subject of a growing body of research over the last decade. Data analysis and machine learning are coupled with hydraulic and water quality modeling to improve the capacity of water utilities to save lives when a water network is contaminated. This research applies k-nearest neighbor (KNN) and random forest algorithms to estimate the location of contamination sources in near real time. Epanet and Epanet-MSX are used to simulate pesticide intrusions into a water distribution system and the pesticide's interaction with compounds already present in the bulk water. Several pesticide concentrations are considered in the simulations, and chlorine is monitored by water quality sensors placed in the network. The results show that random forest localizes 88% of the contamination scenarios, while the KNN algorithm localizes 87%. Finally, the spread of contamination is assessed to better understand the impacts of contamination events that are not localized.
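The authors' code is only available on request, so the following is a minimal sketch of the general KNN idea, not the authors' implementation. It assumes that each simulated scenario is summarized as a vector of chlorine decreases (mg/L) at the sensor nodes and labeled with the node where the pesticide was injected; the node names `N1` and `N2`, the readings, and `k=3` are hypothetical. The "localization rate" here is one plausible reading of the percentages quoted above: the fraction of test scenarios whose predicted source node matches the true one.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length sensor reading vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train_X, train_y, query, k=3):
    """Return the majority source-node label among the k nearest training scenarios."""
    # Rank training scenarios by distance to the observed chlorine readings.
    ranked = sorted(range(len(train_X)), key=lambda i: euclidean(train_X[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    # Counter keeps insertion order on ties, so a tie goes to the closer neighbor's label.
    return votes.most_common(1)[0][0]


def localization_rate(train_X, train_y, test_X, test_y, k=3):
    """Fraction of test scenarios whose predicted source node equals the true one."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return hits / len(test_y)


# Hypothetical simulated scenarios: chlorine decrease (mg/L) at three sensors.
train_X = [
    [0.40, 0.05, 0.00],  # injection at N1
    [0.35, 0.10, 0.00],  # injection at N1
    [0.45, 0.08, 0.02],  # injection at N1
    [0.02, 0.30, 0.25],  # injection at N2
    [0.00, 0.35, 0.20],  # injection at N2
    [0.05, 0.28, 0.30],  # injection at N2
]
train_y = ["N1", "N1", "N1", "N2", "N2", "N2"]
test_X = [[0.38, 0.07, 0.01], [0.03, 0.33, 0.22]]
test_y = ["N1", "N2"]

print(localization_rate(train_X, train_y, test_X, test_y))  # 1.0
```

A random forest, the other method named above, would replace the distance-based vote with a majority vote over decision trees trained on bootstrap samples of the same scenario vectors; in practice a library implementation would normally be used instead of a hand-written one.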

Availability of data and material

The data and the model input file are available from the authors upon request.

Code availability

The code is available from the authors upon request.

Acknowledgements

The authors thank the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES), Finance Code 001, and the Instituto de Pesquisas Tecnológicas do Estado de São Paulo (IPT) for supporting this research.

Funding

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance code 001.

Author information

Contributions

D. B. Barros developed the code and wrote the manuscript. S. M. Cardoso and E. Oliveira contributed to writing and revision. B. Brentan contributed to code development, revision, and coordination. L. Ribeiro contributed to revision and coordination.

Corresponding author

Correspondence to Daniel Bezerra Barros.

Ethics declarations

Ethics approval

Not applicable.

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Barros, D.B., Cardoso, S.M., Oliveira, E. et al. Using data mining techniques to isolate chemical intrusion in water distribution systems. Environ Monit Assess 194, 203 (2022). https://doi.org/10.1007/s10661-022-09867-z

  • DOI: https://doi.org/10.1007/s10661-022-09867-z