Abstract
The security of water distribution systems has attracted a growing volume of research over the last decade. Coupling data analysis and machine learning with hydraulic and water quality modeling improves the capacity of water utilities to protect lives when water networks are contaminated. This research applies k-nearest neighbor (KNN) and random forest algorithms to estimate the location of contamination sources in near-real time. EPANET and EPANET-MSX are used to simulate pesticide intrusions into a water distribution system and the pesticide's interaction with compounds already present in the bulk water. Different pesticide concentrations are considered in the simulations, and chlorine is monitored through water quality sensors placed in the network. The results show that random forest correctly localizes \(88\%\) of contamination scenarios, while KNN localizes \(87\%\). Finally, the spread of contamination is assessed to better understand the impacts of non-localized contamination.
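As a rough illustration of the classification idea, the sketch below applies a hand-written k-nearest neighbor vote to toy chlorine readings. It is not the authors' implementation: the node names, the three-sensor "signatures", the noise level and the value of k are all hypothetical, and the readings are drawn at random instead of being simulated with EPANET-MSX. Random forest is not shown.

```python
import math
import random
from collections import Counter

# Hypothetical chlorine "signatures" (mg/L at three sensors) for three
# candidate source nodes. In a real study these would come from
# hydraulic and water quality simulations, not from hard-coded values.
SIGNATURES = {
    "N1": (0.2, 0.8, 0.8),
    "N2": (0.8, 0.2, 0.8),
    "N3": (0.8, 0.8, 0.2),
}


def make_scenarios(n_per_node, noise, rng):
    """Generate noisy sensor readings labelled with their source node."""
    data = []
    for node, signature in SIGNATURES.items():
        for _ in range(n_per_node):
            reading = tuple(v + rng.gauss(0.0, noise) for v in signature)
            data.append((reading, node))
    return data


def knn_predict(train, reading, k=5):
    """Majority vote among the k training scenarios closest to `reading`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], reading))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=5):
    """Fraction of test scenarios whose source node is predicted correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in test)
    return hits / len(test)


rng = random.Random(0)  # fixed seed so the toy data is reproducible
train = make_scenarios(50, 0.05, rng)
test = make_scenarios(20, 0.05, rng)
print(f"localized {accuracy(train, test):.0%} of test scenarios")
```

The toy signatures are far apart relative to the noise, so this example is much easier than a real network, where many candidate nodes can produce similar sensor responses.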
Availability of data and material
The data and the model input file are available from the authors upon request.
Code availability
The code is available from the authors upon request.
Acknowledgements
The authors acknowledge the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance code 001 and the Instituto de Pesquisas Tecnológicas do Estado de São Paulo (IPT) for supporting this research.
Funding
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES) - Finance code 001.
Author information
Contributions
D. B. Barros developed the code and wrote the manuscript. S. M. Cardoso and E. Oliveira contributed to writing and revision. B. Brentan contributed to code development, revision and coordination. L. Ribeiro contributed to revision and coordination.
Ethics declarations
Ethics approval
Not applicable.
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Barros, D.B., Cardoso, S.M., Oliveira, E. et al. Using data mining techniques to isolate chemical intrusion in water distribution systems. Environ Monit Assess 194, 203 (2022). https://doi.org/10.1007/s10661-022-09867-z
DOI: https://doi.org/10.1007/s10661-022-09867-z