A review of data-driven modelling in drinking water treatment

Abstract

Data-driven models offer significant opportunities to optimize drinking water treatment and water resource management. Modelling can help characterize the behaviour of complex systems, such as water quality and environmental systems, giving insight into expected outcomes under changing conditions. Many water treatment models have been developed, such as models that predict treated water quality based on coagulant addition or disinfection by-product formation from chlorination, and these can better inform operators of optimal treatment parameters to limit risk and reduce cost. Data-driven models, in particular, present an opportunity to learn relationships from patterns in historical data without the need to pre-define mechanisms or variable interactions. Furthermore, models built on currently monitored data are likely easier to implement since they use water quality measures that are already in place. However, data-driven approaches face significant challenges, including increased uncertainty in model validity, difficulty in interpreting model behaviour and decision logic, and an increased likelihood of incorporating biases from training data. This article presents a review of data-driven model applications in drinking water treatment to highlight opportunities related to protecting source water quality, optimizing treatment processes, and interpreting sensor data. There is a focus on identifying the approaches and algorithms best suited for specific applications and on the interpretability of trained models. Successful implementation of data-driven models in critical systems, such as water treatment, requires that models be validated and that a model’s decision-making logic can be identified and scrutinized.

Data availability

Canada’s National Long-term Water Quality Monitoring database (open data).

References

  1. Abba SI, Pham QB, Saini G et al (2020) Implementation of data intelligence models coupled with ensemble machine learning for prediction of water quality index. Environ Sci Pollut Res. https://doi.org/10.1007/s11356-020-09689-x

    Article  Google Scholar 

  2. Abbaspour KC, Schulin R, Schläppi E, Flühler H (1996) A Bayesian approach for incorporating uncertainty and data worth in environmental projects. Environ Model Assess 1:151–158. https://doi.org/10.1007/BF01874902

    Article  Google Scholar 

  3. Aggarwal CC (2018) An introduction to neural networks. In: Aggarwal CC (ed) Neural networks and deep learning: a textbook. Springer International Publishing, Cham, pp 1–52

    Chapter  Google Scholar 

  4. Aghel B, Rezaei A, Mohadesi M (2019) Modeling and prediction of water quality parameters using a hybrid particle swarm optimization–neural fuzzy approach. Int J Environ Sci Technol 16:4823–4832. https://doi.org/10.1007/s13762-018-1896-3

    Article  Google Scholar 

  5. Aguilera PA, Fernández A, Fernández R et al (2011) Bayesian networks in environmental modelling. Environ Model Softw 26:1376–1388. https://doi.org/10.1016/j.envsoft.2011.06.004

    Article  Google Scholar 

  6. Avila R, Horn B, Moriarty E et al (2018) Evaluating statistical model performance in water quality prediction. J Environ Manage 206:910–919. https://doi.org/10.1016/j.jenvman.2017.11.049

    CAS  Article  Google Scholar 

  7. Banadkooki FB, Ehteram M, Panahi F et al (2020) Estimation of total dissolved solids (TDS) using new hybrid machine learning models. J Hydrol 587:124989. https://doi.org/10.1016/j.jhydrol.2020.124989

    CAS  Article  Google Scholar 

  8. Barzegar R, Aalami MT, Adamowski J (2020) Short-term water quality variable prediction using a hybrid CNN–LSTM deep learning model. Stoch Environ Res Risk Assess 34:415–433. https://doi.org/10.1007/s00477-020-01776-2

    Article  Google Scholar 

  9. Baxter CW, Stanley SJ, Zhang Q (1999) Development of a full-scale artificial neural network model for the removal of natural organic matter by enhanced coagulation. J Water Supply Res Technol AQUA 48:129–136. https://doi.org/10.2166/aqua.1999.0013

    CAS  Article  Google Scholar 

  10. Baxter CW, Zhang Q, Stanley SJ et al (2001) Drinking water quality and treatment: the use of artificial neural networks. Can J Civ Eng 28:26–35. https://doi.org/10.1139/l00-053

    Article  Google Scholar 

  11. Bieroza M, Baker A, Bridgeman J (2011) Classification and calibration of organic matter fluorescence data with multiway analysis methods and artificial neural networks: an operational tool for improved drinking water treatment. Environmetrics 22:256–270. https://doi.org/10.1002/env.1045

    CAS  Article  Google Scholar 

  12. Bikmukhametov T, Jäschke J (2020) Combining machine learning and process engineering physics towards enhanced accuracy and explainability of data-driven models. Comput Chem Eng 138:106834. https://doi.org/10.1016/j.compchemeng.2020.106834

    CAS  Article  Google Scholar 

  13. Biondi D, Freni G, Iacobellis V et al (2012) Validation of hydrological models: conceptual basis, methodological approaches and a proposal for a code of practice. Phys Chem Earth Parts A/b/c 42–44:70–76. https://doi.org/10.1016/j.pce.2011.07.037

    Article  Google Scholar 

  14. Bishop CM (1995) Neural networks for pattern recognition. Oxford University Press

    Google Scholar 

  15. Breiman L (2001) Random Forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324

    Article  Google Scholar 

  16. Bridgeman J, Bieroza M, Baker A (2011) The application of fluorescence spectroscopy to organic matter characterisation in drinking water treatment. Rev Environ Sci Biotechnol 10:277. https://doi.org/10.1007/s11157-011-9243-x

    CAS  Article  Google Scholar 

  17. Bridgeman J, Jefferson B, Parsons SA (2009) Computational fluid dynamics modelling of flocculation in water treatment: a review. Eng Appl Comput Fluid Mech 3:220–241. https://doi.org/10.1080/19942060.2009.11015267

    Article  Google Scholar 

  18. Bro R (1997) PARAFAC. Tutorial and applications. Chemom Intell Lab Syst 38:149–171. https://doi.org/10.1016/S0169-7439(97)00032-4

    CAS  Article  Google Scholar 

  19. Brookes JD, Carey CC, Hamilton DP et al (2014) Emerging challenges for the drinking water industry. Environ Sci Technol 48:2099–2101. https://doi.org/10.1021/es405606t

    CAS  Article  Google Scholar 

  20. Brooks W, Corsi S, Fienen M, Carvin R (2016) Predicting recreational water quality advisories: a comparison of statistical methods. Environ Model Softw 76:81–94. https://doi.org/10.1016/j.envsoft.2015.10.012

    Article  Google Scholar 

  21. Burchard-Levine A, Liu S, Vince F et al (2014) A hybrid evolutionary data driven model for river water quality early warning. J Environ Manage 143:8–16. https://doi.org/10.1016/j.jenvman.2014.04.017

    CAS  Article  Google Scholar 

  22. Chau K (2006) A review on integration of artificial intelligence into water quality modelling. Mar Pollut Bull 52:726–733. https://doi.org/10.1016/j.marpolbul.2006.04.003

    CAS  Article  Google Scholar 

  23. Chen B, Westerhoff P (2010) Predicting disinfection by-product formation potential in water. Water Res 44:3755–3762. https://doi.org/10.1016/j.watres.2010.04.009

    CAS  Article  Google Scholar 

  24. Chen C-L, Hou P-L (2006) Fuzzy model identification and control system design for coagulation chemical dosing of potable water. Water Supply 6:97–104. https://doi.org/10.2166/ws.2006.782

    CAS  Article  Google Scholar 

  25. Chen H, Chen A, Xu L et al (2020a) A deep learning CNN architecture applied in smart near-infrared analysis of water pollution for agricultural irrigation resources. Agric Water Manag 240:106303. https://doi.org/10.1016/j.agwat.2020.106303

    Article  Google Scholar 

  26. Chen K, Chen H, Zhou C et al (2020b) Comparative analysis of surface water quality prediction performance and identification of key water parameters using different machine learning models based on big data. Water Res 171:115454. https://doi.org/10.1016/j.watres.2019.115454

    CAS  Article  Google Scholar 

  27. Cordoba GAC, Tuhovčák L, Tauš M (2014) Using artificial neural network models to assess water quality in water distribution networks. Proc Eng 70:399–408. https://doi.org/10.1016/j.proeng.2014.02.045

    Article  Google Scholar 

  28. Dahan H, Cohen S, Rokach L, Maimon O (2014) Proactive data mining: a general approach and algorithmic framework. In: Dahan H, Cohen S, Rokach L, Maimon O (eds) Proactive Data Mining with Decision Trees. Springer, New York, NY, pp 15–20

    Chapter  Google Scholar 

  29. De’ath G, Fabricius KE, (2000) Classification and regression trees: a powerful yet simple technique for ecological data analysis. Ecology 81:3178–3192. https://doi.org/10.1890/0012-9658(2000)081[3178:CARTAP]2.0.CO;2

    Article  Google Scholar 

  30. Debnath A, Majumder M, Pal M (2015) A cognitive approach in selection of source for water treatment plant based on climatic impact. Water Resour Manag 29:1907–1919

    Article  Google Scholar 

  31. Delpla I, Florea M, Rodriguez MJ (2019) Drinking water source monitoring using early warning systems based on data mining techniques. Water Resour Manag 33:129

    Article  Google Scholar 

  32. Deng W, Wang G (2017) A novel water quality data analysis framework based on time-series data mining. J Environ Manage 196:365–375. https://doi.org/10.1016/j.jenvman.2017.03.024

    CAS  Article  Google Scholar 

  33. Dogo EM, Nwulu NI, Twala B, Aigbavboa C (2019) A survey of machine learning methods applied to anomaly detection on drinking-water quality data. Urban Water Journal 16:235–248. https://doi.org/10.1080/1573062X.2019.1637002

    Article  Google Scholar 

  34. Doshi-Velez F, Kim B (2017) Towards a rigorous science of interpretable machine learning. [cs, stat]

  35. D’Souza CD, Kumar MSM (2010) Comparison of ANN models for predicting water quality in distribution systems. J AWWA 102:92–106. https://doi.org/10.1002/j.1551-8833.2010.tb10152.x

    Article  Google Scholar 

  36. Eggimann S, Mutzner L, Wani O et al (2017) The Potential of knowing more: a review of data-driven urban water management. Environ Sci Technol 51:2538–2553. https://doi.org/10.1021/acs.est.6b04267

    CAS  Article  Google Scholar 

  37. El Hasadi YMF, Padding JT (2019) Solving fluid flow problems using semi-supervised symbolic regression on sparse data. AIP Adv 9:115218. https://doi.org/10.1063/1.5116183

    CAS  Article  Google Scholar 

  38. Elkiran G, Nourani V, Abba SI, Abdullahi J (2018) Artificial intelligence-based approaches for multi-station modelling of dissolve oxygen in river. GJESM. https://doi.org/10.22034/gjesm.2018.04.005

  39. Ellison AM (2004) Bayesian inference in ecology. Ecol Lett 7:509–520. https://doi.org/10.1111/j.1461-0248.2004.00603.x

    Article  Google Scholar 

  40. Everaert G, Bennetsen E, Goethals PLM (2016) An applicability index for reliable and applicable decision trees in water quality modelling. Eco Inform 32:1–6. https://doi.org/10.1016/j.ecoinf.2015.12.004

    Article  Google Scholar 

  41. Farnham DJ, Lall U (2015) Predictive statistical models linking antecedent meteorological conditions and waterway bacterial contamination in urban waterways. Water Res 76:143–159. https://doi.org/10.1016/j.watres.2015.02.040

    CAS  Article  Google Scholar 

  42. Fenton N, Neil M (2012) Risk Assessment and Decision Analysis with Bayesian Networks. CRC Press

    Book  Google Scholar 

  43. Ferretto N, Tedetti M, Guigue C et al (2014) Identification and quantification of known polycyclic aromatic hydrocarbons and pesticides in complex mixtures using fluorescence excitation–emission matrices and parallel factor analysis. Chemosphere 107:344–353. https://doi.org/10.1016/j.chemosphere.2013.12.087

    CAS  Article  Google Scholar 

  44. Finlay S (2014) Predictive analytics, data mining and big data: myths. Springer, Misconceptions and Methods

    Book  Google Scholar 

  45. Flach P (2012) Machine learning: the art and science of algorithms that make sense of data. Cambridge University Press, Cambridge

    Book  Google Scholar 

  46. Gagnon C, Grandjean BPA, Thibault J (1997) Modelling of coagulant dosage in a water treatment plant. Artif Intell Eng 11:401–404. https://doi.org/10.1016/S0954-1810(97)00010-1

    Article  Google Scholar 

  47. García S, Luengo J, Herrera F (2015) Data Preprocessing in Data Mining. Springer International Publishing, Cham

    Book  Google Scholar 

  48. Gilpin LH, Bau D, Yuan BZ, et al (2019) Explaining explanations: an overview of interpretability of machine learning. [cs, stat]

  49. Gokgoz E, Subasi A (2015) Comparison of decision tree algorithms for EMG signal classification using DWT. Biomed Signal Process Control 18:138–144. https://doi.org/10.1016/j.bspc.2014.12.005

    Article  Google Scholar 

  50. Gomes LS, Souza FAA, Pontes RST et al (2015) Coagulant dosage determination in a water treatment plant using dynamic neural network models. Int J Comp Intel Appl 14:1550013. https://doi.org/10.1142/S1469026815500133

    Article  Google Scholar 

  51. Goodfellow I, Bengio Y, Courville A (2016) Deep Learning. MIT Press

    Google Scholar 

  52. Griffiths KA, Andrews RC (2011) The application of artificial neural networks for the optimization of coagulant dosage. Water Supply 11:605–611. https://doi.org/10.2166/ws.2011.028

    CAS  Article  Google Scholar 

  53. Guidotti R, Monreale A, Ruggieri S et al (2019) A survey of methods for explaining black box models. ACM Comput Surv 51:1–42. https://doi.org/10.1145/3236009

    Article  Google Scholar 

  54. Guo D, Lintern A, Webb JA et al (2019) Key factors affecting temporal variability in stream water quality. Water Resour Res 55:112–129. https://doi.org/10.1029/2018WR023370

    CAS  Article  Google Scholar 

  55. Hamilton KA, Waso M, Reyneke B et al (2018) Cryptosporidium and Giardia in wastewater and surface water environments. J Environ Qual 47:1006–1023. https://doi.org/10.2134/jeq2018.04.0132

    CAS  Article  Google Scholar 

  56. Handelman GS, Kok HK, Chandra RV et al (2019) Peering into the black box of artificial intelligence: evaluation metrics of machine learning methods. Am J Roentgenol 212:38–43. https://doi.org/10.2214/AJR.18.20224

    Article  Google Scholar 

  57. Harris J, Tzafestas SG, Chen CS, et al (eds) (2006) Comments and definitions. In: Fuzzy Logic Applications in Engineering Science. Springer Netherlands, Dordrecht, pp 1–10

  58. Harris TD, Graham JL (2017) Predicting cyanobacterial abundance, microcystin, and geosmin in a eutrophic drinking-water reservoir using a 14-year dataset. Lake Reser Manage 33:32–48. https://doi.org/10.1080/10402381.2016.1263694

    CAS  Article  Google Scholar 

  59. Heddam S, Bermad A, Dechemi N (2012) ANFIS-based modelling for coagulant dosage in drinking water treatment plant: a case study. Environ Monit Assess 184:1953–1971. https://doi.org/10.1007/s10661-011-2091-x

    CAS  Article  Google Scholar 

  60. Heibati M, Stedmon CA, Stenroth K et al (2017) Assessment of drinking water quality at the tap using fluorescence spectroscopy. Water Res 125:1–10. https://doi.org/10.1016/j.watres.2017.08.020

    CAS  Article  Google Scholar 

  61. Hey T (2009) The Fourth Paradigm: Data-Intensive Scientific Discovery, 1st Edition. Microsoft Research, Redmond, Washington

  62. Hosseini-Asl E, Zurada JM, Nasraoui O (2016) Deep learning of part-based representation of data using sparse autoencoders with nonnegativity constraints. IEEE Trans Neural Netw Learn Syst 27:2486–2498. https://doi.org/10.1109/TNNLS.2015.2479223

    Article  Google Scholar 

  63. Huang J, Zhang Y, Arhonditsis GB et al (2020) The magnitude and drivers of harmful algal blooms in China’s lakes and reservoirs: a national-scale characterization. Water Res 181:115902. https://doi.org/10.1016/j.watres.2020.115902

    CAS  Article  Google Scholar 

  64. Humphrey GB, Maier HR, Wu W et al (2017) Improved validation framework and R-package for artificial neural network models. Environ Model Softw 92:82–106. https://doi.org/10.1016/j.envsoft.2017.01.023

    Article  Google Scholar 

  65. Jagupilla SCK, Vaccari DA, Miskewitz R et al (2015) Symbolic regression of upstream, stormwater, and tributary E. Coli concentrations using river flows. Water Environ Res 87:26–34. https://doi.org/10.1002/j.1554-7531.2015.tb00138.x

    CAS  Article  Google Scholar 

  66. Jia X, Willard J, Karpatne A et al (2021) Physics-guided machine learning for scientific discovery: an application in simulating lake temperature profiles. ACM/IMS Trans Data Sci 2:1–26. https://doi.org/10.1145/3447814

    Article  Google Scholar 

  67. Jin T, Cai S, Jiang D, Liu J (2019) A data-driven model for real-time water quality prediction and early warning by an integration method. Environ Sci Pollut Res 26:30374–30385. https://doi.org/10.1007/s11356-019-06049-2

    CAS  Article  Google Scholar 

  68. Juntunen P, Liukkonen M, Lehtola M, Hiltunen Y (2013) Cluster analysis by self-organizing maps: an application to the modelling of water quality in a treatment process. Appl Soft Comput J 13:3191–3196. https://doi.org/10.1016/j.asoc.2013.01.027

    Article  Google Scholar 

  69. Juntunen P, Liukkonen M, Pelo M et al. (2012) Modelling of Water Quality: an application to a water treatment process. In: Applied Computational Intelligence and Soft Computing. https://www.hindawi.com/journals/acisc/2012/846321/. Accessed 15 Sep 2020

  70. Kabir G, Tesfamariam S, Francisque A, Sadiq R (2015) Evaluating risk of water mains failure using a Bayesian belief network model. Eur J Oper Res 240:220–234. https://doi.org/10.1016/j.ejor.2014.06.033

    Article  Google Scholar 

  71. Karniadakis GE, Kevrekidis IG, Lu L et al (2021) Physics-informed machine learning. Nat Rev Phys 3:422–440. https://doi.org/10.1038/s42254-021-00314-5

    Article  Google Scholar 

  72. Keskin TE, Düğenci M, Kaçaroğlu F (2015) Prediction of water pollution sources using artificial neural networks in the study areas of Sivas, Karabük and Bartın (Turkey). Environ Earth Sci 73:5333–5347. https://doi.org/10.1007/s12665-014-3784-6

    Article  Google Scholar 

  73. Khataee AR, Kasiri MB (2011) Modeling of biological water and wastewater treatment processes using artificial neural networks. Clean: Soil, Air, Water 39:742–749. https://doi.org/10.1002/clen.201000234

    CAS  Article  Google Scholar 

  74. Kim CM, Parnichkun M (2017) Prediction of settled water turbidity and optimal coagulant dosage in drinking water treatment plant using a hybrid model of k-means clustering and adaptive neuro-fuzzy inference system. Appl Water Sci 7:3885–3902. https://doi.org/10.1007/s13201-017-0541-5

    Article  Google Scholar 

  75. Kotsiantis SB (2007) Supervised machine learning: a review of classification techniques. Informatica 249–268

  76. JohnR K (1994) Genetic programming as a means for programming computers by natural selection. Stat Comput. https://doi.org/10.1007/BF00175355

    Article  Google Scholar 

  77. Krzywinski M, Altman N (2017) Classification and regression trees. Nat Methods 14:757–758. https://doi.org/10.1038/nmeth.4370

    CAS  Article  Google Scholar 

  78. Kulkarni P, Chellam S (2010) Disinfection by-product formation following chlorination of drinking water: artificial neural network models and changes in speciation with treatment. Sci Total Environ 408:4202–4210. https://doi.org/10.1016/j.scitotenv.2010.05.040

    CAS  Article  Google Scholar 

  79. Lee S, Lee D (2018) Improved prediction of harmful algal blooms in four major south Korea’s rivers using deep learning models. Int J Environ Res Public Health 15:1322. https://doi.org/10.3390/ijerph15071322

    CAS  Article  Google Scholar 

  80. Li J, Liu H, Li Y et al (2013) Monitoring and modeling dissolved oxygen dynamics through continuous longitudinal sampling: a case study in wen-rui tang river, wenzhou, china. Hydrol Process 27:3502–3510. https://doi.org/10.1002/hyp.9459

    CAS  Article  Google Scholar 

  81. Li R, Zou Z, An Y (2016) Water quality assessment in Qu River based on fuzzy water pollution index method. J Environ Sci 50:87–92. https://doi.org/10.1016/j.jes.2016.03.030

    CAS  Article  Google Scholar 

  82. Li Z, Peleato NM (2021) Comparison of dimensionality reduction techniques for cross-source transfer of fluorescence contaminant detection models. Chemosphere. https://doi.org/10.1016/j.chemosphere.2021.130064

    Article  Google Scholar 

  83. Lin H, Dai Q, Zheng L et al (2020) Radial basis function artificial neural network able to accurately predict disinfection by-product levels in tap water: taking haloacetic acids as a case study. Chemosphere 248:125999. https://doi.org/10.1016/j.chemosphere.2020.125999

    CAS  Article  Google Scholar 

  84. Liu P, Wang J, Sangaiah AK et al (2019) Analysis and prediction of water quality using LSTM deep neural networks in IoT environment. Sustainability 11:2058. https://doi.org/10.3390/su11072058

    CAS  Article  Google Scholar 

  85. Maier HR, Dandy GC (2000) Neural networks for the prediction and forecasting of water resources variables: a review of modelling issues and applications. Environ Model Softw 15:101–124. https://doi.org/10.1016/S1364-8152(99)00007-9

    Article  Google Scholar 

  86. Maier HR, Dandy GC (1996) The use of artificial neural networks for the prediction of water quality parameters. Water Resour Res 32:1013–1022. https://doi.org/10.1029/96WR03529

    Article  Google Scholar 

  87. Maier HR, Jain A, Dandy GC, Sudheer KP (2010) Methods used for the development of neural networks for the prediction of water resource variables in river systems: current status and future directions. Environ Model Softw 25:891–909. https://doi.org/10.1016/j.envsoft.2010.02.003

    Article  Google Scholar 

  88. Maier HR, Morgan N, Chow CWK (2004) Use of artificial neural networks for predicting optimal alum doses and treated water quality parameters. Environ Model Softw 19:485–494. https://doi.org/10.1016/S1364-8152(03)00163-4

    Article  Google Scholar 

  89. Marton I, Sánchez AI, Carlos S, Martorell S (2013) Application of data driven methods for condition monitoring maintenance. Chem Eng Trans 33:301–306. https://doi.org/10.3303/CET1333051

    Article  Google Scholar 

  90. Matilainen A, Gjessing ET, Lahtinen T et al (2011) An overview of the methods used in the characterisation of natural organic matter (NOM) in relation to drinking water treatment. Chemosphere 83:1431–1442. https://doi.org/10.1016/j.chemosphere.2011.01.018

    CAS  Article  Google Scholar 

  91. May RJ, Dandy GC, Maier HR, Nixon JB (2008) Application of partial mutual information variable selection to ANN forecasting of water quality in water distribution systems. Environ Model Softw 23:1289–1299. https://doi.org/10.1016/j.envsoft.2008.03.008

    Article  Google Scholar 

  92. May RJ, Maier HR, Dandy GC (2010) Data splitting for artificial neural networks using SOM-based stratified sampling. Neural Netw 23:283–294. https://doi.org/10.1016/j.neunet.2009.11.009

    CAS  Article  Google Scholar 

  93. McKay G, Korak JA, Erickson PR et al (2018) The case against charge transfer interactions in dissolved organic matter photophysics. Environ Sci Technol 52:406–414. https://doi.org/10.1021/acs.est.7b03589

    CAS  Article  Google Scholar 

  94. Mei K, Liao L, Zhu Y et al (2014) Evaluation of spatial-temporal variations and trends in surface water quality across a rural-suburban-urban interface. Environ Sci Pollut Res 21:8036–8051. https://doi.org/10.1007/s11356-014-2716-z

    CAS  Article  Google Scholar 

  95. Meyers G, Kapelan Z, Keedwell E (2017) Short-term forecasting of turbidity in trunk main networks. Water Res 124:67–76. https://doi.org/10.1016/j.watres.2017.07.035

    CAS  Article  Google Scholar 

  96. Mohammed H, Hameed IA, Seidu R (2018) Comparative predictive modelling of the occurrence of faecal indicator bacteria in a drinking water source in Norway. Sci Total Environ 628–629:1178–1190. https://doi.org/10.1016/j.scitotenv.2018.02.140

    CAS  Article  Google Scholar 

  97. Mohri M, Rostamizadeh A, Talwalkar A (2018) Foundations of machine learning, 2nd edn. MIT Press

    Google Scholar 

  98. Montáns FJ, Chinesta F, Gómez-Bombarelli R, Kutz JN (2019) Data-driven modeling and learning in science and engineering. Comptes Rendus Mécanique 347:845–855. https://doi.org/10.1016/j.crme.2019.11.009

    Article  Google Scholar 

  99. Mulia IE, Tay H, Roopsekhar K, Tkalich P (2013) Hybrid ANN–GA model for predicting turbidity and chlorophyll-a concentrations. J Hydro-Environ Res 7:279–299. https://doi.org/10.1016/j.jher.2013.04.003

    Article  Google Scholar 

  100. Murphy KP (2012) Machine learning: a probabilistic perspective, Illustrated. The MIT Press, Cambridge, MA

    Google Scholar 

  101. Murphy KR, Bro R, Stedmon CA (2014) Chemometric analysis of organic matter fluorescence. In: Coble P, Lead J, Baker A et al (eds) Aquatic Organic Matter Fluorescence. Cambridge University Press, Cambridge, pp 339–375

    Chapter  Google Scholar 

  102. Murphy KR, Stedmon CA, Graeber D, Bro R (2013) Fluorescence spectroscopy and multi-way techniques. Parafac Anal Methods 5:6557–6566. https://doi.org/10.1039/C3AY41160E

    CAS  Article  Google Scholar 

  103. Murray S, Ghazali M, McBean EA (2012) Real-time water quality monitoring: assessment of multisensor data using Bayesian belief networks. J Water Resour Plan Manag 138:63–70. https://doi.org/10.1061/(ASCE)WR.1943-5452.0000163

    Article  Google Scholar 

  104. Nguyen A, Yosinski J, Clune J (2015) Deep neural networks are easily fooled: high confidence predictions for unrecognizable images. pp 427–436

  105. Oliker N, Ostfeld A (2014a) Comparison of two multivariate classification models for contamination event detection in water quality time series. J Water Supply Res Technol AQUA 64:558–566. https://doi.org/10.2166/aqua.2014.033

    Article  Google Scholar 

  106. Oliker N, Ostfeld A (2014b) A coupled classification – evolutionary optimization model for contamination event detection in water distribution systems. Water Res 51:234–245. https://doi.org/10.1016/j.watres.2013.10.060

    CAS  Article  Google Scholar 

  107. O’Reilly G, Bezuidenhout CC, Bezuidenhout JJ (2018) Artificial neural networks: applications in the drinking water sector. Water Supply 18:1869–1887. https://doi.org/10.2166/ws.2018.016

    CAS  Article  Google Scholar 

  108. Panidhapu A, Li Z, Aliashrafi A, Peleato NM (2020) Integration of weather conditions for predicting microbial water quality using Bayesian Belief Networks. Water Res 170:115349. https://doi.org/10.1016/j.watres.2019.115349

    CAS  Article  Google Scholar 

  109. Peiris RH, Hallé C, Budman H et al (2010) Identifying fouling events in a membrane-based drinking water treatment process using principal component analysis of fluorescence excitation-emission matrices. Water Res 44:185–194. https://doi.org/10.1016/j.watres.2009.09.036

    CAS  Article  Google Scholar 

  110. Peleato NM, Legge RL, Andrews RC (2018) Neural networks for dimensionality reduction of fluorescence spectra and prediction of drinking water disinfection by-products. Water Res 136:84–94. https://doi.org/10.1016/j.watres.2018.02.052

    CAS  Article  Google Scholar 

  111. Perelman L, Arad J, Housh M, Ostfeld A (2012) Event detection in water distribution systems from multivariate water quality time series. Environ Sci Technol 46:8212–8219. https://doi.org/10.1021/es3014024

    CAS  Article  Google Scholar 

  112. Pianosi F, Beven K, Freer J et al (2016) Sensitivity analysis of environmental models: a systematic review with practical workflow. Environ Model Softw 79:214–232. https://doi.org/10.1016/j.envsoft.2016.02.008

    Article  Google Scholar 

  113. Pifer AD, Fairey JL (2012) Improving on SUVA254 using fluorescence-PARAFAC analysis and asymmetric flow-field flow fractionation for assessing disinfection byproduct formation and control. Water Res 46:2927–2936. https://doi.org/10.1016/j.watres.2012.03.002

    CAS  Article  Google Scholar 

  114. Pu F, Ding C, Chao Z et al (2019) Water-quality classification of inland lakes using landsat8 images by convolutional neural networks. Remote Sens 11:1674. https://doi.org/10.3390/rs11141674

    Article  Google Scholar 

  115. Qi Y (2012) Random Forest for Bioinformatics. In: Zhang C, Ma Y (eds) Ensemble Machine Learning: Methods and Applications. Springer, US, Boston, MA, pp 307–323

    Chapter  Google Scholar 

  116. Qin SJ, Chiang LH (2019) Advances and opportunities in machine learning for process data analytics. Comput Chem Eng 126:465–473. https://doi.org/10.1016/j.compchemeng.2019.04.003

    CAS  Article  Google Scholar 

  117. Quade M, Abel M, Shafi K et al (2016) Prediction of dynamical systems by symbolic regression. Phys Rev E 94:012214. https://doi.org/10.1103/PhysRevE.94.012214

    CAS  Article  Google Scholar 

  118. Raissi M, Perdikaris P, Karniadakis GE (2019) Physics-informed neural networks: a deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. J Comput Phys 378:686–707. https://doi.org/10.1016/j.jcp.2018.10.045

    Article  Google Scholar 

  119. Razavi S, Gupta HV (2015) What do we mean by sensitivity analysis? The need for comprehensive characterization of “global” sensitivity in Earth and Environmental systems models. Water Resour Res 51:3070–3092. https://doi.org/10.1002/2014WR016527

    Article  Google Scholar 

  120. Razavi S, Tolson BA (2011) A new formulation for feedforward neural networks. IEEE Trans Neural Netw 22:1588–1598. https://doi.org/10.1109/TNN.2011.2163169

    Article  Google Scholar 

  121. Reckhow KH (1999) Water quality prediction and probability network models. 56:9

  122. Ribeiro MT, Singh S, Guestrin C (2016) “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. [cs, stat]

  123. Rojas R (1996) The Backpropagation Algorithm. Neural Networks. Springer, Berlin Heidelberg, Berlin, Heidelberg, pp 149–182

    Chapter  Google Scholar 

  124. Rokach L, Maimon O (2015) Data mining with decision trees: theory and applications, 2nd edn. World Scientific, Hackensack, New Jersey

    Google Scholar 

  125. Rosé CP, McLaughlin EA, Liu R, Koedinger KR (2019) Explanatory learner models: why machine learning (alone) is not the answer. Br J Edu Technol 50:2943–2958. https://doi.org/10.1111/bjet.12858

    Article  Google Scholar 

  126. Ross AS, Hughes MC, Doshi-Velez F (2017) Right for the right reasons: training differentiable models by constraining their explanations. [cs, stat]

  127. Sadiq R, Rodriguez MJ (2004) Disinfection by-products (DBPs) in drinking water and predictive models for their occurrence: a review. Sci Total Environ 321:21–46. https://doi.org/10.1016/j.scitotenv.2003.05.001

    CAS  Article  Google Scholar 

  128. Sahoo GB, Ray C, Wade HF (2005) Pesticide prediction in ground water in North Carolina domestic wells using artificial neural networks. Ecol Model 183:29–46. https://doi.org/10.1016/j.ecolmodel.2004.07.021

    CAS  Article  Google Scholar 

  129. Sanchez NP, Skeriotis AT, Miller CM (2013) Assessment of dissolved organic matter fluorescence PARAFAC components before and after coagulation–filtration in a full scale water treatment plant. Water Res 47:1679–1690. https://doi.org/10.1016/j.watres.2012.12.032

    CAS  Article  Google Scholar 

  130. Sharpless CM, Blough NV (2014) The importance of charge-transfer interactions in determining chromophoric dissolved organic matter (CDOM) optical and photochemical properties. Environ Sci Process Impacts 16:654–671. https://doi.org/10.1039/C3EM00573A

  131. Shutova Y, Baker A, Bridgeman J, Henderson RK (2014) Spectroscopic characterisation of dissolved organic matter changes in drinking water treatment: from PARAFAC analysis to online monitoring wavelengths. Water Res 54:159–169. https://doi.org/10.1016/j.watres.2014.01.053

  132. Singh KP, Gupta S (2012) Artificial intelligence based modeling for predicting the disinfection by-products in water. Chemom Intell Lab Syst 114:122–131. https://doi.org/10.1016/j.chemolab.2012.03.014

  133. Snee RD (1977) Validation of regression models: methods and examples. Technometrics 19:415–428

  134. Soyupak S, Kilic H, Karadirek IE, Muhammetoglu H (2011) On the usage of artificial neural networks in chlorine control applications for water distribution networks with high quality water. J Water Supply Res Technol AQUA 60:51–60. https://doi.org/10.2166/aqua.2011.086

  135. Stedmon CA, Seredyńska-Sobecka B, Boe-Hansen R et al (2011) A potential approach for monitoring drinking water quality from groundwater systems using organic matter fluorescence as an early warning for contamination events. Water Res 45:6030–6038. https://doi.org/10.1016/j.watres.2011.08.066

  136. Stidson RT, Gray CA, McPhail CD (2012) Development and use of modelling techniques for real-time bathing water quality predictions. Water Environ J 26:7–18. https://doi.org/10.1111/j.1747-6593.2011.00258.x

  137. Szegedy C, Zaremba W, Sutskever I et al (2014) Intriguing properties of neural networks. arXiv:1312.6199 [cs]

  138. Tesoriero AJ, Gronberg JA, Juckem PF et al (2017) Predicting redox-sensitive contaminant concentrations in groundwater using random forest classification. Water Resour Res 53:7316–7331. https://doi.org/10.1002/2016WR020197

  139. Thoe W, Gold M, Griesbach A et al (2014) Predicting water quality at Santa Monica Beach: evaluation of five different models for public notification of unsafe swimming conditions. Water Res 67:105–117. https://doi.org/10.1016/j.watres.2014.09.001

  140. Tinelli S, Juran I (2019) Artificial intelligence-based monitoring system of water quality parameters for early detection of non-specific bio-contamination in water distribution systems. Water Supply 19:1785–1792. https://doi.org/10.2166/ws.2019.057

  141. Tomperi J, Leiviskä K (2019) Utilizing variable selection methods in modelling potable water quality. Water Supply 19:1187–1194. https://doi.org/10.2166/ws.2018.173

  142. Trueman BF, MacIsaac SA, Stoddart AK, Gagnon GA (2016) Prediction of disinfection by-product formation in drinking water via fluorescence spectroscopy. Environ Sci Water Res Technol 2:383–389. https://doi.org/10.1039/C5EW00285K

  143. Tyralis H, Papacharalampous G, Langousis A (2019) A brief review of random forests for water scientists and practitioners and their recent history in water resources. Water 11:910. https://doi.org/10.3390/w11050910

  144. Uusitalo L (2007) Advantages and challenges of Bayesian networks in environmental modelling. Ecol Model 203:312–318. https://doi.org/10.1016/j.ecolmodel.2006.11.033

  145. van der Aalst WMP, Rubin V, Verbeek HMW et al (2010) Process mining: a two-step approach to balance between underfitting and overfitting. Softw Syst Model 9:87–111. https://doi.org/10.1007/s10270-008-0106-z

  146. Wagner ED, Plewa MJ (2017) CHO cell cytotoxicity and genotoxicity analyses of disinfection by-products: an updated review. J Environ Sci 58:64–76. https://doi.org/10.1016/j.jes.2017.04.021

  147. Wan R, Cai S, Li H et al (2014) Inferring land use and land cover impact on stream water quality using a Bayesian hierarchical modeling approach in the Xitiaoxi River Watershed, China. J Environ Manage 133:1–11. https://doi.org/10.1016/j.jenvman.2013.11.035

  148. Wang AY-T, Murdock RJ, Kauwe SK et al (2020a) Machine learning for materials scientists: an introductory guide toward best practices. Chem Mater 32:4954–4965. https://doi.org/10.1021/acs.chemmater.0c01907

  149. Wang D (2016) Research on raw water quality assessment oriented to drinking water treatment based on the SVM model. Water Supply 16:746–755. https://doi.org/10.2166/ws.2015.186

  150. Wang D, Shen J, Zhu S, Jiang G (2020b) Model predictive control for chlorine dosing of drinking water treatment based on support vector machine model. Desalin Water Treat 173:133–141. https://doi.org/10.5004/dwt.2020.24144

  151. Wang P, Yao J, Wang G et al (2019) Exploring the application of artificial intelligence technology for identification of water pollution characteristics and tracing the source of water quality pollutants. Sci Total Environ 693:133440. https://doi.org/10.1016/j.scitotenv.2019.07.246

  152. Wang Y, Zhou J, Chen K et al (2017) Water quality prediction method based on LSTM neural network. In: 2017 12th international conference on intelligent systems and knowledge engineering (ISKE). pp 1–5

  153. Wikle CK (2003) Hierarchical models in environmental science. Int Stat Rev 71:181–199. https://doi.org/10.1111/j.1751-5823.2003.tb00192.x

  154. Wu G-D, Lo S-L (2008) Predicting real-time coagulant dosage in water treatment by artificial neural networks and adaptive network-based fuzzy inference system. Eng Appl Artif Intell 21:1189–1195. https://doi.org/10.1016/j.engappai.2008.03.015

  155. Wu W, May R, Dandy GC, Maier HR (2012) A method for comparing data splitting approaches for developing hydrological ANN models. International Congress on Environmental Modelling and Software 394

  156. Yang YZ, Peleato NM, Legge RL, Andrews RC (2019) Fluorescence excitation emission matrices for rapid detection of polycyclic aromatic hydrocarbons and pesticides in surface waters. Environ Sci Water Res Technol 5:315–324. https://doi.org/10.1039/C8EW00821C

  157. Yu Q, Yin H, Wang K et al (2018) Adaptive detection method for organic contamination events in water distribution systems using the UV-Vis spectrum based on semi-supervised learning. Water 10:1566. https://doi.org/10.3390/w10111566

  158. Zhang S, Zhang C, Yang Q (2003) Data preparation for data mining. Appl Artif Intell 17:375–381. https://doi.org/10.1080/713827180

  159. Zhang Y, Ling C (2018) A strategy to apply machine learning to small datasets in materials science. npj Comput Mater 4:1–8

  160. Zhang Z, Deng Z, Rusch KA (2015) Modeling fecal coliform bacteria levels at gulf coast beaches. Water Qual Expo Health 7:255–263. https://doi.org/10.1007/s12403-014-0145-3

  161. Zheng F, Maier HR, Wu W et al (2018) On lack of robustness in hydrological model development due to absence of guidelines for selecting calibration and evaluation data: demonstration for data-driven models. Water Resour Res 54:1013–1030. https://doi.org/10.1002/2017WR021470

  162. Zhou J, Wang Y, Xiao F et al (2018) Water quality prediction method based on IGRA and LSTM. Water 10:1148. https://doi.org/10.3390/w10091148

  163. Zou X-Y, Lin Y-L, Xu B et al (2019) A novel event detection model for water distribution systems based on data-driven estimation and support vector machine classification. Water Resour Manage 33:4569–4581. https://doi.org/10.1007/s11269-019-02317-5

Funding

Natural Sciences and Engineering Research Council (NSERC) Discovery Grant.

Author information

Corresponding author

Correspondence to Nicolas M. Peleato.

Ethics declarations

Conflict of interest

None.

About this article

Cite this article

Aliashrafi, A., Zhang, Y., Groenewegen, H. et al. A review of data-driven modelling in drinking water treatment. Rev Environ Sci Biotechnol 20, 985–1009 (2021). https://doi.org/10.1007/s11157-021-09592-y

Keywords

  • Data-driven modelling
  • Drinking water
  • Water quality
  • Machine learning
  • Artificial intelligence