
Short-term water quality variable prediction using a hybrid CNN–LSTM deep learning model


Abstract

Water quality monitoring is an important component of water resources management. To predict two water quality variables, namely dissolved oxygen (DO; mg/L) and chlorophyll-a (Chl-a; µg/L), in the Small Prespa Lake in Greece, two standalone deep learning (DL) models, the long short-term memory (LSTM) and convolutional neural network (CNN) models, were developed along with their hybrid, the CNN–LSTM model. The main novelty of this study is the coupled CNN–LSTM model for predicting water quality variables. Two traditional machine learning models, support-vector regression (SVR) and decision tree (DT), were also developed for comparison with the DL models. Time series of the physicochemical water quality variables, specifically pH, oxidation–reduction potential (ORP; mV), water temperature (°C), electrical conductivity (EC; µS/cm), DO and Chl-a, were obtained using a sensor at 15-min intervals from June 1, 2012 to May 31, 2013 for model development. The input variables (pH, ORP, water temperature and EC) with lag times of up to one step (t − 1) were used to predict DO concentrations, and with lag times of up to two steps (t − 2) to predict Chl-a concentrations. Each model's performance in both the training and testing phases was assessed using statistical metrics including the correlation coefficient (r), root mean square error (RMSE), mean absolute error (MAE), their normalized equivalents (RRMSE, RMAE; %), percentage of bias (PBIAS), Nash–Sutcliffe coefficient (\(E_{NS}\)) and Willmott's Index, as well as graphical plots (Taylor diagram, box plot and spider diagram). Results showed that LSTM outperformed the CNN model for DO prediction, whereas the two standalone DL models performed similarly for Chl-a prediction. In general, the hybrid CNN–LSTM models outperformed the standalone models (LSTM, CNN, SVR and DT) in predicting both DO and Chl-a. By integrating the LSTM and CNN models, the hybrid model successfully captured both the low and high levels of the water quality variables, particularly for the DO concentrations.
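
The lagged-input setup and the evaluation metrics named in the abstract can be illustrated with a short NumPy sketch. This is not the authors' code: the function names `make_lagged` and `evaluate` are hypothetical, the sketch assumes that only lagged inputs (not values at time t) feed the model, that RRMSE and RMAE are normalized by the mean of the observations, and that PBIAS follows the convention 100 · Σ(obs − pred) / Σ obs. The article itself may use different conventions.

```python
import numpy as np


def make_lagged(X, y, max_lag):
    """Pair each target y[t] with the inputs from steps t-1 ... t-max_lag.

    X has shape (n_steps, n_variables), e.g. columns pH, ORP, temperature, EC.
    Assumption: max_lag=1 would serve DO and max_lag=2 would serve Chl-a.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    # The block for a given lag holds X[t - lag] for t = max_lag ... n-1.
    blocks = [X[max_lag - lag:n - lag] for lag in range(1, max_lag + 1)]
    return np.hstack(blocks), y[max_lag:]


def evaluate(obs, pred):
    """Goodness-of-fit metrics for observed vs. predicted series."""
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    err = obs - pred
    obs_mean = obs.mean()
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    sse = np.sum(err ** 2)
    # Willmott's index denominator: potential error around the observed mean.
    potential = np.sum((np.abs(pred - obs_mean) + np.abs(obs - obs_mean)) ** 2)
    return {
        "r": float(np.corrcoef(obs, pred)[0, 1]),
        "RMSE": float(rmse),
        "MAE": float(mae),
        "RRMSE": float(100.0 * rmse / obs_mean),  # assumed: % of observed mean
        "RMAE": float(100.0 * mae / obs_mean),    # assumed: % of observed mean
        "PBIAS": float(100.0 * np.sum(err) / np.sum(obs)),  # assumed sign convention
        "E_NS": float(1.0 - sse / np.sum((obs - obs_mean) ** 2)),
        "Willmott_d": float(1.0 - sse / potential),
    }
```

In use, `make_lagged(X, do, max_lag=1)` would produce the feature matrix and aligned targets for a DO model, and `evaluate(y_test, y_hat)` would then summarize any fitted model's predictions on the held-out period.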




Acknowledgements

Data acquisition was performed in the context of the “Vodafone World of Difference” program, which is a charitable volunteer initiative delivered by Vodafone Foundations and funded by Vodafone Greece. The project was supported by the Society for the Protection of Prespa (SPP) as the host organization in liaison with Prespa Municipality. The authors wish to thank the above for their fruitful contribution and overall support. The authors are also thankful to Evangelos Tziritis for providing the data. Funding for this research was provided by an NSERC Discovery Grant held by Jan Adamowski.

Author information

Correspondence to Rahim Barzegar.


About this article


Cite this article

Barzegar, R., Aalami, M.T. & Adamowski, J. Short-term water quality variable prediction using a hybrid CNN–LSTM deep learning model. Stoch Environ Res Risk Assess (2020). https://doi.org/10.1007/s00477-020-01776-2


Keywords

  • Long short-term memory
  • Convolutional neural network
  • Water quality modeling
  • Deep learning