A hybrid approach combining the multi-dimensional time series k-means algorithm and long short-term memory networks to predict the monthly water demand according to the uncertainty in the dataset

  • RESEARCH
Earth Science Informatics

Abstract

A reliable water consumption forecast is a useful tool for managing water supply and demand in urban areas. The accuracy of a forecasting model depends heavily on the quality of the input data. Despite advances in technology, water consumption in some places is still recorded manually by operators, so the resulting database usually contains approximate and incomplete data. For this reason, the methods used to predict water demand should be able to handle the drawbacks caused by uncertainty in the dataset. To this end, a structured hybrid approach was designed to cluster the customers and predict their water demand while accounting for the uncertainty in the dataset. First, a novel fuzzy-based algorithm combining the Forward-Filling, Backward-Filling, and Mean methods was proposed to impute the missing data. Then, a multi-dimensional time series k-means clustering technique was developed to group the consumers by their consumption behavior, with the missing data estimated as fuzzy numbers. Finally, a forecasting model based on Long Short-Term Memory (LSTM) networks was tuned for each cluster to predict the monthly water demand from the lagged demand and the temperature. The approach was applied to the water consumption time series of residential consumers in Yazd, Iran, from January 2011 to November 2020. Based on a performance evaluation in terms of the Root Mean Squared Error (RMSE), the proposed approach predicted the water demand of all the clusters with an acceptable level of confidence.
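The abstract does not spell out how the three imputation methods are combined, how the model inputs are laid out, or how the error is aggregated. The following is therefore only a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each missing value is represented as a triangular fuzzy number built from its forward-fill, backward-fill, and mean candidates, that the model inputs are a fixed number of past demands plus the current temperature, and that RMSE is computed in the usual way. The function names `fill_candidates`, `to_triangular`, `lagged_rows`, and `rmse` are hypothetical.

```python
import math
from statistics import mean


def fill_candidates(series):
    """Map each missing index (None) to its (forward, backward, mean) candidates."""
    observed = [x for x in series if x is not None]
    overall = mean(observed)
    out = {}
    for i, x in enumerate(series):
        if x is not None:
            continue
        # Forward fill: last observed value before i (falls back to the mean).
        fwd = next((series[j] for j in range(i - 1, -1, -1)
                    if series[j] is not None), overall)
        # Backward fill: first observed value after i (falls back to the mean).
        bwd = next((series[j] for j in range(i + 1, len(series))
                    if series[j] is not None), overall)
        out[i] = (fwd, bwd, overall)
    return out


def to_triangular(candidates):
    """Assumed encoding: sort the three candidates into (low, mode, high)."""
    low, mode, high = sorted(candidates)
    return (low, mode, high)


def lagged_rows(demand, temperature, n_lags):
    """Assumed input layout: n_lags past demands plus the current temperature."""
    rows = []
    for t in range(n_lags, len(demand)):
        features = list(demand[t - n_lags:t]) + [temperature[t]]
        rows.append((features, demand[t]))
    return rows


def rmse(actual, predicted):
    """Root Mean Squared Error between two equal-length sequences."""
    return math.sqrt(mean((a - p) ** 2 for a, p in zip(actual, predicted)))


# Illustrative values only; these are not data from the study.
monthly = [10.0, None, 14.0, 12.0]
fuzzy = {i: to_triangular(c) for i, c in fill_candidates(monthly).items()}
```

In this sketch the gap at index 1 becomes a fuzzy number spanning the neighbouring observations and the series mean. A clustering or forecasting step would then need either a distance defined on fuzzy numbers or a defuzzified value before calling `lagged_rows`, and the abstract does not say which the authors used.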

Data availability

Not applicable.

Author information

Contributions

Conceptualization, A.N. and H.K.Z.; data collection, A.N.; methodology, A.N.; software, A.N.; validation, A.N., H.K.Z., H.H., and A.M.; formal analysis, A.N. and H.K.Z.; investigation, A.N. and H.K.Z.; resources, A.N., H.K.Z., H.H., and A.M.; writing (original draft preparation), A.N.; writing (review and editing), A.N., H.K.Z., H.H., and A.M.; visualization, A.N. and H.K.Z.; supervision, H.K.Z. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Hasan Khademi Zare.

Ethics declarations

Ethical approval

The authors assure that this article has not been published in any other journal and that no plagiarism has occurred.

Consent to participate

The authors assure that this article has not been published in any other journal and that no plagiarism has occurred.

Conflict of interests

Regarding the content of this article, the authors have no competing interests to declare.

Additional information

Communicated by: H. Babaie

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Niknam, A., Zare, H.K., Hosseininasab, H. et al. A hybrid approach combining the multi-dimensional time series k-means algorithm and long short-term memory networks to predict the monthly water demand according to the uncertainty in the dataset. Earth Sci Inform 16, 1519–1536 (2023). https://doi.org/10.1007/s12145-023-00976-y

  • DOI: https://doi.org/10.1007/s12145-023-00976-y
