Abstract
The length, completeness, and quality of hydrological time series can considerably affect the efficiency of decisions in water resources engineering. Regrettably, short, incomplete, and low-quality records are not rare. In this study, several machine learning techniques were implemented and applied to fill in missing streamflow data for the Coxs River in Australia. The implemented techniques are Support Vector Regression optimized by the Equilibrium Optimizer (SVR-EO) and by the Particle Swarm Optimizer (SVR-PSO), alongside Artificial Neural Networks trained by EO (ANN-EO) and PSO (ANN-PSO). Multivariate Adaptive Regression Splines (MARS) and Multiple Linear Regression (MLR) were used for comparison. Rainfall data from five climatic stations located near the Coxs River, along with Kowmung River streamflow records, were used to fill the gaps in the Coxs River time series. The gamma test was used to select the input data combination that reduces the errors of the prediction models. According to the findings, SVR-PSO and SVR-EO outperformed the other techniques, with \(R^{2}\approx 0.94\) for the training part and \(R^{2}\approx 0.85\) for the testing part. The imputation process and the developed SVR-EO and SVR-PSO models could be applied to other rivers in different countries to assess whether these methods can be generalized.
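To illustrate how a gamma test can rank candidate input combinations, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes Euclidean near neighbours, p = 10 neighbours, and synthetic data; the names `gamma_test` and `make_data` are hypothetical. For each k, the mean squared distance to the k-th nearest neighbour (delta) is paired with half the mean squared difference of the corresponding outputs (gamma), and the intercept of the line fitted through these pairs estimates the noise variance that no smooth model built on those inputs could remove.

```python
import math
import random


def gamma_test(X, y, p=10):
    """Return (intercept, slope) of the gamma-versus-delta regression line.

    X: list of input vectors, y: list of outputs, p: number of near neighbours.
    The intercept estimates the variance of the noise on y.
    """
    n = len(X)
    delta = [0.0] * p
    gamma = [0.0] * p
    for i in range(n):
        # Squared Euclidean distance from point i to every other point, sorted.
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(n) if j != i
        )
        for k in range(p):
            d2, j = dists[k]
            delta[k] += d2 / n                       # mean squared input distance
            gamma[k] += (y[j] - y[i]) ** 2 / (2 * n)  # half mean squared output gap
    # Ordinary least-squares line: gamma = intercept + slope * delta.
    md = sum(delta) / p
    mg = sum(gamma) / p
    sxx = sum((d - md) ** 2 for d in delta)
    slope = sum((d - md) * (g - mg) for d, g in zip(delta, gamma)) / sxx
    return mg - slope * md, slope


def make_data(noise_sd, n=400, seed=0):
    """Synthetic example (assumption): y = sin(x1) + x2^2 + Gaussian noise."""
    rng = random.Random(seed)
    X = [(rng.random(), rng.random()) for _ in range(n)]
    y = [math.sin(a) + b * b + rng.gauss(0.0, noise_sd) for a, b in X]
    return X, y


if __name__ == "__main__":
    X, y = make_data(noise_sd=0.1)
    intercept, slope = gamma_test(X, y)
    print("estimated noise variance:", intercept)
```

In an input-selection setting, the same function would be evaluated once per candidate combination of rainfall and streamflow inputs, and the combination with the smallest intercept would be preferred.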
Data Availability
Not applicable.
Code Availability
The code is available at https://github.com/SaadDAHMANI.
Acknowledgements
The authors would like to thank the Australian Bureau of Meteorology (http://www.bom.gov.au) and WaterNSW for providing the data online.
Funding
Not applicable.
Author information
Contributions
Saad Dahmani: writing of the original draft, methodology, and analysis. Sarmad Dashti Latif: review and editing, and data curation. All authors read and approved the final manuscript.
Ethics declarations
Ethics Approval
Not applicable.
Consent to Participate
Not applicable.
Consent for Publication
Not applicable.
Conflict of Interest
The authors declare no conflict of interest.
About this article
Cite this article
Dahmani, S., Latif, S.D. Streamflow Data Infilling Using Machine Learning Techniques with Gamma Test. Water Resour Manage 38, 701–716 (2024). https://doi.org/10.1007/s11269-023-03694-8
DOI: https://doi.org/10.1007/s11269-023-03694-8