Abstract
Missing rainfall data are a prevalent problem and a primary concern in hydrology and meteorology. This research examined the capability of machine learning (ML) and spatial interpolation (SI) methods to estimate missing monthly rainfall data. Six ML algorithms (multiple linear regression (MLR), M5 model tree (M5), random forest (RF), support vector regression (SVR), multilayer perceptron (MLP), and genetic programming (GP)) and four SI methods (arithmetic average (AA), inverse distance weighting (IDW), correlation coefficient weighted (CCW), and normal ratio (NR)) were investigated and their performance compared. Twelve rainfall stations, located in the Thale Sap Songkhla river basin and nearby basins, served as the case study. The hyper-parameters of each ML method were tuned to obtain the most suitable model for the data sets considered. Three performance metrics (NSE, OI, and r) were chosen, and the sum of these three metrics was introduced to compare the methods' performance. The experimental results indicated that the selection of neighbouring stations was essential when applying SI methods, but not when applying ML methods. Overall, ML methods imputed missing monthly rainfall better than SI methods because they overcame spatial constraints. GP provided the highest performance, with NSE = 0.825, OI = 0.877, and r = 0.909 in the training stage and 0.796, 0.852, and 0.902, respectively, in the testing stage. It was followed by SVR-rbf, SVR-poly, and RF. NR performed best among the four SI methods, followed by CCW, AA, and IDW. When SI methods are applied, the correlation between the target station and the neighbouring stations should be greater than 0.80.
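To make the four SI methods and two of the performance metrics concrete, the following minimal Python sketch estimates one missing monthly value from neighbouring stations and scores a series of estimates. It uses the common textbook forms of AA, IDW, CCW, and NR, which are assumed here because the abstract does not state the exact formulas used in the study. The function names, the IDW exponent of 2, and all numbers are hypothetical and are not taken from the paper's data. OI is left out because the abstract does not define it.

```python
import math


def arithmetic_average(values):
    """AA: unweighted mean of the neighbouring stations' rainfall."""
    return sum(values) / len(values)


def inverse_distance_weighting(values, distances, power=2.0):
    """IDW: weights proportional to 1 / distance**power (power=2 is an assumption)."""
    weights = [1.0 / d ** power for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def correlation_coefficient_weighted(values, correlations):
    """CCW: weights are each neighbour's correlation with the target station."""
    weighted = sum(r * v for r, v in zip(correlations, values))
    return weighted / sum(correlations)


def normal_ratio(values, neighbour_normals, target_normal):
    """NR: each neighbour is scaled by the ratio of long-term normal rainfall."""
    scaled = [target_normal / n * v for n, v in zip(neighbour_normals, values)]
    return sum(scaled) / len(scaled)


def nse(observed, estimated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    residual = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - residual / variance


def pearson_r(observed, estimated):
    """Pearson correlation coefficient between observed and estimated series."""
    mean_o = sum(observed) / len(observed)
    mean_e = sum(estimated) / len(estimated)
    cov = sum((o - mean_o) * (e - mean_e) for o, e in zip(observed, estimated))
    dev_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    dev_e = math.sqrt(sum((e - mean_e) ** 2 for e in estimated))
    return cov / (dev_o * dev_e)


# Hypothetical inputs for a single month at three neighbouring stations.
rain_mm = [100.0, 120.0, 90.0]          # neighbours' rainfall for the month
distance_km = [10.0, 20.0, 20.0]        # distance from each neighbour to the target
correlation = [0.9, 0.8, 0.7]           # neighbour-target correlation
neighbour_normal = [150.0, 200.0, 100.0]  # long-term mean monthly rainfall
target_normal = 150.0

estimates = {
    "AA": arithmetic_average(rain_mm),
    "IDW": inverse_distance_weighting(rain_mm, distance_km),
    "CCW": correlation_coefficient_weighted(rain_mm, correlation),
    "NR": normal_ratio(rain_mm, neighbour_normal, target_normal),
}
for name, value in estimates.items():
    print(f"{name}: {value:.2f} mm")
```

In practice the neighbour list passed to these functions would first be filtered, for example to stations whose correlation with the target exceeds 0.80, as the abstract recommends for SI methods.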
Data availability
The datasets used and/or analysed during the current study are available from the corresponding author upon reasonable request.
Code availability
Not applicable.
Acknowledgements
The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University, Abha, Kingdom of Saudi Arabia, for funding this work through Large Groups RGP.2/43/43. The authors also express their gratitude to the College of Graduate Studies, Walailak University, for providing the Walailak University Master’s Degree Excellence Scholarships under Contract No. ME04/2021.
Funding
This work was supported by the Deanship of Scientific Research at King Khalid University, Abha, Kingdom of Saudi Arabia, through Large Groups RGP.2/43/43 and by the Walailak University Master’s Degree Excellence Scholarships under Contract No. ME04/2021. Mohd. Abul Hasan has received research support from King Khalid University, Abha, Kingdom of Saudi Arabia, and Pakorn Ditthakit has received research support from the College of Graduate Studies, Walailak University, Thailand.
Author information
Contributions
Pakorn Ditthakit contributed to the conceptualization, methodology, and supervision. Material preparation, data collection, and analysis were performed by Sirimon Pinthong and Nureehan Salaeh. The first draft of the manuscript was written by Sirimon Pinthong, Nureehan Salaeh, and Krishna Kumar Yadav. Nguyen Thi Thuy Linh and Saiful Islam reviewed and edited the previous version of the manuscript. Mohd Abul Hasan and Cao Truong Son proofread the text and helped in structuring the publication. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Responsible Editor: Marcus Schulz
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Pinthong, S., Ditthakit, P., Salaeh, N. et al. Imputation of missing monthly rainfall data using machine learning and spatial interpolation approaches in Thale Sap Songkhla River Basin, Thailand. Environ Sci Pollut Res (2022). https://doi.org/10.1007/s11356-022-23022-8
DOI: https://doi.org/10.1007/s11356-022-23022-8