Imputation of missing monthly rainfall data using machine learning and spatial interpolation approaches in Thale Sap Songkhla River Basin, Thailand

  • Resilient and Sustainable Water Management in Agriculture
Environmental Science and Pollution Research

Abstract

Missing rainfall data are a prevalent problem in hydrology and meteorology. This research examined the capability of machine learning (ML) and spatial interpolation (SI) methods to estimate missing monthly rainfall data. Six ML algorithms (multiple linear regression (MLR), M5 model tree (M5), random forest (RF), support vector regression (SVR), multilayer perceptron (MLP), and genetic programming (GP)) and four SI methods (arithmetic average (AA), inverse distance weighting (IDW), correlation coefficient weighted (CCW), and normal ratio (NR)) were investigated and their performance compared. Twelve rainfall stations located in the Thale Sap Songkhla river basin and nearby basins served as the case study. The hyper-parameters of each ML method were tuned to obtain the most suitable model for the data sets considered. Three performance criteria (NSE, OI, and r) were chosen, and their sum was introduced as a single score for comparing the methods. The experimental results indicated that selecting neighbouring stations was essential when applying SI methods, but not when applying ML methods. Overall, ML imputed missing monthly rainfall better than SI because it overcame spatial constraints. GP provided the highest performance, with NSE = 0.825, OI = 0.877, and r = 0.909 in the training stage and 0.796, 0.852, and 0.902, respectively, in the testing stage. It was followed by SVR-rbf, SVR-poly, and RF. NR performed best among the four SI methods, followed by CCW, AA, and IDW. When applying SI methods, neighbouring stations should be chosen so that their correlation with the target station is greater than 0.80.
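The four SI estimators and the three criteria named in the abstract can be sketched in a few lines of Python. The abstract gives no formulas, so the forms below are the commonly used textbook ones and should be read as assumptions rather than the authors' exact implementation: IDW uses an exponent of 2, CCW uses the correlation coefficient itself as the weight, NR scales each neighbour by the ratio of long-term mean rainfall, and OI is taken as the mean of NSE and one minus the range-normalised RMSE. All function and parameter names are hypothetical.

```python
import math

def arithmetic_average(values):
    # AA: plain mean of the neighbouring stations' rainfall
    return sum(values) / len(values)

def inverse_distance_weighting(values, distances, power=2.0):
    # IDW: weight each neighbour by 1 / distance**power (power=2 is an assumption)
    weights = [1.0 / d ** power for d in distances]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

def normal_ratio(values, neighbour_normals, target_normal):
    # NR: scale each neighbour by target_normal / neighbour_normal, then average
    scaled = [target_normal / n * v for n, v in zip(neighbour_normals, values)]
    return sum(scaled) / len(scaled)

def correlation_weighted(values, correlations):
    # CCW: weight each neighbour by its correlation with the target station
    return sum(r * v for r, v in zip(correlations, values)) / sum(correlations)

def pearson_r(obs, pred):
    # r: Pearson correlation coefficient between observed and estimated series
    mo, mp = sum(obs) / len(obs), sum(pred) / len(pred)
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)

def nse(obs, pred):
    # NSE: Nash-Sutcliffe efficiency, 1 - SSE / variance of observations about their mean
    mo = sum(obs) / len(obs)
    sse = sum((o - p) ** 2 for o, p in zip(obs, pred))
    return 1.0 - sse / sum((o - mo) ** 2 for o in obs)

def overall_index(obs, pred):
    # OI (assumed form): 0.5 * [(1 - RMSE / (max(obs) - min(obs))) + NSE]
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))
    return 0.5 * ((1.0 - rmse / (max(obs) - min(obs))) + nse(obs, pred))

def total_score(obs, pred):
    # Sum of the three criteria, used to rank methods against each other
    return nse(obs, pred) + overall_index(obs, pred) + pearson_r(obs, pred)

# Illustrative (made-up) monthly rainfall at two neighbouring stations, in mm
neighbours = [100.0, 200.0]
print(arithmetic_average(neighbours))
print(inverse_distance_weighting(neighbours, distances=[1.0, 2.0]))
print(normal_ratio(neighbours, neighbour_normals=[1000.0, 2000.0], target_normal=1500.0))
print(correlation_weighted(neighbours, correlations=[0.9, 0.6]))
```

Under these assumed definitions a perfect estimate scores 1 on each criterion, so the summed score has a maximum of 3. The station values, distances, normals, and correlations in the usage lines are invented for illustration and are not taken from the study.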

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author upon reasonable request.

Code availability

Not applicable.

Acknowledgements

The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University, Abha, Kingdom of Saudi Arabia, for funding this work through Large Groups RGP.2/43/43. The authors also thank the College of Graduate Studies, Walailak University, for providing Walailak University Master’s Degree Excellence Scholarships under Contract No. ME04/2021.

Funding

This work was supported by the Deanship of Scientific Research at King Khalid University, Abha, Kingdom of Saudi Arabia, through Large Groups RGP.2/43/43, and by Walailak University Master’s Degree Excellence Scholarships under Contract No. ME04/2021. Mohd. Abul Hasan has received research support from King Khalid University, Abha, Kingdom of Saudi Arabia, and Pakorn Ditthakit has received research support from the College of Graduate Studies, Walailak University, Thailand.

Author information

Contributions

Pakorn Ditthakit contributed to the conceptualization, methodology, and supervision. Material preparation, data collection, and analysis were performed by Sirimon Pinthong and Nureehan Salaeh. The first draft of the manuscript was written by Sirimon Pinthong, Nureehan Salaeh, and Krishna Kumar Yadav. Nguyen Thi Thuy Linh and Saiful Islam reviewed and edited the previous version of the manuscript. Mohd Abul Hasan and Cao Truong Son proofread the text and helped in structuring the publication. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Pakorn Ditthakit.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Responsible Editor: Marcus Schulz

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Pinthong, S., Ditthakit, P., Salaeh, N. et al. Imputation of missing monthly rainfall data using machine learning and spatial interpolation approaches in Thale Sap Songkhla River Basin, Thailand. Environ Sci Pollut Res (2022). https://doi.org/10.1007/s11356-022-23022-8

  • DOI: https://doi.org/10.1007/s11356-022-23022-8
