Abstract
This study examined the potential of two feature selection methods, Bayesian Networks (BN) and Recursive Feature Elimination (RFE), for identifying the optimum predictors for forecasting rainfall in the Godavari Basin, India. Initially, a set of ‘probable hydro-climatological variables’ was chosen based on previous studies. Following a correlation analysis between these probable predictors and monthly rainfall, the most strongly correlated contiguous zones were chosen as ‘potential predictors’, which were then used as inputs to the two feature selection algorithms, BN and RFE, to select the ‘optimum predictors’. The optimum predictors were then used to develop seven state-of-the-art Machine Learning (ML) models: Support Vector Regression (SVR), Gaussian Process Regression (GPR), Multivariate Adaptive Regression Splines (MARS), Random Forest (RF), and Parallel Multi-Population Genetic Programming (PMPGP) with 5 demes (Case 1), 9 demes (Case 2), and 17 demes (Case 3). The correlation analysis revealed that, for Relative Humidity and Total Precipitable Water, the domain over the study area should be considered, whereas for U-Wind, V-Wind and Surface Pressure, the zones around the Indian Ocean, Arabian Sea and Persian Gulf, respectively, should be considered. BN was found to be more effective than RFE as a feature selection technique for choosing the optimum predictors. Among the prediction models, GPR and PMPGP outperformed the others, both when used alone and in conjunction with feature selection methods. The R² values range from 0.41 to 0.82 for the GPR models and from 0.31 to 0.81 for the PMPGP models.
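The RFE-then-regression part of this workflow can be illustrated with a minimal sketch. It is not the authors' code: it assumes scikit-learn, uses synthetic data in place of the ERA5 predictors and IMD rainfall, wraps RFE around a random forest, and then fits a GPR on the retained columns. The function names, kernel choice, sample sizes, and number of retained features are all hypothetical, and the BN-based selection is not shown.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import RFE
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.metrics import r2_score
from sklearn.preprocessing import StandardScaler


def make_synthetic_data(n_months=240, n_candidates=8, seed=0):
    """Synthetic stand-in for monthly predictors and rainfall (not real data)."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_months, n_candidates))
    # Assumed ground truth: only columns 0, 1 and 2 influence the target.
    y = (2.0 * X[:, 0] - 1.5 * X[:, 1] + np.sin(X[:, 2])
         + 0.3 * rng.normal(size=n_months))
    return X, y


def select_with_rfe(X, y, n_keep=3, seed=0):
    """Recursive Feature Elimination wrapped around a random forest.

    At each step the forest is refit and the least important remaining
    feature is dropped, until n_keep features are left.
    """
    forest = RandomForestRegressor(n_estimators=200, random_state=seed)
    selector = RFE(estimator=forest, n_features_to_select=n_keep, step=1)
    selector.fit(X, y)
    return selector.support_  # boolean mask over the candidate columns


def fit_gpr_and_score(X_train, y_train, X_test, y_test):
    """Fit a GPR on standardised inputs and return the test-period R2."""
    scaler = StandardScaler().fit(X_train)
    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
    gpr = GaussianProcessRegressor(kernel=kernel, normalize_y=True,
                                   random_state=0)
    gpr.fit(scaler.transform(X_train), y_train)
    return r2_score(y_test, gpr.predict(scaler.transform(X_test)))


X, y = make_synthetic_data()
split = 180  # chronological split: first 180 months train, remainder test
mask = select_with_rfe(X[:split], y[:split])
r2 = fit_gpr_and_score(X[:split][:, mask], y[:split],
                       X[split:][:, mask], y[split:])
print("selected columns:", np.flatnonzero(mask))
print("test R2:", round(r2, 2))
```

Selection is run on the training period only, so the test-period R² is not influenced by information from the months being predicted.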
Availability of Data and Materials
The Copernicus Climate Change Service (C3S) Climate Data Store (CDS) provided the ERA5 reanalysis datasets. The India Meteorological Department (IMD), Pune, provided the gridded rainfall data. All code supporting the findings of this study is available from the corresponding author upon reasonable request.
Acknowledgements
The authors would like to extend their gratitude to the Department of Science and Technology (DST), India, for funding this research. The authors would also like to thank the European Centre for Medium-Range Weather Forecasts (ECMWF) and the India Meteorological Department (IMD) for making the necessary datasets publicly available for academic research, and the anonymous reviewers for their constructive comments, which improved the quality of the paper.
Funding
This work was funded by the Department of Science and Technology (DST), India through project number ECR/2017/001880.
Author information
Contributions
Prabal Das: Data collection, Formal analysis, Investigation, Writing – original draft. D. A. Sachindra: Conceptualisation, Formal analysis, Investigation, Writing – review and editing. Kironmala Chanda: Conceptualisation, Supervision, Investigation, Funding acquisition, Writing – review and editing.
Ethics declarations
Ethical Approval
The authors ensure that this article has not been published elsewhere and that there has been no plagiarism.
Consent to Participate
The authors agree to participate in the journal.
Consent to Publish
The authors have agreed to publish in this journal.
Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Das, P., Sachindra, D.A. & Chanda, K. Machine Learning-Based Rainfall Forecasting with Multiple Non-Linear Feature Selection Algorithms. Water Resour Manage 36, 6043–6071 (2022). https://doi.org/10.1007/s11269-022-03341-8
DOI: https://doi.org/10.1007/s11269-022-03341-8