Abstract
Extreme precipitation events have increased significantly under global climate change, creating an urgent need for accurate rainfall prediction models to mitigate flood disasters that harm economic and social development. The rapid progress of machine learning in the big data era offers new approaches to such regression problems. In this paper, we construct and evaluate rainfall prediction models that take specific humidity, relative humidity, horizontal and vertical water vapor flux, and lifting index as input variables, using four classic machine learning algorithms: linear regression, random forest regression, support vector regression, and Bayesian ridge regression. Hyperparameters are tuned by grid search, which significantly improves the models' prediction accuracy and generalization ability. Evaluation on nine typical regions in China, including Zhengzhou, Beijing, and Chengdu, shows that the random forest regression model has the highest predictive accuracy, with an average fitting degree of 0.8 or above, followed by the support vector regression and Bayesian ridge regression models; the linear regression model performs worst. We therefore recommend the random forest regression model for future precipitation prediction, and the approach may also be useful for other regression problems. The main contributions of this paper are the selection of appropriate predictor variables and the use of grid search for hyperparameter tuning.
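The grid search and the "fitting degree" mentioned in the abstract can be illustrated with a small sketch. This is not the authors' code: it assumes the fitting degree is the coefficient of determination R², uses a single-predictor ridge regression as a stand-in for the four models, scores candidates by k-fold cross-validation, and runs on synthetic data. All names (`fit_ridge_1d`, `cv_score`, `grid_search`, `humidity`, `rainfall`) and the candidate grid are hypothetical.

```python
import random


def r2_score(y_true, y_pred):
    """Coefficient of determination, assumed here to be the 'fitting degree'."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_ridge_1d(x, y, alpha):
    """Closed-form ridge fit for one predictor; the intercept is not penalized."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / (sxx + alpha)
    return slope, mean_y - slope * mean_x


def cv_score(x, y, alpha, k=5):
    """Mean held-out R^2 over k contiguous folds."""
    n = len(x)
    scores = []
    for fold in range(k):
        lo, hi = fold * n // k, (fold + 1) * n // k
        x_train, y_train = x[:lo] + x[hi:], y[:lo] + y[hi:]
        slope, intercept = fit_ridge_1d(x_train, y_train, alpha)
        preds = [slope * v + intercept for v in x[lo:hi]]
        scores.append(r2_score(y[lo:hi], preds))
    return sum(scores) / k


def grid_search(x, y, grid, k=5):
    """Score every candidate value and return (best_alpha, best_score)."""
    results = {alpha: cv_score(x, y, alpha, k) for alpha in grid}
    best = max(results, key=results.get)
    return best, results[best]


# Synthetic stand-in data (an assumption, not the study's data set).
rng = random.Random(0)
humidity = [rng.uniform(0.0, 1.0) for _ in range(100)]
rainfall = [3.0 * h + rng.gauss(0.0, 0.3) for h in humidity]

best_alpha, best_score = grid_search(humidity, rainfall, [0.0, 0.1, 1.0, 10.0, 100.0])
print(best_alpha, round(best_score, 3))
```

The same loop extends to several hyperparameters by iterating over the Cartesian product of their candidate values. Scoring on held-out folds rather than on the training data is what ties the tuning to generalization ability.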
Availability of data and materials
The data sets used or analyzed during the current study are available from the corresponding author on reasonable request.
Acknowledgements
The authors would like to acknowledge the National Natural Science Foundation of China (No. 11972327) for financial support of this work. They also thank the anonymous reviewers for their helpful and kind suggestions.
Funding
This work was supported by the National Natural Science Foundation of China (No. 11972327).
Author information
Contributions
LP made substantial contributions to the conception and design of the work and to the interpretation of the prediction results, critically revised important content, and approved the final version to be published; YW carried out the analysis and prediction and drafted the manuscript; JW collected the data for analysis and prediction and checked and polished the draft.
Ethics declarations
Conflict of Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
About this article
Cite this article
Wang, Y., Pei, L. & Wang, J. Precipitation prediction in several Chinese regions using machine learning methods. Int. J. Dynam. Control 12, 1180–1196 (2024). https://doi.org/10.1007/s40435-023-01250-1
DOI: https://doi.org/10.1007/s40435-023-01250-1