Precipitation prediction in several Chinese regions using machine learning methods

International Journal of Dynamics and Control

Abstract

The severity of global climate change is exemplified by a significant increase in extreme precipitation events, which creates an urgent need for accurate rainfall prediction models to mitigate flood disasters that harm economic and social development. With the rapid progress of machine learning in the big data era, new solutions to regression problems continue to be proposed. In this paper, we construct and evaluate rainfall prediction models that use specific humidity, relative humidity, horizontal and vertical water vapor flux, and lifting index as predictor variables, with four classic machine learning algorithms: linear regression, random forest regression, support vector regression, and Bayesian ridge regression. Grid search is used for hyperparameter tuning, which significantly improves the models' prediction accuracy and generalization ability. Evaluation on nine typical regions in China, including Zhengzhou, Beijing, and Chengdu, shows that the random forest regression model has the highest predictive accuracy, with an average fitting degree of 0.8 or above, followed by the support vector regression and Bayesian ridge regression models. The linear regression model appears to have the poorest predictive performance of the four. We therefore recommend the random forest regression model for future precipitation prediction; it may also offer a useful approach to other regression problems. The main contributions of this paper are the selection of predictor variables and the use of grid search for hyperparameter tuning.
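The grid search step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code or data. It assumes that "fitting degree" means the coefficient of determination (R²), uses plain ridge regression as a stand-in for the four models, and generates synthetic data with five features standing in for the five predictor variables. All function names, the penalty grid `ALPHAS`, and the fold count are hypothetical choices made for illustration.

```python
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Assumes A is non-singular (true here because of the ridge penalty)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def fit_ridge(X, y, alpha):
    """Ridge regression via the normal equations; the intercept is not penalized.
    Returns [intercept, w_1, ..., w_p]."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    for i in range(1, p):
        A[i][i] += alpha
    b = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(A, b)


def predict(w, X):
    return [w[0] + sum(wi * xi for wi, xi in zip(w[1:], r)) for r in X]


def r2_score(y_true, y_pred):
    """Coefficient of determination (assumed meaning of 'fitting degree')."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def grid_search(X, y, alphas, k=5):
    """Try every alpha in the grid and return (best_alpha, best_mean_r2),
    scoring each candidate by its mean R^2 over k cross-validation folds."""
    n = len(y)
    folds = [list(range(i, n, k)) for i in range(k)]
    best = None
    for alpha in alphas:
        scores = []
        for fold in folds:
            held = set(fold)
            X_tr = [X[i] for i in range(n) if i not in held]
            y_tr = [y[i] for i in range(n) if i not in held]
            w = fit_ridge(X_tr, y_tr, alpha)
            y_hat = predict(w, [X[i] for i in fold])
            scores.append(r2_score([y[i] for i in fold], y_hat))
        mean_score = sum(scores) / k
        if best is None or mean_score > best[1]:
            best = (alpha, mean_score)
    return best


# Synthetic stand-in data: 200 samples, 5 features, linear signal plus noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(5)] for _ in range(200)]
TRUE_W = [2.0, -1.0, 0.5, 1.5, -0.5]  # hypothetical coefficients
y = [sum(w * x for w, x in zip(TRUE_W, r)) + rng.gauss(0, 0.5) for r in X]

ALPHAS = [0.01, 0.1, 1.0, 10.0, 100.0]  # hypothetical penalty grid
best_alpha, best_r2 = grid_search(X, y, ALPHAS)
print(f"best alpha={best_alpha}, mean cross-validated R^2={best_r2:.3f}")
```

The same loop applies to any model with tunable hyperparameters: replace `fit_ridge` and `predict` with another estimator and extend the grid to cover each hyperparameter combination. In practice a library routine such as scikit-learn's `GridSearchCV` would typically handle this.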

Availability of data and materials

The data sets used or analyzed during the current study are available from the corresponding author on reasonable request.

Acknowledgements

The authors would like to acknowledge the National Natural Science Foundation of China (No. 11972327) for financial support of this work. They also thank the anonymous reviewers for their helpful and kind suggestions.

Funding

This work was supported by the National Natural Science Foundation of China (No. 11972327).

Author information

Contributions

LP made substantial contributions to the conception and design of the work and to the interpretation of the prediction results, critically revised the important content, and approved the final version to be published. YW carried out the analysis and prediction and drafted the manuscript. JW collected the data for analysis and prediction and checked and polished the draft.

Corresponding author

Correspondence to Lijun Pei.

Ethics declarations

Conflict of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, Y., Pei, L. & Wang, J. Precipitation prediction in several Chinese regions using machine learning methods. Int. J. Dynam. Control 12, 1180–1196 (2024). https://doi.org/10.1007/s40435-023-01250-1

  • DOI: https://doi.org/10.1007/s40435-023-01250-1
