Machine Learning Applications in Hydrology

Part of the Ecological Studies book series (ECOLSTUD, volume 240)


The rapidly expanding field of machine learning (ML) offers many methodological opportunities that match the needs and challenges of hydrological research well. With extended measurement networks, more frequent automatic measurements of hydrological variables, and, not least, the increasing use of remote sensing products, the era of big data has clearly arrived in hydrology. Process-based models are usually developed for specific spatiotemporal scales and do not easily fit the scope of these new datasets. Automatic methods that learn patterns and generalizations from data have been shown to be superior in many applications. This chapter provides an overview of some of the most important machine learning algorithms used in the hydrological literature. It shows that there is no single best method among them; instead, a spectrum of methods should be used, from highly flexible to more parsimonious learners, depending on the specific hydrological application, research question, and data availability. Most machine learning techniques require both a calibration and a validation dataset when they are trained. As these data are usually correlated in time and space, the bias-variance tradeoff becomes a central problem; it is discussed using a simple example. The ML algorithms are presented in roughly chronological order, from artificial neural networks through support vector machines to gradient boosting machines. As data streams grow, these and other machine learning techniques will play an ever more important role in hydrology.
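
The bias-variance tradeoff mentioned above can be illustrated with a small, self-contained sketch. This is not the chapter's supplementary R code: it is an independent Python illustration that assumes k-nearest-neighbour regression as the learner, a synthetic autocorrelated AR(1) input with a noisy nonlinear response as the data, and a blocked split in which the earlier part of the record is used for calibration and the later, contiguous part for validation. All function names, parameters, and the seed are hypothetical choices made for illustration. Small k gives a very flexible model (calibration error near zero, typically an inflated validation error, i.e. high variance); large k gives a rigid model that approaches the calibration mean (high bias).

```python
import math
import random

# Hypothetical illustration only: synthetic data, k-NN regression, blocked split.

def make_series(n, seed=42):
    """Synthetic example: an autocorrelated AR(1) input (a rainfall-like
    driver) and a noisy nonlinear response."""
    rng = random.Random(seed)
    x, prev = [], 0.0
    for _ in range(n):
        prev = 0.8 * prev + rng.gauss(0.0, 1.0)  # AR(1): values close in time are correlated
        x.append(prev)
    y = [math.sin(xi) + rng.gauss(0.0, 0.3) for xi in x]
    return x, y


def knn_predict(x_cal, y_cal, x0, k):
    """Average the responses of the k calibration points closest to x0."""
    nearest = sorted(range(len(x_cal)), key=lambda i: abs(x_cal[i] - x0))[:k]
    return sum(y_cal[i] for i in nearest) / k


def mse(x_cal, y_cal, x_eval, y_eval, k):
    """Mean squared error of k-NN predictions on an evaluation set."""
    errors = [(knn_predict(x_cal, y_cal, xi, k) - yi) ** 2
              for xi, yi in zip(x_eval, y_eval)]
    return sum(errors) / len(errors)


def bias_variance_table(n=400, cal_fraction=0.75, ks=(1, 5, 20, 80, 300)):
    """Return (k, calibration MSE, validation MSE) for each k."""
    x, y = make_series(n)
    cut = int(n * cal_fraction)
    # Blocked split: calibrate on the first part of the record and validate
    # on the later, contiguous part instead of shuffling correlated data.
    x_cal, y_cal = x[:cut], y[:cut]
    x_val, y_val = x[cut:], y[cut:]
    return [(k, mse(x_cal, y_cal, x_cal, y_cal, k),
             mse(x_cal, y_cal, x_val, y_val, k)) for k in ks]


if __name__ == "__main__":
    for k, cal, val in bias_variance_table():
        print(f"k={k:4d}  calibration MSE={cal:.3f}  validation MSE={val:.3f}")
```

With k = 1 each calibration point is its own nearest neighbour, so the calibration error is exactly zero and says nothing about predictive skill; with k equal to the size of the calibration block, every prediction is the calibration mean. The useful model complexity lies between these extremes and is chosen on the validation block.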

Supplementary material

bias_variance_trade-off: R script (464883_1_En_10_MOESM1_ESM.r)
functions: R script (464883_1_En_10_MOESM2_ESM.r)


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Norwegian Institute of Bioeconomy Research, Ås, Norway
  2. Department of Environmental System Science, ETH Zürich, Zürich, Switzerland
