Machine Learning Applications in Hydrology

  • H. Lange
  • S. Sippel
Chapter
Part of the Ecological Studies book series (ECOLSTUD, volume 240)

Abstract

The rapidly expanding field of machine learning (ML) offers many methodological opportunities that match the needs and challenges of hydrological research well. With extended measurement networks, more frequent automatic measurements of hydrological variables, and, not least, the increasing use of remote sensing products, the era of big data has arrived in hydrology. Process-based models are usually developed for particular spatiotemporal scales and do not fit easily to the scope of the new datasets. Automatic methods that learn patterns and generalizations have been demonstrated to be superior in many applications. The chapter provides an overview of some of the most important machine learning algorithms that have been used in the hydrological literature. It shows that there is no single best method among them; instead, a spectrum of methods should be used, from highly flexible ones to more parsimonious learning methods, depending on the specific hydrological application, research question, and data availability. Most machine learning techniques require a calibration dataset and a validation dataset for training. As these data are usually correlated in time and space, a bias-variance tradeoff arises, which is discussed using a simple example. The ML algorithms are presented in roughly chronological order, from artificial neural networks through support vector machines to gradient boosting machines. As data streams increase, these and other machine learning techniques will play an ever more important role in hydrology.
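
The bias-variance tradeoff mentioned in the abstract can be made concrete with a small simulation. The following is a minimal sketch and not the chapter's supplementary R script: the sine-shaped "truth", the noise level, the sample sizes, and the choice of k-nearest-neighbour regression as the learner are all assumptions made for illustration. It draws many synthetic calibration sets, predicts at fixed points, and estimates squared bias and variance for a flexible learner (k = 1) and for more parsimonious ones (larger k).

```python
import math
import random
import statistics


def true_signal(x):
    """Hypothetical smooth 'truth' standing in for a hydrological response."""
    return math.sin(x)


def knn_predict(train, x0, k):
    """Average the targets of the k training points nearest to x0."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x0))[:k]
    return sum(y for _, y in nearest) / k


def bias_variance(k, n_train=50, n_repeats=200, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of k-NN over repeated training sets."""
    rng = random.Random(seed)
    test_points = [0.5 + 0.5 * i for i in range(11)]  # fixed points 0.5 .. 5.5
    predictions = {x0: [] for x0 in test_points}
    for _ in range(n_repeats):
        # Draw a fresh noisy calibration set (assumed i.i.d., unlike real series).
        xs = [rng.uniform(0.0, 6.0) for _ in range(n_train)]
        train = [(x, true_signal(x) + rng.gauss(0.0, noise_sd)) for x in xs]
        for x0 in test_points:
            predictions[x0].append(knn_predict(train, x0, k))
    # Squared bias: mean prediction versus the truth, averaged over test points.
    bias_sq = statistics.mean(
        (statistics.mean(p) - true_signal(x0)) ** 2 for x0, p in predictions.items()
    )
    # Variance: spread of predictions across training sets, averaged likewise.
    variance = statistics.mean(statistics.pvariance(p) for p in predictions.values())
    return bias_sq, variance


results = {k: bias_variance(k) for k in (1, 5, 25)}
for k, (b2, var) in results.items():
    print(f"k={k:2d}  bias^2={b2:.4f}  variance={var:.4f}")
```

In this setup the k = 1 learner follows the noise closely (low bias, high variance), while k = 25 averages over a wide neighbourhood (higher bias, lower variance). The synthetic samples here are independent; with data correlated in time or space, as the abstract notes, a random calibration/validation split would tend to give an overly optimistic picture of the validation error.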

Supplementary material

464883_1_En_10_MOESM1_ESM.r (4 kb)
bias_variance_trade-off (R 3 kb)
464883_1_En_10_MOESM2_ESM.r (1 kb)
functions (R 1 kb)

Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Norwegian Institute of Bioeconomy Research, Ås, Norway
  2. Department of Environmental System Science, ETH Zürich, Zürich, Switzerland
