Surrogate optimization of deep neural networks for groundwater predictions


Abstract

Sustainable management of groundwater resources under changing climatic conditions requires reliable and accurate predictions of groundwater levels. Mechanistic multi-scale, multi-physics simulation models are often too difficult to use for this purpose, especially for groundwater managers who lack access to the necessary computational resources and data. We therefore analyze the applicability and performance of four modern deep learning models for predicting groundwater levels, and we compare three methods for optimizing the models’ hyperparameters: two surrogate-model-based algorithms and random sampling. The models were tested on predictions of the groundwater level in Butte County, California, USA, taking into account the temporal variability of streamflow, precipitation, and ambient temperature. Our numerical study shows that hyperparameter optimization can lead to reasonably accurate performance for all models (root mean squared errors of groundwater predictions of 2 meters or less). However, the “simplest” network, a multilayer perceptron (MLP), performs better overall than the more advanced long short-term memory (LSTM) or convolutional neural networks in terms of both prediction accuracy and time-to-solution, making the MLP a suitable candidate for groundwater prediction.
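The evaluation workflow summarized above (train a neural network on climatic drivers, then score held-out predictions by root mean squared error) can be sketched as follows. This is a hypothetical illustration only: it uses scikit-learn's `MLPRegressor` on synthetic stand-in data, not the paper's tuned architectures or the Butte County dataset.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic stand-ins for the study's drivers: streamflow, precipitation,
# and ambient temperature over 500 time steps (NOT the Butte County data).
X = rng.standard_normal((500, 3))
# Hypothetical groundwater level as a noisy function of the drivers.
y = X @ np.array([0.6, 0.3, -0.4]) + 0.1 * rng.standard_normal(500)

# Chronological split: train on the past, evaluate on the future.
X_train, X_test = X[:400], X[400:]
y_train, y_test = y[:400], y[400:]

mlp = MLPRegressor(hidden_layer_sizes=(32, 32), max_iter=2000, random_state=0)
mlp.fit(X_train, y_train)

# Root mean squared error of the held-out predictions.
rmse = mean_squared_error(y_test, mlp.predict(X_test)) ** 0.5
print(f"RMSE: {rmse:.3f}")
```

In the study itself, the hidden-layer sizes, learning rate, and related hyperparameters are exactly the quantities chosen by the surrogate optimization and random sampling methods being compared.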




  1.

    Note that we usually use a similar approach in continuous optimization, where we scale the parameters to the unit hypercube; this improves the surrogate models and eliminates difficulties when sampling by perturbation.
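The scaling described in this note can be sketched as a simple affine map to and from the unit hypercube. The parameter names and ranges below are hypothetical, chosen only to illustrate the transformation.

```python
import numpy as np

def to_unit_cube(x, lower, upper):
    """Map parameter vectors from the box [lower, upper] to [0, 1]^d."""
    return (np.asarray(x) - lower) / (upper - lower)

def from_unit_cube(u, lower, upper):
    """Map unit-hypercube points back to the original parameter ranges."""
    return lower + np.asarray(u) * (upper - lower)

# Hypothetical hyperparameter ranges: number of hidden units and
# the base-10 exponent of the learning rate.
lower = np.array([16.0, -5.0])
upper = np.array([512.0, -1.0])

x = np.array([144.0, -3.0])   # a candidate point in the original space
u = to_unit_cube(x, lower, upper)

# The round trip recovers the original point exactly.
assert np.allclose(from_unit_cube(u, lower, upper), x)
```

Working in `[0, 1]^d` puts all parameters on a common scale, so no single dimension dominates the surrogate model's distance computations or the perturbation-based sampling.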


  1. 1.

    Abadi, M., Agarwal, A., Barham, P., Brevdo, E., Chen, Z., Citro, C., Corrado, G.S., Davis, A., Dean, J., Devin, M. et al.: Tensorflow: large-scale machine learning on heterogeneous distributed systems (2016). arXiv:1603.04467

  2. 2.

    Abramson, M.A., Audet, C., Chrissis, J., Walston, J.: Mesh adaptive direct search algorithms for mixed variable optimization. Optim. Lett. 3, 35–47 (2009)

    MathSciNet  MATH  Article  Google Scholar 

  3. 3.

    Ali, Z., Hussain, I., Faisal, M., Nazir, H.M., Hussain, T., Shad, M.Y., Shoukry, A.M., Gani, S.H.: Forecasting drought using multilayer perceptron artificial neural network model. Adv. Meteorol., 5681308, 9 pages (2017)

  4. 4.

    Araujo, P., Astray, G., Ferrerio-Lage, J.A., Mejuto, J.C., Rodriguez-Suarez, J.A., Soto, B.: Multilayer perceptron neural network for flow prediction. J. Environ. Monit. 13(1), 35–41 (2011)

    Article  Google Scholar 

  5. 5.

    Audet, C., Dennis Jr., J.E.: Mesh adaptive direct search algorithms for constrained optimization. SIAM J. Optim. 17, 188–217 (2006)

    MathSciNet  MATH  Article  Google Scholar 

  6. 6.

    Audet, C., Hare, W.: Derivative-Free and Blackbox Optimization. Springer Series in Operations Research and Financial Engineering. Springer, Berlin (2017)

    MATH  Book  Google Scholar 

  7. 7.

    Audet, C., Kokkolaras, M.: Blackbox and derivative-free optimization: theory, algorithms and applications. Optim. Eng. 17(1), 1–2 (2016)

    MathSciNet  MATH  Article  Google Scholar 

  8. 8.

    Audet, C., Savard, G., Zghal, W.: A mesh adaptive direct search algorithm for multiobjective optimization. Eur. J. Oper. Res. 204(3), 545–556 (2010)

    MathSciNet  MATH  Article  Google Scholar 

  9. 9.

    Balaprakash, P., Salim, M., Uram, T.D., Vishwanath, V., Wild, S.M.: Deephyper: asynchronous hyperparameter search for deep neural networks. In: 2018 IEEE 25th International Conference on High Performance Computing (HiPC), pp. 42–51 (2018)

  10. 10.

    Bergstra, J., Bengio, Y.: Random search for hyper-parameter optimization. J. Mach. Learn. Res. 13(1), 281–305 (2012)

    MathSciNet  MATH  Google Scholar 

  11. 11.

    Bergstra, J., Yamins, D., Cox, D.D.: Making a science of model search: hyperparameter optimization in hundreds of dimensions for vision architectures. In: Proceedings of the 30th International Conference on Machine Learning (2013)

  12. 12.

    Bishop, C.M., et al.: Neural networks for pattern recognition. Oxford University Press, Oxford (1995)

    MATH  Google Scholar 

  13. 13.

    Booker, A.J., Dennis Jr., J.E., Frank, P.D., Serafini, D.B., Torczon, V., Trosset, M.W.: A rigorous framework for optimization of expensive functions by surrogates. Struct. Multidiscip. Optim. 17, 1–13 (1999)

    Article  Google Scholar 

  14. 14.

    Borovykh, A., Bohte, S., Oosterlee, C.W.: Conditional Time Series Forecasting with Convolutional Neural Networks (2017). arXiv:1703.04691

  15. 15.

    Bottou, L.: Large-scale machine learning with stochastic gradient descent. In: 19th International Conference on Computational Statistics, pp. 177–186 (2010)

    Chapter  Google Scholar 

  16. 16.

    California Department of Water Resources. SGMA groundwater management. Accessed 18 May 2020

  17. 17.

    Chiang, Y.-M., Chang, L.-C., Chang, F.-J.: Comparison of static-feedforward and dynamic-feedback neural networks for rainfall-runoff modeling. J. Hydrol. 290(3–4), 297–311 (2004)

    Article  Google Scholar 

  18. 18.

    Chollet, F.: keras. GitHub Repository (2015). Accessed 18 May 2020

  19. 19.

    Cook, B.I., Mankin, J.S., Anchukaitis, K.J.: Climate change and drought: from past to future. Curr. Clim. Change Rep. 4(2), 164–179 (2018)

    Article  Google Scholar 

  20. 20.

    Coulibaly, P., Anctil, F., Aravena, R., Bobée, B.: Artificial neural network modeling of water table depth fluctuations. Water Resour. Res. 37(4), 885–896 (2001)

    Article  Google Scholar 

  21. 21.

    Cui, Z., Chen, W., Chen, Y.: Multi-scale Convolutional Neural Networks for Time Series Classification (2016). arXiv:1603.06995

  22. 22.

    Daliakopoulos, I.N., Coulibaly, P., Tsanis, I.K.: Groundwater level forecasting using artificial neural networks. J. Hydrol. 309(1–4), 229–240 (2005)

    Article  Google Scholar 

  23. 23.

    Datta, R., Regis, R.G.: A surrogate-assisted evolution strategy for constrained multi-objective optimization. Expert Syst. Appl. 57, 270–284 (2016)

    Article  Google Scholar 

  24. 24.

    Davis, E., Ierapetritou, M.: Kriging based method for the solution of mixed-integer nonlinear programs containing black-box functions. J. Global Optim. 43, 191–205 (2009)

    MathSciNet  MATH  Article  Google Scholar 

  25. 25.

    Faunt, C.C.: Groundwater Availability of the Central Valley Aquifer, California. Professional paper 1766, 225 p., U.S. Geological Survey (2009). Accessed 18 May 2020

  26. 26.

    Forrester, A.I.J., Sóbester, A., Keane, A.J.: Multi-fidelity optimization via surrogate modelling. Proc. R. Soc. 463, 3251–3269 (2007)

    MathSciNet  MATH  Article  Google Scholar 

  27. 27.

    Fortin, F.-A., De Rainville, F.-M., Gardner, M.-A., Gagné, C., Parizeau, M.: DEAP: evolutionary algorithms made easy. J. Mach. Learn. Res. 13, 2171–2175 (2012)

    MathSciNet  Google Scholar 

  28. 28.

    Gardner, M.W., Dorling, S.R.: Artificial neural networks (the multilayer perceptron): a review of applications in the atmospheric sciences. Atmos. Environ. 32(14–15), 2627–2636 (1998)

    Article  Google Scholar 

  29. 29.

    Gers, F.A., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with LSTM. Neural Comput. 12, 2451–2471 (2000)

    Article  Google Scholar 

  30. 30.

    Gramacy, R., Le Digabel, S.: The mesh adaptive direct search algorithm with treed Gaussian process surrogates. Pac. J. Optim. 11, 419–447 (2015)

    MathSciNet  MATH  Google Scholar 

  31. 31.

    Graves, A., Mohamed, A., Hinton, G.: Speech recognition with deep recurrent neural networks. In: Proceedings of the 2013 International Conference on Acoustics, Speech, and Signal Processing (2013)

  32. 32.

    Gutmann, H.-M.: A radial basis function method for global optimization. J. Global Optim. 19, 201–227 (2001)

    MathSciNet  MATH  Article  Google Scholar 

  33. 33.

    He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)

  34. 34.

    Hinton, G., Srivastava, N., Swersky, K.: Neural Networks for Machine Learning. lecture 6a, Overview of Mini-batch Gradient Descent. Lecture Notes (2012). Accessed 18 May 2020

  35. 35.

    Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9, 1735–1780 (1997)

    Article  Google Scholar 

  36. 36.

    Holmström, K.: An adaptive radial basis algorithm (ARBF) for expensive black-box mixed-integer global optimization. J. Global Optim. 9, 311–339 (2008a)

    MathSciNet  MATH  Google Scholar 

  37. 37.

    Holmström, K.: An adaptive radial basis algorithm (ARBF) for expensive black-box global optimization. J. Global Optim. 41, 447–464 (2008b)

    MathSciNet  MATH  Article  Google Scholar 

  38. 38.

    Hsu, D.: Multi-period Time Series Modeling with Sparsity Via Bayesian Variational Inference (2018). arXiv:1707.00666v3

  39. 39.

    Ilievski, I., Akhtar, T., Feng, J., Shoemaker, C.A.: Efficient hyperparameter optimization of deep learning algorithms using deterministic RBF surrogates. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (2017)

  40. 40.

    Jin, H., Song, Q., Hu, X.: Auto-Keras: An Efficient Neural Architecture Search System (2019). arXiv:1806.10282 [cs.LG]

  41. 41.

    Jones, D.R., Schonlau, M., Welch, W.J.: Efficient global optimization of expensive black-box functions. J. Global Optim. 13, 455–492 (1998)

    MathSciNet  MATH  Article  Google Scholar 

  42. 42.

    Karandish, F., Šimunek, J.: A comparison of numerical and machine-learning modeling of soil water content with limited input data. J. Hydrol. 543, 892–909 (2016)

    Article  Google Scholar 

  43. 43.

    Karslıoğlu, O., Gehlmann, M., Müller, J., Nemšàk, S., Sethian, J., Kaduwela, A., Bluhm, H., Fadley, C.: An efficient algorithm for automatic structure optimization in x-ray standing-wave experiments. J. Electron Spectrosc. Relat. Phenom. 230, 10–20 (2019)

    Article  Google Scholar 

  44. 44.

    Kingma, D.P., Ba, J.L.: ADAM: a method for stochastic optimization. In: ICLR 2015 (2015)

  45. 45.

    Klein, A., Falkner, S., Bartels, S., Hennig, P., Hutter, F.: Fast Bayesian optimization of machine learning hyperparameters on large datasets. In: Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS) 2017, Fort Lauderdale, Florida, USA, vol. 54 (2017)

  46. 46.

    Kratzert, F., Klotz, D., Brenner, C., Schulz, K., Herrnegger, M.: Rainfall-runoff modelling using long short-term memory (LSTM) networks. Hydrol. Earth Syst. Sci. 22(11), 6005–6022 (2018)

    Article  Google Scholar 

  47. 47.

    Kuderer, M., Gulati, S., Burgard, W.: Learning driving styles for autonomous vehicles from demonstration. In: 2015 IEEE International Conference on Robotics and Automation (ICRA), pp. 2641–2646 (2015)

  48. 48.

    Lakhmiri, D., Digabel, S. Le, Tribes, C.: HyperNOMAD: Hyperparameter Optimization of Deep Neural Networks Using Mesh Adaptive Direct Search (2019). arXiv:1907.01698 [cs.LG]

  49. 49.

    Langevin, C.D., Hughes, J.D., Banta, E.R., Niswonger, R.G., Panday, S., Provost, A.M.: Documentation for the MODFLOW 6 Groundwater Flow Model. Technical Report, US Geological Survey (2017)

  50. 50.

    Langhans, W., Müller, J., Collins, W.: Optimization of the Eddy-diffusivity/mass-flux shallow cumulus and boundary-layer parameterization using surrogate models. J. Adv. Model. Earth Syst. 11, 402–416 (2019)

    Article  Google Scholar 

  51. 51.

    Le Digabel, S.: Algorithm 909: NOMAD–nonlinear optimization with the MADS algorithm. ACM Trans. Math. Softw. 37, 1–15 (2011)

    MathSciNet  MATH  Article  Google Scholar 

  52. 52.

    LeCun, Y., Bottou, L., Bengio, Y., Haffner, P., et al.: Gradient-based learning applied to document recognition. Proc. IEEE 86(11), 2278–2324 (1998)

    Article  Google Scholar 

  53. 53.

    LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521, 436–444 (2015)

    Article  Google Scholar 

  54. 54.

    Lee, H.K.H., Gramacy, R.B., Linkletter, C., Gray, G.A.: Optimization subject to hidden constraints via statistical emulation. Pac. J. Optim. 7, 467–478 (2011)

    MathSciNet  MATH  Google Scholar 

  55. 55.

    Ma, X., Tao, Z., Wang, Y., Yu, H., Wang, Y.: Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transp. Res. C Emerg. Technol. 54, 187–197 (2015)

    Article  Google Scholar 

  56. 56.

    Matheron, G.: Principles of geostatistics. Econ. Geol. 58, 1246–1266 (1963)

    Article  Google Scholar 

  57. 57.

    Mikolov, T., Karafiát, M., Burget, L., Černockỳ, J., Khudanpur, S.: Recurrent neural network based language model. In: Eleventh Annual Conference of the International Speech Communication Association (2010)

  58. 58.

    Mitchell, M.: An Introduction to Genetic Algorithms. MIT Press, Cambridge (1996)

    MATH  Google Scholar 

  59. 59.

    Moritz, S., Bartz-Beielstein, T.: imputeTS: Time Series Missing Value Imputation in R. R J. 9, 207–218 (2017)

    Article  Google Scholar 

  60. 60.

    Müller, J.: MISO: mixed integer surrogate optimization framework. Optim. Eng. 17(1), 177–203 (2015)

    MathSciNet  MATH  Article  Google Scholar 

  61. 61.

    Müller, J.: SOCEMO: surrogate optimization of computationally expensive multiobjective problems. INFORMS J. Comput. 29(4), 581–596 (2017)

    MathSciNet  Article  Google Scholar 

  62. 62.

    Müller, J.: An algorithmic framework for the optimization of computationally expensive bi-fidelity black-box problems. INFOR Inf. Syst. Oper. Res. (2019).

    Article  Google Scholar 

  63. 63.

    Müller, J., Day, M.: Surrogate optimization of computationally expensive black-box problems with hidden constraints. INFORMS J. Comput. (2019).

    MathSciNet  Article  Google Scholar 

  64. 64.

    Müller, J., Woodbury, J.: GOSAC: global optimization with surrogate approximation of constraints. J. Glob. Optim. (2017).

    MathSciNet  Article  MATH  Google Scholar 

  65. 65.

    Müller, J., Shoemaker, C.A., Piché, R.: SO-MI: a surrogate model algorithm for computationally expensive nonlinear mixed-integer black-box global optimization problems. Comput. Oper. Res. 40, 1383–1400 (2013a)

    MathSciNet  MATH  Article  Google Scholar 

  66. 66.

    Müller, J., Shoemaker, C.A., Piché, R.: SO-I: a surrogate model algorithm for expensive nonlinear integer programming problems including global optimization applications. J. Glob. Optim. 59, 865–889 (2013b)

    MathSciNet  MATH  Article  Google Scholar 

  67. 67.

    Müller, J., Paudel, R., Shoemaker, C.A., Woodbury, J., Wang, Y., Mahowald, N.: CH4 parameter estimation in CLM4.5bgc using surrogate global optimization. Geosci. Model Dev. Discus. 8, 141–207 (2015)

    Article  Google Scholar 

  68. 68.

    Myers, R.H., Montgomery, D.C., Anderson-Cook, C.M.: Response Surface Methodology: Process and Product Optimization Using Designed Experiments, 4th edn. John Wiley & Sons, Inc., Hoboken, NJ (2016)

    MATH  Google Scholar 

  69. 69.

    Najah, A., El-Shafie, A., Karim, O.A., El-Shafie, A.H.: Application of artificial neural networks for water quality prediction. Neural Comput. Appl. 22(1), 187–201 (2013)

    Article  Google Scholar 

  70. 70.

    Nuñez, L., Regis, R.G., Varela, K.: Accelerated random search for constrained global optimization assisted by radial basis function surrogates. J. Comput. Appl. Math. 340, 276–295 (2018)

    MathSciNet  MATH  Article  Google Scholar 

  71. 71.

    Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)

    MathSciNet  MATH  Google Scholar 

  72. 72.

    Powell, M.J.D.: Advances in Numerical Analysis, Vol. 2: Wavelets, Subdivision Algorithms and Radial Basis Functions. Oxford University Press, Oxford, pp. 105–210, Chapter The Theory of Radial Basis Function Approximation in 1990. Oxford University Press, London (1992)

    Google Scholar 

  73. 73.

    Powell, M.J.D.: Recent Research at Cambridge on Radial Basis Functions New Developments in Approximation Theory, pp. 215–232. Birkhäuser, Basel (1999)

    MATH  Book  Google Scholar 

  74. 74.

    Regis, R.G.: Stochastic radial basis function algorithms for large-scale optimization involving expensive black-box objective and constraint functions. Comput. Oper. Res. 38, 837–853 (2011)

    MathSciNet  Article  MATH  Google Scholar 

  75. 75.

    Regis, R.G., Shoemaker, C.A.: A stochastic radial basis function method for the global optimization of expensive functions. INFORMS J. Comput. 19, 497–509 (2007)

    MathSciNet  MATH  Article  Google Scholar 

  76. 76.

    Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)

    MathSciNet  MATH  Article  Google Scholar 

  77. 77.

    Rudy, S., Alla, A., Brunton, S.L., Kutz, J.N.: Data-driven identification of parametric partial differential equations. SIAM J. Appl. Dyn. Syst. 18(2), 643–660 (2019)

    MathSciNet  MATH  Article  Google Scholar 

  78. 78.

    Rumelhart, D.E., Hinton, G.E., Williams, R.J., et al.: Learning representations by back-propagating errors. Cognit. Model. 5(3), 1 (1988)

    MATH  Google Scholar 

  79. 79.

    Sahoo, S., Russo, T.A., Elliott, J., Foster, I.: Machine learning algorithms for modeling groundwater level changes in agricultural regions of the US. Water Resour. Res. 53(5), 3878–3895 (2017)

    Article  Google Scholar 

  80. 80.

    Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. In: Advances in Neural Information Processing Systems (2012)

  81. 81.

    Steefel, C.I., Appelo, C.A.J., Arora, B., Jacques, D., Kalbacher, T., Kolditz, O., Lagneau, V., Lichtner, P.C., Mayer, K.U., Meeussen, J.C.L., et al.: Reactive transport codes for subsurface environmental simulation. Comput. Geosci. 19(3), 445–478 (2015)

    MathSciNet  MATH  Article  Google Scholar 

  82. 82.

    Sundermeyer, M., Schluter, R., Ney, H.: LSTM neural networks for language modeling. In: Proceedings of the 12th Annual Conference of the International Speech Communication Association, Portland, Oregon, USA, pp. 601–608 (2012)

  83. 83.

    Sutskever, I., Martens, J., Hinton, G.E.: Generating text with recurrent neural networks. In: Proceedings of the 28th International Conference on Machine Learning (ICML-11), pp. 1017–1024 (2011)

  84. 84.

    Tabari, H., Talaee, P.H.: Multilayer perceptron for reference evapotranspiration estimation in a semiarid region. Neural Comput. Appl. 23(2), 341–348 (2013)

    Article  Google Scholar 

  85. 85.

    Taylor, M.: Liquid Assets: Improving Management of the State’s Groundwater Resources. Legislative Analyst’s Office, Technical Report (2010)

  86. 86.

    Toal, D., Keane, A.: Efficient multi-point aerodynamic design optimization via co-kriging. J. Aircr. 48(5), 1685–1695 (2011)

    Article  Google Scholar 

  87. 87.

    Trenn, S.: Multilayer perceptrons: approximation order and necessary number of hidden units. IEEE Trans. Neural Netw. 19(5), 836–844 (2008)

    Article  Google Scholar 

  88. 88.

    Wild, S.M., Shoemaker, C.A.: Global convergence of radial basis function trust-region algorithms for derivative-free optimization. SIAM Rev. 55, 349–371 (2013)

    MathSciNet  MATH  Article  Google Scholar 

  89. 89.

    Xu, T., Spycher, N., Sonnenthal, E., Zhang, G., Zheng, L., Pruess, K.: TOUGHREACT version 2.0: a simulator for subsurface reactive transport under non-isothermal multiphase flow conditions. Comput. Geosci. 37(6), 763–774 (2011)

    Article  Google Scholar 

  90. 90.

    Young, S.R., Rose, D.C., Karnowski, T.P., Lim, S.-H., Patton, R.M.: Optimizing deep learning hyper-parameters through an evolutionary algorithm. In: MLHPC’15 Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments, Volume Article No. 4 (2015)

  91. 91.

    Zhang, J., Zhu, Y., Zhang, X., Ye, M., Yang, J.: Developing a long short-term memory (LSTM) based model for predicting water table depth in agricultural areas. J. Hydrol. 561, 918–929 (2018)

    Article  Google Scholar 

Download references


Acknowledgements

This work was supported by Laboratory Directed Research and Development (LDRD) funding from Berkeley Lab, provided by the Director, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231.

Author information



Corresponding author

Correspondence to Juliane Müller.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 1515 KB)


About this article


Cite this article

Müller, J., Park, J., Sahu, R. et al. Surrogate optimization of deep neural networks for groundwater predictions. J Glob Optim (2020).

Keywords


  • Hyperparameter optimization
  • Machine learning
  • Derivative-free optimization
  • Groundwater prediction
  • Surrogate models