
Variational Monte Carlo—bridging concepts of machine learning and high-dimensional partial differential equations


A statistical learning approach for high-dimensional parametric PDEs related to uncertainty quantification is derived. The method is based on the minimization of an empirical risk on a selected model class, and it is shown to be applicable to a broad range of problems. A general unified convergence analysis is derived, which takes into account the approximation and the statistical errors. By this, a combination of theoretical results from numerical analysis and statistics is obtained. Numerical experiments illustrate the performance of the method with the model class of hierarchical tensors.
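As a rough illustration of the empirical risk minimization idea summarized above, the following sketch fits a least-squares model to random samples of a parametric quantity and then estimates the resulting error on fresh samples. Everything in it is an assumption made for illustration: the toy map `sample_solution` stands in for a PDE solve, a Legendre polynomial space replaces the hierarchical tensor model class used in the paper, and all function names are hypothetical.

```python
import numpy as np


def sample_solution(y):
    """Toy stand-in for a parametric quantity u(y); here the solution of (2 + y) u = 1."""
    return 1.0 / (2.0 + y)


def fit_empirical_risk(degree, n_samples, rng):
    """Minimize the empirical least-squares risk over polynomials of the given degree."""
    # Draw parameter samples from the uniform distribution on [-1, 1].
    y = rng.uniform(-1.0, 1.0, size=n_samples)
    u = sample_solution(y)
    # Design matrix of Legendre polynomials evaluated at the samples.
    design = np.polynomial.legendre.legvander(y, degree)
    # The least-squares solution is the empirical risk minimizer in this linear model class.
    coeffs, *_ = np.linalg.lstsq(design, u, rcond=None)
    return coeffs


def estimate_l2_error(coeffs, n_test, rng):
    """Monte Carlo estimate of the root-mean-square error on fresh samples."""
    y = rng.uniform(-1.0, 1.0, size=n_test)
    approx = np.polynomial.legendre.legval(y, coeffs)
    return np.sqrt(np.mean((approx - sample_solution(y)) ** 2))


rng = np.random.default_rng(0)
coeffs = fit_empirical_risk(degree=6, n_samples=200, rng=rng)
error = estimate_l2_error(coeffs, n_test=10_000, rng=rng)
print(f"estimated L2 error: {error:.2e}")
```

In this toy setting the samples are noise-free, so the remaining error is mostly the approximation error of the polynomial space; with fewer samples or a larger model class the statistical error would become visible as well.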





R. Schneider was supported by the DFG project ERA Chemistry and through Matheon by the Einstein Foundation Berlin. M. Eigel was supported by the DFG SPP1886.


Research of S. Wolf was funded in part by the DFG Matheon project SE10. P. Trunschke acknowledges support by the Berlin International Graduate School in Model and Simulation based Research (BIMoS).

Author information


Corresponding author

Correspondence to Philipp Trunschke.

Additional information

Communicated by: Anthony Nouy

About this article

Cite this article

Eigel, M., Schneider, R., Trunschke, P. et al. Variational Monte Carlo—bridging concepts of machine learning and high-dimensional partial differential equations. Adv Comput Math 45, 2503–2532 (2019).


Keywords

  • Machine learning
  • Uncertainty quantification
  • Partial differential equations
  • Statistical learning
  • Tree tensor networks

Mathematics Subject Classification (2010)

  • 68Q32
  • 65D15
  • 62J02
  • 15A69
  • 41A63
  • 65N30
  • 35R60
  • 65C20
  • 65N22