

The Bayesian Approach to Inverse Problems

  • Living reference work entry in: Handbook of Uncertainty Quantification

Abstract

These lecture notes highlight the mathematical and computational structure relating to the formulation of, and development of algorithms for, the Bayesian approach to inverse problems in differential equations. This approach is fundamental in the quantification of uncertainty within applications involving the blending of mathematical models with data. The finite-dimensional situation is described first, along with some motivational examples. Then the development of probability measures on separable Banach space is undertaken, using a random series over an infinite set of functions to construct draws; these probability measures are used as priors in the Bayesian approach to inverse problems. Regularity of draws from the priors is studied in the natural Sobolev or Besov spaces implied by the choice of functions in the random series construction, and the Kolmogorov continuity theorem is used to extend regularity considerations to the space of Hölder continuous functions. Bayes’ theorem is derived in this prior setting, and here interpreted as finding conditions under which the posterior is absolutely continuous with respect to the prior, and determining a formula for the Radon-Nikodym derivative in terms of the likelihood of the data. Having established the form of the posterior, we then describe various properties common to it in the infinite-dimensional setting. These properties include well-posedness, approximation theory, and the existence of maximum a posteriori estimators. We then describe measure-preserving dynamics, again on the infinite-dimensional space, including Markov chain Monte Carlo and sequential Monte Carlo methods, and measure-preserving reversible stochastic differential equations. 
By formulating the theory and algorithms on the underlying infinite-dimensional space, we obtain a framework suitable for rigorous analysis of the accuracy of reconstructions and of computational complexity, and for the natural construction of algorithms which perform well under mesh refinement, since they are inherently well defined in infinite dimensions.


References

  1. Adler, R.: The Geometry of Random Fields. SIAM, Philadelphia (1981)

  2. Adams, R.A., Fournier, J.J.: Sobolev Spaces. Pure and Applied Mathematics. Elsevier, Oxford (2003)

  3. Agapiou, S., Larsson, S., Stuart, A.M.: Posterior contraction rates for the Bayesian approach to linear ill-posed inverse problems. Stoch. Process. Appl. 123, 3828–3860 (2013)

  4. Agapiou, S., Stuart, A.M., Zhang, Y.X.: Bayesian posterior consistency for linear severely ill-posed inverse problems. J. Inverse Ill-posed Probl. 22(3), 297–321 (2014)

  5. Alexanderian, A., Petra, N., Stadler, G., Ghattas, O.: A fast and scalable method for A-optimal design of experiments for infinite-dimensional Bayesian nonlinear inverse problems. SIAM J. Sci. Comput. 38(1), A243–A272 (2016)

  6. Alexanderian, A., Gloor, P., Ghattas, O.: On Bayesian A- and D-optimal experimental designs in infinite dimensions. http://arxiv.org/abs/1408.6323 (2016)

  7. Babuska, I., Tempone, R., Zouraris, G.: Galerkin finite element approximations of stochastic elliptic partial differential equations. SIAM J. Numer. Anal. 42, 800–825 (2004)

  8. Banks, H.T., Kunisch, K.: Estimation Techniques for Distributed Parameter Systems. Birkhäuser, Boston (1989)

  9. Beskos, A., Pinski, F.J., Sanz-Serna, J.-M., Stuart, A.M.: Hybrid Monte-Carlo on Hilbert spaces. Stoch. Process. Appl. 121, 2201–2230 (2011)

  10. Beskos, A., Jasra, A., Muzaffer, E.A., Stuart, A.M.: Sequential Monte Carlo methods for Bayesian elliptic inverse problems. Stat. Comput. (2015)

  11. Bernardo, J., Smith, A.: Bayesian Theory. Wiley, Chichester (1994)

  12. Billingsley, P.: Convergence of Probability Measures. Wiley, New York (1968)

  13. Bochner, S.: Integration von Funktionen, deren Werte die Elemente eines Vektorraumes sind. Fund. Math. 20, 262–276 (1933)

  14. Bogachev, V.I.: Gaussian Measures. Mathematical Surveys and Monographs, vol. 62. American Mathematical Society, Providence (1998)

  15. Bogachev, V.I.: Measure Theory, vol. I, II. Springer, Berlin (2007)

  16. Bui-Thanh, T., Ghattas, O., Martin, J., Stadler, G.: A computational framework for infinite-dimensional Bayesian inverse problems. Part I: the linearized case, with application to global seismic inversion. SIAM J. Sci. Comput. 35(6), A2494–A2523 (2013)

  17. Cotter, S., Dashti, M., Robinson, J., Stuart, A.: Bayesian inverse problems for functions and applications to fluid mechanics. Inverse Probl. 25, 115008 (2009). doi:10.1088/0266-5611/25/11/115008

  18. Cohen, A., DeVore, R., Schwab, C.: Convergence rates of best n-term Galerkin approximations for a class of elliptic sPDEs. Found. Comput. Math. 10, 615–646 (2010)

  19. Cohen, A., DeVore, R., Schwab, Ch.: Analytic regularity and polynomial approximation of parametric and stochastic elliptic PDEs. Anal. Appl. 9(1), 11–47 (2011)

  20. Cotter, S., Dashti, M., Stuart, A.: Approximation of Bayesian inverse problems. SIAM J. Numer. Anal. 48, 322–345 (2010)

  21. Cotter, S., Roberts, G., Stuart, A., White, D.: MCMC methods for functions: modifying old algorithms to make them faster. Stat. Sci. 28, 424–446 (2013)

  22. Csiszar, I., Körner, J.: Information Theory: Coding Theorems for Discrete Memoryless Systems. Cambridge University Press, Cambridge (2011)

  23. Dacorogna, B.: Introduction to the Calculus of Variations. Translated from the 1992 French original, 2nd edn. Imperial College Press, London (2009)

  24. Dashti, M., Harris, S., Stuart, A.: Besov priors for Bayesian inverse problems. Inverse Probl. Imaging 6, 183–200 (2012)

  25. Dashti, M., Stuart, A.: Uncertainty quantification and weak approximation of an elliptic inverse problem. SIAM J. Numer. Anal. 49, 2524–2542 (2011)

  26. Del Moral, P.: Feynman-Kac Formulae. Springer, New York (2004)

  27. Da Prato, G.: An Introduction to Infinite-Dimensional Analysis. Universitext. Springer, Berlin (2006). Revised and extended from the 2001 original

  28. Da Prato, G., Zabczyk, J.: Stochastic Equations in Infinite Dimensions. Encyclopedia of Mathematics and its Applications, vol. 44. Cambridge University Press, Cambridge (1992)

  29. Da Prato, G., Zabczyk, J.: Ergodicity for Infinite Dimensional Systems. Cambridge University Press, Cambridge (1996)

  30. Dashti, M., Law, K.J.H., Stuart, A.M., Voss, J.: MAP estimators and their consistency in Bayesian nonparametric inverse problems. Inverse Probl. 29, 095017 (2013)

  31. Daubechies, I.: Ten Lectures on Wavelets. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61. SIAM, Philadelphia (1992)

  32. Engl, H., Hanke, M., Neubauer, A.: Regularization of Inverse Problems. Kluwer, Dordrecht/Boston (1996)

  33. Evans, L.: Partial Differential Equations. AMS, Providence (1998)

  34. Feldman, J.: Equivalence and perpendicularity of Gaussian processes. Pac. J. Math. 8, 699–708 (1958)

  35. Fernique, X.: Intégrabilité des vecteurs Gaussiens. C. R. Acad. Sci. Paris Sér. A-B 270, A1698–A1699 (1970)

  36. Franklin, J.: Well-posed stochastic extensions of ill-posed linear problems. J. Math. Anal. Appl. 31, 682–716 (1970)

  37. Gardiner, C.W.: Handbook of Stochastic Methods for Physics, Chemistry and the Natural Sciences, 2nd edn. Springer, Berlin (1985)

  38. Gibbs, A., Su, F.: On choosing and bounding probability metrics. Int. Stat. Rev. 70, 419–435 (2002)

  39. Graham, I.G., Kuo, F.Y., Nichols, J.A., Scheichl, R., Schwab, C., Sloan, I.H.: Quasi-Monte Carlo finite element methods for elliptic PDEs with log-normal random coefficients. Seminar for Applied Mathematics, ETH, SAM Report 2013-14 (2013)

  40. Hairer, M.: Introduction to Stochastic PDEs. Lecture notes, http://arxiv.org/abs/0907.4178 (2009)

  41. Hairer, M., Stuart, A.M., Voss, J.: Analysis of SPDEs arising in path sampling, part II: the nonlinear case. Ann. Appl. Probab. 17, 1657–1706 (2007)

  42. Hairer, M., Stuart, A., Voss, J.: Sampling conditioned hypoelliptic diffusions. Ann. Appl. Probab. 21(2), 669–698 (2011)

  43. Hairer, M., Stuart, A., Voss, J., Wiberg, P.: Analysis of SPDEs arising in path sampling. Part I: the Gaussian case. Comm. Math. Sci. 3, 587–603 (2005)

  44. Hairer, M., Stuart, A., Vollmer, S.: Spectral gaps for a Metropolis-Hastings algorithm in infinite dimensions. Ann. Appl. Probab. 24(6), 2455–2490 (2014)

  45. Hájek, J.: On a property of normal distribution of any stochastic process. Czechoslov. Math. J. 8(83), 610–618 (1958)

  46. Hastings, W.K.: Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57(1), 97–109 (1970)

  47. Helin, T., Burger, M.: Maximum a posteriori probability estimates in infinite-dimensional Bayesian inverse problems. Inverse Probl. 31(8), 085009 (2015)

  48. Hildebrandt, T.H.: Integration in abstract spaces. Bull. Am. Math. Soc. 59, 111–139 (1953)

  49. Kahane, J.-P.: Some Random Series of Functions. Cambridge Studies in Advanced Mathematics, vol. 5. Cambridge University Press, Cambridge (1985)

  50. Kantas, N., Beskos, A., Jasra, A.: Sequential Monte Carlo methods for high-dimensional inverse problems: a case study for the Navier-Stokes equations. arXiv preprint, arXiv:1307.6127

  51. Kirsch, A.: An Introduction to the Mathematical Theory of Inverse Problems. Springer, New York (1996)

  52. Kühn, T., Liese, F.: A short proof of the Hájek-Feldman theorem. Teor. Verojatnost. i Primenen. 23(2), 448–450 (1978)

  53. Kaipio, J., Somersalo, E.: Statistical and Computational Inverse Problems. Applied Mathematical Sciences, vol. 160. Springer, New York (2005)

  54. Kallenberg, O.: Foundations of Modern Probability, 2nd edn. Probability and its Applications. Springer, New York (2002)

  55. Knapik, B., van der Vaart, A., van Zanten, J.: Bayesian inverse problems with Gaussian priors. Ann. Stat. 39(5), 2626–2657 (2011)

  56. Knapik, B., van der Vaart, A., van Zanten, J.H.: Bayesian recovery of the initial condition for the heat equation. Commun. Stat. Theory Methods 42, 1294–1313 (2013)

  57. Kuo, F.Y., Schwab, C., Sloan, I.H.: Quasi-Monte Carlo methods for very high dimensional integration: the standard (weighted Hilbert space) setting and beyond. ANZIAM J. 53, 1–37 (2011)

  58. Kuo, F.Y., Schwab, C., Sloan, I.H.: Quasi-Monte Carlo finite element methods for a class of elliptic partial differential equations with random coefficients. SIAM J. Numer. Anal. 50(6), 3351–3374 (2012)

  59. Kuo, F.Y., Sloan, I.H.: Lifting the curse of dimensionality. Not. AMS 52(11), 1320–1328 (2005)

  60. Kuo, F.Y., Sloan, I.H., Wasilkowski, G.W., Waterhouse, B.J.: Randomly shifted lattice rules with the optimal rate of convergence for unbounded integrands. J. Complex. 26, 135–160 (2010)

  61. Lasanen, S.: Discretizations of generalized random variables with applications to inverse problems. Ann. Acad. Sci. Fenn. Math. Dissertation, University of Oulu, 130 (2002)

  62. Lasanen, S.: Measurements and infinite-dimensional statistical inverse theory. PAMM 7, 1080101–1080102 (2007)

  63. Lasanen, S.: Non-Gaussian statistical inverse problems. Part II: posterior convergence for approximated unknowns. Inverse Probl. Imaging 6(2), 267 (2012)

  64. Lasanen, S.: Non-Gaussian statistical inverse problems. Part I: posterior distributions. Inverse Probl. Imaging 6(2), 215–266 (2012)

  65. Lasanen, S.: Non-Gaussian statistical inverse problems. Part II: posterior distributions. Inverse Probl. Imaging 6(2), 267–287 (2012)

  66. Ledoux, M.: Isoperimetry and Gaussian analysis. In: Lectures on Probability Theory and Statistics (Saint-Flour, 1994). Lecture Notes in Mathematics, vol. 1648, pp. 165–294. Springer, Berlin (1996)

  67. Lifshits, M.: Gaussian Random Functions. Mathematics and its Applications, vol. 322. Kluwer, Dordrecht (1995)

  68. Lehtinen, M.S., Päivärinta, L., Somersalo, E.: Linear inverse problems for generalised random variables. Inverse Probl. 5(4), 599–612 (1989). http://stacks.iop.org/0266-5611/5/599

  69. Lassas, M., Saksman, E., Siltanen, S.: Discretization-invariant Bayesian inversion and Besov space priors. Inverse Probl. Imaging 3, 87–122 (2009)

  70. Lunardi, A.: Analytic Semigroups and Optimal Regularity in Parabolic Problems. Progress in Nonlinear Differential Equations and their Applications, vol. 16. Birkhäuser Verlag, Basel (1995)

  71. Mandelbaum, A.: Linear estimators and measurable linear transformations on a Hilbert space. Z. Wahrsch. Verw. Gebiete 65(3), 385–397 (1984). http://dx.doi.org/10.1007/BF00533743

  72. Mattingly, J., Pillai, N., Stuart, A.: Diffusion limits of the random walk Metropolis algorithm in high dimensions. Ann. Appl. Probab. 22, 881–930 (2012)

  73. Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., Teller, E.: Equation of state calculations by fast computing machines. J. Chem. Phys. 21, 1087–1092 (1953)

  74. Meyer, Y.: Wavelets and Operators. Translated from the 1990 French original by D.H. Salinger. Cambridge Studies in Advanced Mathematics, vol. 37. Cambridge University Press, Cambridge (1992)

  75. Meyn, S.P., Tweedie, R.L.: Markov Chains and Stochastic Stability. Communications and Control Engineering Series. Springer, London (1993)

  76. Neal, R.: Regression and classification using Gaussian process priors. http://www.cs.toronto.edu/~radford/valencia.abstract.html (1998)

  77. Niederreiter, H.: Random Number Generation and Quasi-Monte Carlo Methods. SIAM, Philadelphia (1994)

  78. Norris, J.: Markov Chains. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, Cambridge (1998)

  79. Øksendal, B.: Stochastic Differential Equations. An Introduction with Applications. Universitext, 6th edn. Springer, Berlin (2003)

  80. Pazy, A.: Semigroups of Linear Operators and Applications to Partial Differential Equations. Springer, New York (1983)

  81. Petra, N., Martin, J., Stadler, G., Ghattas, O.: A computational framework for infinite-dimensional Bayesian inverse problems. Part II: stochastic Newton MCMC with application to ice sheet flow inverse problems. SIAM J. Sci. Comput. 36(4), A1525–A1555 (2014)

  82. Pillai, N.S., Stuart, A.M., Thiery, A.H.: Noisy gradient flow from a random walk in Hilbert space. Stoch. PDEs: Anal. Comput. 2, 196–232 (2014)

  83. Pinski, F., Stuart, A.: Transition paths in molecules at finite temperature. J. Chem. Phys. 132, 184104 (2010)

  84. Rebeschini, P., van Handel, R.: Can local particle filters beat the curse of dimensionality? Ann. Appl. Probab. 25(5), 2809–2866 (2015)

  85. Revuz, D., Yor, M.: Continuous Martingales and Brownian Motion. Grundlehren der Mathematischen Wissenschaften, vol. 293, 2nd edn. Springer, Berlin (1994)

  86. Richter, G.: An inverse problem for the steady state diffusion equation. SIAM J. Appl. Math. 41(2), 210–221 (1981)

  87. Robinson, J.C.: Infinite-Dimensional Dynamical Systems. Cambridge Texts in Applied Mathematics. Cambridge University Press, Cambridge (2001)

  88. Rudin, W.: Real and Complex Analysis, 3rd edn. McGraw-Hill, New York (1987)

  89. Schillings, C., Schwab, C.: Sparse, adaptive Smolyak quadratures for Bayesian inverse problems. Inverse Probl. 29, 065011 (2013)

  90. Schwab, C., Stuart, A.: Sparse deterministic approximation of Bayesian inverse problems. Inverse Probl. 28, 045003 (2012)

  91. Strauss, W.A.: Partial Differential Equations. An Introduction, 2nd edn. Wiley, Chichester (2008)

  92. Stuart, A.M.: Inverse problems: a Bayesian perspective. Acta Numer. 19, 451–559 (2010)

  93. Stuart, A.M.: Uncertainty quantification in Bayesian inversion. ICM2014. Invited Lecture (2014)

  94. Tierney, L.: A note on Metropolis-Hastings kernels for general state spaces. Ann. Appl. Probab. 8(1), 1–9 (1998)

  95. Triebel, H.: Theory of Function Spaces. Mathematik und ihre Anwendungen in Physik und Technik, vol. 38. Akademische Verlagsgesellschaft Geest & Portig K.-G., Leipzig (1983)

  96. Triebel, H.: Theory of Function Spaces. II. Monographs in Mathematics, vol. 84. Birkhäuser Verlag, Basel (1992)

  97. Triebel, H.: Theory of Function Spaces. III. Monographs in Mathematics, vol. 100. Birkhäuser Verlag, Basel (2006)

  98. Villani, C.: Topics in Optimal Transportation. Graduate Studies in Mathematics, vol. 58. American Mathematical Society, Providence (2003)

  99. Vollmer, S.: Posterior consistency for Bayesian inverse problems through stability and regression results. Inverse Probl. 29, 125011 (2013)

  100. Yosida, K.: Functional Analysis. Classics in Mathematics. Springer, Berlin (1995). Reprint of the sixth (1980) edition

  101. Yin, G., Zhang, Q.: Continuous-Time Markov Chains and Applications. Applications of Mathematics (New York), vol. 37. Springer, New York (1998)


Acknowledgements

The authors are indebted to Martin Hairer for help in the development of these notes, and in particular for considerable help in structuring the Appendix, for the proof of Theorem 28 (which is a slight generalization to Hilbert scales of Theorem 6.16 in [40]) and for the proof of Corollary 5 (which is a generalization of Corollary 3.22 in [40] to the non-Gaussian setting and to Hölder, rather than Lipschitz, functions \(\{\psi_{k}\}\)). They are also grateful to Joris Bierkens, Patrick Conrad, Matthew Dunlop, Shiwei Lan, Yulong Lu, Daniel Sanz-Alonso, Claudia Schillings and Aretha Teckentrup for careful proof-reading of the notes and related comments. AMS is grateful to the various hosts who gave him the opportunity to teach this material in short course form at TIFR-Bangalore (Amit Apte), Göttingen (Axel Munk), PKU-Beijing (Tiejun Li), ETH-Zurich (Christoph Schwab) and Cambridge CCA (Arieh Iserles), a process which led to refinements of the material; the authors are also grateful to the students on those courses, who provided useful feedback. The authors would also like to thank Sergios Agapiou and Yuan-Xiang Zhang for help in the preparation of these lecture notes, including type-setting, proof-reading, providing the proof of Lemma 3 and delivering problems classes related to the short courses. AMS is also pleased to acknowledge the financial support of EPSRC, ERC and ONR over the last decade, during which the research that underpins this work was developed.

Author information

Corresponding author

Correspondence to Masoumeh Dashti.


A Appendix

A.1 Function Spaces

In this subsection we briefly define the Hilbert and Banach spaces that will be important in our developments of probability and integration in infinite-dimensional spaces. We pay particular attention to the issue of separability (the existence of a countable dense subset), which we require in that context. We primarily restrict our discussion to \(\mathbb{R}\)- or \(\mathbb{C}\)-valued functions, but the reader will easily be able to extend to the \(\mathbb{R}^{n}\)-valued or \(\mathbb{R}^{n\times n}\)-valued situations; we discuss Banach space-valued functions at the end of the subsection.

A.1.1 \(\ell^{p}\) and \(L^{p}\) Spaces

Consider real-valued sequences \(u =\{u_{j}\}_{j=1}^{\infty}\in \mathbb{R}^{\infty}\). Let \(w \in \mathbb{R}^{\infty}\) denote a positive sequence, so that \(w_{j} > 0\) for each \(j \in \mathbb{N}\). For every \(p \in [1,\infty)\), we define

$$\displaystyle{\ell_{w}^{p} =\ell_{ w}^{p}(\mathbb{N}; \mathbb{R}) ={\big\{ u \in \mathbb{R}^{\infty }\big\vert \sum _{ j=1}^{\infty }w_{ j}\vert u_{j}\vert ^{p} < \infty \big\}}.}$$

Then \(\ell_{w}^{p}\) is a Banach space when equipped with the norm

$$\displaystyle{\|u\|_{\ell_{w}^{p}} ={\big(\sum _{ j=1}^{\infty }w_{ j}\vert u_{j}\vert ^{p}\big)}^{\frac{1} {p} }.}$$

In the case p = 2, the resulting spaces are Hilbert spaces when equipped with the inner product

$$\displaystyle{\langle u,v\rangle =\sum _{ j=1}^{\infty }w_{ j}u_{j}v_{j}.}$$

These \(\ell_{w}^{p}\) spaces, with \(p \in [1,\infty)\), are separable. Throughout we simply write \(\ell^{p}\) for the spaces \(\ell_{w}^{p}\) with \(w_{j} \equiv 1\). In this unweighted case, we extend the definition to \(p = \infty\) by defining

$$\displaystyle{\ell^{\infty } =\ell ^{\infty }(\mathbb{N}; \mathbb{R}) ={\big\{ u \in \mathbb{R}^{\infty }\big\vert \mathrm{sup}_{ j\in \mathbb{N}}(\vert u_{j}\vert ) < \infty \big\}}}$$

and

$$\displaystyle{\|u\|_{\ell^{\infty }} =\mathrm{ sup}_{j\in \mathbb{N}}(\vert u_{j}\vert ).}$$

The space \(\ell^{\infty}\) of bounded sequences is not separable. Each element \(u_{j}\) of the sequence is real valued, but the definitions may be readily extended to complex-valued, \(\mathbb{R}^{n}\)-valued, and \(\mathbb{R}^{n\times n}\)-valued sequences, replacing \(\vert \cdot \vert\) by the complex modulus, the vector \(\ell^{p}\) norm, and the operator \(\ell^{p}\) norm on matrices, respectively.
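As a sanity check on these definitions, the following sketch (our own numerical illustration, not part of the notes) computes a truncation of the weighted \(\ell_{w}^{p}\) norm; the choices \(u_{j} = 1/j\) and \(w_{j} \equiv 1\) are hypothetical, picked because \(\|u\|_{\ell^{2}} = \pi/\sqrt{6}\) is known in closed form.

```python
import numpy as np

def lp_w_norm(u, w, p):
    """Weighted l^p norm (sum_j w_j |u_j|^p)^(1/p) of a finite truncation."""
    return (np.sum(w * np.abs(u) ** p)) ** (1.0 / p)

j = np.arange(1, 100001, dtype=float)
u = 1.0 / j                    # u lies in l^2 since sum_j 1/j^2 < infinity
w = np.ones_like(j)            # w_j = 1 recovers the unweighted space l^p

norm2 = lp_w_norm(u, w, 2)     # truncation of ||u||_{l^2} = pi/sqrt(6)
sup_norm = np.max(np.abs(u))   # truncation of the l^infty norm sup_j |u_j|
```

The truncation error here is of order \(1/N\) in \(\|u\|_{\ell^{2}}^{2}\) for \(N\) retained terms, so the computed value agrees with \(\pi/\sqrt{6} \approx 1.2825\) to several digits.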

We now extend the idea of \(p\)-summability to functions and to \(p\)-integrability. Let \(D\) be a bounded open set in \(\mathbb{R}^{d}\) with Lipschitz boundary and define the space \(L^{p} = L^{p}(D; \mathbb{R})\) of Lebesgue measurable functions \(f : D \rightarrow \mathbb{R}\) with norm \(\|\cdot \|_{L^{p}(D)}\) defined by

$$\displaystyle\begin{array}{rcl} \|f\|_{L^{p}(D)} := \left \{\begin{array}{cc} \left (\int _{D}\vert f\vert ^{p}\,dx\right )^{\frac{1} {p} }&\mbox{ for }\,1 \leq p < \infty \\ \mbox{ ess}\sup \nolimits_{D}\vert f\vert & \mbox{ for }\,p = \infty. \end{array} \right.& & {}\\ \end{array}$$

In the above definition we have used the notation

$$\displaystyle{\mbox{ ess}\sup _{D}\vert f\vert =\inf \left \{C : \vert f\vert \leq C\mbox{ a.e. on }D\right \}.}$$

Here a.e. is with respect to Lebesgue measure, and the integral is, of course, the Lebesgue integral. Sometimes we drop explicit reference to the set \(D\) in the norm and simply write \(\|\cdot \|_{L^{p}}\). For Lebesgue measurable functions \(f : D \rightarrow \mathbb{R}^{n}\), the norm is readily extended by replacing \(\vert f\vert\) under the integral by the vector \(p\)-norm on \(\mathbb{R}^{n}\). Likewise we may consider Lebesgue measurable \(f : D \rightarrow \mathbb{R}^{n\times n}\), using the operator \(p\)-norm on \(\mathbb{R}^{n\times n}\). In all these cases, we write \(L^{p}(D)\) as shorthand for \(L^{p}(D; X)\) where \(X = \mathbb{R},\, \mathbb{R}^{n}\) or \(\mathbb{R}^{n\times n}\). Then \(L^{p}(D)\) is the vector space of all (equivalence classes of) measurable functions \(f : D \rightarrow X\) for which \(\|f\|_{L^{p}(D)} < \infty\). The space \(L^{p}(D)\) is separable for \(p \in [1,\infty)\), while \(L^{\infty}(D)\) is not separable. We define periodic versions of \(L^{p}(D)\), denoted by \(L_{\mathrm{per}}^{p}(D)\), in the case where \(D\) is the unit cube; these spaces are defined as the completion of the \(C^{\infty}\) periodic functions on the unit cube with respect to the \(L^{p}\)-norm. If we define \(\mathbb{T}^{d}\) to be the \(d\)-dimensional unit torus, then we write \(L_{\mathrm{per}}^{p}([0,1]^{d}) = L^{p}(\mathbb{T}^{d})\). Again these spaces are separable for \(1 \leq p < \infty\), but not for \(p = \infty\).
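For intuition, the \(L^{p}\) norm above can be approximated by quadrature. The sketch below (ours, with the hypothetical choice \(f(x) = x\) on \(D = (0,1)\), for which \(\|f\|_{L^{p}(D)}^{p} = 1/(p+1)\) and \(\mathrm{ess\,sup}_{D}\vert f\vert = 1\)) uses a midpoint rule.

```python
import numpy as np

def Lp_norm(f, p, n=200000):
    """Midpoint-rule approximation of the L^p norm of f on D = (0,1)."""
    x = (np.arange(n) + 0.5) / n          # midpoints of a uniform grid
    return (np.mean(np.abs(f(x)) ** p)) ** (1.0 / p)

f = lambda x: x
norm2 = Lp_norm(f, 2)                     # approximates 1/sqrt(3)
ess_sup = 1.0                             # ess sup of |x| on (0,1)
```

Note that the essential supremum ignores sets of measure zero, so redefining \(f\) at finitely many points changes neither quantity.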

A.1.2 Continuous and Hölder Continuous Functions

Let D be an open and bounded set in \(\mathbb{R}^{d}\) with Lipschitz boundary. We will denote by \(C(\overline{D}, \mathbb{R})\), or simply \(C(\overline{D})\), the space of continuous functions \(f : \overline{D} \rightarrow \mathbb{R}\). When equipped with the supremum norm,

$$\displaystyle{\|f\|_{C(\overline{D})} =\sup _{x\in \overline{D}}\vert f(x)\vert,}$$

\(C(\overline{D})\) is a Banach space. Building on this we define, for any exponent \(\gamma \in (0,1]\), the space \(C^{0,\gamma }(\overline{D})\) of functions in \(C(\overline{D})\) which are Hölder continuous with exponent \(\gamma\), with norm

$$\displaystyle{ \|f\|_{C^{0,\gamma }(\overline{D})} =\sup _{x\in \overline{D}}\vert f(x)\vert +\sup _{x,y\in \overline{D}}{\big(\frac{\vert f(x) - f(y)\vert } {\vert x - y\vert ^{\gamma }} \big)}. }$$
(99)

The case γ = 1 corresponds to Lipschitz functions.

We remark that \(C(\overline{D})\) is separable since \(\overline{D} \subset \mathbb{R}^{d}\) is compact here. The space of Hölder functions \(C^{0,\gamma }(\overline{D}; \mathbb{R})\) is, however, not separable. Separability can be recovered by working in the subset of \(C^{0,\gamma }(\overline{D}; \mathbb{R})\) where, in addition to (99) being finite,

$$\displaystyle{\lim _{y\rightarrow x}\frac{\vert f(x) - f(y)\vert } {\vert x - y\vert ^{\gamma }} = 0,}$$

uniformly in x; we denote the resulting separable space by \(C_{0}^{0,\gamma }(\overline{D}, \mathbb{R}).\) This is analogous to the fact that the space of bounded measurable functions is not separable, while the space of continuous functions on a compact domain is. Furthermore it may be shown that \(C^{0,\gamma '} \subset C_{0}^{0,\gamma }\) for every γ′ > γ. All of the preceding spaces can be generalized to functions \(C^{0,\gamma }(\overline{D}, \mathbb{R}^{n})\) and \(C_{0}^{0,\gamma }(\overline{D}, \mathbb{R}^{n});\) they may also be extended to periodic functions on the unit torus \(\mathbb{T}^{d}\) found by identifying opposite faces of the unit cube [0, 1]d. The same separability issues arise for these generalizations.
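The Hölder seminorm in (99) can be estimated numerically on a grid. The following sketch is our own illustration (not from the notes): for the hypothetical choice \(f(x) = \sqrt{x}\) on \([0,1]\) with \(\gamma = 1/2\), the exact seminorm is \(1\), attained against the point \(y = 0\).

```python
import numpy as np

def holder_seminorm(f, gamma, n=200):
    """Grid estimate of sup_{x != y} |f(x)-f(y)| / |x-y|^gamma on [0,1]."""
    x = np.linspace(0.0, 1.0, n)
    fx = f(x)
    X, Y = np.meshgrid(x, x)        # all pairs (x_i, x_j)
    FX, FY = np.meshgrid(fx, fx)
    mask = X != Y                   # exclude the diagonal x = y
    return np.max(np.abs(FX - FY)[mask] / np.abs(X - Y)[mask] ** gamma)

est = holder_seminorm(np.sqrt, 0.5)   # equals 1 on any grid containing 0
```

Since \(\vert\sqrt{x}-\sqrt{y}\vert \leq \vert x-y\vert^{1/2}\) with equality at \(y = 0\), the grid estimate recovers the supremum exactly here; in general a grid estimate is only a lower bound for the seminorm.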

A.1.3 Sobolev Spaces

We define Sobolev spaces of functions with an integer number of derivatives, extend to fractional and negative derivatives, and make the connection with Hilbert scales. Here \(D\) is a bounded open set in \(\mathbb{R}^{d}\) with Lipschitz boundary. In the context of a function \(u \in L^{2}(D)\), we will use the notation \(\frac{\partial u} {\partial x_{i}}\) to denote the weak derivative with respect to \(x_{i}\) and the notation \(\nabla u\) for the weak gradient.

The Sobolev space \(W^{r,p}(D)\) consists of all \(L^{p}\)-integrable functions \(u : D \rightarrow \mathbb{R}\) whose \(\alpha\)th order weak derivatives exist and are \(L^{p}\)-integrable for all \(\vert \alpha \vert \leq r\):

$$\displaystyle{ W^{r,p}(D) = \left \{u\big\vert D^{\alpha }u \in L^{p}(D)\mbox{ for }\vert \alpha \vert \leq r\right \} }$$
(100)

with norm

$$\displaystyle{ \|u\|_{W^{r,p}(D)} = \left \{\begin{array}{ll} \left (\sum \nolimits_{\vert \alpha \vert \leq r}\|D^{\alpha }u\|_{L^{p}(D)}^{p}\right )^{\frac{1} {p} }&\mbox{ for }\;1 \leq p < \infty, \\ \sum \nolimits_{\vert \alpha \vert \leq r}\|D^{\alpha }u\|_{L^{\infty }(D)} &\mbox{ for }\;p = \infty. \end{array} \right. }$$
(101)

We denote \(W^{r,2}(D)\) by \(H^{r}(D)\). We define periodic versions of \(H^{s}(D)\), denoted by \(H_{\mathrm{per}}^{s}(D)\), in the case where \(D\) is the unit cube \([0,1]^{d}\); these spaces are defined as the completion of the \(C^{\infty}\) periodic functions on the unit cube with respect to the \(H^{s}\)-norm. If we define \(\mathbb{T}^{d}\) to be the \(d\)-dimensional unit torus, we then write \(H^{s}(\mathbb{T}^{d}) = H_{\mathrm{per}}^{s}([0,1]^{d})\).

The spaces \(H^{s}(D)\), with \(D\) a bounded open set in \(\mathbb{R}^{d}\), and \(H_{\mathrm{per}}^{s}([0,1]^{d})\) are separable Hilbert spaces. In particular, if we define the inner product \((\cdot,\cdot )_{L^{2}(D)}\) on \(L^{2}(D)\) by

$$\displaystyle{(u,v)_{L^{2}(D)} :=\int _{D}u(x)v(x)dx}$$

and define the resulting norm \(\|\cdot \|_{L^{2}(D)}\) by the identity

$$\displaystyle{\|u\|_{L^{2}(D)}^{2} = (u,u)_{ L^{2}(D)}}$$

then the space \(H^{1}(D)\) is a separable Hilbert space with inner product

$$\displaystyle{ \langle u,v\rangle _{H^{1}(D)} = (u,v)_{L^{2}(D)} + (\nabla u,\nabla v)_{L^{2}(D)} }$$

and norm (101) with \(p = 2\). Likewise the space \(H_{0}^{1}(D)\) is a separable Hilbert space with inner product

$$\displaystyle{ \langle u,v\rangle _{H_{0}^{1}(D)} = (\nabla u,\nabla v)_{L^{2}(D)} }$$

and norm

$$\displaystyle{ \|u\|_{H_{0}^{1}(D)} =\| \nabla u\|_{L^{2}(D)}. }$$
(102)
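As a concrete check of (102), the sketch below (our own illustration, not from the notes) approximates the \(H_{0}^{1}\) norm by finite differences for the hypothetical choice \(u(x) = \sin(\pi x)\) on \(D = (0,1)\); exactly, \(\|u\|_{H_{0}^{1}}^{2} = \int_{0}^{1}\pi^{2}\cos^{2}(\pi x)\,dx = \pi^{2}/2\).

```python
import numpy as np

n = 100000
x = np.linspace(0.0, 1.0, n + 1)
u = np.sin(np.pi * x)              # u vanishes on the boundary: u in H^1_0
du = np.diff(u) / np.diff(x)       # difference quotients approximate u'
h = 1.0 / n
h1_norm = np.sqrt(np.sum(du ** 2) * h)   # approximates pi / sqrt(2)
```

The difference quotient on each cell equals \(u'\) at the cell midpoint up to \(O(h^{2})\), so the sum above is effectively a midpoint rule for \(\int_{0}^{1}\vert u'\vert^{2}dx\).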

As defined above, Sobolev spaces concern integer numbers of derivatives. However, the concept can be extended to fractional derivatives, and there is then a natural connection to Hilbert scales of functions. To explain this, we start our development in the periodic setting. Recall that, given an element \(u\) in \(L^{2}(\mathbb{T}^{d})\), we can decompose it as a Fourier series:

$$\displaystyle{ u(x) =\sum _{k\in \mathbb{Z}^{d}}u_{k}e^{2\pi i\langle k,x\rangle }\;, }$$

where the identity holds for (Lebesgue) almost every \(x \in \mathbb{T}^{d}\). Furthermore, the L 2 norm of u is given by Parseval’s identity \(\|u\|_{L^{2}}^{2} =\sum \vert u_{k}\vert ^{2}\). The fractional Sobolev space \(H^{s}(\mathbb{T}^{d})\) for s ≥ 0 is given by the subspace of functions \(u \in L^{2}(\mathbb{T}^{d})\) such that

$$\displaystyle{ \|u\|_{H^{s}}^{2} :=\sum _{ k\in \mathbb{Z}^{d}}(1 + 4\pi ^{2}\vert k\vert ^{2})^{s}\vert u_{ k}\vert ^{2} < \infty \;. }$$
(103)

Note that this is a separable Hilbert space by virtue of \(\ell_{w}^{2}\) being separable. Note also that \(H^{0}(\mathbb{T}^{d}) = L^{2}(\mathbb{T}^{d})\) and that, for positive integer s, the definition agrees with the definition \(H^{s}(\mathbb{T}^{d}) = W^{s,2}(\mathbb{T}^{d})\) obtained from (100) with the obvious generalization from D to \(\mathbb{T}^{d}\). For s < 0, we define \(H^{s}(\mathbb{T}^{d})\) as the closure of L 2 under the norm (103). The spaces \(H^{s}(\mathbb{T}^{d})\) for s < 0 may also be defined via duality. The resulting spaces H s are separable for all \(s \in \mathbb{R}\).
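As a concrete numerical illustration of the norm (103) in dimension d = 1, one can approximate the Fourier coefficients of a sampled periodic function with the FFT and form the weighted sum directly. This is a sketch only; NumPy and the test function sin(2πx) are assumptions of the example, not part of the text:

```python
import numpy as np

def hs_norm_periodic(u_vals, s):
    """Approximate the H^s norm (103) of a periodic function on [0,1]
    from N equispaced samples, via the discrete Fourier transform.
    Illustrative sketch; assumes u is smooth enough that the truncated
    Fourier sum is accurate."""
    N = len(u_vals)
    # np.fft.fft is unnormalized; divide by N so that u_k matches the
    # coefficients in u(x) = sum_k u_k e^{2 pi i k x}.
    u_k = np.fft.fft(u_vals) / N
    k = np.fft.fftfreq(N, d=1.0 / N)            # integer wavenumbers
    weights = (1.0 + 4.0 * np.pi**2 * k**2) ** s
    return np.sqrt(np.sum(weights * np.abs(u_k) ** 2))

x = np.linspace(0.0, 1.0, 256, endpoint=False)
u = np.sin(2 * np.pi * x)
# For u = sin(2 pi x) the only nonzero coefficients are u_{1}, u_{-1},
# each of modulus 1/2, so the s = 0 norm is sqrt(1/2).
print(hs_norm_periodic(u, 0.0))   # approx 0.7071
print(hs_norm_periodic(u, 1.0))   # larger: derivatives are penalized
```

Note that for s = 0 the computation reduces to Parseval's identity for the L² norm.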

We now link the spaces \(H^{s}(\mathbb{T}^{d})\) to a specific Hilbert scale of spaces. Hilbert scales are families of spaces defined by \(\mathcal{D}(A^{s/2})\) for A a positive, unbounded, self-adjoint operator on a Hilbert space. To view the fractional Sobolev spaces from this perspective, let \(A = I -\bigtriangleup \) with domain \(H^{2}(\mathbb{T}^{d})\), noting that the eigenvalues of A are simply 1 + 4π 2 | k | 2 for \(k \in \mathbb{Z}^{d}\). We thus see that, by the spectral decomposition theorem, \(H^{s} = \mathcal{D}(A^{s/2})\), and we have \(\|u\|_{H^{s}} =\| A^{s/2}u\|_{L^{2}}\). Note that we may work in the space of real-valued functions where the eigenfunctions of A, \(\{\varphi _{j}\}_{j=1}^{\infty }\), comprise sine and cosine functions; the eigenvalues of A, when ordered on a one-dimensional lattice, then satisfy \(\alpha _{j} \asymp j^{2/d}\). This is relevant to the more general perspective of Hilbert scales that we now introduce.

We can now generalize the previous construction of fractional Sobolev spaces to more general domains than the torus. The resulting spaces do not, in general, coincide with Sobolev spaces, because of the effect of the boundary conditions of the operator A used in the construction. On an arbitrary bounded open set \(D \subset \mathbb{R}^{d}\) with Lipschitz boundary, we consider a positive self-adjoint operator A satisfying Assumption 1 so that its eigenvalues satisfy \(\alpha _{j} \asymp j^{2/d}\); then we define the spaces \(\mathcal{H}^{s} = \mathcal{D}(A^{s/2})\) for s > 0. Given a Hilbert space \((H,\langle \cdot,\cdot \rangle,\|\cdot \|)\) of real-valued functions on a bounded open set D in \(\mathbb{R}^{d}\), we recall from Assumption 1 the orthonormal basis for H denoted by \(\{\varphi _{j}\}_{j=1}^{\infty }\). Any u ∈ H can be written as

$$\displaystyle{u =\sum _{ j=1}^{\infty }\langle u,\varphi _{ j}\rangle \varphi _{j}.}$$

Thus

$$\displaystyle{ \mathcal{H}^{s} =\big\{ u : D \rightarrow \mathbb{R}\big\vert \|u\|_{ \mathcal{H}^{s}}^{2} < \infty \big\} }$$
(104)

where, for \(u_{j} =\langle u,\varphi _{j}\rangle\),

$$\displaystyle{\|u\|_{\mathcal{H}^{s}}^{2} =\sum _{ j=1}^{\infty }j^{\frac{2s} {d} }\vert u_{j}\vert ^{2}.}$$

In fact \(\mathcal{H}^{s}\) is a Hilbert space: for \(v_{j} =\langle v,\varphi _{j}\rangle\) we may define the inner product

$$\displaystyle{\langle u,v\rangle _{\mathcal{H}^{s}} =\sum _{ j=1}^{\infty }j^{\frac{2s} {d} }u_{j}v_{j}.}$$

For any s > 0, the Hilbert space \((\mathcal{H}^{s},\langle \cdot,\cdot \rangle _{\mathcal{H}^{s}},\|\cdot \|_{\mathcal{H}^{s}})\) is a subset of the original Hilbert space H; for s < 0 the spaces are defined by duality and are supersets of H. Note also that we have Parseval-like identities showing that the \(\mathcal{H}^{s}\) norm on a function u is equivalent to the \(\ell_{w}^{2}\) norm on the sequence \(\{u_{j}\}_{j=1}^{\infty }\) with the choice \(w_{j} = j^{2s/d}\). The spaces \(\mathcal{H}^{s}\) are separable Hilbert spaces for any \(s \in \mathbb{R}.\)

1.1.4 A.1.4 Other Useful Function Spaces

As mentioned in passing, all of the preceding function spaces can be extended to functions taking values in \(\mathbb{R}^{n}\) or \(\mathbb{R}^{n\times n}\); thus, we may then write \(C(D; \mathbb{R}^{n})\), \(L^{p}(D; \mathbb{R}^{n})\), and \(H^{s}(D; \mathbb{R}^{n})\), for example. More generally we may wish to consider functions taking values in a separable Banach space E. For example, when we are interested in solutions of time-dependent PDEs, these may be formulated as ordinary differential equations taking values in a separable Banach space E, with norm \(\|\cdot \|_{E}\). It is then natural to consider Banach spaces such as \(L^{2}((0,T);E)\) and \(C([0,T];E)\) with norms

$$\displaystyle{\|u\|_{L^{2}((0,T);E)} = \sqrt{{\big(\int _{0 }^{T }\|u(\cdot, t)\|_{E }^{2 }dt\big)}},\quad \|u\|_{C([0,T];E)} =\sup _{t\in [0,T]}\|u(\cdot,t)\|_{E}.}$$

These norms can be generalized in a variety of ways, by generalizing the norm on the time variable.

The preceding idea of Banach space-valued L p spaces defined on an interval (0, T) can be taken further to define Banach space-valued L p spaces on a general measure space. Let \((\mathcal{M},\nu )\) be any countably generated measure space, for example, a Polish space (a separable completely metrizable topological space) equipped with a positive Radon measure ν. Again let E denote a separable Banach space. Then \(L_{\nu }^{p}(\mathcal{M};E)\) is the space of functions \(u : \mathcal{M}\rightarrow E\) with norm (in this definition of the norm we use Bochner integration, defined in the next subsection)

$$\displaystyle{\|u\|_{L_{\nu }^{p}(\mathcal{M};E)} ={\big(\int _{\mathcal{M}}\|u(x)\|_{E}^{p}\nu (dx)\big)}^{\frac{1} {p} }.}$$

For \(p \in (1,\infty )\) these spaces are separable. However, separability fails to hold for \(p = \infty.\) We will use these Banach spaces in the case where ν is a probability measure \(\mathbb{P}\), with corresponding expectation \(\mathbb{E}\), and we then have

$$\displaystyle{\|u\|_{L_{\mathbb{P}}^{p}(\mathcal{M};E)} ={\big( \mathbb{E}{\big(\|u\|_{E}^{p}\big)}\big)}^{\frac{1} {p} }.}$$
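A minimal sketch of how the norm \(\|u\|_{L_{\mathbb{P}}^{p}(\mathcal{M};E)}\) above can be estimated by Monte Carlo; Python/NumPy and the illustrative choice of a standard Gaussian random vector in E = R³ are assumptions of the example:

```python
import numpy as np

rng = np.random.default_rng(0)

def lp_norm_mc(sample_u, norm_E, p, n_samples=100_000):
    """Monte Carlo estimate of ||u||_{L^p_P(M;E)} = (E ||u||_E^p)^{1/p}.
    `sample_u` draws one realization of the E-valued random variable u;
    `norm_E` is the norm on E.  (Illustrative sketch.)"""
    vals = np.array([norm_E(sample_u()) for _ in range(n_samples)])
    return np.mean(vals ** p) ** (1.0 / p)

# Example: u a standard Gaussian vector in R^3 with Euclidean norm;
# then E ||u||^2 = 3, so the L^2_P norm is sqrt(3).
est = lp_norm_mc(lambda: rng.standard_normal(3), np.linalg.norm, p=2)
print(est)  # close to sqrt(3) ~ 1.732
```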

1.1.5 A.1.5 Interpolation Inequalities and Sobolev Embeddings

Here we state some useful interpolation inequalities and use them to prove a Sobolev embedding result, all in the context of fractional Sobolev spaces, in the generalized sense defined through a Hilbert scale of functions.

Let \(p,q \in [1,\infty ]\) be a pair of conjugate exponents so that \(p^{-1} + q^{-1} = 1\). Then for any positive real a, b, we have the Young inequality

$$\displaystyle{ ab \leq \frac{a^{p}} {p} + \frac{b^{q}} {q} \;. }$$

As a corollary of this elementary bound, we obtain the following Hölder inequality. Let \((\mathcal{M},\nu )\) be a measure space and denote the norm \(\|\cdot \|_{L_{\nu }^{p}(\mathcal{M};\mathbb{R})}\) by \(\|\cdot \|_{p}.\) For \(p,q \in [1,\infty ]\) as above and \(u,v: \mathcal{M}\rightarrow \mathbb{R}\) a pair of measurable functions, we have

$$\displaystyle{ \int _{\mathcal{M}}\vert u(x)v(x)\vert \,\mu (dx) \leq \| u\|_{p}\,\|v\|_{q}. }$$
(105)

From this Hölder-like inequality, the following interpolation bound results: let α ∈ [0, 1] and let L denote a (possibly unbounded) self-adjoint operator on the Hilbert space \((H,\langle \cdot,\cdot \rangle,\|\cdot \|)\). Then, the bound

$$\displaystyle{ \|L^{\alpha }u\| \leq \| Lu\|^{\alpha }\|u\|^{1-\alpha } }$$
(106)

holds for every \(u \in \mathcal{D}(L) \subset H.\)

Now assume that A is a self-adjoint unbounded operator on L 2(D) with \(D \subset \mathbb{R}^{d}\) a bounded open set with Lipschitz boundary. Assume further that A has eigenvalues \(\alpha _{j} \asymp j^{\frac{2} {d} }\) and define the Hilbert scale of spaces \(\mathcal{H}^{t} = \mathcal{D}(A^{\frac{t} {2} })\). An immediate corollary of the bound (106), obtained by choosing \(H = \mathcal{H}^{s}\), \(L = A^{\frac{t-s} {2} }\), and \(\alpha = (r - s)/(t - s)\), is:

Lemma 15.

Let Assumption  1 hold. Then for any t > s, any r ∈ [s,t] and any \(u \in \mathcal{H}^{t}\) , it follows that

$$\displaystyle{ \|u\|_{\mathcal{H}^{r}}^{t-s} \leq \| u\|_{ \mathcal{H}^{t}}^{r-s}\|u\|_{ \mathcal{H}^{s}}^{t-r}. }$$
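The interpolation inequality of Lemma 15 can be checked numerically on truncated coefficient sequences, using the Hilbert-scale norm (104) with d = 1. A sketch; the algebraic decay of the coefficients (chosen so that u ∈ H^t) is an assumption of the example:

```python
import numpy as np

rng = np.random.default_rng(1)

def hs_norm(coeffs, s, d=1):
    """||u||_{H^s}^2 = sum_j j^{2s/d} |u_j|^2 for coefficients u_j in
    the eigenbasis of A (the Hilbert-scale norm, as in (104))."""
    j = np.arange(1, len(coeffs) + 1)
    return np.sqrt(np.sum(j ** (2.0 * s / d) * coeffs ** 2))

# Random truncated coefficient sequence, decaying fast enough for H^t.
t, s = 2.0, 0.0
coeffs = rng.standard_normal(200) / np.arange(1, 201) ** 3.0
for r in [0.5, 1.0, 1.5]:
    lhs = hs_norm(coeffs, r) ** (t - s)
    rhs = hs_norm(coeffs, t) ** (r - s) * hs_norm(coeffs, s) ** (t - r)
    assert lhs <= rhs * (1 + 1e-12)       # Lemma 15
print("interpolation inequality verified")
```

The inequality is exact (it is an instance of Hölder's inequality on the coefficient sequence), so the assertion holds up to rounding.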

It is of interest to bound the L p norm of a function in terms of one of the fractional Sobolev norms, or more generally in terms of norms from a Hilbert scale. To do this we need to make assumptions not only on the eigenvalues of the operator A which defines the Hilbert scale, but also on the behavior of the corresponding orthonormal basis of eigenfunctions in \(L^{\infty }\). To this end we let Assumption 2 hold. It then turns out that bounding the \(L^{\infty }\) norm is rather straightforward, and we start with this case.

Lemma 16.

Let Assumption  2 hold and define the resulting Hilbert scale of spaces \(\mathcal{H}^{s}\) by (104) . Then for every \(s > \frac{d} {2}\) , the space \(\mathcal{H}^{s}\) is contained in the space \(L^{\infty }(D)\) and there exists a constant K 1 such that \(\|u\|_{L^{\infty }} \leq K_{1}\|u\|_{\mathcal{H}^{s}}\).

Proof.

It follows from Cauchy-Schwarz that

$$\displaystyle{ \frac{1} {C}\|u\|_{L^{\infty }} \leq \sum _{k\in \mathbb{Z}^{d}}\vert u_{k}\vert \leq {\big(\sum _{k\in \mathbb{Z}^{d}}(1 + \vert k\vert ^{2})^{s}\vert u_{ k}\vert ^{2}\big)}^{1/2}{\big(\sum _{ k\in \mathbb{Z}^{d}}(1 + \vert k\vert ^{2})^{-s}\big)}^{1/2}\;. }$$

Since the sum in the second factor converges if and only if \(s > \frac{d} {2}\), the claim follows.

As a consequence of Lemma 16, we are able to obtain a more general Sobolev embedding for all L p spaces:

Theorem 28 (Sobolev Embeddings).

Let Assumption  2 hold, define the resulting Hilbert scale of spaces \(\mathcal{H}^{s}\) by (104) and assume that \(p \in [2,\infty ]\) . Then, for every \(s > \frac{d} {2} -\frac{d} {p}\) , the space \(\mathcal{H}^{s}\) is contained in the space L p (D), and there exists a constant K 2 such that \(\|u\|_{L^{p}} \leq K_{2}\|u\|_{\mathcal{H}^{s}}\).

Proof.

The case p = 2 is obvious and the case \(p = \infty \) has already been shown, so it remains to show the claim for \(p \in (2,\infty )\). The idea is to divide the space of eigenfunctions into “blocks” and to estimate separately the L p norm of every block. More precisely, we define a sequence of functions u (n) by

$$\displaystyle{ u^{(-1)} = u_{ 0}\,\varphi _{0}\;,\quad u^{(n)} =\sum _{ 2^{n}\leq j<2^{n+1}}u_{j}\,\varphi _{j}\;, }$$

where the \(\varphi _{j}\) are an orthonormal basis of eigenfunctions for A, so that \(u =\ \sum \nolimits_{n\geq -1}u^{(n)}\). For n ≥ 0 the Hölder inequality gives

$$\displaystyle{ \|u^{(n)}\|_{ L^{p}}^{p} \leq \| u^{(n)}\|_{ L^{2}}^{2}\|u^{(n)}\|_{ L^{\infty }}^{p-2}\;. }$$
(107)

Now set \(s' = \frac{d} {2}+\epsilon\) for some ε > 0 and note that the construction of u (n), together with Lemma 16, gives the bounds

$$\displaystyle{ \|u^{(n)}\|_{ L^{2}} \leq K2^{-ns/d}\|u^{(n)}\|_{ \mathcal{H}^{s}}\;,\quad \|u^{(n)}\|_{ L^{\infty }} \leq K_{1}\|u^{(n)}\|_{ \mathcal{H}^{s'}} \leq K2^{n(s'-s)/d}\|u^{(n)}\|_{ \mathcal{H}^{s}}\;. }$$
(108)

Inserting this into (107), we obtain (possibly for an enlarged K)

$$\displaystyle\begin{array}{rcl} \|u^{(n)}\|_{ L^{p}}& \leq & K\|u^{(n)}\|_{ \mathcal{H}^{s}}2^{n{\big((s'-s)\frac{p-2} {p} -\frac{2s} {p} \big)}/d} = K\|u^{(n)}\|_{\mathcal{H}^{s}}2^{n{\big(\epsilon \frac{p-2} {p} +\frac{d} {2} -\frac{d} {p}-s\big)}/d} {}\\ & \leq & K\|u\|_{\mathcal{H}^{s}}2^{n{\big(\epsilon +\frac{d} {2} -\frac{d} {p}-s\big)}/d}\;. {}\\ \end{array}$$

It follows that \(\|u\|_{L^{p}} \leq \vert u_{0}\vert +\sum \nolimits_{n\geq 0}\|u^{(n)}\|_{L^{p}} \leq K_{2}\|u\|_{\mathcal{H}^{s}}\), provided that the exponent appearing in this expression is negative which, since ε can be chosen arbitrarily small, is precisely the case whenever \(s > \frac{d} {2} -\frac{d} {p}\).

1.2 A.2 Probability and Integration In Infinite Dimensions

1.2.1 A.2.1 Product Measure for i.i.d. Sequences

Perhaps the most straightforward setting in which probability measures in infinite dimensions are encountered is when studying i.i.d. sequences of real-valued random variables. Furthermore, this is our basic building block for the construction of random functions – see Sect. 2.1 – so we briefly overview the subject. Let \(\mathbb{P}_{0}\) be a probability measure on \(\mathbb{R}\) so that \((\mathbb{R},\mathsf{B}(\mathbb{R}), \mathbb{P}_{0})\) is a probability space and consider the i.i.d. sequence \(\xi :=\{\xi _{j}\}_{j=1}^{\infty }\) with \(\xi _{1} \sim \mathbb{P}_{0}\).

The construction of such a sequence can be formalised as follows. We consider \(\xi\) as a random variable taking values in the space \(\mathbb{R}^{\infty }\) endowed with the product topology, i.e. the smallest topology for which the projection maps \(\ell_{n}: \xi \mapsto \xi _{n}\) are continuous for every n. This is a complete metric space; an example of a distance generating the product topology is given by

$$\displaystyle{d(x,y) =\sum _{ n=1}^{\infty }2^{-n} \frac{\vert x_{n} - y_{n}\vert } {1 + \vert x_{n} - y_{n}\vert }\;.}$$

Since we are considering a countable product, the resulting \(\sigma\)-algebra \(\mathsf{B}(\mathbb{R}^{\infty })\) coincides with the product \(\sigma\)-algebra, which is the smallest \(\sigma\)-algebra for which all \(\ell_{n}\)’s are measurable.
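The metric above is straightforward to evaluate numerically: since the tail of the series beyond n terms is bounded by 2^{-n}, truncation gives essentially full floating-point accuracy. A sketch, with illustrative example sequences:

```python
def product_metric(x, y, n_terms=50):
    """Truncation of d(x,y) = sum_n 2^{-n} |x_n - y_n| / (1 + |x_n - y_n|),
    a metric generating the product topology on R^infty.  The neglected
    tail is bounded by 2^{-n_terms}."""
    total = 0.0
    for n in range(1, n_terms + 1):
        diff = abs(x(n) - y(n))
        total += 2.0 ** (-n) * diff / (1.0 + diff)
    return total

# Sequences are given as functions of the index n, so they can be
# genuinely "infinite" objects.
x = lambda n: 1.0 / n          # (1, 1/2, 1/3, ...)
y = lambda n: 0.0              # the zero sequence
print(product_metric(x, y))    # a number in [0, 1)
```

For these particular sequences the distance is \(\sum_{n\ge 1} 2^{-n}/(n+1) = 2\ln 2 - 1\).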

In what follows we need the notion of the pushforward of a probability measure under a measurable map. If \(f : B_{1} \rightarrow B_{2}\) is a measurable map between two measurable spaces \({\big(B_{i},\mathsf{B}(B_{i})\big)}\), i = 1, 2, and μ 1 is a probability measure on B 1, then \(\mu _{2} = f^{\sharp }\mu _{1}\) denotes the pushforward probability measure on B 2 defined by \(\mu _{2}(A) =\mu _{1}{\big(f^{-1}(A)\big)}\) for all \(A \in \mathsf{B}(B_{2})\). (The notation \(f^{{\ast}}\mu\) is sometimes used in place of \(f^{\sharp }\mu\), but we reserve this notation for adjoints.) Recall that in Sect. 2, we construct random functions via the random series (11) whose coefficients are constructed from an i.i.d. sequence. Our interest is in studying the pushforward measure \(\mathcal{F}^{\sharp }\mathbb{P}_{0}\) where \(\mathcal{F} : \mathbb{R}^{\infty }\rightarrow X'\) is defined by

$$\displaystyle{ \mathcal{F}\xi = m_{0} +\sum _{ j=1}^{\infty }\gamma _{ j}\xi _{j}\phi _{j}. }$$
(109)

In particular Sect. 2 is devoted to determining suitable separable Banach spaces X′ on which to define the pushforward measure.
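A sketch of drawing from the pushforward of (109) by truncating the series. The specific choices m₀ = 0, φ_j(x) = √2 sin(πjx) on (0, 1), γ_j = j^{-α}, and standard Gaussian ξ_j are illustrative assumptions of the sketch, not choices prescribed by the text:

```python
import numpy as np

rng = np.random.default_rng(2)

def random_series_draw(x, J=500, alpha=1.5):
    """One truncated draw from the pushforward of (109), with the
    illustrative choices: m_0 = 0, phi_j(x) = sqrt(2) sin(pi j x),
    gamma_j = j^{-alpha}, xi_j i.i.d. standard Gaussian."""
    xi = rng.standard_normal(J)
    j = np.arange(1, J + 1)
    phi = np.sqrt(2.0) * np.sin(np.pi * np.outer(j, x))   # shape (J, len(x))
    return (j ** (-alpha) * xi) @ phi

x = np.linspace(0.0, 1.0, 200)
u = random_series_draw(x)       # one sample path of the random function
print(u.shape)                  # (200,)
```

Faster decay of γ_j (larger α) produces smoother sample paths, in line with the regularity theory for random series priors discussed in Sect. 2.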

With the pushforward notation at hand, we may also describe Kolmogorov’s extension theorem which can be stated as follows.

Theorem 29 ((Kolmogorov Extension)).

Let X be a Polish space and let I be an arbitrary set. Assume that, for any finite subset \(A \subset I\) , we are given a probability measure \(\mathbb{P}_{A}\) on the finite product space X A . Assume furthermore that the family of measures \(\{\mathbb{P}_{A}\}\) is consistent in the sense that if \(B \subset A\) and \(\varPi _{A,B}: X^{A} \rightarrow X^{B}\) denotes the natural projection map, then \(\varPi _{A,B}^{\sharp }\mathbb{P}_{A} = \mathbb{P}_{B}\) . Then, there exists a unique probability measure \(\mathbb{P}\) on X I endowed with the product \(\sigma\) -algebra with the property that \(\varPi _{I,A}^{\sharp }\mathbb{P} = \mathbb{P}_{A}\) for every finite subset \(A \subset I\).

Loosely speaking, one can interpret this theorem as stating that if one knows the law of any finite number of components of a random vector or function, then this determines the law of the whole random vector or function; in particular, in the case of the random function, this comprises uncountably many components. This statement is thus highly nontrivial as soon as the set I is infinite since we have a priori defined \(\mathbb{P}_{A}\) only for finite subsets \(A \subset I\), and the theorem allows us to extend this uniquely also to infinite subsets.

As a simple application, we can use this theorem to define the infinite product measure \(\mathbb{P} =\bigotimes \nolimits_{ k=1}^{\infty }\mathbb{P}_{0}\) as the measure given by Kolmogorov’s Extension Theorem 29 if we take as our family of specifications \(\mathbb{P}_{A} =\bigotimes \nolimits_{k\in A}\mathbb{P}_{0}\). Our i.i.d. sequence \(\xi\) is then naturally defined as a random sample taken from the probability space \({\big(\mathbb{R}^{\infty },\mathsf{B}(\mathbb{R}^{\infty }), \mathbb{P}\big)}\). A more complicated example follows from making sense of the random field perspective on random functions as explained in Sect. 2.5 .

1.2.2 A.2.2 Probability and Integration on Separable Banach Spaces

We now study probability and integration on separable Banach spaces B; we let \(B^{{\ast}}\) denote the dual space of bounded linear functionals on B. The assumption of separability rules out some important function spaces like \(L^{\infty }(D; \mathbb{R})\), but is required in order for the basic results of integration theory to hold. This is because, when considering a non-separable Banach space B, it is not clear what the “natural” \(\sigma\)-algebra on B is. One natural candidate is the Borel \(\sigma\)-algebra, denoted B(B), namely, the smallest \(\sigma\)-algebra containing all open sets; another is the cylindrical \(\sigma\)-algebra, namely, the smallest \(\sigma\)-algebra for which all bounded linear functionals on B are measurable. For i.i.d. sequences, the analogues of these two \(\sigma\)-algebras can be identified, whereas, in the general setting, the cylindrical \(\sigma\)-algebra can be strictly smaller than the Borel \(\sigma\)-algebra. In the case of separable Banach spaces, however, both \(\sigma\)-algebras agree, and one has the following useful consequence:

Lemma 17.

Let B be a separable Banach space and let μ and ν be two Borel probability measures on B. If \(\ell^{\sharp }\mu =\ell ^{\sharp }\nu\) for every \(\ell\in B^{{\ast}}\) , then μ = ν.

Thus, as for i.i.d. sequences, there is a canonical notion of measurability. Whenever we refer to (probability) measures on a separable Banach space B in the sequel, we really mean (probability) measures on \({\big(B,\mathsf{B}(B)\big)}\).

We now turn to the definition of integration with respect to probability measures on B. Given a (Borel) measurable function \(f : \varOmega \rightarrow B\) where \((\varOmega,\mathcal{F}, \mathbb{P})\) is a standard probability space, we say that f is integrable with respect to \(\mathbb{P}\) if the map \(\omega \mapsto \|f(\omega )\|\) belongs to \(L_{\mathbb{P}}^{1}(\varOmega ; \mathbb{R})\). (Note that this map is certainly Borel measurable since the norm \(\|\cdot \|: B \rightarrow \mathbb{R}\) is a continuous, and therefore also Borel measurable, function.) Given such an integrable function f, we define its Bochner integral by

$$\displaystyle{ \int f(\omega )\,\mathbb{P}(d\omega ) =\lim _{n\rightarrow \infty }\int f_{n}(\omega )\,\mathbb{P}(d\omega )\;, }$$

where f n is a sequence of simple functions, for which the integral on the right-hand side may be defined in the usual way, chosen such that

$$\displaystyle{\lim _{n\rightarrow \infty }\int \|f_{n}(\omega ) - f(\omega )\|\,\mathbb{P}(d\omega ) = 0.}$$

With this definition the value of the integral does not depend on the approximating sequence, it is linear in f, and

$$\displaystyle{ \int \ell(f(\omega ))\,\mathbb{P}(d\omega ) =\ell{\big(\int f(\omega )\,\mathbb{P}(d\omega )\big)}\;, }$$
(110)

for every element \(\ell\) in the dual space \(B^{{\ast}}\).

Given a probability measure μ on a separable Banach space B, we now say that μ has finite expectation if the identity function \(x\mapsto x\) is integrable with respect to μ. If this is the case, we define the expectation of μ as

$$\displaystyle{ \int _{B}x\,\mu (dx)\;, }$$

where the integral is interpreted as a Bochner integral.

Similarly, it is natural to say that μ has finite variance if the map \(x\mapsto \|x\|^{2}\) is integrable with respect to μ. Regarding the covariance C μ of μ itself, it is natural to define it as a bounded linear operator \(C_{\mu }: B^{{\ast}}\rightarrow B\) with the property that

$$\displaystyle{ C_{\mu }\ell =\int _{B}x\ell(x)\,\mu (dx)\;, }$$
(111)

for every \(\ell\in B^{{\ast}}\). At this stage, however, it is not clear whether such an operator C μ always exists solely under the assumption that μ has finite variance. For any x ∈ B, we define the projection operator \(P_{x}: B^{{\ast}}\rightarrow B\) by

$$\displaystyle{ P_{x}\ell = x\,\ell(x)\;, }$$
(112)

suggesting that we define

$$\displaystyle{ C_{\mu } :=\int _{B}P_{x}\,\mu (dx)\;. }$$
(113)

The problem with this definition is that if we view the map \(x\mapsto P_{x}\) as a map taking values in the space \(\mathcal{L}(B^{{\ast}},B)\) of bounded linear operators from \(B^{{\ast}}\) to B, then, since this space is not separable in general, it is not clear a priori whether (113) makes sense as a Bochner integral. This suggests defining the subspace \(B_{\star }(B) \subset \mathcal{L}(B^{{\ast}},B)\) given by the closure (in the usual operator norm) of the linear span of operators of the type P x given in (112) for x ∈ B. We then have:

Lemma 18.

If B is separable, then \(B_{\star }(B)\) is also separable. Furthermore, \(B_{\star }(B)\) consists of compact operators.

This leads to the following corollary:

Corollary 3.

Assume that μ has finite variance so that the map \(x\mapsto \|x\|^{2}\) is integrable with respect to μ. Then the covariance operator C μ defined by (113) exists as a Bochner integral in \(B_{\star }(B)\).

Remark 3.

Once the covariance is defined, the fact that (111) holds is then an immediate consequence of (110). In general, not every element \(C \in B_{\star }(B)\) can be realised as the covariance of some probability measure. This is the case even if we impose the positivity condition \(\ell(C\ell) \geq 0\), which by (111) is a condition satisfied by every covariance operator. For further insight into this issue, see Lemma 23 which characterizes precisely the covariance operators of a Gaussian measure on a separable Hilbert space.

Given any probability measure μ on B, we can define its Fourier transform \(\hat{\mu }: B^{{\ast}}\rightarrow \mathbb{C}\) by

$$\displaystyle{ \hat{\mu }(\ell) :=\int _{B}e^{i\ell(x)}\,\mu (dx)\;. }$$
(114)

For a Gaussian measure μ 0 on B with mean a and covariance operator C, it may be shown that, for any \(\ell\in B^{{\ast}}\), the characteristic function is given by

$$\displaystyle\begin{array}{rcl} & & \hat{\mu }_{0}(\ell) =\mathrm{ e}^{i\ell(a)-\frac{1} {2} \ell(C\ell)}.{}\end{array}$$
(115)

As a consequence of Lemma 17, it is almost immediate that a measure is uniquely determined by its Fourier transform, and this is the content of the following result.

Lemma 19.

Let μ and ν be any two probability measures on a separable Banach space B. If \(\hat{\mu }(\ell) =\hat{\nu } (\ell)\) for every \(\ell\in B^{{\ast}}\) , then μ = ν.

1.2.3 A.2.3 Probability and Integration on Separable Hilbert Spaces

We will frequently be interested in the case where \(B = \mathcal{H}\) for \({\big(\mathcal{H},\langle \cdot,\cdot \rangle,\|\cdot \|\big)}\) some separable Hilbert space. Bochner integration can then, of course, be defined as a special case of the preceding development on separable Banach spaces. We make use of the Riesz representation theorem to identify \(\mathcal{H}\) with its dual and \(\mathcal{H}\otimes \mathcal{H}\) with a subspace of the space of linear operators on \(\mathcal{H}\). The covariance operator of a measure μ on \(\mathcal{H}\) may then be viewed as a bounded linear operator from \(\mathcal{H}\) into itself. The definition (111) of C μ becomes

$$\displaystyle{ C_{\mu }\ell =\int _{\mathcal{H}}\langle \ell,x\rangle x\,\mu (dx)\;, }$$
(116)

for all \(\ell\in \mathcal{H}\) and (113) becomes

$$\displaystyle{ C_{\mu } =\int _{\mathcal{H}}x \otimes x\,\mu (dx)\;. }$$
(117)

Corollary 3 shows that we can indeed make sense of the second formulation as a Bochner integral, provided that μ has finite variance in \(\mathcal{H}\).
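A numerical sketch of (117): approximating \(\mathcal{H}\) by R^n (coefficients in an orthonormal basis), the covariance operator becomes the mean of the outer products x ⊗ x, which can be estimated from samples. The diagonal Gaussian example is an assumption of the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)

def empirical_covariance(samples):
    """Monte Carlo approximation of C_mu = int x (x) x mu(dx) in (117):
    the sample mean of the outer products x x^T, with H approximated
    by R^n."""
    X = np.asarray(samples)            # shape (n_samples, n)
    return X.T @ X / X.shape[0]        # mean of x x^T

# Example: mean-zero Gaussian on R^3 with diagonal covariance.
true_C = np.diag([1.0, 0.25, 1.0 / 9.0])
X = rng.standard_normal((200_000, 3)) * np.sqrt(np.diag(true_C))
C_hat = empirical_covariance(X)
print(np.round(C_hat, 2))  # close to diag(1, 0.25, 0.11)
```

For measures with nonzero mean one would subtract the (Bochner) mean first; the sketch above treats the mean-zero case for simplicity.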

1.2.4 A.2.4 Metrics on Probability Measures

When discussing well-posedness and approximation theory for the posterior distribution, it is of interest to estimate the distance between two probability measures, and thus we will be interested in metrics between probability measures. In this subsection we introduce two useful metrics on measures: the total variation distance and the Hellinger distance. We discuss the relationships between the metrics and indicate how they may be used to estimate differences between expectations of random variables under two different measures. We also discuss the Kullback-Leibler divergence, a useful distance measure which does not satisfy the axioms of a metric, but which may be used to bound both the Hellinger and total variation distances, and which is also useful in defining algorithms for finding the best approximation to a given measure from within some restricted class of measures, such as Gaussians.

Assume that we have two probability measures μ and μ′ on a separable Banach space denoted by B (the considerations here in fact apply on any Polish space, but we do not need that level of generality). Assume that μ and μ′ are both absolutely continuous with respect to a common reference measure ν, also defined on the same measure space. Such a measure always exists – take \(\nu = \frac{1} {2}(\mu +\mu ')\), for example. In the following, all integrals of real-valued functions over B are simply denoted by \(\int\). The following definitions give two notions of distance between μ and μ′; the resulting metrics are independent of the choice of the common reference measure.

Definition 3.

The total variation distance between μ and μ′ is

$$\displaystyle{d_{\mbox{ TV}}(\mu,\mu ') = \frac{1} {2}\int {\big|\frac{d\mu } {d\nu } -\frac{d\mu '} {d\nu }\big|}d\nu.\quad \square }$$

In particular, if μ′ is absolutely continuous with respect to μ, then

$$\displaystyle{ d_{\mbox{ TV}}(\mu,\mu ') = \frac{1} {2}\int {\big|1 -\frac{d\mu '} {d\mu }\big|}d\mu. }$$
(118)

Definition 4.

The Hellinger distance between μ and μ′ is

$$\displaystyle{d_{\mbox{ Hell}}(\mu,\mu ') = \sqrt{\frac{1} {2}\int {\big(\sqrt{\frac{d\mu } {d\nu }} -\sqrt{\frac{d\mu '} {d\nu }}\big)}^{2}d\nu }.\quad \square }$$

In particular, if μ′ is absolutely continuous with respect to μ, then

$$\displaystyle{ d_{\mbox{ Hell}}(\mu,\mu ') = \sqrt{\frac{1} {2}\int {\big(1 -\sqrt{\frac{d\mu '} {d\mu }}\big)}^{2}d\mu }. }$$
(119)

Note that the numerical constant \(\frac{1} {2}\) appearing in both definitions is chosen in such a way as to ensure the bounds

$$\displaystyle{0 \leq d_{\mbox{ TV}}(\mu,\mu ') \leq 1\;,\qquad 0 \leq d_{\mbox{ Hell}}(\mu,\mu ') \leq 1\;.}$$

In the case of the total variation distance, this is an immediate consequence of the triangle inequality, combined with the fact that both μ and μ′ are probability measures, so that \(\int \frac{d\mu } {d\nu }\,d\nu = 1\) and similarly for μ′. In the case of the Hellinger distance, it follows by expanding the square and applying similar considerations.

The Hellinger and total variation distances are related as follows, which shows in particular that they both generate the same topology:

Lemma 20.

The total variation and Hellinger metrics are related by the inequalities

$$\displaystyle{ \frac{1} {\sqrt{2}}d_{\mbox{ TV}}(\mu,\mu ') \leq d_{\mbox{ Hell}}(\mu,\mu ') \leq d_{\mbox{ TV}}(\mu,\mu ')^{\frac{1} {2} }.}$$

Proof.

We have

$$\displaystyle\begin{array}{rcl} d_{\mbox{ TV}}(\mu,\mu ')& =& \frac{1} {2}\int {\big|\sqrt{ \frac{d\mu } {d\nu }} -\sqrt{\frac{d\mu '} {d\nu }}\big|}{\big|\sqrt{\frac{d\mu } {d\nu }} + \sqrt{\frac{d\mu '} {d\nu }}\big|}d\nu {}\\ & \leq & \sqrt{{\big(\frac{1} {2}\int {\big(\sqrt{\frac{d\mu } {d\nu }} -\sqrt{\frac{d\mu '} {d\nu }}\big)}^{2}d\nu \big)}}\sqrt{ {\big(\frac{1} {2}\int {\big(\sqrt{\frac{d\mu } {d\nu }} + \sqrt{\frac{d\mu '} {d\nu }}\big)}^{2}d\nu \big)}} {}\\ & \leq & \sqrt{{\big(\frac{1} {2}\int {\big(\sqrt{\frac{d\mu } {d\nu }} -\sqrt{\frac{d\mu '} {d\nu }}\big)}^{2}d\nu \big)}}\sqrt{ {\big(\int {\big(\frac{d\mu } {d\nu } + \frac{d\mu '} {d\nu }\big)}d\nu \big)}} {}\\ & =& \sqrt{2}d_{\mbox{ Hell}}(\mu,\mu ') {}\\ \end{array}$$

as required for the first bound.

For the second bound note that, for any positive a and b, one has the bound \(\vert \sqrt{a} -\sqrt{b}\vert \leq \sqrt{a} + \sqrt{b}\). As a consequence, we have the bound

$$\displaystyle\begin{array}{rcl} d_{\mbox{ Hell}}(\mu,\mu ')^{2}& \leq & \frac{1} {2}\int {\big|\sqrt{ \frac{d\mu } {d\nu }} -\sqrt{\frac{d\mu '} {d\nu }}\big|}{\big|\sqrt{\frac{d\mu } {d\nu }} + \sqrt{\frac{d\mu '} {d\nu }}\big|}d\nu {}\\ & =& \frac{1} {2}\int {\big|\frac{d\mu } {d\nu } -\frac{d\mu '} {d\nu }\big|}d\nu {}\\ & =& d_{\mbox{ TV}}(\mu,\mu ')\;, {}\\ \end{array}$$

as required.
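Lemma 20 is easy to check numerically for discrete measures, where the counting measure serves as the common reference measure ν; the random probability vectors below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)

def tv_and_hellinger(p, q):
    """Total variation and Hellinger distances between two discrete
    probability vectors p, q (counting measure as reference nu)."""
    tv = 0.5 * np.sum(np.abs(p - q))
    hell = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))
    return tv, hell

# Two random probability vectors on 10 points.
p = rng.random(10); p /= p.sum()
q = rng.random(10); q /= q.sum()
tv, hell = tv_and_hellinger(p, q)
# Lemma 20: tv / sqrt(2) <= hell <= sqrt(tv).
assert tv / np.sqrt(2) <= hell <= np.sqrt(tv)
print(tv, hell)
```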

Example 11.

Consider two Gaussian densities on \(\mathbb{R}\): \(N(m_{1},\sigma _{1}^{2})\) and \(N(m_{2},\sigma _{2}^{2})\). The Hellinger distance between them is given by

$$\displaystyle{d_{\mbox{ Hell}}(\mu,\mu ')^{2} = 1 -\sqrt{\exp {\big(-\frac{(m_{1 } - m_{2 } )^{2 } } {2(\sigma _{1}^{2} +\sigma _{ 2}^{2})} \big)} \frac{2\sigma _{1}\sigma _{2}} {(\sigma _{1}^{2} +\sigma _{ 2}^{2})}}.}$$

To see this note that

$$\displaystyle{d_{\mbox{ Hell}}(\mu,\mu ')^{2} = 1 - \frac{1} {(2\pi \sigma _{1}\sigma _{2})^{\frac{1} {2} }} \int _{\mathbb{R}}\exp (-Q)dx}$$

where

$$\displaystyle{Q = \frac{1} {4\sigma _{1}^{2}}(x - m_{1})^{2} + \frac{1} {4\sigma _{2}^{2}}(x - m_{2})^{2}.}$$

Define \(\sigma ^{2}\) by

$$\displaystyle{ \frac{1} {2\sigma ^{2}} = \frac{1} {4\sigma _{1}^{2}} + \frac{1} {4\sigma _{2}^{2}}.}$$

We change variable under the integral to y given by

$$\displaystyle{y = x -\frac{m_{1} + m_{2}} {2} }$$

and note that then, by completing the square,

$$\displaystyle{Q = \frac{1} {2\sigma ^{2}}(y - m)^{2} + \frac{1} {4(\sigma _{1}^{2} +\sigma _{ 2}^{2})}(m_{2} - m_{1})^{2}}$$

where the precise value of m is not needed in what follows and so we do not detail it. Noting that the integral is then a multiple of the Gaussian density \(N(m,\sigma ^{2})\) gives the desired result. In particular this calculation shows that the Hellinger distance between two Gaussians on \(\mathbb{R}\) tends to zero if and only if the means and variances of the two Gaussians approach one another. Furthermore, by the previous lemma, the same is true for the total variation distance.
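The closed-form expression in Example 11 can be verified against direct quadrature of the defining integral \(d_{\mathrm{Hell}}^{2} = 1 -\int \sqrt{p_{1}p_{2}}\,dx\); the particular parameter values below are illustrative assumptions:

```python
import numpy as np

def hellinger_gaussians(m1, s1, m2, s2):
    """Closed-form Hellinger distance from Example 11
    (s1, s2 are standard deviations)."""
    v1, v2 = s1 ** 2, s2 ** 2
    inside = np.exp(-(m1 - m2) ** 2 / (2 * (v1 + v2))) * 2 * s1 * s2 / (v1 + v2)
    return np.sqrt(1.0 - np.sqrt(inside))

# Cross-check against trapezoidal quadrature on a wide grid.
m1, s1, m2, s2 = 0.0, 1.0, 1.0, 2.0
x = np.linspace(-25.0, 25.0, 400_001)
p1 = np.exp(-(x - m1) ** 2 / (2 * s1 ** 2)) / np.sqrt(2 * np.pi * s1 ** 2)
p2 = np.exp(-(x - m2) ** 2 / (2 * s2 ** 2)) / np.sqrt(2 * np.pi * s2 ** 2)
f = np.sqrt(p1 * p2)
dx = x[1] - x[0]
integral = dx * (np.sum(f) - 0.5 * (f[0] + f[-1]))   # trapezoid rule
numerical = np.sqrt(1.0 - integral)
print(abs(hellinger_gaussians(m1, s1, m2, s2) - numerical))  # tiny
```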

The preceding example generalizes to higher dimensions and shows, for example, that the total variation and Hellinger metrics cannot metrize weak convergence of probability measures: convergence in these metrics is strictly stronger than weak convergence. For instance, \(N(0,\sigma ^{2})\) converges weakly to the Dirac mass at the origin as σ → 0, yet remains at total variation distance 1 from it. The metrics are nonetheless useful distance measures, for example, between families of measures which are mutually absolutely continuous. Furthermore, the Hellinger distance is particularly useful for estimating the difference between expectations of functions of random variables under different measures. This is encapsulated in the following lemma:

Lemma 21.

Let μ and μ′ be two probability measures on a separable Banach space X. Assume also that f : X → E, where \((E,\|\cdot \|)\) is a separable Banach space, is measurable and has second moments with respect to both μ and μ′. Then

$$\displaystyle{\|\mathbb{E}^{\mu }f - \mathbb{E}^{\mu '}f\| \leq 2{\big(\mathbb{E}^{\mu }\|f\|^{2} + \mathbb{E}^{\mu '}\|f\|^{2}\big)}^{\frac{1} {2} }d_{\mbox{ Hell}}(\mu,\mu ').}$$

Furthermore, if E is a separable Hilbert space and f : X → E as before has fourth moments, then

$$\displaystyle{\|\mathbb{E}^{\mu }(f \otimes f) - \mathbb{E}^{\mu '}(f \otimes f)\| \leq 2{\big(\mathbb{E}^{\mu }\|f\|^{4} + \mathbb{E}^{\mu '}\|f\|^{4}\big)}^{\frac{1} {2} }d_{\mbox{ Hell}}(\mu,\mu ').}$$

Proof.

Let ν be a reference probability measure as above. We then have the bound

$$\displaystyle\begin{array}{rcl} \|\mathbb{E}^{\mu }f& -& \mathbb{E}^{\mu '}f\| \leq \int \| f\|{\big|\frac{d\mu } {d\nu } -\frac{d\mu '} {d\nu }\big|}d\nu {}\\ & =& \int {\big( \frac{1} {\sqrt{2}}{\big|\sqrt{\frac{d\mu } {d\nu }} -\sqrt{\frac{d\mu '} {d\nu }}\big|}\big)}{\big(\sqrt{2}\|f\|{\big|\sqrt{\frac{d\mu } {d\nu }} + \sqrt{\frac{d\mu '} {d\nu }}\big|}\big)}d\nu {}\\ & \leq & \sqrt{{\big(\frac{1} {2}\int {\big(\sqrt{\frac{d\mu } {d\nu }} -\sqrt{\frac{d\mu '} {d\nu }}\big)}^{2}d\nu \big)}}\sqrt{ {\big(2\int \|f\|^{2}{\big(\sqrt{\frac{d\mu } {d\nu }} + \sqrt{\frac{d\mu '} {d\nu }}\big)}^{2}d\nu \big)}} {}\\ & \leq & \sqrt{{\big(\frac{1} {2}\int {\big(\sqrt{\frac{d\mu } {d\nu }} -\sqrt{\frac{d\mu '} {d\nu }}\big)}^{2}d\nu \big)}}\sqrt{ {\big(4\int \|f\|^{2}{\big(\frac{d\mu } {d\nu } + \frac{d\mu '} {d\nu }\big)}d\nu \big)}} {}\\ & =& 2{\big(\mathbb{E}^{\mu }\|f\|^{2} + \mathbb{E}^{\mu '}\|f\|^{2}\big)}^{\frac{1} {2} }d_{\mbox{ Hell}}(\mu,\mu ') {}\\ \end{array}$$

as required.

The proof for \(f \otimes f\) follows from the bound

$$\displaystyle\begin{array}{rcl} \|\mathbb{E}^{\mu }(f \otimes f) - \mathbb{E}^{\mu '}(f \otimes f)\|& =& \sup _{\|h\|=1}\|\mathbb{E}^{\mu }\langle f,h\rangle f - \mathbb{E}^{\mu '}\langle f,h\rangle f\| {}\\ & \leq & \int \|f\|^{2}{\big|\frac{d\mu } {d\nu } -\frac{d\mu '} {d\nu }\big|}d\nu \;, {}\\ \end{array}$$

and then arguing similarly to the first case but with \(\|f\|\) replaced by \(\|f\|^{2}\).

Remark 4.

Note, in particular, that choosing X = E, and with f chosen to be the identity mapping, we deduce that the difference between the means (respectively, covariance operators) of two measures is bounded above by a multiple of their Hellinger distance, provided that one has some a priori control on the second (respectively, fourth) moments.
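As a concrete illustration of Lemma 21, both sides of the bound can be evaluated in closed form for one-dimensional Gaussians with f the identity. The sketch below is our own numerical check, not part of the text; it uses the standard closed-form Hellinger distance between scalar Gaussians, with the 1/2 normalization adopted in these notes.

```python
import numpy as np

def hellinger_gauss(m1, s1, m2, s2):
    # d_Hell(N(m1, s1^2), N(m2, s2^2)) with the 1/2 normalization of these notes
    bc = np.sqrt(2 * s1 * s2 / (s1**2 + s2**2)) * np.exp(
        -((m1 - m2) ** 2) / (4 * (s1**2 + s2**2)))   # Hellinger affinity
    return np.sqrt(1.0 - bc)

def bound_gap(m1, s1, m2, s2):
    lhs = abs(m1 - m2)                           # |E^mu f - E^mu' f| for f(x) = x
    second = (m1**2 + s1**2) + (m2**2 + s2**2)   # E^mu |f|^2 + E^mu' |f|^2
    rhs = 2 * np.sqrt(second) * hellinger_gauss(m1, s1, m2, s2)
    return lhs, rhs

# the inequality of Lemma 21 holds for each parameter pair
for params in [(0, 1, 0.5, 1), (1, 2, -1, 0.5), (0, 1, 0, 3)]:
    lhs, rhs = bound_gap(*params)
    assert lhs <= rhs
```

For nearby means with equal variances the two sides are close, showing the constant in the lemma is not far from sharp in this setting.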

We now define a third widely used distance concept for comparing two probability measures. Note, however, that it does not give rise to a metric in the strict sense, because it violates both symmetry and the triangle inequality.

Definition 5.

The Kullback-Leibler divergence between two measures μ′ and μ, with μ′ absolutely continuous with respect to μ, is

$$\displaystyle{D_{\mbox{ KL}}(\mu '\vert \vert \mu ) =\int \frac{d\mu '} {d\mu }\log {\big(\frac{d\mu '} {d\mu }\big)}d\mu.\quad \square }$$

If μ is also absolutely continuous with respect to μ′, so that the two measures are equivalent, then

$$\displaystyle{D_{\mbox{ KL}}(\mu '\vert \vert \mu ) = -\int \log {\big(\frac{d\mu } {d\mu '}\big)}d\mu '}$$

and the two definitions coincide.

Example 12.

Consider two Gaussian measures on \(\mathbb{R}\): \(\mu _{1} = N(m_{1},\sigma _{1}^{2})\) and \(\mu _{2} = N(m_{2},\sigma _{2}^{2})\). The Kullback-Leibler divergence between them is given by

$$\displaystyle{D_{\mbox{ KL}}(\mu _{1}\vert \vert \mu _{2}) =\ln {\big( \frac{\sigma _{2}} {\sigma _{1}}\big)} + \frac{1} {2}{\big(\frac{\sigma _{1}^{2}} {\sigma _{2}^{2}} - 1\big)} + \frac{(m_{2} - m_{1})^{2}} {2\sigma _{2}^{2}}.}$$

To see this note that

$$\displaystyle\begin{array}{rcl} D_{\mbox{ KL}}(\mu _{1}\vert \vert \mu _{2})& =& \mathbb{E}^{\mu _{1}}{\big(\ln \sqrt{\frac{\sigma _{2 }^{2 }} {\sigma _{1}^{2}}} + \frac{1} {2\sigma _{2}^{2}}\vert x - m_{2}\vert ^{2} - \frac{1} {2\sigma _{1}^{2}}\vert x - m_{1}\vert ^{2}\big)} {}\\ & =& \ln \frac{\sigma _{2}} {\sigma _{1}} + \mathbb{E}^{\mu _{1}}{\big({\big( \frac{1} {2\sigma _{2}^{2}} - \frac{1} {2\sigma _{1}^{2}}\big)}\vert x - m_{1}\vert ^{2}\big)} {}\\ & & +\mathbb{E}^{\mu _{1}} \frac{1} {2\sigma _{2}^{2}}{\big(\vert x - m_{2}\vert ^{2} -\vert x - m_{ 1}\vert ^{2}\big)} {}\\ & =& \ln \frac{\sigma _{2}} {\sigma _{1}} + \frac{1} {2}{\big(\frac{\sigma _{1}^{2}} {\sigma _{2}^{2}} - 1\big)} + \frac{1} {2\sigma _{2}^{2}}\mathbb{E}^{\mu _{1}}{\big(m_{2}^{2} - m_{1}^{2} + 2x(m_{1} - m_{2})\big)} {}\\ & =& \ln \frac{\sigma _{2}} {\sigma _{1}} + \frac{1} {2}{\big(\frac{\sigma _{1}^{2}} {\sigma _{2}^{2}} - 1\big)} + \frac{1} {2\sigma _{2}^{2}}(m_{2} - m_{1})^{2} {}\\ \end{array}$$

as required.
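The closed-form expression in Example 12 can be confirmed by integrating the definition directly. The following is our own sketch: it compares the formula against numerical quadrature of \(\int p_1 \log(p_1/p_2)\,dx\) for an arbitrary pair of parameter values.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def kl_closed_form(m1, s1, m2, s2):
    # the formula of Example 12 for D_KL(N(m1,s1^2) || N(m2,s2^2))
    return np.log(s2 / s1) + 0.5 * (s1**2 / s2**2 - 1) + (m2 - m1) ** 2 / (2 * s2**2)

def kl_quadrature(m1, s1, m2, s2):
    # integrate p1(x) * log(p1(x)/p2(x)) over (effectively) the real line
    def integrand(x):
        return norm.pdf(x, m1, s1) * (norm.logpdf(x, m1, s1) - norm.logpdf(x, m2, s2))
    val, _ = quad(integrand, -30, 30)
    return val

assert abs(kl_closed_form(0.3, 1.2, -0.5, 0.8) - kl_quadrature(0.3, 1.2, -0.5, 0.8)) < 1e-6
```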

As for Hellinger distance , this example shows that two Gaussians on \(\mathbb{R}\) approach one another in the Kullback-Leibler divergence if and only if their means and variances approach one another. This generalizes to higher dimensions. The Kullback-Leibler divergence provides an upper bound for the square of the Hellinger distance and for the square of the total variation distance.

Lemma 22.

Assume that two measures μ and μ′ are equivalent. Then the bounds

$$\displaystyle{d_{\mbox{ Hell}}(\mu,\mu ')^{2} \leq \frac{1} {2}D_{\mbox{ KL}}(\mu \vert \vert \mu ')\;,\qquad d_{\mbox{ TV}}(\mu,\mu ')^{2} \leq D_{\mbox{ KL}}(\mu \vert \vert \mu ')\;,}$$

hold.

Proof.

The second bound follows from the first by using Lemma 20; thus it suffices to prove the first. In the following we use the fact that

$$\displaystyle{x - 1 \geq \log (x)\qquad \forall x \geq 0\;,}$$

so that

$$\displaystyle{\sqrt{x} - 1 \geq \frac{1} {2}\log (x)\qquad \forall x \geq 0\;.}$$

This yields the bound

$$\displaystyle\begin{array}{rcl} d_{\mbox{ Hell}}(\mu,\mu ')^{2}& =& \frac{1} {2}\int {\big(\sqrt{ \frac{d\mu '} {d\mu }} - 1\big)}^{2}d\mu = \frac{1} {2}\int {\big(\frac{d\mu '} {d\mu } + 1 - 2\sqrt{\frac{d\mu '} {d\mu }}\big)}d\mu {}\\ & =& \int {\big(1 -\sqrt{\frac{d\mu '} {d\mu }}\big)}d\mu \leq \frac{1} {2}\int {\big( -\log \frac{d\mu '} {d\mu }\big)}d\mu {}\\ & =& \frac{1} {2}D_{\mbox{ KL}}(\mu \vert \vert \mu ')\;, {}\\ \end{array}$$

as required.
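Both inequalities of Lemma 22 can be checked numerically for a pair of Gaussians. The sketch below is ours; it assumes the 1/2 normalizations used in these notes for both the total variation distance \(\tfrac{1}{2}\int |p-q|\) and the Hellinger distance, and evaluates all three quantities by quadrature.

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

m1, s1, m2, s2 = 0.0, 1.0, 1.0, 1.5
p = lambda x: norm.pdf(x, m1, s1)
q = lambda x: norm.pdf(x, m2, s2)

# total variation, squared Hellinger, and KL divergence by quadrature
d_tv, _ = quad(lambda x: 0.5 * abs(p(x) - q(x)), -30, 30)
d_hell_sq, _ = quad(lambda x: 0.5 * (np.sqrt(p(x)) - np.sqrt(q(x))) ** 2, -30, 30)
d_kl, _ = quad(lambda x: p(x) * (norm.logpdf(x, m1, s1) - norm.logpdf(x, m2, s2)), -30, 30)

assert d_hell_sq <= 0.5 * d_kl    # first bound of Lemma 22
assert d_tv**2 <= d_kl            # second bound (Pinsker-type)
```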

1.2.5 A.2.5 Kolmogorov Continuity Test

The setting of Kolmogorov’s continuity test is the following. We assume that we are given a compact domain \(D \subset \mathbb{R}^{d}\), a complete separable metric space X, and a collection \(\{u(x)\}_{x\in D}\) of X-valued random variables. At this stage we assume no regularity whatsoever in the parameter x: the distribution of this collection of random variables is a measure \(\mu _{0}\) on the space \(X^{D}\) of all functions from D to X endowed with the product \(\sigma\)-algebra. Any consistent family of marginal distributions yields such a measure by Kolmogorov’s Extension Theorem 29. With this notation in hand, Kolmogorov’s continuity test can be formulated as follows; it enables the extraction of regularity of u(x) with respect to x.

Theorem 30 (Kolmogorov Continuity Test).

Let D and u be as above and assume that there exist p > 1, α > 0 and K > 0 such that

$$\displaystyle{ \mathbb{E}\mathsf{d}{\big(u(x),u(y)\big)}^{p} \leq K\vert x - y\vert ^{p\alpha +d}\;,\qquad \forall x,y \in D\;, }$$
(120)

where d denotes the distance function on X and d the dimension of the compact domain D. Then, for every β < α, there exists a unique measure μ on \(C^{0,\beta }(D,X)\) such that the canonical process under μ has the same law as u.

We have here generalized the notion of Hölder spaces from Sect. A.1.2 to functions taking values in a Polish space; such generalizations are discussed in Sect. A.1.4. The notion of canonical process is defined in Sect. A.4.

We will frequently use Kolmogorov’s continuity test in the following setting: we again assume that we are given a compact domain \(D \subset \mathbb{R}^{d}\), and now a collection u(x) of \(\mathbb{R}^{n}\)-valued random variables indexed by \(x \in D\). We have the following:

Corollary 4.

Assume that there exist p > 1, α > 0 and K > 0 such that

$$\displaystyle{ \mathbb{E}\vert u(x) - u(y)\vert ^{p} \leq K\vert x - y\vert ^{p\alpha +d}\;,\qquad \forall x,y \in D\;. }$$

Then, for every β < α, there exists a unique measure μ on \(C^{0,\beta }(D)\) such that the canonical process under μ has the same law as u.

Remark 5.

Recall that \(C^{0,\gamma '}(D) \subset C_{0}^{0,\gamma }(D)\) for all γ′ > γ so that, since the interval β < α for this theorem is open, we may interpret the result as giving an equivalent measure defined on a separable Banach space.

A very useful consequence of Kolmogorov’s continuity criterion is the following result. The setting is to consider a random function u given by the random series

$$\displaystyle{ u =\sum _{k\geq 0}\xi _{k}\psi _{k} }$$
(121)

where \(\{\xi _{k}\}_{k\geq 0}\) is an i.i.d. sequence and the ψ k are real- or complex-valued Hölder functions on bounded open \(D \subset \mathbb{R}^{d}\) satisfying, for some α ∈ (0, 1],

$$\displaystyle{ \vert \psi _{k}(x) -\psi _{k}(y)\vert \leq h(\alpha,\psi _{k})\vert x - y\vert ^{\alpha }\quad x,y \in D; }$$
(122)

of course if α = 1 the functions are Lipschitz.

Corollary 5.

Let \(\{\xi _{k}\}_{k\geq 0}\) be countably many centred i.i.d. random variables (real or complex) with bounded moments of all orders. Moreover let \(\{\psi _{k}\}_{k\geq 0}\) satisfy (122) . Suppose there is some δ ∈ (0,2) such that

$$\displaystyle{ S_{1} :=\sum _{k\geq 0}\|\psi _{k}\|_{L^{\infty }}^{2} < \infty \quad \mathrm{and}\quad S_{ 2} :=\sum _{k\geq 0}\|\psi _{k}\|_{L^{\infty }}^{2-\delta }h(\alpha,\psi _{ k})^{\delta } < \infty \;. }$$
(123)

Then u defined by (121) is almost surely finite for every x ∈ D, and u is Hölder continuous for every Hölder exponent smaller than \(\alpha \delta /2\).

Proof.

Let us denote by κ n (X) the nth cumulant of a random variable X. The odd cumulants of centred random variables are zero. Furthermore, using the fact that the cumulants of independent random variables simply add up and that the cumulants of \(\xi _{k}\) are all finite by assumption, we obtain for p ≥ 1 the bound

$$\displaystyle\begin{array}{rcl} \big|\kappa _{2p}\big(u(x) - u(y)\big)\big|& =& \big|\sum _{k\geq 0}\kappa _{2p}(\xi _{k})\,\big(\psi _{k}(x) -\psi _{k}(y)\big)^{2p}\big| {}\\ & \lesssim & C_{p}\sum _{k\geq 0}\min \{2^{2p}\|\psi _{ k}\|_{L^{\infty }}^{2p},h(\alpha,\psi _{ k})^{2p}\vert x - y\vert ^{2p\alpha }\} {}\\ & \lesssim & C_{p}\sum _{k\geq 0}\|\psi _{k}\|_{L^{\infty }}^{(2-\delta )p}h(\alpha,\psi _{ k})^{p\delta }\vert x - y\vert ^{p\alpha \delta } {}\\ & \lesssim & C_{p}\vert x - y\vert ^{p\alpha \delta }\;, {}\\ \end{array}$$

with C p denoting positive constants depending on p which can change from occurrence to occurrence, and where we used that \(\min \{a,bx^{2}\} \leq a^{1-\delta /2}b^{\delta /2}\vert x\vert ^{\delta }\) for any a, b ≥ 0 together with the finiteness of S 2. In a similar way, we obtain \({\big|\kappa _{2p}u(x)\big|} < \infty \) for every p ≥ 1. Since the random variables u(x) are centred, all moments of even order 2p, p ≥ 1, can be expressed in terms of homogeneous polynomials of the even cumulants of order up to 2p, so that

$$\displaystyle{ \mathbb{E}\vert u(x) - u(y)\vert ^{2p} \lesssim C_{ p}\vert x - y\vert ^{p\alpha \delta }\;,\qquad \mathbb{E}\vert u(x)\vert ^{2p} < \infty \;, }$$

uniformly over x, yD. The almost sure boundedness on \(L^{\infty }\) follows from the second bound. The Hölder continuity claim follows from Kolmogorov’s continuity test in the form of Corollary 4, after noting that \(p\alpha \delta = 2p{\big(\frac{1} {2}\alpha \delta - \frac{d} {2p}\big)} + d\) and choosing p arbitrarily large.

Remark 6.

Note that (121) is simply a rewrite of (11), with ψ 0 = m 0, \(\xi _{0} = 1\) and \(\psi _{k} =\gamma _{k}\phi _{k}\). In the case where the \(\xi _{k}\) are standard normal, then the ψ k ’s in Corollary 5 form an orthonormal basis of the Cameron-Martin space (see Definition 7) of a Gaussian measure. The criterion (123) then provides an effective way of showing that the measure in question can be realised on a space of Hölder continuous functions.
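To illustrate how the criterion (123) is applied, consider the illustrative choice (ours, not from the text) \(\psi_k(x) = k^{-1}\sin(k\pi x)\) on D = (0, 1): then \(\|\psi_k\|_{L^\infty} = k^{-1}\) and each \(\psi_k\) is Lipschitz (α = 1) with \(h(1,\psi_k) = \pi\), so \(S_1 = \pi^2/6\) and \(S_2 < \infty\) for every δ < 1. Corollary 5 then gives Hölder continuity for every exponent below 1/2, as one expects for this Brownian-bridge-like series.

```python
import numpy as np

# Partial sums of the two series in (123) for psi_k(x) = sin(k*pi*x)/k:
# ||psi_k||_inf = 1/k and the Lipschitz constant is h(1, psi_k) = pi.
K = 100_000
k = np.arange(1, K + 1)
sup_norms = 1.0 / k
h = np.full(K, np.pi)

S1 = np.sum(sup_norms**2)                        # converges to pi^2/6
delta = 0.9                                      # any delta < 1 works here
S2 = np.sum(sup_norms**(2 - delta) * h**delta)   # finite since 2 - delta > 1

# one draw of the truncated random series (121) with standard normal xi_k
rng = np.random.default_rng(1)
kk = np.arange(1, 1001)
xi = rng.standard_normal(kk.size)
x = np.linspace(0.0, 1.0, 501)
u = ((xi / kk)[:, None] * np.sin(np.pi * np.outer(kk, x))).sum(axis=0)

assert abs(S1 - np.pi**2 / 6) < 1e-4
assert np.isfinite(S2) and S2 > 0
assert np.isfinite(u).all()
```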

1.3 A.3 Gaussian Measures

1.3.1 A.3.1 Separable Banach Space Setting

We start with the definition of a Gaussian measure on a separable Banach space B. There is no equivalent to Lebesgue measure in infinite dimensions (as it could not be \(\sigma\)-additive), and so we cannot define a Gaussian measure by prescribing the form of its density. However, note that Gaussian measures on \(\mathbb{R}^{n}\) can be characterised by prescribing that the projections of the measure onto any one-dimensional subspace of \(\mathbb{R}^{n}\) are all Gaussian. This is a property that can readily be generalised to infinite-dimensional spaces:

Definition 6.

A Gaussian probability measure μ on a separable Banach space B is a Borel measure such that \(\ell^{\sharp }\mu\) is a Gaussian probability measure on \(\mathbb{R}\) for every continuous linear functional \(\ell: B \rightarrow \mathbb{R}\). (Here, Dirac measures are considered to be Gaussian measures with zero variance.) The measure is said to be centred if \(\ell^{\sharp }\mu\) has mean zero for every \(\ell\).

This is a reasonable definition since, provided that B is separable, the one-dimensional projections of any probability measure carry sufficient information to characterise it – see Lemma 17. We now state an important result which controls the tails of Gaussian distributions:

Theorem 31 (Fernique).

Let μ be a Gaussian probability measure on a separable Banach space B. Then, there exists α > 0 such that \(\int _{B}\exp (\alpha \|x\|^{2})\,\mu (dx) < \infty \).

As a consequence of the Fernique theorem and Corollary 3, every Gaussian measure μ admits a compact covariance operator C μ given by (113), because the second moment is bounded. In fact the techniques used to prove the Fernique theorem show that, if \(M =\int _{B}\|x\|\,\mu (dx)\), then there is a global constant K > 0 such that

$$\displaystyle{ \int _{B}\|x\|^{2n}\,\mu (dx) \leq n!K\alpha ^{-n}M^{2n}. }$$
(124)

Since the covariance operator , and hence the mean, exist for a Gaussian measure, and since they may be shown to characterize the measure completely, we write N(m, C μ ) for a Gaussian with mean m and covariance operator C μ .

Measures in infinite-dimensional spaces are typically mutually singular. Furthermore, two Gaussian measures are either mutually singular or equivalent (mutually absolutely continuous). The Cameron-Martin space plays a key role in characterizing whether or not two Gaussians are equivalent.

Definition 7.

The Cameron-Martin space \(\mathcal{H}_{\mu }\) of measure μ on a separable Banach space B is the completion of the linear subspace \(\mathring{\mathcal{H}}_{\mu }\subset B\) defined by

$$\displaystyle{ \mathring{\mathcal{H}}_{\mu } =\{ h \in B\, :\, \exists \,h^{{\ast}}\in B^{{\ast}}\;\text{with}\;h = C_{\mu }h^{{\ast}}\}\;, }$$
(125)

under the norm \(\|h\|_{\mu }^{2} =\langle h,h\rangle _{\mu } = h^{{\ast}}(C_{\mu }h^{{\ast}})\). It is a Hilbert space when endowed with the scalar product \(\langle h,k\rangle _{\mu } = h^{{\ast}}(C_{\mu }k^{{\ast}}) = h^{{\ast}}(k) = k^{{\ast}}(h)\).

The Cameron-Martin space is actually independent of the space B in the sense that, although we may view the measure as living on a range of separable Hilbert or Banach spaces, the Cameron-Martin space will be the same in all cases. The space characterizes exactly the directions in which a centred Gaussian measure may be shifted to obtain an equivalent Gaussian measure:

Theorem 32 (Cameron-Martin).

For h ∈ B, define the map \(T_{h}: B \rightarrow B\) by T h (x) = x + h. Then, the measure \(T_{h}^{\sharp }\mu\) is absolutely continuous with respect to μ if and only if \(h \in \mathcal{H}_{\mu }\) . Furthermore, in the latter case, its Radon-Nikodym derivative is given by

$$\displaystyle{ \frac{dT_{h}^{\sharp }\mu } {d\mu } (u) =\exp {\big( h^{{\ast}}(u) -\frac{1} {2}\|h\|_{\mu }^{2}\big)} }$$

where h = C μ h .

Thus, this theorem characterizes the Radon-Nikodym derivative of the measure N(h, C μ ) with respect to the measure N(0, C μ ). Below, in the Hilbert space setting, we also consider changes in the covariance operator which lead to equivalent Gaussian measures. However, before moving to the Hilbert space setting, we conclude this subsection with several useful observations concerning Gaussians on separable Banach spaces. The topological support of a measure μ on the separable Banach space B is the set of all u ∈ B such that every neighborhood of u has positive measure.
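In finite dimensions the Cameron-Martin formula can be verified directly from the Gaussian densities. The following sketch (our own finite-dimensional check, not from the text) confirms, for μ = N(0, K) on \(\mathbb{R}^2\), that the density ratio of N(h, K) to N(0, K) at u equals \(\exp(h^*(u) - \tfrac{1}{2}\|h\|_\mu^2)\) with \(h^* = K^{-1}h\) and \(\|h\|_\mu^2 = \langle h, K^{-1}h\rangle\).

```python
import numpy as np
from scipy.stats import multivariate_normal

K = np.array([[2.0, 0.5], [0.5, 1.0]])   # covariance, positive definite
h = np.array([0.3, -0.7])                # shift direction
h_star = np.linalg.solve(K, h)           # h^* = K^{-1} h
cm_norm_sq = h @ h_star                  # Cameron-Martin norm squared

mu0 = multivariate_normal(mean=np.zeros(2), cov=K)
muh = multivariate_normal(mean=h, cov=K)

# density ratio vs. the Cameron-Martin Radon-Nikodym formula
for u in [np.array([0.1, 0.2]), np.array([-1.0, 0.5])]:
    ratio = muh.pdf(u) / mu0.pdf(u)
    rn = np.exp(h_star @ u - 0.5 * cm_norm_sq)
    assert abs(ratio - rn) < 1e-12
```

In infinite dimensions no densities exist, but this ratio survives as the Radon-Nikodym derivative of Theorem 32 precisely when h lies in the Cameron-Martin space.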

Theorem 33.

The topological support of a centred Gaussian measure μ on B is the closure of the Cameron-Martin space in B. Furthermore the Cameron-Martin space is dense in B. Therefore all balls in B have positive μ-measure.

Since the Cameron-Martin space of a Gaussian measure μ is independent of the space on which we view the measure as living, the following useful theorem shows that the unit ball in the Cameron-Martin space is compact in any separable Banach space X for which μ(X) = 1:

Theorem 34.

The closed unit ball in the Cameron-Martin space \(\mathcal{H}_{\mu }\) is compactly embedded into the separable Banach space B.

In the setting of Gaussian measures on a separable Banach space, all balls have positive probability. The Cameron-Martin norm is useful in the characterization of small-ball properties of Gaussians. Let \(B^{\delta }(z)\) denote a ball of radius \(\delta\) in B centred at a point \(z \in \mathcal{H}_{\mu }\).

Theorem 35.

The ratio of small ball probabilities under a Gaussian measure μ satisfies

$$\displaystyle{ \lim _{\delta \rightarrow 0}\frac{\mu {\big(B^{\delta }(z_{1})\big)}} {\mu {\big(B^{\delta }(z_{2})\big)}} =\exp \left (\frac{1} {2}\|z_{2}\|_{\mu }^{2} -\frac{1} {2}\|z_{1}\|_{\mu }^{2}\right ). }$$

Example 13.

Let μ denote the Gaussian measure N(0, K) on \(\mathbb{R}^{n}\) with K positive definite. Then Theorem 35 is the statement that

$$\displaystyle{ \lim _{\delta \rightarrow 0}\frac{\mu {\big(B^{\delta }(z_{1})\big)}} {\mu {\big(B^{\delta }(z_{2})\big)}} =\exp \left (\frac{1} {2}\vert K^{-\frac{1} {2} }z_{2}\vert ^{2} -\frac{1} {2}\vert K^{-\frac{1} {2} }z_{1}\vert ^{2}\right ) }$$

which follows directly from the fact that the Gaussian measure at point \(z \in \mathbb{R}^{n}\) has Lebesgue density proportional to \(\exp \left (-\frac{1} {2}\vert K^{-\frac{1} {2} }z\vert ^{2}\right )\) and the fact that the Lebesgue density is a continuous function.
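The one-dimensional case of Example 13 is easy to check numerically. The sketch below (ours) uses the exact ball probability \(\mu(B^\delta(z)) = \Phi((z+\delta)/\sigma) - \Phi((z-\delta)/\sigma)\) for μ = N(0, σ²) and observes the ratio converging to \(\exp\big(\tfrac{1}{2\sigma^2}(z_2^2 - z_1^2)\big)\) as δ → 0.

```python
import numpy as np
from scipy.stats import norm

sigma, z1, z2 = 1.0, 0.8, 0.3

def ball_prob(z, delta):
    # mu(B_delta(z)) for mu = N(0, sigma^2) on the real line
    return norm.cdf((z + delta) / sigma) - norm.cdf((z - delta) / sigma)

limit = np.exp((z2**2 - z1**2) / (2 * sigma**2))   # predicted small-ball limit
ratios = [ball_prob(z1, d) / ball_prob(z2, d) for d in (1e-1, 1e-3, 1e-5)]

assert abs(ratios[-1] - limit) < 1e-6                       # ratio reaches the limit
assert abs(ratios[-1] - limit) < abs(ratios[0] - limit)     # and improves as delta shrinks
```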

1.3.2 A.3.2 Separable Hilbert Space Setting

In these notes our approach is primarily based on defining Gaussian measures on Hilbert space; the Banach spaces which are of full measure under the Gaussian are then determined via the Kolmogorov continuity theorem . In this subsection we develop the theory of Gaussian measures in greater detail within the Hilbert space setting. Throughout \({\big(\mathcal{H},\langle \cdot,\cdot \rangle,\|\cdot \|\big)}\) denotes the separable Hilbert space on which the Gaussian is constructed. Actually, in this Hilbert space setting, the covariance operator C μ has considerably more structure than just the boundedness implied by (124): it is trace class and hence necessarily compact on \(\mathcal{H}\):

Lemma 23.

A Gaussian measure μ on a separable Hilbert space \(\mathcal{H}\) has covariance operator \(C_{\mu }: \mathcal{H}\rightarrow \mathcal{H}\) which is trace class and satisfies

$$\displaystyle{ \int _{\mathcal{H}}\|x\|^{2}\ \mu (dx) =\mathrm{ Tr}\ C_{\mu}. }$$
(126)

Conversely, for every positive trace class symmetric operator \(K : \mathcal{H}\rightarrow \mathcal{H}\) , there exists a Gaussian measure μ on \(\mathcal{H}\) such that C μ = K.

Since the covariance operator \(C_{\mu } : \mathcal{H}\rightarrow \mathcal{H}\) of a Gaussian on \(\mathcal{H}\) is a compact operator, it follows that if operator \(C_{\mu } : \mathcal{H}\rightarrow \mathcal{H}\) has an inverse, then it will be a densely defined unbounded operator on \(\mathcal{H}\); we call this the precision operator . Both the covariance and the precision operators are self-adjoint on appropriate domains, and fractional powers of them may be defined via the spectral theorem.

Theorem 36 (Cameron-Martin Space on Hilbert Space).

Let μ be a Gaussian measure on a Hilbert space \(\mathcal{H}\) with strictly positive covariance operator K. Then the Cameron-Martin space \(\mathcal{H}_{\mu }\) consists of the image of \(\mathcal{H}\) under K 1∕2 and the Cameron-Martin norm is given by \(\|h\|_{\mu }^{2} =\| K^{-\frac{1} {2} }h\|^{2}\).

Example 14.

Consider two Gaussian measures μ i on \(\mathcal{H} = L^{2}(J),J = (0,1)\), both with precision operator \(L = -\frac{d^{2}} {dx^{2}}\) where \(\mathcal{D}(L) = H_{0}^{1}(J) \cap H^{2}(J)\). (Informally − L is the Laplacian on J with homogeneous Dirichlet boundary conditions.) Let \(\mathcal{C}\) denote the inverse of L on \(\mathcal{H}\). Assume that \(\mu _{1} \sim N(m,\mathcal{C})\) and \(\mu _{2} \sim N(0,\mathcal{C})\). Then \(\mathcal{H}_{\mu _{i}}\) is the image of \(\mathcal{H}\) under \(\mathcal{C}^{\frac{1} {2} }\), which is the space \(H_{0}^{1}(J)\). It follows that the measures are equivalent if and only if \(m \in H_{0}^{1}(J)\). If this condition is satisfied then, from Theorems 32 and 36, the Radon-Nikodym derivative between the two measures is given by

$$\displaystyle{\frac{d\mu _{1}} {d\mu _{2}}(x) =\exp {\big(\langle m,x\rangle _{H_{0}^{1}} -\frac{1} {2}\|m\|_{H_{0}^{1}}^{2}\big)}.\quad \square }$$
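The equivalence condition in Example 14 can be tested for a concrete mean. As our own illustration (the choice m(x) = x(1 − x) is an assumption), membership in \(H_0^1(J)\) amounts to summability of \(\sum_k (k\pi)^2 m_k^2\) for the sine coefficients \(m_k = \langle m, \sqrt{2}\sin(k\pi x)\rangle\); two integrations by parts give \(m_k = 2\sqrt{2}(1-(-1)^k)/(k\pi)^3\), and the sums recover \(\int m^2 = 1/30\) and \(\int |m'|^2 = 1/3\).

```python
import numpy as np

# Sine coefficients of m(x) = x(1-x) in the orthonormal basis sqrt(2) sin(k pi x)
K = 10_001
k = np.arange(1, K + 1)
m_k = 2 * np.sqrt(2) * (1 - (-1.0) ** k) / (k * np.pi) ** 3   # zero for even k

l2_norm_sq = np.sum(m_k**2)                       # Parseval: int_0^1 m^2 dx = 1/30
h01_norm_sq = np.sum((k * np.pi) ** 2 * m_k**2)   # int_0^1 (m')^2 dx = 1/3, so m in H_0^1

assert abs(l2_norm_sq - 1 / 30) < 1e-10
assert abs(h01_norm_sq - 1 / 3) < 1e-8
```

Since the weighted sum converges, this mean shift produces an equivalent Gaussian measure; a mean with coefficients decaying like \(k^{-1}\), by contrast, would fail the test.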

We now turn to the Feldman-Hájek theorem in the Hilbert Space setting. Let \(\{\varphi _{j}\}_{j=1}^{\infty }\) denote an orthonormal basis for \(\mathcal{H}\). Then the Hilbert-Schmidt norm of a linear operator \(L : \mathcal{H}\rightarrow \mathcal{H}\) is defined by

$$\displaystyle{ \| L \| _{\mathrm{HS}}^{2} := \sum _{ j=1}^{\infty } \| L \varphi _{j} \|^{2}.}$$

The value of the norm is, in fact, independent of the choice of orthonormal basis. In the finite-dimensional setting, the norm is known as the Frobenius norm .

Theorem 37 (Feldman-Hájek on Hilbert Space).

Let μ i with i = 1,2 be two Gaussian measures on some fixed Hilbert space \(\mathcal{H}\) with means m i and strictly positive covariance operators \(\mathcal{C}_{i}\) . Then the following hold:

  1. 1.

    μ 1 and μ 2 are either singular or equivalent.

  2. 2.

    The measures μ 1 and μ 2 are equivalent Gaussian measures if and only if:

    1. a)

      The images of \(\mathcal{H}\) under \(\mathcal{C}_{i}^{\frac{1} {2} }\) coincide for i = 1,2, and we denote this common image space by E;

    2. b)

      \(m_{1} - m_{2} \in E\);

    3. c)

      \(\|(\mathcal{C}_{1}^{-1/2}\mathcal{C}_{2}^{1/2})(\mathcal{C}_{1}^{-1/2}\mathcal{C}_{2}^{1/2})^{{\ast}}- I\|_{\mathrm{HS}} < \infty \).

Remark 7.

The final condition may be replaced by the condition that

$$\displaystyle{\|(\mathcal{C}_{1}^{1/2}\mathcal{C}_{ 2}^{-1/2})(\mathcal{C}_{ 1}^{1/2}\mathcal{C}_{ 2}^{-1/2})^{{\ast}}- I\|_{\mathrm{ HS}} < \infty }$$

and the theorem remains true; this formulation is sometimes useful.

Example 15.

Consider two mean-zero Gaussian measures μ i on \(\mathcal{H} = L^{2}(J),J = (0,1)\) with precision operators \(L_{1} = -\frac{d^{2}} {dx^{2}} + I\) and \(L_{2} = -\frac{d^{2}} {dx^{2}}\), respectively, both with domain \(H_{0}^{1}(J) \cap H^{2}(J)\). The operators L 1, L 2 share the same eigenfunctions

$$\displaystyle{\phi _{k}(x) = \sqrt{2}\sin \left (k\pi x\right )}$$

and have eigenvalues

$$\displaystyle{\lambda _{k}(1) =\lambda _{k}(2) + 1,\quad \lambda _{k}(2) = k^{2}\pi ^{2},}$$

respectively. Thus \(\mu _{1} \sim N(0,\mathcal{C}_{1})\) and \(\mu _{2} \sim N(0,\mathcal{C}_{2})\) where, in the basis of eigenfunctions, \(\mathcal{C}_{1}\) and \(\mathcal{C}_{2}\) are diagonal with eigenvalues

$$\displaystyle{ \frac{1} {k^{2}\pi ^{2} + 1},\quad \frac{1} {k^{2}\pi ^{2}}}$$

respectively. We have, for \(h_{k} =\langle h,\phi _{k}\rangle\),

$$\displaystyle{ \frac{\pi ^{2}} {\pi ^{2} + 1} \leq \frac{\langle h,\mathcal{C}_{1}h\rangle } {\langle h,\mathcal{C}_{2}h\rangle } = \frac{\sum \nolimits_{k\in \mathbb{Z}^{+}}(1 + k^{2}\pi ^{2})^{-1}h_{k}^{2}} {\sum \nolimits_{k\in \mathbb{Z}^{+}}(k\pi )^{-2}h_{k}^{2}} \leq 1.}$$

From this it follows that the Cameron-Martin spaces of the two measures coincide and are equal to H 0 1(J). Notice that

$$\displaystyle{T = \mathcal{C}_{1}^{-\frac{1} {2} }\mathcal{C}_{2}\mathcal{C}_{1}^{-\frac{1} {2} } - I}$$

is diagonalized in the same basis as the \(\mathcal{C}_{i}\) and has eigenvalues

$$\displaystyle{ \frac{1} {k^{2}\pi ^{2}}.}$$

These are square summable and so by Theorem 37 the two measures are absolutely continuous with respect to one another.
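As a numerical companion to Example 15 (our sketch, not from the text), the eigenvalues of T are \(1/(k^2\pi^2)\), and condition (c) of Theorem 37 requires their square summability; the sum of squares is \(\sum_k 1/(k^4\pi^4) = 1/90\), since \(\sum_k k^{-4} = \pi^4/90\).

```python
import numpy as np

# Eigenvalues of T = C_1^{-1/2} C_2 C_1^{-1/2} - I from Example 15
k = np.arange(1, 1_000_001)
t = 1.0 / (k**2 * np.pi**2)

# squared Hilbert-Schmidt norm of T: finite, so the measures are equivalent
hs_norm_sq = np.sum(t**2)

assert abs(hs_norm_sq - 1 / 90) < 1e-10
```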

1.4 A.4 Wiener Processes in Infinite-Dimensional Spaces

Central to the theory of stochastic PDEs is the notion of a cylindrical Wiener process , which can be thought of as an infinite-dimensional generalization of a standard n-dimensional Wiener process. This leads to the notion of the A-Wiener process for certain classes of operators A. Before we proceed to the definition and construction of such Wiener processes in separable Hilbert spaces, let us recall a few basic facts about stochastic processes in general.

In general, a stochastic process u over a probability space \((\varOmega,\mathcal{F}, \mathbb{P})\) and taking values in a separable Hilbert space \(\mathcal{H}\) is nothing but a collection {u(t)} of \(\mathcal{H}\)-valued random variables indexed by time \(t \in \mathbb{R}\) (or taking values in some subset of \(\mathbb{R}\)). By Kolmogorov’s Extension Theorem  29, we can also view this as a map \(u: \varOmega \rightarrow \mathcal{H}^{\mathbb{R}}\), where \(\mathcal{H}^{\mathbb{R}}\) is endowed with the product sigma-algebra. A notable special case which will be of interest here is the case where the probability space is taken to be \(\varOmega = C([0,T],\mathcal{H})\) (or some other space of \(\mathcal{H}\)-valued continuous functions) endowed with some Gaussian measure \(\mathbb{P}\) and where the process u is given by

$$\displaystyle{ u(t)(\omega ) =\omega (t)\;,\qquad \omega \in \varOmega \;. }$$

In this case, u is called the canonical process on Ω.

The usual (one-dimensional) Wiener process is a real-valued centred Gaussian process B(t) such that B(0) = 0 and \(\mathbb{E}\vert B(t) - B(s)\vert ^{2} = \vert t - s\vert \) for any pair of times s, t. From our point of view, the Wiener process on any finite time interval I can always be realised as the canonical process for the Gaussian measure on \(C(I, \mathbb{R})\) with covariance function \(c(s,t) = s \wedge t =\min \{ s,t\}\). Note that such a measure exists by the Kolmogorov continuity test , and Corollary 4 in particular.

The standard n-dimensional Wiener process B(t) is simply given by n independent copies of a standard one-dimensional Wiener process \(\{\beta _{j}\}_{j=1}^{n}\), so that its covariance is given by

$$\displaystyle{ \mathbb{E}\beta _{i}(s)\beta _{j}(t) = (s \wedge t)\delta _{i,j}\;. }$$

In other words, if u and v are any two elements in \(\mathbb{R}^{n}\), we have

$$\displaystyle{ \mathbb{E}\langle u,B(s)\rangle \langle B(t),v\rangle = (s \wedge t)\langle u,v\rangle \;. }$$

This is the characterization that we will now extend to an arbitrary separable Hilbert space \(\mathcal{H}\). One natural way of constructing such an extension is to fix an orthonormal basis \(\{e_{n}\}_{n\geq 1}\) of \(\mathcal{H}\) and a countable collection \(\{\beta _{j}\}_{j=1}^{\infty }\) of independent one-dimensional Wiener processes, and to set

$$\displaystyle{ B(t) :=\sum _{ n=1}^{\infty }\beta _{ n}(t)\,e_{n}\;. }$$
(127)

If we define

$$\displaystyle{ B^{N}(t) :=\sum _{ n=1}^{N}\beta _{ n}(t)\,e_{n}\; }$$

then clearly \(\mathbb{E}\|B^{N}(t)\|_{\mathcal{H}}^{2} = tN\) and so the series will not converge in \(\mathcal{H}\) for fixed t > 0. However the expression (127) is nonetheless the right way to think of a cylindrical Wiener process on \(\mathcal{H}\); indeed for fixed t > 0 the truncated series for B N will converge in a larger space containing \(\mathcal{H}\). We define the following scale of Hilbert subspaces, for r > 0, by

$$\displaystyle{\mathcal{X}^{r} =\{ u \in \mathcal{H}\big\vert \sum _{ j=1}^{\infty }j^{2r}\vert \langle u,e_{ j}\rangle \vert ^{2} < \infty \}}$$

and then extend to superspaces r < 0 by duality. We use \(\|\cdot \|_{r}\) to denote the norm induced by the inner-product

$$\displaystyle{\langle u,v\rangle _{r} =\sum _{ j=1}^{\infty }j^{2r}u_{ j}v_{j}}$$

for \(u_{j} =\langle u,e_{j}\rangle\) and \(v_{j} =\langle v,e_{j}\rangle\). A simple argument, similar to that used to prove Theorem 8, shows that \(\{B^{N}(t)\}\) is, for fixed t > 0, Cauchy in \(\mathcal{X}^{r}\) for any \(r < -\frac{1} {2}.\) In fact it is possible to construct a stochastic process as the limit of the truncated series, living on the space \(C([0,\infty ),\mathcal{X}^{r})\) for any \(r < -\frac{1} {2}\), by the Kolmogorov Continuity Theorem 30 in the setting where D = [0, T] and \(X = \mathcal{X}^{r}.\) We give details in the more general setting that follows.

Building on the preceding, we now discuss the construction of a \(\mathcal{C}\)-Wiener process W, using the finite-dimensional case described in Remark 2 to guide us. Here \(\mathcal{C} : \mathcal{H}\rightarrow \mathcal{H}\) is assumed to be trace class, with eigenfunctions e j and eigenvalues γ j 2. Consider the cylindrical Wiener process given by

$$\displaystyle{B(t) =\sum _{ j=1}^{\infty }\beta _{ j}(t)e_{j},}$$

where \(\{\beta _{j}\}_{j=1}^{\infty }\) is an i.i.d. family of unit Brownian motions on \(\mathbb{R}\) with \(\beta _{j} \in C([0,\infty ); \mathbb{R})\). We note that

$$\displaystyle{ \mathbb{E}\vert \beta _{j}(t) -\beta _{j}(s)\vert ^{2} = \vert t - s\vert. }$$
(128)

Since \(\sqrt{\mathcal{C}}e_{j} =\gamma _{j}e_{j}\), the \(\mathcal{C}\)-Wiener process \(W = \sqrt{\mathcal{C}}B\) is then

$$\displaystyle{ W(t) =\sum _{ j=1}^{\infty }\gamma _{ j}\beta _{j}(t)e_{j}. }$$
(129)

The following formal calculation gives insight into the properties of W:

$$\displaystyle\begin{array}{rcl} \mathbb{E}W(t) \otimes W(s)& =& \mathbb{E}{\big(\sum _{j=1}^{\infty }\sum _{ k=1}^{\infty }\gamma _{ j}\gamma _{k}\beta _{j}(t)\beta _{k}(s)e_{j} \otimes e_{k}\big)} {}\\ & =& {\big(\sum _{j=1}^{\infty }\sum _{ k=1}^{\infty }\gamma _{ j}\gamma _{k}\mathbb{E}{\big(\beta _{j}(t)\beta _{k}(s)\big)}e_{j} \otimes e_{k}\big)} {}\\ & =& {\big(\sum _{j=1}^{\infty }\sum _{ k=1}^{\infty }\gamma _{ j}\gamma _{k}\delta _{jk}(t \wedge s)e_{j} \otimes e_{k}\big)} {}\\ & =& \sum _{j=1}^{\infty }{\big(\gamma _{ j}^{2}e_{ j} \otimes e_{j}\big)}t \wedge s {}\\ & =& \mathcal{C}\,(t \wedge s). {}\\ \end{array}$$

Thus the process has the covariance structure of Brownian motion in time, and covariance operator \(\mathcal{C}\) in space. Hence the name \(\mathcal{C}\)-Wiener process.

Assume now that the sequence \(\gamma =\{\gamma _{j}\}_{j=1}^{\infty }\) is such that \(\sum \nolimits_{j=1}^{\infty }j^{2r}\gamma _{j}^{2} = M < \infty \) for some \(r \in \mathbb{R}.\) For fixed t it is then possible to construct a stochastic process as the limit of the truncated series

$$\displaystyle{ W^{N}(t) =\sum _{ j=1}^{N}\gamma _{ j}\beta _{j}(t)e_{j}, }$$

by means of a Cauchy sequence argument in \(L_{\mathbb{P}}^{2}(\varOmega ;\mathcal{X}^{r})\). Similarly W(t) − W(s) may be defined for any t, s. We may then also discuss the regularity of this process in time. Together, equations (128) and (129) give \(\mathbb{E}\|W(t) - W(s)\|_{r}^{2} = M\vert t - s\vert.\) It follows that \(\mathbb{E}\|W(t) - W(s)\|_{r} \leq M^{\frac{1} {2} }\vert t - s\vert ^{\frac{1} {2} }.\) Furthermore, since W(t) − W(s) is Gaussian, we have by (124) that \(\mathbb{E}\|W(t) - W(s)\|_{r}^{2q} \leq K_{q}\vert t - s\vert ^{q}.\) Applying the Kolmogorov continuity test of Theorem 30 then demonstrates that the process given by (129) may be viewed as an element of the space \(C^{0,\alpha }([0,T];\mathcal{X}^{r})\) for any \(\alpha < \frac{1} {2}.\) Similar arguments may be used to study the cylindrical Wiener process, showing that it lives in \(C^{0,\alpha }([0,T];\mathcal{X}^{r})\) for \(\alpha < \frac{1} {2}\) and \(r < -\frac{1} {2}.\)
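A minimal truncated simulation of the construction (129) can make this concrete. The sketch below is ours; the choice \(\gamma_j = 1/j\) is an assumption (making \(\mathcal{C}\) trace class), paths of the coefficient processes \(\gamma_j\beta_j(t)\) are built from Brownian increments, and a Monte Carlo check confirms that \(\mathbb{E}\|W(1)\|^2\) matches the truncated trace \(\sum_{j\le J}\gamma_j^2\), as in (126).

```python
import numpy as np

rng = np.random.default_rng(0)
J, n_steps, n_samples = 50, 100, 20_000
gamma = 1.0 / np.arange(1, J + 1)      # assumed eigenvalues of sqrt(C)
dt = 1.0 / n_steps

# one sample path of the first J coefficient processes gamma_j beta_j(t) on [0,1]
dB = rng.standard_normal((J, n_steps)) * np.sqrt(dt)   # Brownian increments
path = gamma[:, None] * np.cumsum(dB, axis=1)          # coefficients of W(t_i) in basis e_j

# Monte Carlo check at t = 1: W(1) has independent coefficients gamma_j * N(0,1),
# so E ||W(1)||^2 should equal the truncated trace of C
Z = rng.standard_normal((n_samples, J))
mean_sq = ((gamma * Z) ** 2).sum(axis=1).mean()
trace = np.sum(gamma**2)

assert abs(mean_sq - trace) / trace < 0.05
assert path.shape == (J, n_steps)
```

Repeating the experiment with \(\gamma_j \equiv 1\) (the cylindrical case) makes the truncated second moment grow like J, reproducing the divergence noted for \(B^N(t)\) above.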

1.5 A.5 Bibliographical Notes

  • Section A.1 introduces various Banach and Hilbert spaces, as well as the notion of separability; see [100]. In the context of PDEs, see [33] and [87], for all of the function spaces defined in Sects. A.1.1–A.1.3; Sobolev spaces are developed in detail in [2]. The nonseparability of the Hölder spaces C 0, β and the separability of C 0 0, β are discussed in [40]. For asymptotics of the eigenvalues of the Laplacian operator see [91, Chapter 11]. For discussion of the more general spaces of E-valued functions over a measure space \((\mathcal{M},\nu )\) we refer the reader to [100]. Section A.1.5 concerns Sobolev embedding theorems, building rather explicitly on the case of periodic functions. For the corresponding embedding results in domains with more general boundary conditions, or on more general manifolds or unbounded domains, we refer to the comprehensive series of monographs [95–97]. The interpolation inequality of (106) and Lemma 15 may be found in [87]; see also Proposition 6.10 and Corollary 6.11 of [40]. The proof of Theorem 28 closely follows that given in [40, Theorem 6.16], and is a slight generalization to the Hilbert scale setting used here.

  • Section A.2 briefly introduces the theory of probability measures on infinite-dimensional spaces. We refer to the extensive treatise by Bogachev [15], and to the much shorter but more readily accessible book by Billingsley [12], for more details. The subject of independent sequences of random variables, as overviewed in Sect. A.2.1 in the i.i.d. case, is discussed in detail in [27, section 1.5.1]. The Kolmogorov Extension Theorem 29 is proved in numerous texts in the setting where \(X = \mathbb{R}\) [79]; since any uncountable Polish space is Borel isomorphic to \(\mathbb{R}\), it may be stated as it is here. Proofs of Lemmas 17 and 19 may be found in [40], where they appear as Proposition 3.6 and Proposition 3.9, respectively. For (115) see [28, Chapter 2]. In Sect. A.2.2 we introduce the Bochner integral; see [13, 48] for further details. Lemma 18 and the resulting Corollary 3 are stated and proved in [14]. The topic of metrics on probability measures, introduced in Sect. A.2.4, is overviewed in [38], where detailed references to the literature on the subject may also be found; the second inequality in Lemma 22 is often termed the Pinsker inequality and can be found in [22]. Note that the choice of normalization constants in the definitions of the total variation and Hellinger metrics differs in the literature. For a more detailed account of material on weak convergence of probability measures we refer, for example, to [12, 15, 98]. A proof of the Kolmogorov continuity test as stated in Theorem 30 can be found in [85, p. 26] for the simple case of D an interval and X a separable Banach space; the generalization given here may be found in a forthcoming up-to-date version of [40].

  • The subject of Gaussian measures, as introduced in Sect. A.3, is comprehensively studied in [14] in the setting of locally convex topological spaces, including separable Banach spaces as a special case. See also [67], which is concerned with Gaussian random functions. The Fernique Theorem 31 is proved in [35], and the reader is directed to [40] for a very clear exposition. In Theorem 31 it is possible to take for α any value smaller than \(1/(2\|C_{\mu }\|)\), and this value is sharp: see [66, Theorem 4.1]. See [14, 67] for more details on the Cameron-Martin space, and for a proof of Theorem 32. Theorem 33 follows from Theorem 3.6.1 and Corollary 3.5.8 of [14]: Theorem 3.6.1 shows that the topological support is the closure of the Cameron-Martin space in B, and Corollary 3.5.8 shows that the Cameron-Martin space is dense in B. The reproducing kernel Hilbert space for μ (or just reproducing kernel for short) appears widely in the literature and is isomorphic to the Cameron-Martin space in a natural way; as a result there is considerable confusion between the two. We retain in these notes the terminology from [14], but the reader should keep in mind that some authors use a slightly different terminology. Theorem 35 as stated is a consequence of Proposition 3 in Section 18 of [67]. Turning now to the Hilbert space setting, we note that Lemma 23 is proved as Proposition 3.15 of [40], and Theorem 36 appears there as Exercise 3.34. See [14, 28, 52] for alternative developments of the Cameron-Martin and Feldman-Hájek theorems. The original statement of the Feldman-Hájek Theorem 37 can be found in [34, 45]. Our statement of Theorem 37 mirrors Theorem 2.23 of [28], and Remark 7 is Lemma 6.3.1(ii) of [14]. Note that we have not stated a result analogous to Theorem 32 in the case of two equivalent Gaussian measures with differing covariances. Such a result can be stated, but is technically complicated in general because the ratio of normalization constants of approximating finite-dimensional measures can blow up as the limiting infinite-dimensional Radon-Nikodym derivative is attained; see Corollary 6.4.11 in [14].
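A draw from a Gaussian measure of the kind discussed here can be sketched via the random series construction used for the priors in the main text. In the snippet below the eigenvalue decay \(\lambda _{j} = j^{-2s}\) and the Fourier basis are illustrative assumptions of ours, not choices prescribed by the text:

```python
import numpy as np

# Hedged sketch: one draw u(x) = sum_j sqrt(lambda_j) xi_j e_j(x) from a
# mean-zero Gaussian measure on periodic functions on [0, 1].  The eigenvalue
# decay lambda_j = j^(-2s) and the Fourier basis are illustrative assumptions;
# xi_j are i.i.d. standard normals.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 513)
s = 1.5           # decay exponent; larger s gives smoother draws
J = 200           # truncation level of the random series

u = np.zeros_like(x)
for j in range(1, J + 1):
    lam = j ** (-2.0 * s)
    xi_c, xi_s = rng.standard_normal(2)
    u += np.sqrt(lam) * np.sqrt(2) * (xi_c * np.cos(2 * np.pi * j * x)
                                      + xi_s * np.sin(2 * np.pi * j * x))

print(u.shape)    # one sample path evaluated on the grid
```

Varying s changes the regularity of the draws, mirroring the way the Sobolev or Hölder regularity of prior samples is governed by the decay of the coefficients in the series construction.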

  • Section A.4 contains a discussion of cylindrical and \(\mathcal{C}\)-Wiener processes. The development is given in more detail in section 3.4 of [40], and in section 4.3 of [28].
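A truncated \(\mathcal{C}\)-Wiener process can likewise be simulated by a finite random series. In this sketch the eigenvalues \(\lambda _{j} = j^{-2}\) and the sine basis are our own illustrative assumptions:

```python
import numpy as np

# Hedged sketch: a truncated C-Wiener process
#   W(t) = sum_j sqrt(lambda_j) beta_j(t) phi_j,
# with beta_j i.i.d. scalar Brownian motions simulated by cumulative sums of
# N(0, dt) increments.  lambda_j = j^(-2) and the sine basis phi_j are
# illustrative assumptions, not prescribed by the text.
rng = np.random.default_rng(1)
T, n_t, J = 1.0, 200, 50
dt = T / n_t
x = np.linspace(0.0, 1.0, 129)

# beta[j, k] = beta_j(t_k): Brownian paths from Gaussian increments.
increments = rng.standard_normal((J, n_t)) * np.sqrt(dt)
beta = np.cumsum(increments, axis=1)

lam = np.array([j ** -2.0 for j in range(1, J + 1)])
phi = np.array([np.sqrt(2) * np.sin(np.pi * j * x) for j in range(1, J + 1)])

# W[k, :] is the field W(t_k, .) evaluated on the spatial grid.
W = (np.sqrt(lam)[:, None] * beta).T @ phi
print(W.shape)    # (n_t, len(x))
```

Because the λ_j are summable, the series converges in the state space and W is a genuine \(\mathcal{C}\)-Wiener process in the limit J → ∞; taking λ_j ≡ 1 instead would give a truncation of the cylindrical Wiener process, which converges only in a larger space.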

Copyright information

© 2015 Springer International Publishing Switzerland

Cite this entry

Dashti, M., Stuart, A.M. (2015). The Bayesian Approach to Inverse Problems. In: Ghanem, R., Higdon, D., Owhadi, H. (eds) Handbook of Uncertainty Quantification. Springer, Cham. https://doi.org/10.1007/978-3-319-11259-6_7-1


  • Online ISBN: 978-3-319-11259-6
