The Principle of Energetic Consistency in Data Assimilation

Abstract

The preceding chapters have illustrated two essential features of data assimilation. First, to extract all the information available in the observations requires all the sources of uncertainty – in the initial conditions, the dynamics, and the observations – to be identified and accounted for properly in the data assimilation process. This task is complicated by the fact that the non-linear dynamical system actually being observed is typically an infinite-dimensional (continuum) system, whereas at one’s disposal is only a finite-dimensional (discrete) numerical model of the continuum system dynamics. Second, to formulate a computationally viable data assimilation algorithm requires some probabilistic assumptions and computational approximations to be made. Those made in four-dimensional variational (4D-Var) and ensemble Kalman filter (EnKF) methods have been discussed in Chapters Variational Assimilation (Talagrand) and Ensemble Kalman Filter: Status and Potential (Kalnay), respectively.


Notes

  1.

    Staniforth et al. (2003) point out that the total energy of such an atmosphere is actually \(E + E^{\prime}\), where \(E^{\prime}\) is the constant

    $$E^{\prime} = \frac{1}{g} \int\,\int\,\int_{0}^{1} \phi_{s} p_{t}\,\textrm{d}\sigma\,a^{2} \textrm{d} S = \frac{p_{t}}{g} \int \int \phi_{s}\,a^{2} \textrm{d} S\,,$$

    and they give corresponding expressions for E and \(E^{\prime}\) for deep and/or non-hydrostatic atmospheres. Also, note that a moisture variable is not considered to be an atmospheric state variable for the purposes of this chapter, since moisture in the atmosphere contributes only indirectly to the total energy integral. Thus, choosing state variables to be energy variables does not imply a choice of moisture variables. Dee and da Silva (2003) discuss the many practical considerations involved in the choice of moisture variables.

  2.

    To derive the PEC in the finite-dimensional case, apply the expectation operator to the identity

    $$||\textbf{s}||^{2} = ||\overline{\textbf{s}}||^{2} + 2(\overline{\textbf{s}},\textbf{s}^{\prime}) + ||\textbf{s}^{\prime}||^{2} \,,$$

    where \(\textbf{s}^{\prime} = \textbf{s} - \overline{\textbf{s}}\) and the time argument has been omitted, to obtain

    $$||\overline{\textbf{s}}||^{2} + {\mathcal E}||\textbf{s}^{\prime}||^{2} = {\mathcal E}||\textbf{s}||^{2}\,.$$

    Since \((\textbf{f},\textbf{g}) = \textbf{f}^{T}\textbf{B}\textbf{g}\) for some symmetric positive definite matrix \(\textbf{B}\) and all vectors \(\textbf{f},\textbf{g} \in {\mathcal H}\) in case \({\mathcal H}\) is finite-dimensional, it follows from the definition given by Eq. (2) that in this case the covariance operator \({\mathcal P}\) has matrix representation \(\textbf{P} = {\mathcal E}\textbf{s}^{\prime}\textbf{s}^{\prime T}\textbf{B}\), where the expectation operator applied to a matrix of random variables is defined to act elementwise as usual. Therefore

    $${\mathcal E}||\textbf{s}^{\prime}||^{2} = {\mathcal E}\textbf{s}^{\prime T}\textbf{B}\textbf{s}^{\prime} = {\mathcal E}\,\textrm{tr} \,\textbf{s}^{\prime}\textbf{s}^{\prime T}\textbf{B} = \textrm{tr}\,{\mathcal E}\textbf{s}^{\prime}\textbf{s}^{\prime T}\textbf{B} = \textrm{tr}\,\textbf{P} \,.$$
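The finite-dimensional trace identity above is easy to check numerically. The following sketch (illustrative dimensions and random matrices, using NumPy; not part of the original derivation) builds the matrix representation \(\textbf{P} = {\mathcal E}\textbf{s}^{\prime}\textbf{s}^{\prime T}\textbf{B}\) from samples and compares \(\textrm{tr}\,\textbf{P}\) with the sample mean of \(\textbf{s}^{\prime T}\textbf{B}\textbf{s}^{\prime}\):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 5, 200_000  # state dimension and sample count (illustrative)

# Symmetric positive definite B defining the inner product (f, g) = f^T B g
A = rng.standard_normal((n, n))
B = A @ A.T + n * np.eye(n)

# Samples of a random vector s; rows are realizations
L = rng.standard_normal((n, n))
s = rng.standard_normal((m, n)) @ L.T
sp = s - s.mean(axis=0)                  # s' = s - sample mean

# Matrix representation P = E[s' s'^T] B of the covariance operator
P = (sp.T @ sp / m) @ B

# Sample mean of ||s'||^2 = s'^T B s'
E_norm2 = np.mean(np.einsum('ij,jk,ik->i', sp, B, sp))

print(np.isclose(np.trace(P), E_norm2))  # True: tr P = E ||s'||^2
```

The agreement holds to rounding error even at finite sample size, since \(\textrm{tr}(\textbf{s}^{\prime}\textbf{s}^{\prime T}\textbf{B}) = \textbf{s}^{\prime T}\textbf{B}\textbf{s}^{\prime}\) sample by sample.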
  3.

    This follows from Appendix 1d. Note first that

    $$s^{k}[\textbf{g}] = {\mathcal E}((\textbf{g},\textbf{s})|\textbf{y}^{k})\,,$$

    where the time argument has been omitted, defines a random linear functional on \({\mathcal H}\). Let \(\{\textbf{g}_{i}\}_{i=1}^{\infty}\) be a countable orthonormal basis for \({\mathcal H}\). Then

    $$\left(s^{k}[\textbf{g}_{i}]\right)^{2} \leq {\mathcal E}((\textbf{g}_{i},\textbf{s})^{2}|\textbf{y}^{k})$$

    by the Schwarz inequality, and taking expectations gives

    $${\mathcal E}\left(s^{k}[\textbf{g}_{i}]\right)^{2} \leq {\mathcal E}(\textbf{g}_{i},\textbf{s})^{2}\,,$$

    for \(i = 1, 2, \ldots\). Therefore,

    $$\sum_{i=1}^{\infty}{\mathcal E}\left(s^{k}[\textbf{g}_{i}]\right)^{2} \leq \sum_{i=1}^{\infty}{\mathcal E}(\textbf{g}_{i},\textbf{s})^{2} = {\mathcal E}\sum_{i=1}^{\infty}(\textbf{g}_{i},\textbf{s})^{2} = {\mathcal E}||\textbf{s}||^{2} < \infty\,.$$

    Hence by the construction of Appendix 1d, there exists an \({\mathcal H}\)-valued random variable \(\overline{\textbf{s}}^{k}\) such that \({\mathcal E}\|\overline{\textbf{s}}^{k}\|^{2} < \infty\) and, for all \(\textbf{g} \in {\mathcal H}\),

    $$(\textbf{g},\overline{\textbf{s}}^{k}) = s^{k}[\textbf{g}]$$

    with probability one. The construction shows that \(\overline{\textbf{s}}^{k}\) is defined uniquely on the set of \(\omega\in\varOmega \) where \(\sum_{i=1}^{\infty}\left(s^{k}[\textbf{g}_{i}]\right)^{2} < \infty\), which must have probability measure one.

  4.

    To see this, first define the conditional covariance functional

    $$C^{k}[\textbf{f},\textbf{g}] = {\mathcal E}\left[(\textbf{f},\textbf{s}-\overline{\textbf{s}}^{k})(\textbf{g},\textbf{s}-\overline{\textbf{s}}^{k})|\textbf{y}^{k}\right]\,,$$

    where the time argument has been omitted. The functional \(C^{k}\) is a symmetric, positive semidefinite, random bilinear functional on \({\mathcal H}\). As in Eq. (64) of Appendix 1c,

    $$\left|C^{k}[\textbf{f},\textbf{g}]\right| \leq ||\textbf{f}||\,||\textbf{g}||\,{\mathcal E}(||\textbf{s}-\overline{\textbf{s}}^{k}||^{2}|\textbf{y}^{k})$$

    for all \(\textbf{f},\textbf{g} \in {\mathcal H}\). Therefore, for each \(\omega \in \varOmega\) where \({\mathcal E}(\|\textbf{s}-\overline{\textbf{s}}^{k}\|^{2}|\textbf{y}^{k}) < \infty\), there exists a unique bounded linear operator \({\mathcal P}^{k}: {\mathcal H} \rightarrow {\mathcal H}\) such that \((\textbf{f},{\mathcal P}^{k}\textbf{g}) = C^{k}[\textbf{f},\textbf{g}]\) for all \(\textbf{f},\textbf{g} \in {\mathcal H}\), and this operator is self-adjoint, positive semidefinite, and trace class, with

    $$\textrm{tr}\,{\mathcal P}^{k} = {\mathcal E}(||\textbf{s}-\overline{\textbf{s}}^{k}||^{2}|\textbf{y}^{k})\,.$$

    But \({\mathcal E}(\|\textbf{s}-\overline{\textbf{s}}^{k}\|^{2}|\textbf{y}^{k}) < \infty\) with probability one, since

    $${\mathcal E}||\textbf{s}-\overline{\textbf{s}}^{k}||^{2} \leq 2{\mathcal E}||\textbf{s}||^{2} + 2{\mathcal E}||\overline{\textbf{s}}^{k}||^{2} < \infty$$

    by the parallelogram law. Thus the set of \(\omega \in \varOmega\) where \({\mathcal E}\left(\|\textbf{s}-\overline{\textbf{s}}^{k}\|^{2}|\textbf{y}^{k}\right) = \infty\) has probability measure zero. Upon defining \({\mathcal P}^{k}\) to be the zero operator on this set, it follows that \({\mathcal P}^{k}\) is bounded, self-adjoint, positive semidefinite and trace class for all \(\omega \in \varOmega\), and that Eqs. (7) and (8) hold with probability one.
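A finite-dimensional, Gaussian analogue of this construction can be checked directly. In the sketch below (my illustration, not from the chapter: a hypothetical linear-Gaussian observation model with the Euclidean inner product, i.e. \(\textbf{B} = \textbf{I}\)), the conditional covariance \({\mathcal P}^{k}\) is the same matrix for every realization of \(\textbf{y}^{k}\), and its trace matches the Monte Carlo estimate of \({\mathcal E}\|\textbf{s}-\overline{\textbf{s}}^{k}\|^{2}\):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, m = 4, 2, 400_000  # state dim, obs dim, sample count (illustrative)

# Linear-Gaussian model: y = H s + noise, with s ~ N(0, Sigma_ss)
C = rng.standard_normal((n, n))
Sigma_ss = C @ C.T + np.eye(n)
H = rng.standard_normal((p, n))
R = np.eye(p)

s = rng.multivariate_normal(np.zeros(n), Sigma_ss, size=m)
y = s @ H.T + rng.multivariate_normal(np.zeros(p), R, size=m)

# Conditional mean sbar = K y and conditional covariance Pk (independent of y)
K = Sigma_ss @ H.T @ np.linalg.inv(H @ Sigma_ss @ H.T + R)
sbar = y @ K.T
Pk = Sigma_ss - K @ H @ Sigma_ss

# tr Pk should match the Monte Carlo estimate of E ||s - sbar||^2
err2 = np.mean(np.sum((s - sbar) ** 2, axis=1))
print(np.isclose(np.trace(Pk), err2, rtol=0.02))  # True to Monte Carlo accuracy
```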

  5.

    This follows by taking expectations on the identity

    $$||\textbf{s}-\widetilde{\textbf{s}}^{k}||^{2} = ||\textbf{s}-\overline{\textbf{s}}^{k}||^{2} + 2 (\textbf{s}-\overline{\textbf{s}}^{k},\overline{\textbf{s}}^{k}-\widetilde{\textbf{s}}^{k}) + ||\overline{\textbf{s}}^{k}-\widetilde{\textbf{s}}^{k}||^{2}\,,$$

    where the time argument has been omitted, and noting that \({\mathcal E}(\textbf{s}-\overline{\textbf{s}}^{k},\overline{\textbf{s}}^{k}-\widetilde{\textbf{s}}^{k}) = 0\) since

    $${\mathcal E}\left[(\textbf{s}-\overline{\textbf{s}}^{k},\overline{\textbf{s}}^{k}-\widetilde{\textbf{s}}^{k})|\textbf{y}^{k}\right] = 0$$

    with probability one. The latter equality can be shown in the infinite-dimensional case as follows. Let \(\{\textbf{g}_{i}\}_{i=1}^{\infty}\) be a countable orthonormal basis for \({\mathcal H}\). Then

    $$\begin{aligned}{\mathcal E}\left[(\textbf{s}-\overline{\textbf{s}}^{k},\overline{\textbf{s}}^{k}-\widetilde{\textbf{s}}^{k})|\textbf{y}^{k}\right] &{}= {\mathcal E}\left[\sum_{i=1}^{\infty}(\textbf{g}_{i},\textbf{s}-\overline{\textbf{s}}^{k})(\textbf{g}_{i},\overline{\textbf{s}}^{k}-\widetilde{\textbf{s}}^{k})|\textbf{y}^{k}\right] \nonumber \\ &{}= \sum_{i=1}^{\infty}{\mathcal E}\left[(\textbf{g}_{i},\textbf{s}-\overline{\textbf{s}}^{k})(\textbf{g}_{i},\overline{\textbf{s}}^{k}-\widetilde{\textbf{s}}^{k})|\textbf{y}^{k}\right]\,, \end{aligned}$$

    since

    $${\mathcal E}\sum_{i=1}^{\infty}|(\textbf{g}_{i},\textbf{s}-\overline{\textbf{s}}^{k})(\textbf{g}_{i},\overline{\textbf{s}}^{k}-\widetilde{\textbf{s}}^{k})| \leq \left({\mathcal E}||\textbf{s}-\overline{\textbf{s}}^{k}||^{2}\right)^{1/2} \left({\mathcal E}||\overline{\textbf{s}}^{k}-\widetilde{\textbf{s}}^{k}||^{2}\right)^{1/2} < \infty\,;$$

    cf. Doob (1953, Property \(\textrm{CE}_{5}\), p. 23). But for each \(i = 1, 2, \ldots\),

    $${\mathcal E}\left[(\textbf{g}_{i},\textbf{s}-\overline{\textbf{s}}^{k})(\textbf{g}_{i},\overline{\textbf{s}}^{k}-\widetilde{\textbf{s}}^{k})|\textbf{y}^{k}\right] = (\textbf{g}_{i},\overline{\textbf{s}}^{k}-\widetilde{\textbf{s}}^{k})\,{\mathcal E}\left[(\textbf{g}_{i},\textbf{s}-\overline{\textbf{s}}^{k})|\textbf{y}^{k}\right] = 0$$

    with probability one, since \(\overline{\textbf{s}}^{k}-\widetilde{\textbf{s}}^{k}\) depends only on \(\textbf{y}^{k}\) and since \({\mathcal E}\left[(\textbf{g}_{i},\textbf{s}-\overline{\textbf{s}}^{k})|\textbf{y}^{k}\right] = 0\) with probability one.
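The orthogonality used above can also be seen numerically in a minimal scalar-observation example (my illustration; the model and the deliberately biased estimator are hypothetical). The estimation error \(\textbf{s}-\overline{\textbf{s}}^{k}\) is uncorrelated with any quantity that depends only on \(\textbf{y}^{k}\):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 500_000  # dimension and sample count (illustrative)

# Scalar observation of the first component: y = s_1 + noise, s standard normal
s = rng.standard_normal((m, n))
y = s[:, 0] + rng.standard_normal(m)

# Conditional mean: E[s | y] has first component y/2, the rest zero
sbar = np.zeros((m, n))
sbar[:, 0] = y / 2.0

# Any estimator depending only on y, e.g. a deliberately biased one
stilde = np.zeros((m, n))
stilde[:, 0] = 0.8 * y + 0.3

# E (s - sbar, sbar - stilde) vanishes: the error is orthogonal to
# everything measurable with respect to y
inner = np.mean(np.sum((s - sbar) * (sbar - stilde), axis=1))
print(abs(inner) < 0.01)  # True
```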

  6.

    For instance, if \({\mathcal H}^{N}\) consists of the elements of \({\mathcal H}\) that are constant on grid volumes \(V_{j}\) of a numerical model of hydrostatic atmospheric dynamics with an unstaggered grid, then the matrix \(\textbf{B}\) is the block-diagonal matrix with diagonal blocks

    $$\textbf{B}_{j} = \int\,\int\,\int_{V_{j}} \textbf{A} \,\textrm{d}\sigma\,a^{2} \textrm{d} S\,,$$

    where the diagonal matrix \(\textbf{A}\) was defined in Sect. 2.1. In the general case, \({\mathcal H}^{N}\) is isometrically isomorphic to the Hilbert space \({\mathcal G}^{N}\) of real N-vectors with the stated inner product and corresponding norm; cf. Reed and Simon (1972, Theorem II.7, p. 47). Thus, viewing the elements of \({\mathcal H}^{N}\) as real N-vectors means it is understood that an isometric isomorphism has been applied to elements of \({\mathcal H}^{N}\) to obtain elements of \({\mathcal G}^{N}\). Then \({\mathcal H}^{N}\)-valued random variables become \({\mathcal G}^{N}\)-valued random variables, because an isometric isomorphism is norm-continuous.

  7.

    If \({\mathcal D}_\textbf{x}\) is convex then \(\overline{\textbf{x}}_{t_{i}} \in {\mathcal D}_\textbf{x}\) for \(i = 0, \ldots, K\) (e.g. Cohn 2009, p. 454), and therefore \(\overline{\textbf{x}}_{t_{i}}^{k} \in {\mathcal D}_\textbf{x}\) with probability one, for \(i = 0, \ldots, K\).

  8.

    It follows that, for any sequence \(\{r_{n}\}_{n=0}^{\infty}\) with \(0 \leq r_{0} < r_{1} < r_{2} < \cdots \rightarrow \infty\), \(\varPhi = \cap_{n=0}^{\infty}\varPhi_{r_{n}}\) is a separable Fréchet space, and since the norms \(\|\cdot\|_{r_{n}}\) are Hilbertian seminorms on Φ, it is also a countably Hilbertian space. If \((\textbf{I}+\textbf{L})^{-p_{1}}\) is not just compact but in fact Hilbert-Schmidt, and if, for instance, \(p_{n} = np_{1}\), then \(\varPhi= \cap_{n=0}^{\infty}\varPhi_{p_{n}}\) is a countably Hilbertian nuclear space, and it is possible to define \(\varPhi^\prime\)-valued random variables, where \(\varPhi^\prime\) is the dual space of Φ. Such random variables are useful for stochastic differential equations in infinite-dimensional spaces (see the books of Itô 1984 and Kallianpur and Xiong 1995), but are not immediately important for the principle of energetic consistency developed in this chapter.

References

  • Anderson, J.L. and S.L. Anderson, 1999. A Monte Carlo implementation of the nonlinear filtering problem to produce ensemble assimilations and forecasts. Mon. Weather Rev., 127, 2741–2758.

  • Coddington, E.A. and N. Levinson, 1955. Theory of Ordinary Differential Equations, McGraw-Hill, New York.

  • Cohn, S.E., 1993. Dynamics of short-term univariate forecast error covariances. Mon. Weather Rev., 121, 3123–3149.

  • Cohn, S.E., 1997. An introduction to estimation theory. J. Meteor. Soc. Jpn., 75, 257–288.

  • Cohn, S.E., 2009. Energetic consistency and coupling of the mean and covariance dynamics. In Handbook of Numerical Analysis, vol. XIV, P.G. Ciarlet (ed.), pp. 443–478. Special Volume: Computational Methods for the Atmosphere and the Oceans, Temam, R.M. and J.J. Tribbia (guest eds.), Elsevier, Amsterdam.

  • Courant, R. and D. Hilbert, 1962. Methods of Mathematical Physics, vol. II: Partial Differential Equations, Wiley, New York.

  • Cox, H., 1964. On the estimation of state variables and parameters for noisy dynamic systems. IEEE Trans. Automat. Contr., 9, 5–12.

  • Dee, D.P. and A.M. da Silva, 2003. The choice of variable for atmospheric moisture analysis. Mon. Weather Rev., 131, 155–171.

  • Doob, J.L., 1953. Stochastic Processes, Wiley, New York.

  • Epstein, E.S., 1969. Stochastic dynamic prediction. Tellus, 21, 739–759.

  • Fisher, M., M. Leutbecher and G.A. Kelly, 2005. On the equivalence between Kalman smoothing and weak-constraint four-dimensional variational data assimilation. Q. J. R. Meteorol. Soc., 131, 3235–3246.

  • Fleming, R.J., 1971. On stochastic dynamic prediction I. The energetics of uncertainty and the question of closure. Mon. Weather Rev., 99, 851–872.

  • Houtekamer, P.L. and H.L. Mitchell, 2001. A sequential ensemble Kalman filter for atmospheric data assimilation. Mon. Weather Rev., 129, 123–137.

  • Houtekamer, P.L. and H.L. Mitchell, 2005. Ensemble Kalman filtering. Q. J. R. Meteorol. Soc., 131, 3269–3289.

  • Houtekamer, P.L. and Co-authors, 2005. Atmospheric data assimilation with an ensemble Kalman filter: Results with real observations. Mon. Weather Rev., 133, 604–620.

  • Itô, K., 1984. Foundations of Stochastic Differential Equations in Infinite Dimensional Spaces. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 47, Society for Industrial and Applied Mathematics, Philadelphia, PA.

  • Janjić, T. and S.E. Cohn, 2006. Treatment of observation error due to unresolved scales in atmospheric data assimilation. Mon. Weather Rev., 134, 2900–2915.

  • Jazwinski, A.H., 1970. Stochastic Processes and Filtering Theory, Academic Press, New York.

  • Kallianpur, G. and J. Xiong, 1995. Stochastic Differential Equations in Infinite Dimensional Spaces, Lecture Notes-Monograph Series, vol. 26, Institute of Mathematical Statistics, Hayward, CA.

  • Kasahara, A., 1974. Various vertical coordinate systems used for numerical weather prediction. Mon. Weather Rev., 102, 509–522.

  • Kraichnan, R.H., 1961. Dynamics of nonlinear stochastic systems. J. Math. Phys., 2, 124–148.

  • Kreiss, H.-O. and J. Lorenz, 1989. Initial-Boundary Value Problems and the Navier-Stokes Equations, Academic Press, New York.

  • Lax, P.D., 1973. Hyperbolic Systems of Conservation Laws and the Mathematical Theory of Shock Waves. CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 11, Society for Industrial and Applied Mathematics, Philadelphia, PA.

  • Lax, P.D., 2006. Hyperbolic Partial Differential Equations, Courant Lecture Notes in Mathematics, vol. 14, American Mathematical Society, New York.

  • Lin, S.-J., 2004. A “vertically Lagrangian” finite-volume dynamical core for global models. Mon. Weather Rev., 132, 2293–2307.

  • Lin, S.-J. and R.B. Rood, 1997. An explicit flux-form semi-Lagrangian shallow-water model on the sphere. Q. J. R. Meteorol. Soc., 123, 2477–2498.

  • Ménard, R. and L.-P. Chang, 2000. Assimilation of stratospheric chemical tracer observations using a Kalman filter. Part II: χ²-validated results and analysis of variance and correlation dynamics. Mon. Weather Rev., 128, 2672–2686.

  • Ménard, R. and Co-authors, 2000. Assimilation of stratospheric chemical tracer observations using a Kalman filter. Part I: Formulation. Mon. Weather Rev., 128, 2654–2671.

  • Mitchell, H.L., P.L. Houtekamer and G. Pellerin, 2002. Ensemble size, balance, and model-error representation in an ensemble Kalman filter. Mon. Weather Rev., 130, 2791–2808.

  • Omatu, S. and J.H. Seinfeld, 1989. Distributed Parameter Systems: Theory and Applications, Oxford University Press, New York.

  • Ott, E. and Co-authors, 2004. A local ensemble Kalman filter for atmospheric data assimilation. Tellus, 56A, 415–428.

  • Reed, M. and B. Simon, 1972. Methods of Modern Mathematical Physics, vol. I: Functional Analysis, Academic Press, New York.

  • Riesz, F. and B. Sz.-Nagy, 1955. Functional Analysis, Frederick Ungar, New York.

  • Royden, H.L., 1968. Real Analysis, 2nd ed., Macmillan, New York.

  • Rudin, W., 1991. Functional Analysis, 2nd ed., McGraw-Hill, New York.

  • Staniforth, A., N. Wood and C. Girard, 2003. Energy and energy-like invariants for deep non-hydrostatic atmospheres. Q. J. R. Meteorol. Soc., 129, 3495–3499.

  • Trémolet, Y., 2006. Accounting for an imperfect model in 4D-Var. Q. J. R. Meteorol. Soc., 132, 2483–2504.

  • Trémolet, Y., 2007. Model-error estimation in 4D-Var. Q. J. R. Meteorol. Soc., 133, 1267–1280.

  • von Storch, H. and F.W. Zwiers, 1999. Statistical Analysis in Climate Research, Cambridge University Press, New York.


Acknowledgments

The framework of this chapter was motivated in large part by the tracer assimilation work of Ménard et al. (2000) and Ménard and Chang (2000), which identified the problem of spurious variance loss and also made clear the usefulness of having a conserved scalar quantity to work with in data assimilation. The author would like to thank the editors of this book for their diligence and superb work. The generous support of NASA’s Modeling, Analysis and Prediction program is also gratefully acknowledged.

Author information

Correspondence to Stephen E. Cohn.

Appendices

Appendix 1: Random Variables Taking Values in Hilbert Space

Appendix 1a defines Hilbert space-valued random variables and gives some of their main properties. Appendices 1b–1d give the definition, main properties and general construction, respectively, of Hilbert space-valued random variables of second order. Definitions of basic terms used in this appendix are provided in Appendix 3. Further treatment of Hilbert space-valued random variables, and of random variables taking values in more general spaces, can be found in the books of Itô (1984) and Kallianpur and Xiong (1995).

Hilbert space-valued random variables, like scalar random variables, are defined with reference to some probability space \((\varOmega,{\mathcal F},P)\), with Ω the sample space, \({\mathcal F}\) the event space and P the probability measure. Thus throughout this appendix, a probability space \((\varOmega,{\mathcal F},P)\) is considered to be given. The expectation operator is denoted by \({\mathcal E}\). It is assumed that the given probability space is complete.

A real, separable Hilbert space \({\mathcal H}\) is also considered to be given. The inner product and corresponding norm on \({\mathcal H}\) are denoted by \((\cdot,\cdot)\) and \(\|\cdot\|\), respectively. The Borel field generated by the open sets in \({\mathcal H}\) is denoted by \({\mathcal B}({\mathcal H})\), i.e., \({\mathcal B}({\mathcal H})\) is the smallest σ-algebra of sets in \({\mathcal H}\) that contains all the open sets in \({\mathcal H}\). Recall that every separable Hilbert space has a countable orthonormal basis, and that every orthonormal basis of a separable Hilbert space has the same number of elements \(N \leq \infty\), the dimension of the space. For notational convenience it is assumed in this appendix that \({\mathcal H}\) is infinite-dimensional, with \(\{\textbf{h}_{i}\}_{i=1}^{\infty}\) denoting an orthonormal basis for \({\mathcal H}\). The results of this appendix hold just as well in the finite-dimensional case, by taking \(\{\textbf{h}_{i}\}_{i=1}^{N}\), \(N < \infty\), as an orthonormal basis for \({\mathcal H}\), and by replacing infinite sums by finite ones.

1.1 \({\mathcal H}\)-Valued Random Variables

Recall that if X and Y are sets, f is a map from X into Y, and B is a subset of Y, then the set

$$\textbf{f}^{-1}[B] = \{\textbf{x} \in X: \textbf{f}(\textbf{x}) \in B\}$$

is called the inverse image of B (under f). Recall also that the event space \({\mathcal F}\) of the probability space \((\varOmega,{\mathcal F},P)\) consists of the measurable subsets of Ω, which are called events.

Let \((Y,{\mathcal C})\) be a measurable space, i.e., Y is a set and \({\mathcal C}\) is a σ-algebra of subsets of Y. A map \(\textbf{f}: \varOmega \rightarrow Y\) is called a \((Y,{\mathcal C})\)-valued random variable if the inverse image of every set C in the collection \({\mathcal C}\) is an event, i.e., if \(\textbf{f}^{-1}[C] \in {\mathcal F}\) for every set \(C \in {\mathcal C}\) (e.g. Itô 1984, p. 18; Kallianpur and Xiong 1995, p. 86; see also Reed and Simon 1972, p. 24).

Thus an \(({\mathcal H},{\mathcal B}({\mathcal H}))\)-valued random variable is a map \(\textbf{s}: \varOmega \rightarrow {\mathcal H}\) such that

$$\{\omega \in \varOmega : \textbf{s}(\omega) \in B\} \in {\mathcal F}$$

for every set \(B \in {\mathcal B}({\mathcal H})\). Hereafter, an \(({\mathcal H},{\mathcal B}({\mathcal H}))\)-valued random variable is called simply an \({\mathcal H}\)-valued random variable, with the understanding that this always means an \(({\mathcal H},{\mathcal B}({\mathcal H}))\)-valued random variable. An equivalent definition of \({\mathcal H}\)-valued random variables, expressed in terms of scalar random variables, is given in Appendix 1b.

Let \({\mathcal S}\) be a non-empty set in \({\mathcal B}({\mathcal H})\). It follows that the collection \({\mathcal B}_{\mathcal S}({\mathcal H})\) of all sets in \({\mathcal B}({\mathcal H})\) that are subsets of \({\mathcal S}\),

$${\mathcal B}_{\mathcal S}({\mathcal H}) = \{B \in {\mathcal B}({\mathcal H}): B \subset {\mathcal S}\}\,,$$

is a σ-algebra of subsets of \({\mathcal S}\), namely, the collection of all sets C of the form \(C = B \cap {\mathcal S}\) with \(B \in {\mathcal B}({\mathcal H})\). Hence \(({\mathcal S},{\mathcal B}_{\mathcal S}({\mathcal H}))\) is a measurable space, and an \(({\mathcal S},{\mathcal B}_{\mathcal S}({\mathcal H}))\)-valued random variable is a map \(\textbf{s}: \varOmega \rightarrow {\mathcal S}\) such that

$$\{\omega \in \varOmega : \textbf{s}(\omega) \in C\} \in {\mathcal F}$$

for every set \(C \in {\mathcal B}_{\mathcal S}({\mathcal H})\). Hereafter, an \(({\mathcal S},{\mathcal B}_{\mathcal S}({\mathcal H}))\)-valued random variable is called simply an \({\mathcal S}\)-valued random variable, with the understanding that this always means an \(({\mathcal S},{\mathcal B}_{\mathcal S}({\mathcal H}))\)-valued random variable.

It follows by definition that every \({\mathcal S}\)-valued random variable is an \({\mathcal H}\)-valued random variable, for if \(\textbf{s}: \varOmega \rightarrow {\mathcal S}\) and \(\textbf{s}^{-1}[C] \in {\mathcal F}\) for every set \(C \in {\mathcal B}_{\mathcal S}({\mathcal H})\), then \(\textbf{s}^{-1}[B] = \textbf{s}^{-1}[B \cap {\mathcal S}] \in {\mathcal F}\) for every set \(B \in {\mathcal B}({\mathcal H})\). Also, every \({\mathcal H}\)-valued random variable taking values only in \({\mathcal S}\) is an \({\mathcal S}\)-valued random variable, for if \(\textbf{s}: \varOmega \rightarrow {\mathcal S}\) and \(\textbf{s}^{-1}[B] \in {\mathcal F}\) for every set \(B \in {\mathcal B}({\mathcal H})\), then in particular \(\textbf{s}^{-1}[C] \in {\mathcal F}\) for every set \(C \in {\mathcal B}_{\mathcal S}({\mathcal H})\).

Finally, let N be a continuous map from \({\mathcal S}\) into \({\mathcal H}\). It follows that if s is an \({\mathcal S}\)-valued random variable, then N(s) is an \({\mathcal H}\)-valued random variable, i.e. that

$$\{\omega \in \varOmega : \textbf{N}(\textbf{s}(\omega)) \in B\} \in {\mathcal F}$$

for every set \(B \in {\mathcal B}({\mathcal H})\). To see this, note first that

$$\{\omega \in \varOmega : \textbf{N}(\textbf{s}(\omega)) \in B\} = \textbf{s}^{-1}[\textbf{N}^{-1}[B]]\,,$$

and consider the class of sets E in \({\mathcal H}\) such that \(\textbf{N}^{-1}[E] \in {\mathcal B}_{\mathcal S}({\mathcal H})\). It can be checked that this class of sets is a σ-algebra. Moreover, this class contains all the open sets in \({\mathcal H}\), because if O is an open set in \({\mathcal H}\) then \(\textbf{N}^{-1}[O]\) is also an open set in \({\mathcal H}\) by the continuity of N (e.g. Reed and Simon 1972, p. 8) and so

$$C = \textbf{N}^{-1}[O] = \textbf{N}^{-1}[O] \cap {\mathcal S} \in {\mathcal B}_{\mathcal S}({\mathcal H})\,.$$

But \({\mathcal B}({\mathcal H})\) is the smallest σ-algebra containing all the open sets in \({\mathcal H}\), hence this class includes \({\mathcal B}({\mathcal H})\), i.e., \(\textbf{N}^{-1}[B] \in {\mathcal B}_{\mathcal S}({\mathcal H})\) for every set \(B \in {\mathcal B}({\mathcal H})\). If s is an \({\mathcal S}\)-valued random variable then \(\textbf{s}^{-1}[C] \in {\mathcal F}\) for every set \(C \in {\mathcal B}_{\mathcal S}({\mathcal H})\), and therefore \(\textbf{s}^{-1}[\textbf{N}^{-1}[B]] \in {\mathcal F}\) for every set \(B \in {\mathcal B}({\mathcal H})\), i.e., \(\textbf{N}(\textbf{s})\) is an \({\mathcal H}\)-valued random variable.

1.2 Second-Order \({\mathcal H}\)-Valued Random Variables

If s is an \({\mathcal H}\)-valued random variable and \(\textbf{h} \in {\mathcal H}\), then by the Schwarz inequality,

$$|(\textbf{h},\textbf{s}(\omega))| \leq \|\textbf{h}\|\,\|\textbf{s}(\omega)\| < \infty$$
(61)

for all \(\omega \in \varOmega\), so for each fixed \(\textbf{h} \in {\mathcal H}\), the inner product (h,s) is a map from Ω into \(\mathbb{R}\). In fact, it can be shown (e.g. Kallianpur and Xiong 1995, Corollary 3.1.1(b), p. 87) that a map \(\textbf{s}: \varOmega \rightarrow {\mathcal H}\) is an \({\mathcal H}\)-valued random variable if, and only if, (h,s) is a scalar random variable for every \(\textbf{h} \in {\mathcal H}\). That is, a map \(\textbf{s}: \varOmega \rightarrow {\mathcal H}\) is an \({\mathcal H}\)-valued random variable if, and only if,

$$\{\omega \in \varOmega : (\textbf{h},\textbf{s}(\omega)) \leq \alpha\} \in {\mathcal F}$$

for every \(\textbf{h} \in {\mathcal H}\) and every \(\alpha \in \mathbb{R}\).

It follows that if s is an \({\mathcal H}\)-valued random variable, then \(\|\textbf{s}\|^{2}\) is a scalar random variable, that is,

$$\{\omega \in \varOmega : ||\textbf{s}(\omega)||^{2} \leq \alpha\} \in {\mathcal F}$$

for every \(\alpha \in \mathbb{R}\). To see this, observe that if s is an \({\mathcal H}\)-valued random variable, then \((\textbf{h}_{i},\textbf{s})\) for \(i = 1, 2, \ldots\) are scalar random variables, hence

$$s_n = \sum_{i=1}^{n} (\textbf{h}_{i},\textbf{s})^{2}$$

are scalar random variables with \(0 \leq s_n \leq s_{n+1}\) for \(n = 1, 2, \ldots\), and by Parseval’s relation,

$$||\textbf{s}(\omega)||^{2} = \sum_{i=1}^{\infty} (\textbf{h}_{i},\textbf{s}(\omega))^{2} = \lim_{n \rightarrow \infty} s_{n}(\omega)$$

for all \(\omega \in \varOmega\). Thus \(\|\textbf{s}\|^{2}\) is the limit of an increasing sequence of non-negative scalar random variables, and is therefore a (non-negative) scalar random variable.
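Parseval’s relation, on which this argument rests, is simple to verify in a finite-dimensional space. In the sketch below (an illustration; the orthonormal basis is taken as the columns of a random orthogonal matrix), the squared norm of s equals the sum of its squared basis coefficients:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6  # dimension (illustrative)

# An orthonormal basis {h_i}: the columns of an orthogonal matrix
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))

s = rng.standard_normal(n)

# Parseval: ||s||^2 equals the sum of the squared coefficients (h_i, s)
coeffs = Q.T @ s
print(np.isclose(np.sum(coeffs ** 2), s @ s))  # True
```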

If a map \(\textbf{s}: \varOmega \rightarrow {\mathcal H}\) is an \({\mathcal H}\)-valued random variable, then since \(\|\textbf{s}\|^{2} \geq 0\) is a scalar random variable, it follows that \({\mathcal E}\|\textbf{s}\|^{2}\) is defined and either \({\mathcal E}\|\textbf{s}\|^{2} = \infty\) or \({\mathcal E}\|\textbf{s}\|^{2} < \infty\). An \({\mathcal H}\)-valued random variable s is called second-order if \({\mathcal E}\|\textbf{s}\|^{2} < \infty\).

1.3 Properties of Second-Order \({\mathcal H}\)-Valued Random Variables

In this subsection let \(\textbf{s}: \varOmega \rightarrow {\mathcal H}\) be a second-order \({\mathcal H}\)-valued random variable. Since \({\mathcal E}\|\textbf{s}\|^{2} < \infty\), it follows from Eq. (61) that

$${\mathcal E}(\textbf{h},\textbf{s})^{2} \leq \|\textbf{h}\|^{2}\,{\mathcal E}\|\textbf{s}\|^{2} < \infty$$
(62)

for each \(\textbf{h} \in {\mathcal H}\). Thus, for each \(\textbf{h} \in {\mathcal H}\), (h,s) is a second-order scalar random variable, and therefore its mean is defined and finite. The mean of (h,s) will be denoted by

$$m[\textbf{h}] = {\mathcal E}(\textbf{h},\textbf{s})\,,$$

for each \(\textbf{h} \in {\mathcal H}\). Since \({\mathcal E}\|\textbf{s}\|^{2} < \infty\), \(\|\textbf{s}\|\) is a second-order scalar random variable, and its mean \(M = {\mathcal E}\|\textbf{s}\|\) satisfies \(0 \leq M \leq \left({\mathcal E}\|\textbf{s}\|^{2}\right)^{1/2} < \infty\). Now

$$\left|m[\textbf{h}]\right| = \left|{\mathcal E}(\textbf{h},\textbf{s})\right| \leq {\mathcal E}\left|(\textbf{h},\textbf{s})\right| \leq M ||\textbf{h}||$$

for each \(\textbf{h} \in {\mathcal H}\), by Eq. (61), and also

$$m[\alpha \textbf{g} + \beta \textbf{h}] = \alpha m[\textbf{g}] + \beta m[\textbf{h}]$$

for each \(\textbf{g},\textbf{h} \in {\mathcal H}\) and \(\alpha, \beta \in \mathbb{R}\). Thus \(m[\cdot]\) is a bounded linear functional on \({\mathcal H}\), and by the Riesz representation theorem for Hilbert space (e.g. Royden 1968, p. 213; Reed and Simon 1972, p. 43) this implies that there exists a unique element \(\overline{\textbf{s}} \in {\mathcal H}\), called the mean of s, such that

$$m[\textbf{h}] = (\textbf{h},\overline{\textbf{s}})$$

for each \(\textbf{h} \in {\mathcal H}\). Thus the mean \(\overline{\textbf{s}}\) of s is defined uniquely in \({\mathcal H}\), and satisfies \((\textbf{h},\overline{\textbf{s}}) = {\mathcal E}(\textbf{h},\textbf{s})\) for every \(\textbf{h} \in {\mathcal H}\).
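In the finite-dimensional case with inner product \((\textbf{f},\textbf{g}) = \textbf{f}^{T}\textbf{B}\textbf{g}\), the Riesz representer \(\overline{\textbf{s}}\) is simply the elementwise mean of s, whichever symmetric positive definite \(\textbf{B}\) is chosen, since \(m[\textbf{h}] = {\mathcal E}\,\textbf{h}^{T}\textbf{B}\textbf{s} = \textbf{h}^{T}\textbf{B}\,{\mathcal E}\textbf{s}\). A quick numerical check (illustrative dimensions and matrices):

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 4, 300_000  # dimension and sample count (illustrative)

# SPD matrix B defining the inner product (f, g) = f^T B g
A = rng.standard_normal((n, n))
B = A @ A.T + n * np.eye(n)

mu = rng.standard_normal(n)
s = mu + rng.standard_normal((m, n))  # samples with mean mu

sbar = s.mean(axis=0)  # candidate Riesz representer: the elementwise mean

# Check (h, sbar) = E (h, s) for a few test vectors h
for _ in range(3):
    h = rng.standard_normal(n)
    print(np.isclose(h @ B @ sbar, np.mean(s @ B @ h)))  # True each time
```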

Now let \(\textbf{s}^{\prime}(\omega) = \textbf{s}(\omega) - \overline{\textbf{s}}\) for each \(\omega \in \varOmega\). Since

$$||\textbf{s}^{\prime}(\omega)|| \leq ||\textbf{s}(\omega)|| + ||\overline{\textbf{s}}|| < \infty$$

for each \(\omega \in \varOmega\), \(\textbf{s}^\prime = \textbf{s} - \overline{\textbf{s}}\) is a map from Ω into \({\mathcal H}\). Furthermore, for every \(\textbf{h} \in {\mathcal H}\), \((\textbf{h},\textbf{s})\) is a scalar random variable, \(\left|(\textbf{h},\overline{\textbf{s}})\right| < \infty\), and \(\left|(\textbf{h},\textbf{s}(\omega))\right| < \infty\) for each \(\omega \in \varOmega\). Therefore \((\textbf{h},\textbf{s}^{\prime}) = (\textbf{h},\textbf{s}) - (\textbf{h},\overline{\textbf{s}})\) is a scalar random variable for every \(\textbf{h} \in {\mathcal H}\), and hence \(\textbf{s}^\prime\) is an \({\mathcal H}\)-valued random variable. Also,

$${\mathcal E}(\textbf{h},\textbf{s}^{\prime}) = {\mathcal E}(\textbf{h},\textbf{s}) - (\textbf{h},\overline{\textbf{s}}) = 0$$

for every \(\textbf{h} \in {\mathcal H}\), so the mean of \(\textbf{s}^{\prime}\) is \(\textbf{0} \in {\mathcal H}\). Thus

$${\mathcal E}\|\textbf{s}\|^{2} = {\mathcal E}(\overline{\textbf{s}} + \textbf{s}^{\prime}, \overline{\textbf{s}} + \textbf{s}^{\prime}) = \|\overline{\textbf{s}}\|^{2} + {\mathcal E}\|\textbf{s}^{\prime}\|^{2}$$
((63))

and, in particular, \({\mathcal E}\|\textbf{s}^{\prime}\|^{2} \leq {\mathcal E}\|\textbf{s}\|^{2} < \infty\). Therefore \(\textbf{s}^{\prime}: \varOmega \rightarrow {\mathcal H}\) is a second-order \({\mathcal H}\)-valued random variable, and \(\|\textbf{s}^{\prime}\|\) is a second-order scalar random variable.

Since \(\textbf{s}^\prime\) is a second-order \({\mathcal H}\)-valued random variable, \((\textbf{g},\textbf{s}^\prime)\) and \((\textbf{h},\textbf{s}^\prime)\) are second-order scalar random variables, for each \(\textbf{g},\textbf{h} \in {\mathcal H}\). Therefore the expectation

$$C[\textbf{g},\textbf{h}] = {\mathcal E}(\textbf{g},\textbf{s}^{\prime})(\textbf{h},\textbf{s}^\prime)$$

is defined for all \(\textbf{g},\textbf{h} \in {\mathcal H}\), and in fact

$$\left|C[\textbf{g},\textbf{h}]\right| \leq {\mathcal E}\left|(\textbf{g},\textbf{s}^{\prime})(\textbf{h},\textbf{s}^\prime)\right| \leq \left[{\mathcal E}(\textbf{g},\textbf{s}^{\prime})^{2}\right]^{1/2}\left[{\mathcal E}(\textbf{h},\textbf{s}^{\prime})^{2}\right]^{1/2} \leq \|\textbf{g}\|\,\|\textbf{h}\| \,{\mathcal E}\|\textbf{s}^{\prime}\|^{2}\,.$$
((64))

The functional C, called the covariance functional of s, is also linear in each of its two arguments. Thus \(C[\cdot,\cdot]\) is a bounded bilinear functional on \({\mathcal H} \times {\mathcal H}\). It follows (e.g. Rudin 1991, Theorem 12.8, p. 310) that there exists a unique bounded linear operator \({\mathcal P}: {\mathcal H} \rightarrow {\mathcal H}\), called the covariance operator of s, such that

$$C[\textbf{g},\textbf{h}] = (\textbf{g},{\mathcal P}\textbf{h})$$

for each \(\textbf{g},\textbf{h} \in {\mathcal H}\). The covariance operator \({\mathcal P}\) is self-adjoint, i.e., \(({\mathcal P}\textbf{g},\textbf{h}) = (\textbf{g},{\mathcal P}\textbf{h})\) for all \(\textbf{g},\textbf{h} \in {\mathcal H}\), since the covariance functional is symmetric, \(C[\textbf{h},\textbf{g}] = C[\textbf{g},\textbf{h}]\) for all \(\textbf{g},\textbf{h} \in {\mathcal H}\). The covariance operator is also positive semidefinite, i.e., \((\textbf{h},{\mathcal P}\textbf{h}) \geq 0\) for all \(\textbf{h} \in {\mathcal H}\), since

$$(\textbf{h},{\mathcal P}\textbf{h}) = C[\textbf{h},\textbf{h}] = {\mathcal E}(\textbf{h},\textbf{s}^\prime)^{2} \geq 0$$

for all \(\textbf{h} \in {\mathcal H}\).
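The defining properties of the covariance functional and operator can be illustrated in finite dimensions. The following sketch (empirical expectations over samples in \({\mathbb R}^{d}\); the seed, dimensions, and mixing matrix are illustrative choices) confirms the identity \(C[\textbf{g},\textbf{h}] = (\textbf{g},{\mathcal P}\textbf{h})\) together with self-adjointness and positive semidefiniteness:

```python
import numpy as np

# Finite-dimensional sketch of the covariance functional and operator:
# with empirical expectations, C[g, h] = E(g, s')(h, s') = (g, P h),
# and P is self-adjoint and positive semidefinite.
rng = np.random.default_rng(1)
d, n = 4, 500
S = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))  # correlated samples
Sp = S - S.mean(axis=0)              # s' = s - s_bar
P = Sp.T @ Sp / n                    # sample covariance operator

g, h = rng.normal(size=d), rng.normal(size=d)
C_gh = ((Sp @ g) * (Sp @ h)).mean()  # C[g, h] = E(g, s')(h, s')
assert np.isclose(C_gh, g @ P @ h)   # C[g, h] = (g, P h)
assert np.allclose(P, P.T)           # self-adjoint
assert h @ P @ h >= 0                # positive semidefinite
```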

Now consider the second-order scalar random variable \(\|\textbf{s}^{\prime}\|\). By Parseval’s relation,

$$||\textbf{s}^{\prime}(\omega)||^{2} = \sum_{i=1}^{\infty} (\textbf{h}_{i},\textbf{s}^{\prime}(\omega))^{2}$$

for all \(\omega \in \varOmega\), and therefore

$${\mathcal E}||\textbf{s}^{\prime}||^{2} = \sum_{i=1}^{\infty}{\mathcal E}(\textbf{h}_{i},\textbf{s}^{\prime})^{2}\,,$$

because \(\{(\textbf{h}_{i},\textbf{s}^{\prime})^{2}\}_{i=1}^{\infty}\) is a sequence of non-negative random variables. Furthermore,

$${\mathcal E}(\textbf{h}_{i},\textbf{s}^{\prime})^{2} = (\textbf{h}_{i},{\mathcal P}\textbf{h}_{i})$$
((65))

for \(i = 1, 2, \ldots\), by definition of the covariance operator \({\mathcal P}\), and therefore

$${\mathcal E}||\textbf{s}^{\prime}||^{2} = \sum_{i=1}^{\infty} (\textbf{h}_{i},{\mathcal P}\textbf{h}_{i})\,.$$

The summation on the right-hand side, called the trace of \({\mathcal P}\) and written tr \({\mathcal P}\), is independent of the choice of orthonormal basis \(\{\textbf{h}_{i}\}_{i=1}^{\infty}\) for \({\mathcal H}\), for any positive semidefinite, bounded linear operator from \({\mathcal H}\) into \({\mathcal H}\) (e.g. Reed and Simon 1972, Theorem VI.18, p. 206). Thus

$$\textrm{tr}\,{\mathcal P} = \sum_{i=1}^{\infty} (\textbf{h}_{i},{\mathcal P}\textbf{h}_{i}) = {\mathcal E}||\textbf{s}^{\prime}||^{2} < \infty\,,$$

and Eq. (63) can be written as

$${\mathcal E}\|\textbf{s}\|^{2} = \|\overline{\textbf{s}}\|^{2} + \textrm{tr}\,{\mathcal P}\,,$$
((66))

which is a generalization of Eq. (80) to second-order \({\mathcal H}\)-valued random variables.
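The decomposition in Eq. (66) holds exactly for the empirical measure of a finite sample, which gives a direct numerical check. The following sketch (illustrative parameters and seed, with \({\mathcal H} = {\mathbb R}^{d}\)) verifies it:

```python
import numpy as np

# Empirical analogue of Eq. (66): E||s||^2 = ||s_bar||^2 + tr P.
# The identity is exact for the sample measure; parameters are arbitrary.
rng = np.random.default_rng(2)
d, n = 6, 300
S = rng.normal(loc=1.0, size=(n, d))   # realizations of s with nonzero mean
s_bar = S.mean(axis=0)
P = (S - s_bar).T @ (S - s_bar) / n    # sample covariance operator

E_norm2 = (S ** 2).sum(axis=1).mean()  # empirical E||s||^2
assert np.isclose(E_norm2, s_bar @ s_bar + np.trace(P))
```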

Since tr \({\mathcal P} < \infty\), \({\mathcal P}\) is a trace class operator, and therefore also a compact operator (e.g. Reed and Simon 1972, Theorem VI.21, p. 209). Since \({\mathcal P}\) is self-adjoint in addition to being compact, it follows from the Hilbert-Schmidt theorem (e.g. Reed and Simon 1972, Theorem VI.16, p. 203) that there exists an orthonormal basis for \({\mathcal H}\) which consists of eigenvectors \(\{\tilde{\textbf{h}}_{i}\}_{i=1}^{\infty}\) of \({\mathcal P}\),

$${\mathcal P}\tilde{\textbf{h}}_{i} = \lambda_{i}\tilde{\textbf{h}}_{i}$$

for \(i = 1, 2, \ldots\), where the corresponding eigenvalues \(\lambda_{i} = (\tilde{\textbf{h}}_{i},{\mathcal P}\tilde{\textbf{h}}_{i})\) for \(i = 1, 2, \ldots\) are all real numbers and satisfy \(\lambda_{i} \rightarrow 0\) as \(i \rightarrow \infty\). In fact, the eigenvalues are all non-negative since \({\mathcal P}\) is positive semidefinite, and therefore \(\lambda_{i} = \|{\mathcal P}\tilde{\textbf{h}}_{i}\|\) for \(i = 1, 2, \ldots\). Further, it follows from Eq. (65) that

$$\lambda_{i} = (\tilde{\textbf{h}}_{i},{\mathcal P}\tilde{\textbf{h}}_{i}) = {\mathcal E}(\tilde{\textbf{h}}_{i},\textbf{s}^{\prime})^{2} = \sigma_{i}^{2}\,,$$

where \(\sigma_{i}^{2}\) is the variance of the scalar random variable \((\tilde{\textbf{h}}_{i},\textbf{s})\), for \(i = 1, 2, \ldots\). By the definition of tr \({\mathcal P}\),

$${\mathcal E}||\textbf{s}^{\prime}||^{2} = \textrm{tr}\,{\mathcal P} = \sum_{i=1}^{\infty} (\tilde{\textbf{h}}_{i},{\mathcal P}\tilde{\textbf{h}}_{i}) = \sum_{i=1}^{\infty} \lambda_{i} < \infty \,.$$

Thus the eigenvalues \(\{\lambda_{i}\}_{i=1}^{\infty}\) of \({\mathcal P}\) are the variances \(\{\sigma_{i}^{2}\}_{i=1}^{\infty}\) and have finite sum tr \({\mathcal P}\). Equation (66) can then be rewritten as

$${\mathcal E}\|\textbf{s}\|^{2} = \|\overline{\textbf{s}}\|^{2} + \sum_{i=1}^{\infty} \sigma_{i}^{2}\,,$$
((67))

which is another generalization of Eq. (80).
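The eigenvalue-variance identity behind Eq. (67) can also be checked in finite dimensions. The following sketch (illustrative sample covariance in \({\mathbb R}^{d}\)) verifies that the eigenvalues of \({\mathcal P}\) are the variances of the projections of \(\textbf{s}^{\prime}\) onto the eigenvectors, and that they sum to \({\mathcal E}\|\textbf{s}^{\prime}\|^{2}\):

```python
import numpy as np

# Sketch of Eq. (67): the eigenvalues of the sample covariance operator
# P are the variances of the projections of s' onto the eigenvectors,
# and their sum is E||s'||^2 = tr P.  All parameters are illustrative.
rng = np.random.default_rng(3)
d, n = 5, 400
S = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))
Sp = S - S.mean(axis=0)
P = Sp.T @ Sp / n

lam, V = np.linalg.eigh(P)                 # eigenpairs of P (ascending)
proj_var = ((Sp @ V) ** 2).mean(axis=0)    # variance along each eigenvector
assert np.allclose(proj_var, lam)          # lambda_i = sigma_i^2, Eq. (65)
assert np.isclose(lam.sum(), (Sp ** 2).sum(axis=1).mean())  # tr P = E||s'||^2
```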

Since every \(\textbf{h} \in {\mathcal H}\) has the representation \(\textbf{h} = \sum_{i=1}^{\infty} (\textbf{h}_{i},\textbf{h}) \textbf{h}_{i}\), and since \({\mathcal P}\textbf{h} \in {\mathcal H}\) for every \(\textbf{h} \in {\mathcal H}\), taking \(\textbf{h}_{i} = \tilde{\textbf{h}}_{i}\) and using the fact that

$$(\tilde{\textbf{h}}_{i},{\mathcal P}\textbf{h}) = ({\mathcal P}\tilde{\textbf{h}}_{i},\textbf{h}) = \lambda_{i}(\tilde{\textbf{h}}_{i},\textbf{h}) = \sigma_{i}^{2}(\tilde{\textbf{h}}_{i},\textbf{h})$$

for \(i = 1, 2, \ldots\), gives the following representation for \({\mathcal P}\):

$${\mathcal P}\textbf{h} = \sum_{i=1}^{\infty} \sigma_{i}^{2}(\tilde{\textbf{h}}_{i},\textbf{h})\tilde{\textbf{h}}_{i}$$
((68))

for every \(\textbf{h} \in {\mathcal H}\). Thus the expectation \({\mathcal E}(\textbf{g},\textbf{s}^{\prime})(\textbf{h},\textbf{s}^\prime)\) is given by the convergent series

$${\mathcal E}(\textbf{g},\textbf{s}^{\prime})(\textbf{h},\textbf{s}^\prime) = C[\textbf{g},\textbf{h}] = (\textbf{g},{\mathcal P}\textbf{h}) = \sum_{i=1}^{\infty} \sigma_{i}^{2}(\tilde{\textbf{h}}_{i},\textbf{g})(\tilde{\textbf{h}}_{i},\textbf{h})\,,$$

for every \(\textbf{g},\textbf{h} \in {\mathcal H}\).

Finally, since \({\mathcal P}\) is a positive semidefinite bounded linear operator from \({\mathcal H}\) into \({\mathcal H}\), there exists a unique positive semidefinite bounded linear operator \({\mathcal P}^{1/2}: {\mathcal H} \rightarrow {\mathcal H}\), called the square root of \({\mathcal P}\), that satisfies \(\left({\mathcal P}^{1/2}\right)^{2} = {\mathcal P}\) (e.g. Reed and Simon 1972, Theorem VI.9, p. 196). Since \({\mathcal P}\) is also self-adjoint and trace class, \({\mathcal P}^{1/2}\) is self-adjoint and Hilbert-Schmidt (e.g. Reed and Simon 1972, p. 210), with the same eigenvectors as \({\mathcal P}\) and with eigenvalues that are the non-negative square roots of the corresponding eigenvalues of \({\mathcal P}\). That is,

$${\mathcal P}^{1/2}\tilde{\textbf{h}}_{i} = \sigma_{i}\tilde{\textbf{h}}_{i}\,,$$

where \(\sigma_{i} = \lambda_{i}^{1/2} = [{\mathcal E}(\tilde{\textbf{h}}_{i},\textbf{s}^{\prime})^{2}]^{1/2}\), for \(i = 1, 2, \ldots\). Therefore \(\sigma_{i} = (\tilde{\textbf{h}}_{i},{\mathcal P}^{1/2}\tilde{\textbf{h}}_{i}) = \|{\mathcal P}^{1/2}\tilde{\textbf{h}}_{i}\|\) for \(i = 1, 2, \ldots\), and \({\mathcal P}^{1/2}\) has the representation

$${\mathcal P}^{1/2}\textbf{h} = \sum_{i=1}^{\infty} \sigma_{i}(\tilde{\textbf{h}}_{i},\textbf{h})\tilde{\textbf{h}}_{i}$$
((69))

for every \(\textbf{h} \in {\mathcal H}\).
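In finite dimensions the spectral representations Eqs. (68) and (69) reduce to the familiar eigendecomposition construction of a matrix square root. The following sketch (an arbitrary illustrative covariance matrix) verifies the defining property \(\left({\mathcal P}^{1/2}\right)^{2} = {\mathcal P}\):

```python
import numpy as np

# Finite-dimensional sketch of Eqs. (68) and (69): with eigenpairs
# (sigma_i^2, h_i~) of a symmetric PSD matrix P, the square root is
# P^(1/2) = V diag(sigma) V^T, and (P^(1/2))^2 = P.
rng = np.random.default_rng(4)
d = 5
A = rng.normal(size=(d, d))
P = A @ A.T                                # symmetric positive semidefinite
lam, V = np.linalg.eigh(P)
sigma = np.sqrt(np.clip(lam, 0.0, None))   # guard tiny negative round-off
P_half = V @ np.diag(sigma) @ V.T

assert np.allclose(P_half @ P_half, P)     # (P^(1/2))^2 = P
assert np.allclose(P_half, P_half.T)       # P^(1/2) self-adjoint
```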

1.4 Construction of Second-Order \({\mathcal H}\)-Valued Random Variables

It will now be shown how essentially all second-order \({\mathcal H}\)-valued random variables can be constructed. This will be accomplished by first reconsidering, in a suggestive notation, the defining properties of every second-order \({\mathcal H}\)-valued random variable. The construction given here proceeds by Itô’s regularization theorem (Itô 1984, Theorem 2.3.3, p. 27; Kallianpur and Xiong 1995, Theorem 3.1.2, p. 87) applied to \({\mathcal H}\), and amounts to formalizing on \({\mathcal H}\) the usual construction of infinite-dimensional random variables through random Fourier series.

For the moment, fix a second-order \({\mathcal H}\)-valued random variable s, and consider the behaviour of

$$s[\textbf{h}] = (\textbf{h},\textbf{s})$$

as a functional of \(\textbf{h} \in {\mathcal H}\), that is, as h varies throughout \({\mathcal H}\). The functional s[·] has three important properties. First, on evaluation at any \(\textbf{h} \in {\mathcal H}\), it is a scalar random variable, with

$$s[\textbf{h}](\omega) = (\textbf{h},\textbf{s}(\omega))$$

for each \(\omega \in \varOmega\), since \(\textbf{s}: \varOmega \rightarrow {\mathcal H}\) is an \({\mathcal H}\)-valued random variable. Thus s[·] is a map from \({\mathcal H}\) into the set of scalar random variables on \((\varOmega,{\mathcal F},P)\). Second, this map is linear,

$$s[\alpha \textbf{g} + \beta \textbf{h}] = \alpha s[\textbf{g}] + \beta s[\textbf{h}]$$

for all \(\textbf{g},\textbf{h} \in {\mathcal H}\) and \(\alpha, \beta \in \mathbb{R}\), by linearity of the inner product. Third, according to Eq. (62),

$$({\mathcal E}s^{2}[\textbf{h}])^{1/2} \leq \gamma \|\textbf{h}\|\,,$$
((70))

where

$$\gamma = ({\mathcal E}||\textbf{s}||^{2})^{1/2} < \infty\,,$$

since the \({\mathcal H}\)-valued random variable s is second-order. Thus s[·] is a linear map from \({\mathcal H}\) into the set of second-order scalar random variables on \((\varOmega,{\mathcal F},P)\).

Now recall the space \(L^{2}(\varOmega,{\mathcal F},P)\), whose elements are the equivalence classes of second-order scalar random variables, where two scalar random variables are called equivalent if they are equal wp1 (with probability one; see Appendix 3c). The space \(L^{2}(\varOmega,{\mathcal F},P)\) is a Hilbert space, with the inner product of any two elements \(\tilde{r}, \tilde{s} \in L^{2}(\varOmega,{\mathcal F},P)\) given by \({\mathcal E}\tilde{r}\tilde{s}\) and the corresponding norm of any element \(\tilde{s} \in L^{2}(\varOmega,{\mathcal F},P)\) given by \(({\mathcal E}\tilde{s}^{2})^{1/2}\). The inequality given by Eq. (70) states that the functional s[·] is bounded, when viewed as a map from \({\mathcal H}\) into \(L^{2}(\varOmega,{\mathcal F},P)\).

A map s[·] from \({\mathcal H}\) into the set of scalar random variables on \((\varOmega,{\mathcal F},P)\), which is linear in the sense that if \(\textbf{g},\textbf{h} \in {\mathcal H}\) and \(\alpha,\beta \in \mathbb{R}\) then

$$s[\alpha \textbf{g} + \beta \textbf{h}] = \alpha s[\textbf{g}] + \beta s[\textbf{h}]\,\textrm{wp1}\,,$$

is called a random linear functional (e.g. Itô 1984, p. 22; Omatu and Seinfeld 1989, p. 48). Observe that the set of \(\omega\in \varOmega\) of probability measure zero where linearity fails to hold can depend on \(\alpha,\beta,\textbf{g}\) and h. If linearity holds for all \(\omega \in \varOmega\), for all \(\textbf{g},\textbf{h} \in {\mathcal H}\) and \(\alpha,\beta \in \mathbb{R}\), then the random linear functional is called perfect. If s[·] is a random linear functional and there is a constant \(\gamma \in {\mathbb R}\) such that Eq. (70) holds for all \(\textbf{h} \in {\mathcal H}\), then the random linear functional is called second-order. Thus, given any particular \({\mathcal H}\)-valued random variable s, the map s[·] defined for all \(\textbf{h} \in {\mathcal H}\) by \(s[\textbf{h}] = (\textbf{h},\textbf{s})\) is a perfect random linear functional, and if s is second-order then so is s[·].

Now it will be shown that a random linear functional s[·] is second-order if, and only if,

$$\sum_{i=1}^{\infty} {\mathcal E}s^{2}[\textbf{h}_{i}] < \infty\,.$$
((71))

In particular, a collection \(\{s_{i}\}_{i=1}^{\infty}\) of scalar random variables with \(\sum_{i=1}^{\infty} {\mathcal E}s_{i}^{2} < \infty\) can be used to define a second-order random linear functional, by setting \(s[\textbf{h}_{i}] = s_{i}\) for \(i=1,2,\ldots\). It will then be shown how to construct, from any given second-order random linear functional s[·], a second-order \({\mathcal H}\)-valued random variable s such that, for all \(\textbf{h} \in {\mathcal H}\),

$$(\textbf{h},\textbf{s}) = s[\textbf{h}]\,\textrm{wp1} \,.$$

Such an \({\mathcal H}\)-valued random variable s is called a regularized version of the random linear functional s[·] (Itô 1984, Definition 2.3.2, p. 23).

Let s[·] be a second-order random linear functional. Given any \(\textbf{h} \in {\mathcal H}\) and positive integer n, it follows from the linearity of s[·] that

$$s\left[\sum_{i=1}^{n}(\textbf{h}_{i},\textbf{h})\textbf{h}_{i}\right] = \sum_{i=1}^{n}(\textbf{h}_{i},\textbf{h})s[\textbf{h}_{i}]\,\textrm{wp1}\,,$$

where the set of probability measure zero on which equality does not hold may depend on h and on the orthonormal basis elements \(\{\textbf{h}_{i}\}_{i=1}^{n}\). By the boundedness of s[·] it follows that

$${\mathcal E}\left(\sum_{i=1}^{n}(\textbf{h}_{i},\textbf{h})s[\textbf{h}_{i}]\right)^{2} \leq \gamma^{2}\left\|\sum_{i=1}^{n}(\textbf{h}_{i},\textbf{h})\textbf{h}_{i}\right\|^{2}\,,$$

for some constant \(\gamma \in {\mathbb R}\) which is independent of h and n. Taking the limit as \(n \rightarrow \infty\) gives

$${\mathcal E}\left(\sum_{i=1}^{\infty}(\textbf{h}_{i},\textbf{h})s[\textbf{h}_{i}]\right)^{2} \leq \gamma^{2}||\textbf{h}||^{2} < \infty\,,$$

for all \(\textbf{h} \in {\mathcal H}\). Thus the series \(\sum_{i=1}^{\infty}(\textbf{h}_{i},\textbf{h})s[\textbf{h}_{i}]\) converges in \(L^{2}(\varOmega,{\mathcal F},P)\), i.e., there exists a unique element \(\tilde{s}[\textbf{h}] \in L^{2}(\varOmega,{\mathcal F},P)\) such that

$$\lim_{n \rightarrow \infty} {\mathcal E}\left(\tilde{s}[\textbf{h}] - \sum_{i=1}^{n}(\textbf{h}_{i},\textbf{h})s[\textbf{h}_{i}]\right)^{2} = 0\,,$$

for all \(\textbf{h} \in {\mathcal H}\). Equivalently, since a series converges in a Hilbert space if, and only if, it converges in norm,

$$\sum_{i=1}^{\infty} \left( {\mathcal E} \left\{(\textbf{h}_{i},\textbf{h}) s[\textbf{h}_{i}] \right\}^{2} \right)^{1/2} = \sum_{i=1}^{\infty} \left| (\textbf{h}_{i},\textbf{h}) \right| \left( {\mathcal E}s^{2}[\textbf{h}_{i}] \right)^{1/2} < \infty\,,$$

for all \(\textbf{h} \in {\mathcal H}\). By the Riesz representation theorem applied to the Hilbert space of square-summable sequences of real numbers, and since

$$\sum_{i=1}^{\infty} (\textbf{h}_{i},\textbf{h})^{2} = ||\textbf{h}||^{2}$$

by Parseval’s relation, the series \(\sum_{i=1}^{\infty}(\textbf{h}_{i},\textbf{h})s[\textbf{h}_{i}]\) therefore converges in \(L^{2}(\varOmega,{\mathcal F},P)\), for all \(\textbf{h} \in {\mathcal H}\), if, and only if, Eq. (71) holds, in which case

$$\sum_{i=1}^{\infty} \left| (\textbf{h}_{i},\textbf{h}) \right| \left( {\mathcal E}s^{2}[\textbf{h}_{i}] \right)^{1/2} \leq ||\textbf{h}|| \left[ \sum_{i=1}^{\infty}{\mathcal E}s^{2}[\textbf{h}_{i}]\right]^{1/2} < \infty\,,$$

by the Schwarz inequality. Thus, if s[·] is a second-order random linear functional, then Eq. (71) holds, for every orthonormal basis \(\{\textbf{h}_{i}\}_{i=1}^{\infty}\) of \({\mathcal H}\).

Conversely, suppose that Eq. (71) holds for a random linear functional s[·], for some orthonormal basis \(\{\textbf{h}_{i}\}_{i=1}^{\infty}\) of \({\mathcal H}\). Since every \(\textbf{h} \in {\mathcal H}\) has the representation \(\textbf{h} = \sum_{i=1}^{\infty}(\textbf{h}_{i},\textbf{h})\textbf{h}_{i}\), it follows from the linearity of s[·] that if \(\textbf{h} \in {\mathcal H}\) then

$$s[\textbf{h}] = \sum_{i=1}^{\infty}(\textbf{h}_{i},\textbf{h})s[\textbf{h}_{i}]\,\textrm{wp1}\,,$$

and therefore

$$s^{2}[\textbf{h}] \leq ||\textbf{h}||^{2}\sum_{i=1}^{\infty}s^{2}[\textbf{h}_{i}]\,\textrm{wp}1\,,$$

by the Schwarz inequality and Parseval’s relation. Thus

$${\mathcal E}s^{2}[\textbf{h}] \leq ||\textbf{h}||^{2}\sum_{i=1}^{\infty}{\mathcal E}s^{2}[\textbf{h}_{i}]\,,$$

for every \(\textbf{h} \in {\mathcal H}\), i.e., Eq. (70) holds with

$$\gamma^2 = \sum_{i=1}^{\infty}{\mathcal E}s^{2}[\textbf{h}_{i}] < \infty\,,$$

by Eq. (71), and therefore s[·] is a second-order random linear functional. Furthermore, since s[·] is a second-order random linear functional, Eq. (71) holds for every orthonormal basis \(\{\textbf{h}_{i}\}_{i=1}^{\infty}\) of \({\mathcal H}\).
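The bound relating Eqs. (70) and (71) can be illustrated with a truncated example. In the sketch below, the variances \({\mathcal E}s^{2}[\textbf{h}_{i}]\) are assumed to be \(1/i^{2}\) (an illustrative summable choice, not from the text), and the second-order bound is then verified exactly:

```python
import numpy as np

# Truncated sketch of the equivalence around Eq. (71): take the s[h_i]
# to be independent with E s^2[h_i] = lam_i, where lam_i = 1/i^2 is an
# assumed summable choice.  Then E s^2[h] = sum_i (h_i, h)^2 lam_i,
# and the second-order bound Eq. (70) holds with gamma^2 = sum_i lam_i.
N = 200
lam = 1.0 / np.arange(1, N + 1) ** 2   # E s^2[h_i], summable as in Eq. (71)
gamma2 = lam.sum()

rng = np.random.default_rng(5)
c = rng.normal(size=N)                 # coefficients (h_i, h) of some h
Es2_h = (c ** 2 * lam).sum()           # exact E s^2[h] for independent s[h_i]
assert Es2_h <= gamma2 * (c @ c)       # Eq. (70): E s^2[h] <= gamma^2 ||h||^2
```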

Now let s[·] be a given second-order random linear functional. Since

$${\mathcal E}\sum_{i=1}^{\infty}s^{2}[\textbf{h}_{i}] = \sum_{i=1}^{\infty}{\mathcal E}s^{2}[\textbf{h}_{i}] < \infty\,,$$

the sum \(\sum_{i=1}^{\infty}s^{2}[\textbf{h}_{i}]\) must be finite wp1, i.e., if

$$E = \left\{\omega \in \varOmega : \sum_{i=1}^{\infty}s^{2}[\textbf{h}_{i}](\omega) < \infty\right\}$$

then \(E \in {\mathcal F}\) and P(E) = 1, where the set E may depend on \(\{\textbf{h}_{i}\}_{i=1}^{\infty}\). Define \(\textbf{s}(\omega)\) for each \(\omega \in \varOmega\) by

$$\textbf{s}(\omega) = \left\{ \begin{array}{cl} \sum_{i=1}^{\infty}\textbf{h}_{i}s[\textbf{h}_{i}](\omega) & \textrm{if}\ \omega \in E \\ \textbf{0} & \textrm{if}\ \omega \not\in E \end{array} \right.\,. $$

By Parseval’s relation it follows that

$$||\textbf{s}(\omega)||^{2} = \left\{ \begin{array}{cl} \sum_{i=1}^{\infty}s^{2}[\textbf{h}_{i}](\omega) & \textrm{if}\ \omega \in E \\ 0 & \textrm{if}\ \omega \not\in E \end{array} \right.\,, $$

and therefore \(\|\textbf{s}(\omega)\|^{2} < \infty\) for all \(\omega \in \varOmega\). Thus s is a map from Ω into \({\mathcal H}\), and for any \(\textbf{h} \in {\mathcal H}\),

$$(\textbf{h},\textbf{s}(\omega)) = \sum_{i=1}^{\infty}(\textbf{h}_{i},\textbf{h})s[\textbf{h}_{i}](\omega)$$
((72))

for each \(\omega \in E\). Now, if \(\textbf{h} \in {\mathcal H}\) then

$$s[\textbf{h}] = \sum_{i=1}^{\infty}(\textbf{h}_{i},\textbf{h})s[\textbf{h}_{i}]\,\textrm{wp}1\,,$$

and so there is a set \(E_\textbf{h} \in {\mathcal F}\) with \(P(E_\textbf{h}) = 1\), that may depend on \(\{\textbf{h}_{i}\}_{i=1}^{\infty}\) as well as on \(\textbf{h}\), such that

$$s[\textbf{h}](\omega) = \sum_{i=1}^{\infty}(\textbf{h}_{i},\textbf{h})s[\textbf{h}_{i}](\omega)$$

for each \(\omega \in E_\textbf{h}\). Therefore, for all \(\textbf{h} \in {\mathcal H}\),

$$(\textbf{h},\textbf{s}(\omega)) = s[\textbf{h}](\omega)$$

for each \(\omega \in E \cap E_\textbf{h}\), and \(P(E \cap E_\textbf{h}) = 1\). Since the probability space \((\varOmega,{\mathcal F},P)\) was assumed to be complete, and since s[h] is a scalar random variable for each \(\textbf{h} \in {\mathcal H}\), it follows that (h,s) is a scalar random variable for each \(\textbf{h} \in {\mathcal H}\). Therefore the map \(\textbf{s}: \varOmega \rightarrow {\mathcal H}\) is an \({\mathcal H}\)-valued random variable. Since

$${\mathcal E}||\textbf{s}||^{2} = {\mathcal E}\sum_{i=1}^{\infty}s^{2}[\textbf{h}_{i}] = \sum_{i=1}^{\infty}{\mathcal E}s^{2}[\textbf{h}_{i}] < \infty\,,$$

s is a second-order \({\mathcal H}\)-valued random variable. Since s[·] is bounded as a map from \({\mathcal H}\) into \(L^{2}(\varOmega,{\mathcal F},P)\),

$$\lim_{n \rightarrow \infty} {\mathcal E}\left(s[\textbf{h}] - \sum_{i=1}^{n}(\textbf{h}_{i},\textbf{h})s[\textbf{h}_{i}]\right)^{2} = 0$$

for all \(\textbf{h} \in {\mathcal H}\), and since Eq. (72) holds for all \(\textbf{h} \in {\mathcal H}\) and \(\omega \in E\), it follows that

$${\mathcal E}\left(s[\textbf{h}] - (\textbf{h},\textbf{s})\right)^{2} = 0$$

for all \(\textbf{h} \in {\mathcal H}\). Therefore, for all \(\textbf{h} \in {\mathcal H}\), \((\textbf{h},\textbf{s}) = s[\textbf{h}]\,\textrm{wp1}\).
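A single realization of the regularized version can be sketched by truncating the random Fourier series to \({\mathcal H} = {\mathbb R}^{N}\). In the sketch below the orthonormal basis and the square-summable variances are illustrative choices, and the defining property \((\textbf{h},\textbf{s}(\omega)) = s[\textbf{h}](\omega)\) is checked together with Parseval’s relation:

```python
import numpy as np

# Sketch of the regularization step: truncating to H = R^N, pick an
# orthonormal basis {h_i} (columns of Q) and square-summable variances
# sigma_i^2 (an assumed choice); one realization of the random Fourier
# series s(omega) = sum_i s[h_i](omega) h_i then satisfies
# (h, s(omega)) = sum_i (h_i, h) s[h_i](omega).
rng = np.random.default_rng(3)
N = 100
Q, _ = np.linalg.qr(rng.normal(size=(N, N)))  # orthonormal basis as columns
sigma = 1.0 / np.arange(1, N + 1)             # assumed std devs, square-summable
s_coeffs = sigma * rng.normal(size=N)         # s[h_i](omega), one realization
s_omega = Q @ s_coeffs                        # regularized version s(omega)

h = rng.normal(size=N)
assert np.isclose(h @ s_omega, (Q.T @ h) @ s_coeffs)       # (h, s) = s[h]
# Parseval: ||s(omega)||^2 = sum_i s^2[h_i](omega)
assert np.isclose(s_omega @ s_omega, (s_coeffs ** 2).sum())
```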

Appendix 2: The Hilbert Spaces \(\varPhi_{p}\)

Let \({\mathcal H}\) be a real, separable Hilbert space, with inner product and corresponding norm denoted by (·,·) and \(\|\cdot\|\), respectively. Denote by \({\mathcal B}({\mathcal H})\) the Borel field generated by the open sets in \({\mathcal H}\). For convenience it will be assumed in this appendix that \({\mathcal H}\) is infinite-dimensional.

Appendix 2a uses a self-adjoint linear operator on \({\mathcal H}\) to construct a special family of Hilbert spaces \(\{\varPhi_{p}, p \geq 0\}\). The inner product and corresponding norm on \(\varPhi_{p}\) are denoted by \((\cdot,\cdot)_{p}\) and \(\|\cdot\|_{p}\), respectively, for each \(p \geq 0\). These Hilbert spaces have the following properties: (i) \(\varPhi_{0} = {\mathcal H}\); (ii) for each \(p > 0\), \(\varPhi_{p} \subset {\mathcal H}\), and therefore \(\varPhi_{p}\) is real and separable; (iii) for each \(p > 0\), \(\varPhi_{p}\) is dense in \({\mathcal H}\), and therefore \(\varPhi_{p}\) is infinite-dimensional; and (iv) if \(0 \leq q \leq r\), then \(\|\textbf{h}\| = \|\textbf{h}\|_{0} \leq \|\textbf{h}\|_{q} \leq \|\textbf{h}\|_{r}\) for all \(\textbf{h} \in \varPhi_{r}\), and therefore \({\mathcal H} = \varPhi_{0} \supset \varPhi_{q} \supset \varPhi_{r}\). In view of property (iv), the family \(\{\varPhi_{p}, p \geq 0\}\) is called a decreasing family of Hilbert spaces. The construction given here follows closely that of Kallianpur and Xiong (1995, Example 1.3.2, pp. 40–42). For various concrete examples and classical applications of decreasing families of Hilbert spaces constructed in this way, see Reed and Simon (1972, pp. 141–145), Itô (1984, pp. 1–12), Kallianpur and Xiong (1995, pp. 29–40), and Lax (2006, pp. 61–67).

Appendix 2b discusses the spaces \(\varPhi_{p}\) in case \({\mathcal H} = L^{2}(S)\), the space of square-integrable vector or scalar fields on the sphere S, when the operator L used in the construction of the spaces \(\varPhi_{p}\) is taken to be \(\textbf{L} = -\Delta\), where Δ is the Laplacian operator on the sphere.

2.1 Construction of the Hilbert Spaces \(\varPhi_{p}\)

Let L be a densely defined, positive semidefinite, self-adjoint linear operator on \({\mathcal H}\), and let I denote the identity operator on \({\mathcal H}\). It follows from elementary arguments (e.g. Riesz and Sz.-Nagy 1955, p. 324) that the inverse operator \((\textbf{I}+\textbf{L})^{-1}\) is a bounded, positive semidefinite, self-adjoint linear operator defined on all of \({\mathcal H}\), in fact with

$$||(\textbf{I}+\textbf{L})^{-1}\textbf{h}|| \leq ||\textbf{h}||$$

for all \(\textbf{h} \in {\mathcal H}\). Assume that some power \(p_1 > 0\) of \((\textbf{I}+\textbf{L})^{-1}\) is a compact operator on \({\mathcal H}\). Then it follows from the Hilbert-Schmidt theorem (e.g. Reed and Simon 1972, Theorem VI.16, p. 203) that there exists a countable orthonormal basis for \({\mathcal H}\) which consists of eigenvectors \(\{\textbf{g}_{i}\}_{i=1}^{\infty}\) of \((\textbf{I}+\textbf{L})^{-p_{1}}\),

$$(\textbf{I}+\textbf{L})^{-p_{1}}\textbf{g}_{i} = \mu_{i}\textbf{g}_{i}$$

for \(i = 1, 2, \ldots\), where the corresponding eigenvalues \(\{\mu_{i}\}_{i=1}^{\infty}\) satisfy \(1 \geq \mu_{1} \geq \mu_{2} \geq \cdots\), with \(\mu_{i} \rightarrow 0\) as \(i \rightarrow \infty\). Moreover, \(\mu_{i} > 0\) for \(i = 1, 2, \ldots\), for suppose otherwise. Then there is a first zero eigenvalue, call it \(\mu_{M+1}\), since the eigenvalues decrease monotonically toward zero. Therefore \((\textbf{I}+\textbf{L})^{-p_{1}}\) has finite rank M, hence \(\textbf{I}+\textbf{L}\) is defined everywhere in \({\mathcal H}\) and also has rank M. But \(\textrm{rank}\,(\textbf{I}+\textbf{L}) \geq \textrm{rank}\,\textbf{I} = \infty\) since \(\textbf{L}\) is positive semidefinite and \({\mathcal H}\) was assumed infinite-dimensional, a contradiction.

Now define \(\{\lambda_{i}\}_{i=1}^{\infty}\) by \((1+\lambda_{i})^{-p_{1}} = \mu_{i}\). Then \(0 \leq \lambda_{1} \leq \lambda_{2} \leq \cdots\), with \(\lambda_{i} \rightarrow \infty\) as \(i \rightarrow \infty\), and \(\lambda_{i} < \infty\) for \(i = 1, 2, \ldots\) since \(\mu_{i} > 0\) for \(i = 1, 2, \ldots\). Since the function \(\lambda(\mu) = \mu^{-1/p_{1}} - 1\) is measurable and finite for \(\mu \in (0,1]\), it follows from the functional calculus for self-adjoint operators (e.g. Riesz and Sz.-Nagy 1955, pp. 343–346; Reed and Simon 1972, pp. 259–264) that

$$\textbf{L}\textbf{g}_{i} = \lambda_{i}\textbf{g}_{i}$$

for \(i = 1, 2, \ldots\), and similarly for all \(p \geq 0\) that

$$(\textbf{I}+\textbf{L})^{p}\textbf{g}_{i} = (1 + \lambda_{i})^{p}\textbf{g}_{i}$$

for \(i = 1, 2, \ldots\), with \((\textbf{I}+\textbf{L})^{p}\) densely defined and self-adjoint in \({\mathcal H}\) for all \(p \geq 0\).

For each \(p \geq 0\), denote by \(\varPhi_{p}\) the domain of definition of \((\textbf{I}+\textbf{L})^{p}\), i.e.,

$$\varPhi_{p} = \{\textbf{h} \in {\mathcal H}: ||(\textbf{I}+\textbf{L})^{p}\textbf{h}|| < \infty\}\,.$$

In particular, \(\varPhi_{0} = {\mathcal H}\). Now

$$\begin{aligned} ||(\textbf{I}+\textbf{L})^{p}\textbf{h}||^{2} &{}= \sum_{i=1}^{\infty} ((\textbf{I}+\textbf{L})^{p}\textbf{h},\textbf{g}_{i})^{2} = \sum_{i=1}^{\infty} (\textbf{h},(\textbf{I}+\textbf{L})^{p}\textbf{g}_{i})^{2}\\ &{}= \sum_{i=1}^{\infty} (\textbf{h},(1 + \lambda_{i})^{p}\textbf{g}_{i})^{2} = \sum_{i=1}^{\infty} (1 + \lambda_{i})^{2p} (\textbf{h},\textbf{g}_{i})^{2} \end{aligned}$$

for each \(p \geq 0\), where the first equality is Parseval’s relation and the second one is due to the fact that \((\textbf{I}+\textbf{L})^{p}\) is self-adjoint. Thus for each \(p \geq 0\), \(\varPhi_{p}\) is given explicitly by

$$\varPhi_{p} = \left\{\textbf{h} \in {\mathcal H}: \sum_{i=1}^{\infty} (1 + \lambda_{i})^{2p} (\textbf{h},\textbf{g}_{i})^{2} < \infty \right\}\,.$$

Using this formula, it can be checked that for each \(p \geq 0\), \(\varPhi_{p}\) is an inner product space, with inner product \((\cdot,\cdot)_{p}\) defined by

$$(\textbf{g},\textbf{h})_{p} = \sum_{i=1}^{\infty} (1 + \lambda_{i})^{2p} (\textbf{g},\textbf{g}_{i}) (\textbf{h},\textbf{g}_{i}) = ((\textbf{I}+\textbf{L})^{p}\textbf{g},(\textbf{I}+\textbf{L})^{p}\textbf{h})$$

for all \(\textbf{g},\textbf{h} \in \varPhi_{p}\), and corresponding norm \(\|\cdot\|_{p}\) defined by

$$||\textbf{h}||_{p}^{2} = (\textbf{h},\textbf{h})_{p} = ||(\textbf{I}+\textbf{L})^{p}\textbf{h}||^{2}$$

for all \(\textbf{h} \in \varPhi_{p}\). It follows also that if \(0 \leq q \leq r\), then \(\|\textbf{h}\| = \|\textbf{h}\|_{0} \leq \|\textbf{h}\|_{q} \leq \|\textbf{h}\|_{r}\) for all \(\textbf{h} \in \varPhi_{r}\), and therefore that \(\varPhi_{r} \subset \varPhi_{q} \subset {\mathcal H}\).

Each inner product space \(\varPhi_{p}\), \(p > 0\), is in fact a Hilbert space, i.e., is already complete in the norm \(\|\cdot\|_{p}\). To see this, suppose that \(\{\textbf{h}_{n}\}_{n=1}^{\infty}\) is a Cauchy sequence in \(\varPhi_{p}\) for some fixed \(p > 0\), i.e. that \(\|\textbf{h}_{n}-\textbf{h}_{m}\|_{p} \rightarrow 0\) as \(n,m \rightarrow \infty\). Since \(\|\textbf{h}_{n}-\textbf{h}_{m}\| \leq \|\textbf{h}_{n}-\textbf{h}_{m}\|_{p}\) for all \(n,m \geq 1\), it follows that \(\{\textbf{h}_{n}\}_{n=1}^{\infty}\) is also a Cauchy sequence in \({\mathcal H}\), and since \({\mathcal H}\) is complete, the sequence converges to a unique element \(\textbf{h}_{\infty} \in {\mathcal H}\). It remains to show that in fact \(\textbf{h}_{n} \rightarrow \textbf{h}_{\infty} \in \varPhi_{p}\) as \(n \rightarrow \infty\).

Now

$$||\textbf{h}_{n}-\textbf{h}_{m}||_{p}^{2} = ||(\textbf{I}+\textbf{L})^{p}(\textbf{h}_{n}-\textbf{h}_{m})||^{2} = \sum_{i=1}^{\infty} (1 + \lambda_{i})^{2p} (\textbf{h}_{n}-\textbf{h}_{m},\textbf{g}_{i})^{2}\,.$$

Thus, that \(\{\textbf{h}_{n}\}_{n=1}^{\infty}\) is a Cauchy sequence in \(\varPhi_{p}\) means that, given any \(\varepsilon > 0\), there exists an \(M = M(\varepsilon)\) such that, for all \(n,m \geq M\),

$$\sum_{i=1}^{I} (1 + \lambda_{i})^{2p} (\textbf{h}_{n}-\textbf{h}_{m},\textbf{g}_{i})^{2} < \varepsilon$$

for any \(I \geq 1\). But for each \(i = 1,2,\ldots\),

$$|(\textbf{h}_{m}-\textbf{h}_{\infty},\textbf{g}_{i})| \leq ||\textbf{h}_{m}-\textbf{h}_{\infty}||\,||\textbf{g}_{i}|| = ||\textbf{h}_{m}-\textbf{h}_{\infty}|| \rightarrow 0\ \textrm{as }\ m \rightarrow \infty\,,$$

hence \((\textbf{h}_{m},\textbf{g}_{i}) \rightarrow (\textbf{h}_{\infty},\textbf{g}_{i})\) as \(m \rightarrow \infty\), and therefore

$$\sum_{i=1}^{I} (1 + \lambda_{i})^{2p} (\textbf{h}_{n}-\textbf{h}_{\infty},\textbf{g}_{i})^{2} < \varepsilon$$

for all \(n \geq M\) and \(I \geq 1\). Letting \(I \rightarrow \infty\) then gives

$$||\textbf{h}_{n}-\textbf{h}_{\infty}||_{p}^{2} = \sum_{i=1}^{\infty} (1 + \lambda_{i})^{2p} (\textbf{h}_{n}-\textbf{h}_{\infty},\textbf{g}_{i})^{2} < \varepsilon$$

for all \(n \geq M\), and therefore \(\textbf{h}_{n} \rightarrow \textbf{h}_{\infty} \in \varPhi_{p}\) as \(n \rightarrow \infty\).

Thus, for each \(p > 0\), \(\varPhi_{p}\) is a Hilbert space, with inner product \((\cdot,\cdot)_{p}\) and corresponding norm \(\|\cdot\|_{p}\). It can be checked that \(\{(1 + \lambda_{i})^{-p}\textbf{g}_{i}\}_{i=1}^{\infty}\) is an orthonormal basis for \(\varPhi_{p}\), for each \(p > 0\).
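The weighted-norm structure of the spaces \(\varPhi_{p}\) can be sketched by truncating to the first N eigendirections of L. In the sketch below the eigenvalues \(\lambda_{i} = i\) are an assumed illustrative choice; the monotonicity of the norms in p and the unit p-norm of the scaled basis vectors are then checked:

```python
import numpy as np

# Finite-dimensional sketch of the spaces Phi_p: in the eigenbasis
# {g_i} of L with assumed eigenvalues lam_i = i, the p-norm is
# ||h||_p^2 = sum_i (1 + lam_i)^(2p) (h, g_i)^2, so the norms increase
# with p (the spaces decrease), and (1 + lam_i)^(-p) g_i has unit p-norm.
N = 50
lam = np.arange(1.0, N + 1)                # assumed eigenvalues of L
rng = np.random.default_rng(4)
c = rng.normal(size=N) / (1 + lam) ** 3    # rapidly decaying coefficients

def norm_p(c, p):
    return np.sqrt(((1 + lam) ** (2 * p) * c ** 2).sum())

assert norm_p(c, 0) <= norm_p(c, 0.5) <= norm_p(c, 1)

p = 0.5
u = np.zeros(N)
u[3] = (1 + lam[3]) ** (-p)                # scaled basis vector in Phi_p
assert np.isclose(norm_p(u, p), 1.0)
```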

2.2 The Case \({\mathcal H} = L^{2}(S)\) with \(\textbf{L} = -\Delta\)

Now let \({\mathcal H} = L^{2}(S)\), the Hilbert space of real, Lebesgue square-integrable scalar fields on the unit 2-sphere S, with inner product

$$(\phi,\psi) = \int_{S} \phi (\textbf{x}) \psi (\textbf{x})\,\textrm{d} \textbf{x}$$

for all \(\phi,\psi \in L^{2}(S)\), where \(\textbf{x} = (x_{1},x_{2})\) denotes spherical coordinates on S and dx denotes the surface area element, and with corresponding norm \(\|\phi\| = (\phi,\phi)^{1/2}\) for all \(\phi \in L^{2}(S)\). Let \(\textbf{L} = -\Delta\), where Δ is the Laplacian operator on \(L^{2}(S)\). Thus L is a densely defined, positive semidefinite, self-adjoint linear operator on \(L^{2}(S)\). Denote by I the identity operator on \(L^{2}(S)\).

It will be shown first that for all \(p_{1} > 1/2\), \((I-\Delta)^{-p_{1}}\) is a Hilbert-Schmidt operator on \(L^{2}(S)\), hence a compact operator on \(L^{2}(S)\). By Appendix 2a, this allows construction of the decreasing family of Hilbert spaces \(\{\varPhi_{p} = \varPhi_{p}(S), p \geq 0\}\),

$$\varPhi_{p} = \{\phi \in L^{2}(S): ||(I-\Delta)^{p}\phi|| < \infty\}\,,$$

with inner product

$$(\phi,\psi)_{p} = ((I-\Delta)^{p}\phi,(I-\Delta)^{p}\psi)$$

for all \(\phi,\psi \in \varPhi_{p}\), and corresponding norm \(\|\phi\|_{p} = (\phi,\phi)_{p}^{1/2}\) for all \(\phi \in \varPhi_{p}\). Thus if \(\phi \in \varPhi_{p}\) and p is a positive integer or half-integer, then all partial (directional) derivatives of ϕ up to order 2p are Lebesgue square-integrable.

Second, a Sobolev-type lemma for the sphere will be established, showing that if \(\phi \in \varPhi_{1/2 + q}\) with \(q>0\), then ϕ is a bounded function on S, with bound

$$\max_{\textbf{x} \in S} |\phi (\textbf{x})|^{2} < \tfrac{1}{4\pi} \left(1 + \tfrac{1}{2q}\right) \|\phi\|_{1/2 + q}^{2}\,.$$
((73))

It follows that if \(\phi \in \varPhi_{1+q}\) with \(q>0\), then the first partial derivatives of ϕ are bounded functions on the sphere, and in particular that \(\varPhi_{1+q} \subset C^{0}(S)\), the space of continuous functions on the sphere. It will be shown that, in fact, if \(\phi \in \varPhi_{1+q}\) with \(q>0\), then ϕ is Lipschitz continuous on S. Thus, for any \(q>0\) and any non-negative integer l, \(\varPhi_{1 + l/2 + q} \subset C^{l}(S)\), the space of functions with l continuous partial derivatives on the sphere, and in fact all of the partial derivatives up to order l of a function \(\phi \in \varPhi_{1 + l/2 + q}\) are Lipschitz continuous.

These results carry over to vectors in the usual way. Thus denoting by \(L^{2}(S)\) also the Hilbert space of real, Lebesgue square-integrable n-vectors on S, the inner product is

$$(\textbf{g},\textbf{h}) = \int_{S} \textbf{g}^{T}(\textbf{x}) \textbf{h}(\textbf{x})\,\textrm{d}\textbf{x} = \sum_{i=1}^{n} (g_{i},h_{i})$$

for all \(\textbf{g},\textbf{h} \in L^{2}(S)\), and the corresponding norm is \(\|\textbf{h}\| = (\textbf{h},\textbf{h})^{1/2}\) for all \(\textbf{h} \in L^{2}(S)\). Thus for n-vectors on S, the Hilbert spaces \(\varPhi_{p}\), \(p \geq 0\), are defined by

$$\varPhi_{p} = \{\textbf{h}\in L^{2}(S): ||(I-\Delta)^{p}\textbf{h}|| < \infty\}\,,$$

with inner product

$$(\textbf{g},\textbf{h})_{p} = ((I-\Delta)^{p}\textbf{g},(I-\Delta)^{p}\textbf{h}) = \sum_{i=1}^{n} (g_{i},h_{i})_{p}$$

for all \(\textbf{g},\textbf{h} \in \varPhi_{p}\), and corresponding norm \(\|\textbf{h}\|_{p} = (\textbf{h},\textbf{h})_{p}^{1/2}\) for all \(\textbf{h} \in \varPhi_{p}\).

To establish that \((I-\Delta)^{-p}\) is a Hilbert-Schmidt operator on \(L^{2}(S)\) if \(p > 1/2\), note first that

$$\sum_{l=0}^{\infty} \frac{2l+1}{[1+l(l+1)]^{1 + 2 \varepsilon}} < 1 + \frac{1}{2\varepsilon}$$
((74))

if \(\varepsilon > 0\). To obtain this inequality, let

$$f(x) = \frac{2x+1}{[1+x(x+1)]^{1 + 2 \varepsilon}}$$

for \(x \geq 0\) and \(\varepsilon > 0\). Then f is monotone decreasing for \(x \geq 1/2\) and \(f(0) > f(1)\), so that \(f(l) < \int_{l-1}^{l} f(x)\,\textrm{d}x\) for each \(l \geq 1\), and therefore

$$\sum_{l=0}^{\infty} \frac{2l+1}{[1+l(l+1)]^{1 + 2 \varepsilon}} = f(0) + \sum_{l=1}^{\infty} f(l) < f(0) + \int_{0}^{\infty} f(x)\,\textrm{d}x = 1 + \frac{1}{2\varepsilon}\,.$$

The sum in Eq. (74) diverges logarithmically for \(\varepsilon = 0\).
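Since all terms of the series in Eq. (74) are positive, any truncation is a lower bound for the sum, so the inequality can be given a one-sided numerical sanity check. The sketch below (plain Python; the truncation point and the sample values of ε are arbitrary choices) verifies that the partial sums stay below the bound \(1 + 1/(2\varepsilon)\):

```python
# One-sided numerical check of Eq. (74): for eps > 0,
#   sum_{l=0}^{infinity} (2l+1) / [1 + l(l+1)]^(1 + 2*eps)  <  1 + 1/(2*eps).
# A finite partial sum of positive terms underestimates the full sum,
# so in particular it must stay below the bound.
def partial_sum(eps, terms=100_000):
    return sum((2 * l + 1) / (1 + l * (l + 1)) ** (1 + 2 * eps)
               for l in range(terms))

for eps in (0.05, 0.25, 1.0):
    assert partial_sum(eps) < 1 + 1 / (2 * eps)
```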

Now let \(C = (I-\Delta)^{-p}\) with \(p>0\). Thus C is a bounded operator from \(L^{2}(S)\) into \(L^{2}(S)\), with \(\|C\phi\| \leq \|\phi\|\) for all \(\phi \in L^{2}(S)\). The real and imaginary parts of the spherical harmonics \(Y_{l}^{m}\) form an orthonormal basis for \(L^{2}(S)\), and

$$\Delta Y_{l}^{m} = -l(l+1)Y_{l}^{m}$$

for \(l \geq 0\) and \(|m| \leq l\). Thus

$$C Y_{l}^{m} = \lambda_{l}^{m} Y_{l}^{m}\,,$$

with eigenvalues \(\lambda_{l}^{m} = (Y_{l}^{m},C Y_{l}^{m}) = [1+l(l+1)]^{-p}\) for \(l \geq 0\) and \(|m| \leq l\). But

$$\sum_{l=0}^{\infty} \sum_{m=-l}^{l} (\lambda_{l}^{m})^{2} = \sum_{l=0}^{\infty} \frac{2l+1}{[1+l(l+1)]^{2p}}\,,$$

and so this sum is finite for \(p > 1/2\) by Eq. (74). Hence C is Hilbert-Schmidt for \(p > 1/2\).

To establish the bound of Eq. (73), suppose that \(\phi \in \varPhi_{1/2 + q}\) with \(q>0\). Thus \((I-\Delta)^{1/2 + q}\phi \in L^{2}(S)\) and has a spherical harmonic expansion

$$(I-\Delta)^{1/2 + q}\phi = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} \beta_{l}^{m} Y_{l}^{m}\,,$$

where the convergence is in \(L^{2}(S)\), with

$$\|\phi\|_{1/2 + q}^{2} = \|(I-\Delta)^{1/2 + q}\phi\|^{2} = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} |\beta_{l}^{m}|^{2} < \infty\,.$$
((75))

Therefore

$$\phi = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} [1+l(l+1)]^{-1/2 - q} \beta_{l}^{m} Y_{l}^{m}\,,$$
((76))

where the convergence is in \(\varPhi_{1/2 + q}\). It will be shown that this series converges absolutely, hence pointwise, so that

$$\phi (\textbf{x}) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} [1+l(l+1)]^{-1/2 - q} \beta_{l}^{m} Y_{l}^{m}(\textbf{x})$$

for each \(\textbf{x} \in S\). This will also give Eq. (73).

Now,

$$|\phi|\leq\sum_{l=0}^{\infty} [1+l(l+1)]^{-1/2 - q} \sum_{m=-l}^{l} |\beta_{l}^{m}|\,|Y_{l}^{m}|\,,$$

and so

$$|\phi|\leq\sum_{l=0}^{\infty} [1+l(l+1)]^{-1/2 - q} \left\{\sum_{m=-l}^{l} |\beta_{l}^{m}|^{2}\right\}^{1/2}\left\{ \sum_{m=-l}^{l} |Y_{l}^{m}|^{2} \right\}^{1/2}$$

by the Schwarz inequality. The spherical harmonic addition theorem says that

$$P_{l}(\cos\gamma) = \frac{4\pi}{2l+1} \sum_{m=-l}^{l} Y_{l}^{m}(\textbf{x}) \overline{Y}_{l}^{m}(\textbf{y})$$

for \(l \geq 0\), where \(P_{l}\) is the lth Legendre polynomial and γ is the angle between x and y. This implies that

$$\sum_{m=-l}^{l} |Y_{l}^{m}(\textbf{x})|^{2} = \frac{2l+1}{4\pi}$$

for all \(\textbf{x} \in S\), and so

$$|\phi|\leq\frac{1}{\sqrt{4\pi}} \sum_{l=0}^{\infty} [2l+1]^{1/2} [1+l(l+1)]^{-1/2 - q} \left\{ \sum_{m=-l}^{l} |\beta_{l}^{m}|^{2} \right\}^{1/2}\,.$$

Another application of the Schwarz inequality then gives

$$|\phi|\leq\frac{1}{\sqrt{4\pi}} \left\{ \sum_{l=0}^{\infty} [2l+1] [1+l(l+1)]^{-1-2q} \right\}^{1/2} \left\{ \sum_{l=0}^{\infty} \sum_{m=-l}^{l} |\beta_{l}^{m}|^{2} \right\}^{1/2}\,,$$

or, using Eq. (75),

$$|\phi|^{2}\leq\frac{1}{4\pi} \|\phi\|_{1/2 + q}^{2} \sum_{l=0}^{\infty} \frac{2l+1}{[1+l(l+1)]^{1 + 2q}}\,.$$

Therefore, by Eq. (74), the sum in Eq. (76) converges absolutely, and Eq. (73) holds.
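The identity \(\sum_{m=-l}^{l} |Y_{l}^{m}(\textbf{x})|^{2} = (2l+1)/(4\pi)\) used in this argument can itself be checked numerically. The sketch below (plain Python) evaluates the left-hand side from the standard normalization of \(Y_{l}^{m}\) and a textbook upward recurrence for the associated Legendre functions; the test degrees and colatitudes are arbitrary choices:

```python
import math

def assoc_legendre(l, m, x):
    # Associated Legendre function P_l^m(x), for 0 <= m <= l and |x| <= 1,
    # with the Condon-Shortley phase, via the standard upward recurrence
    #   (l - m) P_l^m = x (2l - 1) P_{l-1}^m - (l + m - 1) P_{l-2}^m .
    pmm = 1.0
    s = math.sqrt((1.0 - x) * (1.0 + x))
    for i in range(1, m + 1):
        pmm *= -(2 * i - 1) * s            # P_m^m
    if l == m:
        return pmm
    pll = x * (2 * m + 1) * pmm            # P_{m+1}^m
    for ll in range(m + 2, l + 1):
        pmm, pll = pll, (x * (2 * ll - 1) * pll - (ll + m - 1) * pmm) / (ll - m)
    return pll

def sum_Y_squared(l, theta):
    # sum_{m=-l}^{l} |Y_l^m|^2 at colatitude theta; since |e^{i m phi}| = 1
    # and |Y_l^{-m}| = |Y_l^m|, the sum depends on neither phi nor theta.
    x, total = math.cos(theta), 0.0
    for m in range(l + 1):
        c = (2 * l + 1) / (4 * math.pi) * math.factorial(l - m) / math.factorial(l + m)
        total += (1 if m == 0 else 2) * c * assoc_legendre(l, m, x) ** 2
    return total

for l in (0, 1, 5, 12):
    for theta in (0.3, 1.2, 2.5):
        assert abs(sum_Y_squared(l, theta) - (2 * l + 1) / (4 * math.pi)) < 1e-9
```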

Now suppose that \(\phi \in \varPhi_{1+q}\) with \(q>0\). To establish that ϕ is Lipschitz continuous on S, note first that by the previous result,

$$\phi(\textbf{x}) = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} [1+l(l+1)]^{-1 - q} \beta_{l}^{m} Y_{l}^{m}(\textbf{x})$$

for each \(\textbf{x} \in S\), where

$$\|\phi\|_{1 + q}^{2} = \|(I-\Delta)^{1 + q}\phi\|^{2} = \sum_{l=0}^{\infty} \sum_{m=-l}^{l} |\beta_{l}^{m}|^{2} < \infty\,.$$
((77))

Therefore,

$$|\phi(\textbf{x}) -\phi(\textbf{y})|\leq\sum_{l=0}^{\infty} \sum_{m=-l}^{l} [1+l(l+1)]^{-1-q} |\beta_{l}^{m}|\,|Y_{l}^{m}(\textbf{x})-Y_{l}^{m}(\textbf{y})|$$

for each \(\textbf{x},\textbf{y} \in S\), and so by the Schwarz inequality,

$$|\phi(\textbf{x}) - \phi(\textbf{y})| \leq\sum_{l=0}^{\infty} [1+l(l+1)]^{-1-q} \left\{ \sum_{m=-l}^{l} |\beta_{l}^{m}|^{2} \right\}^{1/2} \left\{ \sum_{m=-l}^{l} |Y_{l}^{m}(\textbf{x})-Y_{l}^{m}(\textbf{y})|^{2} \right\}^{1/2}.$$

By the spherical harmonic addition theorem,

$$\begin{aligned}\sum_{m=-l}^{l} |Y_{l}^{m}(\textbf{x})-Y_{l}^{m}(\textbf{y})|^{2} = & \sum_{m=-l}^{l} \left[ |Y_{l}^{m}(\textbf{x})|^{2} - 2 \textrm{Re} Y_{l}^{m}(\textbf{x}) \overline{Y}_{l}^{m}(\textbf{y}) + |Y_{l}^{m}(\textbf{y})|^{2} \right]\nonumber\\ = & \frac{2l+1}{2\pi} \left[ 1 - P_{l}(\cos \gamma) \right]\,,\end{aligned}$$

where \(\gamma = \gamma (\textbf{x},\textbf{y})\) is the angle between x and y. Therefore,

$$|\phi(\textbf{x}) -\phi(\textbf{y})|\leq\sum_{l=0}^{\infty} [1+l(l+1)]^{-1-q} \left( \frac{2l+1}{2\pi} \right)^{1/2} \left[ 1 - P_{l}(\cos \gamma) \right]^{1/2} \left\{ \sum_{m=-l}^{l} |\beta_{l}^{m}|^{2} \right\}^{1/2} ,$$

and so by Eq. (77) and the Schwarz inequality,

$$|\phi (\textbf{x}) - \phi (\textbf{y})|\leq\frac{1}{\sqrt{2\pi}} \left\{ \sum_{l=0}^{\infty} [1+l(l+1)]^{-2-2q} (2l+1) [1 - P_{l}(\cos \gamma)]\right\}^{1/2} \|\phi\|_{1 + q}\,.$$

Now, \(P_{l}(1)=1\), \(P_{l}^{\prime}(1) = l(l+1)/2\), and \(P_{l}^{\prime \prime}(1) = [l(l+1)-2]P_{l}^{\prime}(1)/4 \geq 0\) for \(l \geq 0\). It follows that for γ sufficiently small,

$$1 - P_{l}(\cos\gamma)\leq(1 - \cos\gamma) P_{l}^{\prime}(1) = l(l+1) \sin^{2} \tfrac{\gamma}{2}\,,$$

and so

$$|\phi (\textbf{x}) - \phi (\textbf{y})| \leq \frac{K}{\sqrt{2\pi}} \|\phi\|_{1 + q} \left|\sin \frac{\gamma (\textbf{x},\textbf{y})}{2}\right|\,,$$
((78))

where

$$K^{2} = \sum_{l=0}^{\infty} [1+l(l+1)]^{-2-2q} (2l+1) l(l+1)\,.$$

This series converges for \(q > 0\) since the terms decay like \(l^{-1-4q}\), and Eq. (78) shows that ϕ is Lipschitz continuous.
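The stated decay rate is easy to confirm numerically: the ratio of the general term of the series for \(K^{2}\) to \(2l^{-1-4q}\) should tend to 1 as l grows. A small sketch (plain Python; the values of q and the index points are arbitrary choices):

```python
# General term of the series for K^2 in Eq. (78):
#   a_l = [1 + l(l+1)]^(-2-2q) (2l+1) l(l+1)  ~  2 l^(-1-4q)   as l -> infinity,
# so the series converges for q > 0 (at q = 0 it would diverge like the
# harmonic series).
def a(l, q):
    return (1 + l * (l + 1)) ** (-2 - 2 * q) * (2 * l + 1) * l * (l + 1)

for q in (0.1, 0.5, 1.0):
    ratios = [a(l, q) / (2.0 * l ** (-1 - 4 * q)) for l in (10, 100, 1000, 10_000)]
    assert abs(ratios[-1] - 1.0) < 1e-3    # ratio tends to 1 for large l
```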

Appendix 3: Some Basic Concepts and Definitions

This appendix summarizes background material used elsewhere in this chapter. For further treatment see, for instance, Doob (1953), Royden (1968), and Reed and Simon (1972).

3.1 Measure Spaces

Let X be a set. A collection \({\mathcal C}\) of subsets of X is called a σ-algebra, or Borel field, if (i) the empty set ∅ is in \({\mathcal C}\), (ii) for every set \(A \in {\mathcal C}\), the complement \(\tilde{A} = \{x \in X: x \not\in A\}\) of A is in \({\mathcal C}\), and (iii) for every countable collection \(\{A_{i}\}_{i=1}^{\infty}\) of sets \(A_{i} \in {\mathcal C}\), the union \(\cup_{i=1}^{\infty} A_{i}\) of the sets is in \({\mathcal C}\). Given any collection \({\mathcal A}\) of subsets of X, there is a smallest σ-algebra which contains \({\mathcal A}\), i.e., there is a σ-algebra \({\mathcal C}\) such that (i) \({\mathcal A} \subset {\mathcal C}\), and (ii) if \({\mathcal B}\) is a σ-algebra and \({\mathcal A} \subset {\mathcal B}\) then \({\mathcal C} \subset {\mathcal B}\). The smallest σ-algebra containing a given collection \({\mathcal A}\) of subsets of X is called the Borel field of X generated by \({\mathcal A}\). A measurable space is a couple \((X,{\mathcal C})\) consisting of a set X and a σ-algebra \({\mathcal C}\) of subsets of X. If \((X,{\mathcal C})\) is a measurable space and \(Y \in {\mathcal C}\), then \((Y,{\mathcal C}_{Y})\) is a measurable space, where

$${\mathcal C}_{Y} = \{A\in{\mathcal C}: A\subset Y\}\,,$$

i.e., \({\mathcal C}_{Y}\) consists of all the sets in \({\mathcal C}\) that are subsets of Y.

The set \({\mathbb{R}}^{e}\) of extended real numbers is the union of the set \(\mathbb{R}\) of real numbers and the sets \(\{\infty\}\) and \(\{-\infty\}\). Multiplication of any two extended real numbers is defined as usual, with the convention that \(0 \cdot \infty = 0\). Addition and subtraction of any two extended real numbers is also defined, except that \(\infty - \infty\) is undefined, as usual.

Let Y and Z be two sets. A function g is called a map from Y into Z, written \(g: Y \rightarrow Z\), if \(g(y)\) is defined for all \(y \in Y\) and \(g(y) \in Z\) for all \(y \in Y\). Thus a map \(g: \mathbb{R} \rightarrow \mathbb{R}\) is a real-valued function defined on all of the real line, a map \(g: Y \rightarrow \mathbb{R}\) is a real-valued function defined on all of Y, and a map \(g: Y \rightarrow {\mathbb{R}}^{e}\) is an extended real-valued function defined on all of Y.

Let \((X,{\mathcal C})\) be a measurable space. A subset A of X is called measurable if \(A \in {\mathcal C}\). A map \(g: X \rightarrow {\mathbb{R}}^{e}\) is called measurable (with respect to \({\mathcal C}\)) if

$$\{x\in X: g(x)\leq\alpha\}\in{\mathcal C}\,,$$

for every \(\alpha \in \mathbb{R}\). If \(g: X \rightarrow {\mathbb{R}}^{e}\) is measurable then \(|g|\) is measurable, and if \(h: X \rightarrow {\mathbb{R}}^{e}\) is another measurable map then gh is measurable. A measure μ on \((X,{\mathcal C})\) is a map \(\mu: {\mathcal C} \rightarrow {\mathbb{R}}^{e}\) that satisfies (i) \(\mu(A) \geq 0\) for every measurable set A, (ii) \(\mu(\emptyset) = 0\), and (iii)

$$\mu\left( \bigcup_{i=1}^{\infty} E_{i} \right) = \sum_{i=1}^{\infty}\mu(E_{i})\,,$$

for every countable collection \(\{E_{i}\}_{i=1}^{\infty}\) of disjoint measurable sets, i.e., for every countable collection of sets \(E_{i} \in {\mathcal C}\) with \(E_{i} \cap E_{j} = \emptyset\) whenever \(i \not= j\). A measure space \((X,{\mathcal C},\mu)\) is a measurable space \((X,{\mathcal C})\) together with a measure μ on \((X,{\mathcal C})\).

Let \((X,{\mathcal C},\mu)\) be a measure space. A condition \(C(x)\) defined for all \(x \in X\) is said to hold almost everywhere (a.e.) (with respect to μ) if the set \(E = \{x \in X: C(x)\ \textrm{is\ false}\}\) on which it fails to hold is a measurable set of measure zero, i.e., \(E \in {\mathcal C}\) and \(\mu(E) = 0\). In particular, two maps \(g: X \rightarrow {\mathbb{R}}^{e}\) and \(h: X \rightarrow {\mathbb{R}}^{e}\) are said to be equal almost everywhere, written \(g = h\) a.e., if the subset of X on which they are not equal is a measurable set of measure zero.

A measure space \((X,{\mathcal C},\mu)\) is called complete if \({\mathcal C}\) contains all subsets of measurable sets of measure zero, i.e., if \(B \in {\mathcal C}\), \(\mu(B) = 0\), and \(A \subset B\) together imply that \(A \in {\mathcal C}\). If \((X,{\mathcal C},\mu)\) is a complete measure space and A is a subset of a measurable set of measure zero, then \(\mu(A) = 0\). If \((X,{\mathcal C},\mu)\) is a measure space then there is a complete measure space \((X,{\mathcal C}_{0},\mu_{0})\), called the completion of \((X,{\mathcal C},\mu)\), which is determined uniquely by the conditions that (i) \({\mathcal C} \subset {\mathcal C}_{0}\), (ii) if \(D \in {\mathcal C}\) then \(\mu(D) = \mu_{0}(D)\), and (iii) \(D \in {\mathcal C}_{0}\) if and only if \(D = A \cup B\) where \(B \in {\mathcal C}\) and \(A \subset C \in {\mathcal C}\) with \(\mu(C) = 0\). Thus a measure space can always be completed by enlarging its σ-algebra to include the subsets of measurable sets of measure zero and extending its measure so that the domain of definition of the extended measure includes the enlarged σ-algebra.

An open interval on the real number line \(\mathbb{R}\) is a set \((\alpha,\beta) = \{x \in \mathbb{R}: \alpha < x < \beta\}\) with \(\alpha,\beta \in {\mathbb{R}}^{e}\) and \(\alpha < \beta\). Denote by \({\mathcal B}(\mathbb{R})\) the Borel field of \(\mathbb{R}\) generated by the open intervals, and denote by \({\mathcal I}(\mathbb{R}) \subset {\mathcal B}({\mathbb{R}})\) the sets that are countable unions of disjoint open intervals. For each set \(I = \cup_{i=1}^{\infty}(\alpha_{i},\beta_{i}) \in {\mathcal I}({\mathbb{R}})\), define

$$m^{\ast}(I) = \sum_{i=1}^{\infty} (\beta_{i} - \alpha_{i})\,,$$

and for each set \(B \in {\mathcal B}({\mathbb{R}})\) define

$$m^{\ast}(B) = \inf\,m^{\ast}(I)\,,$$

where the infimum (greatest lower bound) is taken over all those \(I \in {\mathcal I}({\mathbb{R}})\) such that \(B \subset I\). Then \(m^{\ast}\) is a measure on the measurable space \((\mathbb{R},{\mathcal B}(\mathbb{R}))\). The completion of the measure space \((\mathbb{R},{\mathcal B}(\mathbb{R}),m^{\ast})\) is denoted by \((\mathbb{R},{\mathcal M},m)\). The sets in \({\mathcal M}\) are called the Lebesgue measurable sets on \(\mathbb{R}\), and m is called Lebesgue measure on \(\mathbb{R}\).

Let \((X,{\mathcal C},\mu)\) be a complete measure space, and let \(g: X \rightarrow {\mathbb{R}}^{e}\) and \(h: X \rightarrow {\mathbb{R}}^{e}\) be two maps. If g is measurable and \(g = h\) a.e., then h is measurable.

3.2 Integration

In this subsection let \((X,{\mathcal C},\mu)\) be a measure space. The characteristic function \(\chi_{A}\) of a subset A of X is the map \(\chi_{A} : X \rightarrow \{0,1\}\) defined for each \(x \in X\) by

$$\chi_{A}(x) = \left\{\begin{array}{ll} 1 & \textrm{if}\ x \in A \\ 0 & \textrm{if}\ x \not\in A \end{array} \right. . $$

A characteristic function \(\chi_{A}\) is a measurable map if, and only if, A is a measurable set. A map \(\phi : X \rightarrow {\mathbb{R}}^{e}\) is called simple if it is measurable and takes on only a finite number of values. Thus the characteristic function of a measurable set is simple, and if ϕ is simple and takes on the values \(\alpha_{1},\ldots,\alpha_{n}\) then \(\phi = \sum_{i=1}^{n} \alpha_{i}\chi_{E_{i}}\), where \(E_{i} = \{x \in X: \phi(x) = \alpha_{i}\} \in {\mathcal C}\) for \(i=1,\ldots,n\). If ϕ is simple and the values \(\alpha_{1},\ldots,\alpha_{n}\) it takes on are all non-negative, the integral of ϕ over a measurable set E with respect to measure μ is defined as

$$\int_{E}\phi\,\textrm{d}\mu = \sum_{i=1}^{n}\alpha_{i}\mu(E_{i} \cap E)\,,$$

where \(E_{i} = \{x \in X: \phi(x) = \alpha_{i}\}\) for \(i=1,\ldots,n\). It is possible that \(\int_{E} \phi\,\textrm{d} \mu = \infty\), for instance if \(\alpha_1 \not= 0\) and \(\mu(E_1 \cap E) = \infty\), or if \(\alpha_1 = \infty\) and \(\mu(E_1 \cap E) \not= 0\).
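As a toy illustration of this definition (the finite set X, the counting measure, and the map φ below are hypothetical choices, not from the text), the sum \(\sum_{i} \alpha_{i}\mu(E_{i} \cap E)\) can be evaluated directly and compared with the pointwise sum over E:

```python
# Toy measure space: X = {0,...,9}, sigma-algebra = all subsets of X, and
# mu = counting measure (mu(A) = number of points of A).  phi(x) = x mod 3
# is a simple map with values {0, 1, 2}, and E = {4,...,9} is measurable.
X = set(range(10))
mu = len                      # counting measure on subsets of X
phi = lambda x: x % 3
E = set(range(4, 10))

# int_E phi dmu = sum_i alpha_i * mu(E_i ∩ E),  E_i = {x : phi(x) = alpha_i}
values = sorted({phi(x) for x in X})
integral = sum(a * mu({x for x in X if phi(x) == a} & E) for a in values)
assert integral == sum(phi(x) for x in E) == 6
```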

Let E be a measurable set and let \(g: X \rightarrow {\mathbb{R}}^{e}\) be a map which is non-negative, i.e., \(g(x) \geq 0\) for all \(x \in X\). If g is measurable, the integral of g over E with respect to μ is defined as

$$\int_{E} g\,\textrm{d}\mu = \sup \int_{E}\phi\,\textrm{d}\mu\,,$$

where the supremum (least upper bound) is taken over all simple maps ϕ with \(0 \leq \phi \leq g\). The map g is called integrable (over E, with respect to μ) if g is measurable and

$$\int_{E} g\,\textrm{d}\mu < \infty\,.$$

If \(\{h_{i}\}_{i=1}^{\infty}\) is a collection of non-negative measurable maps from X into \({\mathbb{R}}^{e}\), then \(h = \sum_{i=1}^{\infty} h_{i}\) is a non-negative measurable map from X into \({\mathbb{R}}^{e}\) and

$$\int_{E} h\,\textrm{d}\mu= \sum_{i=1}^{\infty} \int_{E} h_{i}\,\textrm{d}\mu\,,$$

and in particular, h is integrable if and only if \(\sum_{i=1}^{\infty} \int_{E} h_{i}\,\textrm{d}\mu < \infty\).

Let E be a measurable set and let \(g: X \rightarrow {\mathbb{R}}^{e}\) be a map. The positive part \(g^{+}\) of g is the non-negative map \(g^{+} = g \vee 0\), i.e., \(g^{+}(x) = \max \{g(x),0\}\) for each \(x \in X\), and the negative part \(g^{-}\) is the non-negative map \(g^{-} = (-g) \vee 0\). Thus \(g = g^{+} - g^{-}\) and \(|g| = g^{+} + g^{-}\). If g is measurable, so are \(g^{+}\) and \(g^{-}\), as well as \(|g|\). The map g is called integrable (over E, with respect to μ) if both \(g^{+}\) and \(g^{-}\) are integrable, in which case the integral of g is defined as

$$\int_{E} g\,\textrm{d}\mu = \int_{E} g^{+}\,\textrm{d}\mu- \int_{E} g^{-}\,\textrm{d}\mu\,.$$

Thus g is integrable over E if, and only if, \(|g|\) is integrable over E, in which case

$$\left| \int_{E} g\,\textrm{d}\mu\right| \leq\int_{E} |g|\,\textrm{d}\mu< \infty\,.$$

If g is integrable over X, then \(|g| < \infty\) a.e., g is integrable over E, and

$$\int_{E} |g|\,\textrm{d}\mu\leq\int_{X} |g| \,\textrm{d}\mu < \infty\,.$$

If g is measurable, then

$$\int_{X} |g|\,\textrm{d}\mu= 0$$

if, and only if, \(g = 0\) a.e.

Let E be a measurable set and let \(g: X \rightarrow {\mathbb{R}}^{e}\) and \(h: X \rightarrow {\mathbb{R}}^{e}\) be two maps. If \(g^2\) and \(h^2\) are integrable over E then gh is integrable over E, and

$$\left| \int_{E} gh\,\textrm{d}\mu \right| \leq \int_{E} |gh|\,\textrm{d}\mu \leq \left( \int_{E} g^2\,\textrm{d}\mu \right)^{1/2} \left( \int_{E} h^2\,\textrm{d}\mu \right)^{1/2} < \infty\,.$$
((79))

If g and h are integrable over E and \(g=h\) a.e., then

$$\int_{E} g\,\textrm{d}\mu= \int_{E} h\,\textrm{d}\mu\,.$$

If the measure space is complete, and if g is integrable over E and \(g=h\) a.e., then h is integrable over E and

$$\int_{E} g\,\textrm{d}\mu= \int_{E} h\,\textrm{d}\mu\,.$$

Now consider the complete measure space \((\mathbb{R},{\mathcal M},m)\), where \({\mathcal M}\) is the σ-algebra of Lebesgue measurable sets on \(\mathbb{R}\) and m is Lebesgue measure on \(\mathbb{R}\). If \(g: \mathbb{R} \rightarrow {\mathbb{R}}^{e}\) is measurable with respect to \({\mathcal M}\), and is either non-negative or integrable over \(\mathbb{R}\) with respect to m, the integral of g over a Lebesgue measurable set E is called the Lebesgue integral of g over E, and is often written as

$$\int_{E} g\,\textrm{d}m = \int_{E} g(x)\,\textrm{d}x \,.$$

A Borel measure on \(\mathbb{R}\) is a measure defined on the Lebesgue measurable sets \({\mathcal M}\) that is finite for bounded sets. If F is a monotone increasing function on \(\mathbb{R}\) that is continuous on the right, i.e., if \(F(\beta) \geq F(\alpha)\) and \(\lim_{\beta \rightarrow \alpha^{+}}F(\beta) = F(\alpha)\) for all \(\alpha,\beta \in \mathbb{R}\) with \(\alpha < \beta\), then there exists a unique Borel measure μ on \(\mathbb{R}\) such that

$$\mu((\alpha,\beta]) = F(\beta) - F(\alpha)$$

for all \(\alpha,\beta \in \mathbb{R}\) with \(\alpha < \beta\), where \((\alpha,\beta] = \{x \in \mathbb{R}: \alpha < x \leq \beta\}\). Let F be a monotone increasing function that is continuous on the right, and let μ be the corresponding Borel measure. If \(g: \mathbb{R} \rightarrow {\mathbb{R}}^{e}\) is measurable with respect to \({\mathcal M}\), and is either non-negative or integrable over \(\mathbb{R}\) with respect to the Borel measure μ, the Lebesgue-Stieltjes integral of g over a Lebesgue measurable set E is defined as

$$\int_{E} g(x)\,\textrm{d}F(x) = \int_{E} g\,\textrm{d}\mu\,.$$

3.3 Probability

A probability space is a measure space \((\varOmega,{\mathcal F},P)\) with \(P(\varOmega) = 1\). The set Ω is called the sample space, the σ-algebra \({\mathcal F}\) of measurable sets is called the event space, a measurable set is called an event, and P is called the probability measure. For the rest of this subsection, let \((\varOmega,{\mathcal F},P)\) be a probability space.

A measurable map from Ω into \({\mathbb{R}}^{e}\) is called a (scalar) random variable. Thus a map \(s: \varOmega \rightarrow {\mathbb{R}}^{e}\) is a random variable if, and only if,

$$\{\omega\in\varOmega: s(\omega)\leq x\} \in{\mathcal F}$$

for every \(x \in \mathbb{R}\). In particular, if s is a random variable then the function

$$F_{s}(x) = P(\{\omega\in\varOmega: s(\omega)\leq x\})\,,$$

called the probability distribution function of s, is defined for all \(x \in \mathbb{R}\). The distribution function of a random variable is monotone increasing and continuous on the right. If the distribution function \(F_{s}\) of a random variable s is an indefinite integral, i.e., if

$$F_{s}(x) = \int_{-\infty}^{x} f_{s}(y)\,\textrm{d}y$$

for all \(x \in \mathbb{R}\) and some Lebesgue integrable function \(f_{s}\), then \(f_{s}\) is called the probability density function of s, and \(\textrm{d}F_{s}/\textrm{d} x = f_{s}\) a.e. (with respect to Lebesgue measure) in \(\mathbb{R}\).
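As a concrete example (the particular density below is a hypothetical choice, not from the text), the standard exponential density \(f_{s}(y) = e^{-y}\) for \(y \geq 0\), with \(f_{s}(y) = 0\) for \(y < 0\), has distribution function \(F_{s}(x) = 1 - e^{-x}\) for \(x \geq 0\); the indefinite-integral relation can be confirmed by simple quadrature:

```python
import math

# Hypothetical example: exponential density f(y) = e^{-y} for y >= 0.
# Its distribution function is F(x) = 1 - e^{-x} for x >= 0; check that
# F(x) = int_{-inf}^{x} f(y) dy via midpoint-rule quadrature (the density
# vanishes for y < 0, so the integration starts at 0).
f = lambda y: math.exp(-y) if y >= 0 else 0.0

def F_numeric(x, n=100_000):
    h = x / n
    return h * sum(f((i + 0.5) * h) for i in range(n))

for x in (0.5, 1.0, 3.0):
    assert abs(F_numeric(x) - (1 - math.exp(-x))) < 1e-8
```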

The expectation operator \({\mathcal E}\) is the integration operator over Ω with respect to probability measure. Thus if s is a random variable then \({\mathcal E}|s|\) is defined, since \(|s|\) is a random variable and \(|s| \geq 0\), and

$${\mathcal E}|s| = \int_{\varOmega} |s|\,\textrm{d}P\leq\infty\,,$$

while a random variable s is integrable over Ω if, and only if, \({\mathcal E}|s| < \infty\), in which case

$${\mathcal E}s = \int_{\varOmega} s\,\textrm{d}P$$

and \(\left| {\mathcal E}s \right| \leq {\mathcal E}|s| < \infty\). If s is a random variable with \({\mathcal E}|s| < \infty\), then \(\overline{s} = {\mathcal E}s\) is called the mean of s, and the mean can be evaluated equivalently as the Lebesgue-Stieltjes integral

$${\mathcal E}s = \int_{-\infty}^{\infty} x\,\textrm{d}F_{s}(x)\,,$$

where \(F_{s}\) is the distribution function of s, hence

$${\mathcal E}s = \int_{-\infty}^{\infty} x f_{s}(x)\,\textrm{d}x$$

if also s has a density function \(f_{s}\), where the integral is the Lebesgue integral.

If s is a random variable then \({\mathcal E}s^{2}\) is defined, since \(s^{2} \geq 0\) is a random variable, and either \({\mathcal E}s^{2} = \infty\) or \({\mathcal E}s^{2} < \infty\). A random variable s is called second-order if \({\mathcal E}s^{2} < \infty\). If r and s are random variables then \({\mathcal E}|rs|\) is defined since rs is a random variable, and \({\mathcal E}|rs| \leq \infty\). If r and s are second-order random variables, then

$${\mathcal E}\left|rs\right|\leq\left({\mathcal E}r^{2}\right)^{1/2}\left({\mathcal E}s^{2}\right)^{1/2} < \infty$$

by Eq. (79), hence \({\mathcal E}rs\) is defined and \(\left|{\mathcal E}rs\right| \leq {\mathcal E}\left|rs\right| < \infty\). In particular, on taking \(r = 1\) and using the fact that

$${\mathcal E}1 = \int_{\varOmega} 1\,\textrm{d}P = P(\varOmega) = 1\,,$$

it follows that if s is a second-order random variable then its mean \(\overline{s} = {\mathcal E}s\) is defined, with

$$0 \leq\left|\overline{s}\right| = \left|{\mathcal E}s\right| \leq{\mathcal E}\left|s\right| \leq \left({\mathcal E}s^{2}\right)^{1/2} < \infty\,.$$

The variance \(\sigma^{2} = {\mathcal E}(s-\overline{s})^{2}\) of a second-order random variable s is therefore also defined, and finite, with

$$0 \leq\sigma^{2} = {\mathcal E}\left(s^{2} - 2\overline{s}s + {\overline{s}}^{2}\right) = {\mathcal E}s^{2} - {\overline{s}}^{2} < \infty\,,$$

and

$${\mathcal E}s^{2} = \overline{s}^{2} + \sigma^2\,.$$
((80))
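Eq. (80) can be verified exactly for a discrete random variable on a finite sample space; the distribution below is a hypothetical example, and exact rational arithmetic avoids any rounding:

```python
from fractions import Fraction as Frac

# A discrete second-order random variable: s takes the value v with
# probability p, over a finite sample space.
dist = [(Frac(-1), Frac(1, 4)), (Frac(0), Frac(1, 4)), (Frac(2), Frac(1, 2))]
assert sum(p for _, p in dist) == 1              # P(Omega) = 1

mean = sum(v * p for v, p in dist)               # sbar = E s
second = sum(v ** 2 * p for v, p in dist)        # E s^2
var = sum((v - mean) ** 2 * p for v, p in dist)  # sigma^2 = E (s - sbar)^2
assert second == mean ** 2 + var                 # Eq. (80), exactly
```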

A condition \(C(\omega)\) defined for all \(\omega \in \varOmega\) is said to hold with probability one (wp1), or almost surely (a.s.), if it holds a.e. with respect to probability measure. Thus if s is a random variable, then \(s=0\) wp1 if, and only if, \({\mathcal E}|s| = 0\). If s is a random variable with \({\mathcal E}|s| < \infty\), i.e., if the mean of s is defined, then \(|s| < \infty\) wp1. If r and s are two random variables with \({\mathcal E}|r| < \infty\) and \({\mathcal E}|s| < \infty\), and if \(r = s\) wp1, then r and s have the same distribution function and, in particular, \({\mathcal E}r = {\mathcal E}s\). If the probability space is complete, and if s is a random variable, \(r: \varOmega \rightarrow {\mathbb{R}}^{e}\) and \(r = s\) wp1, then r is a random variable and has the same distribution function as s, and if, in addition, \({\mathcal E}|s| < \infty\), then \({\mathcal E}|r| < \infty\) and \({\mathcal E}r = {\mathcal E}s\).

3.4 Hilbert Space

A non-empty set V is called a linear space or vector space (over the reals) if \(\alpha \textbf{g} + \beta \textbf{h} \in V\) for all \(\textbf{g}, \textbf{h} \in V\) and \(\alpha, \beta \in \mathbb{R}\). A norm on a linear space V is a real-valued function \(\|\cdot\|\) such that, for all \(\textbf{g}, \textbf{h} \in V\) and \(\alpha \in \mathbb{R}\), (i) \(\|\textbf{h}\| \geq 0\), (ii) \(\|\textbf{h}\| = 0\) if, and only if, \(\textbf{h} = \textbf{0}\), (iii) \(\|\alpha \textbf{h}\| = |\alpha|\,\|\textbf{h}\|\), and (iv) \(\|\textbf{g} + \textbf{h}\| \leq \|\textbf{g}\| + \|\textbf{h}\|\). An inner product on a linear space V is a real-valued function \((\cdot,\cdot)\) such that, for all \(\textbf{f}, \textbf{g}, \textbf{h} \in V\) and \(\alpha \in \mathbb{R}\), (i) \((\textbf{h},\textbf{h}) \geq 0\), (ii) \((\textbf{h},\textbf{h}) = 0\) if, and only if, \(\textbf{h} = \textbf{0}\), (iii) \((\textbf{g},\alpha\textbf{h}) = \alpha (\textbf{g},\textbf{h})\), (iv) \((\textbf{f},\textbf{g}+\textbf{h}) = (\textbf{f},\textbf{g})+(\textbf{f},\textbf{h})\), and (v) \((\textbf{g},\textbf{h}) = (\textbf{h},\textbf{g})\). A normed linear space is a linear space equipped with a norm, and an inner product space is a linear space equipped with an inner product. Every inner product space V is a normed linear space, with norm \(\|\cdot\|\) given by \(\|\textbf{h}\| = (\textbf{h},\textbf{h})^{1/2}\) for all \(\textbf{h} \in V\), where \((\cdot,\cdot)\) is the inner product on V. A normed linear space V is an inner product space if, and only if, its norm \(\|\cdot\|\) satisfies the parallelogram law

$$||\textbf{g}+\textbf{h}||^{2} + ||\textbf{g}-\textbf{h}||^{2} = 2(||\textbf{g}||^{2}+||\textbf{h}||^{2})\,,$$

for all \(\textbf{g}, \textbf{h} \in V\). On every inner product space V, the inner product \((\cdot,\cdot)\) is given by the polarization identity

$$(\textbf{g},\textbf{h}) = \tfrac{1}{4}(||\textbf{g}+\textbf{h}||^{2} - ||\textbf{g}-\textbf{h}||^{2})\,,$$

for all \(\textbf{g}, \textbf{h} \in V\), where \(\|\cdot\|\) is the norm corresponding to the inner product, i.e., \(\|\textbf{h}\| = (\textbf{h},\textbf{h})^{1/2}\) for all \(\textbf{h} \in V\). The Schwarz inequality

$$|(\textbf{g},\textbf{h})|\leq||\textbf{g}||\,||\textbf{h}|| < \infty\,,$$

for all \(\textbf{g}, \textbf{h} \in V\), holds on every inner product space V, where \((\cdot,\cdot)\) is the inner product on V and \(\|\cdot\|\) is the corresponding norm.
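The parallelogram law, the polarization identity, and the Schwarz inequality are all mechanical to check in the finite-dimensional inner product space \({\mathbb R}^{4}\) with the dot product; the sketch below (plain Python, random test vectors with a fixed seed) is a numerical illustration only:

```python
import math, random

# Check the parallelogram law, the polarization identity, and the Schwarz
# inequality in R^4 with the dot product, for random vectors.
random.seed(0)
dot = lambda g, h: sum(gi * hi for gi, hi in zip(g, h))
norm = lambda h: math.sqrt(dot(h, h))

for _ in range(100):
    g = [random.uniform(-1, 1) for _ in range(4)]
    h = [random.uniform(-1, 1) for _ in range(4)]
    gp = [gi + hi for gi, hi in zip(g, h)]   # g + h
    gm = [gi - hi for gi, hi in zip(g, h)]   # g - h
    # parallelogram law
    assert abs(norm(gp)**2 + norm(gm)**2 - 2 * (norm(g)**2 + norm(h)**2)) < 1e-12
    # polarization identity
    assert abs(dot(g, h) - 0.25 * (norm(gp)**2 - norm(gm)**2)) < 1e-12
    # Schwarz inequality (small slack for rounding)
    assert abs(dot(g, h)) <= norm(g) * norm(h) + 1e-15
```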

A subset O of a normed linear space V is called open in V if for every \(\textbf{g} \in O\), there exists an \(\varepsilon > 0\) such that if \(\textbf{h} \in V\) and \(\|\textbf{g}-\textbf{h}\| < \varepsilon\) then \(\textbf{h} \in O\). A subset B of a normed linear space V is called dense in V if for every \(\textbf{h} \in V\) and \(\varepsilon > 0\), there exists an element \(\textbf{g} \in B\) such that \(\|\textbf{g}-\textbf{h}\| < \varepsilon\). A normed linear space is called separable if it has a dense subset that contains countably many elements.

A sequence of elements \(\textbf{h}_{1}, \textbf{h}_{2}, \ldots\) in a normed linear space V is called a Cauchy sequence if \(\|\textbf{h}_{m}-\textbf{h}_{n}\| \rightarrow 0\) as \(m,n \rightarrow \infty\). A sequence of elements \(\textbf{h}_{1}, \textbf{h}_{2}, \ldots\) in a normed linear space V is said to converge in V if there exists an element \(\textbf{h} \in V\) such that \(\|\textbf{h}-\textbf{h}_{n}\| \rightarrow 0\) as \(n \rightarrow \infty\), in which case one writes \(\textbf{h} = \lim_{n \rightarrow \infty} \textbf{h}_{n}\). A normed linear space V is called complete if every Cauchy sequence of elements in V converges in V. A complete normed linear space is called a Banach space. A Banach space on which the norm is defined by an inner product is called a Hilbert space. That is, a Hilbert space is an inner product space which is complete in the norm defined by the inner product.

Let \({\mathcal H}\) be a Hilbert space, with inner product \((\cdot,\cdot)\) and corresponding norm \(\|\cdot\|\). A subset S of \({\mathcal H}\) is called an orthogonal system if \(\textbf{h} \not= \textbf{0}\) for every \(\textbf{h} \in S\), and \((\textbf{g},\textbf{h}) = 0\) for every pair of distinct elements \(\textbf{g},\textbf{h} \in S\). An orthogonal system S is called an orthogonal basis (or complete orthogonal system) if no other orthogonal system contains S as a proper subset. An orthogonal basis S is called an orthonormal basis if \(\|\textbf{h}\| = 1\) for every \(\textbf{h} \in S\). There exists an orthonormal basis which has countably many elements if, and only if, \({\mathcal H}\) is separable. If \({\mathcal H}\) is a separable Hilbert space then every orthonormal basis for \({\mathcal H}\) has the same number of elements \(N \leq \infty\), and N is called the dimension of \({\mathcal H}\).

Let \({\mathcal H}\) be a separable Hilbert space, with inner product \((\cdot,\cdot)\), corresponding norm \(\|\cdot\|\), and orthonormal basis \(S = \{\textbf{h}_{i}\}_{i=1}^{N}\), \(N \leq \infty\). If \(\textbf{h} \in {\mathcal H}\) then the sequence of partial sums \(\sum_{i=1}^{n} (\textbf{h}_{i},\textbf{h}) \textbf{h}_{i}\) converges to \(\textbf{h}\), i.e.,

$$\lim_{n \rightarrow N} ||\textbf{h} - \sum_{i=1}^{n} (\textbf{h}_{i},\textbf{h}) \textbf{h}_{i}|| = 0\,,$$

and so every \(\textbf{h} \in {\mathcal H}\) has the representation

$$\textbf{h} = \sum_{i=1}^{N} (\textbf{h}_{i},\textbf{h}) \textbf{h}_{i}\,.$$

Furthermore,

$$(\textbf{g},\textbf{h}) = \sum_{i=1}^{N}(\textbf{h}_{i},\textbf{g})(\textbf{h}_{i},\textbf{h})\,,$$

for every \(\textbf{g},\textbf{h} \in {\mathcal H}\). Therefore, for every \(\textbf{h} \in {\mathcal H}\),

$$||\textbf{h}||^{2} = \sum_{i=1}^{N}(\textbf{h}_{i},\textbf{h})^{2}\,,$$

which is called Parseval’s relation.
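In the finite-dimensional case \({\mathcal H} = {\mathbb R}^{3}\), the basis representation and Parseval's relation can be illustrated directly; the orthonormal basis below comes from a rotation about the third coordinate axis, and the angle and test vector are arbitrary choices:

```python
import math

# An orthonormal basis of R^3: rotate the standard basis about the z-axis.
a = 0.7
basis = [( math.cos(a), math.sin(a), 0.0),
         (-math.sin(a), math.cos(a), 0.0),
         ( 0.0,         0.0,         1.0)]
dot = lambda g, h: sum(gi * hi for gi, hi in zip(g, h))

h = (1.5, -2.0, 0.25)

# basis representation: h = sum_i (h_i, h) h_i
recon = [sum(dot(b, h) * b[k] for b in basis) for k in range(3)]
assert all(abs(recon[k] - h[k]) < 1e-12 for k in range(3))

# Parseval's relation: ||h||^2 = sum_i (h_i, h)^2
assert abs(dot(h, h) - sum(dot(b, h) ** 2 for b in basis)) < 1e-12
```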

An example of a separable Hilbert space of dimension \(N \leq \infty\) is the space \(\ell^{2}_{N}\) of square-summable sequences of N real numbers, with inner product \((\textbf{g},\textbf{h}) = \sum_{i=1}^{N} g_{i}h_{i}\), where \(g_{i}\) and \(h_{i}\) denote element i of \(\textbf{g} \in \ell^{2}_{N}\) and \(\textbf{h} \in \ell^{2}_{N}\), respectively. An orthonormal basis for \(\ell^{2}_{N}\) is the set of unit vectors \(\{\textbf{e}_{j}\}_{j=1}^{N}\), where element i of \(\textbf{e}_{j}\) is 1 if \(i=j\) and 0 if \(i \not= j\). In case \(N < \infty\), the elements of \(\ell^{2}_{N}\) are usually written as (column) N-vectors \(\textbf{g} = (g_{1},\ldots,g_{N})^{T}\), the inner product is then \((\textbf{g},\textbf{h}) = \textbf{g}^{T}\textbf{h}\), and the columns of the \(N \times N\) identity matrix constitute an orthonormal basis.

Let \((X,{\mathcal C},\mu)\) be a measure space. Denote by \({\mathcal L}^{1}(X,{\mathcal C},\mu)\) the set of integrable maps from X into \({\mathbb R}^{e}\), and consider the function \(\|\cdot\|\) defined for all \(g \in {\mathcal L}^{1}(X,{\mathcal C},\mu)\) by

$$\|g\| = \int_{X} |g|\,\textrm{d}\mu\,.$$

The set \({\mathcal L}^{1}(X,{\mathcal C},\mu)\) is a linear space, and the function \(\|\cdot\|\) is by definition real-valued, i.e., \(\|g\| < \infty\) for all \(g \in {\mathcal L}^{1}(X,{\mathcal C},\mu)\). The function \(\|\cdot\|\) also satisfies all of the properties of a norm, except that \(\|g\| = 0\) does not imply \(g=0\). However, \(\|g\| = 0\) does imply that \(g = 0\) a.e., and \(g = 0\) a.e. implies that \(\|g\| = 0\), for all \(g \in {\mathcal L}^{1}(X,{\mathcal C},\mu)\). Two maps g and h from X into \({\mathbb R}^{e}\) are called equivalent, or are said to belong to the same equivalence class, if \(g = h\) a.e. If g and h are equivalent, and if \(g,h \in {\mathcal L}^{1}(X,{\mathcal C},\mu)\), then \(\|g\| = \|h\|\). That is, \(\|\cdot\|\) assigns the same real number to each member of a given equivalence class of elements of \({\mathcal L}^{1}(X,{\mathcal C},\mu)\), and thereby the domain of definition of the function \(\|\cdot\|\) is extended from the elements of \({\mathcal L}^{1}(X,{\mathcal C},\mu)\) to the equivalence classes of elements of \({\mathcal L}^{1}(X,{\mathcal C},\mu)\). The set \(L^{1}(X,{\mathcal C},\mu)\) of equivalence classes of elements of \({\mathcal L}^{1}(X,{\mathcal C},\mu)\) is a linear space, and \(\|\cdot\|\) is a norm on this space. The Riesz-Fischer theorem states that \(L^{1}(X,{\mathcal C},\mu)\) is complete in this norm, i.e., that \(L^{1}(X,{\mathcal C},\mu)\) is a Banach space under the norm \(\|\cdot\|\). The elements of \(L^{1}(X,{\mathcal C},\mu)\), unlike those of \({\mathcal L}^{1}(X,{\mathcal C},\mu)\), are not defined pointwise in X, and therefore are not maps.

Denote by \({\mathcal L}^{2}(X,{\mathcal C},\mu)\) the set of square-integrable maps from X into \({\mathbb R}^{e}\), and consider the function \(\|\cdot\|\) defined for all \(g \in {\mathcal L}^{2}(X,{\mathcal C},\mu)\) by

$$\|g\| = \left( \int_{X} g^{2}\,\textrm{d}\mu\right)^{1/2}\,.$$

Again, the function \(\|\cdot\|\) assigns the same real number to each member of any given equivalence class of elements of \({\mathcal L}^{2}(X,{\mathcal C},\mu)\), i.e., to each \(g,h \in {\mathcal L}^{2}(X,{\mathcal C},\mu)\) such that \(g = h\) a.e., and in particular, \(\|g\| = 0\) if and only if \(g = 0\) a.e. Thus the domain of definition of the function \(\|\cdot\|\) can be extended to the equivalence classes. The set \(L^{2}(X,{\mathcal C},\mu)\) of equivalence classes of elements of \({\mathcal L}^{2}(X,{\mathcal C},\mu)\) is a linear space, \(\|\cdot\|\) is a norm on this space, and \(L^{2}(X,{\mathcal C},\mu)\) is complete in this norm. Therefore \(L^{2}(X,{\mathcal C},\mu)\) is a Banach space under the norm \(\|\cdot\|\). Moreover, this norm satisfies the parallelogram law, and therefore \(L^{2}(X,{\mathcal C},\mu)\) is a Hilbert space. The polarization identity yields the inner product \((\cdot,\cdot)\) on \(L^{2}(X,{\mathcal C},\mu)\), viz.,

$$(g,h) = \int_{X} gh\,\textrm{d}\mu\,,$$

for all \(g,h \in L^{2}(X,{\mathcal C},\mu)\). Again, the elements of \(L^{2}(X,{\mathcal C},\mu)\) are not defined pointwise and are not maps. The Schwarz inequality holds on \(L^{2}(X,{\mathcal C},\mu)\) since \(L^{2}(X,{\mathcal C},\mu)\) is an inner product space, and gives Eq. (79) when restricted to the elements of \({\mathcal L}^{2}(X,{\mathcal C},\mu)\).
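As a numerical illustration (a grid approximation, not the measure-theoretic construction itself), the \(L^{2}\) inner product on \([0,1]\) with Lebesgue measure can be approximated by quadrature, and the Schwarz inequality \(|(g,h)| \leq \|g\|\,\|h\|\) checked for two sample functions:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 10001)

def l2_inner(u, v):
    # Trapezoidal-rule approximation of (u, v) = \int_0^1 u v dx
    w = u * v
    return float(np.sum(0.5 * (w[:-1] + w[1:]) * np.diff(x)))

g = np.sin(2.0 * np.pi * x)
h = np.exp(-x)

ip = l2_inner(g, h)
norm_g = l2_inner(g, g) ** 0.5
norm_h = l2_inner(h, h) ** 0.5

# Schwarz inequality: |(g, h)| <= ||g|| ||h||
assert abs(ip) <= norm_g * norm_h
```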

Let \(V_{1}\) and \(V_{2}\) be two normed linear spaces, with norms \(\|\cdot\|_{1}\) and \(\|\cdot\|_{2}\), respectively, and let \({\mathcal H}\) be a Hilbert space, with inner product \((\cdot,\cdot)\). A bounded linear operator from \(V_{1}\) into \(V_{2}\) is a map \({\mathcal T}: V_{1} \rightarrow V_{2}\) such that (i) \({\mathcal T}(\alpha \textbf{g} + \beta \textbf{h}) = \alpha {\mathcal T} \textbf{g} + \beta {\mathcal T} \textbf{h}\) for all \(\textbf{g},\textbf{h} \in V_{1}\) and \(\alpha,\beta \in {\mathbb R}\), and (ii) there exists a constant \(\gamma \in {\mathbb R}\) such that \(\|{\mathcal T}\textbf{h}\|_{2} \leq \gamma \|\textbf{h}\|_{1}\) for all \(\textbf{h} \in V_{1}\). A bounded linear operator \({\mathcal T}:{\mathcal H} \rightarrow {\mathcal H}\) is called self-adjoint if \(({\mathcal T}\textbf{g},\textbf{h}) = (\textbf{g},{\mathcal T}\textbf{h})\) for all \(\textbf{g},\textbf{h} \in {\mathcal H}\), and is called positive semidefinite if \((\textbf{h},{\mathcal T}\textbf{h}) \geq 0\) for all \(\textbf{h} \in {\mathcal H}\).
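On a finite-dimensional real Hilbert space every linear operator is bounded and is represented by a matrix, and self-adjointness reduces to symmetry. A quick check (illustrative only) uses \(T = A^{T}A\), which is symmetric and positive semidefinite by construction:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
T = A.T @ A  # symmetric positive semidefinite by construction

g = rng.standard_normal(4)
h = rng.standard_normal(4)

# Self-adjoint: (Tg, h) = (g, Th)
assert np.isclose((T @ g) @ h, g @ (T @ h))

# Positive semidefinite: (h, Th) >= 0
assert h @ (T @ h) >= 0.0

# Boundedness: ||Th|| <= gamma ||h||, with gamma the spectral norm of T
gamma = np.linalg.norm(T, 2)
assert np.linalg.norm(T @ h) <= gamma * np.linalg.norm(h) + 1e-12
```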

At the beginning of this subsection, the field of scalars for linear spaces V was taken to be the real numbers, and inner products were therefore defined to be real-valued. Thus the Hilbert spaces defined here are real Hilbert spaces. It is also possible, of course, to define complex Hilbert spaces. One property that is lost by restricting attention in this chapter to real Hilbert spaces is that, while every positive semidefinite operator on a complex Hilbert space is self-adjoint, a positive semidefinite operator on a real Hilbert space need not be self-adjoint (e.g. Reed and Simon 1972, p. 195). Covariance operators on a real Hilbert space are necessarily self-adjoint as well as positive semidefinite, however, as discussed in Appendix 1c.
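The distinction can be seen already in \({\mathbb R}^{2}\). The matrix below (a standard textbook-style counterexample, not taken from this chapter) satisfies \((\textbf{h},{\mathcal T}\textbf{h}) = h_{1}^{2} + h_{2}^{2} \geq 0\) for every \(\textbf{h}\), yet it is not symmetric, hence not self-adjoint:

```python
import numpy as np

T = np.array([[1.0, 1.0],
              [-1.0, 1.0]])

# Not self-adjoint: T differs from its transpose
assert not np.allclose(T, T.T)

# Yet (h, Th) = h1^2 + h2^2 >= 0 for every h in R^2
rng = np.random.default_rng(2)
for _ in range(1000):
    h = rng.standard_normal(2)
    assert np.isclose(h @ (T @ h), h @ h)
    assert h @ (T @ h) >= 0.0
```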


© 2010 Springer-Verlag Berlin Heidelberg


Cohn, S.E. (2010). The Principle of Energetic Consistency in Data Assimilation. In: Lahoz, W., Khattatov, B., Menard, R. (eds) Data Assimilation. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-74703-1_7
