Abstract
Variational and Bayesian methods are two widely used classes of approaches for solving image denoising problems. In a Bayesian setting, these approaches correspond, respectively, to using maximum a posteriori estimators and posterior mean estimators for reconstructing images. In this paper, we propose novel theoretical connections between Hamilton–Jacobi partial differential equations (HJ PDEs) and a broad class of posterior mean estimators with quadratic data fidelity term and log-concave prior. Whereas solutions to some first-order HJ PDEs with initial data describe maximum a posteriori estimators, we show here that solutions to some viscous HJ PDEs with initial data describe a broad class of posterior mean estimators. We use these connections to establish representation formulas and various properties of posterior mean estimators. In particular, we use these connections to show that some Bayesian posterior mean estimators can be expressed as proximal mappings of smooth functions, and we derive representation formulas for these functions.



References
Alberti, G., Ambrosio, L., Cannarsa, P.: On the singularities of convex functions. Manuscripta Math. 76(3–4), 421–435 (1992)
Aubert, G., Kornprobst, P.: Mathematical Problems in Image Processing: Partial Differential Equations and the Calculus of Variations, vol. 147. Springer Science & Business Media, Berlin (2006)
Aubin, J.P., Cellina, A.: Differential Inclusions: Set-Valued Maps and Viability Theory, vol. 264. Springer Science & Business Media, Berlin (2012)
Banerjee, A., Guo, X., Wang, H.: On the optimality of conditional expectation as a Bregman predictor. IEEE Trans. Inform. Theory 51(7), 2664–2669 (2005)
Boncelet, C.: Chapter 7 - image noise models. In: Bovik, A. (ed.) The Essential Guide to Image Processing, pp. 143–167. Academic Press, Cambridge (2009)
Bouman, C., Sauer, K.: A generalized Gaussian image model for edge-preserving MAP estimation. IEEE Trans. Image Process. 2(3), 296–310 (1993)
Boyat, A.K., Joshi, B.K.: A review paper: noise models in digital image processing. Signal Image Process. Int. J. (SIPIJ) 6(2), 63–75 (2015)
Burger, M., Dong, Y., Sciacchitano, F.: Bregman cost for non-Gaussian noise (2016). arXiv preprint arXiv:1608.07483
Burger, M., Lucka, F.: Maximum a posteriori estimates in linear inverse problems with log-concave priors are proper Bayes estimators. Inverse Probl. 30(11), 114004 (2014)
Candès, E., Romberg, J., Tao, T.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory 52(2), 489–509 (2006)
Chambolle, A., Darbon, J.: On total variation minimization and surface evolution using parametric maximum flows. Int. J. Comput. Vis. 84(3), 288 (2009)
Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems. Numer. Math. 76(2), 167–188 (1997)
Chambolle, A., Pock, T.: An introduction to continuous optimization for imaging. Acta Numer. 25, 161–319 (2016)
Chan, T., Marquina, A., Mulet, P.: High-order total variation-based image restoration. SIAM J. Sci. Comput. 22(2), 503–516 (2000)
Chaudhari, P., Choromanska, A., Soatto, S., LeCun, Y., Baldassi, C., Borgs, C., Chayes, J., Sagun, L., Zecchina, R.: Entropy-SGD: biasing gradient descent into wide valleys. J. Stat. Mech. Theory Exp. 2019(12), 124018 (2019)
Chaudhari, P., Oberman, A., Osher, S., Soatto, S., Carlier, G.: Deep relaxation: partial differential equations for optimizing deep neural networks. Res. Math. Sci. 5(3), 30 (2018)
Darbon, J.: On convex finite-dimensional variational methods in imaging sciences and Hamilton-Jacobi equations. SIAM J. Imaging Sci. 8(4), 2268–2293 (2015)
Darbon, J., Meng, T.: On decomposition models in imaging sciences and multi-time Hamilton-Jacobi partial differential equations. SIAM J. Imaging Sci. 13(2), 971–1014 (2020)
Darbon, J., Sigelle, M.: Image restoration with discrete constrained total variation part I: Fast and exact optimization. J. Math. Imaging Vis. 26(3), 261–276 (2006)
Daubechies, I., Defrise, M., De Mol, C.: An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. Commun. Pure Appl. Math. 57(11), 1413–1457 (2004)
Demoment, G.: Image reconstruction and restoration: overview of common estimation structures and problems. IEEE Trans. Acoust. Speech Signal Process. 37(12), 2024–2036 (1989)
Deuschel, J.D., Stroock, D.: Large Deviations, Pure and Applied Mathematics, vol. 137. Academic Press, Cambridge (1989)
Dobson, D.C., Santosa, F.: Recovery of blocky images from noisy and blurred data. SIAM J. Appl. Math. 56(4), 1181–1198 (1996)
Donoho, D.: Compressed sensing. IEEE Trans. Inform. Theory 52(4), 1289–1306 (2006)
Durand, S., Malgouyres, F., Rougé, B.: Image deblurring, spectrum interpolation and application to satellite imaging. ESAIM Control Optim. Calc. Var. 5, 445–475 (2000)
Evans, L.: Partial Differential Equations. American Mathematical Society, Providence, RI (2010)
Falconer, K.J.: The Geometry of Fractal Sets. Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge (1985). https://doi.org/10.1017/CBO9780511623738
Federer, H.: Geometric Measure Theory. Springer, Berlin (1969)
Figueiredo, M.A., Nowak, R.D.: Wavelet-based image estimation: an empirical Bayes approach using Jeffrey’s noninformative prior. IEEE Trans. Image Process. 10(9), 1322–1331 (2001)
Folland, G.B.: Real Analysis: Modern Techniques and Their Applications. John Wiley & Sons, Hoboken (2013)
García Trillos, N., Kaplan, Z., Sanz-Alonso, D.: Variational characterizations of local entropy and heat regularization in deep learning. Entropy 21(5), 511 (2019)
Giusti, E.: Minimal Surfaces and Functions of Bounded Variation, Monographs in Mathematics, vol. 80. Birkhäuser Verlag, Basel (1984)
Gradshtein, I., Ryzhik, I., Jeffrey, A., Zwillinger, D.: Table of integrals, series and products. Academic Press, Cambridge (2007)
Gribonval, R.: Should penalized least squares regression be interpreted as maximum a posteriori estimation? IEEE Trans. Signal Process. 59(5), 2405–2410 (2011)
Gribonval, R., Machart, P.: Reconciling “priors” & “priors” without prejudice? In: Advances in Neural Information Processing Systems, pp. 2193–2201 (2013)
Gribonval, R., Nikolova, M.: A characterization of proximity operators. J. Math. Imaging Vis. 62, 773–789 (2020). https://doi.org/10.1007/s10851-020-00951-y
Gribonval, R., Nikolova, M.: On Bayesian estimation and proximity operators. Appl. Comput. Harmon. Anal. 50, 49–72 (2021). https://doi.org/10.1016/j.acha.2019.07.002
Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I: Fundamentals, Grundlehren Text Editions, vol. 305. Springer Science & Business Media, Berlin (1993)
Hiriart-Urruty, J.B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods, Grundlehren Text Editions, vol. 306. Springer Science & Business Media, Berlin (1993)
Hiriart-Urruty, J.B., Plazanet, P.H.: Moreau’s decomposition theorem revisited. In: Annales de l’Institut Henri Poincare (C) Non Linear Analysis, vol. 6, pp. 325–338. Elsevier (1989)
Hochbaum, D.S.: An efficient algorithm for image segmentation, Markov random fields and related problems. J. ACM (JACM) 48(4), 686–701 (2001)
Hörmander, L.: The Analysis of Linear Partial Differential Operators I: Distribution Theory and Fourier Analysis. Classics in Mathematics. Springer-Verlag, Berlin (2003). https://doi.org/10.1007/978-3-642-61497-2. Reprint of the second (1990) edition
Kay, S.M.: Fundamentals of Statistical Signal Processing. Prentice Hall PTR, Hoboken (1993)
Keener, R.W.: Theoretical Statistics: Topics for a Core Course. Springer, Berlin (2011)
Lang, R.: A note on the measurability of convex sets. Arch. Math. (Basel) 47(1), 90–92 (1986)
Leindler, L.: On a certain converse of Hölder's inequality. In: Linear Operators and Approximation (Lineare Operatoren und Approximation), pp. 182–184. Springer (1972)
Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
Louchet, C.: Modèles variationnels et bayésiens pour le débruitage d’images: de la variation totale vers les moyennes non-locales. Ph.D. thesis, Université René Descartes-Paris V (2008)
Louchet, C., Moisan, L.: Posterior expectation of the total variation model: properties and experiments. SIAM J. Imaging Sci. 6(4), 2640–2684 (2013)
Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bull. de la Société mathématique de France 93, 273–299 (1965)
Nikolova, M.: Weakly constrained minimization: application to the estimation of images and signals involving constant regions. J. Math. Imaging Vis. 21(2), 155–175 (2004)
Nikolova, M.: Model distortions in Bayesian MAP reconstruction. Inverse Probl. Imaging 1(2), 399 (2007)
Pereyra, M.: Revisiting maximum-a-posteriori estimation in log-concave models. SIAM J. Imaging Sci. 12(1), 650–670 (2019)
Pfeffer, W.F.: Divergence theorem for vector fields with singularities. In: New Integrals, pp. 150–166. Springer (1990)
Phillips, D.L.: A technique for the numerical solution of certain integral equations of the first kind. J. ACM (JACM) 9(1), 84–97 (1962)
Prékopa, A.: Logarithmic concave measures with application to stochastic programming. Acta Sci. Math. 32, 301–316 (1971)
Roberts, G.O., Rosenthal, J.S.: Optimal scaling for various Metropolis–Hastings algorithms. Stat. Sci. 16(4), 351–367 (2001)
Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (1970)
Rockafellar, R.T., Wets, R.J.B.: Variational Analysis, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 317. Springer-Verlag, Berlin (2009)
Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60(1–4), 259–268 (1992)
Rudin, W.: Real and Complex Analysis. Tata McGraw-Hill education, New York (2006)
Stuart, A.M.: Inverse problems: a Bayesian perspective. Acta Numer. 19, 451–559 (2010)
Tarantola, A.: Inverse Problem Theory and Methods for Model Parameter Estimation. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (2005)
Tikhonov, A.N., Goncharsky, A., Stepanov, V., Yagola, A.G.: Numerical Methods for the Solution of Ill-Posed Problems, vol. 328. Springer Science & Business Media, Berlin (1995)
Vidal, R., Bruna, J., Giryes, R., Soatto, S.: Mathematics of deep learning (2017). arXiv preprint arXiv:1712.04741
Vogel, C.R.: Computational Methods for Inverse Problems, Frontiers in Applied Mathematics, vol. 23. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA (2002)
Winkler, G.: Image Analysis, Random Fields and Markov Chain Monte Carlo Methods: a Mathematical Introduction, Applications of Mathematics (New York), vol. 27, 2nd edn. Springer-Verlag, Berlin (2003)
Woodford, O.J., Rother, C., Kolmogorov, V.: A global perspective on MAP inference for low-level vision. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 2319–2326. IEEE (2009)
Zhou, X.: On the Fenchel duality between strong convexity and Lipschitz continuous gradient (2018). arXiv preprint arXiv:1803.06573
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Research supported by NSF DMS 1820821. Authors’ names are given in last/family name alphabetical order.
Appendices
Proof of Proposition 3.1
We will use the following lemma, which characterizes the partition function (26) in terms of the solution to a Cauchy problem involving the heat equation with initial data \(J \in \Gamma _0({\mathbb {R}}^n)\), to prove parts (i) and (ii)(a)–(d) of Proposition 3.1.
Lemma A.1
(The heat equation with initial data in \(\Gamma _0({\mathbb {R}}^n)\)) Suppose the function \(J:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\cup \{+\infty \}\) satisfies assumptions (A1)–(A3).
-
(i)
For every \(\epsilon >0\), the function \(w_\epsilon :{\mathbb {R}}^n\times [0,+\infty )\rightarrow (0,1]\) defined by
$$\begin{aligned} w_{\epsilon }({\varvec{x}},t)&:=\frac{1}{(2\pi t \epsilon )^{n/2}}Z_J({\varvec{x}},t,\epsilon ) \nonumber \\&= \frac{1}{\left( 2\pi t\epsilon \right) ^{n/2}}\int _{{\mathbb {R}}^n}e^{-\left( \frac{1}{2t}\left\| {\varvec{x}}-{\varvec{y}}\right\| _{2}^{2}+J({\varvec{y}})\right) /\epsilon }d{\varvec{y}} \end{aligned}$$(55) is the unique smooth solution to the Cauchy problem
$$\begin{aligned} {\left\{ \begin{array}{ll} \frac{\partial w_{\epsilon }}{\partial t}({\varvec{x}},t)=\frac{\epsilon }{2}\Delta _{{\varvec{x}}}w_{\epsilon }({\varvec{x}},t) &{} \text{ in } {\mathbb {R}}^n\times (0,+\infty ),\\ w_{\epsilon }({\varvec{x}},0)=e^{-J({\varvec{x}})/\epsilon } &{} \text{ in } {\mathbb {R}}^n. \end{array}\right. } \end{aligned}$$(56) In addition, the domain of integration of the integral (55) can be taken to be \(\mathrm {dom}~J\) or, up to a set of Lebesgue measure zero, \(\mathrm {int}~(\mathrm {dom}~J)\) or \(\mathrm {dom}~\partial J\). Furthermore, for every \({\varvec{x}}\in {\mathbb {R}}^n\) and \(\epsilon > 0\), except possibly at the points \({\varvec{x}}\in (\mathrm {dom}~J)\setminus (\mathrm {int}~\mathrm {dom}~J)\) if such points exist, the pointwise limit of \(w_\epsilon ({\varvec{x}},t)\) as \(t \rightarrow 0\) exists and satisfies
$$\begin{aligned} \lim _{\begin{array}{c} t\rightarrow 0\\ t>0 \end{array}}w_{\epsilon }({\varvec{x}},t)=e^{-J({\varvec{x}})/\epsilon }, \end{aligned}$$with the limit equal to 0 whenever \({\varvec{x}}\notin \mathrm {dom}~J\).
-
(ii)
(Log-concavity and monotonicity properties).
-
(a)
The function \({\mathbb {R}}^n\times (0,+\infty )\ni ({\varvec{x}},t)\mapsto t^{n/2}w_{\epsilon }({\varvec{x}},t)\) is jointly log-concave.
-
(b)
The function \((0,+\infty )\ni t\mapsto t^{n/2}w_{\epsilon }({\varvec{x}},t)\) is strictly monotone increasing.
-
(c)
The function \((0,+\infty )\ni \epsilon \mapsto \epsilon ^{n/2}w_{\epsilon }({\varvec{x}},t)\) is strictly monotone increasing.
-
(d)
The function \({\mathbb {R}}^n\ni {\varvec{x}}\mapsto e^{\frac{1}{2t\epsilon }\left\| {{\varvec{x}}} \right\| _{2}^{2}}w_{\epsilon }({\varvec{x}},t)\) is strictly log-convex.
The proof of (i) follows from classical PDE arguments for the Cauchy problem (56) tailored to the initial data \(({\varvec{x}},\epsilon )\mapsto e^{-J({\varvec{x}})/\epsilon }\) with J satisfying assumptions (A1)–(A3), and the proof of the log-concavity and monotonicity properties (ii)(a)–(d) follows from the Prékopa–Leindler and Hölder inequalities [30, 46, 56]; we present the details below.
Proof
Proof of Lemma A.1(i): This result follows directly from the theory of convolution of Schwartz distributions ([42], Chapter 2, Sect. 2.1, Chapter 4, Sects. 4.2 and 4.4, and in particular Theorem 4.4.1 on page 110). To see why this is the case, note that by assumptions (A1)–(A3) the initial condition \({\varvec{y}}\mapsto e^{-J({\varvec{y}})/\epsilon }\) is a locally integrable function, and locally integrable functions are Schwartz distributions.
Proof of Lemma A.1(ii)(a): The log-concavity property will be shown using the Prékopa–Leindler inequality.
Theorem A.1
(Prékopa–Leindler inequality [46, 56]) Let f, g, and h be non-negative, real-valued, measurable functions on \({\mathbb {R}}^n\), and suppose
$$\begin{aligned} h(\lambda {\varvec{y}}_{1}+(1-\lambda ){\varvec{y}}_{2})\geqslant f({\varvec{y}}_{1})^{\lambda }\,g({\varvec{y}}_{2})^{1-\lambda } \end{aligned}$$
for every \({\varvec{y}}_{1},\) \({\varvec{y}}_{2}\in {\mathbb {R}}^n\) and \(\lambda \in (0,1)\). Then,
$$\begin{aligned} \int _{{\mathbb {R}}^n}h({\varvec{y}})\,d{\varvec{y}}\geqslant \left( \int _{{\mathbb {R}}^n}f({\varvec{y}})\,d{\varvec{y}}\right) ^{\lambda }\left( \int _{{\mathbb {R}}^n}g({\varvec{y}})\,d{\varvec{y}}\right) ^{1-\lambda }. \end{aligned}$$
Proof of Lemma A.1(ii)(a) (continued): Let \(\epsilon >0\), \(\lambda \in (0,1)\), \({\varvec{x}}=\lambda {\varvec{x}}_{1}+(1-\lambda ){\varvec{x}}_{2}\), \({\varvec{y}}=\lambda {\varvec{y}}_{1}+(1-\lambda ){\varvec{y}}_{2}\), and \(t=\lambda t_{1}+(1-\lambda )t_{2}\) for any \({\varvec{x}}_{1},\,{\varvec{x}}_{2},\,{\varvec{y}}_{1},\,{\varvec{y}}_{2}\in {\mathbb {R}}^n\) and \(t_{1},\,t_{2}\in (0,+\infty )\). The joint convexity of the function \({\mathbb {R}}^n\times (0,+\infty )\ni ({\varvec{z}},t)\mapsto \frac{1}{2t}\left\| {\varvec{z}}\right\| _{2}^{2}\) and convexity of J imply
This gives
Applying the Prékopa–Leindler inequality with
and
and using the definition (55) of \(w_{\epsilon }({\varvec{x}},t)\), we get
As a result, the function \(({\varvec{x}},t)\mapsto t^{n/2}w_{\epsilon }({\varvec{x}},t)\) is jointly log-concave on \({\mathbb {R}}^n\times (0,+\infty )\).
Proof of Lemma A.1(ii)(b): Since \(t\mapsto \frac{1}{t}\) is strictly monotone decreasing on \((0,+\infty )\), for every \({\varvec{x}}\in {\mathbb {R}}^n\), \({\varvec{y}}\in \mathrm {dom}~J\), \(\epsilon >0\), and \(0<t_1<t_2\),
whenever \({\varvec{x}}\ne {\varvec{y}}\). Integrating both sides of the inequality with respect to \({\varvec{y}}\) over \(\mathrm {dom}~J\) yields
As a result, the function \(t\mapsto t^{n/2}w_{\epsilon }({\varvec{x}},t)\) is strictly monotone increasing on \((0,+\infty )\).
Proof of Lemma A.1(ii)(c): Since \(\epsilon \mapsto \frac{1}{\epsilon }\) is strictly monotone decreasing on \((0,+\infty )\) and \(\mathrm {dom}~J \ni {\varvec{y}}\mapsto J({\varvec{y}})\) is non-negative by assumption (A3), for every \({\varvec{x}}\in {\mathbb {R}}^n\), \(t>0\), and \(0<\epsilon _1<\epsilon _2\) we have
whenever \({\varvec{x}}\ne {\varvec{y}}\). Integrating both sides of the inequality with respect to \({\varvec{y}}\) over \(\mathrm {dom}~J\) yields
As a result, the function \(\epsilon \mapsto \epsilon ^{n/2}w_{\epsilon }({\varvec{x}},t)\) is strictly monotone increasing on \((0,+\infty )\).
Proof of Lemma A.1(ii)(d): Let \(\epsilon >0\), \(t>0\), \(\lambda \in (0,1)\), \({\varvec{x}}_1,{\varvec{x}}_2\in {\mathbb {R}}^n\) with \({\varvec{x}}_1\ne {\varvec{x}}_2\) and \({\varvec{x}}=\lambda {\varvec{x}}_{1}+(1-\lambda ){\varvec{x}}_{2}\). Then,
Hölder’s inequality [30, Theorem 6.2] then implies
where the inequality in the equation above is an equality if and only if there exists a constant \(\alpha \in {\mathbb {R}}\) such that \(\alpha e^{\left\langle {\varvec{x}}_1,{\varvec{y}}\right\rangle /t\epsilon } = e^{\left\langle {\varvec{x}}_2,{\varvec{y}}\right\rangle /t\epsilon }\) for almost every \({\varvec{y}}\in \mathrm {dom}~J\). This does not hold here since \({\varvec{x}}_1\ne {\varvec{x}}_2\). As a result, the function \({\mathbb {R}}^n\ni {\varvec{x}}\mapsto e^{\frac{1}{2t\epsilon }\left\| {{\varvec{x}}} \right\| _{2}^{2}}w_{\epsilon }({\varvec{x}},t)\) is strictly log-convex. \(\square \)
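Before moving on, it may help to see Lemma A.1 in action numerically. The following one-dimensional sketch (ours, for illustration only; the prior \(J(y)=|y|\), the value of \(\epsilon \), and all grid parameters are arbitrary choices satisfying assumptions (A1)–(A3)) evaluates formula (55) by quadrature and checks the Cauchy problem (56) and the monotonicity property (ii)(b) by finite differences.

```python
import numpy as np

# Illustrative 1D sanity check of Lemma A.1 (our own example, not from the paper).
J = np.abs                                   # J(y) = |y| satisfies (A1)-(A3)
eps = 0.5
ys = np.linspace(-30.0, 30.0, 200001)        # quadrature grid for the dy-integral
dy = ys[1] - ys[0]

def w(x, t):
    """Formula (55) in 1D: (2*pi*t*eps)^(-1/2) * int exp(-(|x-y|^2/(2t) + J(y))/eps) dy."""
    return np.exp(-((x - ys)**2 / (2.0*t) + J(ys)) / eps).sum() * dy / np.sqrt(2.0*np.pi*t*eps)

x0, t0, h = 0.7, 1.3, 0.05
# Cauchy problem (56): dw/dt should match (eps/2) * d^2 w/dx^2 up to O(h^2).
dw_dt   = (w(x0, t0 + h) - w(x0, t0 - h)) / (2.0*h)
d2w_dx2 = (w(x0 + h, t0) - 2.0*w(x0, t0) + w(x0 - h, t0)) / h**2
print("heat equation residual:", dw_dt - 0.5*eps*d2w_dx2)

# Lemma A.1(ii)(b): t -> t^(n/2) * w_eps(x, t) is strictly increasing (here n = 1).
vals = [np.sqrt(t) * w(x0, t) for t in (0.5, 1.0, 2.0, 4.0)]
print("t^(1/2) w increasing:", all(a < b for a, b in zip(vals, vals[1:])))
```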
Proof of Proposition 3.1(i) and (ii)(a)–(d): The proof of these statements follows from Lemma A.1 and classic results about the Cole–Hopf transform (see, e.g., [26, Section 4.4.1]), with \(S_\epsilon ({\varvec{x}},t) :=-\epsilon \log (w_\epsilon ({\varvec{x}},t))\).
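For completeness, the Cole–Hopf computation can be sketched in a few lines; here we take the viscous HJ PDE (29) to be \(\frac{\partial S_{\epsilon }}{\partial t}+\frac{1}{2}\left\| \nabla _{{\varvec{x}}}S_{\epsilon }\right\| _{2}^{2}=\frac{\epsilon }{2}\Delta _{{\varvec{x}}}S_{\epsilon }\), which is the form consistent with the quadratic data fidelity term. With \(S_{\epsilon } = -\epsilon \log w_{\epsilon }\),
$$\begin{aligned} \frac{\partial S_{\epsilon }}{\partial t}=-\epsilon \frac{1}{w_{\epsilon }}\frac{\partial w_{\epsilon }}{\partial t},\qquad \nabla _{{\varvec{x}}}S_{\epsilon }=-\epsilon \frac{\nabla _{{\varvec{x}}}w_{\epsilon }}{w_{\epsilon }},\qquad \Delta _{{\varvec{x}}}S_{\epsilon }=-\epsilon \frac{\Delta _{{\varvec{x}}}w_{\epsilon }}{w_{\epsilon }}+\frac{1}{\epsilon }\left\| \nabla _{{\varvec{x}}}S_{\epsilon }\right\| _{2}^{2}, \end{aligned}$$
so that
$$\begin{aligned} \frac{\partial S_{\epsilon }}{\partial t}+\frac{1}{2}\left\| \nabla _{{\varvec{x}}}S_{\epsilon }\right\| _{2}^{2}-\frac{\epsilon }{2}\Delta _{{\varvec{x}}}S_{\epsilon } = -\frac{\epsilon }{w_{\epsilon }}\left( \frac{\partial w_{\epsilon }}{\partial t}-\frac{\epsilon }{2}\Delta _{{\varvec{x}}}w_{\epsilon }\right) = 0, \end{aligned}$$
where the last equality uses the heat equation (56) satisfied by \(w_{\epsilon }\); formally, the initial condition \(S_{\epsilon }({\varvec{x}},0)=J({\varvec{x}})\) follows from the pointwise limit of \(w_{\epsilon }\) as \(t\rightarrow 0\) in Lemma A.1(i).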
Proof of Proposition 3.1(iii): The formulas follow from a straightforward calculation of the gradient, divergence, and Laplacian of \(S_\epsilon ({\varvec{x}},t)\) that we omit here. Since the function \({\varvec{x}}\mapsto \frac{1}{2}\left\| {{\varvec{x}}} \right\| _{2}^{2}-tS_{\epsilon }({\varvec{x}},t)\) is strictly convex, we can invoke [58, Corollary 26.3.1] to conclude that its gradient \({\varvec{x}}\mapsto {\varvec{x}}- t\nabla _{{\varvec{x}}}S_\epsilon ({\varvec{x}},t)\), which gives the posterior mean \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\), is bijective.
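For the reader's convenience, the gradient calculation can be sketched as follows (differentiation of (55) under the integral sign; the \({\varvec{x}}\)-independent normalization factor \((2\pi t\epsilon )^{-n/2}\) cancels in the quotient):
$$\begin{aligned} \nabla _{{\varvec{x}}}S_{\epsilon }({\varvec{x}},t)=-\epsilon \frac{\nabla _{{\varvec{x}}}w_{\epsilon }({\varvec{x}},t)}{w_{\epsilon }({\varvec{x}},t)}=\frac{\int _{{\mathbb {R}}^n}\frac{{\varvec{x}}-{\varvec{y}}}{t}\,e^{-\left( \frac{1}{2t}\left\| {\varvec{x}}-{\varvec{y}}\right\| _{2}^{2}+J({\varvec{y}})\right) /\epsilon }d{\varvec{y}}}{\int _{{\mathbb {R}}^n}e^{-\left( \frac{1}{2t}\left\| {\varvec{x}}-{\varvec{y}}\right\| _{2}^{2}+J({\varvec{y}})\right) /\epsilon }d{\varvec{y}}}=\frac{1}{t}\left( {\varvec{x}}-{\mathbb {E}}_{J}\left[ {{\varvec{y}}}\right] \right) =\frac{1}{t}\left( {\varvec{x}}-{\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\right) , \end{aligned}$$
which rearranges to \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )={\varvec{x}}-t\nabla _{{\varvec{x}}}S_{\epsilon }({\varvec{x}},t)\); the divergence and Laplacian formulas follow from analogous differentiations.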
Proof of Proposition 3.1(iv): We will prove this result in three steps. First, we will show that
and
Next, we will show that \(\liminf _{\begin{array}{c} \epsilon \rightarrow 0\\ \epsilon >0 \end{array} }S_{\epsilon }({\varvec{x}},t)\geqslant S_0({\varvec{x}},t)\). Finally, we will use steps 1 and 2 to conclude that \(\lim _{\begin{array}{c} \epsilon \rightarrow 0\\ \epsilon >0 \end{array}}S_{\epsilon }({\varvec{x}},t)=S_0({\varvec{x}},t)\). Pointwise and local uniform convergence of the gradient \(\lim _{\begin{array}{c} \epsilon \rightarrow 0\\ \epsilon >0 \end{array}}\nabla _{{\varvec{x}}}S_{\epsilon }({\varvec{x}},t)=\nabla _{{\varvec{x}}}S_0({\varvec{x}},t)\), the partial derivative \(\lim _{\begin{array}{c} \epsilon \rightarrow 0\\ \epsilon >0 \end{array}}\frac{\partial S_{\epsilon }({\varvec{x}},t)}{\partial t}=\frac{\partial S_0({\varvec{x}},t)}{\partial t}\), and the Laplacian \(\lim _{\begin{array}{c} \epsilon \rightarrow 0\\ \epsilon >0 \end{array}}\frac{\epsilon }{2}\Delta _{{\varvec{x}}}S_{\epsilon }({\varvec{x}},t)=0\) then follow from the convexity and differentiability of the solutions \(({\varvec{x}},t)\mapsto S_0({\varvec{x}},t)\) and \(({\varvec{x}},t)\mapsto S_{\epsilon }({\varvec{x}},t)\) to the HJ PDEs (20) and (29).
In what follows, we will use the following large deviation principle result [22]: For every Lebesgue measurable set \({\mathcal {A}} \subset {\mathbb {R}}^n\),
where
Step 1. (Adapted from Deuschel and Stroock [22], Lemma 2.1.7.) By convexity, the function J is continuous at every \({\varvec{y}}_{0}\in \mathrm {int}~(\mathrm {dom}~J)\), the latter set being open. Therefore, for every such \({\varvec{y}}_{0}\) there exists a number \(r_{{\varvec{y}}_{0}}>0\) such that for every \(0<r\leqslant r_{{\varvec{y}}_{0}}\) the open ball \(B_{r}({\varvec{y}}_{0})\) is contained in \(\mathrm {int}~(\mathrm {dom}~J)\). Hence,
Take \(\limsup _{\begin{array}{c} \epsilon \rightarrow 0\\ \epsilon >0 \end{array}}\) and apply the large deviation principle to the term on the right to get
Take \(\lim _{r\rightarrow 0}\) on both sides of the inequality to find
Since the inequality holds for every \({\varvec{y}}_{0}\in \mathrm {int}~(\mathrm {dom}~J)\), we can take the infimum over all \({\varvec{y}}\in \mathrm {int}~(\mathrm {dom}~J)\) on the right-hand side of the inequality to get
By assumptions (A1) and (A2) that \(J \in \Gamma _0({\mathbb {R}}^n)\) and \(\mathrm {int}~(\mathrm {dom}~J) \ne \varnothing \), the infimum on the right hand side is equal to that taken over \(\mathrm {dom}~J\) [58, Corollary 7.3.2], i.e.,
We combine (57) and (58) to obtain
which is the desired result.
Step 2. We can invoke Lemma 2.1.8 in [22] because its conditions are satisfied (in the notation of [22], \(\Phi =-J\), which is upper semicontinuous, \({\varvec{y}}\mapsto \frac{1}{2t}\left\| {\varvec{x}}-{\varvec{y}}\right\| _{2}^{2}\) is the rate function, and note that the tail condition (2.1.9) is satisfied in that \(\sup _{{\varvec{y}}\in {\mathbb {R}}^n} -J({\varvec{y}}) = -\inf _{{\varvec{y}}\in {\mathbb {R}}^n} J({\varvec{y}}) = 0\) by assumption (A3)) to get
Step 3. Combining the two limits derived in steps 1 and 2 yields
for every \({\varvec{x}}\in {\mathbb {R}}^n\) and \(t>0\), where the convergence is uniform on every compact subset of \({\mathbb {R}}^n\times (0,+\infty )\) [58, Theorem 10.8].
By differentiability and joint convexity of both \({\mathbb {R}}^n\times (0,+\infty ) \ni ({\varvec{x}},t) \mapsto S_0({\varvec{x}},t)\) and \({\mathbb {R}}^n\times (0,+\infty ) \ni ({\varvec{x}},t)\mapsto S_{\epsilon }({\varvec{x}},t)-\frac{n\epsilon }{2}\ln t\) (Proposition 2.2 (i), and Proposition 3.1 (i) and (ii)(a)), we can invoke [58, Theorem 25.7] to get
for every \({\varvec{x}}\in {\mathbb {R}}^n\) and \(t>0\), where the limit converges uniformly on every compact subset of \({\mathbb {R}}^n\times (0,+\infty )\). Furthermore, the viscous HJ PDE (29) for \(S_\epsilon \) implies that
where the last equality holds thanks to the HJ PDE (20) (see Proposition 2.2). Here, again, the limit holds for every \({\varvec{x}}\in {\mathbb {R}}^n\) and \(t>0\), and the convergence is uniform over any compact subset of \({\mathbb {R}}^n\times (0,+\infty )\). Finally, the limit \(\lim _{\begin{array}{c} \epsilon \rightarrow 0\\ \epsilon >0 \end{array}} {\varvec{u}}_{PM}({\varvec{x}},t,\epsilon ) = {\varvec{u}}_{MAP}({\varvec{x}},t)\) holds directly as a consequence of the limit \(\lim _{\begin{array}{c} \epsilon \rightarrow 0\\ \epsilon >0 \end{array}}\nabla _{{\varvec{x}}}S_{\epsilon }({\varvec{x}},t)=\nabla _{{\varvec{x}}}S_0({\varvec{x}},t)\) and the representation formulas (30) (see Proposition 3.1(iii)) and (23) (see Proposition 2.2(ii)) for the posterior mean and MAP estimates, respectively.
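The vanishing viscosity limit can also be observed numerically. In the one-dimensional sketch below (ours, for illustration; all parameter values are arbitrary), the prior is \(J(y)=y^2/2\), for which the Hopf–Lax solution of (20) is available in closed form, \(S_0(x,t)=x^2/(2(1+t))\); the quadrature values of \(S_\epsilon ({\varvec{x}},t)\) should approach this value as \(\epsilon \rightarrow 0\).

```python
import numpy as np

# Illustrative vanishing-viscosity check (our own 1D example, not from the paper).
ys = np.linspace(-40.0, 40.0, 400001)        # quadrature grid for the dy-integral
dy = ys[1] - ys[0]

def S_eps(x, t, eps):
    """S_eps = -eps*log(w_eps), with w_eps from formula (55) evaluated by quadrature."""
    expo = ((x - ys)**2 / (2.0*t) + 0.5*ys**2) / eps
    w = np.exp(-(expo - expo.min())).sum() * dy / np.sqrt(2.0*np.pi*t*eps)
    return eps*expo.min() - eps*np.log(w)    # log-sum-exp stabilization

x, t = 1.5, 2.0
S0 = x**2 / (2.0*(1.0 + t))                  # Hopf-Lax solution for J(y) = y^2/2
for eps in (1.0, 0.1, 0.01):
    # For this J one can check S_eps = S0 + (eps/2)*log(1+t) in closed form,
    # so the gap to S0 should shrink proportionally to eps.
    print(eps, S_eps(x, t, eps), S0)
```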
Proof of Proposition 4.1
Proof of (i): We will prove that \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\in \mathrm {int}~(\mathrm {dom}~J)\) in two steps. First, we will use the projection operator (10) (see Definition 4) and the posterior mean estimate \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\) to prove by contradiction that \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\in \mathrm {cl}~(\mathrm {dom}~J)\). Second, we will use the following variant of the Hahn–Banach theorem for convex bodies in \({\mathbb {R}}^n\) to show in fact that \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\in \mathrm {int}~(\mathrm {dom}~J)\).
Theorem B.1
[58, Theorem 11.6 and Corollary 11.6.2] Let C be a convex set. A point \({\varvec{u}}\in C\) is a relative boundary point of C if and only if there exist a vector \({\varvec{a}}\in {\mathbb {R}}^n\setminus \{{\varvec{0}}\}\) and a number \(b\in {\mathbb {R}}\) such that
with \(\left\langle {\varvec{a}},{\varvec{y}}\right\rangle +b < \left\langle {\varvec{a}},{\varvec{u}}\right\rangle +b\) for every \({\varvec{y}}\in \mathrm {int}~(C)\).
Step 1. Suppose \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\notin \mathrm {cl}~(\mathrm {dom}~J)\). Since the set \(\mathrm {cl}~(\mathrm {dom}~J)\) is closed and convex, the projection of \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\) onto \(\mathrm {cl}~(\mathrm {dom}~J)\) given by \(\pi _{\text{ cl }(\text{ dom } J)}({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )) \equiv {\bar{{\varvec{u}}}}\) is well-defined and unique (see Definition 4), with \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\ne {\bar{{\varvec{u}}}}\) by assumption. The projection \({\bar{{\varvec{u}}}}\) also satisfies the characterization (11), namely
for every \({\varvec{y}}\in \mathrm {cl}~(\mathrm {dom}~J)\). Then, by linearity of the posterior mean estimate,
which implies that \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon ) = {\bar{{\varvec{u}}}}\). This contradicts the assumption that \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\notin \mathrm {cl}~(\mathrm {dom}~J)\). Hence, it follows that \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\in \mathrm {cl}~(\mathrm {dom}~J)\).
Step 2. We now wish to prove that \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon ) \in \mathrm {int}~(\mathrm {dom}~J)\). Note that this inclusion trivially holds if there are no boundary points, i.e., if \( \left( \mathrm {cl}~(\mathrm {dom}~J) \setminus \mathrm {int}~(\mathrm {dom}~J)\right) = \varnothing \). Now, we consider the case \(\left( \mathrm {cl}~(\mathrm {dom}~J) \setminus \mathrm {int}~(\mathrm {dom}~J)\right) \ne \varnothing \). Suppose that \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon ) \in \left( \mathrm {cl}~(\mathrm {dom}~J) \setminus \mathrm {int}~(\mathrm {dom}~J)\right) \). Then, Theorem B.1 applies and there exist a vector \({\varvec{a}}\in {\mathbb {R}}^n\setminus \{{\varvec{0}}\}\) and a number \(b\in {\mathbb {R}}\) such that
with \(\left\langle {\varvec{a}},{\varvec{y}}\right\rangle +b<\left\langle {\varvec{a}},{\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\right\rangle +b\) for every \({\varvec{y}}\in \mathrm {int}~(\mathrm {dom}~J)\). By linearity of the posterior mean estimate,
where the strict inequality in the third line follows from integrating over \(\mathrm {int}~(\mathrm {dom}~J)\). This contradicts the assumption that \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon ) \in \left( \mathrm {cl}~(\mathrm {dom}~J) \setminus \mathrm {int}~(\mathrm {dom}~J)\right) \). Hence, \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\in \mathrm {int}~(\mathrm {dom}~J)\).
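A simple numerical illustration of this conclusion (ours, not from the paper): take for J the indicator-type prior that equals 0 on \([-1,1]\) and \(+\infty \) elsewhere, so that \(\mathrm {dom}~J = [-1,1]\). The posterior mean computed by quadrature then lands strictly inside \((-1,1)\), no matter how far the data point \({\varvec{x}}\) lies outside the domain.

```python
import numpy as np

# Illustrative check of Proposition 4.1(i) with dom J = [-1, 1] (our own example).
ys = np.linspace(-1.0, 1.0, 200001)          # integrate over dom J only, as in (55)
t, eps = 1.0, 0.5

def posterior_mean(x):
    """u_PM(x,t,eps) = E_J[y], computed by quadrature; J(y) = 0 on dom J."""
    expo = (x - ys)**2 / (2.0*t*eps)
    weights = np.exp(-(expo - expo.min()))   # stabilized unnormalized posterior weights
    return (ys * weights).sum() / weights.sum()

for x in (0.0, 0.9, 5.0, -100.0):
    u = posterior_mean(x)
    print(x, u, -1.0 < u < 1.0)              # u_PM stays in int(dom J) = (-1, 1)
```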
Proof of (ii): First, as a consequence that \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon ) \in \mathrm {int}~(\mathrm {dom}~J)\), the subdifferential of J at \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\) is non-empty because the subdifferential \(\partial J\) is non-empty at every point \({\varvec{y}}\in \mathrm {int}~(\mathrm {dom}~J)\) [58, Theorem 23.4]. Hence, there exists a subgradient \({\varvec{p}}\in \partial J({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon ))\) such that
Take the expectation \({\mathbb {E}}_{J}\left[ {\cdot }\right] \) on both sides of inequality (59) to find
This gives the lower bound of inequality (38).
Second, use the convex inequality \(1+z\leqslant e^{z}\) that holds on \({\mathbb {R}}\) with \(z \equiv J({\varvec{y}})/\epsilon \) for \({\varvec{y}}\in \mathrm {dom}~J\). This gives the inequality \(1 + \frac{1}{\epsilon }J({\varvec{y}}) \leqslant e^{J({\varvec{y}})/\epsilon }\). Multiply this inequality by \(e^{-J({\varvec{y}})/\epsilon }\) and subtract by \(e^{-J({\varvec{y}})/\epsilon }\) on both sides to find
Multiply both sides by \(e^{-\frac{1}{2t\epsilon }\left\| {\varvec{x}}-{\varvec{y}}\right\| _{2}^{2}}\), divide by the partition function \(Z_J({\varvec{x}},t,\epsilon )\) (see Eq. (26)), integrate with respect to \({\varvec{y}}\in \mathrm {dom}~J\), and use
to obtain
Now, we can bound the right hand side of (62) as follows
Combining (62) and (63), we get
Using the representation formula (28) for the solution \(({\varvec{x}},t) \mapsto S_\epsilon \) to the viscous HJ PDE (29), we have that \((2\pi t\epsilon )^{n/2}/Z_J({\varvec{x}},t,\epsilon ) = e^{S_\epsilon ({\varvec{x}},t)/\epsilon }\). We can therefore write (64) as follows
Combining the latter inequalities with (60), we obtain the desired set of inequalities (38).
Proof of Proposition 4.2
Proof of (i): We will show that \({\mathbb {E}}_{J}\left[ {\left\| {\pi _{\partial J ({\varvec{y}})}({\varvec{0}})} \right\| _{2}}\right] < +\infty \) and derive formulas (39), (40), (41), (42), and (43) in four steps. To describe these steps, let us first introduce some notation. Recall that J satisfies assumptions (A1)–(A3) and \(\mathrm {dom}~J = {\mathbb {R}}^n\). Define the set
We can invoke [58, Theorem 25.5] to conclude that \(D_J\) is a dense subset of \({\mathbb {R}}^n\), the n-dimensional Lebesgue measure of the set \(({\mathbb {R}}^n\setminus D_J)\) is zero, and the function \({\varvec{y}}\mapsto \nabla J({\varvec{y}})\) is continuous on \(D_J\). Now, let \({\varvec{x}}\in {\mathbb {R}}^n\), \(t>0\), \(\epsilon > 0\), and \({\varvec{y}}_0 \in {\mathbb {R}}^n\). Define the function \(\varphi _J:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^n\) as
Note that for every \({\varvec{y}}\in {\mathbb {R}}^n\) we have \(\varphi _J({\varvec{y}}|{\varvec{x}},t) \in \partial \left( {\mathbb {R}}^n\ni {\varvec{u}}\mapsto \frac{1}{2t}\left\| {{\varvec{x}}-{\varvec{u}}} \right\| _{2}^{2} + J({\varvec{u}}) \right) ({\varvec{y}})\), i.e., \(\varphi _J({\varvec{y}}|{\varvec{x}},t)\) is a subgradient of the function \({\varvec{u}}\mapsto \frac{1}{2t}\left\| {{\varvec{x}}-{\varvec{u}}} \right\| _{2}^{2} + J({\varvec{u}})\) evaluated at \({\varvec{u}}= {\varvec{y}}\). Let
and note that by assumption (A3), the expected value \({\mathbb {E}}_{J}\left[ {\left\| {{\varvec{y}}-{\varvec{y}}_0} \right\| _{2}}\right] \) is bounded as follows
Define the vector field \({\varvec{V}}:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^n\) as
which is continuous on \({\mathbb {R}}^n\). It is also bounded on \({\mathbb {R}}^n\); to see this, use the triangle inequality, assumption (A3), and the fact that the function \((0,+\infty ) \ni r \mapsto re^{-\frac{1}{2t\epsilon }r^2}\) attains its maximum at \(r^* = \sqrt{t\epsilon }\) to get
The divergence \(\nabla _{{\varvec{y}}} \cdot {\varvec{V}}({\varvec{y}})\), which is well-defined and continuous on \(D_J\), is given for every \({\varvec{y}}\in D_J\) by
We now outline the four steps that will be used to prove Proposition 4.2(i). In the first step, we will show that the divergence of the vector field \({\varvec{V}}\) on \(D_J\) integrates to zero in the sense that
In the second step, we will show that \({\mathbb {E}}_{J}\left[ {\left| \left\langle \varphi _J({\varvec{y}}|{\varvec{x}},t),{\varvec{y}}-{\varvec{y}}_0 \right\rangle \right| }\right] < +\infty \), with \({\mathbb {E}}_{J}\left[ {\left\langle \varphi _J({\varvec{y}}|{\varvec{x}},t),{\varvec{y}}-{\varvec{y}}_0 \right\rangle }\right] = n\epsilon \), hereby proving formula (39), using the convexity of the function \({\varvec{y}}\mapsto \frac{1}{2t}\left\| {{\varvec{x}}-{\varvec{y}}} \right\| _{2}^{2} + J({\varvec{y}})\), Fatou’s lemma [30, Lemma 2.18], and Eq. (70) derived in the first step. In the third step, we will combine the results from the first and second steps to show that \({\mathbb {E}}_{J}\left[ {\left\| {\pi _{\partial J ({\varvec{y}})}({\varvec{0}})} \right\| _{2}}\right] < +\infty \) and conclude that the representation formulas (40) and (41) hold. Finally, in the fourth step we will conclude that the representation formulas (42) and (43) hold using Eqs. (40) and (41) and Proposition (3.1)(iii).
Step 1. The proof of the limit result (70) that we present here is based on an application of Theorem 4.14 in [54] to the vector field \({\varvec{V}}(\cdot )\). As this result is fairly technical, we first introduce some terminology and definitions that will be used exclusively in this part of the proof of (i).
Let C be a non-empty convex subset of \({\mathbb {R}}^n\). The dimension of the set C is defined as the smallest dimension of a non-empty affine set containing C, with the dimension of a non-empty affine set being the dimension of the subspace parallel to it [58, pages 4 and 12]. If C consists of a single point, then its dimension is taken to be zero.
Let \(k \in \{0,\dots ,n\}\). Denote by \({\mathcal {H}}^{n-k}\) the \((n-k)\)-dimensional outer Hausdorff measure in \({\mathbb {R}}^n\) as defined in [28, Sect. 2.10.2, p.171]. The measure \({\mathcal {H}}^{n-k}\), in particular, is a constant multiple of the \((n-k)\)-dimensional Lebesgue measure for every measurable subset \(B \subset {\mathbb {R}}^n\) (see [27], Section 1.2, p.7, and Theorem 1.12, p.13).
A subset \(S \subset {\mathbb {R}}^n\) is called slight if \({\mathcal {H}}^{n-1}(S) = 0\), and a subset \(T \subset {\mathbb {R}}^n\) is called thin if T is \(\sigma \)-finite for \({\mathcal {H}}^{n-1}\), i.e., T can be expressed as a countable union of sets \(T = \cup _{k=1}^{+\infty } T_k\) with \({\mathcal {H}}^{n-1}(T_k) < +\infty \) for each \(k \in {\mathbb {N}}^+\) (see, e.g., [54]).
Let \(k \in \{0,\dots ,n\}\). A non-empty, measurable subset \(\Omega \subset {\mathbb {R}}^n\) is said to be countably \({\mathcal {H}}^{n-k}\)-rectifiable if it is contained, up to a null set of \((n-k)\)-dimensional outer Hausdorff measure \({\mathcal {H}}^{n-k}\) zero, in a countable union of continuously differentiable hypersurfaces of dimension \((n-k)\) (see, e.g., [1] and references therein). A non-empty, measurable and countably \({\mathcal {H}}^{n-k}\)-rectifiable subset of \({\mathbb {R}}^n\), in particular, is \(\sigma \)-finite for \({\mathcal {H}}^{n-k}\).
A subset \(A\subset {\mathbb {R}}^n\) is called admissible if its boundary \(\mathrm {bd}~A\) is thin and if the distributional gradient of the characteristic function of A is a vector measure on Borel subsets of \({\mathbb {R}}^n\) whose variation is finite (see [54], p. 151, and the reference therein). For the purpose of our proof, we will use the fact that closed balls of radius \(r>0\), namely sets of the form \(\{{\varvec{y}}\in {\mathbb {R}}^n\mid \left\| {{\varvec{y}}} \right\| _{2} \leqslant r\}\), are admissible (see [32], Example 1.10, and note that admissible sets are also called Caccioppoli sets [32], pages 5–6).
Let A be an admissible set and let \({\varvec{v}}:A\rightarrow {\mathbb {R}}^n\) be a vector field. In the terminology of [54], we say that \({\varvec{v}}\) is integrable over the admissible set A if \({\varvec{v}}\) satisfies Definition 4.1 of [54], and in that case, the number \(I({\varvec{v}},A)\) is called the integral of \({\varvec{v}}\) over A. Note, here, that the notion of integrability considered in [54] is different from that of the usual Lebesgue integrability. Nevertheless, if \({\varvec{v}}\) is integrable in the sense of [54], then \({\varvec{v}}\) is also Lebesgue measurable ([54], Corollary 4.9), and if the Lebesgue integral \(\int _{A}|{\varvec{v}}({\varvec{y}})|d{{\varvec{y}}}\) is finite, then \(I({\varvec{v}},A) = \int _{A}{\varvec{v}}({\varvec{y}})d{{\varvec{y}}}\) ([54], Proposition 4.7).
Let E be a non-empty subset of \({\mathbb {R}}^n\), let \({\varvec{v}}:E \rightarrow {\mathbb {R}}^n\) be a vector field, and let \(D_{{\varvec{v}}}\) denote the set of points at which \({\varvec{v}}\) is differentiable in \(\mathrm {int}~{E}\) (for the definition of differentiability of vector fields, see [61], page 150, Definition 7.22). In the terminology of [54], we call a divergence of \({\varvec{v}}\) any function \(g :E \rightarrow {\mathbb {R}}\) such that \(g({\varvec{y}}) = \nabla \cdot {\varvec{v}}({\varvec{y}})\) for each \({\varvec{y}}\in (\mathrm {int}~{E}) \cap D_{{\varvec{v}}}\).
In addition to these definitions, we will need the following two results due to, respectively, [1] and [54].
Theorem C.1
[1, Theorem 4.1] (for convex functions) Let \(\Omega \) be a bounded, open, convex subset of \({\mathbb {R}}^n\), and let \(f:\Omega \rightarrow {\mathbb {R}}\) be a convex and Lipschitz continuous function. Denote the subdifferential of f at \({\varvec{y}}\in \Omega \) by \(\partial f({\varvec{y}})\). Then, for each \(k\in \{0,\dots ,n\}\), the set
is countably \({\mathcal {H}}^{n-k}\)-rectifiable.
Theorem C.2
[54, Theorem 4.14] Let A be an admissible set, and let S and T be, respectively, a slight and a thin subset of \(\mathrm {cl}~{A}\). Let \({\varvec{v}}\) be a bounded vector field in \(\mathrm {cl}~{A}\) that is continuous in \((\mathrm {cl}~{A})\setminus S\) and differentiable in \((\mathrm {int}~{A})\setminus T\). Then, every divergence of \({\varvec{v}}\) is integrable in A. Moreover, there exists a vector field \(\mathrm {bd}~A \ni {\varvec{y}}\mapsto n_{A}({\varvec{y}})\) with \(\left\| {n_{A}({\varvec{y}})} \right\| _{2} = 1\) for every \({\varvec{y}}\in \mathrm {bd}~A\) such that if \(\text {div }{\varvec{v}}\) denotes any divergence of \({\varvec{v}}\), then
Step 1 (Continued). Fix \(r>0\) and let \(A = \{{\varvec{y}}\in {\mathbb {R}}^n\mid \left\| {{\varvec{y}}} \right\| _{2} \leqslant r\}\) denote the closed ball of radius r centered at the origin in \({\mathbb {R}}^n\). Note that A is bounded, convex, closed, and admissible. Consider now the restriction of the convex function J to \(\mathrm {int}~{A}\). As \(\mathrm {int}~{A}\) is bounded, open and convex, the function J is Lipschitz continuous on \(\mathrm {int}~{A}\) [58, Theorem 10.4]. All conditions in Theorem C.1 are satisfied (with \(\Omega = \mathrm {int}~{A}\) and \(f = J\)), and we can invoke the theorem to conclude that the set
is countably \({\mathcal {H}}^{n-1}\)-rectifiable, and therefore \(\sigma \)-finite for \({\mathcal {H}}^{n-1}\). In particular, the set T is thin. Moreover, recalling the definition of the set \(D_J:=\left\{ {\varvec{y}}\in {\mathbb {R}}^n\mid \partial J({\varvec{y}}) = \{\nabla J({\varvec{y}})\} \right\} \), we find that the set \((\mathrm {int}~{A})\setminus T\) comprises the points \({\varvec{y}}\in \mathrm {int}~{A}\) at which the subdifferential \(\partial J({\varvec{y}})\) is a singleton, i.e., \(T = (\mathrm {int}~{A})\cap ({\mathbb {R}}^n\setminus D_J)\).
Now, consider the vector field \({\varvec{V}}\) defined by (67). This vector field is continuous in \({\mathbb {R}}^n\) by convexity of J and \(\mathrm {dom}~J = {\mathbb {R}}^n\). It is also bounded by (68). Now, define the function \(g :A \rightarrow {\mathbb {R}}\) via
The function g constitutes a divergence of the vector field \({\varvec{V}}\) because it coincides with the divergence \(\nabla \cdot {\varvec{V}}({\varvec{y}})\) at every \({\varvec{y}}\in (\mathrm {int}~{A})\cap D_J\). Moreover, its Lebesgue integral over A is finite; to see this, first note that for every \({\varvec{y}}\in A\cap D_J\) the absolute value of \(g({\varvec{y}})\) can be bounded using (69), the triangle inequality, the Cauchy–Schwarz inequality, and assumption (A3) as follows
Second, as the set A is a closed and bounded subset of \(\mathrm {dom}~J = {\mathbb {R}}^n\), the function J is Lipschitz continuous relative to A, and therefore there exists a number \(L_A>0\) such that \(\left\| {\nabla J({\varvec{y}})} \right\| _{2} \leqslant L_A\) for every \({\varvec{y}}\in A \cap D_J\). As a consequence, we can further bound \(g({\varvec{y}})\) for every \({\varvec{y}}\in A\cap D_J\) in (73) as
In particular, using the definition of g given by (72), we have that (74) holds for every \({\varvec{y}}\in A\). We can now use (72) and (74) to get
Since the function
is continuous, it is bounded on the compact set \(A = \{{\varvec{y}}\in {\mathbb {R}}^n\mid \left\| {{\varvec{y}}} \right\| _{2} \leqslant r\}\). Its integral over A is therefore finite, and using (75) we find that \(\int _{A}|g({\varvec{y}})|d{\varvec{y}}\) is finite as well.
The previous considerations show that all conditions in Theorem C.2 are satisfied (with \(A = \{{\varvec{y}}\in {\mathbb {R}}^n\mid \left\| {{\varvec{y}}} \right\| _{2} \leqslant r\}\), \({\varvec{v}}= {\varvec{V}}\), \(S=\varnothing \), and \(T = \{{\varvec{y}}\in \mathrm {int}~{A} \mid \text {dim}\left( \partial J({\varvec{y}})\right) \geqslant 1\} = (\mathrm {int}~{A})\cap ({\mathbb {R}}^n\setminus D_J)\)). We can therefore invoke [54, Theorem 4.14] to conclude that g, which is a divergence of \({\varvec{V}}\), is integrable (in the sense described by [54]), with integral I(g, A), and that there exists a vector field \(\mathrm {bd}~A \ni {\varvec{y}}\mapsto {\varvec{n}}_{{\varvec{v}}}({\varvec{y}})\) with \(\left\| {{\varvec{n}}_{{\varvec{v}}}({\varvec{y}})} \right\| _{2} = 1\) for every \({\varvec{y}}\in \mathrm {bd}~A\) such that
Since the Lebesgue integral of |g| over A is finite, we also have [54, Prop 4.7]
Using that \(A = \{{\varvec{y}}\in {\mathbb {R}}^n\mid \left\| {{\varvec{y}}} \right\| _{2} \leqslant r\}\), Eqs. (72), (76), and (77), we obtain
As r was an arbitrary positive number, we can take the absolute value and then the limit \(r \rightarrow +\infty \) on both sides of (78) to find
We will now show that the limit on the right side of (79) is equal to zero. To show this, first take the absolute value inside the integral on the right side of (79) to find
Use the Cauchy–Schwarz inequality, Eq. (67), assumption (A3) (\(\inf _{{\varvec{y}}\in {\mathbb {R}}^n}J({\varvec{y}})=0\)) and \(\left\| {{\varvec{n}}_{{\varvec{v}}}} \right\| _{2} = 1\) to further bound the right side of (80) as follows
Use the parallelogram law \(2(\left\| {{\varvec{x}}} \right\| _{2}^{2} + \left\| {{\varvec{u}}} \right\| _{2}^{2}) = \left\| {{\varvec{x}}-{\varvec{u}}} \right\| _{2}^{2} + \left\| {{\varvec{x}}+{\varvec{u}}} \right\| _{2}^{2}\) with \({\varvec{u}}= {\varvec{x}}-{\varvec{y}}\) to bound the exponential \(e^{-\frac{1}{2t}\left\| {{\varvec{x}}- {\varvec{y}}} \right\| _{2}^{2}/\epsilon }\) by
and use it in (81) to get
Since the domain of integration in (83) is over the surface of an n-dimensional sphere of radius \(\left\| {{\varvec{y}}} \right\| _{2} = r\), the integral on the right side of (83) is given by
where \(n\pi ^{n/2}/\Gamma \left( \frac{n}{2} + 1\right) \) is the surface area of the unit sphere in \({\mathbb {R}}^n\), with \(\Gamma \left( \frac{n}{2} + 1\right) \) denoting the Gamma function evaluated at \(\frac{n}{2} + 1\). Since
the limit \(r\rightarrow +\infty \) in (84) is equal to zero, i.e.,
Combining (79), (80), (83) and (85) yields
which proves the limit result (70).
Step 2. Recall that the divergence of the vector field \({\varvec{y}}\mapsto {\varvec{V}}({\varvec{y}})\) on \(D_J\) is given by (69). Combine (70) and (69) to conclude that
Note that the minimal subgradient \(\pi _{\partial J({\varvec{y}})}({\varvec{0}}) = \nabla J({\varvec{y}})\) for every \({\varvec{y}}\in D_J\). We can therefore substitute the minimal subgradient \(\pi _{\partial J({\varvec{y}})}({\varvec{0}})\) for the gradient \(\nabla J({\varvec{y}})\) inside the integral in the limit (86) without changing its value. Moreover, since the set \(D_J\) is dense in \({\mathbb {R}}^n\) and the n-dimensional Lebesgue measure of \(({\mathbb {R}}^n\setminus D_J)\) is zero, we can further substitute the domain of integration \(\left\{ {\varvec{y}}\in {\mathbb {R}}^n\mid \, \left\| {{\varvec{y}}} \right\| _{2} \leqslant r\right\} \cap \,D_J\) of the integral in the limit (86) with \(\left\{ {\varvec{y}}\in {\mathbb {R}}^n\mid \, \left\| {{\varvec{y}}} \right\| _{2} \leqslant r\right\} \) without changing its value. With these two changes, the limit (86) can be written as
Using the notation \(\varphi _J({\varvec{y}}|{\varvec{x}},t) = \left( \frac{{\varvec{y}}- {\varvec{x}}}{t}\right) + \pi _{\partial J({\varvec{y}})}({\varvec{0}})\), we can write this limit more succinctly as
Now, consider the function \({\mathbb {R}}^n\ni {\varvec{y}}\mapsto \left\langle \varphi _J({\varvec{y}}|{\varvec{x}},t) - \varphi _J({\varvec{y}}_0|{\varvec{x}},t),{\varvec{y}}-{\varvec{y}}_0 \right\rangle \). Note here that as J is convex with \(\mathrm {dom}~J ={\mathbb {R}}^n\), both \(\varphi _J({\varvec{y}}|{\varvec{x}},t)\) and \(\varphi _J({\varvec{y}}_0|{\varvec{x}},t)\) are subgradients of the convex function \({\varvec{u}}\mapsto \frac{1}{2t}\left\| {{\varvec{x}}-{\varvec{u}}} \right\| _{2}^{2} + J({\varvec{u}})\) at \({\varvec{u}}= {\varvec{y}}\) and \({\varvec{u}}= {\varvec{y}}_0\), respectively [58, Theorem 23.4]. We can therefore apply inequality (13) (with \(p = \varphi _J({\varvec{y}}|{\varvec{x}},t)\), \(p_0 = \varphi _J({\varvec{y}}_0|{\varvec{x}},t)\), and \(m = 0\)) to find \(\left\langle \varphi _J({\varvec{y}}|{\varvec{x}},t) - \varphi _J({\varvec{y}}_0|{\varvec{x}},t),{\varvec{y}}-{\varvec{y}}_0 \right\rangle \geqslant 0\). Define \(F:{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) and \(G :{\mathbb {R}}^n\rightarrow {\mathbb {R}}\) as follows:
and
Note that \(F({\varvec{y}})-G({\varvec{y}}) = \left\langle \varphi _J({\varvec{y}}|{\varvec{x}},t) - \varphi _J({\varvec{y}}_0|{\varvec{x}},t),{\varvec{y}}-{\varvec{y}}_0 \right\rangle e^{-\left( \frac{1}{2t}\left\| {\varvec{x}}-{\varvec{y}}\right\| _{2}^{2} + J({\varvec{y}})\right) /\epsilon } \geqslant 0\) for every \({\varvec{y}}\in {\mathbb {R}}^n\). Integrate \({\varvec{y}}\mapsto F({\varvec{y}})-G({\varvec{y}})\) over \({\mathbb {R}}^n\) and use Fatou’s lemma to find
Use the Cauchy–Schwarz inequality and assumption (A3) (\(\inf _{{\varvec{y}}\in {\mathbb {R}}^n} J({\varvec{y}}) = 0\)) to bound the second integral on the right hand side of (88) as follows
where \(C_1({\varvec{x}},{\varvec{y}}_0,t,\epsilon )\) was defined in (65). Combine (88) and (89) to find
The integral on the right hand side of (90) can be bounded using assumption (A3) as follows
Divide (93) by the partition function \(Z_J({\varvec{x}},t,\epsilon )\) (see Eq. (26)) to get
Now, using the Cauchy–Schwarz inequality and (66), we can bound \({\mathbb {E}}_{J}\left[ {\left| \left\langle \varphi _J({\varvec{y}}_0|{\varvec{x}},t),{\varvec{y}}-{\varvec{y}}_0\right\rangle \right| }\right] \) as follows
Use the triangle inequality and the upper bounds in (94) and (95) to obtain
Since \({\mathbb {E}}_{J}\left[ {\left| \left\langle \varphi _J({\varvec{y}}|{\varvec{x}},t),{\varvec{y}}-{\varvec{y}}_0 \right\rangle \right| }\right] < +\infty \), we can use (87) to conclude that
Inequality (96) and equality (97) show the desired results \({\mathbb {E}}_{J}\left[ {\left| \left\langle \varphi _J({\varvec{y}}|{\varvec{x}},t),{\varvec{y}}-{\varvec{y}}_0 \right\rangle \right| }\right] < +\infty \) and \({\mathbb {E}}_{J}\left[ {\left\langle \varphi _J({\varvec{y}}|{\varvec{x}},t),{\varvec{y}}-{\varvec{y}}_0 \right\rangle }\right] = n\epsilon \), which, after recalling the definition \(\varphi _J({\varvec{y}}|{\varvec{x}},t) = \left( \frac{{\varvec{y}}-{\varvec{x}}}{t}\right) + \pi _{\partial J({\varvec{y}})}({\varvec{0}})\), also proves formula (39).
Step 3. Thanks to Step 2, we have \({\mathbb {E}}_{J}\left[ {\left| \left\langle \varphi _J({\varvec{y}}|{\varvec{x}},t),{\varvec{y}}-{\varvec{y}}_0 \right\rangle \right| }\right] < +\infty \) and \({\mathbb {E}}_{J}\left[ {\left\langle \varphi _J({\varvec{y}}|{\varvec{x}},t),{\varvec{y}}-{\varvec{y}}_0 \right\rangle }\right] = n\epsilon \) for every \({\varvec{y}}_0 \in {\mathbb {R}}^n\). In particular, the choice of \({\varvec{y}}_0 = {\varvec{0}}\) yields \({\mathbb {E}}_{J}\left[ {\left| \left\langle \varphi _J({\varvec{y}}|{\varvec{x}},t),{\varvec{y}}\right\rangle \right| }\right] < +\infty \) and \({\mathbb {E}}_{J}\left[ {\left\langle \varphi _J({\varvec{y}}|{\varvec{x}},t),{\varvec{y}}\right\rangle }\right] = n\epsilon \). As a consequence, we have that
and
for every \({\varvec{y}}_0 \in {\mathbb {R}}^n\). Now, let \(\{{\varvec{e}}_i\}_{i=1}^{n}\) denote the standard basis in \({\mathbb {R}}^n\) and let \(\{\varphi _J({\varvec{y}}|{\varvec{x}},t)_i\}_{i=1}^{n}\) denote the components of the vector \(\varphi _J({\varvec{y}}|{\varvec{x}},t)\), i.e., \(\varphi _J({\varvec{y}}|{\varvec{x}},t) = (\varphi _J({\varvec{y}}|{\varvec{x}},t)_1,\dots ,\varphi _J({\varvec{y}}|{\varvec{x}},t)_n)\). Using (98) with the choice of \({\varvec{y}}_0 = {\varvec{e}}_i\) for \(i\in \{1,\dots ,n\}\), we get \({\mathbb {E}}_{J}\left[ {|\varphi _J({\varvec{y}}|{\varvec{x}},t)_i|}\right] < +\infty \) for every \(i \in \{1,\dots ,n\}\). Using the norm inequality \(\left\| {\varphi _J({\varvec{y}}|{\varvec{x}},t)} \right\| _{2} \leqslant \sum _{i=1}^{n}\left| \varphi _J({\varvec{y}}|{\varvec{x}},t)_i\right| \), we can bound \({\mathbb {E}}_{J}\left[ {\left\| {\varphi _J({\varvec{y}}|{\varvec{x}},t)} \right\| _{2}}\right] \) as follows
We can therefore combine (99) and (100) to get \({\mathbb {E}}_{J}\left[ {\left\langle \varphi _J({\varvec{y}}|{\varvec{x}},t),{\varvec{y}}_0 \right\rangle }\right] = \left\langle {\mathbb {E}}_{J}\left[ {\varphi _J({\varvec{y}}|{\varvec{x}},t)}\right] ,{\varvec{y}}_0 \right\rangle = 0\) for every \({\varvec{y}}_0 \in {\mathbb {R}}^n\), which yields the following equality:
Moreover, recalling the definition \(\varphi _J({\varvec{y}}|{\varvec{x}},t) = \left( \frac{{\varvec{y}}-{\varvec{x}}}{t}\right) + \pi _{\partial J({\varvec{y}})}({\varvec{0}})\) and using (66) (with \({\varvec{y}}_0 = {\varvec{x}}\)) and (100), we can bound \({\mathbb {E}}_{J}\left[ {\left\| {\pi _{\partial J({\varvec{y}})}({\varvec{0}})} \right\| _{2}}\right] \) as follows
We can now combine (66), (101) and (102) to expand the expected value of \({\mathbb {E}}_{J}\left[ {\varphi _J({\varvec{y}}|{\varvec{x}},t)}\right] \) as follows
Solving for \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\) in (103) yields \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon ) = {\varvec{x}}-t{\mathbb {E}}_{J}\left[ {\pi _{\partial J({\varvec{y}})}({\varvec{0}})}\right] \), which gives the representation formula (40).
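As a concrete check of (39) and (40) (ours, illustrative only), consider \(J(y)=|y|\) in one dimension, for which \(\mathrm {dom}~J={\mathbb {R}}\) and the minimal subgradient is \(\pi _{\partial J(y)}({\varvec{0}})=\mathrm {sign}(y)\). Evaluating the posterior expectations by quadrature, \({\mathbb {E}}_{J}\left[ \left\langle \varphi _J(y|x,t),y-y_0\right\rangle \right] \) should return approximately \(n\epsilon =\epsilon \), and \({\varvec{u}}_{PM}\) should match \(x-t\,{\mathbb {E}}_{J}[\mathrm {sign}(y)]\).

```python
import numpy as np

# Illustrative 1D check of formulas (39) and (40) with J(y) = |y| (our own example).
ys = np.linspace(-50.0, 50.0, 500001)
x, t, eps, y0 = 0.8, 1.5, 0.4, -0.3

expo = ((x - ys)**2 / (2.0*t) + np.abs(ys)) / eps
w = np.exp(-(expo - expo.min()))             # stabilized unnormalized posterior weights

def E(f):
    """Posterior expectation E_J[.] computed by quadrature (weights cancel dy)."""
    return (f * w).sum() / w.sum()

phi = (ys - x) / t + np.sign(ys)             # phi_J(y|x,t) with minimal subgradient sign(y)
print(E(phi * (ys - y0)), eps)               # formula (39): should be close to n*eps = eps
print(E(ys), x - t * E(np.sign(ys)))         # formula (40): u_PM = x - t*E_J[sign(y)]
```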
We now derive the second representation formula (41). Let \({\varvec{y}}_0 = {\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\) in Eq. (97) and use the representation formula (40) to find
We will use (104) to derive a representation formula for \({\mathbb {E}}_{J}\left[ {\left\| {{\varvec{y}}- {\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )} \right\| _{2}^{2}}\right] \). Multiply (104) by t and rearrange to get
The left hand side of (105) can be expressed as
Combine Eqs. (105) and (106) to get
which gives the representation formula (41).
Step 4. Thanks to Step 3, the representation formulas (40) and (41) hold. Recall that by Proposition 3.1(iii), the gradient \(\nabla _{{\varvec{x}}}S_\epsilon ({\varvec{x}},t)\) and Laplacian \(\Delta _{{\varvec{x}}}S_\epsilon ({\varvec{x}},t)\) of the solution \(S_\epsilon \) to the viscous HJ PDE (29) satisfy the representation formulas
and
which is the representation formula (42). Use (41) and (108) to get
which is, after dividing by \(t\epsilon \) on both sides, the representation formulas (43). This concludes Step 4.
Proof of (ii): Here, we only assume that J satisfies assumptions (A1)–(A3); we do not assume that \(\mathrm {dom}~J = {\mathbb {R}}^n\). Let \(\{\mu _k\}_{k=1}^{+\infty }\) be a sequence of positive real numbers converging to zero. Define \(f_k:{\mathbb {R}}^n\times (0,+\infty )\times (0,+\infty ) \rightarrow {\mathbb {R}}\) by
and let \(S_0({\varvec{x}},\mu _k)\) denote the solution to the first-order HJ PDE (20) with initial data J evaluated at \(({\varvec{x}},\mu _k)\), that is,
By Proposition 2.2(i), the function \({\mathbb {R}}^n\ni {\varvec{x}}\mapsto S_0({\varvec{x}},\mu _k)\) is continuously differentiable and convex for each \(k \in {\mathbb {N}}\), and the sequence of real numbers \(\{S_0({\varvec{x}},\mu _k)\}_{k=1}^{+\infty }\) converges to \(J({\varvec{x}})\) for every \({\varvec{x}}\in \mathrm {dom}~J\). Moreover, by assumption (A3) (\(\inf _{{\varvec{y}}\in {\mathbb {R}}^n} J({\varvec{y}}) = 0\)) the sequence \(\{S_0({\varvec{x}},\mu _k)\}_{k=1}^{+\infty }\) is uniformly bounded from below by 0, that is,
As a consequence, we can invoke Proposition 3.1(i) to conclude that for each \(k \in {\mathbb {N}}\), the function \(({\varvec{x}},t) \mapsto f_k({\varvec{x}},t,\epsilon )\) corresponds to the solution to the viscous HJ PDE (29) with initial data \(f_k({\varvec{x}},0,\epsilon ) = S_0({\varvec{x}},\mu _k)\). Moreover, \({\mathbb {R}}^n\ni {\varvec{x}}\mapsto f_k({\varvec{x}},t,\epsilon )\) is continuously differentiable and convex by Proposition 3.1(i) and (ii)(a). Finally, as the domain of the function \({\varvec{x}}\mapsto S_0({\varvec{x}},\mu _k)\) is \({\mathbb {R}}^n\), we can use the representation formula (42) in Proposition 4.2(i) (which was proven previously in this Appendix) to express the gradient \(\nabla _{{\varvec{x}}}f_{k}({\varvec{x}},t,\epsilon )\) as follows
Now, since \(S_0({\varvec{x}},\mu _k) \geqslant 0\) for every \(k \in {\mathbb {N}}\), we can bound the integrand in (109) as follows
where \(\int _{{\mathbb {R}}^n} \frac{1}{(2\pi t\epsilon )^{n/2}}e^{-\frac{1}{2t\epsilon }\left\| {\varvec{x}}-{\varvec{y}}\right\| _{2}^{2}} d{{\varvec{y}}} = 1\). We can therefore invoke the Lebesgue dominated convergence theorem [30, Theorem 2.24] and use (109) and the limit \(\lim _{k\rightarrow +\infty }e^{-S_0({\varvec{x}},\mu _k)/\epsilon }=e^{-J({\varvec{x}})/\epsilon }\) (with \(\lim _{k\rightarrow +\infty }e^{-S_0({\varvec{x}},\mu _k)/\epsilon }=0\) for every \({\varvec{x}}\notin \mathrm {dom}~J\)) to find
which gives the limit (44). By continuous differentiability and convexity of \({\mathbb {R}}^n\ni {\varvec{x}}\mapsto f_k({\varvec{x}},t,\epsilon )\) and \({\mathbb {R}}^n\ni {\varvec{x}}\mapsto S_\epsilon ({\varvec{x}},t)\) and the limit (113), we can invoke [58, Theorem 25.7] to conclude that the gradient \(\nabla _{{\varvec{x}}}f_k({\varvec{x}},t,\epsilon )\) converges to the gradient \(\nabla _{{\varvec{x}}}S_\epsilon ({\varvec{x}},t)\) as \(k \rightarrow +\infty \). Hence, we can take the limit \(k\rightarrow +\infty \) in (111) to find
which gives the limit (45). Finally, using the definition of the posterior mean estimate (3), (114), and the representation formula (30) derived in Proposition 3.1(iii), namely \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon ) = {\varvec{x}}- t\nabla _{{\varvec{x}}}S_\epsilon ({\varvec{x}},t)\), we find the two limits
which establishes (46). This concludes the proof of (ii).
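To illustrate part (ii) numerically (our own sketch, not from the paper), take \(J(y)=|y|\) in one dimension and note that, with the quadratic fidelity term, the Hopf–Lax solution \(S_0(\cdot ,\mu )\) is the Moreau envelope of J with parameter \(\mu \), i.e., the Huber function. Computing the posterior means by quadrature with the smoothed priors \(S_0(\cdot ,\mu _k)\) and letting \(\mu _k \rightarrow 0\), the values approach the posterior mean computed with J itself, in the spirit of the limit (46).

```python
import numpy as np

# Illustrative 1D sketch of Proposition 4.2(ii) with J(y) = |y| (our own example).
ys = np.linspace(-50.0, 50.0, 500001)
x, t, eps = 0.8, 1.5, 0.4

def pm(prior_vals):
    """Posterior mean E[y] computed by quadrature for prior values J(y) on the grid."""
    expo = ((x - ys)**2 / (2.0*t) + prior_vals) / eps
    w = np.exp(-(expo - expo.min()))
    return (ys * w).sum() / w.sum()

u_exact = pm(np.abs(ys))                     # posterior mean with the original prior |y|
for mu in (1.0, 0.1, 0.01):
    # Hopf-Lax / Moreau envelope of |.| with parameter mu: the Huber function.
    huber = np.where(np.abs(ys) <= mu, ys**2 / (2.0*mu), np.abs(ys) - mu/2.0)
    print(mu, pm(huber), u_exact)            # smoothed posterior means -> u_exact
```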
Proof of Proposition 4.3
Let us first introduce some notation. Let \({\varvec{x}}\in {\mathbb {R}}^n\), \(t>0\), \(\epsilon >0\), and \({\varvec{y}}_0 \in \mathrm {dom}~\partial J\). Define the functions
and
Note that for every \({\varvec{y}}\in {\mathbb {R}}^n\), \(\varphi _J({\varvec{y}}|{\varvec{x}},t)\) is a subgradient of the function \({\varvec{u}}\mapsto \frac{1}{2t}\left\| {{\varvec{x}}-{\varvec{u}}} \right\| _{2}^{2} + J({\varvec{u}})\) evaluated at \({\varvec{u}}= {\varvec{y}}\) and \(\varphi _{S_0(\cdot ,\mu _k)}({\varvec{y}}|{\varvec{x}},t)\) is a subgradient of the function \({\varvec{u}}\mapsto \frac{1}{2t}\left\| {{\varvec{x}}-{\varvec{u}}} \right\| _{2}^{2} + S_0({\varvec{u}},\mu _k)\) evaluated at \({\varvec{u}}= {\varvec{y}}\). Let \(\{\mu _k\}_{k=1}^{+\infty }\) be a sequence of positive real numbers converging to zero and let \(S_0:{\mathbb {R}}^n\times (0,+\infty ) \rightarrow {\mathbb {R}}\) denote the solution to the first-order HJ PDE (20) with initial data J (see Proposition 2.2). Note that the sequence \(\{S_0({\varvec{y}},\mu _k)\}_{k=1}^{+\infty }\) is uniformly bounded from below since
Now, define the function \(F:\mathrm {dom}~\partial J \times \mathrm {dom}~\partial J\times {\mathbb {R}}^n\times (0,+\infty )\rightarrow {\mathbb {R}}\) as
and the sequence of functions \(\{F_{\mu _k}\}_{k=1}^{+\infty }\) with \(F_{\mu _k}:{\mathbb {R}}^n\times {\mathbb {R}}^n\times {\mathbb {R}}^n\times (0,+\infty )\rightarrow {\mathbb {R}}\) as
Since \(\lim _{k\rightarrow +\infty } S_0({\varvec{y}},\mu _k) = J({\varvec{y}})\) and \(\lim _{k\rightarrow +\infty } \nabla _{{\varvec{y}}}S_0({\varvec{y}},\mu _k) = \pi _{\partial J({\varvec{y}})}({\varvec{0}})\) for every \({\varvec{y}}\in \mathrm {dom}~\partial J\) by Proposition 2.2(i) and (iv), and
by (44) in Proposition 4.2(ii) and continuity of the logarithm, the limit \(\lim _{k \rightarrow +\infty } F_{\mu _k}({\varvec{y}},{\varvec{y}}_0,{\varvec{x}},t) = F({\varvec{y}},{\varvec{y}}_0,{\varvec{x}},t)\) holds for every \({\varvec{y}}\in \mathrm {dom}~\partial J\), \({\varvec{y}}_0 \in \mathrm {dom}~\partial J\), \({\varvec{x}}\in {\mathbb {R}}^n\), and \(t > 0\). Note that as J is strongly convex with parameter \(m\geqslant 0\), the functions \({\varvec{y}}\mapsto \frac{1}{2t}\left\| {{\varvec{x}}-{\varvec{y}}} \right\| _{2}^{2} + J({\varvec{y}})\) and \({\varvec{y}}\mapsto \frac{1}{2t}\left\| {{\varvec{x}}-{\varvec{y}}} \right\| _{2}^{2} + S_0({\varvec{y}},\mu _k)\) are strongly convex with parameter \(\left( \frac{1+mt}{t}\right) > 0\). As a consequence, for every pair \(({\varvec{y}},{\varvec{y}}_0) \in \mathrm {dom}~\partial J \times \mathrm {dom}~\partial J\), the following monotonicity inequalities hold (see Definition 5, Eq. (13)):
and
Multiply the first set of inequalities by \(e^{-\left( \frac{1}{2t}\left\| {\varvec{x}}- {\varvec{y}}\right\| _{2}^{2} + J({\varvec{y}})\right) /\epsilon }/\int _{{\mathbb {R}}^n}e^{-\left( \frac{1}{2t}\left\| {\varvec{x}}- {\varvec{y}}\right\| _{2}^{2} + J({\varvec{y}})\right) /\epsilon } d{{\varvec{y}}}\) and the second set of inequalities by \(e^{-\left( \frac{1}{2t}\left\| {\varvec{x}}- {\varvec{y}}\right\| _{2}^{2} + S_0({\varvec{y}},\mu _k)\right) /\epsilon }/\int _{{\mathbb {R}}^n}e^{-\left( \frac{1}{2t}\left\| {\varvec{x}}- {\varvec{y}}\right\| _{2}^{2} + S_0({\varvec{y}},\mu _k)\right) /\epsilon } d{{\varvec{y}}}\) and use the definition of F and \(F_{\mu _k}\) to get the inequalities
These inequalities show, in particular, that F and \(F_{\mu _k}\) are both non-negative functions for every \(({\varvec{y}},{\varvec{y}}_0) \in \mathrm {dom}~\partial J \times \mathrm {dom}~\partial J\), \({\varvec{x}}\in {\mathbb {R}}^n\), and \(t>0\). As a consequence, Fatou’s lemma [30, Lemma 2.18] applies to the sequence of functions \(\{F_{\mu _k}\}_{k=1}^{+\infty }\), and hence
We now wish to compute the limit in (122). On the one hand, we can apply formula (39) in Proposition 4.2(i) (with initial data \(S_0(\cdot ,\mu _k)\) and using \(\varphi _{S_0(\cdot ,\mu _k)}({\varvec{y}}|{\varvec{x}},t) = \left( \frac{{\varvec{y}}-{\varvec{x}}}{t}\right) + \nabla _{{\varvec{y}}}S_0({\varvec{y}},\mu _k)\)) to the first integral on the right-hand side of the last line of (122) to get
On the other hand, applying the limit result (46) in Proposition 4.2(ii) for the posterior mean estimate \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\) and the limit \(\lim _{k\rightarrow +\infty } \varphi _{S_0(\cdot ,\mu _k)}({\varvec{y}}_0|{\varvec{x}},t) = \varphi _{J}({\varvec{y}}_0|{\varvec{x}},t) = \left( \frac{{\varvec{y}}_0-{\varvec{x}}}{t}\right) + \pi _{\partial J({\varvec{y}}_0)}({\varvec{0}})\) to the second integral on the right-hand side of the last line of (122), we get
Combine (27), (116), (121), (122), (123), and (124) to get
This establishes the set of inequalities (47).
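For reference, the monotonicity inequality invoked above (Definition 5, Eq. (13)) is the standard strong-convexity estimate, restated here for the reader's convenience: if \(g\) is strongly convex with parameter \(c > 0\), then \(\left\langle {\varvec{p}}-{\varvec{q}},{\varvec{y}}-{\varvec{y}}_0\right\rangle \geqslant c\left\| {{\varvec{y}}-{\varvec{y}}_0} \right\| _{2}^{2}\) for every \({\varvec{p}}\in \partial g({\varvec{y}})\) and \({\varvec{q}}\in \partial g({\varvec{y}}_0)\); here it is applied with \(c = \frac{1+mt}{t}\) to the two strongly convex functions introduced above.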
Next, we show that \({\mathbb {E}}_{J}\left[ {\left\| {\pi _{\partial J({\varvec{y}})}({\varvec{0}})} \right\| _{2}}\right] < +\infty \) indirectly using the set of inequalities (47). By Proposition 4.1, \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\in \mathrm {int}~(\mathrm {dom}~J)\). Hence, there exists a number \(\delta >0\) such that the open ball \(\{{\varvec{y}}\in {\mathbb {R}}^n\mid \left\| {{\varvec{y}}-{\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )} \right\| _{2} < \delta \}\) is contained in \(\mathrm {int}~(\mathrm {dom}~J)\). Let \({\varvec{y}}_0 \in \{{\varvec{y}}\in {\mathbb {R}}^n\mid \left\| {{\varvec{y}}-{\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )} \right\| _{2} < \delta \}\) with \({\varvec{y}}_0 \ne {\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\). Recall that \(\mathrm {int}~(\mathrm {dom}~J) \subset \mathrm {dom}~\partial J\), so that both \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\) and \({\varvec{y}}_0\) belong to \(\mathrm {dom}~\partial J\). We claim that \({\mathbb {E}}_{J}\left[ {\left| \left\langle \varphi _{J}({\varvec{y}}|{\varvec{x}},t),{\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )-{\varvec{y}}_0\right\rangle \right| }\right] < +\infty \). Indeed, using the triangle inequality, the set of inequalities (47) proven previously, the Cauchy–Schwarz inequality, and the fact that \({\mathbb {E}}_{J}\left[ {\left\| {{\varvec{y}}-{\varvec{y}}_0} \right\| _{2}}\right] \leqslant \left( \int _{{\mathbb {R}}^n} \left\| {{\varvec{y}}-{\varvec{y}}_0} \right\| _{2}e^{-\frac{1}{2t\epsilon }\left\| {{\varvec{x}}-{\varvec{y}}} \right\| _{2}^{2}} d{\varvec{y}}\right) /Z_J({\varvec{x}},t,\epsilon ) < +\infty \) by assumption (A3),
This shows that \({\mathbb {E}}_{J}\left[ {\left| \left\langle \varphi _{J}({\varvec{y}}|{\varvec{x}},t),{\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )-{\varvec{y}}_0\right\rangle \right| }\right] < +\infty \) for every \({\varvec{y}}_0 \in \{{\varvec{y}}\in {\mathbb {R}}^n\mid \left\| {{\varvec{y}}-{\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )} \right\| _{2} < \delta \}\) different from \({\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\). Now, let \(\{{\varvec{e}}_i\}_{i=1}^{n}\) denote the standard basis in \({\mathbb {R}}^n\) and let \(\{\varphi _J({\varvec{y}}|{\varvec{x}},t)_i\}_{i=1}^{n}\) denote the components of the vector \(\varphi _J({\varvec{y}}|{\varvec{x}},t)\), i.e., \(\varphi _J({\varvec{y}}|{\varvec{x}},t) = (\varphi _J({\varvec{y}}|{\varvec{x}},t)_1,\dots ,\varphi _J({\varvec{y}}|{\varvec{x}},t)_n)\). Using (125) with the choice of \({\varvec{y}}_0 = {\varvec{u}}_{PM}({\varvec{x}},t,\epsilon ) - \frac{\delta }{2}{\varvec{e}}_i\), which is contained in the open ball \(\{{\varvec{y}}\in {\mathbb {R}}^n\mid \left\| {{\varvec{y}}-{\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )} \right\| _{2} < \delta \}\) for each \(i\in \{1,\dots ,n\}\), we get
Using (126) and the norm inequality \(\left\| {\varphi _J({\varvec{y}}|{\varvec{x}},t)} \right\| _{2} \leqslant \sum _{i=1}^{n}\left| \varphi _J({\varvec{y}}|{\varvec{x}},t)_i\right| \), we can bound \({\mathbb {E}}_{J}\left[ {\left\| {\varphi _J({\varvec{y}}|{\varvec{x}},t)} \right\| _{2}}\right] \) as follows
This shows that \({\mathbb {E}}_{J}\left[ {\left\| {\varphi _J({\varvec{y}}|{\varvec{x}},t)} \right\| _{2}}\right] < + \infty \). Finally, use (127), \(\varphi _J({\varvec{y}}|{\varvec{x}},t) = \frac{{\varvec{y}}-{\varvec{x}}}{t} + \pi _{\partial J({\varvec{y}})}({\varvec{0}})\), and assumption (A3) to find
This shows that \({\mathbb {E}}_{J}\left[ {\left\| {\pi _{\partial J({\varvec{y}})}({\varvec{0}})} \right\| _{2}}\right] < +\infty \).
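As a concrete one-dimensional illustration of this conclusion (our own example, not taken from the original text): take \(n = 1\) and \(J(y) = |y|\), so that \(\mathrm {dom}~\partial J = {\mathbb {R}}\) and the minimal-norm subgradient is \(\pi _{\partial J(y)}(0) = \mathrm {sign}(y)\) for \(y \ne 0\) and \(\pi _{\partial J(0)}(0) = 0\) (the projection of \(0\) onto \(\partial J(0) = [-1,1]\)). Then \(\left\| {\pi _{\partial J(y)}(0)} \right\| _{2} \leqslant 1\) for every \(y\), so \({\mathbb {E}}_{J}\left[ {\left\| {\pi _{\partial J(y)}(0)} \right\| _{2}}\right] \leqslant 1 < +\infty \), in agreement with the proposition.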
Proof of Proposition 4.5
Proof of (i): Let \({\varvec{x}}\in {\mathbb {R}}^n\) and \(t > 0\) and define the functions
Note that for every \({\varvec{y}}\in {\mathbb {R}}^n\), \(\varphi _J({\varvec{y}}|{\varvec{x}},t)\) is a subgradient of the function \({\varvec{u}}\mapsto \frac{1}{2t}\left\| {{\varvec{x}}-{\varvec{u}}} \right\| _{2}^{2} + J({\varvec{u}})\) evaluated at \({\varvec{u}}= {\varvec{y}}\).
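For concreteness, and consistently with the explicit expression used at the end of the proof of Proposition 4.3, the subgradient selection considered here can be taken to be \(\varphi _J({\varvec{y}}|{\varvec{x}},t) = \frac{{\varvec{y}}-{\varvec{x}}}{t} + \pi _{\partial J({\varvec{y}})}({\varvec{0}})\) for every \({\varvec{y}}\in \mathrm {dom}~\partial J\).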
Let \({\varvec{u}}\in \mathrm {dom}~\partial J\). The Bregman divergence of the function \(\mathrm {dom}~\partial J\ni {\varvec{y}}\mapsto \Phi _{J}({\varvec{y}}|{\varvec{x}},t)\) at \(({\varvec{u}},\varphi _{J}({\varvec{y}}|{\varvec{x}},t))\) is given by
where the second equality follows from the definition of the subdifferential (see Definition 6) and the fact that \(\varphi _{J}({\varvec{y}}|{\varvec{x}},t) \in \partial \Phi _J({\varvec{y}}|{\varvec{x}},t)\).
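In other words, using standard convex-analysis identities (restated here for the reader's convenience; this is not one of the paper's numbered displays), for any \({\varvec{p}}\in \partial \Phi _J({\varvec{y}}|{\varvec{x}},t)\) one has
\[ D_{\Phi _{J}}({\varvec{u}},{\varvec{p}}) = \Phi _J({\varvec{u}}|{\varvec{x}},t) + \Phi _{J}^{*}({\varvec{p}}) - \left\langle {\varvec{p}},{\varvec{u}}\right\rangle = \Phi _J({\varvec{u}}|{\varvec{x}},t) - \Phi _J({\varvec{y}}|{\varvec{x}},t) - \left\langle {\varvec{p}},{\varvec{u}}-{\varvec{y}}\right\rangle , \]
where the second equality uses the Fenchel identity \(\Phi _J({\varvec{y}}|{\varvec{x}},t) + \Phi _{J}^{*}({\varvec{p}}) = \left\langle {\varvec{p}},{\varvec{y}}\right\rangle \), valid precisely because \({\varvec{p}}\in \partial \Phi _J({\varvec{y}}|{\varvec{x}},t)\).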
Take the expected value with respect to the variable \({\varvec{y}}\) over \(\mathrm {dom}~\partial J\) to find
We claim that the expected value \({\mathbb {E}}_{J}\left[ {D_{\Phi _{J}}({\varvec{u}},\varphi _{J}({\varvec{y}}|{\varvec{x}},t))}\right] \) is finite. We will show this by proving, in turn, that the expected values \({\mathbb {E}}_{J}\left[ {\Phi _{J}({\varvec{y}}|{\varvec{x}},t)}\right] \) and \({\mathbb {E}}_{J}\left[ {\left\langle \varphi _{J}({\varvec{y}}|{\varvec{x}},t),{\varvec{y}}-{\varvec{u}}\right\rangle }\right] \) are finite. Establishing the finiteness of \({\mathbb {E}}_{J}\left[ {D_{\Phi _{J}}({\varvec{u}},\varphi _{J}({\varvec{y}}|{\varvec{x}},t))}\right] \) will enable us to conclude that the expected value \({\mathbb {E}}_{J}\left[ {\Phi _{J}^{*}(\varphi _{J}({\varvec{y}}|{\varvec{x}},t))}\right] \) on the right-hand side of the first equality of (128) is also finite.
First, using the definition of \(\Phi _{J}({\varvec{y}}|{\varvec{x}},t)\) we have
The expected value \({\mathbb {E}}_{J}\left[ {\frac{1}{2t}\left\| {{\varvec{x}}-{\varvec{y}}} \right\| _{2}^{2}}\right] \) is finite because we can use the definition of the posterior mean estimate together with inequality (48) (with \(m \equiv 0\) in (48)) to express it as
The expected value \({\mathbb {E}}_{J}\left[ {J({\varvec{y}})}\right] \) is also finite because it is bounded by the set of inequalities (38) in Proposition 4.1. Hence, the expected value \({\mathbb {E}}_{J}\left[ {\Phi _{J}({\varvec{y}}|{\varvec{x}},t)}\right] \equiv {\mathbb {E}}_{J}\left[ {\frac{1}{2t}\left\| {{\varvec{x}}-{\varvec{y}}} \right\| _{2}^{2}}\right] + {\mathbb {E}}_{J}\left[ {J({\varvec{y}})}\right] \) is finite.
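For the reader's convenience, the finiteness of \({\mathbb {E}}_{J}\left[ {\frac{1}{2t}\left\| {{\varvec{x}}-{\varvec{y}}} \right\| _{2}^{2}}\right] \) can also be seen from the standard decomposition around the posterior mean (restated here; not one of the paper's numbered displays):
\[ {\mathbb {E}}_{J}\left[ \tfrac{1}{2t}\left\| {\varvec{x}}-{\varvec{y}}\right\| _{2}^{2}\right] = \tfrac{1}{2t}\left\| {\varvec{x}}-{\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\right\| _{2}^{2} + {\mathbb {E}}_{J}\left[ \tfrac{1}{2t}\left\| {\varvec{y}}-{\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\right\| _{2}^{2}\right] , \]
which holds because \({\mathbb {E}}_{J}[{\varvec{y}}] = {\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )\), so the cross term vanishes.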
Second, note that the expected value \({\mathbb {E}}_{J}\left[ {\left\langle \varphi _{J}({\varvec{y}}|{\varvec{x}},t), {\varvec{y}}-{\varvec{u}}\right\rangle }\right] \) can be written as
Apply the monotonicity property (47) to the expected value \({\mathbb {E}}_{J}\left[ {\left\langle \varphi _{J}({\varvec{y}}|{\varvec{x}},t)-\varphi _{J}({\varvec{u}}|{\varvec{x}},t), {\varvec{y}}-{\varvec{u}}\right\rangle }\right] \) (with \({\varvec{y}}_0 \equiv {\varvec{u}}\) in (47)) in the previous equation to find
Add the term \(\left\langle \varphi _{J}({\varvec{u}}|{\varvec{x}},t), {\varvec{u}}_{PM}({\varvec{x}},t,\epsilon )-{\varvec{u}}\right\rangle \) to both sides of these inequalities to get
Combine the inequalities (130) with the equality (129) to find
These bounds prove that the expected value \({\mathbb {E}}_{J}\left[ {\left\langle \varphi _{J}({\varvec{y}}|{\varvec{x}},t), {\varvec{y}}-{\varvec{u}}\right\rangle }\right] \) is finite.
The previous arguments show that the expected value \({\mathbb {E}}_{J}\left[ {D_{\Phi _{J}}({\varvec{u}},\varphi _{J}({\varvec{y}}|{\varvec{x}},t))}\right] \) is finite. Now, we claim that the expected value \({\mathbb {E}}_{J}\left[ {\left\langle \varphi _{J}({\varvec{y}}|{\varvec{x}},t),{\varvec{u}}\right\rangle }\right] \equiv \left\langle {\mathbb {E}}_{J}\left[ {\varphi _{J}({\varvec{y}}|{\varvec{x}},t)}\right] ,{\varvec{u}}\right\rangle \) is finite. Indeed, we can use the representation formula (30) for expressing the posterior mean estimate in terms of the gradient \(\nabla _{{\varvec{x}}}S_\epsilon ({\varvec{x}},t)\) of the solution to the viscous HJ PDE (29) and use that \({\mathbb {E}}_{J}\left[ {\pi _{\partial J({\varvec{y}})}({\varvec{0}})}\right] \) is finite (Proposition 4.3) to write
where both terms on the right-hand side are finite. This shows that \({\mathbb {E}}_{J}\left[ {\varphi _{J}({\varvec{y}}|{\varvec{x}},t)}\right] \) is finite.
Using the finiteness of \({\mathbb {E}}_{J}\left[ {D_{\Phi _{J}}({\varvec{u}},\varphi _{J}({\varvec{y}}|{\varvec{x}},t))}\right] \) and \({\mathbb {E}}_{J}\left[ {\varphi _{J}({\varvec{y}}|{\varvec{x}},t)}\right] \) in Eq. (128), we conclude that the expected value \({\mathbb {E}}_{J}\left[ {\Phi _{J}^{*}(\varphi _{J}({\varvec{y}}|{\varvec{x}},t))}\right] \) is also finite. We can now use the definitions of \(\Phi _J\) and \(\varphi _J\) to express Eq. (128) as
where, again, we used that \({\mathbb {E}}_{J}\left[ {\varphi _{J}({\varvec{y}}|{\varvec{x}},t)}\right] = -\nabla _{{\varvec{x}}}S_\epsilon ({\varvec{x}},t) + {\mathbb {E}}_{J}\left[ {\pi _{\partial J({\varvec{y}})}({\varvec{0}})}\right] \). Now, let
Take the infimum over \({\varvec{u}}\in {\mathbb {R}}^n\) on both sides of Eq. (131) to find:
Now, note that by assumption (A1) we have \({\varvec{y}}\mapsto J({\varvec{y}}) \in \Gamma _0({\mathbb {R}}^n)\), and therefore the function \({\varvec{u}}\mapsto {\tilde{J}}({\varvec{u}}) \in \Gamma _0({\mathbb {R}}^n)\). Hence, the function \({\mathbb {R}}^n\ni {\varvec{u}}\mapsto \frac{1}{2}\left\| {{\varvec{x}}-{\varvec{u}}} \right\| _{2}^{2} + {\tilde{J}}({\varvec{u}})\) is strictly convex and has a unique minimizer, denoted by \({\bar{{\varvec{u}}}}\), so the infimum in the equality above can be replaced by a minimum. In addition, recall that \(\min _{{\varvec{u}} \in {\mathbb {R}}^n} \left\{ \frac{1}{2}\left\| {{\varvec{x}}-{\varvec{u}}} \right\| _{2}^{2} + {\tilde{J}}({\varvec{u}})\right\} \) corresponds to the solution to the first-order HJ PDE (20) with initial data \({\tilde{J}}\). Using Proposition 2.2(ii), the unique minimizer \({\bar{{\varvec{u}}}}\) can be expressed using the inclusion relation
Therefore, the minimizer \({\bar{{\varvec{u}}}}\) is also the unique minimizer of \({\varvec{u}}\mapsto {\mathbb {E}}_{J}\left[ {D_{\Phi _{J}}({\varvec{u}},\varphi _{J}({\varvec{y}}|{\varvec{x}},t))}\right] \).
Proof of (ii): If \(\mathrm {dom}~J = {\mathbb {R}}^n\), then the representation formula \(\nabla _{{\varvec{x}}}S_\epsilon ({\varvec{x}},t) = {\mathbb {E}}_{J}\left[ {\pi _{\partial J({\varvec{y}})}({\varvec{0}})}\right] \) derived in Proposition 4.2 holds, and the characterization of the unique minimizer \({\bar{{\varvec{u}}}}\) in Eq. (132) reduces to
By Proposition 2.2(ii), the unique minimizer that satisfies this characterization is the MAP estimate \({\varvec{u}}_{MAP}({\varvec{x}},t)\), i.e., \({\bar{{\varvec{u}}}} = {\varvec{u}}_{MAP}({\varvec{x}},t)\).\(\square \)
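As a closing aside (a purely illustrative numerical check, not part of the original argument), the representation formula \(\nabla _{{\varvec{x}}}S_\epsilon ({\varvec{x}},t) = {\mathbb {E}}_{J}\left[ {\pi _{\partial J({\varvec{y}})}({\varvec{0}})}\right] \) used in the proof of (ii) can be verified numerically in one dimension. In the sketch below, the nonsmooth prior \(J(y) = |y|\), the parameter values, and the function names are our own choices; for this prior the minimal-norm subgradient is \(\mathrm {sign}(y)\) (and \(0\) at the origin), and we again assume the standard Cole–Hopf-type integral representation of \(S_\epsilon \).

import numpy as np

# Minimal 1D sketch (illustrative assumptions: prior J(y) = |y|, arbitrary parameter
# values) of grad_x S_eps(x,t) = E_J[pi_{dJ(y)}(0)]: for J = |.| the minimal-norm
# subgradient is sign(y) for y != 0 and 0 at y = 0.
x, t, eps = 0.8, 0.5, 0.1
grid = np.linspace(-15.0, 15.0, 400001)
dy = grid[1] - grid[0]

def weights(xv):
    # Unnormalized posterior density exp(-((x-y)^2/(2t) + |y|)/eps) on the grid.
    return np.exp(-((xv - grid)**2 / (2.0*t) + np.abs(grid)) / eps)

def S_eps(xv):
    # Viscous HJ solution via the (assumed) Cole-Hopf formula.
    return -eps * np.log(weights(xv).sum() * dy / np.sqrt(2.0*np.pi*t*eps))

# Left-hand side: finite-difference gradient of S_eps with respect to x.
h = 1e-4
lhs = (S_eps(x + h) - S_eps(x - h)) / (2.0*h)

# Right-hand side: posterior expectation of the minimal-norm subgradient sign(y).
w = weights(x)
rhs = (np.sign(grid) * w).sum() / w.sum()

print(lhs, rhs)  # the two values should agree up to discretization error

The two printed values should agree up to quadrature and finite-difference error.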
Keywords
- Hamilton–Jacobi partial differential equations
- Imaging inverse problems
- Maximum a posteriori estimation
- Bayesian posterior mean estimation
- Convex analysis