Wavelet-Based Priors Accelerate Maximum-a-Posteriori Optimization in Bayesian Inverse Problems


Wavelet (Besov) priors are a promising way of reconstructing indirectly measured fields in a regularized manner. We demonstrate how wavelets can be used as a localized basis for reconstructing permeability fields with sharp interfaces from noisy pointwise pressure field measurements in the context of the elliptic inverse problem. For this we derive the adjoint method for minimizing the Besov-norm-regularized misfit functional (which, from the Bayesian point of view, corresponds to determining the maximum a posteriori point) in the Haar wavelet setting. As it turns out, choosing a wavelet-based prior allows for accelerated optimization compared to established trigonometrically based priors.



  1. Agapiou S, Burger M, Dashti M, Helin T (2017) Sparsity-promoting and edge-preserving maximum a posteriori estimators in non-parametric Bayesian inverse problems. arXiv:1705.03286

  2. Beck A, Teboulle M (2009) A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J Imag Sci 2:183–202

  3. Blatter C (2003) Wavelets: a primer. CRC Press

  4. Blömker D, Schillings C, Wacker P (2017) A strongly convergent numerical scheme from EnKF continuum analysis. In submission

  5. Bogachev VI (1998) Gaussian measures, vol 62. American Mathematical Society, Providence

  6. Bui-Thanh T, Ghattas O (2015) A scalable algorithm for MAP estimators in Bayesian inverse problems with Besov priors. Inverse Problems and Imaging 9:27–53

  7. Burger M, Lucka F (2014) Maximum a posteriori estimates in linear inverse problems with log-concave priors are proper Bayes estimators. Inverse Prob 30:114004

  8. Burger M, Dong Y, Sciacchitano F (2016) Bregman cost for non-Gaussian noise. arXiv:1608.07483

  9. Carrera J, Neuman SP (1986) Estimation of aquifer parameters under transient and steady state conditions: 1. Maximum likelihood method incorporating prior information. Water Resour Res 22:199–210

  10. Chatterjee S, Dimitrakopoulos R (2012) Multi-scale stochastic simulation with a wavelet-based approach. Comput Geosci 45:177–189

  11. Cui T, Law KJ, Marzouk YM (2016) Dimension-independent likelihood-informed MCMC. J Comput Phys 304:109–137

  12. Da Prato G, Zabczyk J (2014) Stochastic equations in infinite dimensions. Cambridge University Press, Cambridge

  13. Dashti M, Stuart A (2017) The Bayesian approach to inverse problems. In: Ghanem R, Higdon D, Owhadi H (eds) Handbook of uncertainty quantification. Springer, pp 311–428

  14. Dashti M, Harris S, Stuart A (2011) Besov priors for Bayesian inverse problems. arXiv:1105.0889

  15. Dashti M, Law KJ, Stuart AM, Voss J (2013) MAP estimators and their consistency in Bayesian nonparametric inverse problems. Inverse Prob 29:095017

  16. Daubechies I (1988) Orthonormal bases of compactly supported wavelets. Commun Pure Appl Math 41:909–996

  17. Daubechies I (1992) Ten lectures on wavelets, vol 61 of CBMS-NSF Regional Conference Series in Applied Mathematics. SIAM

  18. Delhomme J, Lavenue A (2000) Four decades of inverse problems in hydrogeology. In: Theory, modeling, and field investigation in hydrogeology: a special volume in honor of Shlomo P. Neuman's 60th birthday, 348:1

  19. Dubot F, Favennec Y, Rousseau B, Rousse DR (2015) A wavelet multi-scale method for the inverse problem of diffuse optical tomography. J Comput Appl Math 289:267–281

  20. Engl HW, Hanke M, Neubauer A (1996) Regularization of inverse problems, vol 375. Springer Science & Business Media

  21. Estep D (2004) A short course on duality, adjoint operators, Green's functions, and a posteriori error analysis. Lecture notes

  22. Fitzpatrick BG (1991) Bayesian analysis in inverse problems. Inverse Prob 7:675

  23. Franklin JN (1970) Well-posed stochastic extensions of ill-posed linear problems. J Math Anal Appl 31:682–716

  24. Giles M, Glasserman P (2006) Smoking adjoints: fast Monte Carlo Greeks. Risk 19:88–92

  25. Giles MB, Pierce NA (2000) An introduction to the adjoint approach to design. Flow Turbul Combust 65:393–415

  26. Girolami M, Calderhead B (2011) Riemann manifold Langevin and Hamiltonian Monte Carlo methods. J R Stat Soc Ser B Stat Methodol 73:123–214

  27. Haar A (1910) Zur Theorie der orthogonalen Funktionensysteme. Math Ann 69:331–371

  28. Helin T, Burger M (2015) Maximum a posteriori probability estimates in infinite-dimensional Bayesian inverse problems. Inverse Prob 31:085009

  29. Kaipio J, Somersalo E (2006) Statistical and computational inverse problems, vol 160. Springer Science & Business Media

  30. Kolehmainen V, Lassas M, Niinimäki K, Siltanen S (2012) Sparsity-promoting Bayesian inversion. Inverse Prob 28:025005

  31. Kuo H-H (1975) Gaussian measures in Banach spaces. Springer, pp 1–109

  32. Lassas M, Saksman E, Siltanen S (2009) Discretization-invariant Bayesian inversion and Besov space priors. Inverse Problems and Imaging 3:87–122

  33. Mallat SG (1989) Multiresolution approximations and wavelet orthonormal bases of \(L^{2}(\mathbb R)\). Trans Amer Math Soc 315:69–87

  34. Mandelbaum A (1984) Linear estimators and measurable linear transformations on a Hilbert space. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete 65:385–397

  35. Meyer Y (1995) Wavelets and operators, vol 1. Cambridge University Press, Cambridge

  36. Neubauer A, Pikkarainen HK (2008) Convergence results for the Bayesian inversion theory. Journal of Inverse and Ill-posed Problems 16:601–613

  37. Rantala M, Vanska S, Jarvenpaa S, Kalke M, Lassas M, Moberg J, Siltanen S (2006) Wavelet-based reconstruction for limited-angle X-ray tomography. IEEE Trans Med Imaging 25:210–217

  38. Rieder A (1997) A wavelet multilevel method for ill-posed problems stabilized by Tikhonov regularization. Numer Math 75:501–522

  39. Roberts GO, Tweedie RL (1996) Exponential convergence of Langevin distributions and their discrete approximations. Bernoulli 2:341–363

  40. Stuart AM (2010) Inverse problems: a Bayesian perspective. Acta Numerica 19:451–559

  41. Sullivan TJ (2015) Introduction to uncertainty quantification, vol 63. Springer

  42. Sun N-Z (2013) Inverse problems in groundwater modeling, vol 6. Springer Science & Business Media

  43. Sun N-Z, Yeh WW-G (1985) Identification of parameter structure in groundwater inverse problem. Water Resour Res 21:869–883

  44. Triebel H (2008) Function spaces and wavelets on domains, vol 7. European Mathematical Society

  45. Wang Z, Bardsley JM, Solonen A, Cui T, Marzouk YM (2017) Bayesian inverse problems with \(\ell_{1}\) priors: a randomize-then-optimize approach. SIAM J Sci Comput 39:S140–S166



Acknowledgements

P.W. is thankful for a fruitful discussion with Youssef Marzouk, a very helpful email from Donald Estep, and in particular for the guidance of Claudia Schillings, which ultimately led to the idea of this paper.

Author information



Corresponding author

Correspondence to Philipp Wacker.



Appendix A: Haar Wavelets

The first wavelet was the Haar wavelet, conceived by Alfréd Haar (1910) as a method of generating an unconditional basis for L2([0, 1]) (in fact, the Haar system is an unconditional basis for general Lp spaces with 1 < p < ∞ as well). Wavelets really took off in the 1970s and 1980s, notably with Ingrid Daubechies' construction of compactly supported continuous wavelets (Daubechies 1988) (the Haar wavelet is discontinuous) and Stéphane Mallat's general framework of multiresolution analysis (Mallat 1989). A classical introduction is Daubechies' marvelous "ten lectures" (Daubechies 1992); a very instructive and readable account is given by Blatter (2003). In the following, we will only use Haar's original wavelets. The discontinuity (which can be a disadvantage for some applications, e.g. in image processing) is actually a favorable property in our case, as we want to model sharp interfaces in subsurface topology, and thus we do not need or want more sophisticated wavelets (at least for the purpose of this study).

Fig. 7

MAP “proposal” (i.e. not converged result) for method 1 in the Fourier setting

A.1 Haar Wavelet Expansions of Functions on [0,1]d

We will work in dimensions one and two; the step to higher dimensions is straightforward (although computationally challenging!).

A.1.1 Dimension One

Define the Haar scale function and Haar mother wavelet in 1d by

$$ \phi(x) = \chi_{[0,1)}(x),\quad \psi(x) = \chi_{\left[0,\tfrac{1}{2}\right)}(x) - \chi_{\left[\tfrac{1}{2}, 1\right)}(x), $$

where χA(x) = 1 if xA and χA(x) = 0 else. The scaled wavelets are

$$ \psi_{j,k}(x) = 2^{j/2}\cdot \psi(2^{j}x-k). $$
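These definitions translate directly into code. The following is a minimal sketch (the function names `haar_psi` and `haar_psi_jk` are ours, not from the paper):

```python
import numpy as np

def haar_psi(x):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.where((0.0 <= x) & (x < 0.5), 1.0,
                    np.where((0.5 <= x) & (x < 1.0), -1.0, 0.0))

def haar_psi_jk(x, j, k):
    """Scaled and shifted wavelet psi_{j,k}(x) = 2^{j/2} * psi(2^j x - k)."""
    return 2.0 ** (j / 2) * haar_psi(2.0 ** j * np.asarray(x, dtype=float) - k)
```

For instance, ψ1,0 is supported on [0, 1/2) and takes the values ±√2 there.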

We know that we can write any fL2[0, 1] as

$$ \begin{array}{@{}rcl@{}} f(x) &=& w_{0}\cdot \phi(x) + \sum\limits_{j=0}^{\infty} \sum\limits_{k=0}^{2^{j}-1}w_{j,k}\cdot \psi_{j,k}(x) \end{array} $$
$$ \begin{array}{@{}rcl@{}} &=& w_{0}\cdot \phi(x) + \sum\limits_{l=1}^{\infty} c_{l} \psi_{l}(x) \end{array} $$

where the second line is a re-indexing \((j, k)\mapsto l(j, k) = 2^{j} + k\) and ψl = ψj, k for the appropriate change of index (see Fig. 13). The justification for this to work is multiresolution analysis (see Daubechies 1992), but the main point is that wavelets provide a local way of spanning functions (as compared to, say, trigonometric polynomials – the exact opposite of local) which is also robust with respect to fine discretization (as opposed to naively representing a function by its values on a grid). We show how the wavelet expansion is computed for a concrete function: Let f be given by its values on a uniform grid of [0, 1). We always assume that the number of grid points is a power of 2, i.e. \(\vec f = (f(0), f(2^{-N}),{\ldots } f(1-2^{-N}))^{T}\). Now we define \(a_{N}^{(k)} := f(k\cdot 2^{-N})\). Then iteratively we compute

$$ \begin{array}{@{}rcl@{}} a_{j}^{(k)} &:=&\frac{a_{j+1}^{(2k)}+a_{j+1}^{(2k+1)}}{2}\\ d_{j}^{(k)} &:=& \frac{a_{j+1}^{(2k)}-a_{j+1}^{(2k+1)}}{2} \end{array} $$

for j = 0,…,N − 1 and k = 0,…, 2j − 1, i.e. every step j + 1↦j halves the size of the vectors aj and dj (by coarsening the resolution by a factor of two). Note that we can now forget all aj for j > 0, as \(a_{j+1}^{(2k)} = a_{j}^{(k)} + d_{j}^{(k)}\) and \(a_{j+1}^{(2k+1)} = a_{j}^{(k)} - d_{j}^{(k)}\). Then the wavelet expansion is given by

$$ w_{0}\cdot \phi(x) + \sum\limits_{j=0}^{N-1} \sum\limits_{k=0}^{2^{j}-1}w_{j,k}\cdot \psi_{j,k}(x) $$

with w0 = a0 and \(w_{j,k} = 2^{-j/2}d_{j}^{(k)}\) (the factor \(2^{-j/2}\) accounts for the \(L^{2}\) normalization of ψj, k). Note that the sum is finite, as having started with values of f on a grid puts a lid on the maximal resolution.
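The iteration above is easy to implement; a minimal sketch (our code, not the implementation used in the paper) computes \(a_{0}\) and the detail coefficients \(d_{j}^{(k)}\) and inverts the scheme:

```python
import numpy as np

def haar_decompose(f_vals):
    """Pyramid scheme for a function sampled on a dyadic grid of [0, 1):
    f_vals has length 2^N. Returns (w0, details) with details[j][k] = d_j^{(k)},
    via a_j = (a_{j+1}^{2k} + a_{j+1}^{2k+1})/2, d_j = (a_{j+1}^{2k} - a_{j+1}^{2k+1})/2."""
    a = np.asarray(f_vals, dtype=float)
    N = int(np.log2(len(a)))
    assert len(a) == 2 ** N, "number of samples must be a power of two"
    details = [None] * N
    for j in range(N - 1, -1, -1):        # coarsen one level per step
        details[j] = (a[0::2] - a[1::2]) / 2.0   # d_j^{(k)}
        a = (a[0::2] + a[1::2]) / 2.0            # a_j^{(k)}
    return a[0], details                   # w_0 = a_0^{(0)}

def haar_reconstruct(w0, details):
    """Invert the scheme: a_{j+1}^{(2k)} = a_j^{(k)} + d_j^{(k)},
    a_{j+1}^{(2k+1)} = a_j^{(k)} - d_j^{(k)}."""
    a = np.array([w0])
    for d in details:
        up = np.empty(2 * len(a))
        up[0::2] = a + d
        up[1::2] = a - d
        a = up
    return a
```

A round trip `haar_reconstruct(*haar_decompose(f_vals))` recovers `f_vals` exactly, since the scheme is merely a change of basis.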

Fig. 8

MAP estimator (converged) for method 2 in the Wavelet setting

Fig. 9

MAP “proposal” (i.e. not converged result) for method 1 in the wavelet setting

Fig. 10

Plot of the Cameron–Martin norm over time for all three relevant methods. This explains the abundance of detail in Fig. 9 versus Fig. 8

Fig. 11

MAP estimator (with method 2) for a Besov prior

Fig. 12

Coefficients of the MAP estimator and ground truth. Note the much higher sparsity of the MAP estimator

Fig. 13

Index table for dimension one

Fig. 14

Expansion calculation. Filled nodes represent quantities directly used for the expansion

Fig. 15

Left: Reconstruction of a one-dimensional function from its wavelet decomposition: the \(\ell\)-th row shows \(a_{0}\cdot \phi (x) + {\sum }_{j=0}^{\ell -1}{\sum }_{k=0}^{2^{j}-1}d_{j}^{(k)}\cdot \psi _{j,k}(x)\). Right: Reconstruction of a two-dimensional function from its wavelet decomposition

A.1.2 Dimension Two

Here we need to define the scale function and three mother wavelets.

$$ \begin{array}{@{}rcl@{}} \phi(x,y) &=& \phi(x)\cdot\phi(y)\\ &=&\chi_{[0,1)^{2}}(x,y)\\ \psi^{(0)}(x,y) &=& \phi(x)\cdot\psi(y)\\ &=&\chi_{[0,1)\times [0,1/2)}(x,y) - \chi_{[0,1)\times [1/2,1)}(x,y)\\ \psi^{(1)}(x,y) &=& \psi(x)\cdot\phi(y)\\ &=&\chi_{[0,1/2)\times [0,1)}(x,y) - \chi_{[1/2,1)\times [0,1)}(x,y)\\ \psi^{(2)}(x,y) &=& \psi(x)\cdot\psi(y)\\ &=&\chi_{[0,1/2)\times [0,1/2)}(x,y) - \chi_{[0,1/2)\times [1/2, 1)}(x,y) \\ &&- \chi_{[1/2, 1)\times [0,1/2)}(x,y) + \chi_{[1/2, 1)\times [1/2, 1)}(x,y) \end{array} $$

and scaled wavelets

$$ \psi_{j,k,n}^{(m)}(x,y) = 2^{j}\cdot \psi^{(m)}(2^{j}x-k, 2^{j}y-n). $$

With this we can expand a function defined on [0, 1]2 by

$$ \begin{array}{@{}rcl@{}} f(x,y) &=& w_{0} \cdot \phi(x,y) + \sum\limits_{j=0}^{\infty} \sum\limits_{m=0}^{2}\sum\limits_{k=0}^{2^{j}-1}\sum\limits_{n=0}^{2^{j}-1} w_{j,k,n}^{(m)}\cdot \psi_{j,k,n}^{(m)}(x,y)\\ &=& w_{0} \cdot \phi(x,y) + \sum\limits_{l=1}^{\infty} c_{l}\psi_{l}(x,y) \end{array} $$

where as in 1d we re-index \((j, m, k, n)\mapsto l(j, m, k, n) = 4^{j} + m\cdot 4^{j} + k\cdot 2^{j} + n\), or as in the table in Fig. 16.

Calculation of the wavelet expansion in two dimensions is done as follows:

Fig. 16

Index table for dimension two

Given a function by its values on a square grid \(\{0, 2^{-N},\ldots,1-2^{-N}\}^{2}\) (with a power-of-4 number of points), we define \(a_{N}^{(k,n)} = f(k\cdot 2^{-N}, n\cdot 2^{-N})\) and

$$ \begin{array}{@{}rcl@{}} a_{j}^{(k,n)} &:=& \frac{a_{j+1}^{(2k,2n)}+a_{j+1}^{(2k+1,2n)}+a_{j+1}^{(2k,2n+1)}+a_{j+1}^{(2k+1,2n+1)}}{4}\\ d_{j}^{0, (k,n)} &:=& \frac{a_{j+1}^{(2k,2n)}+a_{j+1}^{(2k+1,2n)}-a_{j+1}^{(2k,2n+1)}-a_{j+1}^{(2k+1,2n+1)}}{4}\\ d_{j}^{1, (k,n)} &:=& \frac{a_{j+1}^{(2k,2n)}-a_{j+1}^{(2k+1,2n)}+a_{j+1}^{(2k,2n+1)}-a_{j+1}^{(2k+1,2n+1)}}{4}\\ d_{j}^{2, (k,n)} &:=& \frac{a_{j+1}^{(2k,2n)}-a_{j+1}^{(2k+1,2n)}-a_{j+1}^{(2k,2n+1)}+a_{j+1}^{(2k+1,2n+1)}}{4} \end{array} $$

and analogously to one dimension, we can then span f by

$$ a_{0} \cdot \phi(x,y) + \sum\limits_{j=0}^{N-1} \sum\limits_{m=0}^{2}\sum\limits_{k=0}^{2^{j}-1}\sum\limits_{n=0}^{2^{j}-1} d_{j}^{m,(k,n)}\cdot \psi_{j,k,n}^{(m)}(x,y) $$

where the meaning of the indices is as follows:

  • j: Scale

  • m: Orientation (0=Horizontal, 1=Vertical, 2=Diagonal)

  • k: Shift in horizontal direction

  • n: Shift in vertical direction
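A single coarsening step of this two-dimensional scheme can be sketched as follows, vectorized with NumPy slicing (the function name is ours, not from the paper):

```python
import numpy as np

def haar2d_level(a):
    """One coarsening step of the 2-D Haar pyramid scheme.
    a is a (2M, 2M) array of cell values a_{j+1}, indexed a[k, n]; returns
    the four (M, M) arrays (a_j, d_j^0, d_j^1, d_j^2) as defined in the text."""
    a00 = a[0::2, 0::2]   # (2k,   2n)
    a10 = a[1::2, 0::2]   # (2k+1, 2n)
    a01 = a[0::2, 1::2]   # (2k,   2n+1)
    a11 = a[1::2, 1::2]   # (2k+1, 2n+1)
    avg = (a00 + a10 + a01 + a11) / 4.0   # a_j
    d0  = (a00 + a10 - a01 - a11) / 4.0   # horizontal detail
    d1  = (a00 - a10 + a01 - a11) / 4.0   # vertical detail
    d2  = (a00 - a10 - a01 + a11) / 4.0   # diagonal detail
    return avg, d0, d1, d2
```

Iterating this step on the averages `avg` down to a 1×1 array yields the full expansion, exactly as in the one-dimensional case.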

A.1.3 Arbitrary Dimension

Just for reference we give the straightforward extension to arbitrary dimension \(d\in \mathbb {N}\): Given a function \(f :[0,1]^{d}\to \mathbb {R}\), we define \({\Lambda} = \{0,\ldots, 2^{j} - 1\}^{d}\) and span

$$ f(z) = w_{0} \cdot \phi(z) + \sum\limits_{j=0}^{\infty} \sum\limits_{m=0}^{2^{d}-1}\sum\limits_{k\in {\Lambda}} w_{j,k}^{(m)}\cdot \psi_{j,k}^{(m)}(z) $$

Appendix B: Besov Spaces

We can characterize (see Triebel 2008; Daubechies 1992; Meyer 1995; Bui-Thanh and Ghattas 2015; Dashti et al. 2011; Kolehmainen et al. 2012; Lassas et al. 2009) elements of Besov space \(B_{pp}^{s}\) by the following: Let \(f:[0,1]^{d}\to \mathbb {R}\). Assume that the wavelet basis used is regular enough. Then for p > 0 and \(s \in \mathbb {R}\),

$$ f\in B_{pp}^{s}([0,1]^{d}) \Leftrightarrow \left( \sum\limits_{l=1}^{\infty} l^{\frac{ps}{d}+\frac{p}{2}-1}\cdot |c_{l}|^{p}\right)^{\frac{1}{p}} < \infty. $$

In one and two dimensions this reduces to the following:

$$ \begin{array}{@{}rcl@{}} &&f\in B_{pp}^{s}([0,1]), {\kern1.7pt} d=1\\ &&\quad\Leftrightarrow \|f\|_{B_{pp}^{s}} := \left( |w_{0}|^{p} + \sum\limits_{j=0}^{\infty} 2^{jp(s+\frac{1}{2}-\frac{1}{p})}\cdot \sum\limits_{k=0}^{2^{j}-1} |w_{j,k}|^{p} \right)^{\frac{1}{p}} < \infty \end{array} $$
$$ \begin{array}{@{}rcl@{}} &&f\in B_{pp}^{s}([0,1]^{2}), {\kern1.7pt} d=2 \\ &&\quad\Leftrightarrow \|f\|_{B_{pp}^{s}} := \left( |w_{0}|^{p} + \sum\limits_{j=0}^{\infty} 2^{jp(s+1-\frac{2}{p})}\cdot \sum\limits_{m=0}^{2}\sum\limits_{k=0}^{2^{j}-1}\sum\limits_{n=0}^{2^{j}-1} |w_{j,k,n}^{(m)}|^{p} \right)^{\frac{1}{p}} < \infty. \end{array} $$

Note that the sequence-space formula above (with coefficients indexed by l) does not yield the same numerical value as the appropriate dimension-dependent formula for \(\|f\|_{B_{pp}^{s}}\); the expressions are merely equivalent in the sense of equivalent norms.

If we set p = 2, we recover the Sobolev spaces, i.e. \(B_{2,2}^{s} = H^{s}\).
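Given the wavelet coefficients, the one-dimensional norm above is a direct sum. A minimal sketch (our function name; coefficients stored level by level as in the pyramid scheme of Appendix A.1):

```python
import numpy as np

def besov_norm_1d(w0, details, s, p):
    """Besov B^s_{pp} norm on [0,1] from Haar wavelet coefficients:
    (|w0|^p + sum_j 2^{j p (s + 1/2 - 1/p)} sum_k |w_{j,k}|^p)^{1/p},
    where details[j] holds the level-j coefficients w_{j,k}."""
    total = abs(w0) ** p
    for j, w in enumerate(details):
        total += 2.0 ** (j * p * (s + 0.5 - 1.0 / p)) * np.sum(np.abs(w) ** p)
    return total ** (1.0 / p)
```

For example, with w0 = 1, a single nonzero level-1 coefficient equal to 1, and s = 1, p = 2, the level-1 weight is \(2^{2}\) and the norm is \(\sqrt{1+4}\).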

Remark 1

Note that Besov spaces usually have an additional parameter, i.e. \(B_{p,q}^{s}\), although we only use the case p = q for our choice of priors. For completeness (both for general \(p>0,q>0,s\in \mathbb {R}\) and dimension d), the \(B_{p,q}^{s}\) Besov norm for functions defined on [0, 1]d is defined by

$$\|f\|_{B_{pq}^{s}([0,1]^{d})}:= \left( |w_{0}|^{p} + \sum\limits_{j=0}^{\infty} 2^{jq(s+\frac{d}{2}-\frac{d}{p})}\left( \sum\limits_{m=0}^{2^{d}-1}\sum\limits_{k\in{\Lambda}} |w_{j,k}^{(m)}|^{p}\right)^{\frac{q}{p}} \right)^{\frac{1}{q}}$$


Cite this article

Wacker, P., Knabner, P. Wavelet-Based Priors Accelerate Maximum-a-Posteriori Optimization in Bayesian Inverse Problems. Methodol Comput Appl Probab 22, 853–879 (2020). https://doi.org/10.1007/s11009-019-09736-2



Keywords

  • Bayesian inverse problems
  • Besov priors
  • Optimization
  • Elliptic inverse problem

Mathematics Subject Classification (2010)

  • 65M32
  • 62F15
  • 65K10