
Relationship Between Data Smoothing and the Regularization of Inverse Problems


We investigate the practice of regularization (also termed damping) in inverse problems, meaning the use of prior information to supplement observations in order to suppress instabilities in the solution caused by noisy and incomplete data. Our focus is on forms of regularization that create smooth solutions, for smoothness is often considered a desirable, or at least acceptable, attribute of inverse theory solutions (and especially of tomographic images). We consider the general inverse problem in its continuum limit. By deconstructing the solution into the part controlled by the regularization and the part controlled by the data kernel, we show that the general solution depends on a smoothed version of the back-projected data as well as a smoothed version of the generalized inverse. Crucially, the smoothing function that controls both is the solution to the simple data smoothing problem. We then consider how the choice of regularization shapes the smoothing function, in particular exploring the dichotomy between expressing prior information as a constraint equation (such as a spatial derivative of the solution being small) or as a covariance matrix (such as spatial correlation falling off at a specified rate). By analyzing the data smoothing problem in its continuum limit, we derive analytic solutions for different choices of regularization. We consider four separate cases: (1) the first derivative of the solution is close to zero, (2) the prior covariance is a two-sided declining exponential, (3) the second derivative of the solution is close to zero, and (4) the solution is close to its localized average. First-derivative regularization is put forward as having several attractive properties and few, if any, drawbacks.




  1. Abers, G. (1994). Three-dimensional inversion of regional P and S arrival times in the East Aleutians and sources of subduction zone gravity highs, J. Geophys. Res. 99, 4395–4412.

  2. Aki, K., A. Christoffersson and E. Husebye (1976). Three-dimensional seismic structure under the Montana LASA, Bull. Seismol. Soc. Am. 66, 501–524.

  3. Backus G.E. and Gilbert, J.F. (1968). The resolving power of gross earth data, Geophys. J. R. Astron. Soc. 16, 169–205.

  4. Backus G.E. and Gilbert, J.F. (1970). Uniqueness in the inversion of gross Earth data, Phil. Trans. R. Soc. London Ser. A 266, 123–192.

  5. Boschi, L. and A.M. Dziewonski (1999), High- and low-resolution images of the earth’s mantle: implications of different approaches to tomographic modeling, J. Geophys. Res. 104, 25567–25594.

  6. Ekstrom, G., J. Tromp and E.W.F. Larson (1997). Measurements and global models of surface wave propagation, J. Geophys. Res. 102, 8127–8157.

  7. Gradshteyn, I.S. and Ryzhik, I.M., Tables of Integrals, Series and Products, Corrected and Enlarged Edition (Academic, New York, 1980).

  8. Hetenyi, M. (1979). Beams on Elastic Foundation (University of Michigan Press, Ann Arbor).

  9. Humphreys, E., R.W. Clayton and B.H. Hager (1984). A tomographic image of mantle structure beneath southern California, Geophys. Res. Lett. 11, 625–627.

  10. Laske, G. and G. Masters (1996). Constraints on global phase velocity maps from long-period polarization data, J. Geophys. Res. 101, 16059–16075.

  11. Lawson, C. and Hanson, R., Solving Least Squares Problems (Prentice-Hall, New York, 1974).

  12. Levenberg, K., (1944). A method for the solution of certain non-linear problems in least-squares, Q. Appl. Math. 2, 164–168.

  13. Menke, W., Geophysical Data Analysis: Discrete Inverse Theory (First Edition). (Academic, New York, 1984).

  14. Menke, W. (2005). Case studies of seismic tomography and earthquake location in a regional context, in Seismic Earth: Array Analysis of Broadband Seismograms, A. Levander and G. Nolet, eds., Geophysical Monograph Series 157. American Geophysical Union, 7–36.

  15. Menke, W., Geophysical Data Analysis: Discrete Inverse Theory (MATLAB Edition), (Elsevier, New York, 2012).

  16. Menke, W. (2014). Resolution and Covariance in Generalized Least Squares Inversion, in press in Surveys in Geophysics.

  17. Menke, W. and Menke, J., Environmental Data Analysis with MATLAB (Elsevier, New York, 2011).

  18. Menke W. and Abbott, D., Geophysical Theory (Columbia University Press, New York, 1989).

  19. Nettles, M. and A.M. Dziewonski (2008). Radially anisotropic shear-velocity structure of the upper mantle globally and beneath North America, J. Geophys. Res. 113, doi:10.1029/2006JB004819.

  20. Smith, W. and Wessel, P. (1990), Gridding with continuous curvature splines in tension, Geophysics 55, 293–305.

  21. Trampert, J. and J.H. Woodhouse (1995), Global phase velocity maps of Love and Rayleigh waves between 40 and 130 seconds, Geophys. J. Int., 122, 675–690.

  22. Tarantola A. and Valette B. (1982a), Generalized non-linear inverse problems solved using the least squares criterion, Rev. Geophys. Space Phys. 20, 219–232.

  23. Tarantola A, Valette B. (1982b), Inverse problems = quest for information, J. Geophys. 50, 159–170.

  24. Tromp, J., Tape, C. and Liu, Q. (2005), Seismic tomography, adjoint methods, time reversal and banana-doughnut kernels. Geophys. J. Int., 160: 195–216. doi:10.1111/j.1365-246X.2004.02453.x.

  25. Wiggins, R.A. (1972), The general linear inverse problem: Implication of surface waves and free oscillations for Earth structure, Rev. Geophys. Space Phys. 10, 251–285.

  26. Wikipedia (2014), Screened Poisson Equation.

  27. Zha, Y., S.C. Webb, S.S. Wei, D.A. Wiens, D.K. Blackman, W.H. Menke, R.A. Dunn, J.A. Conder (2014), Upper mantle shear velocity structure beneath the Eastern Lau Spreading Center from OBS ambient noise tomography, Earth Planet. Sci. Lett. 408, 194–206.



This research was supported by the US National Science Foundation under grants OCE-0426369 and EAR 11-47742.

Author information



Corresponding author

Correspondence to William Menke.

Appendix: Derivations of Smoothing Kernels and Covariances for Case Studies


Case 1: First-Derivative Minimization

The operator \({\mathcal{L}} = \varepsilon {\text{d}}/{\text{d}}x\) has translational invariance, so we expect that the smoothing kernel \(a(x,x') = a(x - x')\) will depend only upon the separation distance (x − x′) (as will \(C_{h}\), \(P_{h}\), \(Q_{h}\), and R). Without loss of generality, we can set x′ = 0, so that Eq. (13) becomes

$$\left( { - \varepsilon^{2} \frac{{{\text{d}}^{2} }}{{{\text{d}}x^{2} }} + 1} \right) a(x) = \delta (x).$$

Here, we utilize the relationship that \(\left( {{\text{d}}/{\text{d}}x} \right)^{\dag } = - {\text{d}}/{\text{d}}x\). The solution to this well-known 1D screened Poisson equation is

$$a\left( x \right) = \frac{{\varepsilon^{ - 1} }}{2}\exp \left( { - \varepsilon^{ - 1} \left| x \right|} \right).$$

This solution can be verified by substituting it into the differential equation:

$$\begin{aligned} \frac{{{\text{d}}a}}{{{\text{d}}x}} = - \frac{{\varepsilon^{ - 2} }}{2}\text{sgn} \left( x \right) {\text{exp}}\left( { - \varepsilon^{ - 1} \left| x \right|} \right)\quad {\text{and }}\quad \frac{{{\text{d}}^{2} a}}{{{\text{d}}x^{2} }} = \frac{{\varepsilon^{ - 3} }}{2} {\text{exp}}\left( { - \varepsilon^{ - 1} \left| x \right|} \right) - \varepsilon^{ - 2} \delta \left( x \right) \hfill \\ {\text{so}}\quad - \varepsilon^{2} \frac{{\varepsilon^{ - 3} }}{2} {\text{exp}}\left( { - \varepsilon^{ - 1} \left| x \right|} \right) - \varepsilon^{2} \left( { - \varepsilon^{ - 2} } \right) \delta \left( x \right) + \frac{{\varepsilon^{ - 1} }}{2}\exp \left( { - \varepsilon^{ - 1} \left| x \right|} \right) = \delta (x). \hfill \\ \end{aligned}$$

Here, we have relied on the fact that \(({\text{d}}/{\text{d}}x)\left| x \right| = {\text{sgn}}(x)\) and \(\left( {{\text{d}}/{\text{d}}x} \right){\text{sgn}}(x) = 2\delta \left( x \right)\). Note that a(x) is a two-sided declining exponential with unit area and decay rate \(\varepsilon^{-1}\). Because of the translational invariance, the integral in Eq. (11) has the interpretation of a convolution. The solution is the observed data d(x) convolved with this smoothing kernel:

$$m\left( x \right) = a\left( x \right)*d(x).$$
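The unit-area property of this kernel and the convolution form of the solution are easy to verify numerically. The following sketch uses illustrative values for ε, the grid, and the synthetic data, none of which come from the paper:

```python
import numpy as np

# Case 1 smoothing kernel: a(x) = (1/(2*eps)) * exp(-|x|/eps), a two-sided
# declining exponential with decay rate 1/eps.  eps, the grid, and the
# synthetic data below are illustrative choices.
eps = 0.5
dx = 0.01
x = np.arange(-5.0, 5.0 + dx, dx)
a = (0.5 / eps) * np.exp(-np.abs(x) / eps)

area = a.sum() * dx          # Riemann-sum check of the unit-area property

# m(x) = a(x) * d(x): smooth noisy synthetic "data" by convolution
rng = np.random.default_rng(0)
d = np.sin(x) + 0.3 * rng.standard_normal(x.size)
m = np.convolve(d, a, mode="same") * dx   # dx converts the sum to an integral

print(round(area, 3))   # → 1.0
```

As ε grows, the kernel widens and m(x) becomes progressively smoother than d(x), which is the behavior the analytic solution predicts.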

The covariance \(C_{h}\) of the prior information satisfies Eq. (15a):

$$- \sigma_{d}^{ - 2} \varepsilon^{2} \frac{{{\text{d}}^{2} }}{{{\text{d}}x^{2} }} C_{h} (x) = \delta (x).$$

This is a 1D Poisson equation, with solution

$$C_{h} \left( x \right) = \sigma_{d}^{2} \frac{{\varepsilon^{ - 2} }}{2}\left( {C_{0} - \left| x \right|} \right)\quad {\text{with}}\;C_{0} {\text{ arbitrary}}.$$

This solution can be verified by substituting it into the differential equation:

$$\begin{aligned} \frac{{{\text{d}}C_{h} }}{{{\text{d}}x}} = - \sigma_{d}^{2} \frac{{\varepsilon^{ - 2} }}{2} {\text{sgn}}\left( x \right)\quad {\text{and }}\quad \frac{{{\text{d}}^{2} C_{h} }}{{{\text{d}}x^{2} }} = - \sigma_{d}^{2} \varepsilon^{ - 2} \delta \left( x \right) \hfill \\ {\text{thus}}\quad - \sigma_{d}^{ - 2} \varepsilon^{2} \frac{{{\text{d}}^{2} }}{{{\text{d}}x^{2} }}C_{h} \left( x \right) = - \sigma_{d}^{ - 2} \varepsilon^{2} \left( { - \sigma_{d}^{2} \varepsilon^{ - 2} } \right) \delta \left( x \right) = \delta \left( x \right). \hfill \\ \end{aligned}$$

The covariance \(C_{h}(x - x')\) implies that the errors associated with neighboring points of the prior information equation m(x) = 0 are highly and positively correlated, and that the degree of correlation declines with separation distance, becoming negative at large separation.

Finally, we note that the operator \({\mathcal{L}} = \varepsilon {\text{d}}/{\text{d}}x\) is not self-adjoint, so it is not the continuous analogue of the symmetric matrix \(C_{h}^{-1/2}\). As described earlier, we can construct a symmetric operator by introducing a unitary transformation. \({\mathcal{L}}\) is antisymmetric in x, but we seek a symmetric operator, so the correct transformation is the Hilbert transform, \({\mathcal{H}}\), that is, the linear operator that phase-shifts a function by π/2. It obeys the rules \({\mathcal{H}}^{\dag } = - {\mathcal{H}}\), \({\mathcal{H}}^{\dag } {\mathcal{H}} = 1\), and \({\mathcal{H}}\left( {{\text{d}}/{\text{d}}x} \right) = \left( {{\text{d}}/{\text{d}}x} \right){\mathcal{H}}\). The modified operator \({\mathcal{L}}_{\text{sa}} = \varepsilon {\mathcal{H}}\,{\text{d}}/{\text{d}}x\) is self-adjoint and satisfies \({\mathcal{L}}_{\text{sa}}^{\dag } {\mathcal{L}}_{\text{sa}} = {\mathcal{L}}^{\dag } {\mathcal{L}}.\)
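This argument is transparent in the Fourier domain, where each operator acts as multiplication by its symbol. The sketch below assumes the common convention that the Hilbert transform has Fourier multiplier −i sgn(k); ε is an illustrative value:

```python
import numpy as np

# Symbols (Fourier multipliers) of the Case 1 operators:
#   d/dx              -> i*k
#   Hilbert transform -> -i*sgn(k)   (one common convention)
# Then L = eps*d/dx has symbol i*eps*k, while L_sa = eps*H*d/dx has the
# real, even symbol eps*|k|, so L_sa is self-adjoint and
# L_sa^† L_sa = eps**2 * k**2 = L^† L.
eps = 0.3
k = np.linspace(-5.0, 5.0, 1001)
sym_L = 1j * eps * k
sym_Lsa = (-1j * np.sign(k)) * (1j * eps * k)   # equals eps*|k|, purely real

lhs = np.conj(sym_Lsa) * sym_Lsa
rhs = np.conj(sym_L) * sym_L                    # equals eps**2 * k**2
print(np.allclose(lhs, rhs))   # → True
```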

Case 2: Exponentially Decaying Covariance

For a covariance described by a two-sided declining exponential function,

$$C_{h} \left( {x - x'} \right) = \varepsilon^{ - 2} \exp \left( { - \eta \left| {x - x'} \right|} \right) = \frac{{2\varepsilon^{ - 2} }}{\eta } \frac{\eta }{2}\exp \left( { - \eta \left| {x - x'} \right|} \right).$$

By comparing Eqs. (42) and (43), we find that this prior covariance is the inverse of the operator

$${\mathcal{L}}^{\dag } {\mathcal{L}} = \frac{{\eta \varepsilon^{2} }}{2}\left( { - \eta^{ - 2} \frac{{{\text{d}}^{2} }}{{{\text{d}}x^{2} }} + 1} \right).$$

The smoothing kernel solves the equation

$$\gamma^{2} \left( { - \beta^{ - 2} \gamma^{ - 2} \frac{{{\text{d}}^{2} }}{{{\text{d}}x^{2} }} + 1} \right)a\left( x \right) = \delta \left( x \right)$$
$${\text{with}}\quad \beta^{2} = 2\eta \varepsilon^{ - 2} \quad {\text{and}}\quad \gamma^{2} = \left( {1 + \frac{{\eta \varepsilon^{2} }}{2}} \right).$$

By analogy to Eqs. (42) and (43), the smoothing kernel is

$$a\left( x \right) = \gamma^{ - 2} \frac{\beta \gamma }{2}\exp \left( { - \beta \gamma \left| x \right|} \right).$$

An operator \({\mathcal{L}}\) that reproduces the form of \({\mathcal{L}}^{\dag } {\mathcal{L}}\) given in Eq. (50) is

$${\mathcal{L}} = \lambda \left( {\eta^{ - 1} \frac{\text{d}}{\text{d}x} + 1} \right)\quad {\text{with}}\quad \lambda^{2} = \frac{{\eta \varepsilon^{2} }}{2}.$$

The function \(P_{h}\) solves Eq. (15b), \({\mathcal{L}}^{\dag } P_{h} = \delta (x)\), which for the operator in (30) has the form of a one-sided exponential,

$$P_{h} \left( x \right) = \eta \lambda^{ - 1} H( - x)\exp \left( {\eta x} \right).$$

Here, H(x) is the Heaviside step function. Because of the translational invariance, the inner product in Eq. (14) relating \(P_{h}\) to \(C_{h}\) is a convolution. That, together with the rule that the adjoint of a convolution is the convolution backwards in time, implies that \(C_{h} \left( t \right) = P_{h} \left( { - t} \right)*P_{h} \left( t \right) = P_{h} \left( t \right){ \star }P_{h} (t)\), where \({ \star }\) signifies cross-correlation. The reader may easily verify that the autocorrelation of Eq. (31) reproduces the formula for \(C_{h}\) given in (25). Unfortunately, its Hilbert transform cannot be written as a closed-form expression, so no simple formula for the symmetrized form of \(P_{h}\), analogous to \(C_{h}^{1/2}\), can be given.
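The claim that the autocorrelation of the one-sided exponential reproduces the two-sided exponential covariance can be checked numerically. In this sketch ε and η are illustrative values, and λ follows from the relation λ² = ηε²/2:

```python
import numpy as np

# Verify numerically that the autocorrelation of the one-sided exponential
#   P_h(x) = (eta/lam) * H(-x) * exp(eta*x)
# reproduces the two-sided covariance
#   C_h(x) = eps**-2 * exp(-eta*|x|).
eps, eta = 1.5, 2.0
lam = np.sqrt(eta * eps**2 / 2.0)   # lam**2 = eta*eps**2/2

dx = 0.005
x = np.arange(-8.0, 8.0 + dx, dx)
P = (eta / lam) * np.exp(eta * x) * (x <= 0.0)

# autocorrelation as the convolution of P(t) with its time reverse
acorr = np.convolve(P, P[::-1], mode="same") * dx
C = eps**-2 * np.exp(-eta * np.abs(x))

err = np.max(np.abs(acorr - C))
print(err < 0.05)   # → True
```

The small residual is pure discretization error; it shrinks as dx is refined.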

Case 3: Second-Derivative Minimization

The smoothing kernel a(x) satisfies the differential equation

$$\left( {\varepsilon^{2} \frac{{\text{d}^{4} }}{{\text{d}x^{4} }} + 1} \right)a(x) = \delta (x).$$

This well-known differential equation has solution (Hetenyi 1979; see also Menke and Abbott 1989; Smith and Wessel 1990; Menke 2014)

$$a(x) = V { \exp }\left( { - \left| x \right|/\lambda } \right) \left\{ {\cos \left( {\left| x \right|/\lambda } \right) + \sin \left( {\left| x \right|/\lambda } \right)} \right\},$$
$$\lambda = \left( {2\varepsilon } \right)^{1/2} \quad {\text{and}}\quad V = \frac{{\lambda^{3} }}{{8\varepsilon^{2} }}.$$
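A quick numerical check (with an arbitrary, illustrative value of ε) confirms that this kernel, like the Case 1 kernel, has unit area:

```python
import numpy as np

# Case 3 smoothing kernel:
#   a(x) = V * exp(-|x|/lam) * (cos(|x|/lam) + sin(|x|/lam)),
# with lam = sqrt(2*eps) and V = lam**3/(8*eps**2).  eps is illustrative.
eps = 0.4
lam = np.sqrt(2.0 * eps)
V = lam**3 / (8.0 * eps**2)

dx = 0.001
x = np.arange(-20.0, 20.0 + dx, dx)
u = np.abs(x) / lam
a = V * np.exp(-u) * (np.cos(u) + np.sin(u))

area = a.sum() * dx     # Riemann-sum approximation of the integral
print(round(area, 3))   # → 1.0 (unit area)
```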

The covariance \(C_{h}\) of the prior information satisfies Eq. (15a):

$$\sigma_{d}^{ - 2} \varepsilon^{2} \frac{{{\text{d}}^{4} }}{{{\text{d}}x^{4} }} C_{h} (x) = \delta (x),$$

and by analogy to (47) has solution

$$C_{h} \left( x \right) = - \sigma_{d}^{2} \frac{{\varepsilon^{ - 2} }}{12}\left( {C_{0} - \left| x \right|^{3} } \right)\quad {\text{with}}\;C_{0} {\text{ arbitrary}}.$$

This solution can be verified by substituting it into the differential equation:

$$\frac{{{\text{d}}^{3} C_{h} }}{{{\text{d}}x^{3} }} = \sigma_{d}^{2} \frac{{\varepsilon^{ - 2} }}{2} {\text{sgn}}\left( x \right)\quad {\text{and}}\quad \frac{{{\text{d}}^{4} C_{h} }}{{{\text{d}}x^{4} }} = \sigma_{d}^{2} \varepsilon^{ - 2} \delta \left( x \right),$$
$${\text{thus}}\quad \sigma_{d}^{ - 2} \,\varepsilon^{2} \frac{{{\text{d}}^{4} }}{{{\text{d}}x^{4} }}C_{h} \left( x \right) = \sigma_{d}^{ - 2} \varepsilon^{2} \left( {\sigma_{d}^{2} \varepsilon^{ - 2} } \right) \delta \left( x \right) = \delta \left( x \right).$$

This function implies a steep drop-off in covariance between neighboring points and increasingly strong anticorrelation with distance.

Case 4: Damping towards Localized Average

From Eq. (37), we find that the smoothing kernel a(x) satisfies

$${\mathcal{L}}^{\dag } {\mathcal{L} }a + a = \varepsilon^{2} \left[ {\delta \left( x \right) - s\left( x \right)} \right]*\left[ {\delta \left( x \right) - s\left( x \right)} \right]*a + a = \delta \left( x \right).$$

We now make use of the fact that the operator \({\mathcal{L}}_{s} = 1 - \eta^{ - 2} \text{d}^{2} /\text{d}x^{2}\) is the inverse of convolution by s(x), the unit-area, two-sided declining exponential \(s(x) = (\eta /2)\exp ( - \eta \left| x \right|)\). Applying \({\mathcal{L}}_{s}\) twice to (37) yields the differential equation

$$\left( {1 + \varepsilon^{2} } \right)\eta^{ - 4} \frac{{{\text{d}}^{4} a}}{{{\text{d}}x^{4} }} - 2\eta^{ - 2} \frac{{{\text{d}}^{2} a}}{{{\text{d}}x^{2} }} + a = f(x)\quad {\text{with}}\quad f(x) = {\mathcal{L}}_{s} {\mathcal{L}}_{s} \delta \left( x \right).$$

We solve this equation by finding its Green function [that is, solving (39) with f(x) = δ(x)] and then convolving this Green function with the actual f(x). This Green function can be found using Fourier transforms, with the relevant integral given by Equation 3.728.1 of Gradshteyn and Ryzhik (1980) (which needs to be corrected by dividing their stated result by a factor of 2). The result is

$$a\left( x \right) = \left( {1 - AD} \right)\delta \left( x \right) - A\left\{ {S\sin \left( {\eta q\left| x \right|/r} \right) - C\cos \left( {\eta q\left| x \right|/r} \right)} \right\}\exp \left( { - \eta p\left| x \right|/r} \right),$$


$$S = \left( {\frac{\eta }{r}} \right)^{4} p\left\{ {\left( {p^{4} - q^{4} } \right) - 2q^{2} \left( {p^{2} + q^{2} } \right)} \right\},$$
$$C = \left( {\frac{\eta }{r}} \right)^{4} q\left\{ {\left( {p^{4} - q^{4} } \right) + 2p^{2} \left( {p^{2} + q^{2} } \right)} \right\},$$
$$A = \varepsilon^{2} \eta^{ - 4} \times \frac{2}{\pi }\left( {\frac{{\eta^{4} }}{{\varepsilon^{2} + 1}}} \right) \times \left( {\frac{\pi }{4uv}} \right) \times 2\left( {\frac{\eta }{r}} \right)$$
$${\text{or}}\; A = \left( {\frac{{\varepsilon^{2} }}{{\varepsilon^{2} + 1}}} \right)\left( {\frac{\eta }{uvr}} \right),$$
$$D = 4\left( {\frac{\eta }{r}} \right)^{3} pq\left( {p^{2} + q^{2} } \right),$$
$$u = \frac{{2\varepsilon \eta^{2} }}{{\varepsilon^{2} + 1}} \quad {\text{and}}\quad v = \frac{{\eta^{2} \left( {\varepsilon^{2} + 1} \right)^{1/2} }}{{r^{2} }},$$
$$r = \left( {\varepsilon^{2} + 1} \right)^{1/2} \quad {\text{and}}\quad p = \left( {\frac{r + 1}{2}} \right)^{1/2} \quad {\text{and}}\quad q = \left( {\frac{r - 1}{2}} \right)^{1/2}.$$

We determine the area under the smoothing kernel by taking the Fourier transform of (61):

$$\left( {\left( {1 + \varepsilon^{2} } \right)\eta^{ - 4} k^{4} + 2\eta^{ - 2} k^{2} + 1} \right) a(k) = 1 + 2\eta^{ - 2} k^{2} + \eta^{ - 4} k^{4}$$

and evaluating it at zero wavenumber. Thus, a(k = 0) = 1; that is, the area is unity.
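Equivalently, working in the Fourier domain directly from the operator form, the equation \(({\mathcal{L}}^{\dag } {\mathcal{L}} + 1)a = \delta\) gives â(k) = 1/(1 + ε²(1 − ŝ)²). The sketch below assumes s(x) is the unit-area two-sided exponential (η/2)exp(−η|x|), whose transform is ŝ = η²/(η² + k²); ε and η are illustrative:

```python
import numpy as np

# Fourier-domain view of Case 4: with L†L = eps**2 * (δ - s)*(δ - s)*,
# the smoothing kernel's transform is a_hat = 1/(1 + eps**2*(1 - s_hat)**2),
# where s_hat = eta**2/(eta**2 + k**2) is the transform of the assumed
# averaging function s(x) = (eta/2)*exp(-eta*|x|).
eps, eta = 2.0, 1.0
k = np.linspace(-10.0, 10.0, 2001)
s_hat = eta**2 / (eta**2 + k**2)
a_hat = 1.0 / (1.0 + eps**2 * (1.0 - s_hat)**2)

# a_hat(k = 0) = 1, i.e. the smoothing kernel has unit area for any eps
print(a_hat[k.size // 2])   # → 1.0
```

Because ŝ → 1 as k → 0, the zero-wavenumber value is unity regardless of ε, which is the unit-area result derived above.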



Cite this article

Menke, W., Eilon, Z. Relationship Between Data Smoothing and the Regularization of Inverse Problems. Pure Appl. Geophys. 172, 2711–2726 (2015).



  • Inverse theory
  • tomography
  • spatial analysis
  • damping
  • smoothing
  • regularization