
Diagnosing Forward Operator Error Using Optimal Transport


Abstract

We investigate overdetermined linear inverse problems for which the forward operator may not be given accurately. We introduce a new tool called the structure, based on the Wasserstein distance, and propose its use to diagnose and remedy forward operator error. Computing the structure reduces to an inexpensive calculation of the Earth Mover's Distance, a Euclidean, homogeneous degree-one transport distance, using recently developed algorithms. The structure is proven to distinguish between noise and signal in the residual, and it suggests a plan for recovering the true forward operator in some interesting cases. We expect this technique to be useful not only for diagnosing forward operator error, but also for correcting it, which we do in some simple cases presented below.


Notes

  1. While the definition of the EMD in Eq. 10 is still well-defined for delta functions, the formula in Eq. 11 is not. Thus, while we use Eq. 11 for numerical calculations, we often rely on Eq. 10 for theoretical bounds.

References

  1. Arridge, S.R.: Optical tomography in medical imaging. Inverse Problems 15(2), R41 (1999)

  2. Becker, S., Horesh, L., Aravkin, A., Zhuk, S.: General optimization framework for robust and regularized 3D full waveform inversion (2015). arXiv preprint arXiv:1504.04677

  3. Broyden, C.G.: The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA J. Appl. Math. 6(1), 76–90 (1970)

  4. Chahine, M.T.: Inverse problems in radiative transfer: determination of atmospheric parameters. J. Atmos. Sci. 27(6), 960–967 (1970)

  5. Chan, T.F., Shen, J.J.: Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods, vol. 94. SIAM, Philadelphia (2005)

  6. Daubechies, I.: Orthonormal bases of compactly supported wavelets. Commun. Pure Appl. Math. 41(7), 909–996 (1988)

  7. Engquist, B., Froese, B.D.: Application of the Wasserstein metric to seismic signals (2013). arXiv preprint arXiv:1311.4581

  8. Engquist, B., Froese, B.D., Yang, Y.: Optimal transport for seismic full waveform inversion (2016). arXiv preprint arXiv:1602.01540

  9. Evans, L.C.: Partial differential equations and Monge–Kantorovich mass transfer. Curr. Dev. Math. 1997(1), 65–126 (1997)

  10. Evans, L.C., Gangbo, W.: Differential Equations Methods for the Monge–Kantorovich Mass Transfer Problem, vol. 653. American Mathematical Society, Providence (1999)

  11. Fedorczak, N., Brochard, F., Bonhomme, G., Schneider, K., Farge, M., Monier-Garbet, P., et al.: Tomographic reconstruction of tokamak plasma light emission from single image using wavelet–vaguelette decomposition. Nuclear Fusion 52(1), 013005 (2011)

  12. Fletcher, R.: A new approach to variable metric algorithms. Comput. J. 13(3), 317–322 (1970)

  13. Freeman, A.: SAR calibration: an overview. IEEE Trans. Geosci. Remote Sens. 30(6), 1107–1121 (1992)

  14. Goldfarb, D.: A family of variable-metric methods derived by variational means. Math. Comput. 24(109), 23–26 (1970)

  15. Goldstein, T., Osher, S.: The split Bregman method for L1-regularized problems. SIAM J. Imaging Sci. 2(2), 323–343 (2009)

  16. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)

  17. Golub, G.H., Hansen, P.C., O’Leary, D.P.: Tikhonov regularization and total least squares. SIAM J. Matrix Anal. Appl. 21(1), 185–194 (1999)

  18. Hansen, P.C.: Analysis of discrete ill-posed problems by means of the L-curve. SIAM Rev. 34(4), 561–580 (1992)

  19. Hansen, P.C., O’Leary, D.P.: The use of the L-curve in the regularization of discrete ill-posed problems. SIAM J. Sci. Comput. 14(6), 1487–1503 (1993)

  20. Jacobs, M., Léger, F., Li, W., Osher, S.: Solving large-scale optimization problems with a convergence rate independent of grid size (2018). arXiv preprint arXiv:1805.09453

  21. Kennedy, M.C., O’Hagan, A.: Bayesian calibration of computer models. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 63(3), 425–464 (2001)

  22. Kirsch, A.: An Introduction to the Mathematical Theory of Inverse Problems, vol. 120. Springer, New York (2011)

  23. Li, W., Osher, S., Gangbo, W.: A fast algorithm for Earth Mover's distance based on optimal transport and L1 type regularization (2016). arXiv preprint arXiv:1609.07092

  24. Li, W., Ryu, E.K., Osher, S., Yin, W., Gangbo, W.: A parallel method for Earth Mover's distance. J. Sci. Comput. 75(1), 182–197 (2018)

  25. Mallat, S.G.: Multiresolution approximations and wavelet orthonormal bases of \(L^2({\mathbb {R}})\). Trans. Am. Math. Soc. 315(1), 69–87 (1989)

  26. Oliver, D.S., Reynolds, A.C., Liu, N.: Inverse Theory for Petroleum Reservoir Characterization and History Matching. Cambridge University Press, Cambridge (2008)

  27. Perlin, K.: An image synthesizer. ACM SIGGRAPH Comput. Graph. 19(3), 287–296 (1985)

  28. Perlin, K.: Improving noise. In: ACM Transactions on Graphics (TOG), vol. 21, pp. 681–682. ACM (2002)

  29. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D: Nonlinear Phenom. 60(1–4), 259–268 (1992)

  30. Ryu, E., Chen, Y., Li, W., Osher, S.: Vector and matrix optimal mass transport: theory, algorithm, and applications (2017). arXiv:1712.10279 [math.OC]

  31. Schneider, U., Pedroni, E., Lomax, A.: The calibration of CT Hounsfield units for radiotherapy treatment planning. Phys. Med. Biol. 41(1), 111 (1996)

  32. Shanno, D.F.: Conditioning of quasi-Newton methods for function minimization. Math. Comput. 24(111), 647–656 (1970)

  33. Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, New York (2008)

  34. Wingen, A., Shafer, M.W., Unterberg, E.A., Hill, J.C., Hillis, D.L.: Regularization of soft-X-ray imaging in the DIII-D tokamak. J. Comput. Phys. 289, 83–95 (2015)

  35. Yang, Y., Engquist, B., Sun, J., Hamfeldt, B.F.: Application of optimal transport and the quadratic Wasserstein metric to full-waveform inversion. Geophysics 83(1), R43–R62 (2018)

  36. Zhu, C., Byrd, R.H., Lu, P., Nocedal, J.: LBFGS-B: Fortran Subroutines for Large-Scale Bound Constrained Optimization. Report NAM-11, EECS Department, Northwestern University (1994)


Author information


Corresponding author

Correspondence to Michael A. Puthawala.


This manuscript has been authored, in part, by UT-Battelle, LLC, under Contract No. DE-AC0500OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for the United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

M. A. Puthawala and S. J. Osher: The research was sponsored by Department of Energy Grant DOE-SC0013838 and NSF (STROBE), NSFC 11671005. C. D. Hauck: The research was sponsored by the Office of Advanced Scientific Computing Research and performed at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725.

Appendices

Appendix A: Proofs

Proof (Proof of Proposition 1)

Given \(\varPhi ({\mathbf {v}};\lambda ) = \lambda \Vert {\mathbf {C}}{\mathbf {v}}\Vert ^2_2\), the normal equations for Eq. 6 are

$$\begin{aligned} ({\mathbf {L}}_\theta ^T{\mathbf {L}}_\theta + \lambda {\mathbf {C}}^T{\mathbf {C}}){{\tilde{{\mathbf {u}}}}}_{\theta ,\eta } = {\mathbf {L}}_\theta ^T ({\mathbf {b}}+ \varvec{\eta }). \end{aligned}$$
(51)

Therefore \({{\tilde{{\mathbf {L}}}}}^{-1}_\theta = ({\mathbf {L}}_\theta ^T {\mathbf {L}}_\theta + \lambda {\mathbf {C}}^T {\mathbf {C}})^{-1} {\mathbf {L}}_\theta ^T\). Using the GSVD in Eq. 18, a direct calculation gives

$$\begin{aligned} {\mathbf {L}}_\theta {{\tilde{{\mathbf {L}}}}}_\theta ^{-1} = {\mathbf {U}}_\theta {\mathbf {D}}_{\theta ,\lambda } {\mathbf {U}}_\theta ^T, \quad \text {where} \quad {\mathbf {D}}_{\theta ,\lambda } := \frac{ {\varvec{\Sigma }}^2_{\theta } }{ {\varvec{\Sigma }}_{\theta }^2 + \lambda {\varvec{\Gamma }}_{\theta }^2} \in {\mathbb {R}}^{n \times n}. \end{aligned}$$
(52)

Thus according to the definition of the residual in Eq. 8,

$$\begin{aligned} {\mathbf {r}}_{ \theta ,\eta }&= ({\mathbf {I}}- {\mathbf {L}}_\theta {{\tilde{{\mathbf {L}}}}}_\theta ^{-1} ) ( {\mathbf {b}}+ \varvec{\eta }) = {\mathbf {U}}_\theta {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\mathbf {U}}_\theta ^T ({\mathbf {b}}+ \varvec{\eta }) + ({\mathbf {I}}- {\mathbf {U}}_\theta {\mathbf {U}}_\theta ^T)({\mathbf {b}}+ \varvec{\eta }) \end{aligned}$$
(53)

where

$$\begin{aligned} {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } := ({\mathbf {I}}- {\mathbf {D}}_{\theta ,\lambda }) = \frac{\lambda {\varvec{\Gamma }}^2_{\theta }}{{\varvec{\Sigma }}_{\theta }^2 + \lambda {\varvec{\Gamma }}_\theta ^2} > 0. \end{aligned}$$
(54)

We first bound two of the deterministic components of the residual. Using the GSVD,

$$\begin{aligned} {\mathbf {U}}_\theta {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\mathbf {U}}_\theta ^T {\mathbf {b}}&= {\mathbf {U}}_\theta {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\mathbf {U}}_\theta ^T {\mathbf {L}}_\theta {\mathbf {u}}+ {\mathbf {U}}_\theta {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\mathbf {U}}_\theta ^T ({\mathbf {b}}- {\mathbf {L}}_\theta {\mathbf {u}}) \nonumber \\&= {\mathbf {U}}_\theta {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\varvec{\Sigma }}_\theta {\mathbf {Z}}_\theta ^T {\mathbf {u}}+ {\mathbf {U}}_\theta {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\mathbf {U}}_\theta ^T ({\mathbf {b}}- {\mathbf {L}}_\theta {\mathbf {u}}). \end{aligned}$$
(55)

Since \(\Vert {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda }\Vert _2 \le 1\) and \({\mathbf {U}}_\theta \) is orthogonal, it follows that

$$\begin{aligned} \Vert {\mathbf {U}}_\theta {\hat{{\mathbf {D}}}}_{\theta ,\lambda } {\mathbf {U}}_\theta ^T ({\mathbf {b}}- {\mathbf {L}}_\theta {\mathbf {u}})\Vert ^2_2 \le \Vert {\mathbf {b}}- {\mathbf {L}}_\theta {\mathbf {u}}\Vert ^2_2. \end{aligned}$$
(56)

Furthermore, since

$$\begin{aligned} {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\varvec{\Sigma }}_\theta = \frac{\lambda {\varvec{\Gamma }}^2_{\theta }{\varvec{\Sigma }}_\theta }{{\varvec{\Sigma }}_{\theta }^2 + \lambda {\varvec{\Gamma }}_\theta ^2} \le \frac{1}{2} \sqrt{\lambda } {\varvec{\Gamma }}_\theta \le \frac{1}{2} \sqrt{\lambda } {\mathbf {I}}\end{aligned}$$
(57)

(where the inequalities between the diagonal matrices above are interpreted element-wise), it follows that

$$\begin{aligned} \Vert {\mathbf {U}}_\theta {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\varvec{\Sigma }}_\theta {\mathbf {Z}}_\theta ^T {\mathbf {u}}\Vert ^2_2 \le \Vert {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\varvec{\Sigma }}_\theta \Vert ^2_2\,\Vert {\mathbf {Z}}_\theta ^T {\mathbf {u}}\Vert ^2_2 \le \frac{1}{4}\lambda \Vert {\mathbf {Z}}_\theta ^T {\mathbf {u}}\Vert ^2_2. \end{aligned}$$
(58)

We next bound the noise component of the residual. Let \({\mathbf {W}}_\theta \in {\mathbb {R}}^{m \times (m-n)}\) be a matrix such that \({\mathbf {Q}}:= ({\mathbf {U}}_\theta | {\mathbf {W}}_\theta ) \in {\mathbb {R}}^{m \times m}\) is orthogonal and set

$$\begin{aligned} \varvec{\alpha }= \begin{pmatrix} \varvec{\alpha }_\parallel \\ \varvec{\alpha }_{\perp } \end{pmatrix} := {\mathbf {Q}}^T \varvec{\eta }= \begin{pmatrix} {\mathbf {U}}^T_\theta \varvec{\eta }\\ {\mathbf {W}}_\theta ^T \varvec{\eta }\end{pmatrix}. \end{aligned}$$
(59)

Then

$$\begin{aligned} \Vert ({\mathbf {I}}- {\mathbf {L}}_\theta {{\tilde{{\mathbf {L}}}}}_\theta ^{-1} ) \varvec{\eta }\Vert ^2_2 = \Vert {\mathbf {U}}_\theta {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\mathbf {U}}_\theta ^T \varvec{\eta }+ ({\mathbf {I}}- {\mathbf {U}}_\theta {\mathbf {U}}_\theta ^T) \varvec{\eta }\Vert ^2_2 = \Vert {\mathbf {U}}_\theta {\hat{\mathbf {D}}}_{\theta ,\lambda } \varvec{\alpha }_\parallel \Vert ^2_2 + \Vert {\mathbf {W}}_\theta \varvec{\alpha }_{\perp }\Vert ^2_2, \end{aligned}$$
(60)

where the last equality uses the fact that the columns of \({\mathbf {U}}_\theta \) and \({\mathbf {W}}_\theta \) are orthogonal and \({\mathbf {I}}- {\mathbf {U}}_\theta {\mathbf {U}}_\theta ^T = {\mathbf {W}}_\theta {\mathbf {W}}_\theta ^T\). Due to the spherical symmetry assumption on \(\varvec{\eta }\), \(\varvec{\alpha }_\parallel \) and \(\varvec{\alpha }_{\perp }\) are spherically symmetric random variables of dimension n and \(m-n\), respectively, with components that are independent. Therefore

$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\mathbf {U}}_\theta {\hat{\mathbf {D}}}_{\theta ,\lambda }\varvec{\alpha }_\parallel \Vert ^2_2 \right]&= {\mathbb {E}}\left[ \Vert {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda }\varvec{\alpha }_\parallel \Vert ^2_2 \right] \nonumber \\&= \sum ^n_{i = 1}\left( \frac{\lambda \gamma ^2_i}{\sigma ^2_i + \lambda \gamma ^2_i} \right) ^2 {\mathbb {E}}\left[ \varvec{\eta }^2_i \right] = \frac{1}{m}{{\,\mathrm{Tr}\,}}({\hat{{\mathbf {D}}}}^2_{\theta ,\lambda }) {\mathbb {E}}\left[ \Vert \varvec{\eta }\Vert ^2_2 \right] \end{aligned}$$
(61)

and

$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\mathbf {W}}_\theta \varvec{\alpha }_\perp \Vert ^2_2 \right] = {\mathbb {E}}\left[ \Vert \varvec{\alpha }_\perp \Vert ^2_2 \right] = \frac{m-n}{m}{\mathbb {E}}\left[ \Vert \varvec{\eta }\Vert ^2_2 \right] . \end{aligned}$$
(62)

This completes the proof. \(\square \)
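
The algebra above is straightforward to sanity-check numerically. Below is a minimal sketch, assuming a random overdetermined \({\mathbf {L}}\), \({\mathbf {C}}= {\mathbf {I}}\), and a single noise draw; the variable names are illustrative and not taken from the authors' code. It solves the normal equations of Eq. 51 and splits the residual into the two components appearing in Eq. 53.

```python
# Sanity check of Prop. 1's residual decomposition (illustrative names).
import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 200, 50, 1e-2
L = rng.standard_normal((m, n))      # stand-in for L_theta
C = np.eye(n)                        # Phi(v; lam) = lam * ||C v||_2^2
u = rng.standard_normal(n)           # "true" solution
b = L @ u                            # exact data
eta = 0.1 * rng.standard_normal(m)   # spherically symmetric noise

# Eq. 51: regularized solution via the normal equations
u_tilde = np.linalg.solve(L.T @ L + lam * C.T @ C, L.T @ (b + eta))

# Eq. 8: residual, split as in Eq. 53 into range(L) and its complement
r = (b + eta) - L @ u_tilde
U, _ = np.linalg.qr(L)               # orthonormal basis for range(L_theta)
r_par = U @ (U.T @ r)
r_perp = r - r_par

# The orthogonal part is exactly the noise outside range(L), since b lies in
# range(L); the in-range part is damped by the filter factors of Eq. 54.
print(np.linalg.norm(r_perp), np.linalg.norm(eta - U @ (U.T @ eta)))  # equal
print(np.linalg.norm(r_par))  # small, shrinking with lam (Eqs. 56-58)
```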

Proof (Proof of Proposition 2)

It is convenient to write Eq. 11 in the abstract form

$$\begin{aligned} {\text {EMD}}(\rho _1,\rho _2)&= \min _{m \in C(\rho _1,\rho _2)} {{\mathcal {T}}(m)}, \end{aligned}$$
(63)

where

$$\begin{aligned} {\mathcal {T}}(m)&= \int _\varOmega \Vert m\Vert _2\,dx \end{aligned}$$
(64)
$$\begin{aligned} C(\rho _1,\rho _2)&= \left\{ m : \begin{aligned} \quad&\nabla \cdot m(x) + \rho _2(x) - \rho _1(x) = 0\quad \forall x \in \varOmega ,\\&m(x) \cdot n(x) = 0 \quad \forall x \in \partial \varOmega \end{aligned} \right\} . \end{aligned}$$
(65)

In addition, for any \(f \in L^1(\varOmega )\), let \(m_f\) be a minimizer of \({\mathcal {T}}(m)\) over \(C(f^+,f^-)\) so that \({\text {struc}}\left[ f \right] = {\mathcal {T}}(m_f)\).

  1.

    We check absolute homogeneity, positivity, and the triangle inequality.

    (a)

      To check absolute homogeneity, let \(\lambda \in {\mathbb {R}}\) be a nonzero scalar. By linearity, \(|\lambda |m \in C(|\lambda | f,|\lambda | g)\) if and only if \(m \in C(f, g)\). Therefore

      $$\begin{aligned}&{\text {EMD}}(|\lambda | f, |\lambda | g) = \min _{m \in C(|\lambda |f,|\lambda | g)} {\mathcal {T}}(m) \nonumber \\&\quad = \min _{m \in C(f,g)} {\mathcal {T}}(|\lambda | m) = |\lambda | \min _{m \in C(f,g)} {\mathcal {T}}(m) = |\lambda | {\text {EMD}}(f,g). \end{aligned}$$
      (66)

      If \(\lambda > 0\), Eq. 66 implies that

      $$\begin{aligned} {\text {struc}}\left[ \lambda f \right] = {\text {EMD}}(\lambda f^+, \lambda f^-) = |\lambda | {\text {EMD}}(f^+, f^-) = |\lambda | {\text {struc}}\left[ f \right] \end{aligned}$$
      (67)

      If \(\lambda < 0\), then \((\lambda f)^{\pm } = |\lambda | f^{\mp }\). Again Eq. 66 implies that

      $$\begin{aligned}&{\text {struc}}\left[ \lambda f \right] = {\text {EMD}}((\lambda f)^+, (\lambda f)^-) = {\text {EMD}}(|\lambda | f^-, |\lambda | f^+) \nonumber \\&\quad = |\lambda | {\text {EMD}}(f^-, f^+) = |\lambda | {\text {EMD}}(f^+, f^-) = |\lambda | {\text {struc}}\left[ f \right] . \end{aligned}$$
      (68)

      Finally, if \(\lambda =0\), then the fact that \({\text {struc}}\left[ \lambda f \right] = \lambda {\text {struc}}\left[ f \right] = 0\) is trivial.

    (b)

      Positivity follows immediately from the positivity of \({\text {EMD}}\).

    (c)

      The triangle inequality follows from the fact that

      $$\begin{aligned} (f + g)^+ - (f + g)^- = (f^+ - f^-) + (g^+ - g^-) \end{aligned}$$
      (69)

      for all \(f, g \in L^1(\varOmega )\). Thus if \(m_f \in C(f^+,f^-)\) and \(m_g \in C(g^+,g^-)\), then \(m_f + m_g \in C\left( (f+g)^+,(f+g)^-\right) \). Along with the triangle inequality for \({\mathcal {T}}\), this implies that

      $$\begin{aligned} {\text {struc}}\left[ f + g \right]&\equiv {\mathcal {T}}(m_{f+g}) \le {\mathcal {T}}(m_{f} + m_g) \le {\mathcal {T}}(m_{f}) + {\mathcal {T}}(m_g) \nonumber \\&\equiv {\text {struc}}\left[ f \right] + {\text {struc}}\left[ g \right] . \end{aligned}$$
      (70)
  2.

    Because \(\frac{1}{|\varOmega |}\int _\varOmega (g + c)\, dx = \frac{1}{|\varOmega |}\int _\varOmega g\, dx + c,\) we have that \(g^+ = (g + c)^+\) and \(g^- = (g + c)^-\). Therefore

    $$\begin{aligned} {\text {struc}}\left[ g + c \right] = {\text {EMD}}\left( (g + c)^+,(g + c)^-\right) = {\text {EMD}}(g^+,g^-) = {\text {struc}}\left[ g \right] . \end{aligned}$$
    (71)
  3.

    Let \(g = 0\) in Eq. 71 above. Then

    $$\begin{aligned} {\text {struc}}\left[ c \right] = {\text {struc}}\left[ 0 \right] = 0, \quad \forall c \in {\mathbb {R}}. \end{aligned}$$
    (72)
  4.

    Because the constraint in Eq. 11 involves only the difference of \(\rho _1\) and \(\rho _2\), it follows that \({\text {EMD}}(\rho _1,\rho _2) = {\text {EMD}}(\rho _1 + f,\rho _2 + f)\) for any non-negative \(f \in L^1(\varOmega )\). Moreover, because \(\rho _2\) and \(\rho _1\) have the same mass, the average of \(\rho _2 - \rho _1\) is zero. Hence,

    $$\begin{aligned} {\text {struc}}\left[ \rho _2 - \rho _1 \right]&= {\text {EMD}}(\max (\rho _2 - \rho _1,0),\max (\rho _1 - \rho _2,0)) \nonumber \\&= {\text {EMD}}(\max (\rho _2 - \rho _1,0) + \min (\rho _1,\rho _2),\max (\rho _1 - \rho _2,0) \nonumber \\&\quad + \min (\rho _1,\rho _2)) \end{aligned}$$
    (73)

    Since \(\forall x,y \in {\mathbb {R}}, \max (x - y,0) + \min (x,y) = x\), it follows from Eq. 73 that

    $$\begin{aligned} {\text {struc}}\left[ \rho _2 - \rho _1 \right] = {\text {EMD}}(\rho _2,\rho _1) = {\text {EMD}}(\rho _1,\rho _2) \end{aligned}$$
    (74)

\(\square \)
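
In one spatial dimension the flux formulation in Eqs. 63–65 has a closed form: the zero-flux boundary conditions force \(m(x) = \int _0^x \left( f(x') - {{\bar{f}}}\right) dx'\), where \({{\bar{f}}}\) is the mean of f, so that \({\text {struc}}\left[ f \right] = \int _\varOmega |m| \,dx\). The following sketch uses this identity on a uniform grid to check the properties just proven; it is an illustration under these 1D assumptions, not the authors' implementation.

```python
# 1D struc[.] via the closed-form flux of Eqs. 63-65 (illustrative sketch).
import numpy as np

def struc(f, dx):
    m = np.cumsum(f - f.mean()) * dx    # flux; vanishes at both ends since f has zero mean
    return np.sum(np.abs(m)) * dx       # struc[f] = T(m) = int |m|

x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]
f = np.sin(2 * np.pi * x)

print(struc(f, dx))                       # a positive value
print(struc(3.0 * f, dx) / struc(f, dx))  # ~3: absolute homogeneity (Eq. 67)
print(struc(f + 5.0, dx) - struc(f, dx))  # ~0: shift invariance (Eq. 71)
print(struc(np.full_like(x, 2.0), dx))    # ~0: struc of a constant (Eq. 72)
```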

Before proving Theorems 1 and 2, we first prove several useful lemmas, which will be used extensively.

Lemma 2

(\({\text {EMD}}\) triangle inequality) Let \(\varOmega \subset {\mathbb {R}}^n\) be a bounded set, and let f, g, \(h \in L^{\infty }(\varOmega )\) satisfy \(\int _{\varOmega }f dx = \int _{\varOmega }h dx = \int _{\varOmega }g dx\). Then

$$\begin{aligned} {\text {EMD}}(f,g) \le {\text {EMD}}(f,h) + {\text {EMD}}(h,g). \end{aligned}$$
(75)

Proof

Recall from Prop. 2 that \({\text {struc}}\left[ f - g \right] = {\text {EMD}}(f,g)\). Then, by the triangle inequality for \({\text {struc}}\left[ \cdot \right] \),

$$\begin{aligned} {\text {EMD}}(f,g) = {\text {struc}}\left[ f - g \right] \le {\text {struc}}\left[ f - h \right] + {\text {struc}}\left[ h - g \right] = {\text {EMD}}(f, h) + {\text {EMD}}(h, g) \end{aligned}$$
(76)

\(\square \)

Lemma 3

(\({\text {struc}}\left[ \cdot \right] \) and \({\text {EMD}}\) of the mean) Let \(\varOmega \subset {\mathbb {R}}^n\) be a bounded set, \(f \in L^{\infty }(\varOmega )\), and \(\mu = \frac{1}{|\varOmega |}\int _{\varOmega }f dx\). Then

$$\begin{aligned} {\text {struc}}\left[ f \right] = {\text {EMD}}(f , \mu ). \end{aligned}$$
(77)

Proof

Recall from Prop. 2 that \({\text {EMD}}(f,g) = {\text {EMD}}(f + h, g + h)\); therefore

$$\begin{aligned} {\text {struc}}\left[ f \right] = {\text {EMD}}(f^+, f^-) = {\text {EMD}}(f^+ +(\mu - f^-), f^- +(\mu - f^-)) = {\text {EMD}}(f , \mu ). \end{aligned}$$
(78)

\(\square \)

Lemma 4

(\({\text {EMD}}\) Subadditivity) If \({\text {EMD}}(f_1,g_1)\) and \({\text {EMD}}(f_2, g_2)\) are well defined, then so too is \({\text {EMD}}(f_1 + f_2, g_1 + g_2)\), and

$$\begin{aligned} {\text {EMD}}(f_1 + f_2, g_1 + g_2) \le {\text {EMD}}(f_1, g_1) + {\text {EMD}}(f_2, g_2). \end{aligned}$$
(79)

Proof

We use the formulation of the \({\text {EMD}}\) in Eq. 10. Let \(\pi _1\) and \(\pi _2\) be optimal transport plans, i.e., minimizers of Eq. 10 subject to the constraints of Eq. 9, for \({\text {EMD}}(f_1, g_1)\) and \({\text {EMD}}(f_2, g_2)\), respectively. Then clearly

$$\begin{aligned} \int _{\varOmega } (\pi _1 + \pi _2)dx^{(2)}&= f_1 + f_2 \nonumber \\ \int _{\varOmega } (\pi _1 + \pi _2)dx^{(1)}&= g_1 + g_2 \nonumber \\ \pi _1 + \pi _2&\ge 0, \end{aligned}$$
(80)

and so by the minimality of the \({\text {EMD}}\),

$$\begin{aligned} {\text {EMD}}(f_1, g_1) + {\text {EMD}}(f_2, g_2)&= \int _{\varOmega \times \varOmega }c \pi _1 dx^{(1)} dx^{(2)} + \int _{\varOmega \times \varOmega }c \pi _2 dx^{(1)} dx^{(2)} \nonumber \\&= \int _{\varOmega \times \varOmega }c (\pi _1 + \pi _2) dx^{(1)} dx^{(2)} \nonumber \\&\ge \min _{\pi \ge 0} \int _{\varOmega \times \varOmega }c \pi dx^{(1)} dx^{(2)}\nonumber \\&= {\text {EMD}}(f_1 + f_2, g_1 + g_2) \end{aligned}$$
(81)

where \(\pi \) is subject to the constraints of Eq. 9 with \(\rho _1 = f_1 + f_2\) and \(\rho _2 = g_1 + g_2\). \(\square \)

Lemma 5

(\({\text {EMD}}\) is bounded by the \(L^1\) norm) Let \(\varOmega \) be a bounded set, and let \(l \ge \Vert x^{(1)} - x^{(2)}\Vert _2\) for all \(x^{(1)}, x^{(2)} \in \varOmega \). If \(f,g : \varOmega \rightarrow {\mathbb {R}}^+\), then

$$\begin{aligned} {\text {EMD}}(f,g) \le \frac{l}{2} \Vert f - g\Vert _{L^1(\varOmega )}. \end{aligned}$$
(82)

Proof

Let \(\gamma = \int _{\varOmega }{(f - g)^+}dx\) and let \({x^{\mathrm{c}}}\) be such that \(\Vert {x^{\mathrm{c}}}- x\Vert _2 \le l/2\) for all \(x \in \varOmega \). Then

$$\begin{aligned} {\text {EMD}}(f,g)&= {\text {struc}}\left[ f - g \right] \le {\text {EMD}}((f-g)^+,\gamma \delta _{{x^{\mathrm{c}}}}) + {\text {EMD}}(\gamma \delta _{{x^{\mathrm{c}}}}, (f-g)^-)\nonumber \\&\le \frac{l}{2} \Vert (f - g)^+\Vert _{L^1(\varOmega )} + \frac{l}{2} \Vert (f - g)^-\Vert _{L^1(\varOmega )} = \frac{l}{2} \Vert f - g\Vert _{L^1(\varOmega )} \end{aligned}$$
(83)

\(\square \)

Lemma 6

(Expectation bound by the standard deviation) Let \(\eta \) be a scalar random variable with zero mean such that \(\mathrm {Var}[\eta ]\) is finite. Then \({\mathbb {E}}\left[ |\eta | \right] \le \sqrt{ \mathrm {Var}[\eta ]}\).

Proof

Let \(\psi \) be the probability density of \(\eta \). By the Cauchy-Schwarz inequality,

$$\begin{aligned} {\mathbb {E}}\left[ |\eta | \right] \equiv \int ^\infty _{-\infty } |x|\psi (x)dx \le \left( \int ^{\infty }_{-\infty } x^2\psi (x) dx \right) ^\frac{1}{2} \left( \int ^\infty _{-\infty } \psi (x) dx \right) ^\frac{1}{2} = \big (\mathrm {Var}[\eta ] \big )^{1/2}. \end{aligned}$$
(84)

\(\square \)

We now proceed to the proof of Theorem 1, but first it is helpful to give a brief summary. To bound the EMD from above, we exhibit a candidate transport plan based on the multigrid strategy depicted in Fig. 13 for the case \(d=2\). The strategy is to divide the domain into square windows, each containing two square panels per side, as shown in Fig. 13. The mass in each window is then redistributed in such a way that the new distribution is constant on each window. Each window then becomes a panel in a window that is a factor of two larger in each dimension, and the process is repeated until the distribution on the entire square is constant. For \(d >2\), the plan is the same, except that each window is a hypercube containing \(2^d\) panels. The cost of the complete transport plan can be bounded by the sum of the costs of the transport plans for the individual steps. These costs are computed in the proof below, and their sum leads to the bound in Theorem 1.

Fig. 13

The multigrid idea of Theorem 1 when \(\ell =3\). At each step, a transport plan is computed in each \(2\times 2\) window. Then the same problem is solved at the next coarser scale. In the figures above, the arrow tip area is proportional to the mass transported at each substep. The function \(H_i\) is defined in Eq. 94
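
For readers who wish to experiment, the following is a minimal sketch of the coarsening functions \(H_i\) of Eq. 94 for \(d = 2\), assuming \(h_\ell \) is stored as a \(2^\ell \times 2^\ell \) array of panel values; the names are illustrative.

```python
# Block-averaging construction of H_i (Eq. 94) for d = 2 (illustrative).
import numpy as np

def coarsen(H):
    """Average each 2x2 window of panels into a single coarser panel."""
    n = H.shape[0] // 2
    return H.reshape(n, 2, n, 2).mean(axis=(1, 3))

rng = np.random.default_rng(1)
ell = 3
H = rng.standard_normal((2**ell, 2**ell))   # H_ell = h_ell: iid panel values

levels = [H]
for _ in range(ell):
    levels.append(coarsen(levels[-1]))      # H_{ell-1}, ..., H_0

# H_0 is a single constant panel equal to the global mean, so struc[H_0] = 0.
print(levels[-1].shape, float(levels[-1][0, 0]), H.mean())
```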

Proof (Proof of Theorem 1)

Since \({\text {struc}}\left[ h_\ell \right] = {\text {struc}}\left[ h_\ell - \mu \right] \), we can assume, without loss of generality, that \(h_\ell \) has zero mean. Consider first the case \(\ell = 1\), which will be used later for the general setting. We construct a two-step plan that first moves all of the mass in \(h_1^+\) to the point \({y^{\mathrm{c}}}= ( 1/2, \dots , 1/2)\) at the center of the domain and then moves the mass from \({y^{\mathrm{c}}}\) to \(h_1^-\) (see Note 1 above).

Let \(\gamma = \int _{\varOmega } h^+_1 dy = \int _{\varOmega } h^-_1 dy\), \(\mu _0 = \int _{\varOmega } h_1 dy\), and \(\gamma _{1,k} = |\eta _{1,k} - \mu _0| |\omega _{1,k}|\). Then \({\text {EMD}}(h^+_1,\gamma \delta _{{y^{\mathrm{c}}}}) = {\text {EMD}}(\gamma \delta _{{y^{\mathrm{c}}}},h^-_1)\) and

$$\begin{aligned} {\text {struc}}\left[ h_1 \right] \equiv {\text {EMD}}(h_1^+, h_1^-)&\le {\text {EMD}}(h_1^+, \gamma \delta _{{y^{\mathrm{c}}}}) + {\text {EMD}}(\gamma \delta _{{y^{\mathrm{c}}}}, h_1^-) \nonumber \\&= \sum _{k = 1}^{2^{d}} {\text {EMD}}\left( |\eta _{1,k} - \mu _0|\chi _{1,k}, \gamma _{1,k}\delta _{{y^{\mathrm{c}}}}\right) . \end{aligned}$$
(85)

Thus we turn our attention to computing the terms in the sum above. First,

$$\begin{aligned} {\text {EMD}}(|\eta _{1,k} - \mu _0|\chi _{1,k}, \gamma _{1,k} \delta _{{y^{\mathrm{c}}}}) = |\eta _{1,k} - \mu _0|{\text {EMD}}(\chi _{1,k},|\omega _{1,k}| \, \delta _{{y^{\mathrm{c}}}}). \end{aligned}$$
(86)

There is only one admissible transport plan (see Eq. 10) between \(\chi _{1,k}\) and \(|\omega _{1,k}|\delta _{{y^{\mathrm{c}}}}\); it simply moves the mass at each point of \(\omega _{1,k}\) to \({y^{\mathrm{c}}}\):

$$\begin{aligned} \pi \left( x^{(1)},x^{(2)}\right) = \chi _{1,k} (x^{(1)}) \times \delta _{{y^{\mathrm{c}}}}(x^{(2)}) \end{aligned}$$
(87)

If we consider the more general case where \(\omega _{1,k}\) has side length l, then upon a change of coordinates,

$$\begin{aligned} {\text {EMD}}(\chi _{1,k},|\omega _{1,k}| \delta _{{y^{\mathrm{c}}}})&= \int _\varOmega \int _\varOmega \Vert x^{(1)} - x^{(2)}\Vert _2 \,\chi _{1,k} (x^{(1)}) \times \delta _{{y^{\mathrm{c}}}}(x^{(2)})\, dx^{(1)} dx^{(2)}\nonumber \\&= \int _{\omega _{1,k}}\int _\varOmega \Vert x^{(1)} - x^{(2)}\Vert _2\,\delta _{{y^{\mathrm{c}}}}(x^{(2)})\, dx^{(2)}\, dx^{(1)} \nonumber \\&= \int _{\omega _{1,k}} \Vert x^{(1)} - {y^{\mathrm{c}}}\Vert _2 \,dx^{(1)} = \int _{\left[ 0,l\right] ^d}\Vert x^{(1)}\Vert _2 \,dx^{(1)} \nonumber \\&\le \sqrt{d} \int _{\left[ 0,l\right] ^d} \Vert x^{(1)}\Vert _\infty \,dx^{(1)} \le \sqrt{d}\, \frac{l^{d+1}}{2} \end{aligned}$$
(88)

Substituting Eqs. 86 and 88 into Eq. 85 gives

$$\begin{aligned} {\text {struc}}\left[ h_1 \right]&\le \sum _{k=1}^{2^d} |\eta _{1,k} - \mu _0|\frac{\sqrt{d}l^{d+1}}{2} = \frac{\sqrt{d}}{2^{d+2}} \sum _{k=1}^{2^d} |\eta _{1,k} - \mu _0|, \end{aligned}$$
(89)

where we have used the fact that when \(\ell = 1\), \(l = 2^{-1}\). A standard calculation shows that

$$\begin{aligned} \mathrm {Var}\left( \left| \eta _{1,k} - \mu _0\right| \right) \le \mathrm {Var}(|\eta _{1,k}|),\quad k = 1,\dots ,2^{d}. \end{aligned}$$
(90)

Further, \({\mathbb {E}}\left[ \eta _{1,k} \right] = 0\), and so Lemma 6 gives

$$\begin{aligned} {\mathbb {E}}\left[ |\eta _{1,k} - \mu _0| \right] \le \sigma \end{aligned}$$
(91)

which we combine with Eq. 89 to get

$$\begin{aligned} {\mathbb {E}}\left[ {\text {struc}}\left[ h_1 \right] \right]&\le \frac{\sqrt{d}}{2^{d+2}} \sum _{k = 1}^{2^d} {\mathbb {E}}\left[ |\eta _{1,k} - \mu _0| \right] \le \frac{\sqrt{d}\, 2^{d}}{2^{d+2}} \sigma = \frac{\sqrt{d}}{4}\sigma . \end{aligned}$$
(92)

Now we consider the case when \(\ell > 1\). Define the functions

$$\begin{aligned} H_\ell (y)= & {} h_\ell (y) = \sum ^{2^{\ell d}}_{k = 1} \eta _{\ell ,k}\chi _{\ell ,k}(y) \end{aligned}$$
(93)
$$\begin{aligned} H_i(y)= & {} \sum ^{2^{id}}_{k = 1} \mu _{i,k} \chi _{i,k}(y), \text { where } \mu _{i,k} = \frac{1}{|\omega _{i,k}|}\int _{\omega _{i,k}} H_{i+1}(y) dy, \quad i = 0,1,\dots ,\ell -1.\nonumber \\ \end{aligned}$$
(94)

Instances of \(H_i\) are shown in Fig. 13. The function \(h_\ell \) can be written as the telescoping sum

$$\begin{aligned} h_\ell = H_\ell = (H_\ell - H_{\ell - 1}) + (H_{\ell - 1} - H_{\ell - 2}) + \dots + (H_2 - H_1) + (H_1 - H_0) + H_0. \end{aligned}$$
(95)

Moreover, because \(H_i = \sum ^{2^{d(i-1)}}_{k = 1}H_i\chi _{i - 1,k}\), it follows that

$$\begin{aligned} H_i - H_{i-1} = \sum ^{2^{d(i-1)}}_{k = 1}s_{i-1,k}, \quad \text {where } s_{i-1,k}(y) = \left( H_i(y) - \mu _{i-1,k} \right) \chi _{i-1,k}(y). \end{aligned}$$
(96)

We apply \({\text {struc}}\left[ \cdot \right] \) to Eq. 95, using Eq. 96, the triangle inequality, and the fact that \({\text {struc}}\left[ H_0 \right] =0\) (because it is a constant). The result is

$$\begin{aligned} {\text {struc}}\left[ h_\ell \right] \le \sum ^{\ell }_{i = 1} {\text {struc}}\left[ H_i - H_{i - 1} \right] \le \sum ^{\ell }_{i = 1} \sum ^{2^{d(i - 1)}}_{k = 1} {\text {struc}}\left[ s_{i-1,k} \right] . \end{aligned}$$
(97)

To evaluate \({\text {struc}}\left[ s_{i-1,k} \right] \), we repeat the argument used to generate Eq. 89. This gives

$$\begin{aligned} {\text {struc}}\left[ s_{i-1,k} \right] \equiv {\text {EMD}}(s_{i-1,k}^+, s_{i-1,k}^-) \le \frac{\sqrt{d}l^{d+1}}{2} \sum _{k':\omega _{i,k'} \subset \omega _{i-1,k}} |\mu _{i,k'} - \mu _{i-1,k}|. \end{aligned}$$
(98)

By construction,

$$\begin{aligned} \mu _{i-1,k} = 2^{-d} \sum _{k':\omega _{i,k'} \subset \omega _{i-1,k}} \mu _{i,k'}. \end{aligned}$$
(99)

It follows that the random variable \(\mu _{i,k'} - \mu _{i-1,k}\) that appears in Eq. 98 has zero mean. Thus Lemma 6 applies and

$$\begin{aligned} {\mathbb {E}}\left[ |\mu _{i,k'} - \mu _{i-1,k}| \right] \le \left( \mathrm {Var}[|\mu _{i,k'} - \mu _{i-1,k}|]\right) ^{\frac{1}{2}}&\le \left( \mathrm {Var}[|\mu _{i,k'}|]\right) ^{\frac{1}{2}} := \sigma _i, \end{aligned}$$
(100)

where the last two inequalities above follow from standard probability theory. Also, because of Eq. 99, another standard probability result gives

$$\begin{aligned} \sigma _{i} = 2^{-\frac{d}{2}} \sigma _{i+1} = \dots = 2^{-\frac{d}{2}(\ell - i)} \sigma _\ell , \quad i = 1,\dots , \ell . \end{aligned}$$
(101)

We now take the expectation of Eq. 98, using the fact that \(\omega _{i,k'}\) has side length \(l = 2^{-i}\), along with the triangle inequality and Eq. 101. The result is

$$\begin{aligned} {\mathbb {E}}\left[ {\text {struc}}\left[ s_{i-1,k} \right] \right] \le \sqrt{d} 2^{-i(d+1)-1} \sum _{k':\omega _{i,k'} \subset \omega _{i-1,k}} 2^{-\frac{d}{2}(\ell - i)} \sigma _\ell = \sqrt{d} 2^{-\frac{id}{2} -i+d- \frac{d\ell }{2} -1} \sigma _\ell \end{aligned}$$
(102)

Substituting this bound into Eq. 97 gives

$$\begin{aligned} {\mathbb {E}}\left[ {\text {struc}}\left[ h_\ell \right] \right] \le \sum ^{\ell }_{i = 1} \sum ^{2^{d(i - 1)}}_{k = 1} \sqrt{d} 2^{-\frac{id}{2} -i+d- \frac{d\ell }{2} -1} \sigma _\ell = \frac{\sqrt{d}\sigma _\ell }{2^{1+\frac{\ell d}{2}}}\sum ^{\ell }_{i = 1} \left( 2^{\frac{d}{2} - 1}\right) ^i \end{aligned}$$
(103)

If \(d = 2,\) then \(2^{\frac{d}{2} - 1} = 1\) and Eq. 103 becomes

$$\begin{aligned} {\mathbb {E}}\left[ {\text {struc}}\left[ h_\ell \right] \right] = {\mathbb {E}}\left[ {\text {struc}}\left[ H_\ell \right] \right]&\le \frac{\sqrt{2}\,\sigma _\ell }{2^{1+\ell }}\,\ell \le \frac{\sigma _\ell \,\ell }{2^{\ell }}. \end{aligned}$$
(104)

If \(d \ge 3\), then \(2^{\frac{d}{2}-1}/(2^{\frac{d}{2}-1} -1) \le 4\), so the geometric sum in Eq. 103 is

$$\begin{aligned} \sum ^{\ell }_{i = 1} \left( 2^{\frac{d}{2} - 1}\right) ^i = \frac{2^{\left( \frac{d}{2} - 1\right) (\ell + 1)} - 2^{\frac{d}{2} - 1}}{2^{\frac{d}{2} - 1} - 1} \le \frac{2^{\frac{d}{2} - 1} 2^{\left( \frac{d}{2} - 1\right) \ell } }{{2^{\frac{d}{2} - 1} - 1}} \le 2^{\frac{\ell d}{2} - \ell + 2}. \end{aligned}$$
(105)

Thus for \(d \ge 3\),

$$\begin{aligned} {\mathbb {E}}\left[ {\text {struc}}\left[ h_\ell \right] \right]&\le \sqrt{d}\,\sigma _\ell \frac{2^{\frac{\ell d}{2} - \ell + 2}}{2^{1+\frac{\ell d}{2}}} = \sqrt{d}\,\sigma _\ell \, 2^{-\ell +1} \end{aligned}$$
(106)

Finally, setting \(\epsilon _\ell = 2^{-\ell }\) gives

$$\begin{aligned} {\mathbb {E}}\left[ {\text {struc}}\left[ h_\ell \right] \right] \le \sigma {\left\{ \begin{array}{ll} -\epsilon _\ell \log _2 (\epsilon _\ell )&{}\quad \text { when }d = 2\\ 2\sqrt{d}\epsilon _\ell &{}\quad \text { when }d > 2\\ \end{array}\right. } \end{aligned}$$
(107)

This completes the proof. \(\square \)
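
The variance scaling in Eq. 101 that drives this bound is easy to check by simulation: by Eq. 99 each parent panel value is the mean of its \(2^d\) independent children, so its standard deviation shrinks by the factor \(2^{-d/2}\). A rough Monte Carlo sketch with illustrative parameters:

```python
# Monte Carlo check of the scaling sigma_i = 2^{-d/2} sigma_{i+1} (Eq. 101).
import numpy as np

rng = np.random.default_rng(2)
d, sigma, trials = 2, 1.0, 200_000

eta = sigma * rng.standard_normal((trials, 2**d))  # one window's 2^d panels
mu = eta.mean(axis=1)                              # parent value, per Eq. 99
print(mu.std(), sigma * 2 ** (-d / 2))             # both ~0.5 when d = 2
```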

Proof (Proof of Lemma 1)

The proof follows directly from the definition of \(h_\ell \) in the statement of Theorem 1:

$$\begin{aligned} {\mathbb {E}}\left[ \Vert h_\ell \Vert _2^2 \right] = {\mathbb {E}}\left[ \int _{[0,1)^d} \left( h_\ell (y)\right) ^2 dy \right] = \sum ^{2^{\ell d}}_{k = 1} {\mathbb {E}}\left[ \eta _{\ell ,k}^2 \right] 2^{-\ell d} = 2^{-\ell d}\sum ^{2^{\ell d}}_{k=1} \sigma ^2= \sigma ^2. \end{aligned}$$
(108)

\(\square \)

Proof (Proof of Theorem 2)

Without loss of generality, assume that \(\phi \) is positive a.e. (If not, simply replace \(\phi \) by \(\phi - {{\,\mathrm{ess\,inf}\,}}{\phi }\) and use Eq. 71.) By construction, \(\phi \) and \(R_\ell \phi \) have the same average over Y, which we denote by \(\mu \). Thus by Lemmas 2 and 3,

$$\begin{aligned} {\text {struc}}\left[ R_\ell \phi \right] = {\text {EMD}}(R_\ell \phi ,\mu ) \le {\text {EMD}}(R_\ell \phi , \phi ) + {\text {EMD}}(\phi ,\mu ) = {\text {EMD}}(R_\ell \phi ,\phi ) + {\text {struc}}\left[ \phi \right] . \end{aligned}$$
(109)

Hence

$$\begin{aligned} {\text {struc}}\left[ R_\ell \phi \right] - {\text {struc}}\left[ \phi \right] \le {\text {EMD}}(R_\ell \phi , \phi ). \end{aligned}$$
(110)

On the other hand, switching the roles of \(R_\ell \phi \) and \(\phi \) in Eq. 109 gives

$$\begin{aligned} {\text {struc}}\left[ \phi \right] - {\text {struc}}\left[ R_\ell \phi \right] \le {\text {EMD}}(R_\ell \phi , \phi ) \end{aligned}$$
(111)

Together, Eqs. 110 and 111 imply the bound

$$\begin{aligned} |{\text {struc}}\left[ R_\ell \phi \right] - {\text {struc}}\left[ \phi \right] | \le {\text {EMD}}(R_\ell \phi , \phi ). \end{aligned}$$
(112)

We now bound \({\text {EMD}}(R_\ell \phi , \phi )\). For any \(\ell \) and i, \(\int _{\omega _{\ell ,i}} R_\ell \phi \, dy = \int _{\omega _{\ell ,i}} \phi \, dy\). Thus by Lemma 4,

$$\begin{aligned} {\text {EMD}}(R_\ell \phi , \phi ) \le \sum ^{2^{\ell d}}_{i = 1} {\text {EMD}}(R_\ell \phi \chi _{\ell ,i}, \phi \chi _{\ell ,i}) \end{aligned}$$
(113)

and further by Lemma 5, for \(i = 1, \dots , 2^{\ell d}\),

$$\begin{aligned} {\text {EMD}}(R_\ell \phi \chi _{\ell ,i}, \phi \chi _{\ell ,i}) \le \Vert R_\ell \phi - \phi \Vert _{L^1(\omega _{\ell ,i})} \, d^{1/2}\, 2^{-\ell } \end{aligned}$$
(114)

Now we bound \(\Vert R_\ell \phi - \phi \Vert _{L^1(\omega _{\ell , i})}\). Since \(\phi \in C^1\left( {{\overline{Y}}}\right) \), it follows that, for \(y \in \omega _{\ell ,i}\),

$$\begin{aligned} | R_\ell \phi (y) - \phi (y) |&= \frac{1}{|\omega _{\ell ,i}|} \left| \int _{\omega _{\ell ,i}} (\phi (y') - \phi (y)) dy' \right| \nonumber \\&\le \sup _{y \in \omega _{\ell ,i}} |\nabla \phi (y)| \sup _{y,y' \in \omega _{\ell ,i}} |y'-y| \le d^{1/2} 2^{-\ell } \sup _{y \in \omega _{\ell ,i}} |\nabla \phi (y)| \end{aligned}$$
(115)

Therefore

$$\begin{aligned} \Vert R_\ell \phi - \phi \Vert _{L^1(\omega _{\ell ,i})} \le |\omega _{\ell ,i}|\, d^{1/2}2^{-\ell }\sup _{y \in \omega _{\ell ,i}}|\nabla \phi (y)| = d^{1/2}2^{-(d+1) \ell }\sup _{y \in \omega _{\ell ,i}}|\nabla \phi (y)|. \end{aligned}$$
(116)

Combining Eqs. 112, 114, and 116 yields

$$\begin{aligned} |{\text {struc}}\left[ R_\ell \phi \right] - {\text {struc}}\left[ \phi \right] |&\le \sum ^{2^{\ell d}}_{i = 1} d 2^{-(d+2)\ell } \sup _{y \in \omega _{\ell ,i}}|\nabla \phi (y)| \le d2^{-2\ell }\sup _{y \in Y}|\nabla \phi (y)| \nonumber \\&\equiv C(|\nabla \phi |)d\epsilon _\ell ^{2}, \end{aligned}$$
(117)

where \(C(|\nabla \phi |) = \sup _{y \in Y} |\nabla \phi (y)|\) and \(\epsilon _\ell = 2^{-\ell }\). This completes the proof. \(\square \)
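
A rough one-dimensional check of this rate can reuse the closed-form \({\text {struc}}\left[ \cdot \right] \) sketch given after Proposition 2: replacing a smooth \(\phi \) by its panel averages \(R_\ell \phi \) should perturb \({\text {struc}}\) at the rate \(\epsilon _\ell ^2 = 4^{-\ell }\). This is an illustration for \(d = 1\) only, not the general d of Eq. 117, and the names are again illustrative.

```python
# 1D check that |struc[R_ell phi] - struc[phi]| = O(4^{-ell}) (illustrative).
import numpy as np

def struc(f, dx):
    m = np.cumsum(f - f.mean()) * dx
    return np.sum(np.abs(m)) * dx

N = 2**12
x = (np.arange(N) + 0.5) / N
phi = np.exp(np.sin(2 * np.pi * x))     # a smooth, positive test function

for ell in [2, 4, 6, 8]:
    w = N // 2**ell                     # cells per panel omega_{ell,i}
    R = np.repeat(phi.reshape(-1, w).mean(axis=1), w)   # R_ell phi
    gap = abs(struc(R, 1 / N) - struc(phi, 1 / N))
    print(ell, gap, gap / 4.0**(-ell))  # the ratio stays bounded
```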

Appendix B: Line Integral Operators

Recall from Sect. 2 the spaces \({\mathcal {U}}\) and \({\mathcal {B}}\) of functions defined on domains X and Y, respectively. An operator \({\mathcal {L}}:{\mathcal {U}}\rightarrow {\mathcal {B}}\) is a line integral operator (LIO) if, \(\forall f \in {\mathcal {U}},\)

$$\begin{aligned} ({\mathcal {L}}f)(y) = \int _{P_y} f(x) d\ell _x = \int ^1_0 f({{\hat{x}}}(t;y))\Vert {{\hat{x}}}'(t;y)\Vert _2 dt, \end{aligned}$$
(118)

where for each \(y \in Y\), \(P_y = \{ {{\hat{x}}}(t;y) : t \in (0,1) \} \subset X\), and \({{\hat{x}}}(t;y)\) is continuous in t and y. In particular, if f is a continuous function on X, then \({\mathcal {L}}f\) is continuous on Y. Figure 14a, b illustrate a LIO in two dimensions. The recipe we used to generate examples of \({\hat{x}}\) is given below.

Fig. 14

An example of a LIO. Points on the right are used to generate curves on the left of the same color. Coefficients for the parameterization in Eq. 120 of \(P_y\) come from Perlin noise

To discretize \({\mathcal {L}}\), we generate a path \(P_y\) for each hypercube \(\omega \subset Y\). Line integrals along these paths are approximated via quadrature. For all LIOs, we use the same quadratures and the same domains X and Y.

To construct the LIO for Experiments 1–3, we do the following; a code sketch of the complete construction is given after this list.

  1.

    Construction of numerical grids. In all of the computational examples, the domains X and Y are unit squares in \({\mathbb {R}}^2\). We discretize these domains with \(N^x\) and \(N^y\) points, respectively, on each side and define grid points

    $$\begin{aligned} x_{i,j}= & {} \left( i\varDelta x, j \varDelta x\right) , \quad 0 \le i,j \le N^x -1, \end{aligned}$$
    (119a)
    $$\begin{aligned} y_{k,l}= & {} \left( k\varDelta y, l \varDelta y\right) , \quad 0 \le k,l \le N^y -1, \end{aligned}$$
    (119b)

    where \(\varDelta x = 1/N^x\) and \(\varDelta y = 1/ N^y\). We then generate values \(u_{i,j}\) by sampling a prescribed function at the points \(x_{i,j}\). An illustrative example is given in Fig. 3a, where piecewise smooth rings have been sampled on a \(64\times 64\) grid.

  2.

    Generation of smooth paths. To form \({{\hat{x}}}\), we first sample coefficients \(\alpha _{p,r}\) for \(p = 0,\dots , 4\) and \(r = 1,2\) from Perlin noise [27, 28] of order four. In Fig. 14c, a realization of one such coefficient as a function of y is shown on a \(256\times 256\) grid. Given these coefficients, we let \({{\bar{x}}} =( {{\bar{x}}}^{(1)},{{\bar{x}}}^{(2)})\) have components that are polynomials in t:

    $$\begin{aligned} {{\bar{x}}}^{(r)}(t;y_{k,l})&= \sum ^{4}_{p = 0}\frac{\alpha _{p,r}(y_{k,l})}{p!} t^p, \quad r=1,2, \end{aligned}$$
    (120)

    and then let \({{\hat{x}}}\) be the following normalization of \({{\bar{x}}}\):

    $$\begin{aligned} {{\hat{x}}}^{(r)}(t;y_{k,l}) = \frac{{{\bar{x}}}^{(r)}(t;y_{k,l}) - \min _{s \in [0,1]} {{\bar{x}}}^{(r)}(s;y_{k,l})}{\max _{s \in [0,1]}\bar{x}^{(r)}(s;y_{k,l}) - \min _{s \in [0,1]}{{\bar{x}}}^{(r)}(s;y_{k,l})}, \quad r=1,2. \end{aligned}$$
    (121)
  3.

    Let the paths be given as

    $$\begin{aligned} {{\hat{x}}}(t; y_{k,l})= & {} \left( {{\hat{x}}}^{(1)}(t;y_{k,l}), {{\hat{x}}}^{(2)}(t;y_{k,l}) \right) \end{aligned}$$
    (122a)
    $$\begin{aligned} P_{y_{k,l}}= & {} \{ {{\hat{x}}}(t; y_{k,l}) :t \in [0, 1]\}. \end{aligned}$$
    (122b)

    To discretize \({\mathcal {L}}\), we approximate the integral in Eq. 118, for each grid point \(y_{k,l} \in Y\), using an arc length parameterization of the curve \(P_{y_{k,l}}\). The resulting quadrature takes the form

    $$\begin{aligned} ({\mathcal {L}}f) (y_{k, l}) \approx \sum _{q} w_q f(x_q) \end{aligned}$$
    (123)

    where \(\{x_q\} \subset X\) and each weight \(w_q > 0\). Because this quadrature involves points \(x_q\) not on the computational grid, we approximate the value \(f(x_q)\) by interpolating the grid function values \(f(x_{i,j})\). The result takes the form

    $$\begin{aligned} ({\mathcal {L}}f)(y_{k, l}) \approx \sum _{i, j} L_{(k, l),(i, j)}f(x_{i,j}), \end{aligned}$$
    (124)

    where the values \(L_{(k, l),(i, j)}\) are now the components of the matrix operator \({\mathbf {L}}\).
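
As referenced above, here is a self-contained sketch of the complete construction. Smooth random coefficients stand in for the Perlin noise of Step 2, and a simple quadrature in t with weight \(\Vert {{\hat{x}}}'(t)\Vert _2 \varDelta t\) replaces the exact arc length parameterization; bilinear interpolation supplies the matrix entries of Eq. 124. All names and parameter values are illustrative, not the authors' code.

```python
# Sketch of the LIO discretization in Eqs. 119-124 (illustrative throughout).
import numpy as np

Nx, Ny, Nt = 16, 32, 50                 # X grid, Y grid, quadrature points
t = np.linspace(0.0, 1.0, Nt)
dt = t[1] - t[0]
rng = np.random.default_rng(3)

# Step 2 stand-in: random coefficients alpha_{p,r} at each grid point y_{k,l}
alpha = rng.standard_normal((5, 2, Ny, Ny))
fact = np.array([1.0, 1.0, 2.0, 6.0, 24.0])      # p! for p = 0, ..., 4

L = np.zeros((Ny * Ny, Nx * Nx))                 # matrix operator L (Eq. 124)

for k in range(Ny):
    for l in range(Ny):
        # Eq. 120: polynomial path components x_bar^{(r)}(t; y_{k,l})
        xbar = np.stack([sum(alpha[p, r, k, l] / fact[p] * t**p
                             for p in range(5)) for r in range(2)])
        # Eq. 121: normalize each component into [0, 1]
        lo = xbar.min(axis=1, keepdims=True)
        hi = xbar.max(axis=1, keepdims=True)
        xhat = (xbar - lo) / (hi - lo + 1e-12)
        # quadrature weight ||x_hat'(t)||_2 * dt, cf. Eq. 118
        speed = np.linalg.norm(np.gradient(xhat, dt, axis=1), axis=0)

        # Eqs. 123-124: deposit each quadrature point onto the X grid
        row = np.zeros((Nx, Nx))
        g = np.clip(xhat * (Nx - 1), 0.0, Nx - 1 - 1e-9)
        i0 = g.astype(int)
        s = g - i0
        for q in range(Nt):
            w = speed[q] * dt
            i, j = i0[0, q], i0[1, q]
            si, sj = s[0, q], s[1, q]
            row[i, j] += w * (1 - si) * (1 - sj)      # bilinear weights
            row[i + 1, j] += w * si * (1 - sj)
            row[i, j + 1] += w * (1 - si) * sj
            row[i + 1, j + 1] += w * si * sj
        L[k * Ny + l] = row.ravel()

print(L.shape)   # (1024, 256): an overdetermined discrete LIO
```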


Cite this article

Puthawala, M.A., Hauck, C.D. & Osher, S.J. Diagnosing Forward Operator Error Using Optimal Transport. J Sci Comput 80, 1549–1576 (2019). https://doi.org/10.1007/s10915-019-00989-0
