
Diagnosing Forward Operator Error Using Optimal Transport


Abstract

We investigate overdetermined linear inverse problems for which the forward operator may not be given accurately. We introduce a new tool called the structure, based on the Wasserstein distance, and propose its use to diagnose and remedy forward operator error. Computing the structure reduces to an inexpensive calculation of the Earth Mover's Distance, a Euclidean, homogeneous degree-one transport distance, using recently developed algorithms. The structure is proven to distinguish between noise and signal in the residual, and it suggests a plan for recovering the true forward operator in some interesting cases. We expect this technique to be useful not only for diagnosing forward operator error, but also for correcting it, which we do in some simple cases presented below.


Notes

  1. While the definition of the EMD in Eq. 10 is still well-defined for delta functions, the formula in Eq. 11 is not. Thus, while we use Eq. 11 for numerical calculations, we often rely on Eq. 10 for theoretical bounds.

References

  1. Arridge, S.R.: Optical tomography in medical imaging. Inverse Problems 15(2), R41 (1999)

  2. Becker, S., Horesh, L., Aravkin, A., Zhuk, S.: General optimization framework for robust and regularized 3D full waveform inversion (2015). arXiv preprint arXiv:1504.04677

  3. Broyden, C.G.: The convergence of a class of double-rank minimization algorithms 1. General considerations. IMA J. Appl. Math. 6(1), 76–90 (1970)

  4. Chahine, M.T.: Inverse problems in radiative transfer: determination of atmospheric parameters. J. Atmos. Sci. 27(6), 960–967 (1970)

  5. Chan, T.F., Shen, J.J.: Image Processing and Analysis: Variational, PDE, Wavelet, and Stochastic Methods, vol. 94. SIAM, Philadelphia (2005)

  6. Daubechies, I.: Orthonormal bases of compactly supported wavelets. Commun. Pure Appl. Math. 41(7), 909–996 (1988)

  7. Engquist, B., Froese, B.D.: Application of the Wasserstein metric to seismic signals (2013). arXiv preprint arXiv:1311.4581

  8. Engquist, B., Froese, B.D., Yang, Y.: Optimal transport for seismic full waveform inversion (2016). arXiv preprint arXiv:1602.01540

  9. Evans, L.C.: Partial differential equations and Monge–Kantorovich mass transfer. Curr. Dev. Math. 1997(1), 65–126 (1997)

  10. Evans, L.C., Gangbo, W.: Differential Equations Methods for the Monge–Kantorovich Mass Transfer Problem, vol. 653. American Mathematical Society, Providence (1999)

  11. Fedorczak, N., Brochard, F., Bonhomme, G., Schneider, K., Farge, M., Monier-Garbet, P., et al.: Tomographic reconstruction of tokamak plasma light emission from single image using wavelet–vaguelette decomposition. Nuclear Fusion 52(1), 013005 (2011)

  12. Fletcher, R.: A new approach to variable metric algorithms. Comput. J. 13(3), 317–322 (1970)

  13. Freeman, A.: SAR calibration: an overview. IEEE Trans. Geosci. Remote Sens. 30(6), 1107–1121 (1992)

  14. Goldfarb, D.: A family of variable-metric methods derived by variational means. Math. Comput. 24(109), 23–26 (1970)

  15. Goldstein, T., Osher, S.: The split Bregman method for L1-regularized problems. SIAM J. Imaging Sci. 2(2), 323–343 (2009)

  16. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. Johns Hopkins University Press, Baltimore (1996)

  17. Golub, G.H., Hansen, P.C., O’Leary, D.P.: Tikhonov regularization and total least squares. SIAM J. Matrix Anal. Appl. 21(1), 185–194 (1999)

  18. Hansen, P.C.: Analysis of discrete ill-posed problems by means of the L-curve. SIAM Rev. 34(4), 561–580 (1992)

  19. Hansen, P.C., O’Leary, D.P.: The use of the L-curve in the regularization of discrete ill-posed problems. SIAM J. Sci. Comput. 14(6), 1487–1503 (1993)

  20. Jacobs, M., Léger, F., Li, W., Osher, S.: Solving large-scale optimization problems with a convergence rate independent of grid size (2018). arXiv preprint arXiv:1805.09453

  21. Kennedy, M.C., O’Hagan, A.: Bayesian calibration of computer models. J. R. Stat. Soc.: Ser. B (Stat. Methodol.) 63(3), 425–464 (2001)

  22. Kirsch, A.: An Introduction to the Mathematical Theory of Inverse Problems, vol. 120. Springer, New York (2011)

  23. Li, W., Osher, S., Gangbo, W.: A fast algorithm for Earth Mover's distance based on optimal transport and L1 type regularization (2016). arXiv preprint arXiv:1609.07092

  24. Li, W., Ryu, E.K., Osher, S., Yin, W., Gangbo, W.: A parallel method for Earth Mover's distance. J. Sci. Comput. 75(1), 182–197 (2018)

  25. Mallat, S.G.: Multiresolution approximations and wavelet orthonormal bases of \(L^2({\mathbb {R}})\). Trans. Am. Math. Soc. 315(1), 69–87 (1989)

  26. Oliver, D.S., Reynolds, A.C., Liu, N.: Inverse Theory for Petroleum Reservoir Characterization and History Matching. Cambridge University Press, Cambridge (2008)

  27. Perlin, K.: An image synthesizer. ACM SIGGRAPH Comput. Graph. 19(3), 287–296 (1985)

  28. Perlin, K.: Improving noise. In: ACM Transactions on Graphics (TOG), vol. 21, pp. 681–682. ACM (2002)

  29. Rudin, L.I., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D: Nonlinear Phenom. 60(1–4), 259–268 (1992)

  30. Ryu, E., Chen, Y., Li, W., Osher, S.: Vector and matrix optimal mass transport: theory, algorithm, and applications (2017). arXiv:1712.10279 [math.OC]

  31. Schneider, U., Pedroni, E., Lomax, A.: The calibration of CT Hounsfield units for radiotherapy treatment planning. Phys. Med. Biol. 41(1), 111 (1996)

  32. Shanno, D.F.: Conditioning of quasi-Newton methods for function minimization. Math. Comput. 24(111), 647–656 (1970)

  33. Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, New York (2008)

  34. Wingen, A., Shafer, M.W., Unterberg, E.A., Hill, J.C., Hillis, D.L.: Regularization of soft-X-ray imaging in the DIII-D tokamak. J. Comput. Phys. 289, 83–95 (2015)

  35. Yang, Y., Engquist, B., Sun, J., Hamfeldt, B.F.: Application of optimal transport and the quadratic Wasserstein metric to full-waveform inversion. Geophysics 83(1), R43–R62 (2018)

  36. Zhu, C., Byrd, R.H., Lu, P., Nocedal, J.: LBFGS-B: Fortran Subroutines for Large-Scale Bound Constrained Optimization. Report NAM-11, EECS Department, Northwestern University (1994)


Author information


Corresponding author

Correspondence to Michael A. Puthawala.


This manuscript has been authored, in part, by UT-Battelle, LLC, under Contract No. DE-AC0500OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for the United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan).

M. A. Puthawala and S. J. Osher: The research was sponsored by Department of Energy Grant DOE-SC0013838 and NSF (STROBE), NSFC 11671005. C. D. Hauck: The research was sponsored by the Office of Advanced Scientific Computing Research and performed at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725.

Appendices

Appendix A: Proofs

Proof (Proof of Proposition 1)

Given \(\varPhi ({\mathbf {v}};\lambda ) = \lambda \Vert {\mathbf {C}}{\mathbf {v}}\Vert ^2_2\), the normal equations for Eq. 6 are

$$\begin{aligned} ({\mathbf {L}}_\theta ^T{\mathbf {L}}_\theta + \lambda {\mathbf {C}}^T{\mathbf {C}}){{\tilde{{\mathbf {u}}}}}_{\theta ,\eta } = {\mathbf {L}}_\theta ^T ({\mathbf {b}}+ \varvec{\eta }). \end{aligned}$$
(51)

Therefore \({{\tilde{{\mathbf {L}}}}}^{-1}_\theta = ({\mathbf {L}}_\theta ^T {\mathbf {L}}_\theta + \lambda {\mathbf {C}}^T {\mathbf {C}})^{-1} {\mathbf {L}}_\theta ^T\). Using the GSVD in Eq. 18, a direct calculation gives

$$\begin{aligned} {\mathbf {L}}_\theta {{\tilde{{\mathbf {L}}}}}_\theta ^{-1} = {\mathbf {U}}_\theta {\mathbf {D}}_{\theta ,\lambda } {\mathbf {U}}_\theta ^T, \quad \text {where} \quad {\mathbf {D}}_{\theta ,\lambda } := \frac{ {\varvec{\Sigma }}^2_{\theta } }{ {\varvec{\Sigma }}_{\theta }^2 + \lambda {\varvec{\Gamma }}_{\theta }^2} \in {\mathbb {R}}^{n \times n}. \end{aligned}$$
(52)

Thus according to the definition of the residual in Eq. 8,

$$\begin{aligned} {\mathbf {r}}_{ \theta ,\eta }&= ({\mathbf {I}}- {\mathbf {L}}_\theta {{\tilde{{\mathbf {L}}}}}_\theta ^{-1} ) ( {\mathbf {b}}+ \varvec{\eta }) = {\mathbf {U}}_\theta {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\mathbf {U}}_\theta ^T ({\mathbf {b}}+ \varvec{\eta }) + ({\mathbf {I}}- {\mathbf {U}}_\theta {\mathbf {U}}_\theta ^T)({\mathbf {b}}+ \varvec{\eta }) \end{aligned}$$
(53)

where

$$\begin{aligned} {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } := ({\mathbf {I}}- {\mathbf {D}}_{\theta ,\lambda }) = \frac{\lambda {\varvec{\Gamma }}^2_{\theta }}{{\varvec{\Sigma }}_{\theta }^2 + \lambda {\varvec{\Gamma }}_\theta ^2} > 0. \end{aligned}$$
(54)

We first bound two of the deterministic components of the residual. Using the GSVD,

$$\begin{aligned} {\mathbf {U}}_\theta {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\mathbf {U}}_\theta ^T {\mathbf {b}}&= {\mathbf {U}}_\theta {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\mathbf {U}}_\theta ^T {\mathbf {L}}_\theta {\mathbf {u}}+ {\mathbf {U}}_\theta {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\mathbf {U}}_\theta ^T ({\mathbf {b}}- {\mathbf {L}}_\theta {\mathbf {u}}) \nonumber \\&= {\mathbf {U}}_\theta {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\varvec{\Sigma }}_\theta {\mathbf {Z}}_\theta ^T {\mathbf {u}}+ {\mathbf {U}}_\theta {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\mathbf {U}}_\theta ^T ({\mathbf {b}}- {\mathbf {L}}_\theta {\mathbf {u}}). \end{aligned}$$
(55)

Since \(\Vert {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda }\Vert _2 \le 1\) and \({\mathbf {U}}_\theta \) is orthogonal, it follows that

$$\begin{aligned} \Vert {\mathbf {U}}_\theta {\hat{{\mathbf {D}}}}_{\theta ,\lambda } {\mathbf {U}}_\theta ^T ({\mathbf {b}}- {\mathbf {L}}_\theta {\mathbf {u}})\Vert ^2_2 \le \Vert {\mathbf {b}}- {\mathbf {L}}_\theta {\mathbf {u}}\Vert ^2_2. \end{aligned}$$
(56)

Furthermore, since

$$\begin{aligned} {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\varvec{\Sigma }}_\theta = \frac{\lambda {\varvec{\Gamma }}^2_{\theta }{\varvec{\Sigma }}_\theta }{{\varvec{\Sigma }}_{\theta }^2 + \lambda {\varvec{\Gamma }}_\theta ^2} \le \frac{1}{2} \sqrt{\lambda } {\varvec{\Gamma }}_\theta \le \frac{1}{2} \sqrt{\lambda } {\mathbf {I}}\end{aligned}$$
(57)

(where the inequalities between the diagonal matrices above are interpreted element-wise), it follows that

$$\begin{aligned} \Vert {\mathbf {U}}_\theta {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\varvec{\Sigma }}_\theta {\mathbf {Z}}_\theta ^T {\mathbf {u}}\Vert ^2_2 \le \Vert {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\varvec{\Sigma }}_\theta \Vert ^2_2\,\Vert {\mathbf {Z}}_\theta ^T {\mathbf {u}}\Vert ^2_2 \le \frac{1}{4}\lambda \Vert {\mathbf {Z}}_\theta ^T {\mathbf {u}}\Vert ^2_2. \end{aligned}$$
(58)

We next bound the noise component of the residual. Let \({\mathbf {W}}_\theta \in {\mathbb {R}}^{m \times (m-n)}\) be a matrix such that \({\mathbf {Q}}:= ({\mathbf {U}}_\theta | {\mathbf {W}}_\theta ) \in {\mathbb {R}}^{m \times m}\) is orthogonal and set

$$\begin{aligned} \varvec{\alpha }= \begin{pmatrix} \varvec{\alpha }_\parallel \\ \varvec{\alpha }_{\perp } \end{pmatrix} := {\mathbf {Q}}^T \varvec{\eta }= \begin{pmatrix} {\mathbf {U}}^T_\theta \varvec{\eta }\\ {\mathbf {W}}_\theta ^T \varvec{\eta }\end{pmatrix}. \end{aligned}$$
(59)

Then

$$\begin{aligned} \Vert ({\mathbf {I}}- {\mathbf {L}}_\theta {{\tilde{{\mathbf {L}}}}}_\theta ^{-1} ) \varvec{\eta }\Vert ^2_2 = \Vert {\mathbf {U}}_\theta {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda } {\mathbf {U}}_\theta ^T \varvec{\eta }+ ({\mathbf {I}}- {\mathbf {U}}_\theta {\mathbf {U}}_\theta ^T) \varvec{\eta }\Vert ^2_2 = \Vert {\mathbf {U}}_\theta {\hat{\mathbf {D}}}_{\theta ,\lambda } \varvec{\alpha }_\parallel \Vert ^2_2 + \Vert {\mathbf {W}}_\theta \varvec{\alpha }_{\perp }\Vert ^2_2, \end{aligned}$$
(60)

where the last equality uses the fact that the columns of \({\mathbf {U}}_\theta \) and \({\mathbf {W}}_\theta \) are orthogonal and \({\mathbf {I}}- {\mathbf {U}}_\theta {\mathbf {U}}_\theta ^T = {\mathbf {W}}_\theta {\mathbf {W}}_\theta ^T\). Due to the spherical symmetry assumption on \(\varvec{\eta }\), \(\varvec{\alpha }_\parallel \) and \(\varvec{\alpha }_{\perp }\) are spherically symmetric random variables of dimension n and \(m-n\), respectively, with components that are independent. Therefore

$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\mathbf {U}}_\theta {\hat{\mathbf {D}}}_{\theta ,\lambda }\varvec{\alpha }_\parallel \Vert ^2_2 \right]&= {\mathbb {E}}\left[ \Vert {{\hat{{\mathbf {D}}}}}_{\theta ,\lambda }\varvec{\alpha }_\parallel \Vert ^2_2 \right] \nonumber \\&= \sum ^n_{i = 1}\left( \frac{\lambda \gamma ^2_i}{\sigma ^2_i + \lambda \gamma ^2_i} \right) ^2 {\mathbb {E}}\left[ \varvec{\eta }^2_i \right] = \frac{1}{m}{{\,\mathrm{Tr}\,}}({\hat{{\mathbf {D}}}}^2_{\theta ,\lambda }) {\mathbb {E}}\left[ \Vert \varvec{\eta }\Vert ^2_2 \right] \end{aligned}$$
(61)

and

$$\begin{aligned} {\mathbb {E}}\left[ \Vert {\mathbf {W}}_\theta \varvec{\alpha }_\perp \Vert ^2_2 \right] = {\mathbb {E}}\left[ \Vert \varvec{\alpha }_\perp \Vert ^2_2 \right] = \frac{m-n}{m}{\mathbb {E}}\left[ \Vert \varvec{\eta }\Vert ^2_2 \right] . \end{aligned}$$
(62)

This completes the proof. \(\square \)
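
The algebra above is straightforward to sanity-check numerically. Below is a minimal sketch, assuming a random overdetermined \({\mathbf {L}}\), \({\mathbf {C}}= {\mathbf {I}}\), and a single noise draw; the variable names are illustrative and not taken from the authors' code. It solves the normal equations of Eq. 51 and splits the residual into the two components appearing in Eq. 53.

```python
# Sanity check of Prop. 1's residual decomposition (illustrative names).
import numpy as np

rng = np.random.default_rng(0)
m, n, lam = 200, 50, 1e-2
L = rng.standard_normal((m, n))      # stand-in for L_theta
C = np.eye(n)                        # Phi(v; lam) = lam * ||C v||_2^2
u = rng.standard_normal(n)           # "true" solution
b = L @ u                            # exact data
eta = 0.1 * rng.standard_normal(m)   # spherically symmetric noise

# Eq. 51: regularized solution via the normal equations
u_tilde = np.linalg.solve(L.T @ L + lam * C.T @ C, L.T @ (b + eta))

# Eq. 8: residual, split as in Eq. 53 into range(L) and its complement
r = (b + eta) - L @ u_tilde
U, _ = np.linalg.qr(L)               # orthonormal basis for range(L_theta)
r_par = U @ (U.T @ r)
r_perp = r - r_par

# The orthogonal part is exactly the noise outside range(L), since b lies in
# range(L); the in-range part is damped by the filter factors of Eq. 54.
print(np.linalg.norm(r_perp), np.linalg.norm(eta - U @ (U.T @ eta)))  # equal
print(np.linalg.norm(r_par))  # small, shrinking with lam (Eqs. 56-58)
```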

Proof (Proof of Proposition 2)

It is convenient to write Eq. 11 in the abstract form

$$\begin{aligned} {\text {EMD}}(\rho _1,\rho _2)&= \min _{m \in C(\rho _1,\rho _2)} {{\mathcal {T}}(m)}, \end{aligned}$$
(63)

where

$$\begin{aligned} {\mathcal {T}}(m)&= \int _\varOmega \Vert m\Vert _2\,dx \end{aligned}$$
(64)
$$\begin{aligned} C(\rho _1,\rho _2)&= \left\{ m : \begin{aligned} \quad&\nabla \cdot m(x) + \rho _2(x) - \rho _1(x) = 0\quad \forall x \in \varOmega ,\\&m(x) \cdot n(x) = 0 \quad \forall x \in \partial \varOmega \end{aligned} \right\} . \end{aligned}$$
(65)

In addition, for any \(f \in L^1(\varOmega )\), let \(m_f\) be a minimizer of \({\mathcal {T}}(m)\) over \(C(f^+,f^-)\) so that \({\text {struc}}\left[ f \right] = {\mathcal {T}}(m_f)\).

  1.

    We check absolute homogeneity, positivity, and the triangle inequality.

    (a)

      To check absolute homogeneity, let \(\lambda \in {\mathbb {R}}\) be a nonzero scalar. By linearity, \(|\lambda |m \in C(|\lambda | f,|\lambda | g)\) if and only if \(m \in C(f, g)\). Therefore

      $$\begin{aligned}&{\text {EMD}}(|\lambda | f, |\lambda | g) = \min _{m \in C(|\lambda |f,|\lambda | g)} {\mathcal {T}}(m) \nonumber \\&\quad = \min _{m \in C(f,g)} {\mathcal {T}}(|\lambda | m) = |\lambda | \min _{m \in C(f,g)} {\mathcal {T}}(m) = |\lambda | {\text {EMD}}(f,g). \end{aligned}$$
      (66)

      If \(\lambda > 0\), Eq. 66 implies that

      $$\begin{aligned} {\text {struc}}\left[ \lambda f \right] = {\text {EMD}}(\lambda f^+, \lambda f^-) = |\lambda | {\text {EMD}}(f^+, f^-) = |\lambda | {\text {struc}}\left[ f \right] \end{aligned}$$
      (67)

      If \(\lambda < 0\), then \((\lambda f)^{\pm } = |\lambda | f^{\mp }\). Again Eq. 66 implies that

      $$\begin{aligned}&{\text {struc}}\left[ \lambda f \right] = {\text {EMD}}((\lambda f)^+, (\lambda f)^-) = {\text {EMD}}(|\lambda | f^-, |\lambda | f^+) \nonumber \\&\quad = |\lambda | {\text {EMD}}(f^-, f^+) = |\lambda | {\text {EMD}}(f^+, f^-) = |\lambda | {\text {struc}}\left[ f \right] . \end{aligned}$$
      (68)

      Finally, if \(\lambda =0\), then the fact that \({\text {struc}}\left[ \lambda f \right] = \lambda {\text {struc}}\left[ f \right] = 0\) is trivial.

    (b)

      Positivity follows immediately from the positivity of \({\text {EMD}}\).

    (c)

      The triangle inequality follows from the fact that

      $$\begin{aligned} (f + g)^+ - (f + g)^- = (f^+ - f^-) + (g^+ - g^-) \end{aligned}$$
      (69)

      for all \(f, g \in L^1(\varOmega )\). Thus if \(m_f \in C(f^+,f^-)\) and \(m_g \in C(g^+,g^-)\), then \(m_f + m_g \in C\left( (f+g)^+,(f+g)^-\right) \). Along with the triangle inequality for \({\mathcal {T}}\), this implies that

      $$\begin{aligned} {\text {struc}}\left[ f + g \right]&\equiv {\mathcal {T}}(m_{f+g}) \le {\mathcal {T}}(m_{f} + m_g) \le {\mathcal {T}}(m_{f}) + {\mathcal {T}}(m_g) \nonumber \\&\equiv {\text {struc}}\left[ f \right] + {\text {struc}}\left[ g \right] . \end{aligned}$$
      (70)
  2.

    Because \(\frac{1}{|\varOmega |}\int _\varOmega (g + c)\, dx = \frac{1}{|\varOmega |}\int _\varOmega g\, dx + c,\) we have that \(g^+ = (g + c)^+\) and \(g^- = (g + c)^-\). Therefore

    $$\begin{aligned} {\text {struc}}\left[ g + c \right] = {\text {EMD}}\left( (g + c)^+,(g + c)^-\right) = {\text {EMD}}(g^+,g^-) = {\text {struc}}\left[ g \right] . \end{aligned}$$
    (71)
  3.

    Let \(g = 0\) in Eq. 71 above. Then

    $$\begin{aligned} {\text {struc}}\left[ c \right] = {\text {struc}}\left[ 0 \right] = 0, \quad \forall c \in {\mathbb {R}}. \end{aligned}$$
    (72)
  4.

    Because the constraint in Eq. 11 involves only the difference of \(\rho _1\) and \(\rho _2\), it follows that \({\text {EMD}}(\rho _1,\rho _2) = {\text {EMD}}(\rho _1 + f,\rho _2 + f)\) for any non-negative \(f \in L^1(\varOmega )\). Moreover, because \(\rho _2\) and \(\rho _1\) have the same mass, the average of \(\rho _2 - \rho _1\) is zero. Hence,

    $$\begin{aligned} {\text {struc}}\left[ \rho _2 - \rho _1 \right]&= {\text {EMD}}(\max (\rho _2 - \rho _1,0),\max (\rho _1 - \rho _2,0)) \nonumber \\&= {\text {EMD}}(\max (\rho _2 - \rho _1,0) + \min (\rho _1,\rho _2),\max (\rho _1 - \rho _2,0) \nonumber \\&\quad + \min (\rho _1,\rho _2)) \end{aligned}$$
    (73)

    Since \(\forall x,y \in {\mathbb {R}}, \max (x - y,0) + \min (x,y) = x\), it follows from Eq. 73 that

    $$\begin{aligned} {\text {struc}}\left[ \rho _2 - \rho _1 \right] = {\text {EMD}}(\rho _2,\rho _1) = {\text {EMD}}(\rho _1,\rho _2) \end{aligned}$$
    (74)

\(\square \)
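
In one spatial dimension the flux formulation in Eqs. 63–65 has a closed form: the zero-flux boundary conditions force \(m(x) = \int _0^x \left( f(x') - {{\bar{f}}}\right) dx'\), where \({{\bar{f}}}\) is the mean of f, so that \({\text {struc}}\left[ f \right] = \int _\varOmega |m| \,dx\). The following sketch uses this identity on a uniform grid to check the properties just proven; it is an illustration under these 1D assumptions, not the authors' implementation.

```python
# 1D struc[.] via the closed-form flux of Eqs. 63-65 (illustrative sketch).
import numpy as np

def struc(f, dx):
    m = np.cumsum(f - f.mean()) * dx    # flux; vanishes at both ends since f has zero mean
    return np.sum(np.abs(m)) * dx       # struc[f] = T(m) = int |m|

x = np.linspace(0.0, 1.0, 1001)
dx = x[1] - x[0]
f = np.sin(2 * np.pi * x)

print(struc(f, dx))                       # a positive value
print(struc(3.0 * f, dx) / struc(f, dx))  # ~3: absolute homogeneity (Eq. 67)
print(struc(f + 5.0, dx) - struc(f, dx))  # ~0: shift invariance (Eq. 71)
print(struc(np.full_like(x, 2.0), dx))    # ~0: struc of a constant (Eq. 72)
```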

Before proving Theorems 1 and 2, we first prove several useful lemmas, which will be used extensively.

Lemma 2

(\({\text {EMD}}\) triangle inequality) Let \(\varOmega \subset {\mathbb {R}}^n\) be a bounded set, and let f, g, \(h \in L^{\infty }(\varOmega )\) satisfy \(\int _{\varOmega }f dx = \int _{\varOmega }h dx = \int _{\varOmega }g dx\). Then

$$\begin{aligned} {\text {EMD}}(f,g) \le {\text {EMD}}(f,h) + {\text {EMD}}(h,g). \end{aligned}$$
(75)

Proof

Recall from Prop. 2 that \({\text {struc}}\left[ f - g \right] = {\text {EMD}}(f,g)\). Then, by the triangle inequality for \({\text {struc}}\left[ \cdot \right] \),

$$\begin{aligned} {\text {EMD}}(f,g) = {\text {struc}}\left[ f - g \right] \le {\text {struc}}\left[ f - h \right] + {\text {struc}}\left[ h - g \right] = {\text {EMD}}(f, h) + {\text {EMD}}(h, g) \end{aligned}$$
(76)

\(\square \)

Lemma 3

(\({\text {struc}}\left[ \cdot \right] \) and \({\text {EMD}}\) of the mean) Let \(\varOmega \subset {\mathbb {R}}^n\) be a bounded set, \(f \in L^{\infty }(\varOmega )\), and \(\mu = \frac{1}{|\varOmega |}\int _{\varOmega }f dx\). Then

$$\begin{aligned} {\text {struc}}\left[ f \right] = {\text {EMD}}(f , \mu ). \end{aligned}$$
(77)

Proof

Recall from Prop. 2 that \({\text {EMD}}(f,g) = {\text {EMD}}(f + h, g + h)\); therefore

$$\begin{aligned} {\text {struc}}\left[ f \right] = {\text {EMD}}(f^+, f^-) = {\text {EMD}}(f^+ +(\mu - f^-), f^- +(\mu - f^-)) = {\text {EMD}}(f , \mu ). \end{aligned}$$
(78)

\(\square \)

Lemma 4

(\({\text {EMD}}\) Subadditivity) If \({\text {EMD}}(f_1,g_1)\) and \({\text {EMD}}(f_2, g_2)\) are well defined, then so too is \({\text {EMD}}(f_1 + f_2, g_1 + g_2)\), and

$$\begin{aligned} {\text {EMD}}(f_1 + f_2, g_1 + g_2) \le {\text {EMD}}(f_1, g_1) + {\text {EMD}}(f_2, g_2). \end{aligned}$$
(79)

Proof

We use the formulation of the \({\text {EMD}}\) in Eq. 10. Let \(\pi _1\) and \(\pi _2\) be optimal transport plans, i.e., minimizers of Eq. 10 subject to the constraints of Eq. 9, for \({\text {EMD}}(f_1, g_1)\) and \({\text {EMD}}(f_2, g_2)\), respectively. Then clearly

$$\begin{aligned} \int _{\varOmega } (\pi _1 + \pi _2)dx^{(2)}&= f_1 + f_2 \nonumber \\ \int _{\varOmega } (\pi _1 + \pi _2)dx^{(1)}&= g_1 + g_2 \nonumber \\ \pi _1 + \pi _2&\ge 0, \end{aligned}$$
(80)

and so by the minimality of the \({\text {EMD}}\),

$$\begin{aligned} {\text {EMD}}(f_1, g_1) + {\text {EMD}}(f_2, g_2)&= \int _{\varOmega \times \varOmega }c \pi _1 dx^{(1)} dx^{(2)} + \int _{\varOmega \times \varOmega }c \pi _2 dx^{(1)} dx^{(2)} \nonumber \\&= \int _{\varOmega \times \varOmega }c (\pi _1 + \pi _2) dx^{(1)} dx^{(2)} \nonumber \\&\ge \min _{\pi \ge 0} \int _{\varOmega \times \varOmega }c \pi dx^{(1)} dx^{(2)}\nonumber \\&= {\text {EMD}}(f_1 + f_2, g_1 + g_2) \end{aligned}$$
(81)

where \(\pi \) is subject to the constraints of Eq. 9 with \(\rho _1 = f_1 + f_2\) and \(\rho _2 = g_1 + g_2\). \(\square \)

Lemma 5

(\({\text {EMD}}\) is bounded by the \(L^1\) norm) Let \(\varOmega \) be a bounded set, and let \(l \ge \Vert x^{(1)} - x^{(2)}\Vert _2\) for all \(x^{(1)}, x^{(2)} \in \varOmega \). If \(f,g : \varOmega \rightarrow {\mathbb {R}}^+\), then

$$\begin{aligned} {\text {EMD}}(f,g) \le \frac{l}{2} \Vert f - g\Vert _{L^1(\varOmega )}. \end{aligned}$$
(82)

Proof

Let \(\gamma = \int _{\varOmega }{(f - g)^+}dx\) and let \({x^{\mathrm{c}}}\) be such that \(\Vert {x^{\mathrm{c}}}- x\Vert _2 \le l/2\) for all \(x \in \varOmega \). Then

$$\begin{aligned} {\text {EMD}}(f,g)&= {\text {struc}}\left[ f - g \right] \le {\text {EMD}}((f-g)^+,\gamma \delta _{{x^{\mathrm{c}}}}) + {\text {EMD}}(\gamma \delta _{{x^{\mathrm{c}}}}, (f-g)^-)\nonumber \\&\le \frac{l}{2} \Vert (f - g)^+\Vert _{L^1(\varOmega )} + \frac{l}{2} \Vert (f - g)^-\Vert _{L^1(\varOmega )} = \frac{l}{2} \Vert f - g\Vert _{L^1(\varOmega )} \end{aligned}$$
(83)

\(\square \)

Lemma 6

(Expectation bound by the standard deviation) Let \(\eta \) be a scalar random variable with zero mean such that \(\mathrm {Var}[\eta ]\) is finite. Then \({\mathbb {E}}\left[ |\eta | \right] \le \sqrt{ \mathrm {Var}[\eta ]}\).

Proof

Let \(\psi \) be the probability density of \(\eta \). By the Cauchy-Schwarz inequality,

$$\begin{aligned} {\mathbb {E}}\left[ |\eta | \right] \equiv \int ^\infty _{-\infty } |x|\psi (x)dx \le \left( \int ^{\infty }_{-\infty } x^2\psi (x) dx \right) ^\frac{1}{2} \left( \int ^\infty _{-\infty } \psi (x) dx \right) ^\frac{1}{2} = \big (\mathrm {Var}[\eta ] \big )^{1/2}. \end{aligned}$$
(84)

\(\square \)

We now proceed to the proof of Theorem 1, but first it is helpful to give a brief summary. To bound the EMD from above, we exhibit a candidate transport plan based on the multigrid strategy depicted in Fig. 13 for the case \(d=2\). The strategy is to divide the domain into square windows, each containing two square panels per side, as shown in Fig. 13. The mass in each window is then redistributed in such a way that the new distribution is constant on each window. Each window then becomes a panel in a window that is a factor of two larger in each dimension, and the process is repeated until the distribution on the entire square is constant. For \(d >2\), the plan is the same, except that each window is a hypercube containing \(2^d\) panels. The cost of the complete transport plan can be bounded by the sum of the costs of the transport plans for the individual steps. These costs are computed in the proof below, and their sum leads to the bound in Theorem 1.

Fig. 13

The multigrid idea of Theorem 1 when \(\ell =3\). At each step, a transport plan is computed in each \(2\times 2\) window. Then the same problem is solved at the next coarser scale. In the figures above, the arrow tip area is proportional to the mass transported at each substep. The function \(H_i\) is defined in Eq. 94
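
For readers who wish to experiment, the following is a minimal sketch of the coarsening functions \(H_i\) of Eq. 94 for \(d = 2\), assuming \(h_\ell \) is stored as a \(2^\ell \times 2^\ell \) array of panel values; the names are illustrative.

```python
# Block-averaging construction of H_i (Eq. 94) for d = 2 (illustrative).
import numpy as np

def coarsen(H):
    """Average each 2x2 window of panels into a single coarser panel."""
    n = H.shape[0] // 2
    return H.reshape(n, 2, n, 2).mean(axis=(1, 3))

rng = np.random.default_rng(1)
ell = 3
H = rng.standard_normal((2**ell, 2**ell))   # H_ell = h_ell: iid panel values

levels = [H]
for _ in range(ell):
    levels.append(coarsen(levels[-1]))      # H_{ell-1}, ..., H_0

# H_0 is a single constant panel equal to the global mean, so struc[H_0] = 0.
print(levels[-1].shape, float(levels[-1][0, 0]), H.mean())
```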

Proof (Proof of Theorem 1)

Since \({\text {struc}}\left[ h_\ell \right] = {\text {struc}}\left[ h_\ell - \mu \right] \), we can assume, without loss of generality, that \(h_\ell \) has zero mean. Consider first the case \(\ell = 1\), which will be used later for the general setting. We construct a two-step plan that first moves all of the mass in \(h_1^+\) to the point \({y^{\mathrm{c}}}= ( 1/2, \dots , 1/2)\) at the center of the domain and then moves the mass from \({y^{\mathrm{c}}}\) to \(h_1^-\) (see Note 1 above).

Let \(\gamma = \int _{\varOmega } h^+_1 dy = \int _{\varOmega } h^-_1 dy\), \(\mu _0 = \int _{\varOmega } h_1 dy\), and \(\gamma _{1,k} = |\eta _{1,k} - \mu _0| |\omega _{1,k}|\). Then \({\text {EMD}}(h^+_1,\gamma \delta _{{y^{\mathrm{c}}}}) = {\text {EMD}}(\gamma \delta _{{y^{\mathrm{c}}}},h^-_1)\) and

$$\begin{aligned} {\text {struc}}\left[ h_1 \right] \equiv {\text {EMD}}(h_1^+, h_1^-)&\le {\text {EMD}}(h_1^+, \gamma \delta _{{y^{\mathrm{c}}}}) + {\text {EMD}}(\gamma \delta _{{y^{\mathrm{c}}}}, h_1^-) \nonumber \\&= \sum _{k = 1}^{2^{d}} {\text {EMD}}\left( |\eta _{1,k} - \mu _0|\chi _{1,k}, \gamma _{1,k}\delta _{{y^{\mathrm{c}}}}\right) . \end{aligned}$$
(85)

Thus we turn our attention to computing the terms in the sum above. First,

$$\begin{aligned} {\text {EMD}}(|\eta _{1,k} - \mu _0|\chi _{1,k}, \gamma _{1,k} \delta _{{y^{\mathrm{c}}}}) = |\eta _{1,k} - \mu _0|{\text {EMD}}(\chi _{1,k},|\omega _{1,k}| \, \delta _{{y^{\mathrm{c}}}}). \end{aligned}$$
(86)

There is only one admissible transport plan (see Eq. 10) between \(\chi _{1,k}\) and \(|\omega _{1,k}|\delta _{{y^{\mathrm{c}}}}\); it simply moves the mass at each point of \(\omega _{1,k}\) to \({y^{\mathrm{c}}}\):

$$\begin{aligned} \pi \left( x^{(1)},x^{(2)}\right) = \chi _{1,k} (x^{(1)}) \times \delta _{{y^{\mathrm{c}}}}(x^{(2)}) \end{aligned}$$
(87)

If we consider the more general case where \(\omega _{1,k}\) has side length l, then upon a change of coordinates,

$$\begin{aligned} {\text {EMD}}(\chi _{1,k},|\omega _{1,k}| \delta _{{y^{\mathrm{c}}}})&= \int _\varOmega \int _\varOmega \Vert x^{(1)} - x^{(2)}\Vert _2 \,\chi _{1,k} (x^{(1)}) \times \delta _{{y^{\mathrm{c}}}}(x^{(2)})\, dx^{(1)} dx^{(2)}\nonumber \\&= \int _{\omega _{1,k}}\int _\varOmega \Vert x^{(1)} - x^{(2)}\Vert _2\,\delta _{{y^{\mathrm{c}}}}(x^{(2)})\, dx^{(2)}\, dx^{(1)} \nonumber \\&= \int _{\omega _{1,k}} \Vert x^{(1)} - {y^{\mathrm{c}}}\Vert _2 \,dx^{(1)} = \int _{\left[ 0,l\right] ^d}\Vert x^{(1)}\Vert _2 \,dx^{(1)} \nonumber \\&\le \sqrt{d} \int _{\left[ 0,l\right] ^d} \Vert x^{(1)}\Vert _\infty \,dx^{(1)} \le \sqrt{d}\, \frac{l^{d+1}}{2} \end{aligned}$$
(88)

Substituting Eqs. 86 and 88 into Eq. 85 gives

$$\begin{aligned} {\text {struc}}\left[ h_1 \right]&\le \sum _{k=1}^{2^d} |\eta _{1,k} - \mu _0|\frac{\sqrt{d}l^{d+1}}{2} = \frac{\sqrt{d}}{2^{d+2}} \sum _{k=1}^{2^d} |\eta _{1,k} - \mu _0|, \end{aligned}$$
(89)

where we have used the fact that when \(\ell = 1\), \(l = 2^{-1}\). A standard calculation shows that

$$\begin{aligned} \mathrm {Var}\left( \left| \eta _{1,k} - \mu _0\right| \right) \le \mathrm {Var}(|\eta _{1,k}|),\quad k = 1,\dots ,2^{d}. \end{aligned}$$
(90)

Further, \({\mathbb {E}}\left[ \eta _{1,k} \right] = 0\), and so Lemma 6 gives

$$\begin{aligned} {\mathbb {E}}\left[ |\eta _{1,k} - \mu _0| \right] \le \sigma \end{aligned}$$
(91)

which we combine with Eq. 89 to get

$$\begin{aligned} {\mathbb {E}}\left[ {\text {struc}}\left[ h_1 \right] \right]&\le \frac{\sqrt{d}}{2^{d+2}} \sum _{k = 1}^{2^d} {\mathbb {E}}\left[ |\eta _{1,k} - \mu _0| \right] \le \frac{\sqrt{d}\, 2^{d}}{2^{d+2}} \sigma = \frac{\sqrt{d}}{4}\sigma . \end{aligned}$$
(92)

Now we consider the case when \(\ell > 1\). Define the functions

$$\begin{aligned} H_\ell (y)= & {} h_\ell (y) = \sum ^{2^{\ell d}}_{k = 1} \eta _{\ell ,k}\chi _{\ell ,k}(y) \end{aligned}$$
(93)
$$\begin{aligned} H_i(y)= & {} \sum ^{2^{id}}_{k = 1} \mu _{i,k} \chi _{i,k}(y), \text { where } \mu _{i,k} = \frac{1}{|\omega _{i,k}|}\int _{\omega _{i,k}} H_{i+1}(y) dy, \quad i = 0,1,\dots ,\ell -1.\nonumber \\ \end{aligned}$$
(94)

Instances of \(H_i\) are shown in Fig. 13. The function \(h_\ell \) can be written as the telescoping sum

$$\begin{aligned} h_\ell = H_\ell = (H_\ell - H_{\ell - 1}) + (H_{\ell - 1} - H_{\ell - 2}) + \dots + (H_2 - H_1) + (H_1 - H_0) + H_0. \end{aligned}$$
(95)

Moreover, because \(H_i = \sum ^{2^{d(i-1)}}_{k = 1}H_i\chi _{i - 1,k}\), it follows that

$$\begin{aligned} H_i - H_{i-1} = \sum ^{2^{d(i-1)}}_{k = 1}s_{i-1,k}, \quad \text {where } s_{i-1,k}(y) = \left( H_i(y) - \mu _{i-1,k} \right) \chi _{i-1,k}(y). \end{aligned}$$
(96)

We apply \({\text {struc}}\left[ \cdot \right] \) to Eq. 95, using Eq. 96, the triangle inequality, and the fact that \({\text {struc}}\left[ H_0 \right] =0\) (because it is a constant). The result is

$$\begin{aligned} {\text {struc}}\left[ h_\ell \right] \le \sum ^{\ell }_{i = 1} {\text {struc}}\left[ H_i - H_{i - 1} \right] \le \sum ^{\ell }_{i = 1} \sum ^{2^{d(i - 1)}}_{k = 1} {\text {struc}}\left[ s_{i-1,k} \right] . \end{aligned}$$
(97)

To evaluate \({\text {struc}}\left[ s_{i-1,k} \right] \), we repeat the argument used to generate Eq. 89. This gives

$$\begin{aligned} {\text {struc}}\left[ s_{i-1,k} \right] \equiv {\text {EMD}}(s_{i-1,k}^+, s_{i-1,k}^-) \le \frac{\sqrt{d}l^{d+1}}{2} \sum _{k':\omega _{i,k'} \subset \omega _{i-1,k}} |\mu _{i,k'} - \mu _{i-1,k}|. \end{aligned}$$
(98)

By construction,

$$\begin{aligned} \mu _{i-1,k} = 2^{-d} \sum _{k':\omega _{i,k'} \subset \omega _{i-1,k}} \mu _{i,k'}. \end{aligned}$$
(99)

It follows that the random variable \(\mu _{i,k'} - \mu _{i-1,k}\) that appears in Eq. 98 has zero mean. Thus Lemma 6 applies and

$$\begin{aligned} {\mathbb {E}}\left[ |\mu _{i,k'} - \mu _{i-1,k}| \right] \le \left( \mathrm {Var}[|\mu _{i,k'} - \mu _{i-1,k}|]\right) ^{\frac{1}{2}}&\le \left( \mathrm {Var}[|\mu _{i,k'}|]\right) ^{\frac{1}{2}} := \sigma _i, \end{aligned}$$
(100)

where the last two inequalities above follow from standard probability theory. Also, because of Eq. 99, another standard probability result gives

$$\begin{aligned} \sigma _{i} = 2^{-\frac{d}{2}} \sigma _{i+1} = \dots = 2^{-\frac{d}{2}(\ell - i)} \sigma _\ell , \quad i = 1,\dots , \ell . \end{aligned}$$
(101)

We now take the expectation of Eq. 98, using the fact that \(\omega _{i,k'}\) has side length \(l = 2^{-i}\), along with the triangle inequality and Eq. 101. The result is

$$\begin{aligned} {\mathbb {E}}\left[ {\text {struc}}\left[ s_{i-1,k} \right] \right] \le \sqrt{d} 2^{-i(d+1)-1} \sum _{k':\omega _{i,k'} \subset \omega _{i-1,k}} 2^{-\frac{d}{2}(\ell - i)} \sigma _\ell = \sqrt{d} 2^{-\frac{id}{2} -i+d- \frac{d\ell }{2} -1} \sigma _\ell \end{aligned}$$
(102)

Substituting this bound into Eq. 97 gives

$$\begin{aligned} {\mathbb {E}}\left[ {\text {struc}}\left[ h_\ell \right] \right] \le \sum ^{\ell }_{i = 1} \sum ^{2^{d(i - 1)}}_{k = 1} \sqrt{d} 2^{-\frac{id}{2} -i+d- \frac{d\ell }{2} -1} \sigma _\ell = \frac{\sqrt{d}\sigma _\ell }{2^{1+\frac{\ell d}{2}}}\sum ^{\ell }_{i = 1} \left( 2^{\frac{d}{2} - 1}\right) ^i \end{aligned}$$
(103)

If \(d = 2,\) then \(2^{\frac{d}{2} - 1} = 1\) and Eq. 103 becomes

$$\begin{aligned} {\mathbb {E}}\left[ {\text {struc}}\left[ h_\ell \right] \right] = {\mathbb {E}}\left[ {\text {struc}}\left[ H_\ell \right] \right]&\le \frac{\sqrt{2}\,\sigma _\ell }{2^{1+\ell }}\,\ell \le \frac{\sigma _\ell \,\ell }{2^{\ell }}. \end{aligned}$$
(104)

If \(d \ge 3\), then \(2^{\frac{d}{2}-1}/(2^{\frac{d}{2}-1} -1) \le 4\), so the geometric sum in Eq. 103 is

$$\begin{aligned} \sum ^{\ell }_{i = 1} \left( 2^{\frac{d}{2} - 1}\right) ^i = \frac{2^{\left( \frac{d}{2} - 1\right) (\ell + 1)} - 2^{\frac{d}{2} - 1}}{2^{\frac{d}{2} - 1} - 1} \le \frac{2^{\frac{d}{2} - 1} 2^{\left( \frac{d}{2} - 1\right) \ell } }{{2^{\frac{d}{2} - 1} - 1}} \le 2^{\frac{\ell d}{2} - \ell + 2}. \end{aligned}$$
(105)

Thus for \(d \ge 3\),

$$\begin{aligned} {\mathbb {E}}\left[ {\text {struc}}\left[ h_\ell \right] \right]&\le \sqrt{d}\,\sigma _\ell \frac{2^{\frac{\ell d}{2} - \ell + 2}}{2^{1+\frac{\ell d}{2}}} = \sqrt{d}\,\sigma _\ell \, 2^{-\ell +1} \end{aligned}$$
(106)

Finally, setting \(\epsilon _\ell = 2^{-\ell }\) gives

$$\begin{aligned} {\mathbb {E}}\left[ {\text {struc}}\left[ h_\ell \right] \right] \le \sigma {\left\{ \begin{array}{ll} -\epsilon _\ell \log _2 (\epsilon _\ell )&{}\quad \text { when }d = 2\\ 2\sqrt{d}\epsilon _\ell &{}\quad \text { when }d > 2\\ \end{array}\right. } \end{aligned}$$
(107)

This completes the proof. \(\square \)
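
The variance scaling in Eq. 101 that drives this bound is easy to check by simulation: by Eq. 99 each parent panel value is the mean of its \(2^d\) independent children, so its standard deviation shrinks by the factor \(2^{-d/2}\). A rough Monte Carlo sketch with illustrative parameters:

```python
# Monte Carlo check of the scaling sigma_i = 2^{-d/2} sigma_{i+1} (Eq. 101).
import numpy as np

rng = np.random.default_rng(2)
d, sigma, trials = 2, 1.0, 200_000

eta = sigma * rng.standard_normal((trials, 2**d))  # one window's 2^d panels
mu = eta.mean(axis=1)                              # parent value, per Eq. 99
print(mu.std(), sigma * 2 ** (-d / 2))             # both ~0.5 when d = 2
```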

Proof (Proof of Lemma 1)

The proof follows directly from the definition of \(h_\ell \) in the statement of Theorem 1:

$$\begin{aligned} {\mathbb {E}}\left[ \Vert h_\ell \Vert _2^2 \right] = {\mathbb {E}}\left[ \int _{[0,1)^d} \left( h_\ell (y)\right) ^2 dy \right] = \sum ^{2^{\ell d}}_{k = 1} {\mathbb {E}}\left[ \eta _{\ell ,k}^2 \right] 2^{-\ell d} = 2^{-\ell d}\sum ^{2^{\ell d}}_{k=1} \sigma ^2= \sigma ^2. \end{aligned}$$
(108)

\(\square \)

Proof (Proof of Theorem 2)

Without loss of generality, assume that \(\phi \) is positive a.e. (If not, simply replace \(\phi \) by \(\phi - {{\,\mathrm{ess\,inf}\,}}{\phi }\) and use Eq. 71.) By construction, \(\phi \) and \(R_\ell \phi \) have the same average over Y, which we denote by \(\mu \). Thus by Lemmas 2 and 3,

$$\begin{aligned} {\text {struc}}\left[ R_\ell \phi \right] = {\text {EMD}}(R_\ell \phi ,\mu ) \le {\text {EMD}}(R_\ell \phi , \phi ) + {\text {EMD}}(\phi ,\mu ) = {\text {EMD}}(R_\ell \phi ,\phi ) + {\text {struc}}\left[ \phi \right] . \end{aligned}$$
(109)

Hence

$$\begin{aligned} {\text {struc}}\left[ R_\ell \phi \right] - {\text {struc}}\left[ \phi \right] \le {\text {EMD}}(R_\ell \phi , \phi ). \end{aligned}$$
(110)

On the other hand, switching the roles of \(R_\ell \phi \) and \(\phi \) in Eq. 109 gives

$$\begin{aligned} {\text {struc}}\left[ \phi \right] - {\text {struc}}\left[ R_\ell \phi \right] \le {\text {EMD}}(R_\ell \phi , \phi ) \end{aligned}$$
(111)

Together, Eqs. 110 and 111 imply the bound

$$\begin{aligned} |{\text {struc}}\left[ R_\ell \phi \right] - {\text {struc}}\left[ \phi \right] | \le {\text {EMD}}(R_\ell \phi , \phi ). \end{aligned}$$
(112)

We now bound \({\text {EMD}}(R_\ell \phi , \phi )\). For any \(\ell \) and i, \(\int _{\omega _{\ell ,i}} R_\ell \phi \, dy = \int _{\omega _{\ell ,i}} \phi \, dy\). Thus by Lemma 4,

$$\begin{aligned} {\text {EMD}}(R_\ell \phi , \phi ) \le \sum ^{2^{\ell d}}_{i = 1} {\text {EMD}}(R_\ell \phi \chi _{\ell ,i}, \phi \chi _{\ell ,i}) \end{aligned}$$
(113)

and further by Lemma 5, for \(i = 1, \dots , 2^{\ell d}\),

$$\begin{aligned} {\text {EMD}}(R_\ell \phi \chi _{\ell ,i}, \phi \chi _{\ell ,i}) \le \Vert R_\ell \phi - \phi \Vert _{L^1(\omega _{\ell ,i})} \, d^{1/2}\, 2^{-\ell } \end{aligned}$$
(114)

Now we bound \(\Vert R_\ell \phi - \phi \Vert _{L^1(\omega _{\ell , i})}\). Since \(\phi \in C^1\left( {{\overline{Y}}}\right) \), it follows that, for \(y \in \omega _{\ell ,i}\),

$$\begin{aligned} | R_\ell \phi (y) - \phi (y) |&= \frac{1}{|\omega _{\ell ,i}|} \left| \int _{\omega _{\ell ,i}} (\phi (y') - \phi (y)) dy' \right| \nonumber \\&\le \sup _{y \in \omega _{\ell ,i}} |\nabla \phi (y)| \sup _{y,y' \in \omega _{\ell ,i}} |y'-y| \le d^{1/2} 2^{-\ell } \sup _{y \in \omega _{\ell ,i}} |\nabla \phi (y)| \end{aligned}$$
(115)

Therefore

$$\begin{aligned} \Vert R_\ell \phi - \phi \Vert _{L^1(\omega _{\ell ,i})} \le |\omega _{\ell ,i}|\, d^{1/2}2^{-\ell }\sup _{y \in \omega _{\ell ,i}}|\nabla \phi (y)| = d^{1/2}2^{-(d+1) \ell }\sup _{y \in \omega _{\ell ,i}}|\nabla \phi (y)|. \end{aligned}$$
(116)

Combining Eqs. 112, 114, and 116 yields

$$\begin{aligned} |{\text {struc}}\left[ R_\ell \phi \right] - {\text {struc}}\left[ \phi \right] |&\le \sum ^{2^{\ell d}}_{i = 1} d 2^{-(d+2)\ell } \sup _{y \in \omega _{\ell ,i}}|\nabla \phi (y)| \le d2^{-2\ell }\sup _{y \in Y}|\nabla \phi (y)| \nonumber \\&\equiv C(|\nabla \phi |)d\epsilon _\ell ^{2}, \end{aligned}$$
(117)

where \(C(|\nabla \phi |) = \sup _{y \in Y} |\nabla \phi (y)|\) and \(\epsilon _\ell = 2^{-\ell }\). This completes the proof. \(\square \)
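
A rough one-dimensional check of this rate can reuse the closed-form \({\text {struc}}\left[ \cdot \right] \) sketch given after Proposition 2: replacing a smooth \(\phi \) by its panel averages \(R_\ell \phi \) should perturb \({\text {struc}}\) at the rate \(\epsilon _\ell ^2 = 4^{-\ell }\). This is an illustration for \(d = 1\) only, not the general d of Eq. 117, and the names are again illustrative.

```python
# 1D check that |struc[R_ell phi] - struc[phi]| = O(4^{-ell}) (illustrative).
import numpy as np

def struc(f, dx):
    m = np.cumsum(f - f.mean()) * dx
    return np.sum(np.abs(m)) * dx

N = 2**12
x = (np.arange(N) + 0.5) / N
phi = np.exp(np.sin(2 * np.pi * x))     # a smooth, positive test function

for ell in [2, 4, 6, 8]:
    w = N // 2**ell                     # cells per panel omega_{ell,i}
    R = np.repeat(phi.reshape(-1, w).mean(axis=1), w)   # R_ell phi
    gap = abs(struc(R, 1 / N) - struc(phi, 1 / N))
    print(ell, gap, gap / 4.0**(-ell))  # the ratio stays bounded
```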

Appendix B: Line Integral Operators

Recall from Sect. 2 the spaces \({\mathcal {U}}\) and \({\mathcal {B}}\) of functions defined on domains X and Y, respectively. An operator \({\mathcal {L}}:{\mathcal {U}}\rightarrow {\mathcal {B}}\) is a line integral operator (LIO) if, \(\forall f \in {\mathcal {U}},\)

$$\begin{aligned} ({\mathcal {L}}f)(y) = \int _{P_y} f(x) d\ell _x = \int ^1_0 f({{\hat{x}}}(t;y))\Vert {{\hat{x}}}'(t;y)\Vert _2 dt, \end{aligned}$$
(118)

where for each \(y \in Y\), \(P_y = \{ {{\hat{x}}}(t;y) : t \in (0,1) \} \subset X\), and \({{\hat{x}}}(t;y)\) is continuous in t and y. In particular, if f is a continuous function on X, then \({\mathcal {L}}f\) is continuous on Y. Figure 14a, b illustrate a LIO in two dimensions. The recipe we used to generate examples of \({\hat{x}}\) is given below.

Fig. 14

An example of a LIO. Points on the right are used to generate curves on the left of the same color. Coefficients for the parameterization in Eq. 120 of \(P_y\) come from Perlin noise

To discretize \({\mathcal {L}}\), we generate a path \(P_y\) for each hypercube \(\omega \subset Y\). Line integrals along these paths are approximated via quadrature. For all LIOs, we use the same quadratures and the same domains X and Y.

To construct the LIO for Experiments 1–3, we do the following; a code sketch of the complete construction is given after this list.

  1.

    Construction of numerical grids. In all of the computational examples, the domains X and Y are unit squares in \({\mathbb {R}}^2\). We discretize these domains with \(N^x\) and \(N^y\) points, respectively, on each side and define grid points

    $$\begin{aligned} x_{i,j}= & {} \left( i\varDelta x, j \varDelta x\right) , \quad 0 \le i,j \le N^x -1, \end{aligned}$$
    (119a)
    $$\begin{aligned} y_{k,l}= & {} \left( k\varDelta y, l \varDelta y\right) , \quad 0 \le k,l \le N^y -1, \end{aligned}$$
    (119b)

    where \(\varDelta x = 1/N^x\) and \(\varDelta y = 1/ N^y\). We then generate values \(u_{i,j}\) by sampling a prescribed function at the points \(x_{i,j}\). An illustrative example is given in Fig. 3a, where piecewise smooth rings have been sampled on a \(64\times 64\) grid.

  2.

    Generation of smooth paths. To form \({{\hat{x}}}\), we first sample coefficients \(\alpha _{p,r}\) for \(p = 0,\dots , 4\) and \(r = 1,2\) from Perlin noise [27, 28] of order four. In Fig. 14c, a realization of one such coefficient as a function of y is shown on a \(256\times 256\) grid. Given these coefficients, we let \({{\bar{x}}} =( {{\bar{x}}}^{(1)},{{\bar{x}}}^{(2)})\) have components that are polynomials in t:

    $$\begin{aligned} {{\bar{x}}}^{(r)}(t;y_{k,l})&= \sum ^{4}_{p = 0}\frac{\alpha _{p,r}(y_{k,l})}{p!} t^p, \quad r=1,2, \end{aligned}$$
    (120)

    and then let \({{\hat{x}}}\) be the following normalization of \({{\bar{x}}}\):

    $$\begin{aligned} {{\hat{x}}}^{(r)}(t;y_{k,l}) = \frac{{{\bar{x}}}^{(r)}(t;y_{k,l}) - \min _{s \in [0,1]} {{\bar{x}}}^{(r)}(s;y_{k,l})}{\max _{s \in [0,1]}\bar{x}^{(r)}(s;y_{k,l}) - \min _{s \in [0,1]}{{\bar{x}}}^{(r)}(s;y_{k,l})}, \quad r=1,2. \end{aligned}$$
    (121)
  3.

    Let the paths be given as

    $$\begin{aligned} {{\hat{x}}}(t; y_{k,l})= & {} \left( {{\hat{x}}}^{(1)}(t;y_{k,l}), {{\hat{x}}}^{(2)}(t;y_{k,l}) \right) \end{aligned}$$
    (122a)
    $$\begin{aligned} P_{y_{k,l}}= & {} \{ {{\hat{x}}}(t; y_{k,l}) :t \in [0, 1]\}. \end{aligned}$$
    (122b)

    To discretize \({\mathcal {L}}\), we approximate the integral in Eq. 118, for each grid point \(y_{k,l} \in Y\), using an arc length parameterization of the curve \(P_{y_{k,l}}\). The resulting quadrature takes the form

    $$\begin{aligned} ({\mathcal {L}}f) (y_{k, l}) \approx \sum _{q} w_q f(x_q) \end{aligned}$$
    (123)

    where \(\{x_q\} \subset X\) and each weight \(w_q > 0\). Because this quadrature involves points \(x_q\) not on the computational grid, we approximate the value \(f(x_q)\) by interpolating the grid function values \(f(x_{i,j})\). The result takes the form

    $$\begin{aligned} ({\mathcal {L}}f)(y_{k, l}) \approx \sum _{i, j} L_{(k, l),(i, j)}f(x_{i,j}), \end{aligned}$$
    (124)

    where the values \(L_{(k, l),(i, j)}\) are now the components of the matrix operator \({\mathbf {L}}\).
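
As referenced above, here is a self-contained sketch of the complete construction. Smooth random coefficients stand in for the Perlin noise of Step 2, and a simple quadrature in t with weight \(\Vert {{\hat{x}}}'(t)\Vert _2 \varDelta t\) replaces the exact arc length parameterization; bilinear interpolation supplies the matrix entries of Eq. 124. All names and parameter values are illustrative, not the authors' code.

```python
# Sketch of the LIO discretization in Eqs. 119-124 (illustrative throughout).
import numpy as np

Nx, Ny, Nt = 16, 32, 50                 # X grid, Y grid, quadrature points
t = np.linspace(0.0, 1.0, Nt)
dt = t[1] - t[0]
rng = np.random.default_rng(3)

# Step 2 stand-in: random coefficients alpha_{p,r} at each grid point y_{k,l}
alpha = rng.standard_normal((5, 2, Ny, Ny))
fact = np.array([1.0, 1.0, 2.0, 6.0, 24.0])      # p! for p = 0, ..., 4

L = np.zeros((Ny * Ny, Nx * Nx))                 # matrix operator L (Eq. 124)

for k in range(Ny):
    for l in range(Ny):
        # Eq. 120: polynomial path components x_bar^{(r)}(t; y_{k,l})
        xbar = np.stack([sum(alpha[p, r, k, l] / fact[p] * t**p
                             for p in range(5)) for r in range(2)])
        # Eq. 121: normalize each component into [0, 1]
        lo = xbar.min(axis=1, keepdims=True)
        hi = xbar.max(axis=1, keepdims=True)
        xhat = (xbar - lo) / (hi - lo + 1e-12)
        # quadrature weight ||x_hat'(t)||_2 * dt, cf. Eq. 118
        speed = np.linalg.norm(np.gradient(xhat, dt, axis=1), axis=0)

        # Eqs. 123-124: deposit each quadrature point onto the X grid
        row = np.zeros((Nx, Nx))
        g = np.clip(xhat * (Nx - 1), 0.0, Nx - 1 - 1e-9)
        i0 = g.astype(int)
        s = g - i0
        for q in range(Nt):
            w = speed[q] * dt
            i, j = i0[0, q], i0[1, q]
            si, sj = s[0, q], s[1, q]
            row[i, j] += w * (1 - si) * (1 - sj)      # bilinear weights
            row[i + 1, j] += w * si * (1 - sj)
            row[i, j + 1] += w * (1 - si) * sj
            row[i + 1, j + 1] += w * si * sj
        L[k * Ny + l] = row.ravel()

print(L.shape)   # (1024, 256): an overdetermined discrete LIO
```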


Cite this article

Puthawala, M.A., Hauck, C.D. & Osher, S.J. Diagnosing Forward Operator Error Using Optimal Transport. J Sci Comput 80, 1549–1576 (2019). https://doi.org/10.1007/s10915-019-00989-0
