Approximation properties of slice-matching operators

Abstract

Iterative slice-matching procedures are efficient schemes for transferring a source measure to a target measure, especially in high dimensions. These schemes have been successfully used in applications such as color transfer and shape retrieval, and are guaranteed to converge under regularity assumptions. In this paper, we explore approximation properties related to a single step of such iterative schemes by examining an associated slice-matching operator, which depends on a source measure, a target measure, and slicing directions. In particular, we demonstrate an invariance property with respect to the source measure, an equivariance property with respect to the target measure, and Lipschitz continuity with respect to the slicing directions. We furthermore establish error bounds for approximating the target measure by one step of the slice-matching scheme, and characterize situations in which the slice-matching operator recovers the optimal transport map between two measures. We also investigate connections to affine registration problems with respect to (sliced) Wasserstein distances. These connections can also be viewed as extensions of the invariance and equivariance properties of the slice-matching operator, and illustrate the extent to which slice-matching schemes incorporate affine effects.

Data Availability

The data that support the findings of this study are available from the corresponding author, [SL], upon reasonable request.

References

  1. Arjovsky, M., Chintala, S., Bottou, L.: Wasserstein generative adversarial networks. In: Precup, D., Teh, Y.W. (eds.) Proceedings of Machine Learning Research, vol. 70, pp. 214–223 (2017)

  2. Rubner, Y., Tomasi, C., Guibas, L.J.: The earth mover’s distance as a metric for image retrieval. Int. J. Comput. Vis. 40(2), 99–121 (2000)

  3. Kolouri, S., Park, S.R., Thorpe, M., Slepcev, D., Rohde, G.K.: Optimal mass transport: Signal processing and machine-learning applications. IEEE Signal Process. Mag. 34(4), 43–59 (2017)

  4. Peyré, G., Cuturi, M.: Computational optimal transport. Found. Trends Mach. Learn. 11(5–6), 355–607 (2019)

  5. Bousquet, O., Gelly, S., Tolstikhin, I., Simon-Gabriel, C.-J., Schoelkopf, B.: From optimal transport to generative modeling: the VEGAN cookbook (2017). arXiv preprint arXiv:1705.07642

  6. Baptista, R., Hosseini, B., Kovachki, N.B., Marzouk, Y.M., Sagiv, A.: An approximation theory framework for measure-transport sampling algorithms. arXiv:2302.13965 (2023)

  7. Lambert, M., Chewi, S., Bach, F., Bonnabel, S., Rigollet, P.: Variational inference via Wasserstein gradient flows. Adv. Neural Inform. Process. Syst. 35, 14434–14447 (2022)

  8. Lipman, Y., Chen, R.T., Ben-Hamu, H., Nickel, M., Le, M.: Flow matching for generative modeling. In: ICLR 2023 (2023)

  9. Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., Poole, B.: Score-based generative modeling through stochastic differential equations. In: International Conference on Learning Representations (ICLR) (2021)

  10. Chen, R.T.Q., Rubanova, Y., Bettencourt, J., Duvenaud, D.K.: Neural ordinary differential equations. In: Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. (eds.) Advances in Neural Information Processing Systems, vol. 31 (2018)

  11. Taghvaei, A., Mehta, P.G.: An optimal transport formulation of the linear feedback particle filter. In: 2016 American Control Conference (ACC), pp. 3614–3619 (2016). IEEE

  12. Kobyzev, I., Prince, S.J., Brubaker, M.A.: Normalizing flows: an introduction and review of current methods. IEEE Trans. Pattern Anal. Mach. Intell. 43(11), 3964–3979 (2020)

  13. Cuturi, M.: Sinkhorn distances: Lightspeed computation of optimal transport. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems 26, pp. 2292–2300 (2013)

  14. Kolouri, S., Pope, P.E., Martin, C.E., Rohde, G.K.: Sliced Wasserstein auto-encoders. In: International Conference on Learning Representations (2018)

  15. Bonet, C., Courty, N., Septier, F., Drumetz, L.: Efficient gradient flows in sliced-Wasserstein space. Transactions on Machine Learning Research (2022)

  16. Pitié, F., Kokaram, A.C., Dahyot, R.: Automated colour grading using colour distribution transfer. Comput. Vis. Image Understanding 107(1–2), 123–137 (2007)

  17. Bonneel, N., Rabin, J., Peyré, G., Pfister, H.: Sliced and Radon Wasserstein barycenters of measures. J. Math. Imaging Vis. 51, 22–45 (2015)

  18. Papamakarios, G.: Neural density estimation and likelihood-free inference. PhD thesis, University of Edinburgh (2019)

  19. Bonnotte, N.: Unidimensional and evolution methods for optimal transportation. PhD thesis, Université Paris-Sud, Scuola Normale Superiore (December 2013)

  20. Li, S., Moosmueller, C.: Measure transfer via stochastic slicing and matching (2023). arXiv:2307.05705

  21. Feydy, J., Charlier, B., Vialard, F.-X., Peyré, G.: Optimal transport for diffeomorphic registration. In: Descoteaux, M., Maier-Hein, L., Franz, A., Jannin, P., Collins, D.L., Duchesne, S. (eds.) Medical Image Computing and Computer Assisted Intervention-MICCAI 2017, pp. 291–299. Springer, Cham (2017)

  22. De Lara, L., González-Sanz, A., Loubes, J.-M.: Diffeomorphic registration using Sinkhorn divergences. SIAM J. Imaging Sci. 16(1), 250–279 (2023)

  23. Shen, Z., Feydy, J., Liu, P., Curiale, A.H., San Jose Estepar, R., Niethammer, M.: Accurate point cloud registration with robust optimal transport. Advances in Neural Information Processing Systems 34, 5373–5389 (2021)

  24. Rabin, J., Peyré, G., Delon, J., Bernot, M.: Wasserstein barycenter and its application to texture mixing. In: Scale Space and Variational Methods in Computer Vision: Third International Conference, SSVM 2011, Ein-Gedi, Israel, May 29–June 2, 2011, Revised Selected Papers 3, pp. 435–446 (2012). Springer

  25. Rabin, J., Peyré, G., Cohen, L.D.: Geodesic shape retrieval via optimal mass transport. In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) Computer Vision - ECCV 2010, pp. 771–784. Springer, Berlin, Heidelberg (2010)

  26. Ambrosio, L., Gigli, N.: A User’s Guide to Optimal Transport, pp. 1–155. Springer, Berlin, Heidelberg (2013)

  27. Khurana, V., Kannan, H., Cloninger, A., Moosmüller, C.: Supervised learning of sheared distributions using linearized optimal transport. Sampl. Theory Signal Process. Data Anal. 21(1) (2023)

  28. Villani, C.: Optimal Transport: Old and New. Grundlehren der mathematischen Wissenschaften, vol. 338. Springer, Berlin (2009)

  29. Brenier, Y.: Polar factorization and monotone rearrangement of vector-valued functions. Commun. Pure Appl. Math. 44(4), 375–417 (1991)

  30. Robbins, H., Monro, S.: A stochastic approximation method. Ann. Math. Stat. 22(3), 400–407 (1951)

  31. Aldroubi, A., Li, S., Rohde, G.K.: Partitioning signal classes using transport transforms for data analysis and machine learning. Sampl. Theory Signal Process. Data Anal. 19(6) (2021)

  32. Park, S.R., Kolouri, S., Kundu, S., Rohde, G.K.: The cumulative distribution transform and linear pattern classification. Appl. Comput. Harmonic Anal. 45(3), 616–641 (2018)

  33. Moosmüller, C., Cloninger, A.: Linear optimal transport embedding: Provable Wasserstein classification for certain rigid transformations and perturbations. Inform. Inference 12(1), 363–389 (2023)

  34. Brown, L.G.: A survey of image registration techniques. ACM Comput. Surv. 24(4), 325–376 (1992)

  35. Sotiras, A., Davatzikos, C., Paragios, N.: Deformable medical image registration: a survey. IEEE Trans. Med. Imaging 32(7), 1153–1190 (2013)

  36. Meckes, E.S.: The Random Matrix Theory of the Classical Compact Groups. Cambridge Tracts in Mathematics. Cambridge University Press, Cambridge (2019)

  37. Santambrogio, F.: Optimal Transport for Applied Mathematicians. Birkhäuser, Cham (2015)

  38. Zemel, Y., Panaretos, V.M.: Fréchet means and Procrustes analysis in Wasserstein space. Bernoulli 25(2), 932–976 (2019)

Download references

Acknowledgements

The authors appreciate helpful discussions with Dr. Hengrong Du regarding Example 4.1 and Proposition 28.

Funding

CM is supported by NSF award DMS-2306064 and by a seed grant from the School of Data Science and Society at UNC.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Shiying Li.

Ethics declarations

Conflict of interest

The authors have no competing or conflicting interests to declare that are relevant to the content of this article.

Additional information

Communicated by Mike Neamtu.

Appendices

Appendix A Proofs for Sect. 4

A.1 Key facts for the proof of Remark 4

Proposition 16

Let \({\mathcal {D}}({\mathbb {R}}^n)\) be the set of differentiable vector fields from \({\mathbb {R}}^n\) to \({\mathbb {R}}^n\).

$$\begin{aligned} \Big (\bigcap _{P \in O(n)}{\mathfrak {S}}(P)\Big ) \cap {\mathcal {D}}({\mathbb {R}}^n)= \{x \mapsto ax+b: a>0 \text { and } b\in {\mathbb {R}}^n \}. \end{aligned}$$

Proof

We need to show that a differentiable vector field \(S \in \bigcap _{P \in O(n)}{\mathfrak {S}}(P)\) is an isotropic scaling with translation. Choose \(P \in O(n)\) and write \(S(x) = \sum _{i=1}^n f_i^P(x\cdot \theta _{i})\theta _{i}\) with \(P=[\theta _1,\ldots ,\theta _n]\). Note that, using the standard basis, we can also write \(S(x)=\sum _{i=1}^n g_i(x_i)e_{i}\). Computing the Jacobian of \(S\) with respect to the two basis representations, we obtain

$$\begin{aligned} \begin{bmatrix} g_1'(x_1) &{} &{} \\ &{} \ddots &{} \\ &{} &{} g_n'(x_n) \end{bmatrix} = P \begin{bmatrix} {f_1^P}^{\prime }(x\cdot \theta _1) &{} &{} \\ &{} \ddots &{} \\ &{} &{} {f_n^P}^{\prime }(x\cdot \theta _n) \end{bmatrix}P^t. \end{aligned}$$

Hence the two diagonal matrices above are similar and therefore have the same diagonal entries, up to a possible reordering. Without loss of generality, by performing a column permutation of \(P\) and renaming the \(f_i^P\)'s if necessary, we assume that \(g_i^{\prime }(x_i)={f_i^P}^{\prime }(x\cdot \theta _i)\), \(i= 1,\ldots ,n\). Choosing an orthogonal matrix \(P\) with a column \(\theta _i\) all of whose entries are non-zero, one immediately derives that the diagonal entries \(g_i^{\prime }(x_i)\) coincide for any fixed \(x\). In summary,

$$\begin{aligned} g_1^{\prime }(x_1)= \cdots = g_n^{\prime }(x_n) = {f_1^P}^{\prime }({{x}}\cdot {{\theta }}_1) = \cdots = {f_n^P}^{\prime }({{x}}\cdot {{\theta }}_n) = a_{{x}}, \end{aligned}$$

where \(a_{{x}}\) is a constant depending on \({{x}}= [x_1, \cdots ,x_n]^t\in {\mathbb {R}}^n\). Since the diagonal element \(g_i^{\prime }(x_i)\) only depends on \(x_i\), it follows that \(a_{{{x}}}\) is a constant independent of \({{x}}\). Hence \(S({{x}})=a{{x}}+b\) for some \(a>0, b\in {\mathbb {R}}^n\). \(\square \)

Remark 11

In general, if \(T\in \bigcap _{P \in O(n)}{\mathfrak {S}}(P)\) is differentiable on an open set \(\Omega \subseteq {{\mathbb {R}}^n}\), then \(T|_{\Omega }: \Omega \rightarrow {\mathbb {R}}^n\) is an isotropic scaling with translation. In particular, \(\bigcap _{P \in O(n)}{\mathfrak {S}}(P)\) includes some piecewise isotropic scalings with translations.

A.2 Proof of Proposition 10

We need the following proposition to derive the proof of Proposition 10:

Proposition 17

Consider two directions \(\theta ,\nu \in S^{n-1}\), and assume that the maps \(T_{\sigma ^{\nu }}^{\mu ^{\nu }}\) are uniformly \(L\)-Lipschitz, i.e., there exists \(L>0\) such that \(|T_{\sigma ^{\nu }}^{\mu ^{\nu }}(x)-T_{\sigma ^{\nu }}^{\mu ^{\nu }}(y)| \le L |x-y|\) for all \(x,y\in {\mathbb {R}}\) and \(\nu \in S^{n-1}\). Then

$$\begin{aligned} \Vert T_{{\sigma }^{\theta }}^{\mu ^{\theta }}\circ {\mathcal {P}}_{\theta } - T_{{\sigma }^{\nu }}^{\mu ^{\nu }}\circ {\mathcal {P}}_{\nu }\Vert _{\sigma } \le (2L+1)C\Vert \theta - \nu \Vert _2, \end{aligned}$$

where \(C\) is the maximum of the square roots of the second moments of \(\sigma \) and \(\mu \).

Proof

$$\begin{aligned} \Vert T_{{\sigma }^{\theta }}^{\mu ^{\theta }}\circ {\mathcal {P}}_{\theta } - T_{{\sigma }^{\nu }}^{\mu ^{\nu }}\circ {\mathcal {P}}_{\nu }\Vert _{\sigma } \le \Vert T_{{\sigma }^{\theta }}^{\mu ^{\theta }}\circ {\mathcal {P}}_{\theta } - T_{{\sigma }^{\theta }}^{\mu ^{\nu }}\circ {\mathcal {P}}_{\theta }\Vert _{\sigma } + \Vert T_{{\sigma }^{\theta }}^{\mu ^{\nu }}\circ {\mathcal {P}}_{\theta } - T_{{\sigma }^{\nu }}^{\mu ^{\nu }}\circ {\mathcal {P}}_{\nu }\Vert _{\sigma } = (\diamond ). \end{aligned}$$

We bound these separately.

$$\begin{aligned} \Vert T_{{\sigma }^{\theta }}^{\mu ^{\theta }}\circ {\mathcal {P}}_{\theta } - T_{{\sigma }^{\theta }}^{\mu ^{\nu }}\circ {\mathcal {P}}_{\theta }\Vert _{\sigma }&= \Vert T_{{\sigma }^{\theta }}^{\mu ^{\theta }} - T_{{\sigma }^{\theta }}^{\mu ^{\nu }}\Vert _{\sigma ^{\theta }} = W_2\left( \mu ^{\theta }, \mu ^{\nu }\right) \\&\le \Vert {\mathcal {P}}_{\theta }-{\mathcal {P}}_{\nu }\Vert _{\mu } = \left( \int _{{\mathbb {R}}^n}|{\mathcal {P}}_{\theta }(x)-{\mathcal {P}}_{\nu }(x)|^2\, d\mu (x)\right) ^{1/2}\\&= \left( \int _{{\mathbb {R}}^n}|(\theta - \nu )\cdot x|^2\, d\mu (x)\right) ^{1/2}\\&\le \Vert \theta - \nu \Vert _2 \left( \int _{{\mathbb {R}}^n}\Vert x\Vert ^2\, d\mu (x)\right) ^{1/2}\\&\le C\Vert \theta - \nu \Vert _2, \end{aligned}$$

where \(C\) bounds the square roots of the second moments of \(\sigma \) and \(\mu \), which are finite by assumption. For the second term, note that on \({\mathbb {R}}\) we have \(T_{\sigma ^{\theta }}^{\mu ^{\nu }} = T_{\sigma ^{\nu }}^{\mu ^{\nu }} \circ T_{\sigma ^{\theta }}^{\sigma ^{\nu }}\), so that

$$\begin{aligned}&\Vert T_{{\sigma }^{\theta }}^{\mu ^{\nu }}\circ {\mathcal {P}}_{\theta } - T_{{\sigma }^{\nu }}^{\mu ^{\nu }}\circ {\mathcal {P}}_{\nu }\Vert _{\sigma }\\&= \left( \int _{{\mathbb {R}}^n} |T_{\sigma ^{\nu }}^{\mu ^{\nu }}(T_{\sigma ^{\theta }}^{\sigma ^{\nu }}( {\mathcal {P}}_{\theta }(x))) - T_{{\sigma }^{\nu }}^{\mu ^{\nu }}({\mathcal {P}}_{\nu }(x))|^2\, d\sigma (x)\right) ^{1/2} = (\star ) \end{aligned}$$

Since \(T_{\sigma ^{\nu }}^{\mu ^{\nu }}\) is L-Lipschitz, we get

$$\begin{aligned} (\star )&\le L\,\left( \int _{{\mathbb {R}}^n} | T^{\sigma ^{\nu }}_{\sigma ^{\theta }}({\mathcal {P}}_{\theta }(x)) - {\mathcal {P}}_{\nu }(x)|^2\, d\sigma (x)\right) ^{1/2} = L\Vert T^{\sigma ^{\nu }}_{\sigma ^{\theta }}\circ {\mathcal {P}}_{\theta } -{\mathcal {P}}_{\nu } \Vert _{\sigma } \\&\le L\left( \Vert T^{\sigma ^{\nu }}_{\sigma ^{\theta }}\circ {\mathcal {P}}_{\theta } - {\mathcal {P}}_{\theta }\Vert _{\sigma } + \Vert {\mathcal {P}}_{\theta }-{\mathcal {P}}_{\nu }\Vert _{\sigma }\right) \\&\le L \left( \Vert T^{\sigma ^{\nu }}_{\sigma ^{\theta }} - {\text {id}}\Vert _{\sigma ^{\theta }} + C\Vert \theta -\nu \Vert _2 \right) \\&= L\left( W_2(\sigma ^{\theta },\sigma ^{\nu })+C \Vert \theta - \nu \Vert _2 \right) \\&\le L\left( \Vert {\mathcal {P}}_{\theta }-{\mathcal {P}}_{\nu }\Vert _{\sigma }+C \Vert \theta - \nu \Vert _2 \right) \\&\le 2LC\Vert \theta - \nu \Vert _2 \end{aligned}$$

This implies

$$\begin{aligned} (\diamond ) \le (2L+1)C\Vert \theta - \nu \Vert _2. \end{aligned}$$

\(\square \)
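
The following is a minimal numerical sketch of Proposition 17 (our illustration, not part of the paper): it builds the empirical 1D optimal transport maps by quantile matching of equal-size samples and compares both sides of the bound. The Gaussian choices of \(\sigma \) and \(\mu \), the sample size, and the Lipschitz constant \(L=2\) (valid here up to sampling error, since each 1D map between Gaussian projections is affine with slope at most 2) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N = 3, 20000
X = rng.normal(size=(N, n))                                   # samples from sigma
Y = rng.normal(size=(N, n)) * np.array([2.0, 1.0, 0.5]) \
    + np.array([1.0, -1.0, 0.0])                              # samples from mu

def sliced_map(X, Y, theta):
    # Evaluate the empirical 1D OT map T_{sigma^theta}^{mu^theta} at the
    # projections of the source samples, via quantile (order-statistic) matching.
    s, t = X @ theta, np.sort(Y @ theta)
    out = np.empty_like(s)
    out[np.argsort(s)] = t        # i-th smallest projection -> i-th smallest target
    return out

theta = np.array([1.0, 0.0, 0.0])
nu = np.array([1.0, 0.2, 0.0]); nu /= np.linalg.norm(nu)

lhs = np.sqrt(np.mean((sliced_map(X, Y, theta) - sliced_map(X, Y, nu)) ** 2))
C = max(np.sqrt(np.mean(np.sum(X**2, axis=1))),
        np.sqrt(np.mean(np.sum(Y**2, axis=1))))
L = 2.0                            # slope bound for these Gaussian projections
print(lhs, "<=", (2 * L + 1) * C * np.linalg.norm(theta - nu))
```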

Proof of Proposition 10

Based on (9), we let \(T_{\sigma ,\mu ;P} = PD\circ P^t\), where \(D(x)=[T_{\sigma ^{\theta _1}}^{\mu ^{\theta _1}}(x_1), T_{\sigma ^{\theta _2}}^{\mu ^{\theta _2}}(x_2), \cdots , T_{\sigma ^{\theta _n}}^{\mu ^{\theta _n}}(x_n)]^t\) for \(x \in {\mathbb {R}}^n\) and \(P = [\theta _1,\ldots ,\theta _n]\). Similarly, we let \(T_{\sigma ,\mu ;Q} = Q{\widetilde{D}}\circ Q^t\), with \(Q = [\nu _1,\ldots ,\nu _n]\). We now derive the bound:

$$\begin{aligned} \Vert T_{\sigma ,\mu ;P}-T_{\sigma ,\mu ;Q}\Vert _{\sigma }&= \Vert PDP^t-Q{\widetilde{D}}Q^t\Vert _{\sigma }\\&\le \Vert PDP^t - P{\widetilde{D}}Q^t\Vert _{\sigma } + \Vert P{\widetilde{D}}Q^t-Q{\widetilde{D}}Q^t\Vert _{\sigma } \\&= (1) + (2). \end{aligned}$$

We bound the two terms separately. For (1), using Proposition 17, we get

$$\begin{aligned} \Vert PDP^t - P{\widetilde{D}}Q^t\Vert _{\sigma }^2&= \int _{{\mathbb {R}}^n} \Vert D(P^tx) - {\widetilde{D}}(Q^tx)\Vert _2^2\,d\sigma (x) \\&= \sum _{i=1}^n \int _{{\mathbb {R}}^n} |T_{\sigma ^{\theta _i}}^{\mu ^{\theta _i}}((P^tx)_i)-T_{\sigma ^{\nu _i}}^{\mu ^{\nu _i}}((Q^tx)_i)|^2\, d\sigma (x) \\&= \sum _{i=1}^n \Vert T_{\sigma ^{\theta _i}}^{\mu ^{\theta _i}}\circ {\mathcal {P}}_{\theta _i} - T_{\sigma ^{\nu _i}}^{\mu ^{\nu _i}}\circ {\mathcal {P}}_{\nu _i} \Vert _{\sigma }^2 \\&\le ((2L+1)C)^2 \sum _{i=1}^n \Vert \theta _i - \nu _i\Vert _2^2 \\&= ((2L+1)C)^2 \Vert P - Q\Vert _F^2. \end{aligned}$$

For (2) we get

$$\begin{aligned} \Vert P{\widetilde{D}}Q^t-Q{\widetilde{D}}Q^t\Vert _{\sigma }^2&= \int _{{\mathbb {R}}^n} \Vert (P-Q){\widetilde{D}}(Q^tx)\Vert _2^2 \, d\sigma (x) \\&\le \Vert P-Q\Vert _2^2\int _{{\mathbb {R}}^n} \Vert {\widetilde{D}}(Q^tx)\Vert _2^2 \, d\sigma (x) \\&\le \Vert P-Q\Vert _2^2 \, L^2 \int _{{\mathbb {R}}^n} \Vert Q^tx\Vert _2^2\, d\sigma (x) \le \Vert P-Q\Vert _2^2 \, L^2 C^2 \\&\le \Vert P-Q\Vert _F^2 \, L^2 C^2 \end{aligned}$$

Combining (1) and (2) gives the final bound. \(\square \)
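
As a companion sketch for Proposition 10 (again an illustration under assumed Gaussian data, not the authors' code), one can assemble the slice-matching operator \(T_{\sigma ,\mu ;P}(x) = PD(P^tx)\) from samples by quantile matching along each column of \(P\), and compare \(\Vert T_{\sigma ,\mu ;P}-T_{\sigma ,\mu ;Q}\Vert _{\sigma }\) with \(\Vert P-Q\Vert _F\); combining (1) and (2) above gives the constant \((3L+1)C\).

```python
import numpy as np

rng = np.random.default_rng(1)
n, N = 3, 20000
X = rng.normal(size=(N, n))                                    # source sigma
Y = rng.normal(size=(N, n)) * np.array([2.0, 1.0, 0.5]) + 1.0  # target mu

def slice_match(X, Y, P):
    # T_{sigma,mu;P}(x) = P D(P^t x): empirical 1D OT (quantile matching)
    # along each column of the orthogonal matrix P.
    Z, W = X @ P, Y @ P
    D = np.empty_like(Z)
    for i in range(P.shape[1]):
        D[np.argsort(Z[:, i]), i] = np.sort(W[:, i])
    return D @ P.T

def random_orthogonal(n, rng):
    # Haar-distributed orthogonal matrix via QR with sign correction.
    Q, R = np.linalg.qr(rng.normal(size=(n, n)))
    return Q * np.sign(np.diag(R))

P, Q = random_orthogonal(n, rng), random_orthogonal(n, rng)
diff = slice_match(X, Y, P) - slice_match(X, Y, Q)
lhs = np.sqrt(np.mean(np.sum(diff**2, axis=1)))
# Proposition 10 gives lhs <= (3L+1) C ||P - Q||_F for these constants.
print(lhs, "vs", np.linalg.norm(P - Q, 'fro'))
```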

Appendix B Proofs for Sect. 5

Proof of Proposition 13

Let \( S^{\sigma ,\mu ,W_2}(x) = a^{W_2} x+ b^{W_2}\) and \(S^{\sigma ,{{\mathcal {U}}}(\sigma ,\mu ,P), W_2}(x)= {\widetilde{a}}^{W_2}x+ {\widetilde{b}}^{W_2}\) be the critical functions for the associated minimization problem (22). By Proposition 18 and Corollary 20, we have

$$\begin{aligned} {\widetilde{a}}^{W_2} - a^{W_2}&=\frac{W_2^2(\sigma ,\mu )-\sum _{i=1}^nW^2_2(\sigma ^{\theta _i},\mu ^{\theta _i})}{2(M_2(\sigma )-\Vert E(\sigma )\Vert ^2)},\end{aligned}$$
(27)
$$\begin{aligned} {\widetilde{b}}^{W_2}-b^{W_2}&= -({\widetilde{a}}^{W_2} - a^{W_2})E(\sigma ), \end{aligned}$$
(28)

and the norm bound \( \Vert S^{\sigma ,\mu ,W_2}-S^{\sigma ,{{\mathcal {U}}}(\sigma ,\mu ,P), W_2}\Vert _{\sigma }\) in (24) can be obtained by a direct computation, together with the fact that the right-hand side is non-negative (see Lemma 24). It remains to show that these critical functions are indeed the minimizers, by verifying:

  1. \({\widetilde{a}}^{W_2}\ge a^{W_2} >0\), see Lemmas 24, 26, and 27.

  2. The Hessian \(H(a,b)\) associated with both minimization problems is positive definite, by a direct calculation and Lemma 26, where

    $$\begin{aligned} H(a,b)= 2 \begin{bmatrix} M_2(\sigma ) &{} (E(\sigma ))^t\\ E(\sigma ) &{} I_{n} \end{bmatrix}. \end{aligned}$$

    Here \(I_{n}\) denotes the identity matrix of size \(n\times n\).

The equality concerning the means follows from Corollary 19. \(\square \)
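
The positive definiteness in item 2 can also be seen via the Schur complement: \(H(a,b)\succ 0\) if and only if \(M_2(\sigma )-\Vert E(\sigma )\Vert ^2>0\), which is exactly Lemma 26. A quick numerical sanity check (a sketch with an assumed Gaussian \(\sigma \); note the identity block has size \(n\)):

```python
import numpy as np

rng = np.random.default_rng(2)
n, N = 3, 50000
X = rng.normal(size=(N, n)) + np.array([0.5, -1.0, 2.0])  # samples from sigma

M2, E = np.mean(np.sum(X**2, axis=1)), X.mean(axis=0)     # M_2(sigma), E(sigma)
H = 2 * np.block([[np.array([[M2]]), E[None, :]],
                  [E[:, None],       np.eye(n)]])
print(np.linalg.eigvalsh(H))   # all eigenvalues positive
print(M2 - E @ E)              # Schur complement: the variance, > 0 (Lemma 26)
```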

Proposition 18

Let \(S^{\sigma ,\eta ,W_2}\) and \(S^{\sigma ,\eta , SW_2}\) correspond to the critical points of the minimization problems in (22) and (30), respectively. Then the corresponding parameters satisfy

$$\begin{aligned} a^{W_2}&= \frac{\frac{1}{2}(M_2(\eta )+M_2(\sigma )-W_2^2(\sigma ,\eta ))-E(\sigma )\cdot E(\eta )}{M_2(\sigma )-\Vert E(\sigma )\Vert ^2},\\ b^{W_2}&= E(\eta )-a^{W_2}E(\sigma ),\\ a^{SW_2}&= \frac{\frac{1}{2}(M_2(\eta )+M_2(\sigma )-nSW_2^2(\sigma ,\eta ))-E(\sigma )\cdot E(\eta )}{M_2(\sigma )-\Vert E(\sigma )\Vert ^2},\\ b^{SW_2}&= E(\eta )-a^{SW_2}E(\sigma ), \end{aligned}$$

where \(S^{\sigma ,\eta ,W_2}(x) = a^{W_2} x+ b^{W_2}\) and \(S^{\sigma ,\eta ,SW_2}(x) = a^{SW_2} x+ b^{SW_2}\).

Proof

Given \(\sigma \in {{\mathcal {W}}}_{2,ac}({\mathbb {R}}^n)\) and \(\eta \in {{\mathcal {W}}}_2({\mathbb {R}}^n)\), let \(M_2(\sigma )= \int \Vert x\Vert ^2d\sigma (x)\) and \(E(\sigma )= \int xd\sigma (x)\), and define \(M_2(\eta )\), \(E(\eta )\) similarly. For \(S(x)=ax+b\), by the change of variables formula and the fact that \(T_{\sigma }^\eta = T_{S_{\sharp }\sigma }^{\eta }\circ S\), we have

$$\begin{aligned} W_2^2(S_{\sharp }\sigma ,\eta )&= \Vert T_{\sigma }^{\eta }-(ax+b)\Vert ^2_{\sigma } = M_2(\eta )+a^2M_2(\sigma )+2ab\cdot E(\sigma )\\&\quad -2a\int T_{\sigma }^{\eta }(x)\cdot x\, d\sigma (x)+\Vert b\Vert ^2-2E(\eta )\cdot b. \end{aligned}$$

Taking the partial derivatives gives

$$\begin{aligned} \frac{\partial }{\partial a}&= 2a M_2(\sigma )+2b\cdot E(\sigma )-2\int T_{\sigma }^{\eta }(x)\cdot x d\sigma (x), \\ \frac{\partial }{\partial b}&= 2b+2aE(\sigma )-2E(\eta ). \end{aligned}$$

Setting the above equations to zero and with the observation that \(\int T_{\sigma }^{\eta }(x)\cdot x d\sigma (x) = \frac{1}{2}(M_2(\eta )+M_2(\sigma )-W_2^2(\sigma ,\eta ))\), we get the desired formulas for \(a^{W_2}\) and \(b^{W_2}\). Similarly,

$$\begin{aligned} SW_2^2(S_{\sharp }\sigma ,\eta )&= \int _{S^{n-1}}W_2^2((S_{\sharp }\sigma )^{\theta }, \eta ^{\theta })du(\theta ) \\&= \int _{S^{n-1}}\int _{{\mathbb {R}}} |T_{\sigma ^{\theta }}^{\eta ^\theta }(t)-(at+b\cdot \theta )|^2\,d\sigma ^{\theta }(t)\, du(\theta )\\&= \frac{1}{n}\Big (M_2(\eta )+a^2M_2(\sigma )+2ab\cdot E(\sigma )\\&\quad -2na\int _{S^{n-1}}\int _{{\mathbb {R}}} tT_{\sigma ^{\theta }}^{\eta ^\theta }(t)\,d\sigma ^{\theta }(t)\, du(\theta )+\Vert b\Vert ^2-2E(\eta )\cdot b\Big ). \end{aligned}$$

Taking the partial derivatives gives

$$\begin{aligned} \frac{\partial }{\partial a}&=\frac{1}{n} \Big (2a M_2(\sigma )+2b\cdot E(\sigma )-2n\int _{S^{n-1}}\int _{{\mathbb {R}}} tT_{\sigma ^{\theta }}^{\eta ^\theta }(t)dt du(\theta )\Big ), \\ \frac{\partial }{\partial b}&= \frac{1}{n} \Big (2b+2aE(\sigma )-2E(\eta )\Big ). \end{aligned}$$

Setting the above equations to zero and with the observation that \(\int _{S^{n-1}}\int _{{\mathbb {R}}} tT_{\sigma ^{\theta }}^{\eta ^\theta }(t)d\sigma ^{\theta }(t) du(\theta ) = \frac{1}{2n}(M_2(\sigma )+M_2(\eta )-nSW_2^2(\sigma ,\eta ))\), we get the desired formulas for \(a^{SW_2}\) and \(b^{SW_2}\). We provide computational details in Appendix C. \(\square \)
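
In the simplest case \(n=1\), where \(W_2\) and \(SW_2\) coincide and the optimal map is given by quantile matching, the closed-form parameters of Proposition 18 can be checked against a direct least-squares fit of \(T_{\sigma }^{\eta }\) by an affine map. A minimal sketch (the sample distributions and sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200000
x = np.sort(rng.normal(size=N))              # samples from sigma (1D)
T = np.sort(2.0 * rng.normal(size=N) + 3.0)  # empirical OT map: sorted-to-sorted pairing

M2s, M2e = np.mean(x**2), np.mean(T**2)
Es, Ee = x.mean(), T.mean()
W2sq = np.mean((T - x) ** 2)                 # empirical W_2^2(sigma, eta)

a = (0.5 * (M2e + M2s - W2sq) - Es * Ee) / (M2s - Es**2)
b = Ee - a * Es

# Direct least squares of T onto {x -> a x + b}, for comparison.
coef = np.linalg.lstsq(np.column_stack([x, np.ones(N)]), T, rcond=None)[0]
print(a, b)        # ~ (2, 3)
print(coef)        # same minimizer
```

The closed-form \(a^{W_2}\) equals \(\mathrm {Cov}(x, T_{\sigma }^{\eta }(x))/\mathrm {Var}(x)\), the ordinary least-squares slope, which is what the comparison exhibits.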

Corollary 19

Given the same assumptions as in Proposition 18, for \(D=W_2~\text {or}~ SW_2\)

$$\begin{aligned} E({S^{\sigma ,\eta ,D}}_\sharp \sigma ) = E(\eta ). \end{aligned}$$
(29)

Proof

Upon direct calculation, we have \(E({S^{\sigma ,\eta ,D}}_\sharp \sigma ) = a^DE(\sigma )+b^D\), where \(a^D, b^D\) are as in Proposition 18. The conclusion can be derived from the expressions for \(b^{W_2}\) and \(b^{SW_2}\). \(\square \)

Corollary 20

Let \(\eta = {{\mathcal {U}}}(\sigma ,\mu ,P)\) in Proposition 18. Then the parameters corresponding to \(S^{\sigma ,{{\mathcal {U}}}(\sigma ,\mu ,P), W_2}\) and \(S^{\sigma ,{{\mathcal {U}}}(\sigma ,\mu ,P), SW_2}\) satisfy

$$\begin{aligned} {\widetilde{a}}^{W_2}&= \frac{\frac{1}{2}(M_2(\mu )+M_2(\sigma )-\sum _{i=1}^nW_2^2(\sigma ^{\theta _i},\mu ^{\theta _i}))-E(\sigma )\cdot E(\mu )}{M_2(\sigma )-\Vert E(\sigma )\Vert ^2},\\ {\widetilde{b}}^{W_2}&= E(\mu )-{\widetilde{a}}^{W_2}E(\sigma ),\\ {\widetilde{a}}^{SW_2}&= \frac{\frac{1}{2}(M_2(\mu )+M_2(\sigma )-nSW_2^2(\sigma ,{{\mathcal {U}}}(\sigma ,\mu ,P)))-E(\sigma )\cdot E(\mu )}{M_2(\sigma )-\Vert E(\sigma )\Vert ^2},\\ {\widetilde{b}}^{SW_2}&= E(\mu )-{\widetilde{a}}^{SW_2}E(\sigma ), \end{aligned}$$

where \(S^{\sigma ,{{\mathcal {U}}}(\sigma ,\mu ,P), W_2}(x)= {\widetilde{a}}^{W_2}x+ {\widetilde{b}}^{W_2}\) and \(S^{\sigma ,{{\mathcal {U}}}(\sigma ,\mu ,P), SW_2}(x)= {\widetilde{a}}^{SW_2}x+ {\widetilde{b}}^{SW_2}\).

Proof

The above formulas follow directly from Proposition 18, the fact that \({{\mathcal {U}}}(\sigma ,\mu ,P)\) and \(\mu \) have the same mean (see (13)), and the formula (7) for \(W_2^2(\sigma , {{\mathcal {U}}}(\sigma ,\mu ,P))\). \(\square \)

Proposition 21

Let

$$\begin{aligned} {{\mathcal {S}}}(P):= \{x\mapsto P\Lambda P^t x+b: \Lambda \text { is positive and diagonal}, b\in {\mathbb {R}}^n\}. \end{aligned}$$

Consider the minimization problem

$$\begin{aligned} S_P^{\sigma ,\eta }&:= \mathop {\mathrm {arg\,min}}\limits _{S_P\in {{\mathcal {S}}}(P)} \Vert {S_P}-T_\sigma ^\eta \Vert _{\sigma }. \end{aligned}$$
(30)

Let \(S^{\sigma ,\mu }_{P}\) and \(S^{\sigma ,{{\mathcal {U}}}(\sigma ,\mu ,P)}_{P}\) be the minimizers of (30) with \(\eta =\mu \) and \(\eta = {{\mathcal {U}}}(\sigma ,\mu ,P)\), respectively. We denote the diagonal entries of the corresponding \(\Lambda \) by \(a_i\) and \({\widetilde{a}}_i\), respectively; similar notation holds for \(b_i\) and \({\widetilde{b}}_i\). Then

$$\begin{aligned} {\widetilde{a}}_i - a_i&= \frac{\int |\theta _i\cdot (T_{\sigma }^{\mu }(x)-x)|^2d\sigma (x)- W_2^2(\sigma ^{\theta _i},\mu ^{\theta _i})}{2(M_2^{\sigma ^{\theta _i}}- (E^{\sigma ^{\theta _i}})^2)}\ge 0,\\ {\widetilde{b}} - b&= - \sum _{i=1}^n\theta _i E^{\sigma ^{\theta _i}} ( {\widetilde{a}}_i - a_i). \end{aligned}$$

Proof

The proof uses arguments similar to those in Proposition 18 and Corollary 20, except that the partial derivatives are taken with respect to \(a_i\) and \({\widetilde{a}}_i\) instead of \(a\) and \({\widetilde{a}}\). Following these arguments, we use the equations presented in Lemma 23. \(\square \)

Appendix C Other technical details

Lemma 22

Let \(\sigma \in {{\mathcal {W}}}_{2,ac}({\mathbb {R}}^n)\) and \(\eta , \mu \in {{\mathcal {W}}}_2({\mathbb {R}}^n)\). Then

$$\begin{aligned}&E({(T_{\sigma ,\mu ;P})_{\sharp }\sigma })=\int T_{\sigma ,\mu ;P}(x) d\sigma (x) = \int yd\mu (y)=E(\mu ),\\&\int T_{\sigma }^{\eta }(x)\cdot x d\sigma (x) = \frac{1}{2}(M_2(\eta )+M_2(\sigma )-W_2^2(\sigma ,\eta ))\\&M_2({(T_{\sigma ,\mu ;P})_{\sharp }\sigma })=\int \Vert T_{\sigma ,\mu ;P}(x)\Vert ^2 d\sigma (x) = M_2(\mu ) \end{aligned}$$

Proof

By the change of variables formula, we have

$$\begin{aligned} \int T_{\sigma ,\mu ;P}(x) d\sigma (x)&= \sum _{i=1}^n \theta _i\int T_{\sigma ^{\theta _i}}^{\mu ^{\theta _i}}(x\cdot \theta _i)d\sigma (x) = \sum _{i=1}^n \theta _i\int T_{\sigma ^{\theta _i}}^{\mu ^{\theta _i}}(t)d\sigma ^{\theta _i}(t)\\&= \sum _{i=1}^n \theta _i\int zd\mu ^{\theta _i}(z) = \sum _{i=1}^n \theta _i\int y\cdot \theta _i d\mu (y)\\&= \int y d\mu (y), \end{aligned}$$
$$\begin{aligned} \int T_{\sigma }^{\eta }(x)\cdot x d\sigma (x)&= \frac{1}{2}\Big (\int \Vert T_{\sigma }^{\eta }(x)\Vert ^2d\sigma (x)\\ {}&+\int \Vert x\Vert ^2d\sigma (x)- \int \Vert T_{\sigma }^{\eta }(x)-x\Vert ^2d\sigma (x)\Big )\\ {}&=\frac{1}{2}(M_2(\eta )+M_2(\sigma )-W_2^2(\sigma ,\eta )), \end{aligned}$$
$$\begin{aligned} \int \Vert T_{\sigma ,\mu ;P}(x)\Vert ^2 d\sigma (x)&=\int \sum _{i=1}^n |T_{\sigma ^{\theta _i}}^{\mu ^{\theta _i}}(x\cdot \theta _i)|^2d\sigma (x) = \sum _{i=1}^n \int |T_{\sigma ^{\theta _i}}^{\mu ^{\theta _i}}(t)|^2d\sigma ^{\theta _i}(t)\\&=\sum _{i=1}^n \int |w|^2d\mu ^{\theta _i}(w) = \sum _{i=1}^n\int |y\cdot \theta _i|^2d\mu (y)\\&= \int \Vert y\Vert ^2d\mu (y)= M_2(\mu ), \end{aligned}$$

where the last steps make use of the fact that \(P=[\theta _1,\cdots ,\theta _n]\) is an orthogonal matrix. \(\square \)
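
The first and third identities of Lemma 22 say that a single slice-matching step reproduces the mean and second moment of the target exactly. A minimal empirical sketch (quantile matching per direction; the distributions, the orthogonal matrix \(P\), and the sample sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n, N = 3, 100000
X = rng.normal(size=(N, n))                                    # sigma
Y = rng.normal(size=(N, n)) * np.array([2.0, 1.0, 0.5]) \
    - np.array([1.0, 0.0, 2.0])                                # mu

P, _ = np.linalg.qr(rng.normal(size=(n, n)))                   # orthogonal slicing basis
Z, W = X @ P, Y @ P
D = np.empty_like(Z)
for i in range(n):                                             # 1D quantile matching
    D[np.argsort(Z[:, i]), i] = np.sort(W[:, i])
TX = D @ P.T                                                   # samples of (T_{sigma,mu;P})_# sigma

print(TX.mean(axis=0), Y.mean(axis=0))                         # means agree: E(mu)
print(np.mean(np.sum(TX**2, 1)), np.mean(np.sum(Y**2, 1)))     # second moments agree: M_2(mu)
```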

Lemma 23

Let \(\sigma \in {{\mathcal {W}}}_{2,ac}({\mathbb {R}}^n)\), \(\eta \in {{\mathcal {W}}}_2({\mathbb {R}}^n)\), and \(b\in {\mathbb {R}}^n\). Then

$$\begin{aligned}&\int _{S^{n-1}}\int _{{\mathbb {R}}} tT_{\sigma ^{\theta }}^{\eta ^\theta }(t)d\sigma ^{\theta }(t) du(\theta ) = \frac{M_2(\sigma )+M_2(\eta )-nSW_2^2(\sigma ,\eta )}{2n}\end{aligned}$$
(31)
$$\begin{aligned}&\int _{S^{n-1}}\int _{{\mathbb {R}}} t^2d\sigma ^{\theta }(t)du(\theta ) = \frac{M_2(\sigma ) }{n}\end{aligned}$$
(32)
$$\begin{aligned}&\int _{S^{n-1}}\int _{{\mathbb {R}}} |T_{\sigma ^{\theta }}^{\eta ^\theta }(t)|^2d\sigma ^{\theta }(t)du(\theta ) = \frac{M_2(\eta )}{n}\end{aligned}$$
(33)
$$\begin{aligned}&\int _{S^{n-1}}\int _{{\mathbb {R}}} (b\cdot \theta ) t d\sigma ^{\theta }(t)du(\theta ) = \frac{E(\sigma )\cdot b}{n}\end{aligned}$$
(34)
$$\begin{aligned}&\int _{S^{n-1}}\int _{{\mathbb {R}}} (b\cdot \theta ) T_{\sigma ^{\theta }}^{\eta ^\theta }(t)td\sigma ^{\theta }(t)du(\theta ) = \frac{E(\eta )\cdot b}{n} \end{aligned}$$
(35)

Proof

We note that (32) and (33) are analogous by the change of variables formula, as are (34) and (35). We first show (32).

$$\begin{aligned} \int _{S^{n-1}}\int _{{\mathbb {R}}} t^2d\sigma ^{\theta }(t)du(\theta )&= \int _{S^{n-1}}\int _{{\mathbb {R}}^n}|x\cdot \theta |^2 d\sigma (x)du(\theta ) \\&\overset{\text {Fubini}}{=} \int _{{\mathbb {R}}^n}\int _{S^{n-1}}|x\cdot \theta |^2 du(\theta )d\sigma (x)\\&= \int _{{\mathbb {R}}^n}\frac{\Vert x\Vert ^2}{n}d\sigma (x)\\&=\frac{M_2(\sigma )}{n}. \end{aligned}$$

For (34), we have

$$\begin{aligned}&\int _{S^{n-1}}\int _{{\mathbb {R}}} (b\cdot \theta ) t d\sigma ^{\theta }(t)du(\theta )\\&\quad = \int _{S^{n-1}}b\cdot \theta \int _{{\mathbb {R}}^n}x\cdot \theta d\sigma (x)du(\theta )\\&\quad = \int _{S^{n-1}}(b\cdot \theta )(E(\sigma )\cdot \theta ) du(\theta )\\&\quad = \int _{S^{n-1}}\frac{1}{2}\Big (|b\cdot \theta |^2+|E(\sigma )\cdot \theta |^2-|(b-E(\sigma ))\cdot \theta |^2\Big )du(\theta )\\&\quad = \frac{1}{2n}\Big (\Vert b\Vert ^2+\Vert E(\sigma )\Vert ^2- \Vert b-E(\sigma )\Vert ^2\Big )\\&\quad = \frac{E(\sigma )\cdot b}{n}. \end{aligned}$$

With (32) and (33), we have (31):

$$\begin{aligned}&\int _{S^{n-1}}\int _{{\mathbb {R}}} tT_{\sigma ^{\theta }}^{\eta ^\theta }(t)d\sigma ^{\theta }(t) du(\theta ) \\ {}&=\frac{1}{2} \int _{S^{n-1}}\int _{{\mathbb {R}}}\Big (t^2+(T_{\sigma ^{\theta }}^{\eta ^\theta }(t))^2- (t-T_{\sigma ^{\theta }}^{\eta ^\theta }(t))^2\Big ) d\sigma ^{\theta }(t) du(\theta )\\&= \frac{1}{2n}\Big (M_2(\sigma )+M_2(\eta )- n\int _{S^{n-1}}W_2^2(\sigma ^{\theta },\eta ^{\theta })du(\theta )\Big )\\&= \frac{M_2(\sigma )+M_2(\eta )- nSW_2^2(\sigma ,\eta )}{2n}. \end{aligned}$$

\(\square \)
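
Identities (32) and (34) are easy to probe by Monte Carlo integration over the uniform probability measure \(u\) on \(S^{n-1}\). A short sketch (the measure \(\sigma \), the vector \(b\), and all sizes are assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
n, N, M = 4, 50000, 200000
X = rng.normal(size=(N, n)) + 1.0                       # samples from sigma
b = np.array([1.0, -2.0, 0.5, 0.0])

Theta = rng.normal(size=(M, n))
Theta /= np.linalg.norm(Theta, axis=1, keepdims=True)   # uniform on S^{n-1}

S = X.T @ X / N                                         # second-moment matrix of sigma
E = X.mean(axis=0)                                      # E(sigma)

# (32): the theta-average of the second moment of sigma^theta is M_2(sigma)/n,
# using that the second moment of sigma^theta equals theta^T S theta.
print(np.mean(np.sum((Theta @ S) * Theta, axis=1)), np.trace(S) / n)

# (34): int (b . theta)(E(sigma) . theta) du(theta) = E(sigma) . b / n.
print(np.mean((Theta @ b) * (Theta @ E)), E @ b / n)
```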

Lemma 24

Let \(\sigma \in {{\mathcal {W}}}_{2,ac}({\mathbb {R}}^n)\), \(\mu \in {{\mathcal {W}}}_2({\mathbb {R}}^n)\), and \(P = [\theta _1, \cdots , \theta _n]\in O(n)\). Then

$$\begin{aligned} W_2^2(\sigma ,\mu ) \ge \sum _{i=1}^nW_2^2(\sigma ^{\theta _i},\mu ^{\theta _i}). \end{aligned}$$

Proof

By [19, Proposition 5.1.3],

$$\begin{aligned} W_2^2(\sigma ^{\theta },\mu ^{\theta })\le \int |\theta \cdot x-\theta \cdot y|^2d\gamma ^*(x,y), \end{aligned}$$

where \(\gamma ^*\) is the optimal transport plan between \(\sigma \) and \(\mu \). Then

$$\begin{aligned} \sum _{i=1}^n W_2^2(\sigma ^{\theta _i},\mu ^{\theta _i})&\le \int \sum _{i=1}^n |\theta _i\cdot (x-y)|^2d\gamma ^*(x,y)\\&= \int \Vert x-y\Vert ^2d\gamma ^*(x,y)\\&= W_2^2(\sigma ,\mu ). \end{aligned}$$

\(\square \)
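
Lemma 24 can be verified in closed form for Gaussian measures, where both \(W_2\) and its one-dimensional projections are explicit. A sketch with assumed means and covariances:

```python
import numpy as np

def sqrtm_psd(A):
    # Symmetric PSD matrix square root via eigendecomposition.
    w, V = np.linalg.eigh(A)
    return V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

def w2sq_gauss(m1, S1, m2, S2):
    # Closed-form W_2^2 between N(m1, S1) and N(m2, S2) (Bures formula).
    R = sqrtm_psd(S2)
    return np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * sqrtm_psd(R @ S1 @ R))

rng = np.random.default_rng(6)
n = 4
m1, m2 = np.zeros(n), rng.normal(size=n)
A1, A2 = rng.normal(size=(n, n)), rng.normal(size=(n, n))
S1, S2 = A1 @ A1.T + np.eye(n), A2 @ A2.T + np.eye(n)
P, _ = np.linalg.qr(rng.normal(size=(n, n)))      # orthonormal directions theta_i

# sigma^theta = N(theta.m, theta^T S theta); in 1D,
# W_2^2 = (mean difference)^2 + (standard deviation difference)^2.
proj = sum((P[:, i] @ (m1 - m2)) ** 2
           + (np.sqrt(P[:, i] @ S1 @ P[:, i]) - np.sqrt(P[:, i] @ S2 @ P[:, i])) ** 2
           for i in range(n))
print(proj, "<=", w2sq_gauss(m1, S1, m2, S2))
```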

Lemma 25

Let \(h:{\mathbb {R}}^n\rightarrow {\mathbb {R}}^n\) be measurable and let \(\sigma ({\mathbb {R}}^n)=1\). Then

$$\begin{aligned} \int \Vert h(x)\Vert ^2d\sigma (x) \ge \Big \Vert \int h(x)d\sigma (x)\Big \Vert ^2, \end{aligned}$$

where equality holds if and only if \(h(x)=v\) \(\sigma \)-a.e. for some \(v\in {\mathbb {R}}^n\).

Proof

Let \(h(x)= [h_1(x), \cdots , h_n(x)]^t\). By Hölder’s inequality,

$$\begin{aligned} \int |h_i(x)|d\sigma (x)&\le \left( \int |h_i(x)|^2d\sigma (x)\right) ^{1/2}\left( \int 1^2 d\sigma (x)\right) ^{1/2} \\&= \left( \int |h_i(x)|^2d\sigma (x)\right) ^{1/2}. \end{aligned}$$

Since \(|\int h_i(x)d\sigma (x)|\le \int |h_i(x)|d\sigma (x)\), squaring the above inequality and summing over \(i\) gives the desired inequality. Equality holds if and only if \(h_i(x)=v_i\) \(\sigma \)-a.e. for some constant \(v_i\in {\mathbb {R}}\). \(\square \)

Lemma 26

Let \(\sigma \in {{\mathcal {W}}}_{2,ac}({\mathbb {R}}^n)\), and \( M_2(\sigma ), E(\sigma )\) be defined as in Proposition 18. Then

$$\begin{aligned} M_2(\sigma )- \Vert E(\sigma )\Vert ^2>0. \end{aligned}$$

Proof

Since \(\sigma \in {{\mathcal {W}}}_{2,ac}({\mathbb {R}}^n)\) is absolutely continuous, the identity map \(x\mapsto x\) is not \(\sigma \)-a.e. equal to a constant vector, so it follows from Lemma 25 with \(h(x)=x\) that

$$\begin{aligned} \int \Vert x\Vert ^2 d\sigma (x) > \Vert \int x d\sigma (x)\Vert ^2. \end{aligned}$$

\(\square \)

Lemma 27

Let \(\sigma \in {{\mathcal {W}}}_{2,ac}({\mathbb {R}}^n), \mu \in {{\mathcal {W}}}_2({\mathbb {R}}^n)\) and \(\phi \) be a convex function such that \(\triangledown \phi = T_{\sigma }^{\mu }\) given by Brenier’s theorem (see e.g., [37, Theorem 1.48]). If \(\phi \) is differentiable at \(E(\sigma )\), where \(E(\sigma ) = \int xd\sigma (x)\), then

$$\begin{aligned} \int T_{\sigma }^{\mu }(x)\cdot x d\sigma (x)- \Big (\int xd\sigma (x)\Big )\cdot \Big (\int T_{\sigma }^{\mu }(x)d\sigma (x)\Big )\ge 0. \end{aligned}$$
(36)

Proof

Let \(A = \{x\in {\mathbb {R}}^n: \phi \text{ is differentiable at } x\}\). Since \(\phi \) is convex, it is differentiable Lebesgue-a.e.; as \(\sigma \) is absolutely continuous, \(\phi \) is \(\sigma \)-a.e. differentiable and \(\sigma (A)=1\). It then follows from the convexity of \(\phi \) that

$$\begin{aligned} (\triangledown \phi (x)-\triangledown \phi (E(\sigma )))\cdot (x-E(\sigma ))\ge 0, \quad \forall x\in A. \end{aligned}$$
(37)

Hence

$$\begin{aligned} \int _{A} (T_{\sigma }^{\mu }(x)-T_{\sigma }^{\mu }(E(\sigma )))\cdot (x-E(\sigma )) d\sigma (x)\ge 0, \end{aligned}$$

which is exactly the desired inequality (36): indeed, using \(\sigma (A)=1\), the terms other than \(\int T_{\sigma }^{\mu }(x)\cdot x\, d\sigma (x)\) combine to

$$\begin{aligned}&-\int T_{\sigma }^{\mu }(E(\sigma ))\cdot x\, d\sigma (x)- \int T_{\sigma }^{\mu }(x)\cdot E(\sigma )\, d\sigma (x) + T_{\sigma }^{\mu }(E(\sigma ))\cdot E(\sigma ) \\&= -T_{\sigma }^{\mu }(E(\sigma )) \cdot E(\sigma ) - \Big (\int xd\sigma (x)\Big )\cdot \Big (\int T_{\sigma }^{\mu }(x)d\sigma (x)\Big )+ T_{\sigma }^{\mu }(E(\sigma )) \cdot E(\sigma )\\&= -\Big (\int xd\sigma (x)\Big )\cdot \Big (\int T_{\sigma }^{\mu }(x)d\sigma (x)\Big ). \end{aligned}$$

\(\square \)
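
In one dimension, inequality (36) reduces to \(\mathrm {Cov}(x, T_{\sigma }^{\mu }(x))\ge 0\), which holds because the 1D optimal map is nondecreasing and hence comonotone with the identity. A quick empirical sketch with assumed distributions:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100000
x = np.sort(rng.standard_t(df=5, size=N))   # samples from sigma (finite 2nd moment)
Tx = np.sort(rng.exponential(size=N))       # monotone empirical OT map: sorted pairing

print(np.mean(x * Tx) - x.mean() * Tx.mean())   # the quantity in (36); nonnegative
```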

Remark 12

The same conclusion holds if the assumption that \(\phi \) is differentiable at \(E(\sigma )\) were replaced by "\(E(\sigma )\) lies in the support of \(\sigma \)"; this can be proved using the fact that the support of the optimal transport plan is cyclically monotone.

Remark 13

Given the assumptions of Lemma 27, one can show that the inequality is strict if, in addition, there exists a ball \(B(x,r)\), where \(x\) lies in the support of \(\sigma \), such that for any \( \lambda \in (0,1)\) and \(y\in B(x,r)\),

$$\begin{aligned} \phi ((1-\lambda )y+\lambda E(\sigma )) < (1-\lambda )\phi (y)+\lambda \phi (E(\sigma )), \end{aligned}$$

which guarantees that inequality (37) is strict for \(y\) in a set of positive measure. In particular, if \(\phi \) in Lemma 27 is moreover strictly convex, then the strict inequality holds.

Proposition 28

Let \(\sigma \in {{\mathcal {W}}}_{2,ac}({\mathbb {R}}^n)\) and \(\mu = T^b_{\sharp }\sigma \) with \(T^b(x)= x+b\) for some \(b\in {\mathbb {R}}^n\), \(b\ne 0\). Consider the iteration \(\sigma _{k+1}=(T_{\sigma _k,\mu ; \theta _k})_\sharp \sigma _k\) with \(\sigma _0=\sigma \), where the \(\theta _k\) are chosen i.i.d. according to the uniform measure on \(S^{n-1}\). Then

$$\begin{aligned} \sigma _k \xrightarrow {a.s.} \mu \quad \text {in~} W_2. \end{aligned}$$

Proof

By a direct computation, \(T_{\sigma _k}^{\mu }(x) =x+b_k\), where

$$\begin{aligned} b_{k+1}= b_k - \theta _k(\theta _k\cdot b_k). \end{aligned}$$

To show \(\sigma _k\rightarrow \mu \) almost surely, it suffices to show that \(b_k \rightarrow 0\) almost surely. By the symmetry of \(S^{n-1}\) and the linearity of the recursion, we assume without loss of generality that \(b_0 = [1,0,\cdots ,0]^t\). Note that \(\Vert b_1\Vert ^2= 1-|\theta _0\cdot b_0|^2\). Consider the spherical coordinates for \(S^{n-1}\) with \(\varphi _1,\ldots , \varphi _{n-2}\in [0,\pi ]\) and \(\varphi _{n-1}\in [0,2\pi ]\):

$$\begin{aligned} x_{1}&= \cos (\varphi _{1}),\quad x_{2} = \sin (\varphi _{1})\cos (\varphi _{2}), \quad x_{3} = \sin (\varphi _{1})\sin (\varphi _{2})\cos (\varphi _{3}), \quad \ldots \\ x_{n-1}&= \sin (\varphi _{1})\cdots \sin (\varphi _{n-2})\cos (\varphi _{n-1}),\quad x_{n} = \sin (\varphi _{1})\cdots \sin (\varphi _{n-2})\sin (\varphi _{n-1}). \end{aligned}$$

The corresponding Jacobian is \(\sin ^{n-2}(\varphi _1)\sin ^{n-3}(\varphi _2)\cdots \sin \varphi _{n-2}\). A direct computation gives

$$\begin{aligned} {\mathbb {E}}[|\theta _0\cdot b_0|^2]&= \frac{\int _0^{\pi } \sin ^{n-2}(\varphi _1)\cos ^2(\varphi _1)d\varphi _1}{\int _0^{\pi }\sin ^{n-2}(\varphi _1)d\varphi _1}\\&= 1-\frac{\int _0^{\pi } \sin ^{n}(\varphi _1)d\varphi _1}{\int _0^{\pi }\sin ^{n-2}(\varphi _1)d\varphi _1} \\ {}&= \rho <1. \end{aligned}$$

Hence \( {\mathbb {E}}[\Vert b_1\Vert ^2] = 1-\rho \in (0,1)\). By symmetry and induction, one can show that

$$\begin{aligned} {\mathbb {E}}[\Vert b_k\Vert ^2] = (1-\rho )^k \xrightarrow {k\rightarrow \infty } 0. \end{aligned}$$

Since \(\Vert b_{k+1}\Vert \le \Vert b_k\Vert \), by the monotone convergence theorem, we have

$$\begin{aligned} {\mathbb {E}}[\Vert b_k\Vert ^2]\longrightarrow {\mathbb {E}}[\alpha _{\infty }^2], \end{aligned}$$

where \(\alpha _{\infty } = \lim \alpha _k\) and \(\alpha _k = \Vert b_k\Vert \), which implies \(\alpha _{\infty }= 0\) almost surely and hence \(b_k \rightarrow 0\) almost surely. \(\square \)
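
The recursion \(b_{k+1}= b_k - \theta _k(\theta _k\cdot b_k)\) is easy to simulate. Since \({\mathbb {E}}[|\theta \cdot e_1|^2]=1/n\) for \(\theta \) uniform on \(S^{n-1}\), we have \(\rho = 1/n\) and \({\mathbb {E}}[\Vert b_k\Vert ^2]=(1-1/n)^k\). A minimal sketch (dimension and trial count are assumptions):

```python
import numpy as np

rng = np.random.default_rng(8)
n, trials, K = 3, 5000, 6
norms = np.zeros(K + 1)
for _ in range(trials):
    b = np.zeros(n); b[0] = 1.0                 # b_0 = e_1, as in the proof
    for k in range(K + 1):
        norms[k] += b @ b
        theta = rng.normal(size=n)
        theta /= np.linalg.norm(theta)          # uniform direction on S^{n-1}
        b = b - theta * (theta @ b)             # b_{k+1} = b_k - theta_k (theta_k . b_k)
norms /= trials
print(norms)                                    # ~ (1 - 1/n)^k
print((1.0 - 1.0 / n) ** np.arange(K + 1))
```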

Lemma 29

Let \(\sigma ,\mu \in {{\mathcal {W}}}_{2,ac}({\mathbb {R}}^n)\). Then \((T_{\sigma ,\mu ;P})_\sharp \sigma \in {{\mathcal {W}}}_{2,ac}({\mathbb {R}}^n)\), for any \(P\in O(n)\).

Proof

Let \(P = [\theta _1,\cdots , \theta _n]\). A direct computation shows

$$\begin{aligned} \triangledown T_{\sigma ,\mu ;P}(x) = P \begin{bmatrix} (T_{\sigma ^{\theta _1}}^{\mu ^{\theta _1}})^{\prime }(x\cdot \theta _1) &{} &{} &{}\\ &{} (T_{\sigma ^{\theta _2}}^{\mu ^{\theta _2}})^{\prime }(x\cdot \theta _2) &{} &{}\\ &{} &{} \ddots &{}\\ &{} &{} &{} (T_{\sigma ^{\theta _n}}^{\mu ^{\theta _n}})^{\prime }(x\cdot \theta _n) \end{bmatrix}P^t. \end{aligned}$$

Following arguments similar to those in [38, Proof of Lemma 1, p. 949], it suffices to show that there exists a set \(\Sigma \) such that (i) \(\sigma ({\mathbb {R}}^n\setminus \Sigma )=0\), and (ii) \(T_{\sigma ,\mu ;P}|_{\Sigma }\) is injective and \( \triangledown T_{\sigma ,\mu ;P}\) is positive definite on \(\Sigma \). To this end, it suffices to observe that \(T_{\sigma ^{\theta _i}}^{\mu ^{\theta _i}}\) is injective and \((T_{\sigma ^{\theta _i}}^{\mu ^{\theta _i}})^{\prime }>0\) outside a set \(U_i\) that is \(\sigma ^{\theta _i}\)-negligible, i.e., \(\sigma ^{\theta _i}(U_i)=0\). Here we have used the fact that \(T_{\sigma ^{\theta _i}}^{\mu ^{\theta _i}}\) exists and is unique given that \(\sigma \) is absolutely continuous (and hence so is each \(\sigma ^{\theta _i}\); see e.g. Box 2.4 in [37, p. 82]). The fact that \(M_2((T_{\sigma ,\mu ;P})_\sharp \sigma )\) is finite follows from (14) and \(M_2(\mu )<\infty \). \(\square \)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, S., Moosmüller, C. Approximation properties of slice-matching operators. Sampl. Theory Signal Process. Data Anal. 22, 15 (2024). https://doi.org/10.1007/s43670-024-00089-7
