Parallelly Sliced Optimal Transport on Spheres and on the Rotation Group

Quellmalz, Michael; Buecher, Léo; Steidl, Gabriele

doi:10.1007/s10851-024-01206-w

Open access
Published: 26 July 2024

Journal of Mathematical Imaging and Vision

Michael Quellmalz¹,
Léo Buecher^1,2 &
Gabriele Steidl¹

Abstract

Sliced optimal transport, which is basically a Radon transform followed by one-dimensional optimal transport, became popular in various applications due to its efficient computation. In this paper, we deal with sliced optimal transport on the sphere $\mathbb {S}^{d-1}$ and on the rotation group $\textrm{SO}(3)$. We propose a parallel slicing procedure of the sphere which requires again only optimal transforms on the line. We analyze the properties of the corresponding parallelly sliced optimal transport, which provides in particular a rotationally invariant metric on the spherical probability measures. For $\textrm{SO}(3)$, we introduce a new two-dimensional Radon transform and develop its singular value decomposition. Based on this, we propose a sliced optimal transport on $\textrm{SO}(3)$. As Wasserstein distances were extensively used in barycenter computations, we derive algorithms to compute the barycenters with respect to our new sliced Wasserstein distances and provide synthetic numerical examples on the 2-sphere that demonstrate their behavior for both the free- and fixed-support setting of discrete spherical measures. In terms of computational speed, they outperform the existing methods for semicircular slicing as well as the regularized Wasserstein barycenters.

1 Introduction

Optimal transport (OT) deals with the problem of finding the most efficient way to transport probability measures. The Wasserstein distance is a metric on the space of probability measures and has received much attention [55, 67, 75], e.g., for neural gradient flows [4, 26, 35, 45] in machine learning. Since OT on multi-dimensional domains is hard to compute, there exist different modifications that allow an efficient computation, such as the entropic regularization that yields the Sinkhorn algorithm [7, 21, 42, 55]. Sliced OT on Euclidean spaces utilizes the Radon transform to reduce the problem to the real line [16, 53, 62, 67], where OT possesses an analytic solution that can be computed efficiently. The notion of sliced OT can be generalized to other Radon-like transforms [43]. Specific geometries have been considered such as spheres [13, 58], manifolds with constant negative curvature [15] or separable Hilbert spaces [33].

In this paper, we are interested in sliced OT on special manifolds. A slicing approach on Riemannian manifolds based on eigenfunctions of the Laplacian was proposed in [66]. OT on the sphere has been intensely studied, e.g., the computation of Wasserstein barycenters [69, 70], the regularity of optimal maps [48], isometric rigidity of Wasserstein spaces [29], a connection with a Monge–Ampère-type equation [32, 50, 78] or a variational framework [20]. Sliced OT was generalized to spheres in two different ways: Bonet et al. [14] introduced a slicing along semicircles to reduce the OT problem to the one-dimensional circle, see Fig. 1 right. This requires only OT computations on circles, which were examined in [23]. The respective sliced Wasserstein distance is a metric on the space of probability measures on the 2-sphere [58]. Note that Radon transforms on such semicircles have been considered before in [30, 38], providing an extension of the Funk–Radon transform [28, 34, 39, 49, 60]. A second approach [58] of sliced spherical Wasserstein distances is based on the vertical slice transform [40, 64, 79]. This yields a family of measures on the unit interval instead of the circle and is therefore faster to compute than the first approach. However, this vertical slicing approach provides a metric only for even measures on the 2-sphere, i.e., measures that take the same values on the upper and lower hemisphere. This is a serious restriction for practical applications.

In this paper, we generalize the vertically sliced OT from even measures to arbitrary probability measures by constructing a so-called parallelly sliced OT, see Fig. 1 for an illustration. We provide a method for spheres $\mathbb {S}^{d-1}$ in general dimensions d. The key advantages are that the respective sliced Wasserstein distance is a rotationally invariant metric on the spherical probability measures and that it is faster to compute than the semicircular sliced Wasserstein distance since we project on intervals instead of circles. Our numerical tests indicate a speedup between 40 and 100 times. In Theorem 3.7, we prove estimates between the spherical Wasserstein distance and its parallelly sliced version.

Furthermore, we consider OT in the group $\textrm{SO}(3)$ of three-dimensional rotation matrices, which has applications in the synchronization of probability measures on rotations [11]. A Radon transform along one-dimensional geodesics of $\textrm{SO}(3)$ was proposed in [37, 72], but, for the purpose of OT, we require slicing along two-dimensional submanifolds of $\textrm{SO}(3)$. Therefore, we develop a new two-dimensional Radon transform on $\textrm{SO}(3)$, including its singular value decomposition and adjoint operator. This paves the way to prove that the corresponding sliced Wasserstein distance fulfills the metric properties on the set of probability measures on $\textrm{SO}(3)$.

Barycenter computations with respect to Wasserstein distances and their sliced variants are of increasing interest [16, 55, 62]. Therefore, we deal with barycenters with respect to our new sliced Wasserstein distances and describe their computation in both the free- and fixed-support discrete setting, as well as the so-called Radon Wasserstein barycenters. As a proof of concept, we give numerical examples of barycenter computations on the 2-sphere. We compare our approach with the semicircular slicing [13] as well as the entropy-regularized Wasserstein barycenter computed with PythonOT [27].

Outline of the paper. We provide the basic preliminaries on OT and the manifolds ${\mathbb {S}}^{d-1}$ and $\textrm{SO}(3)$ in Sect. 2. The parallel slice transform for functions and measures, and the corresponding parallelly sliced Wasserstein p-distances are introduced in Sect. 3. In Sect. 4, we generate sliced Wasserstein distances on $\textrm{SO}(3)$ based on our new two-dimensional Radon transform on $\textrm{SO}(3)$. Barycenter computations are examined in two different ways, namely sliced Wasserstein barycenters and Radon Wasserstein barycenters in Sect. 5. In the former case, we deal both with free- and fixed-support discretization. In Sect. 6, we demonstrate by synthetic numerical examples the performance of our barycenter algorithms on the 2-sphere. In particular, we compare our parallel slicing approach with the slicing method of Bonet et al. [14]. Here some theoretically expected phenomena are illustrated. Finally, conclusions are drawn in Sect. 7. The appendix contains technical proofs.

2 Preliminaries

In this section, we provide the notation and necessary preliminaries on OT, in particular on the interval, and on harmonic analysis on the unit sphere in ${\mathbb {R}}^d$.

2.1 Measures and OT

Let $\mathbb {X}$ be a compact Riemannian manifold with metric $d:\mathbb {X}\times \mathbb {X}\rightarrow \mathbb {R}$, and let $\mathcal {B}(\mathbb {X})$ be the Borel $\sigma $-algebra induced by d. We denote by $\mathcal {M}(\mathbb {X})$ the Banach space of signed, finite measures, and by $\mathcal {P}(\mathbb {X})$ the subset of probability measures on $\mathbb {X}$. The pre-dual space of $\mathcal {M}(\mathbb {X})$ is the space of continuous functions $C(\mathbb {X})$. Let $\mathbb {Y}$ be another compact manifold and $T:\mathbb {X}\rightarrow \mathbb {Y}$ be measurable. For $\mu \in \mathcal {M}(\mathbb {X})$, we define the push-forward measure $T_\# \mu :=\mu \circ T^{-1} \in \mathcal {M}({\mathbb {Y}})$.

The p-Wasserstein distance, $p\in [1,\infty )$, of $\mu ,\nu \in \mathcal {P}({\mathbb {X}})$ is given by

$$\begin{aligned} {{\,\textrm{W}\,}}_p^p(\mu ,\nu ) :=\min _{\pi \in \Pi (\mu ,\nu )} \int _{{\mathbb {X}}^2} d^p(x,y) \, \textrm{d}\pi (x,y), \end{aligned}$$

(1)

with $ \Pi (\mu ,\nu ) :=\{\pi \in \mathcal {M}({\mathbb {X}} \times {\mathbb {X}}): \pi (B \times {\mathbb {X}}) = \mu (B), \pi ({\mathbb {X}}\times B) = \nu (B) \ \text {for all } B \in \mathcal {B}({\mathbb {X}})\} $. It defines a metric on $\mathcal {P}({\mathbb {X}})$. The metric space $\mathcal {P}^p({\mathbb {X}}) :=(\mathcal {P}({\mathbb {X}}),W_p)$ is called p-Wasserstein space and, in case $p=2$, just Wasserstein space. The p-Wasserstein distance is a special case of the more general optimal transport (OT) problem, where $d^p(x,y)$ can be replaced by a more general cost function c(x, y). For ${\varvec{\lambda }}\in \Delta _M :=\{{\varvec{\lambda }}\in [0,1]^M\mid \sum _{i=1}^{M} \lambda _i=1\},$ the Wasserstein barycenter of $\mu _i\in \mathcal {P}^2(\mathbb {X})$, $i\in \llbracket M\rrbracket :=\{1,\dots ,M\}$, is the minimizer

$$\begin{aligned} {{\,\textrm{Bary}\,}}^{{{\,\textrm{W}\,}}}_{\mathbb {X}}(\mu _i, \lambda _i)_{i=1}^{M} :=\mathop {\mathrm {arg\,min}}\limits _{\nu \in \mathcal {P}(\mathbb {X})} \sum _{i=1}^{M} \lambda _i\, {{\,\textrm{W}\,}}_2^2(\nu ,\mu _i), \end{aligned}$$

(2)

see [3]. The Wasserstein barycenter of absolutely continuous measures is unique [41].

OT on the Interval If $\mathbb {X}$ is the unit interval $\mathbb {I}:=[-1,1]$ with the distance $d(x,y) = \left| x-y\right| $, the OT between two probability measures $\mu ,\nu \in \mathcal {P}(\mathbb {I})$ can be computed easily [55, 67, 75] using the cumulative distribution function $F_\mu (x) :=\mu ([-1,x])$, $x\in \mathbb {I}$, which is non-decreasing and right continuous. Its pseudoinverse, the quantile function $ F_\mu ^{-1}(r) :=\min \{ x\in \mathbb {I}\mid F_\mu (x)\ge r\}$, $r\in [0,1]$, is non-decreasing and left continuous. The p-Wasserstein distance (1) between $\mu ,\nu \in \mathcal {P}^p(\mathbb {I})$ now equals ${{\,\textrm{W}\,}}_p(\mu ,\nu ) = \Vert F_\mu ^{-1} - F_\nu ^{-1} \Vert _{L^p([0,1])}$. If $\mu \in \mathcal {P}_{\textrm{ac}}(\mathbb {I})$, where $\mathcal {P}_{\textrm{ac}}(\mathbb {I})$ denotes the probability measures that are absolutely continuous with respect to the Lebesgue measure, then the OT plan $\pi $ in (1) is uniquely given by

$$\begin{aligned} \pi = ({{\,\textrm{Id}\,}}, T^{\mu ,\nu })_\# \mu \quad \text {with}\quad T^{\mu ,\nu }(x) :=F_\nu ^{-1}(F_\mu (x)), \quad x \in \mathbb {I}. \end{aligned}$$
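For discrete data, the quantile formula above can be sketched as follows. This is a minimal illustration, assuming both measures are empirical measures with the same number $N$ of equally weighted atoms in $\mathbb {I}$; in that case both quantile functions are step functions on the intervals $((k-1)/N, k/N]$, and the $L^p$ norm reduces to a sum over sorted atoms. The function name `wasserstein_1d` is hypothetical and not taken from the paper.

```python
def wasserstein_1d(xs, ys, p=2):
    """W_p between (1/N) sum_k delta_{x_k} and (1/N) sum_k delta_{y_k} on the line.

    Assumption of this sketch: both lists have the same length N and every
    atom carries the weight 1/N.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equally many, equally weighted atoms")
    # Sorting evaluates both quantile functions on the common grid k/N.
    xs_sorted, ys_sorted = sorted(xs), sorted(ys)
    cost = sum(abs(x - y) ** p for x, y in zip(xs_sorted, ys_sorted)) / len(xs)
    return cost ** (1.0 / p)
```

Matching the sorted atoms in order is the discrete counterpart of the monotone map $T^{\mu ,\nu } = F_\nu ^{-1}\circ F_\mu $.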

Based on the OT map $T^{\mu ,\nu }$, the Wasserstein space $\mathcal {P}^p(\mathbb {I})$ can be isometrically embedded into $L^p_\omega (\mathbb {I})$ with $\omega \in \mathcal {P}_{\textrm{ac}}(\mathbb {I})$ [8, 44, 54], where $L^p_\omega (\mathbb {I})$ consists of all p-integrable functions with respect to $\omega $. For a reference measure $\omega \in \mathcal {P}_{\textrm{ac}}(\mathbb {I})$, the cumulative distribution transform (CDT) is defined by ${{\,\textrm{CDT}\,}}_\omega :\mathcal {P}^p(\mathbb {I}) \rightarrow L^p_\omega (\mathbb {I})$ with

$$\begin{aligned} {{\,\textrm{CDT}\,}}_{\omega }[\mu ] (x) :=(T^{\omega ,\mu } - {{\,\textrm{Id}\,}})(x)= & {} \bigl (F_{\mu }^{-1} \circ F_{\omega } \bigr ) (x) - x, \nonumber \\{} & {} \quad x\in \mathbb {I}. \end{aligned}$$

(3)

The CDT is in fact a mapping from $\mathcal {P}^p(\mathbb {I})$ into the tangent space of $\mathcal {P}^p(\mathbb {I})$ at $\omega $, see [5, § 8.5]. Due to the relation to the OT map, the CDT can be inverted by $\mu = {{\,\textrm{CDT}\,}}^{-1}_\omega [h] :=(h + {{\,\textrm{Id}\,}})_\# \omega $ for $h = {{\,\textrm{CDT}\,}}_\omega [\mu ]$. If $\mu ,\omega \in \mathcal {P}_{\textrm{ac}}(\mathbb {I})$ possess positive density functions $f_\mu $ and $f_\omega $, then, by the transformation formula for push-forward measures, $f_\mu $ can be recovered by

$$\begin{aligned} f_\mu (x)= & {} \left( g^{-1} \right) '(x)\, f_\omega (g^{-1}(x)) \quad \text {with} \nonumber \\ g(x)= & {} {{\,\textrm{CDT}\,}}_\omega [\mu ](x) + x,\quad x \in \mathbb {I}. \end{aligned}$$

(4)

For $\mu _i \in \mathcal {P}(\mathbb {I})$ and an arbitrary reference $\omega \in \mathcal {P}_{\textrm{ac}}(\mathbb {I})$, the Wasserstein barycenter (2) has the form [44]

$$\begin{aligned} {{\,\textrm{Bary}\,}}_\mathbb {I}(\mu _i,\lambda _i)_{i=1}^{M} = {{\,\textrm{CDT}\,}}^{-1}_{\omega }\left( \sum _{i=1}^{M} \lambda _i {{\,\textrm{CDT}\,}}_\omega [\mu _i] \right) . \end{aligned}$$

(5)
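Since ${{\,\textrm{CDT}\,}}_\omega [\mu ] + {{\,\textrm{Id}\,}} = F_\mu ^{-1}\circ F_\omega $ and the weights sum to one, formula (5) says that the quantile function of the barycenter is the weighted average of the quantile functions $F_{\mu _i}^{-1}$. The following sketch illustrates this under the assumption that every $\mu _i$ is an empirical measure with the same number of equally weighted atoms, so that averaging quantile functions amounts to averaging sorted atoms. The name `barycenter_1d` is hypothetical.

```python
def barycenter_1d(point_sets, weights):
    """Atoms of the 2-Wasserstein barycenter of empirical measures on the line.

    Assumptions of this sketch: all point sets have the same length N, each
    atom has weight 1/N, and the barycenter weights are nonnegative and sum
    to one. The k-th atom of the result is the weighted mean of the k-th
    smallest atoms of the inputs.
    """
    n = len(point_sets[0])
    if any(len(pts) != n for pts in point_sets):
        raise ValueError("all measures need the same number of atoms")
    sorted_sets = [sorted(pts) for pts in point_sets]
    return [sum(w * pts[k] for w, pts in zip(weights, sorted_sets))
            for k in range(n)]
```

For example, `barycenter_1d([[0.0, -1.0], [1.0, 0.0]], [0.5, 0.5])` averages the sorted lists `[-1.0, 0.0]` and `[0.0, 1.0]` entry by entry.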

Sliced OT Given another compact space $\mathbb {D}$ with a probability measure $u_\mathbb {D}$ and a slicing operator $\mathcal {S}_{\psi } :\mathbb {X}\rightarrow \mathbb {R}$ for all $\psi \in \mathbb {D}$, we define the sliced p-Wasserstein distance

$$\begin{aligned} {{\,\textrm{SW}\,}}_p^p(\mu ,\nu ) :=\int _\mathbb {D}{{\,\textrm{W}\,}}_p^p((\mathcal {S}_{\psi })_\#\mu , (\mathcal {S}_{\psi })_\#\nu ) \, \textrm{d}u_{\mathbb {D}}(\psi ). \end{aligned}$$

(6)

Sliced Wasserstein distances on the Euclidean space $\mathbb {X}=\mathbb {R}^d$ with the slicing operator $\mathcal {S}^{\mathbb {R}^d}_{\varvec{\psi }}:=\left\langle {\varvec{\psi }},\cdot \right\rangle $ for ${\varvec{\psi }}\in \mathbb {D}= \mathbb {S}^{d-1}$ are well known [16, 62, 67]. Sliced OT is closely related to the Radon transform

$$\begin{aligned} \mathcal {R}_{\varvec{\psi }}:\mathcal {P}(\mathbb {R}^d)\rightarrow \mathcal {P}(\mathbb {R}),\quad \mu \mapsto (\mathcal {S}^{\mathbb {R}^d}_{\varvec{\psi }})_\# \mu . \end{aligned}$$

The Radon transform is often defined for functions on $\mathbb {R}^d$ via an integral, see [52].
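The Euclidean case of (6) can be approximated by Monte Carlo integration over the directions ${\varvec{\psi }}$. The sketch below is only an illustration under stated assumptions: both measures are empirical with equally many, equally weighted atoms, the directions are drawn by normalizing Gaussian vectors, and the names `random_direction` and `sliced_wasserstein` as well as the defaults `n_dirs` and `seed` are hypothetical.

```python
import math
import random


def wasserstein_1d_pp(xs, ys, p=2):
    """W_p^p on the line for equally many, equally weighted atoms."""
    return sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


def random_direction(d, rng):
    """Draw psi uniformly from S^{d-1} by normalizing a standard Gaussian vector."""
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(c * c for c in g))
    return [c / norm for c in g]


def sliced_wasserstein(points_mu, points_nu, p=2, n_dirs=2000, seed=0):
    """Monte Carlo estimate of SW_p in (6) with the slicing <psi, .> on R^d."""
    rng = random.Random(seed)
    d = len(points_mu[0])
    total = 0.0
    for _ in range(n_dirs):
        psi = random_direction(d, rng)
        # Push both measures forward to the line by projecting the atoms.
        proj_mu = [sum(a * b for a, b in zip(psi, x)) for x in points_mu]
        proj_nu = [sum(a * b for a, b in zip(psi, y)) for y in points_nu]
        total += wasserstein_1d_pp(proj_mu, proj_nu, p)
    return (total / n_dirs) ** (1.0 / p)
```

If one measure is a translate of the other by a vector $v$, every slice is shifted by $\left\langle {\varvec{\psi }},v\right\rangle $, so the exact value of ${{\,\textrm{SW}\,}}_2^2$ is the mean of $\left\langle {\varvec{\psi }},v\right\rangle ^2$ over the sphere, i.e., $\Vert v\Vert ^2/d$; the estimate fluctuates around this value.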

2.2 Sphere

Let $d\in \mathbb {N}$ with $d\ge 3$. We define the $(d-1)$-dimensional unit sphere in $\mathbb {R}^d$ by

$$\begin{aligned} \mathbb {S}^{d-1} :=\{{{\varvec{x}}}\in \mathbb {R}^d\mid \left\Vert {{\varvec{x}}} \right\Vert =1\}, \end{aligned}$$

and denote the canonical unit vectors by $\varvec{e}^j\in \mathbb {R}^d$ for $j \in \llbracket d\rrbracket :=\{1,\dots ,d\}$. The geodesic distance on the sphere $\mathbb {S}^{d-1}$ reads as

$$\begin{aligned} d(\varvec{\xi },\varvec{\eta }) :=\arccos (\left\langle \varvec{\xi },\varvec{\eta }\right\rangle ),\qquad \forall \varvec{\xi },\varvec{\eta }\in \mathbb {S}^{d-1}, \end{aligned}$$

(7)

and we denote the volume of $\mathbb {S}^{d-1}$ by

$$\begin{aligned} \left| \mathbb {S}^{d-1}\right| :=\int _{\mathbb {S}^{d-1}} \, \textrm{d}\sigma _{\mathbb {S}^{d-1}} = \frac{2\pi ^{d/2}}{\Gamma (d/2)}, \end{aligned}$$

(8)

where $\sigma _{\mathbb {S}^{d-1}}$ is the surface measure on $\mathbb {S}^{d-1}$. Normalizing $\sigma _{\mathbb {S}^{d-1}}$ yields the uniform measure $u_{\mathbb {S}^{d-1}} :=\left| {\mathbb {S}^{d-1}}\right| ^{-1} \sigma _{\mathbb {S}^{d-1}}$. We can write any vector $\varvec{\xi }\in \mathbb {S}^{d-1}$ as

$$\begin{aligned} \varvec{\xi }= \begin{pmatrix}\sqrt{1-t^2}\,\varvec{\eta }\\ t \end{pmatrix} \qquad \text {for}\quad \varvec{\eta }\in \mathbb {S}^{d-2},\ t\in \mathbb {I}, \end{aligned}$$

then the surface measure on the sphere $\mathbb {S}^{d-1}$ decomposes as [6, (1.16)]

$$\begin{aligned} \textrm{d}\sigma _{\mathbb {S}^{d-1}}(\varvec{\xi }) = \textrm{d}\sigma _{\mathbb {S}^{d-2}}(\varvec{\eta }) \,(1-t^2)^{\frac{d-3}{2}} \, \textrm{d}t. \end{aligned}$$

(9)
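As a quick consistency check of (8) and (9), consider $d=3$, where the exponent $\frac{d-3}{2}$ vanishes and (9) reduces to $\textrm{d}\sigma _{\mathbb {S}^{2}} = \textrm{d}\sigma _{\mathbb {S}^{1}}\, \textrm{d}t$. Integrating gives

$$\begin{aligned} \left| \mathbb {S}^{2}\right| = \int _{-1}^{1} \int _{\mathbb {S}^{1}} \, \textrm{d}\sigma _{\mathbb {S}^{1}}(\varvec{\eta }) \, \textrm{d}t = 2\pi \cdot 2 = 4\pi , \end{aligned}$$

which agrees with (8), since $\Gamma (3/2) = \sqrt{\pi }/2$ yields $2\pi ^{3/2}/\Gamma (3/2) = 4\pi $.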

For $\mathbb {S}^2$, we denote the bijective spherical coordinate transform by

$$\begin{aligned}{} & {} {{\,\mathrm{\Phi }\,}}:[0,2\pi )\times (0,\pi )\cup \{0\}\times \{0,\pi \}\rightarrow \mathbb {S}^2,\nonumber \\{} & {} \quad (\varphi ,\theta )\mapsto (\cos \varphi \sin \theta ,\sin \varphi \sin \theta ,\cos \theta )^\top . \end{aligned}$$

(10)

Spherical harmonics Let $n\in \mathbb {N}_0$. We denote by $\mathbb {Y}_{n,d}$ the space of all polynomials $f:\mathbb {R}^d\rightarrow \mathbb {C}$ which are harmonic, i.e., the Laplacian $\Delta f$ vanishes everywhere, and homogeneous of degree n, i.e., $f(\alpha {{\varvec{x}}}) = \alpha ^n f({{\varvec{x}}})$ for all $\alpha \in \mathbb {R}$ and ${{\varvec{x}}}\in \mathbb {R}^d$, restricted to the sphere $\mathbb {S}^{d-1}$. Setting

$$\begin{aligned} N_{n,d} :=\dim (\mathbb {Y}_{n,d}) = \frac{(2n+d-2)\, (n+d-3)!}{n!\, (d-2)!}, \end{aligned}$$

(11)

we call an orthonormal basis $ \{Y_{n,d}^k \mid k \in \llbracket N_{n,d}\rrbracket \} $ of $\mathbb {Y}_{n,d}$ a basis of spherical harmonics on $\mathbb {S}^{d-1}$ of degree n, cf. [6]. Then $\{ Y_{n,d}^k \mid n\in \mathbb {N}_0,\, k\in \llbracket N_{n,d}\rrbracket \}$ forms an orthonormal basis of $L^2(\mathbb {S}^{d-1})$. In particular, we can write any $f\in L^2(\mathbb {S}^{d-1})$ as spherical Fourier series

$$\begin{aligned} f= & {} \sum _{n=0}^\infty \sum _{k=1}^{N_{n,d}} {\hat{f}}_{n,d}^k\, Y_{n,d}^k, \quad \text {where} \\{} & {} {\hat{f}}_{n,d}^k :=\left\langle f,Y_{n,d}^k\right\rangle _{L^2(\mathbb {S}^{d-1})}. \end{aligned}$$

The Legendre polynomial $P_{n,d}$ of degree $n\in \mathbb {N}_0$ in dimension $d\ge 2$ is given by [6, (2.70)]

$$\begin{aligned} P_{n,d}(t):= & {} (-1)^n\, \frac{(d-3)!!}{(2n+d-3)!!}\, (1-t^2)^{\frac{3-d}{2}}\\{} & {} \left( \frac{\textrm{d}}{\textrm{d}t}\right) ^n (1-t^2)^{n+\frac{d-3}{2}},\qquad t\in [-1,1]. \end{aligned}$$

Up to normalization, the Legendre polynomials are equal to the Gegenbauer or ultraspherical polynomials, see [6, (2.145)]. The normalized Legendre polynomials

$$\begin{aligned} {\widetilde{P}}_{n,d}(t) :=\sqrt{\frac{N_{n,d}\left| \mathbb {S}^{d-2}\right| }{\left| \mathbb {S}^{d-1}\right| }}\, P_{n,d}(t) = \frac{\sqrt{(2n+d-2)\,(n+d-3)!}}{2^{(d-2)/2}\,\sqrt{n!}\,\Gamma (\frac{d-1}{2})}\, P_{n,d}(t) \end{aligned}$$

(12)

satisfy the orthonormality relation $ \int _{-1}^1 {\widetilde{P}}_{n, d}(t)\, \overline{{\widetilde{P}}_{m, d}(t)}\, (1-t^2)^{\frac{d-3}{2}} \, \textrm{d}t = \delta _{n, m}, $ where $\delta $ denotes the Kronecker symbol.
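The dimension formula (11) and the orthonormality relation can be checked numerically in the simplest case $d=3$, where $P_{n,3}$ is the classical Legendre polynomial, the weight $(1-t^2)^{\frac{d-3}{2}}$ equals one, and (12) reduces to ${\widetilde{P}}_{n,3} = \sqrt{(2n+1)/2}\, P_{n,3}$. The sketch below is an illustration only; the use of Bonnet's recurrence and of a composite Simpson rule are choices made here, not taken from the paper, and all function names are hypothetical.

```python
import math


def dim_harmonics(n, d):
    """N_{n,d} from (11), for d >= 3."""
    return ((2 * n + d - 2) * math.factorial(n + d - 3)
            // (math.factorial(n) * math.factorial(d - 2)))


def legendre(n, t):
    """Classical Legendre polynomial P_n(t) = P_{n,3}(t) via Bonnet's recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, t
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * t * p - k * p_prev) / (k + 1)
    return p


def normalized_legendre(n, t):
    """Normalized Legendre polynomial from (12), specialized to d = 3."""
    return math.sqrt((2 * n + 1) / 2.0) * legendre(n, t)


def simpson(f, a, b, m=2000):
    """Composite Simpson rule with an even number m of subintervals."""
    h = (b - a) / m
    s = f(a) + f(b)
    for i in range(1, m):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3.0


def gram(n, k):
    """Approximate inner product of the normalized polynomials of degrees n and k."""
    return simpson(lambda t: normalized_legendre(n, t) * normalized_legendre(k, t),
                   -1.0, 1.0)
```

For $d=3$, `dim_harmonics(n, 3)` equals $2n+1$, and `gram(n, k)` is close to the Kronecker symbol $\delta _{n,k}$ up to quadrature error.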

2.3 Rotation Group

We define the rotation group

$$\begin{aligned} \textrm{SO}(d):=\{{{\varvec{Q}}}\in \mathbb {R}^{d\times d}\mid {{\varvec{Q}}}^\top {{\varvec{Q}}}=I, \det ({{\varvec{Q}}})=1\}. \end{aligned}$$

We are especially interested in the 3D rotation group $\textrm{SO}(3)$. Every rotation matrix can be written as the rotation around an axis ${{\varvec{n}}}\in \mathbb {S}^2$ with an angle $\omega \in \mathbb {T}{:=\mathbb {R}/(2\pi \mathbb {Z})}$, i.e.,

$$\begin{aligned} \textrm{R}_{{{\varvec{n}}}}(\omega ):= & {} (1-\cos \omega )\, {{\varvec{n}}}{{\varvec{n}}}^\top \nonumber \\{} & {} + \begin{pmatrix} \cos \omega &{} - n_3 \sin \omega &{} n_2 \sin \omega \\ n_3 \sin \omega &{} \cos \omega &{} - n_1 \sin \omega \\ - n_2\sin \omega &{} n_1 \sin \omega &{} \cos \omega \end{pmatrix} \in \textrm{SO}(3). \qquad \end{aligned}$$

(13)
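The axis-angle formula (13) splits into the rank-one part $(1-\cos \omega )\,{{\varvec{n}}}{{\varvec{n}}}^\top $, the diagonal part $\cos \omega \, I$ and the skew-symmetric part $\sin \omega \,[{{\varvec{n}}}]_\times $, where $[{{\varvec{n}}}]_\times $ is the cross-product matrix of ${{\varvec{n}}}$. The following sketch builds the matrix from these three parts; the function names are hypothetical, and the axis is assumed to be a unit vector.

```python
import math


def rotation_matrix(n, omega):
    """Axis-angle rotation R_n(omega) as in (13); n must be a unit vector in R^3."""
    c, s = math.cos(omega), math.sin(omega)
    n1, n2, n3 = n
    # Cross-product (skew-symmetric) matrix of n.
    skew = [[0.0, -n3, n2],
            [n3, 0.0, -n1],
            [-n2, n1, 0.0]]
    return [[(1.0 - c) * n[i] * n[j] + (c if i == j else 0.0) + s * skew[i][j]
             for j in range(3)] for i in range(3)]


def matvec(a, x):
    """Matrix-vector product for a 3x3 matrix given as nested lists."""
    return [sum(a[i][j] * x[j] for j in range(3)) for i in range(3)]
```

A rotation about ${{\varvec{e}}}^3$ by $\pi /2$ maps ${{\varvec{e}}}^1$ to ${{\varvec{e}}}^2$, and the axis ${{\varvec{n}}}$ is always left fixed; both properties are easy to confirm with `matvec`.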

Furthermore, we consider the Euler angle parameterization

$$\begin{aligned}{} & {} {{\,\mathrm{\Psi }\,}}:\mathbb {T}\times [0,\pi ]\times \mathbb {T}\rightarrow \textrm{SO}(3),\nonumber \\{} & {} {{\,\mathrm{\Psi }\,}}(\alpha ,\beta ,\gamma ) :=\textrm{R}_{{{\varvec{e}}}^3}(\alpha ) \textrm{R}_{{{\varvec{e}}}^2}(\beta ) \textrm{R}_{{{\varvec{e}}}^3}(\gamma ). \end{aligned}$$

(14)

The rotationally invariant Lebesgue measure $\sigma _{\textrm{SO}(3)}$ on $\textrm{SO}(3)$ is given by

$$\begin{aligned}&\int _{\textrm{SO}(3)} f({{\varvec{Q}}}) \, \textrm{d}\sigma _{\textrm{SO}(3)}({{\varvec{Q}}}) \nonumber \\&\quad = 2 \int _{0}^{\pi } \int _{\mathbb {S}^2} f(\textrm{R}_{\varvec{\xi }}(\omega )) (1-\cos (\omega )) \, \textrm{d}\sigma _{\mathbb {S}^2}(\varvec{\xi }) \, \textrm{d}\omega , \end{aligned}$$

(15)

see [36, p. 8], and the uniform measure on $\textrm{SO}(3)$ is $u_{\textrm{SO}(3)} :=(8\pi ^{2})^{-1} \sigma _{\textrm{SO}(3)}$.
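The normalization constant $8\pi ^2$ can be verified directly from (15) by inserting $f\equiv 1$:

$$\begin{aligned} \sigma _{\textrm{SO}(3)}(\textrm{SO}(3)) = 2 \int _{0}^{\pi } \left| \mathbb {S}^2\right| \, (1-\cos \omega ) \, \textrm{d}\omega = 8\pi \, \bigl [ \omega - \sin \omega \bigr ]_{0}^{\pi } = 8\pi ^2. \end{aligned}$$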

The rotational harmonics or Wigner D-functions $D_n^{k,j}$ of degree $n\in \mathbb {N}_0$ and orders $k,j\in \{-n,\dots ,n\}$ are defined by

$$\begin{aligned} D_n^{k,j} ({{\,\mathrm{\Psi }\,}}(\alpha ,\beta ,\gamma )) :=\textrm{e}^{-\textrm{i}k\alpha }\, d_n^{k,j} (\cos \beta )\, \textrm{e}^{-\textrm{i}j\gamma }, \end{aligned}$$

(16)

where the Wigner d-functions are given for $t\in [-1,1]$ by

$$\begin{aligned} d_n^{k,j}(t)&:=\frac{(-1)^{n-j}}{2^n} \sqrt{\frac{(n+k)!(1-t)^{j-k}}{(n-j)!(n+j)!(n-k)!(1+t)^{j+k}}}\\&\quad \frac{\textrm{d}^{n-k}}{\textrm{d}t^{n-k}} \frac{(1+t)^{n+j}}{(1-t)^{-n+j}}, \end{aligned}$$

see [73, chap. 4]. The rotational harmonics satisfy the orthogonality relations

$$\begin{aligned} \int _{\textrm{SO}(3)} D_n^{j,k}({{\varvec{Q}}})\, \overline{D_{n'}^{j',k'}({{\varvec{Q}}})} \, \textrm{d}\sigma _{\textrm{SO}(3)}({{\varvec{Q}}}) = \frac{8\pi ^2}{2n+1} \delta _{n,n'} \delta _{k,k'} \delta _{j,j'} \end{aligned}$$

(17)

and

$$\begin{aligned} \int _{0}^{\pi } d_n^{j,k}(\cos \beta )\, d_{n'}^{j,k}(\cos \beta )\, \sin (\beta ) \, \textrm{d}\beta = \frac{2}{2n+1} \delta _{n,n'} \end{aligned}$$

(18)

for all $n,n'\in \mathbb {N}_0,$ $j,k=-n,\dots ,n,$ and $j',k'=-n',\dots ,n'$. The normalized rotational harmonics

$$\begin{aligned} {\widetilde{D}}_n^{j,k} :=\sqrt{\frac{2n+1}{8\pi ^2}}\, D_n^{j,k},\qquad n \in {\mathbb {N}}_0,\ j,k = -n,\ldots ,n, \nonumber \\ \end{aligned}$$

(19)

form an orthonormal basis of $L^2(\textrm{SO}(3))$.

3 Sliced OT on the Sphere

We present a slicing approach for OT on the sphere $\mathbb {X}=\mathbb {S}^{d-1}$ for $d\ge 3$. We define the parallel slicing operator for a fixed ${\varvec{\psi }}\in \mathbb {S}^{d-1}$ by

$$\begin{aligned} \mathcal {S}_{{\varvec{\psi }}}^{\mathbb {S}^{d-1}} :\mathbb {S}^{d-1} \rightarrow \mathbb {I},\qquad \mathcal {S}_{{\varvec{\psi }}}^{\mathbb {S}^{d-1}} (\varvec{\xi }) :=\left\langle \varvec{\xi }, {\varvec{\psi }}\right\rangle . \end{aligned}$$

(20)

We will omit the superscript of $\mathcal {S}_{\varvec{\psi }}$ if no confusion arises. The corresponding slice is the $(d-2)$-dimensional subsphere

$$\begin{aligned} C_{{\varvec{\psi }}}^{t}:= & {} \mathcal {S}_{{\varvec{\psi }}}^{-1}(t) = \{\varvec{\xi }\in \mathbb {S}^{d-1}\mid \mathcal {S}_{{\varvec{\psi }}} (\varvec{\xi }) = t\} \nonumber \\= & {} \{\varvec{\xi }\in \mathbb {S}^{d-1}\mid \left\langle {\varvec{\psi }},\varvec{\xi }\right\rangle = t\}, \quad \ t \in \mathbb {I}, \end{aligned}$$

(21)

which is the intersection of $\mathbb {S}^{d-1}$ and the hyperplane of $\mathbb {R}^d$ with normal ${\varvec{\psi }}$ and distance t from the origin.
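For $d=3$, the slice $C_{{\varvec{\psi }}}^{t}$ in (21) is a circle of radius $\sqrt{1-t^2}$ centered at $t{\varvec{\psi }}$ in the plane orthogonal to ${\varvec{\psi }}$. The sketch below samples this circle; it is an illustration under the assumptions that ${\varvec{\psi }}$ is a unit vector and $t\in (-1,1)$, and the helper names are hypothetical.

```python
import math


def cross(a, b):
    """Cross product in R^3."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def normalize(v):
    """Scale a nonzero vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def slice_points(psi, t, m):
    """Return m equispaced points on the slice C_psi^t of S^2, cf. (21)."""
    # Choose the coordinate axis least aligned with psi to build a basis
    # (u, v) of the plane orthogonal to psi.
    k = min(range(3), key=lambda i: abs(psi[i]))
    helper = [1.0 if i == k else 0.0 for i in range(3)]
    u = normalize(cross(psi, helper))
    v = cross(psi, u)
    r = math.sqrt(1.0 - t * t)  # radius of the circle
    points = []
    for j in range(m):
        phi = 2.0 * math.pi * j / m
        points.append([t * psi[i] + r * (math.cos(phi) * u[i] + math.sin(phi) * v[i])
                       for i in range(3)])
    return points
```

Every returned point $\varvec{\xi }$ has unit norm and satisfies $\mathcal {S}_{{\varvec{\psi }}}(\varvec{\xi }) = \left\langle \varvec{\xi },{\varvec{\psi }}\right\rangle = t$, so all points of the slice have the same geodesic distance $\arccos (t)$ to ${\varvec{\psi }}$.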

In this section, we first analyze the respective Radon transform for functions and measures on $\mathbb {S}^{d-1}$, and then show that the sliced Wasserstein distance is a metric on $\mathcal {P}(\mathbb {S}^{d-1})$.

3.1 Parallel Slice Transform of Functions

For $f :\mathbb {S}^{d-1} \rightarrow \mathbb {R}$, we define the parallel slice transform

$$\begin{aligned}{} & {} \mathcal {U}f({\varvec{\psi }},t) \nonumber \\{} & {} \quad :={\left\{ \begin{array}{ll} \displaystyle \frac{1}{\left| \mathbb {S}^{d-1}\right| \sqrt{1-t^2}} \int _{C_{{\varvec{\psi }}}^{t}} f(\varvec{\xi }) \, \,\textrm{ds}(\varvec{\xi }), &{}\quad {\varvec{\psi }}\in \mathbb {S}^{d-1},\ t\in (-1,1),\\ \displaystyle \frac{1}{2}\, \delta _{d,3}\, f(\pm {{\varvec{\psi }}}), &{}\quad {\varvec{\psi }}\in {\mathbb {S}}^{d-1},\ t=\pm 1, \end{array}\right. } \nonumber \\ \end{aligned}$$

(22)

where $\textrm{ds}$ denotes the $(d-2)$-dimensional volume element on $C_{{\varvec{\psi }}}^{t}$ and $\delta $ is the Kronecker symbol. The second line in (22) ensures the continuity of $\mathcal {U}f$ if f is continuous. For fixed ${\varvec{\psi }}\in \mathbb {S}^{d-1}$, the (normalized) restriction

$$\begin{aligned} \mathcal {U}_{\varvec{\psi }}{f}:=\left| {\mathbb {S}^{d-1}}\right| \, \mathcal {U}{f}({\varvec{\psi }},\cdot ) \end{aligned}$$

belongs to the class of convolution operators [59] and is known as the spherical section transform [63], translation [22] or shift operator [65]. The chosen normalization will become clear in Proposition 3.5. The following proposition was shown, e.g., in [57, Cor. 3.3].

Proposition 3.1

(Integration in t) For every $f \in L^1(\mathbb {S}^{d-1})$ and ${\varvec{\psi }}\in \mathbb {S}^{d-1}$, we have $\mathcal {U}_{\varvec{\psi }}f\in L^1(\mathbb {I})$ and

$$\begin{aligned} \int _{\mathbb {I}} \mathcal {U}_{\varvec{\psi }}f(t) \, \textrm{d}t = \int _{\mathbb {S}^{d-1}} f(\varvec{\xi }) \, \textrm{d}\sigma _{\mathbb {S}^{d-1}}(\varvec{\xi }). \end{aligned}$$

(23)
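As a sanity check of (23), take $f\equiv 1$. The slice $C_{{\varvec{\psi }}}^{t}$ is a $(d-2)$-dimensional sphere of radius $\sqrt{1-t^2}$, so its volume is $\left| \mathbb {S}^{d-2}\right| (1-t^2)^{\frac{d-2}{2}}$, and therefore

$$\begin{aligned} \mathcal {U}_{\varvec{\psi }}1(t) = \frac{1}{\sqrt{1-t^2}} \int _{C_{{\varvec{\psi }}}^{t}} \,\textrm{ds}(\varvec{\xi }) = \left| \mathbb {S}^{d-2}\right| (1-t^2)^{\frac{d-3}{2}}, \qquad \int _{\mathbb {I}} \mathcal {U}_{\varvec{\psi }}1(t) \, \textrm{d}t = \left| \mathbb {S}^{d-1}\right| , \end{aligned}$$

where the last equality follows by integrating the decomposition (9). For $d=3$, this reads $\mathcal {U}_{\varvec{\psi }}1 \equiv 2\pi $ and $\int _{\mathbb {I}} 2\pi \, \textrm{d}t = 4\pi = \left| \mathbb {S}^{2}\right| $.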

Theorem 3.2

(Positivity) Let $f\in C(\mathbb {S}^{d-1})$. Then we have $f(\varvec{\xi }) \ge 0$ for all $\varvec{\xi }\in \mathbb {S}^{d-1}$ if and only if $\mathcal {U}f({\varvec{\psi }},t) \ge 0$ for all ${\varvec{\psi }}\in \mathbb {S}^{d-1}$ and $t\in \mathbb {I}$.

Proof

Let $f\in C(\mathbb {S}^{d-1})$. If $f \ge 0$ everywhere, then $\mathcal {U}f$ is nonnegative everywhere as the integral of a nonnegative function. Conversely, let $\varvec{\eta }\in \mathbb {S}^{d-1}$ such that $f(\varvec{\eta }) = -\delta < 0$. By continuity, there exists $\varepsilon >0$ such that $f(\varvec{\xi }) < -\delta /2$ for all $\varvec{\xi }\in \mathbb {S}^{d-1}$ with $d(\varvec{\xi },\varvec{\eta }) \le \varepsilon .$ Because all points in $C_{\varvec{\eta }}^{\cos (\varepsilon )}$ have spherical distance $\varepsilon $ to $\varvec{\eta }$, we conclude that

$$\begin{aligned} \mathcal {U}f(\varvec{\eta },\cos (\varepsilon )) = \frac{1}{\left| \mathbb {S}^{d-1}\right| \sin (\varepsilon )} \int _{C_{\varvec{\eta }}^{\cos (\varepsilon )}} f(\varvec{\xi }) \, \,\textrm{ds}(\varvec{\xi }) <0. \end{aligned}$$

$\square $

There is no analogue to Theorem 3.2 for the vertical slice nor for the semicircle transform considered in [14, 58] since one can always construct a function that is negative on a small ball and positive outside such that either transform is nonnegative everywhere.

Theorem 3.3

(Singular value decomposition) For each $n\in \mathbb {N}_0$, let $\{ Y_{n,d}^k \mid k\in \llbracket N_{n,d}\rrbracket \}$ be an orthonormal basis of $\mathbb {Y}_{n,d}$. Then (22) is a compact operator $\mathcal {U}:L^2(\mathbb {S}^{d-1}) \rightarrow L^2_{w_d}(\mathbb {S}^{d-1} \times \mathbb {I})$, where $L^2_{w_d}(\mathbb {S}^{d-1} \times \mathbb {I})$ is the space of square integrable functions with the weighted norm $g\mapsto (\int _{\mathbb {S}^{d-1}\times \mathbb {I}} \left| g(\varvec{\xi },t)\right| ^2 (1-t^2)^{(3-d)/2} \, \textrm{d}(\varvec{\xi },t))^{1/2}$, with the singular value decomposition

$$\begin{aligned} \mathcal {U}Y_{n,d}^k ({\varvec{\psi }},t)= & {} \lambda _{n,d}^{\mathcal {U}}\, Y_{n,d}^k({\varvec{\psi }})\, {\widetilde{P}}_{n,d}(t)\, (1-t^2)^{\frac{d-3}{2}},\nonumber \\{} & {} \forall n\in \mathbb {N}_0,\ k\in \llbracket N_{n,d}\rrbracket , \end{aligned}$$

(24)

where ${\widetilde{P}}_{n,d}$ are given in (12) and the singular values are

$$\begin{aligned} \lambda _{n,d}^{\mathcal {U}} :=\frac{2^{\frac{d-2}{2}}\, \sqrt{n!}\, \Gamma (\frac{d}{2})}{\sqrt{\pi \, (2n+d-2)\, (n+d-3)!}}. \end{aligned}$$

(25)

Proof

The theorem basically follows from the generalized Funk–Hecke formula [10, (4.2.10)], which states for $\varvec{\xi }\in \mathbb {S}^{d-1}$, $t\in (-1,1)$ and $Y_{n,d} \in \mathbb {Y}_{n,d}(\mathbb {S}^{d-1})$ that

$$\begin{aligned} \frac{1}{\left| \mathbb {S}^{d-2}\right| ({1-t^2})^{\frac{d-2}{2}}} \int _{\langle \varvec{\xi },\varvec{\eta }\rangle = t} Y_{n, d}(\varvec{\eta }) \, \textrm{d}\sigma _{\mathbb {S}^{d-2}}(\varvec{\eta }) = \frac{P_{n,d}(t)}{P_{n,d}(1)}\, Y_{n, d}(\varvec{\xi }). \nonumber \\ \end{aligned}$$

Inserting the normalization (12) yields (24) with

$$\begin{aligned} \lambda _{n,d}^{\mathcal {U}}= & {} \sqrt{\frac{\left| \mathbb {S}^{d-2}\right| }{\left| \mathbb {S}^{d-1}\right| N_{n,d}}}\\{} & {} \overset{(8), (11)}{=} \sqrt{\frac{\Gamma (\frac{d}{2})\, n!\, (d-2)!}{\sqrt{\pi }\, \Gamma (\frac{d-1}{2})\, (2n+d-2)\, (n+d-3)!}}. \end{aligned}$$

The Legendre duplication formula $\Gamma (\frac{d-1}{2})\, \Gamma (\frac{d}{2}) = 2^{2-d} \sqrt{\pi }\, \Gamma (d-1)$ together with $(d-2)! = \Gamma (d-1)$ yields (25). Expanding the product (11) asymptotically for $n\rightarrow \infty $, we have

$$\begin{aligned} N_{n,d}= & {} (2n+d-2)\,\frac{ (n+d-3)\, (n+d-4) \cdots (n+1)}{(d-2)!} \\= & {} \frac{2}{(d-2)!} \left( n^{d-2} + {\mathcal {O}}(n^{d-3}) \right) , \end{aligned}$$

and hence, the singular values $\lambda _{n,d}^{\mathcal {U}}$ converge to zero. Together with the orthonormality of (12), we deduce that (24) is a singular value decomposition. $\square $
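The decay of the singular values can be observed numerically by evaluating the ratio $\lambda _{n,d}^{\mathcal {U}} = \sqrt{\left| \mathbb {S}^{d-2}\right| / (\left| \mathbb {S}^{d-1}\right| N_{n,d})}$ from the proof directly with (8) and (11). The sketch below is an illustration only, and its function names are hypothetical.

```python
import math


def sphere_volume(d):
    """Volume |S^{d-1}| of the unit sphere in R^d, see (8)."""
    return 2.0 * math.pi ** (d / 2.0) / math.gamma(d / 2.0)


def dim_harmonics(n, d):
    """N_{n,d} from (11), for d >= 3."""
    return ((2 * n + d - 2) * math.factorial(n + d - 3)
            // (math.factorial(n) * math.factorial(d - 2)))


def singular_value(n, d):
    """lambda_{n,d} as the square root of |S^{d-2}| / (|S^{d-1}| N_{n,d})."""
    return math.sqrt(sphere_volume(d - 1) / (sphere_volume(d) * dim_harmonics(n, d)))
```

For $d=3$, we have $\left| \mathbb {S}^{1}\right| /\left| \mathbb {S}^{2}\right| = 1/2$ and $N_{n,3} = 2n+1$, so the ratio gives $1/\sqrt{2(2n+1)}$, which decays like $n^{-1/2}$.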

Theorem 3.4

(Adjoint) Let $1 \le p,q \le \infty $ with $1/p + 1/q = 1$. For $1 \le p < \infty $, the adjoint $\mathcal {U}^*:L^q(\mathbb {S}^{d-1} \times \mathbb {I}) \rightarrow L^q(\mathbb {S}^{d-1})$ of $\mathcal {U}:L^p(\mathbb {S}^{d-1}) \rightarrow L^p(\mathbb {S}^{d-1} \times \mathbb {I})$ is given by

$$\begin{aligned} \mathcal {U}^*g(\varvec{\xi }) = \frac{1}{\left| \mathbb {S}^{d-1}\right| } \int _{\mathbb {S}^{d-1}} g({\varvec{\psi }},\left\langle \varvec{\xi },{\varvec{\psi }}\right\rangle ) \, \textrm{d}\sigma _{\mathbb {S}^{d-1}}({\varvec{\psi }}), \end{aligned}$$

(26)

and the adjoint $\mathcal {U}_{\varvec{\psi }}^*:L^q(\mathbb {I}) \rightarrow L^q(\mathbb {S}^{d-1})$ of $\mathcal {U}_{\varvec{\psi }}:L^p(\mathbb {S}^{d-1}) \rightarrow L^p(\mathbb {I})$ by

$$\begin{aligned} \mathcal {U}_{\varvec{\psi }}^*g(\varvec{\xi }) = g(\left\langle \varvec{\xi },{\varvec{\psi }}\right\rangle ) \end{aligned}$$

(27)

for all $\varvec{\xi }\in \mathbb {S}^{d-1}$. Both adjoint operators map continuous functions to continuous functions.

Proof

Let $f\in L^p(\mathbb {S}^{d-1})$ and $g\in L^q(\mathbb {S}^{d-1}\times \mathbb {I})$. We have

$$\begin{aligned} \langle \mathcal {U}f,g \rangle&= \int _{\mathbb {S}^{d-1}} \int _{\mathbb {I}} \mathcal {U}f ({\varvec{\psi }},t) \, g({\varvec{\psi }},t) \, \textrm{d}t \,\, \textrm{d}\sigma _{\mathbb {S}^{d-1}}({\varvec{\psi }}) \\ {}&\overset{(22)}{=} \int _{\mathbb {S}^{d-1}} \int _{\mathbb {I}} \frac{1}{\left| \mathbb {S}^{d-1}\right| \sqrt{1-t^2}} \\&\quad \int _{C_{{\varvec{\psi }}}^{t}} f(\varvec{\xi }) \, g({\varvec{\psi }},t) \,\textrm{ds}(\varvec{\xi }) \, \textrm{d}t \, \textrm{d}\sigma _{\mathbb {S}^{d-1}}({\varvec{\psi }}) \\ {}&\overset{(21)}{=} \int _{\mathbb {S}^{d-1}} \int _{\mathbb {I}} \frac{1}{\left| \mathbb {S}^{d-1}\right| \sqrt{1-t^2}}\\&\quad \int _{C_{{\varvec{\psi }}}^{t}} f(\varvec{\xi }) \, g({\varvec{\psi }},\left\langle {\varvec{\psi }},\varvec{\xi }\right\rangle ) \,\textrm{ds}(\varvec{\xi }) \, \textrm{d}t \, \textrm{d}\sigma _{\mathbb {S}^{d-1}}({\varvec{\psi }}) \\ {}&\overset{(23)}{=} \frac{1}{\left| \mathbb {S}^{d-1}\right| } \int _{\mathbb {S}^{d-1}} \\&\quad \int _{\mathbb {S}^{d-1}} f(\varvec{\xi }) \, g({\varvec{\psi }},\left\langle {\varvec{\psi }},\varvec{\xi }\right\rangle ) \, \textrm{d}\sigma _{\mathbb {S}^{d-1}}(\varvec{\xi }) \, \textrm{d}\sigma _{\mathbb {S}^{d-1}}({\varvec{\psi }}). \end{aligned}$$

The adjoint of $\mathcal {U}_{\varvec{\psi }}$ can be established analogously without the outer integral. The continuity follows from Lebesgue’s dominated convergence theorem. $\square $

3.2 Parallel Slice Transform of Measures

We extend the definition (22) to measures as push-forward of the slicing operator (20). For ${\varvec{\psi }}\in \mathbb {S}^{d-1}$, we define

$$\begin{aligned} \mathcal {U}_{\varvec{\psi }}:\mathcal {M}(\mathbb {S}^{d-1}) \rightarrow \mathcal {M}(\mathbb {I}),\quad \mu \mapsto (\mathcal {S}_{\varvec{\psi }})_\# \mu = \mu \circ \mathcal {S}_{\varvec{\psi }}^{-1}. \end{aligned}$$

(28)

and $\mathcal {U}:\mathcal {M}(\mathbb {S}^{d-1}) \rightarrow \mathcal {M}(\mathbb {S}^{d-1}\times \mathbb {I})$ by

$$\begin{aligned} \mathcal {U}\mu := T_\# (u_{\mathbb {S}^{d-1}} \times \mu ) \quad \text {with}\quad T({\varvec{\psi }}, \varvec{\xi }) := ({\varvec{\psi }}, \mathcal {S}_{\varvec{\psi }}(\varvec{\xi })). \end{aligned}$$

(29)

Proposition 3.5

(Connection with adjoint) Let $\mu \in \mathcal {M}(\mathbb {S}^{d-1})$. The transforms (29) and (28) satisfy

$$\begin{aligned} \begin{aligned} \langle \mathcal {U}\mu , g \rangle&= \langle \mu , \mathcal {U}^*g \rangle \quad \text {for all } g \in C(\mathbb {S}^{d-1} \times \mathbb {I}) \quad \text {and} \\ \langle \mathcal {U}_{\varvec{\psi }}\mu , g \rangle&= \langle \mu , \mathcal {U}_{\varvec{\psi }}^*g \rangle \quad \text {for all } g \in C(\mathbb {I}),\, {\varvec{\psi }}\in \mathbb {S}^{d-1} \end{aligned} \end{aligned}$$

with the adjoint operators from (26) and (27).

Proof

For $g \in C(\mathbb {S}^{d-1} \times \mathbb {I})$, we have by the definition in (29)

$$\begin{aligned} \langle \mathcal {U}\mu , g \rangle&= \int _{\mathbb {S}^{d-1} \times \mathbb {I}} g({\varvec{\psi }},t) \, \, \textrm{d}T_\# (u_{\mathbb {S}^{d-1}} \times \mu )({\varvec{\psi }},t) \\&= \int _{\mathbb {S}^{d-1}} \int _{\mathbb {S}^{d-1}} g({\varvec{\psi }}, \left\langle {\varvec{\psi }},\varvec{\xi }\right\rangle ) \, \textrm{d}u_{\mathbb {S}^{d-1}}({\varvec{\psi }}) \, \textrm{d}\mu (\varvec{\xi })\\&= \langle \mu , \mathcal {U}^*g \rangle . \end{aligned}$$

For $g \in C(\mathbb {I})$, and fixed ${\varvec{\psi }}\in \mathbb {S}^{d-1}$,

$$\begin{aligned} \langle \mathcal {U}_{\varvec{\psi }}\mu , g \rangle= & {} \int _{\mathbb {I}} g(t) \, \, \textrm{d}(\mathcal {S}_{\varvec{\psi }})_\# \mu (t) \\= & {} \int _{\mathbb {S}^{d-1}} g(\left\langle {\varvec{\psi }},\varvec{\xi }\right\rangle ) \, \textrm{d}\mu (\varvec{\xi }) = \langle \mu , \mathcal {U}_{\varvec{\psi }}^*g \rangle . \end{aligned}$$

$\square $

The last proposition provides an alternative way of defining $\mathcal {U}$ for measures, similarly to what was done for the Radon transform, e.g., in [12].

For absolutely continuous measures with respect to the surface measure $\sigma _{\mathbb {S}^{d-1}}$, the measure- and function-valued transforms coincide.

Proposition 3.6

(Absolutely continuous measures) For $f \in L^1(\mathbb {S}^{d-1})$, we have

$$\begin{aligned} \mathcal {U}[f \sigma _{\mathbb {S}^{d-1}}]= & {} (\mathcal {U}f) \, \sigma _{\mathbb {S}^{d-1} \times \mathbb {I}} \quad \text {and}\quad \mathcal {U}_{\varvec{\psi }}[f \sigma _{\mathbb {S}^{d-1}}] \\= & {} (\mathcal {U}_{\varvec{\psi }}f) \, \sigma _{\mathbb {I}} \qquad \forall {\varvec{\psi }}\in \mathbb {S}^{d-1}. \end{aligned}$$

In particular, the transformed measures are again absolutely continuous.

Proof

Let $g \in C(\mathbb {S}^{d-1}\times \mathbb {I})$. The first identity follows from Proposition 3.5 by

$$\begin{aligned} \langle \mathcal {U}[f \sigma _{\mathbb {S}^{d-1}}], g \rangle&= \langle f \sigma _{\mathbb {S}^{d-1}}, \mathcal {U}^* g \rangle \\&= \int _{\mathbb {S}^{d-1}} f(\varvec{\xi })\, \mathcal {U}^* g(\varvec{\xi }) \, \textrm{d}\sigma _{\mathbb {S}^{d-1}}(\varvec{\xi }) \\&= \int _{-1}^{1} \int _{\mathbb {S}^{d-1}} \mathcal {U}f({\varvec{\psi }},t)\, g({\varvec{\psi }},t) \, \textrm{d}\sigma _{\mathbb {S}^{d-1}}({\varvec{\psi }}) \, \textrm{d}t\\&= \langle (\mathcal {U}f)\, \sigma _{\mathbb {S}^{d-1}\times \mathbb {I}}, g \rangle . \end{aligned}$$

The identity for $\mathcal {U}_{\varvec{\psi }}$ follows analogously. $\square $

3.3 Spherical Sliced Wasserstein Distance

For $p \in [1, \infty )$ and $\mu , \nu \in \mathcal {P}(\mathbb {S}^{d-1})$, the parallel-sliced spherical Wasserstein distance

$$\begin{aligned} {{\,\textrm{PSW}\,}}_p^p(\mu ,\nu ) :=\int _{\mathbb {S}^{d-1}} {{\,\textrm{W}\,}}_p^p( \mathcal {U}_{{\varvec{\psi }}}\mu , \mathcal {U}_{{\varvec{\psi }}}\nu ) \, \textrm{d}u_{\mathbb {S}^{d-1}} ({\varvec{\psi }}) \end{aligned}$$

(30)

is the mean value of Wasserstein distances on the interval $\mathbb {I}$. Since the geodesic distance (7) on the sphere is rotationally invariant, i.e., $d(\varvec{Q} \varvec{\xi }, \varvec{Q} \varvec{\eta }) = d(\varvec{\xi },\varvec{\eta })$ for all rotations $\varvec{Q} \in \textrm{SO}(d)$, the spherical Wasserstein distance inherits this property, ${{\,\textrm{W}\,}}_p(\mu ,\nu ) = {{\,\textrm{W}\,}}_p(\mu \circ \varvec{Q},\nu \circ \varvec{Q})$.
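For discrete measures, definition (30) can be approximated by averaging over randomly drawn directions. The following is a minimal sketch, assuming two empirical measures with the same number $N$ of equally weighted points and the slicing operator $\mathcal {S}_{\varvec{\psi }}(\varvec{\xi }) = \langle {\varvec{\psi }},\varvec{\xi }\rangle $; the function names `sample_sphere` and `psw_pp` are hypothetical and not taken from the authors' implementation.

```python
import numpy as np

def sample_sphere(n, d, rng):
    """Draw n points uniformly on S^{d-1} by normalising Gaussian vectors."""
    v = rng.standard_normal((n, d))
    return v / np.linalg.norm(v, axis=1, keepdims=True)

def psw_pp(X, Y, p=2, n_dirs=200, rng=None):
    """Monte Carlo estimate of PSW_p^p for two uniform empirical measures.

    X, Y: arrays of shape (N, d) with unit-norm rows (same N for both).
    """
    rng = np.random.default_rng() if rng is None else rng
    psi = sample_sphere(n_dirs, X.shape[1], rng)   # random directions psi_q
    # Slices <psi_q, xi_k> in [-1, 1]; each row holds one sliced measure.
    sx = np.sort(psi @ X.T, axis=1)
    sy = np.sort(psi @ Y.T, axis=1)
    # On the line, optimal transport between equal-size samples pairs sorted values.
    return np.mean(np.abs(sx - sy) ** p)
```

The average over both axes combines the factor $1/N$ of the one-dimensional Wasserstein distance with the mean over the sampled directions, so the estimate is random and only approximates the integral in (30).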

Theorem 3.7

(Metric) For every $p \in [1, \infty )$, the sliced spherical Wasserstein distance ${{\,\textrm{PSW}\,}}_p$ is a metric on $\mathcal {P}(\mathbb {S}^{d-1})$, which induces the same topology as the spherical Wasserstein distance ${{\,\textrm{W}\,}}_p$. There exist constants $c_{d,p},C_{d,p}>0$ such that

$$\begin{aligned}{} & {} c_{d,p}{{\,\textrm{PSW}\,}}_p(\mu ,\nu ) \le {{\,\textrm{W}\,}}_p(\mu ,\nu ) \le C_{d,p} {{\,\textrm{PSW}\,}}_p(\mu ,\nu )^{\frac{1}{p(d+1)}}, \nonumber \\{} & {} \quad \forall \mu ,\nu \in \mathcal {P}(\mathbb {S}^{d-1}). \end{aligned}$$

(31)

It is rotationally invariant in the sense that for every $\mu , \nu \in \mathcal {P}(\mathbb {S}^{d-1})$ and $\varvec{Q} \in \textrm{SO}(d)$, we have

$$\begin{aligned} {{\,\textrm{PSW}\,}}_p(\mu , \nu ) = {{\,\textrm{PSW}\,}}_p(\mu \circ \varvec{Q}, \nu \circ \varvec{Q}). \end{aligned}$$

The proof is given in Appendix A. Even though the Wasserstein ${{\,\textrm{W}\,}}_p$ and sliced Wasserstein metric ${{\,\textrm{PSW}\,}}_p$ on $\mathcal {P}(\mathbb {S}^{d-1})$ are topologically equivalent, they are not bilipschitz equivalent. As $\mathbb {S}^{d-1}$ is a compact set, the p-Wasserstein metrics for all $p\in [1,\infty )$ induce the same topology on $\mathcal {P}(\mathbb {S}^{d-1})$, see [67, § 5.2]. We conclude this section by mentioning another slicing approach.

Remark 3.8

(Semicircular slices) A different approach due to [13, 14] uses slicing along semicircles and maps to the one-dimensional torus $\mathbb {T}:=\mathbb {R}/(2\pi \mathbb {Z})$. We mention here the case $d=3$ following [58]. We denote the components of the inverse of the bijective spherical coordinate transform (10) by $ {{\,\mathrm{\Phi }\,}}^{-1}(\varvec{\xi }) = ({{\,\textrm{azi}\,}}(\varvec{\xi }), {{\,\textrm{zen}\,}}(\varvec{\xi }))$ and the transformation in Euler angles (14) by ${{\,\mathrm{\Psi }\,}}$. For ${\varvec{\psi }}= \Phi (\varphi ,\theta )\in \mathbb {S}^2$, we define the slicing operator

$$\begin{aligned} \mathcal {A}_{\varvec{\psi }}:\mathbb {S}^2\rightarrow \mathbb {T},\ \varvec{\xi }\mapsto {{\,\textrm{azi}\,}}({{\,\mathrm{\Psi }\,}}(\varphi ,\theta ,0)^\top \varvec{\xi }). \end{aligned}$$

The semicircular sliced Wasserstein distance

$$\begin{aligned} {{\,\textrm{SSW}\,}}_p^p(\mu ,\nu ) :=\int _{\mathbb {S}^2} {{\,\textrm{W}\,}}_p^p((\mathcal {A}_{{\varvec{\psi }}})_\# \mu , (\mathcal {A}_{{\varvec{\psi }}})_\# \nu ) \, \textrm{d}u_{\mathbb {S}^2}({\varvec{\psi }}) \end{aligned}$$

is also a rotationally invariant metric, see [58], but an equivalence as in (31) is not known. Furthermore, it requires solving the one-dimensional OT on the torus, which is more difficult than on an interval, see [23, 61, 62]. $\square $

4 Sliced OT on SO(3)

We present an approach to generate sliced Wasserstein distances on $\mathbb {X}=\textrm{SO}(3)$. We denote the angle of the rotation ${{\varvec{Q}}}\in \textrm{SO}(3)$ by

$$\begin{aligned} \angle ({{\varvec{Q}}}) :=\arccos \frac{-1+{\text {trace}}({{\varvec{Q}}})}{2} \in [0,\pi ]. \end{aligned}$$

We take $\mathbb {D}= \textrm{SO}(3)$ and the slicing operator

$$\begin{aligned} d_{{\varvec{Q}}}:\textrm{SO}(3)\rightarrow [0,\pi ],\quad {{\varvec{P}}}\mapsto \angle ({{\varvec{Q}}}^\top {{\varvec{P}}}),\quad \forall {{\varvec{Q}}}\in \textrm{SO}(3). \nonumber \\ \end{aligned}$$

(32)

The respective slice $d_{{\varvec{Q}}}^{-1}(\omega )$ can be parameterized as follows.

Proposition 4.1

(Parameterization) Let ${{\varvec{Q}}}\in \textrm{SO}(3)$ and $\omega \in [0,\pi ]$. Then

$$\begin{aligned} d_{{\varvec{Q}}}^{-1}(\omega ) = \left\{ {{\varvec{A}}}\in \textrm{SO}(3)\mid {d_{{{\varvec{Q}}}}({{\varvec{A}}})} = \omega \right\} = \{ {{\varvec{Q}}}\textrm{R}_{\varvec{\xi }}(\omega ) \mid \varvec{\xi }\in \mathbb {S}^2\}, \end{aligned}$$

where $\textrm{R}_{\varvec{\xi }}(\omega )$ is the rotation with axis $\varvec{\xi }$ and angle $\omega $, see (13).

Proof

We have $ \angle (\textrm{R}_{\varvec{\eta }}(\omega )) = \omega $ for all $\varvec{\eta }\in \mathbb {S}^2$ and $\omega \in [0,\pi ]$. Let ${{\varvec{A}}}\in \textrm{SO}(3)$. We write ${{\varvec{Q}}}^\top {{\varvec{A}}}$ in axis–angle form (13) as ${{\varvec{Q}}}^\top {{\varvec{A}}}= \textrm{R}_{\varvec{\xi }}(\sigma )$ with $\sigma =\angle ({{\varvec{Q}}}^\top {{\varvec{A}}})$ and some $\varvec{\xi }\in \mathbb {S}^2$. Then we have ${{\varvec{A}}}\in d_{{\varvec{Q}}}^{-1}(\omega )$ if and only if $ \omega = \angle ({{\varvec{Q}}}^\top {{\varvec{A}}}) = \sigma , $ which shows the claim. $\square $
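Proposition 4.1 can be checked numerically. The sketch below assumes that the axis-angle form (13) agrees with Rodrigues' rotation formula; the helper names `rotation`, `angle` and `d` are chosen for illustration only.

```python
import numpy as np

def rotation(axis, omega):
    """Rotation matrix with unit axis `axis` and angle `omega` (Rodrigues' formula)."""
    x, y, z = axis
    K = np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])  # cross-product matrix
    return np.eye(3) + np.sin(omega) * K + (1.0 - np.cos(omega)) * (K @ K)

def angle(Q):
    """Rotation angle arccos((trace(Q) - 1) / 2) in [0, pi]."""
    c = (np.trace(Q) - 1.0) / 2.0
    return np.arccos(np.clip(c, -1.0, 1.0))  # clipping guards against round-off

def d(Q, P):
    """Slicing operator d_Q(P) = angle(Q^T P) from (32)."""
    return angle(Q.T @ P)

# Every element Q R_xi(omega) lies in the slice d_Q^{-1}(omega).
rng = np.random.default_rng(0)
Q = rotation(np.array([0.0, 0.0, 1.0]), 0.7) @ rotation(np.array([1.0, 0.0, 0.0]), 1.1)
omega = 2.0
for _ in range(5):
    xi = rng.standard_normal(3)
    xi /= np.linalg.norm(xi)
    assert abs(d(Q, Q @ rotation(xi, omega)) - omega) < 1e-9
```

The clipping in `angle` is a numerical safeguard only; it does not change the value for exact rotation matrices.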

4.1 A Two-Dimensional Radon Transform on SO(3)

Let $f\in L^1(\textrm{SO}(3))$. We define the Radon transform on the rotation group for any ${{\varvec{Q}}}\in \textrm{SO}(3)$ and $\omega \in [0,\pi ]$ by

$$\begin{aligned} \mathcal {T}f({{\varvec{Q}}},\omega ):= & {} \frac{1}{4\pi ^2}(1-\cos (\omega ))\,\nonumber \\{} & {} \int _{\mathbb {S}^2} f({{\varvec{Q}}}\textrm{R}_{\varvec{\xi }}(\omega )) \, \textrm{d}\sigma _{\mathbb {S}^2}(\varvec{\xi }), \end{aligned}$$

(33)

and its restriction $\mathcal {T}_{{\varvec{Q}}}f(\omega ) :=8\pi ^2\, \mathcal {T}f({{\varvec{Q}}},\omega )$. By Proposition 4.1, the domain of integration of f is the slice $d_{{\varvec{Q}}}^{-1}(\omega )$. With (15), we obtain

$$\begin{aligned} \int _{\textrm{SO}(3)} f({{\varvec{A}}}) \, \textrm{d}\sigma _{\textrm{SO}(3)}({{\varvec{A}}})= & {} \int _{0}^{\pi } \mathcal {T}_{{\varvec{Q}}}f(\omega ) \, \textrm{d}\omega ,\\{} & {} \forall f\in L^1(\textrm{SO}(3)), \end{aligned}$$

which serves as analogue to (23). By Fubini’s theorem, the last equation implies that $\mathcal {T}_{{\varvec{Q}}}f \in L^1([0,\pi ])$ and $\mathcal {T}f \in L^1(\textrm{SO}(3)\times [0,\pi ])$ if $f\in L^1(\textrm{SO}(3))$.

Theorem 4.2

(Singular value decomposition) The Radon transform (33) is one-to-one

$$\begin{aligned} \mathcal {T}:L^2(\textrm{SO}(3)) \rightarrow L^2(\textrm{SO}(3)\times [0,\pi ]) \end{aligned}$$

and has the singular value decomposition

$$\begin{aligned} \mathcal {T}{\widetilde{D}}_n^{j,k} = \lambda _n^{\mathcal {T}}\, F_n^{j,k},\qquad \forall n\in \mathbb {N}_0,\, j,k\in \{-n,\dots ,n\} \end{aligned}$$

(34)

with the orthonormal basis ${\widetilde{D}}_n^{j,k}$ of $L^2(\textrm{SO}(3))$, see (19), the singular values

$$\begin{aligned} \lambda _0^{\mathcal {T}} = \sqrt{\frac{3}{2}}\, \pi ^{-1/2} \quad \text {and}\quad \lambda _n^{\mathcal {T}} :=\frac{1}{(2n+1)\sqrt{\pi }} \qquad \forall n\in \mathbb {N}, \end{aligned}$$

and the set of orthonormal functions on $L^2(\textrm{SO}(3)\times [0,\pi ])$ defined by

$$\begin{aligned}{} & {} F_n^{j,k}({{\varvec{Q}}},\omega ) \\{} & {} \ \ :={\left\{ \begin{array}{ll} \tfrac{2}{\sqrt{\pi }}\, {\widetilde{D}}_n^{j,k}({{\varvec{Q}}})\, \sin \!\left( (n+\tfrac{1}{2})\omega \right) \, \sin (\tfrac{\omega }{2}), &{} n\ne 0, \\ \sqrt{\tfrac{8}{3\pi }}\, {\widetilde{D}}_0^{0,0}({{\varvec{Q}}}) \left( \sin (\tfrac{\omega }{2}) \right) ^2, &{} n=0, \end{array}\right. } \quad ({{\varvec{Q}}},\omega )\in \textrm{SO}(3)\times [0,\pi ]. \end{aligned}$$

The proof is postponed to Appendix B. Restricting $\mathcal {T}(\cdot ,\omega )$ to a fixed radius $\omega $, we obtain the following injectivity result, which is of a similar structure as for the spherical cap [71] or spherical slice transform [68] on $\mathbb {S}^2$.

Corollary 4.3

(Injectivity for fixed $\omega $) For fixed $\omega \in (0,\pi )$, the Radon transform $\mathcal {T}(\cdot ,\omega )$ is injective as operator $L^2(\textrm{SO}(3))\rightarrow L^2(\textrm{SO}(3))$ if and only if $\omega /\pi $ is not of the form $p/q$, where $p/q$ is a reduced fraction with $p\in \mathbb {N}$ even and $q\in \mathbb {N}$ odd.

Proof

As a direct consequence of Theorem 4.2 and the orthonormality of the rotational harmonics, the restricted transformation has the eigenvalue decomposition

$$\begin{aligned} \mathcal {T}{\widetilde{D}}_n^{j,k}({{\varvec{Q}}},\omega )= & {} \frac{2}{(2n+1)\pi }\, {\sin ((n+\tfrac{1}{2}) \omega )}\, {\sin (\tfrac{\omega }{2})}\, {\widetilde{D}}_n^{j,k}({{\varvec{Q}}}),\\{} & {} \forall {{\varvec{Q}}}\in \textrm{SO}(3). \end{aligned}$$

It is injective if and only if all eigenvalues $ \frac{2}{(2n+1)\pi } \sin ((n+\tfrac{1}{2})\omega ) \sin (\tfrac{\omega }{2}) $ are nonzero. Since $\sin (\tfrac{\omega }{2})>0$ for $\omega \in (0,\pi )$, the nth eigenvalue is zero if and only if $(n+1/2)\omega \in \pi \mathbb {N}$, i.e., there exists $m\in \mathbb {N}$ such that $\omega /\pi = 2m / (2n+1)$, the quotient of an even and an odd integer. Since this fraction can only be reduced with an odd factor, also the reduced fraction consists of an even divided by an odd integer. Conversely, every reduced fraction $p/q$ with even p and odd q is of this form with $m=p/2$ and $n=(q-1)/2$. $\square $
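The vanishing pattern of these eigenvalues is easy to inspect numerically. The following sketch evaluates the eigenvalue formula from the proof; the function names and the tolerance `tol` are assumptions made for illustration.

```python
import numpy as np

def eigenvalue(n, omega):
    """n-th eigenvalue 2/((2n+1) pi) * sin((n + 1/2) omega) * sin(omega / 2)."""
    return 2.0 / ((2 * n + 1) * np.pi) * np.sin((n + 0.5) * omega) * np.sin(omega / 2)

def vanishing_degrees(omega, n_max=50, tol=1e-12):
    """Degrees n <= n_max whose eigenvalue vanishes numerically at angle omega."""
    return [n for n in range(n_max + 1) if abs(eigenvalue(n, omega)) < tol]

# omega / pi = 2/3: the eigenvalues with 2n + 1 divisible by 3 vanish.
print(vanishing_degrees(2 * np.pi / 3, n_max=10))
# omega / pi = 1/2: no eigenvalue vanishes in the tested range.
print(vanishing_degrees(np.pi / 2))
```

For $\omega = 2\pi /3$ the degree $n=1$ gives $(n+\tfrac{1}{2})\omega = \pi $, whereas for $\omega = \pi /2$ the argument $(2n+1)\pi /4$ is never a multiple of $\pi $.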

Theorem 4.4

(Adjoint) Let ${{\varvec{Q}}}\in \textrm{SO}(3)$. The adjoint of $\mathcal {T}_{{{\varvec{Q}}}} :L^2(\textrm{SO}(3)) \rightarrow L^2([0,\pi ])$ is given by

$$\begin{aligned} \mathcal {T}_{{{\varvec{Q}}}}^* g({{\varvec{A}}}) = g(d_{{\varvec{Q}}}({{\varvec{A}}})),\qquad \forall {{\varvec{A}}}\in \textrm{SO}(3), \end{aligned}$$

and the adjoint of $\mathcal {T}:L^2(\textrm{SO}(3)) \rightarrow L^2(\textrm{SO}(3)\times [0,\pi ])$ is

$$\begin{aligned} \mathcal {T}^* g({{\varvec{A}}})= & {} \frac{1}{8\pi ^2} \int _{\textrm{SO}(3)} g({{\varvec{Q}}},d_{{\varvec{Q}}}({{\varvec{A}}})) \, \textrm{d}\sigma _{\textrm{SO}(3)}({{\varvec{Q}}}), \\{} & {} \qquad \forall {{\varvec{A}}}\in \textrm{SO}(3). \end{aligned}$$

Proof

Let $f\in C(\textrm{SO}(3))$, $g\in C([0,\pi ])$ and ${{\varvec{Q}}}\in \textrm{SO}(3)$. We have with the substitution ${{\varvec{A}}}= {{\varvec{Q}}}\textrm{R}_{\varvec{\xi }}(\omega )$ and (15)

$$\begin{aligned}&\int _{\textrm{SO}(3)} f({{\varvec{A}}}) g(d_{{\varvec{Q}}}({{\varvec{A}}})) \, \textrm{d}\sigma _{\textrm{SO}(3)} ({{\varvec{A}}}) \\&\quad = 2 \int _{0}^{\pi } \int _{\mathbb {S}^2} f({{\varvec{Q}}}\textrm{R}_{\varvec{\xi }}(\omega )) g(\omega ) (1-\cos (\omega )) \, \textrm{d}\sigma _{\mathbb {S}^2} (\varvec{\xi }) \, \textrm{d}\omega \\&\quad = \int _{0}^{\pi } \mathcal {T}_{{\varvec{Q}}}f(\omega ) g(\omega ) \, \textrm{d}\omega . \end{aligned}$$

The second claim follows analogously by considering $g\in C(\textrm{SO}(3)\times [0,\pi ])$ and integration over ${{\varvec{Q}}}\in \textrm{SO}(3)$. $\square $

4.2 Slicing of Measures

We generalize the definition (33) to a measure $\mu \in \mathcal {M}(\textrm{SO}(3))$ via the push-forward of the slicing operator (32), i.e.,

$$\begin{aligned} \mathcal {T}_{{\varvec{Q}}}\mu&:=(d_{{{\varvec{Q}}}})_\#\mu \in \mathcal {M}([0,\pi ]) \quad \text {and} \\ \mathcal {T}\mu&:=T_\#(u_{\textrm{SO}(3)} \times \mu ) \in \mathcal {M}(\textrm{SO}(3)\times [0,\pi ]), \qquad \text {with} \\ T({{\varvec{Q}}},{{\varvec{A}}})&= ({{\varvec{Q}}}, d_{{\varvec{Q}}}({{\varvec{A}}})). \end{aligned}$$

Proposition 4.5

(Connection with adjoint) Let ${{\varvec{Q}}}\in \textrm{SO}(3)$ and $\mu \in \mathcal {M}(\textrm{SO}(3))$. Then

$$\begin{aligned} \left\langle \mathcal {T}_{{\varvec{Q}}}\mu , g\right\rangle&= \left\langle \mu , \mathcal {T}_{{\varvec{Q}}}^*g\right\rangle ,\qquad \forall g\in C([0,\pi ]), \\ \left\langle \mathcal {T}\mu , g\right\rangle&= \left\langle \mu , \mathcal {T}^*g\right\rangle ,\qquad \forall g\in C(\textrm{SO}(3)\times [0,\pi ]). \end{aligned}$$

Proposition 4.6

(Absolutely continuous measures) Let $\mu \in \mathcal {M}(\textrm{SO}(3))$ be absolutely continuous, i.e., there exists a density function $f\in L^1(\textrm{SO}(3))$ such that $\mu = f \sigma _{\textrm{SO}(3)}$. Then

$$\begin{aligned} \mathcal {T}_{{\varvec{Q}}}\mu&= (\mathcal {T}_{{\varvec{Q}}}f) \sigma _{[0,\pi ]}, \quad \text {and} \\ \mathcal {T}\mu&= (\mathcal {T}f) \sigma _{\textrm{SO}(3)\times [0,\pi ]}. \end{aligned}$$

The last two propositions can be proved analogously to Propositions 3.5 and 3.6.

Theorem 4.7

(Injectivity) The Radon transform $\mathcal {T}:\mathcal {M}(\textrm{SO}(3))\rightarrow \mathcal {M}(\textrm{SO}(3)\times [0,\pi ])$ is injective.

The proof, which uses the singular value decomposition, is given in Appendix B. Although the slicing operator and therefore the transform $\mathcal {T}$ could be easily generalized to $\textrm{SO}(d)$, this does not apply to the singular value decomposition (34), as the harmonic analysis on $\textrm{SO}(d)$ becomes much more involved for $d>3$, cf. [74, Chapter IX].

4.3 Sliced Wasserstein Distance on SO(3)

Let $p\in [1,\infty )$. We define the sliced Wasserstein distance between measures $\mu ,\nu \in \mathcal {P}(\textrm{SO}(3))$ by

$$\begin{aligned} {{\,\textrm{SOSW}\,}}_p^p(\mu ,\nu ) :=\int _{\textrm{SO}(3)} {{\,\textrm{W}\,}}_p^p\big (\mathcal {T}_{{{\varvec{Q}}}} \mu , \mathcal {T}_{{{\varvec{Q}}}} \nu \big ) \, \textrm{d}u_{\textrm{SO}(3)}({{\varvec{Q}}}), \nonumber \\ \end{aligned}$$

(35)

which is the mean of Wasserstein distances on the interval $[0,\pi ]$.

Theorem 4.8

Let $p\in [1,\infty )$. The sliced Wasserstein distance (35) is a metric on $\mathcal {P}(\textrm{SO}(3))$ that is invariant to rotations, i.e., for any ${{\varvec{A}}}\in \textrm{SO}(3)$ and $\mu ,\nu \in \mathcal {P}(\textrm{SO}(3))$, we have

$$\begin{aligned} {{\,\textrm{SOSW}\,}}_p^p(\mu ({{\varvec{A}}}\cdot ), \nu ({{\varvec{A}}}\cdot )) = {{\,\textrm{SOSW}\,}}_p^p(\mu , \nu ). \end{aligned}$$

Proof

The positive definiteness is due to Theorem 4.7, while the symmetry and the triangle inequality follow from the respective properties of the Wasserstein distance on $[0,\pi ]$. By definition in (35), we have

$$\begin{aligned}{} & {} {{\,\textrm{SOSW}\,}}_p^p(\mu ({{\varvec{A}}}\cdot ), \nu ({{\varvec{A}}}\cdot )) \\{} & {} \quad = \int _{\textrm{SO}(3)} {{\,\textrm{W}\,}}_p^p\big ( \mu \circ {{\varvec{A}}}\circ d_{{\varvec{Q}}}^{-1}, \nu \circ {{\varvec{A}}}\circ d_{{\varvec{Q}}}^{-1}\big ) \, \textrm{d}u_{\textrm{SO}(3)}({{\varvec{Q}}}). \end{aligned}$$

Let $\omega \in [0,\pi ]$. We have ${{\varvec{B}}}\in {{\varvec{A}}}\circ d_{{\varvec{Q}}}^{-1}(\omega )$ if and only if $\omega = d_{{\varvec{Q}}}({{\varvec{A}}}^\top {{\varvec{B}}}) = d_{{{\varvec{A}}}{{\varvec{Q}}}}({{\varvec{B}}})$, cf. (32). Hence,

$$\begin{aligned} {{\,\textrm{SOSW}\,}}_p^p(\mu ({{\varvec{A}}}\cdot ), \nu ({{\varvec{A}}}\cdot ))= & {} \int _{\textrm{SO}(3)} {{\,\textrm{W}\,}}_p^p\big ((d_{{{\varvec{A}}}{{\varvec{Q}}}})_\# \mu , (d_{{{\varvec{A}}}{{\varvec{Q}}}})_\# \nu \big )\\{} & {} \, \textrm{d}u_{\textrm{SO}(3)}({{\varvec{A}}}{{\varvec{Q}}}) \\= & {} {{\,\textrm{SOSW}\,}}_p^p(\mu , \nu ). \end{aligned}$$

$\square $
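The invariance argument can be illustrated for discrete measures. The sketch below estimates (35) from a finite set of directions and checks that jointly rotating the support points and the directions leaves the estimate unchanged, which mirrors the substitution ${{\varvec{Q}}}\mapsto {{\varvec{A}}}{{\varvec{Q}}}$ in the proof. It assumes equally weighted measures with the same number of points and a ZYZ Euler convention; all function names are hypothetical.

```python
import numpy as np

def euler_zyz(alpha, beta, gamma):
    """Rotation Rz(alpha) Ry(beta) Rz(gamma); a ZYZ convention assumed for this sketch."""
    def rz(a):
        c, s = np.cos(a), np.sin(a)
        return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    def ry(b):
        c, s = np.cos(b), np.sin(b)
        return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return rz(alpha) @ ry(beta) @ rz(gamma)

def sample_rotations(n, rng):
    """Draw n rotations with alpha, gamma ~ U[0, 2 pi] and cos(beta) ~ U[-1, 1]."""
    a, g = rng.uniform(0.0, 2.0 * np.pi, (2, n))
    b = np.arccos(rng.uniform(-1.0, 1.0, n))
    return np.stack([euler_zyz(*abg) for abg in zip(a, b, g)])

def angles(Qs, Ps):
    """Matrix of d_Q(P) = arccos((trace(Q^T P) - 1) / 2), shape (len(Qs), len(Ps))."""
    tr = np.einsum("qij,kij->qk", Qs, Ps)   # trace(Q^T P) = sum_ij Q_ij P_ij
    return np.arccos(np.clip((tr - 1.0) / 2.0, -1.0, 1.0))

def sosw_pp(Ps1, Ps2, Qs, p=2):
    """Estimate SOSW_p^p of two uniform empirical measures from given directions Qs."""
    s1 = np.sort(angles(Qs, Ps1), axis=1)
    s2 = np.sort(angles(Qs, Ps2), axis=1)
    return np.mean(np.abs(s1 - s2) ** p)

rng = np.random.default_rng(0)
P1, P2 = sample_rotations(20, rng), sample_rotations(20, rng)
Qs = sample_rotations(100, rng)
A = euler_zyz(0.3, 1.2, -0.8)
# d_{AQ}(AP) = d_Q(P), so both estimates agree up to round-off.
print(sosw_pp(P1, P2, Qs), sosw_pp(A @ P1, A @ P2, A @ Qs))
```

Because the directions are passed in explicitly, the comparison is deterministic; in the integral (35) the same effect comes from the invariance of the uniform measure $u_{\textrm{SO}(3)}$.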

In Appendix C, we provide a relation between the sliced Wasserstein distances on $\textrm{SO}(3)$ and on $\mathbb {S}^3$.

5 Barycenter Algorithms

There exist two approaches to compute barycenters of measures using 1D Wasserstein distances along projected measures, namely sliced Wasserstein barycenters and Radon Wasserstein barycenters, cf. [16]. We adapt them to our slicing in Sect. 5.1 and 5.2, respectively.

5.1 Sliced Wasserstein Barycenters

For sliced Wasserstein barycenters, we replace in (2) the Wasserstein distance ${{\,\textrm{W}\,}}_2$ by its sliced counterpart. In particular, with the general notion of slicing in (6), the sliced Wasserstein barycenter of given measures $\mu _i\in \mathcal {P}(\mathbb {X})$, $i\in \llbracket M\rrbracket $ and ${\varvec{\lambda }}\in \Delta _M$, is defined by

$$\begin{aligned} {{\,\textrm{Bary}\,}}^{{{\,\textrm{SW}\,}}}_{\mathbb {X}}(\mu _i, \lambda _i)_{i=1}^{M} :=\mathop {\mathrm {arg\,min}}\limits _{\nu \in \mathcal {P}(\mathbb {X})} \sum _{i=1}^{M} \lambda _i\, {{\,\textrm{SW}\,}}_2^2(\nu ,\mu _i). \end{aligned}$$

Remark 5.1

Although the different slicing approaches often yield similar barycenters, as we will see in the numerics, they differ considerably in the extreme case of two antipodal point measures on the sphere $\mathbb {S}^2$. Denote by $\delta _{\varvec{\xi }}$ the Dirac measure at $\varvec{\xi }\in \mathbb {S}^2$. The Wasserstein barycenter ${{\,\textrm{Bary}\,}}_{\mathbb {S}^{2}}^{{{\,\textrm{W}\,}}}$ of $\delta _{{{\varvec{e}}}^3}$ and $\delta _{-{{\varvec{e}}}^3}$ with equal weights $\lambda _1=\lambda _2=1/2$ consists of the measures $\nu \in \mathcal {P}(\mathbb {S}^2)$ with support on the equator. However, all measures in $\mathcal {P}(\mathbb {S}^2)$ are parallelly sliced Wasserstein barycenters ${{\,\textrm{Bary}\,}}_{\mathbb {S}^2}^{{{\,\textrm{PSW}\,}}}$. For the semicircular slices of Remark 3.8, we can show that the uniform distribution on the equator is a candidate for the barycenter ${{\,\textrm{Bary}\,}}_{\mathbb {S}^{2}}^{{{\,\textrm{SSW}\,}}}$, while $u_{\mathbb {S}^2}$ is not. The details are provided in Appendix D. $\square $

We consider two types of discretization to compute sliced Wasserstein barycenters. Free-support barycenters are based on a Lagrangian discretization: Measures are represented by samples, and the minimization is carried out over the coordinates of those samples, see [16, 55, 62]. Fixed-support barycenters are based on an Eulerian discretization: A fixed grid is used for all measures, each measure is represented by the weights assigned to the grid points, and the minimization is carried out over those weights, cf. [9, 18].

5.1.1 Free-Support Discretization

For ${X = (X_k)_{k=1}^{N} \in \mathbb {X}^N}$, we write $\mu _X :=\frac{1}{N}\sum _{k = 1}^N\delta _{X_k}$. We consider M discrete measures $\mu _{Y^{(i)}} \in \mathcal {P}(\mathbb {X})$ with $Y^{(i)} \in \mathbb {X}^N$ for all $i\in \llbracket M\rrbracket $. The aim is to compute a discrete barycenter of these measures, i.e., we want to minimize the functional

$$\begin{aligned} \mathcal {E}:\mathbb {X}^ N \rightarrow \mathbb {R},\qquad X \mapsto \displaystyle \sum _{i \in \llbracket M\rrbracket } \lambda _i {{\,\textrm{SW}\,}}_2^2(\mu _X, \mu _{Y^{(i)}}), \end{aligned}$$

(36)

which is not convex in general, via a stochastic gradient descent, as it has been applied in the Euclidean space $\mathbb {X}=\mathbb {R}^d$ in [62, sect. 3.2] and [55, sect. 10.4]. Since X is on a manifold $\mathbb {X}$, the gradient is in its tangent space $T_x\mathbb {X}$ at $x\in \mathbb {X}$. By Whitney’s embedding theorem [47, thm. 6.15], we can assume $\mathbb {X}$ to be embedded in Euclidean space $\mathbb {R}^d$ for sufficiently large d. If $f:D\rightarrow \mathbb {R}$ is differentiable on an open set $D\subset \mathbb {R}^d$ with $D\supset \mathbb {X}$, then the Riemannian gradient of the restriction of f to $\mathbb {X}$ is given by $\nabla f(x) = {{\,\textrm{proj}\,}}_{T_x\mathbb {X}} (\nabla _{\mathbb {R}^d} f(x))$, where $\nabla _{\mathbb {R}^d}$ denotes the gradient in Euclidean space and ${{\,\textrm{proj}\,}}_{T_x\mathbb {X}}$ the orthogonal projection to the tangent space, see [2, sect. 3.6.1]. The gradient of the functional (36) can be expressed as follows.

Theorem 5.2

Let $\mathbb {X}$ be a smooth submanifold of $\mathbb {R}^d$ and $X, Y\in \mathbb {X}^N$ consist of pairwise distinct points $X_k$, $k\in \llbracket N\rrbracket $. Assume that the slicing operator $\mathcal {S}_{\varvec{\psi }}:\mathbb {X}\rightarrow \mathbb {R}$ is differentiable for all ${\varvec{\psi }}\in \mathbb {D}$, that $({{\varvec{x}}}, {\varvec{\psi }}) \mapsto \mathcal {S}_{\varvec{\psi }}({{\varvec{x}}})$ is bounded on $\mathbb {X}\times \mathbb {D}$, and that ${\varvec{\psi }}\mapsto \nabla \mathcal {S}_{\varvec{\psi }}({{\varvec{x}}})$ is integrable on $\mathbb {D}$ uniformly for every ${{\varvec{x}}}\in \mathbb {X}$. Furthermore, assume that $\mathcal {S}_{\varvec{\psi }}({{\varvec{x}}}_1)\ne \mathcal {S}_{\varvec{\psi }}({{\varvec{x}}}_2)$ for $u_\mathbb {D}$-almost every ${\varvec{\psi }}\in \mathbb {D}$ if ${{\varvec{x}}}_1\ne {{\varvec{x}}}_2$. Then the gradient of the sliced Wasserstein distance between the two measures $\mu _X$ and $\mu _Y$ with respect to X reads

$$\begin{aligned} \nabla _{X_k}{{\,\textrm{SW}\,}}_2^2(\mu _X, \mu _Y)= & {} \frac{2}{N}\int _{\mathbb {D}} \left[ \mathcal {S}_{\varvec{\psi }}\left( X_k\right) - \mathcal {S}_{\varvec{\psi }}\left( Y_{\sigma _{Y, {\varvec{\psi }}}\circ \sigma _{X, {\varvec{\psi }}}^{-1}(k)} \right) \right] \nonumber \\{} & {} \nabla \mathcal {S}_{\varvec{\psi }}(X_k) \, \textrm{d}u_{\mathbb {D}}({\varvec{\psi }}), \end{aligned}$$

(37)

where $\sigma _{X,{\varvec{\psi }}}:\llbracket N\rrbracket \rightarrow \llbracket N\rrbracket $ is a permutation which sorts $\mathcal {S}_{\varvec{\psi }}(X_k)$, i.e.,

$$\begin{aligned} \mathcal {S}_{\varvec{\psi }}(X_{\sigma _{X, {\varvec{\psi }}}(1)}) \le \mathcal {S}_{\varvec{\psi }}(X_{\sigma _{X, {\varvec{\psi }}}(2)}) \le ... \le \mathcal {S}_{\varvec{\psi }}(X_{\sigma _{X, {\varvec{\psi }}}(N)}). \end{aligned}$$

Proof

Let ${\varvec{\psi }}\in \mathbb {D}$. For the sake of simplicity, we write $\mathcal {S}_{\varvec{\psi }}(X) :=(\mathcal {S}_{\varvec{\psi }}(X_k))_{1\le k \le N}\in \mathbb {R}^N$. The pseudoinverse of the cumulative distribution function of $\mu _{\mathcal {S}_{\varvec{\psi }}(X)}$ reads, for $r\in (0, 1)$,

$$\begin{aligned} \begin{aligned}&F_{\mu _{\mathcal {S}_{\varvec{\psi }}(X)}}^{-1} (r)\\ {}&\quad = \min \bigg \{x\in \mathbb {R}\cup \{-\infty \}\Bigg | \frac{1}{N}\sum _{k = 1}^N\mathbb {1}_{\left[ \mathcal {S}_{\varvec{\psi }}(X_k), 1\right] }(x) \ge r\bigg \}\\&\quad = \mathcal {S}_{\varvec{\psi }}\left( X_{\sigma _{X, {\varvec{\psi }}}(\lceil rN\rceil )}\right) . \end{aligned} \end{aligned}$$

Then

$$\begin{aligned}&{{\,\textrm{W}\,}}_2^2\big ((\mathcal {S}_{\varvec{\psi }})_\#\mu _X, (\mathcal {S}_{\varvec{\psi }})_\#\mu _Y\big )\\&\quad = {{\,\textrm{W}\,}}_2^2\left( \mu _{\mathcal {S}_{\varvec{\psi }}(X)}, \mu _{\mathcal {S}_{\varvec{\psi }}(Y)}\right) \\&\quad = \int _{[0, 1]}\left| F_{\mu _{\mathcal {S}_{\varvec{\psi }}(X)}}^{-1}(r) - F_{\mu _{\mathcal {S}_{\varvec{\psi }}(Y)}}^{-1}(r)\right| ^2 \, \textrm{d}r\\&\quad = \sum _{k = 1}^N \frac{1}{N}\Big |\mathcal {S}_{\varvec{\psi }}\left( X_{\sigma _{X, {\varvec{\psi }}}(k)}\right) - \mathcal {S}_{\varvec{\psi }}\left( Y_{\sigma _{Y,{\varvec{\psi }}}(k)}\right) \Big |^2 \\&\quad = \frac{1}{N}\sum _{k = 1}^N \left| \mathcal {S}_{\varvec{\psi }}\left( X_k\right) - \mathcal {S}_{\varvec{\psi }}\left( Y_{\sigma _{Y, {\varvec{\psi }}}\circ \sigma _{X, {\varvec{\psi }}}^{-1}(k)}\right) \right| ^2, \end{aligned}$$

is bounded and, thus, integrable with respect to ${\varvec{\psi }}$ on $(\mathbb {D}, u_\mathbb {D})$. Its gradient is given by

$$\begin{aligned}{} & {} \nabla _{X_k}{{\,\textrm{W}\,}}_2^2\big ((\mathcal {S}_{\varvec{\psi }})_\#\mu _X, (\mathcal {S}_{\varvec{\psi }})_\#\mu _Y\big )\nonumber \\{} & {} \quad = \frac{2}{N} \left[ \mathcal {S}_{\varvec{\psi }}\left( X_k\right) - \mathcal {S}_{\varvec{\psi }}\big (Y_{\sigma _{Y, {\varvec{\psi }}}\circ \sigma _{X, {\varvec{\psi }}}^{-1}(k)}\big )\right] \nabla \mathcal {S}_{\varvec{\psi }}(X_k) \end{aligned}$$

(38)

for $u_\mathbb {D}$-almost every ${\varvec{\psi }}\in \mathbb {D}$. Indeed, we assume that $X_k$, $k\in \llbracket N\rrbracket $, are pairwise distinct, so $\mathcal {S}_{\varvec{\psi }}(X_k)$ are $u_\mathbb {D}$-almost surely pairwise distinct, so $\sigma _{X, {\varvec{\psi }}}$ is $u_\mathbb {D}$-almost surely uniquely defined, and constant in a neighborhood of X. Hence, (38) is integrable on $\mathbb {D}$ since $\nabla \mathcal {S}_{\varvec{\psi }}$ is and the rest is bounded by assumption. Therefore, similarly to [16], we have

$$\begin{aligned}&{{\,\textrm{SW}\,}}_p^p(\mu _Y, \mu _X)\\&\quad = \int _\mathbb {D}{{\,\textrm{W}\,}}_p^p\big ((\mathcal {S}_{\varvec{\psi }})_\#\mu _X, (\mathcal {S}_{\varvec{\psi }})_\#\mu _Y\big )\, \textrm{d}u({\varvec{\psi }})\\&\quad = \frac{1}{N}\int _\mathbb {D}\sum _{k = 1}^N \left| \mathcal {S}_{\varvec{\psi }}\left( X_k\right) - \mathcal {S}_{\varvec{\psi }}\left( Y_{\sigma _{Y, {\varvec{\psi }}}\circ \sigma _{X, {\varvec{\psi }}}^{-1}(k)}\right) \right| ^p\, \textrm{d}u({\varvec{\psi }}) \end{aligned}$$

and

$$\begin{aligned}&\nabla _{X_k}{{\,\textrm{SW}\,}}_2^2(\mu _Y, \mu _X)\\&\quad = \int _\mathbb {D}\nabla _{X_k}{{\,\textrm{W}\,}}_2^2\left( \mu _{\mathcal {S}_{\varvec{\psi }}(Y)}, \mu _{\mathcal {S}_{\varvec{\psi }}(X)}\right) \, \textrm{d}u({\varvec{\psi }}) \\&\quad = \frac{2}{N}\int _\mathbb {D}\left[ \mathcal {S}_{\varvec{\psi }}\left( X_k\right) - \mathcal {S}_{\varvec{\psi }}\left( Y_{\sigma _{Y, {\varvec{\psi }}}\circ \sigma _{X, {\varvec{\psi }}}^{-1}(k)}\right) \right] \nabla \mathcal {S}_{\varvec{\psi }}(X_k) \, \textrm{d}u({\varvec{\psi }}). \end{aligned}$$

$\square $
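For a single direction, the matching $\sigma _{Y,{\varvec{\psi }}}\circ \sigma _{X,{\varvec{\psi }}}^{-1}$ and the resulting gradient, including the factor $2/N$ of (37), can be verified by finite differences. The sketch below is a simplification: it assumes the linear slicing operator $\mathcal {S}_{\varvec{\psi }}({{\varvec{x}}}) = \langle {\varvec{\psi }},{{\varvec{x}}}\rangle $ and computes the Euclidean gradient in $\mathbb {R}^d$, i.e., before projection to the tangent space. The name `sliced_w2_and_grad` is hypothetical.

```python
import numpy as np

def sliced_w2_and_grad(X, Y, psi):
    """W_2^2 of the slices <psi, X_k>, <psi, Y_k> and its Euclidean gradient in X."""
    sx, sy = X @ psi, Y @ psi
    ix, iy = np.argsort(sx), np.argsort(sy)   # sorting permutations sigma_X, sigma_Y
    match = np.empty_like(ix)
    match[ix] = iy                            # match[k] = sigma_Y(sigma_X^{-1}(k))
    diff = sx - sy[match]
    value = np.mean(diff ** 2)
    grad = (2.0 / len(X)) * diff[:, None] * psi[None, :]   # row k: gradient w.r.t. X_k
    return value, grad

rng = np.random.default_rng(0)
X, Y = rng.standard_normal((6, 3)), rng.standard_normal((6, 3))
psi = np.array([0.6, 0.0, 0.8])
value, grad = sliced_w2_and_grad(X, Y, psi)

# Central finite difference in one coordinate of X_2.
eps = 1e-6
Xp, Xm = X.copy(), X.copy()
Xp[2, 0] += eps
Xm[2, 0] -= eps
fd = (sliced_w2_and_grad(Xp, Y, psi)[0] - sliced_w2_and_grad(Xm, Y, psi)[0]) / (2 * eps)
print(fd, grad[2, 0])
```

The comparison is meaningful only while the perturbation does not change the sorting permutation, which is the situation described in the proof.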

We discretize (37) over ${\varvec{\psi }}$ by drawing samples $({\varvec{\psi }}_q)_{q=1}^{P}$ from the uniform measure $u_\mathbb {D}$ to obtain a numerical approximation. This yields a stochastic gradient descent algorithm with initialization $X^0\in \mathbb {X}^N$, whose step $l\in \mathbb {N}$ is given by

$$\begin{aligned} \begin{aligned} X_k^{l+1}&= \exp ^{\mathbb {X}}_{X_k^l}(- \tau _l\nabla \mathcal {E}(X^l)_k)\\&= \exp ^{\mathbb {X}}_{X_k^l} \left( - \tau _l \sum _{i = 1}^M\frac{2\lambda _i}{N P} \sum _{q = 1} ^P\right. \\&\quad \left. \left[ \mathcal {S}_{{\varvec{\psi }}_q}\big (X^l_k\big ) - \mathcal {S}_{{\varvec{\psi }}_q}\left( Y^{(i)}_{\sigma _{Y^{(i)}, {{\varvec{\psi }}_q}} \circ \sigma _{X^l, {{\varvec{\psi }}_q}}^{-1}(k)}\right) \right] \nabla \mathcal {S}_{{\varvec{\psi }}_q}(X_k^l) \right) \end{aligned} \end{aligned}$$

(39)

for every $k\in \llbracket N\rrbracket $, where $\exp ^{\mathbb {X}}_x$ denotes the exponential map, which maps a subset of the tangent space $T_x\mathbb {X}$ to the manifold $\mathbb {X}$, and $\tau _l>0$ is the step size, also known as the learning rate.

Remark

The last theorem can be extended to the case where the two measures have different numbers of points. The squared Wasserstein distance between $\mu _x$ and $\mu _y$, where $x = (x_i)_{1\le i \le N} \in \mathbb {R}^N$ and $y = (y_j)_{1\le j\le M} \in \mathbb {R}^M$ are two sorted lists of real numbers, is then no longer $\frac{1}{N}\Vert x - y\Vert _{\mathbb {R}^N}^2$ but of the form $\sum _{i,j}\pi _{i,j}(x_i - y_j)^2$ with a transport plan $\pi \in \mathbb {R}^{N\times M}$ that depends only on N and M (and not on x or y). Because the matrix $\pi $ is sparse with support close to the diagonal $\frac{i}{j}\approx \frac{N}{M}$, we can generalize our algorithm while keeping its complexity ${\mathcal {O}}((N+M)\log (N+M))$. $\square $
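The sparse transport plan of this remark can be traversed in a single pass over the two sorted lists. The following sketch is one possible realization, assuming equally weighted points; masses are counted in integer units of $1/(NM)$ to avoid floating-point drift, and the name `w2_sq_unequal` is hypothetical.

```python
import numpy as np

def w2_sq_unequal(x, y):
    """Squared 2-Wasserstein distance between uniform measures on x (N points) and y (M points)."""
    x, y = np.sort(x), np.sort(y)
    n, m = len(x), len(y)
    i = j = 0
    rx, ry = m, n            # remaining mass at x[i] and y[j] in units of 1/(n*m)
    total = 0.0
    while i < n and j < m:
        mass = min(rx, ry)   # entry pi_{i,j} of the monotone plan, times n*m
        total += mass * (x[i] - y[j]) ** 2
        rx -= mass
        ry -= mass
        if rx == 0:          # x[i] is exhausted, move to the next point
            i += 1
            rx = m
        if ry == 0:          # y[j] is exhausted, move to the next point
            j += 1
            ry = n
    return total / (n * m)

print(w2_sq_unequal(np.array([0.0]), np.array([0.0, 1.0])))
```

Each pass through the loop advances at least one of the two indices, so after sorting the traversal visits at most $N+M-1$ nonzero entries of $\pi $.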

5.1.1.1. Application to ${{\,\textrm{PSW}\,}}$ on the Sphere We look at the stochastic gradient descent step for the sphere with the parallel slicing operator (20). Let ${{\varvec{x}}},{\varvec{\psi }}\in \mathbb {S}^{d-1}$. The projection to the tangent space $T_{{\varvec{x}}}{\mathbb {S}^{d-1}} = \{{{\varvec{v}}}\in \mathbb {R}^d\mid \left\langle {{\varvec{v}}}, {{\varvec{x}}}\right\rangle = 0\}$ is

$$\begin{aligned} {{\,\textrm{proj}\,}}_{T_{{\varvec{x}}}{\mathbb {S}^{d-1}}}:\mathbb {R}^d\rightarrow T_{{\varvec{x}}}{\mathbb {S}^{d-1}},\ {{\varvec{v}}}\mapsto {{\varvec{v}}}- \left\langle {{\varvec{x}}}, {{\varvec{v}}}\right\rangle {{\varvec{x}}}, \end{aligned}$$

and we have $\nabla \mathcal {S}_{\varvec{\psi }}({{\varvec{x}}}) = {{\,\textrm{proj}\,}}_{T_{{\varvec{x}}}{\mathbb {S}^{d-1}}}({\varvec{\psi }})$. As a consequence, the stochastic gradient descent step (39) is

$$\begin{aligned}{} & {} X_k^{l+1} = \exp ^{\mathbb {S}^{d-1}}_{X_k^l}\left[ - \tau _l {{\,\textrm{proj}\,}}_{T_{X_k^l}{\mathbb {S}^{d-1}}}\right. \nonumber \\{} & {} \quad \left. \left( \sum _{i = 1}^M\frac{2\lambda _i}{N P} \sum _{q = 1} ^P \left\langle {\varvec{\psi }}_q,\ X^l_k - Y^{(i)}_{\sigma _{Y^{(i)}, {{\varvec{\psi }}_q}}\circ \sigma _{X^l, {{\varvec{\psi }}_q}}^{-1}(k)}\right\rangle {\varvec{\psi }}_q\right) \right] \nonumber \\ \end{aligned}$$

(40)

with the exponential map $\exp ^{\mathbb {S}^{d-1}}_{{\varvec{x}}}({{\varvec{v}}}) :=\cos (\Vert {{\varvec{v}}}\Vert ){{\varvec{x}}}+ \sin (\Vert {{\varvec{v}}}\Vert )\frac{{{\varvec{v}}}}{\Vert {{\varvec{v}}}\Vert }$.

The numerical complexity of this step is ${\mathcal {O}}(MN(\log (N)+d)P)$ with M the number of measures, N the number of points in each measure and P the number of directions (i.e., of projections or slices), since the sorting has complexity ${\mathcal {O}}(N\log (N))$ and the dimension d comes in only due to the computation of the inner product in $\mathbb {R}^d$ and the generation of uniform samples on $\mathbb {S}^{d-1}$.
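The update (40) can be sketched in NumPy as follows. This is a hedged illustration, not the authors' code: it assumes that the slicing operator is $\mathcal {S}_{\varvec{\psi }}({{\varvec{x}}}) = \left\langle {\varvec{\psi }}, {{\varvec{x}}}\right\rangle $ (consistent with the gradient stated above) and that all measures have N points; the names `psw_sgd_step`, `tangent_proj` and `sphere_exp` are hypothetical.

```python
import numpy as np

def tangent_proj(x, v):
    """Project each row of v onto the tangent space of the sphere at the row of x."""
    return v - np.sum(x * v, axis=-1, keepdims=True) * x

def sphere_exp(x, v):
    """Exponential map on the sphere, applied row-wise; v must be tangent at x."""
    nv = np.linalg.norm(v, axis=-1, keepdims=True)
    safe = np.where(nv > 0, nv, 1.0)          # avoid 0/0 for zero tangent vectors
    return np.cos(nv) * x + np.sin(nv) * v / safe

def psw_sgd_step(X, Ys, lambdas, tau, P, rng):
    """One stochastic gradient step in the spirit of (40).

    X: (N, d) current points, Ys: list of (N, d) input point sets,
    lambdas: barycentric weights, tau: step size, P: number of random slices.
    """
    N, d = X.shape
    psi = rng.normal(size=(P, d))
    psi /= np.linalg.norm(psi, axis=1, keepdims=True)   # uniform directions on the sphere
    grad = np.zeros_like(X)
    for q in range(P):
        sx = X @ psi[q]                       # projections of the current points
        rank_x = np.argsort(np.argsort(sx))   # rank of each X_k among the projections
        for lam, Y in zip(lambdas, Ys):
            sy_sorted = np.sort(Y @ psi[q])
            matched = sy_sorted[rank_x]       # projection of the Y point matched to X_k
            grad += (2.0 * lam / (N * P)) * (sx - matched)[:, None] * psi[q]
    return sphere_exp(X, -tau * tangent_proj(X, grad))
```

The sorting per slice dominates the cost, while the dimension d enters only through the inner products and the sampling of the directions, in line with the complexity stated above.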

5.1.1.2. Application to ${{\,\textrm{SOSW}\,}}$ on the Rotation Group Let us now look at the case $\mathbb {X}= \textrm{SO}(3)$ with the slicing operator $\mathcal {S}^{\textrm{SO}(3)}_{\varvec{\psi }}({{\varvec{R}}}) ={{\,\textrm{trace}\,}}({{\varvec{R}}}^\top {\varvec{\psi }})$ for ${\varvec{\psi }}\in \mathbb {D}= \textrm{SO}(3)$ and ${{\varvec{R}}}\in \textrm{SO}(3)$. To avoid numerical instabilities due to the unboundedness of the derivative of the arccosine, we take here a monotone transformation of the slicing operator (32), which simplifies the computation while keeping the properties of ${{\,\textrm{SOSW}\,}}$. Drawing random directions ${\varvec{\psi }}={{\,\mathrm{\Psi }\,}}(\alpha ,\beta ,\gamma )\in \textrm{SO}(3)$ can be done by randomly generating Euler angles $\alpha ,\gamma \sim u_{[0,2\pi ]}$ and $\beta =\arccos t$ with $t\sim u_{[-1,1]}$. The gradient descent step (39) is similar to the case of the sphere. The tangent space is ${T_{{\varvec{R}}}\textrm{SO}(3)= \{{{\varvec{A}}}\in \mathbb {R}^{3\times 3}\mid {{\varvec{R}}}^\top {{\varvec{A}}}= - {{\varvec{A}}}^\top {{\varvec{R}}}\}}$. Utilizing that the orthogonal projection on the set of skew-symmetric matrices is given by ${{\varvec{A}}}\mapsto \frac{1}{2}({{\varvec{A}}}- {{\varvec{A}}}^\top )$, one can show similarly that the projection to ${T_{{\varvec{R}}}\textrm{SO}(3)}$ is given by

$$\begin{aligned} {{\,\textrm{proj}\,}}_{T_{{\varvec{R}}}\textrm{SO}(3)}:\mathbb {R}^{3\times 3}\rightarrow T_{{\varvec{R}}}\textrm{SO}(3),\quad {{\varvec{A}}}\mapsto \tfrac{1}{2}({{\varvec{A}}}- {{\varvec{R}}}{{\varvec{A}}}^\top {{\varvec{R}}}). \end{aligned}$$

The exponential map is given by [77, 3.37]

$$\begin{aligned} \exp ^{\textrm{SO}(3)}_{{\varvec{R}}}:T_{{\varvec{R}}}\textrm{SO}(3)\rightarrow \textrm{SO}(3),\quad {{\varvec{A}}}\mapsto {{\varvec{R}}}\,\exp ({{\varvec{R}}}^\top {{\varvec{A}}}). \end{aligned}$$

In order to avoid the computation of the matrix exponential, one could replace the exponential map by a retraction, see [2, Sect. 4.1].
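The ingredients of this step can be sketched in NumPy. The following is an illustration under stated assumptions rather than the authors' implementation: the Euler angle parametrization ${{\,\mathrm{\Psi }\,}}$ is assumed to follow the ZYZ convention, the matrix exponential of a skew-symmetric $3\times 3$ matrix is evaluated with Rodrigues' formula, and all function names are hypothetical.

```python
import numpy as np

def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def rot_y(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

def random_rotation(rng):
    """Random direction psi from Euler angles (ZYZ convention assumed here)."""
    alpha, gamma = rng.uniform(0.0, 2.0 * np.pi, size=2)
    beta = np.arccos(rng.uniform(-1.0, 1.0))
    return rot_z(alpha) @ rot_y(beta) @ rot_z(gamma)

def proj_tangent_so3(R, A):
    """Projection of a 3x3 matrix A onto the tangent space of SO(3) at R."""
    return 0.5 * (A - R @ A.T @ R)

def exp_so3(R, A):
    """Exponential map R exp(R^T A) for a tangent matrix A, via Rodrigues' formula."""
    S = R.T @ A                                   # skew-symmetric matrix
    w = np.array([S[2, 1], S[0, 2], S[1, 0]])     # associated rotation vector
    theta = np.linalg.norm(w)
    if theta < 1e-12:
        return R @ (np.eye(3) + S)                # first-order approximation near zero
    E = (np.eye(3)
         + np.sin(theta) / theta * S
         + (1.0 - np.cos(theta)) / theta**2 * (S @ S))
    return R @ E
```

Since Rodrigues' formula is a closed form for the exponential of a skew-symmetric $3\times 3$ matrix, no general-purpose matrix exponential routine is needed in this sketch.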

5.1.2 Fixed-Support Discretization

Fixed-support sliced Wasserstein barycenters correspond to a Eulerian discretization. As opposed to Sect. 5.1.1, the support is the same, fixed set for all measures (including the barycenter), and we minimize over the weights of the Dirac measures. We first study the one-dimensional case of fixed-support OT.

Theorem 5.3

Let $\{t_j\mid j\in \llbracket N\rrbracket \}\subset \mathbb {R}$ with $t_1< t_2< \dots < t_N$. Further, let ${{\varvec{w}}}, {{\varvec{v}}}\in \Delta _N $ and the discrete probability measures $\mu _{{\varvec{w}}}= \sum _{j = 1}^N w_j\delta _{t_j}$ and $\mu _{{\varvec{v}}}= \sum _{j = 1}^N v_j\delta _{t_j}$. We introduce the partial sums ${\tilde{{{\varvec{w}}}}} = (\sum _{j = 1}^k w_j)_{k=1}^{N}$ and ${\tilde{{{\varvec{v}}}}} = (\sum _{j = 1}^k v_j)_{k=1}^{N}$ in $\mathbb {R}^N$ as well as the vectors ${{\varvec{u}}}=({\tilde{w}}_1,\dots , {\tilde{w}}_{N-1}, {\tilde{v}}_1,\dots , {\tilde{v}}_{N-1}) \in \mathbb {R}^{2N-2}$ and ${{\varvec{z}}}= (u_{\sigma (j)})_{j=1}^{2N-2}$ with $\sigma $ a permutation such that $u_{\sigma (1)} \le u_{\sigma (2)} \le \dots \le u_{\sigma (2N-2)}$. We set

$$\begin{aligned} a_j = \big | { t_{\min \{k\mid {\tilde{w}}_k \ge z_{j+1}\}} - t_{\min \{k\mid {\tilde{v}}_k \ge z_{j+1}\}}}\big |^p\quad \text {for } j \in \llbracket 2N-3\rrbracket , \end{aligned}$$

and $a_0 = a_{2N-2} = 0$. Then the gradient of ${{\,\textrm{W}\,}}_p^p(\mu _{{\varvec{w}}}, \mu _{{\varvec{v}}})$ in $\Delta _N$ with respect to ${{\varvec{w}}}$ is given almost everywhere by

$$\begin{aligned} \nabla _{{{\varvec{w}}}} {{\,\textrm{W}\,}}_p^p(\mu _{{\varvec{w}}}, \mu _{{\varvec{v}}}) = {{\,\textrm{proj}\,}}_H \left( \left( \sum _{k = j}^{N-1} \big (a_{\sigma ^{-1}(k) - 1} - a_{\sigma ^{-1}(k)}\big ) \right) _{j=1}^N \right) , \nonumber \\ \end{aligned}$$

(41)

where ${{\,\textrm{proj}\,}}_H:\mathbb {R}^N\rightarrow H$, ${{\varvec{x}}}\mapsto {{\varvec{x}}}- \tfrac{1}{N}\left\langle {{\varvec{x}}},\mathbb {1}\right\rangle \mathbb {1}$ with the all-ones vector $\mathbb {1}\in \mathbb {R}^N$ is the orthogonal projection on the hyperplane $H = \{ {{\varvec{x}}}\in \mathbb {R}^N\mid \left\langle {{\varvec{x}}}, \mathbb {1}\right\rangle = 0\}$.

Proof

The pseudoinverse of the cumulative distribution function of $\mu _{{\varvec{w}}}$ is

$$\begin{aligned} F_{\mu _{{\varvec{w}}}}^{-1}(r)= & {} \min \left\{ s\in \mathbb {R}\left| \ \sum _{i = 1}^N w_i\mathbb {1}_{[t_i, +\infty )}(s) \ge r\right. \right\} = t_{\min \{k\mid {\tilde{w}}_k \ge r\}}, \\{} & {} \forall r \in (0, 1), \end{aligned}$$

and analogously for ${{\varvec{v}}}$. We thus have

$$\begin{aligned} {{\,\textrm{W}\,}}_p^p(\mu _{{\varvec{w}}}, \mu _{{\varvec{v}}})&= \int _0^1 \left| F_{\mu _{{\varvec{w}}}}^{-1}(r) - F_{\mu _{{\varvec{v}}}}^{-1}(r)\right| ^p \, \textrm{d}r\\&= \int _0^1 \left| t_{\min \{k\mid {\tilde{w}}_k \ge r\}} - t_{\min \{k\mid {\tilde{v}}_k \ge r\}}\right| ^p \, \textrm{d}r\\&= \sum _{j = 1}^{2N - 3} \left| t_{\min \{k\mid {\tilde{w}}_k \ge z_{j+1}\}} - t_{\min \{k\mid {\tilde{v}}_k \ge z_{j+1}\}}\right| ^p (z_{j+1} - z_{j})\\&= \sum _{j = 1}^{2N - 3} a_j (z_{j+1} - z_{j}), \end{aligned}$$

where we have ignored the segment $(0, z_1]$ on which the integrand is 0, and the segment $[z_{2N-2}, 1)$ on which the integrand is 0 or which is empty if $z_{2N-2} = 1$.

We extend the map ${{\varvec{w}}}\mapsto {{\,\textrm{W}\,}}_p^p(\mu _{{\varvec{w}}}, \mu _{{\varvec{v}}})$ from $\Delta _N$ to some neighborhood in the Euclidean space $\mathbb {R}^N$ by making it constant in the last component $w_N$, so that $\frac{\mathrm d}{\mathrm d {w_N}} {{\,\textrm{W}\,}}_p^p(\mu _{{\varvec{w}}}, \mu _{{\varvec{v}}}) = 0.$ For $k \in \llbracket N-1\rrbracket $, since ${\tilde{w}}_k = z_{\sigma ^{-1}(k)}$, we have

$$\begin{aligned} \frac{\mathrm d}{\mathrm d {\tilde{w}}_k} {{\,\textrm{W}\,}}_p^p(\mu _{{\varvec{w}}}, \mu _{{\varvec{v}}}) = a_{\sigma ^{-1}(k) - 1} - a_{\sigma ^{-1}(k)}, \quad \text {a.e.} \end{aligned}$$

Indeed, $a_j$ is almost everywhere constant with respect to ${\tilde{w}}_k$. This follows from the fact that the $z_j$ are almost surely pairwise distinct. So, if ${\tilde{w}}_k = z_{j+1}$, then $z_{j+1}$ “moves” (a.s.) with ${\tilde{w}}_k$, such that $\{k'\mid {\tilde{w}}_{k'} \ge z_{j+1}\}$ does not change. If ${\tilde{w}}_k \ne z_{j+1}$, it is obvious that $\{k'\mid {\tilde{w}}_{k'}\ge z_{j+1}\}$ is locally constant with respect to ${\tilde{w}}_k$. The definition of ${\tilde{{{\varvec{w}}}}}$ gives a bijection between $(w_j)_{j=1}^{N-1}$ and $({\tilde{w}}_k)_{k=1}^{N-1}$, so we obtain for any $j\in \llbracket N-1\rrbracket $

$$\begin{aligned} \frac{\mathrm d}{\mathrm d {w_j}} {{\,\textrm{W}\,}}_p^p(\mu _{{\varvec{w}}}, \mu _{{\varvec{v}}})= & {} \sum _{k = 1}^{N} \frac{\mathrm d {\tilde{w}}_k}{\mathrm d w_j} \frac{\mathrm d}{\mathrm d {{\tilde{w}}_k}} {{\,\textrm{W}\,}}_p^p(\mu _{{\varvec{w}}}, \mu _{{\varvec{v}}}) \\= & {} \sum _{k = j}^{N-1} \frac{\mathrm d}{\mathrm d {\tilde{w}}_k} {{\,\textrm{W}\,}}_p^p(\mu _{{\varvec{w}}}, \mu _{{\varvec{v}}}). \end{aligned}$$

The projection onto the hyperplane H yields the Riemannian gradient in $\Delta _N$. $\square $
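The formula (41) can be turned into a short NumPy routine. The sketch below is an illustration, not the authors' code: the function name `w_pp_and_grad` is hypothetical, the concatenation of the first $N-1$ partial sums of both weight vectors is stored in the array `u`, and the final projection subtracts the mean of the vector, which is the orthogonal projection onto the sum-zero hyperplane H.

```python
import numpy as np

def w_pp_and_grad(t, w, v, p=2):
    """Value and (projected) gradient w.r.t. w of W_p^p between two measures
    with weights w and v on the common sorted support t, following Theorem 5.3.
    Requires len(t) >= 2.
    """
    t = np.asarray(t, dtype=float)
    N = len(t)
    cw, cv = np.cumsum(w), np.cumsum(v)            # partial sums
    u = np.concatenate([cw[:-1], cv[:-1]])         # length 2N-2
    sigma = np.argsort(u, kind="stable")
    z = u[sigma]
    # a[j] for j = 0, ..., 2N-2 with a[0] = a[2N-2] = 0 (indices as in the theorem)
    a = np.zeros(2 * N - 1)
    r = z[1:]                                      # z_{j+1} for j = 1, ..., 2N-3
    iw = np.minimum(np.searchsorted(cw, r, side="left"), N - 1)
    iv = np.minimum(np.searchsorted(cv, r, side="left"), N - 1)
    a[1:2 * N - 2] = np.abs(t[iw] - t[iv]) ** p
    value = np.sum(a[1:2 * N - 2] * np.diff(z))
    # 1-based position of u_k in the sorted vector z, i.e. sigma^{-1}(k)
    pos = np.empty(2 * N - 2, dtype=int)
    pos[sigma] = np.arange(1, 2 * N - 1)
    m = pos[:N - 1]                                # positions of the partial sums of w
    g_tilde = a[m - 1] - a[m]                      # derivative w.r.t. the partial sums
    grad = np.zeros(N)
    grad[:N - 1] = np.cumsum(g_tilde[::-1])[::-1]  # sum over k = j, ..., N-1
    grad -= grad.mean()                            # orthogonal projection onto H
    return value, grad
```

For instance, with support $t = (0, 1)$, ${{\varvec{w}}}= (0.3, 0.7)$, ${{\varvec{v}}}= (0.6, 0.4)$ and $p=2$, the mass 0.3 has to be moved over distance 1, so the value is 0.3, and increasing $w_1$ at the expense of $w_2$ decreases the cost at rate 1.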

We now come to the sliced OT on $\mathbb {X}\in \{\mathbb {S}^{d-1}, \textrm{SO}(3)\}$. We consider a family $X = ({{\varvec{x}}}_j)_{j=1}^{N} \in \mathbb {X}^N$ of points on $\mathbb {X}$ representing the fixed support and measures of the form

$$\begin{aligned} \mu _{{\varvec{w}}}= \sum _{j = 1}^ N w_j \delta _{x_j} \end{aligned}$$

with some weight vector ${{\varvec{w}}}\in \Delta _N$. Let ${{\varvec{v}}}^{(i)} \in \Delta _N$ for $i \in \llbracket M\rrbracket $ be given probability vectors of the measures whose barycenter we want to compute. Therefore, we minimize

$$\begin{aligned} \mathcal {E}:\Delta _N \rightarrow \mathbb {R}, \qquad {{\varvec{w}}}\mapsto \displaystyle \sum _{i =1}^ M \lambda _i {{\,\textrm{SW}\,}}_p^p(\mu _{{\varvec{w}}}, \mu _{{{\varvec{v}}}^{(i)}}). \end{aligned}$$

We have, for $i \in \llbracket M\rrbracket $,

$$\begin{aligned} {{\,\textrm{SW}\,}}_p^p(\mu _w, \mu _{v^{(i)}})&= \int _\mathbb {D}{{\,\textrm{W}\,}}_p^p\big ((\mathcal {S}_{\varvec{\psi }})_\#\mu _{{\varvec{w}}}, (\mathcal {S}_{\varvec{\psi }})_\#\mu _{{{\varvec{v}}}^{(i)}}\big )\, \textrm{d}u_\mathbb {D}({\varvec{\psi }}) \\&= \int _\mathbb {D}{{\,\textrm{W}\,}}_p^p\left( \sum _{j = 1}^N w_j\delta _{\mathcal {S}_{\varvec{\psi }}({{\varvec{x}}}_j)}, \sum _{j = 1}^N v^{(i)}_j\delta _{\mathcal {S}_{\varvec{\psi }}({{\varvec{x}}}_j)}\right) \\&\quad \, \textrm{d}u_\mathbb {D}({\varvec{\psi }}), \end{aligned}$$

and therefore,

$$\begin{aligned}{} & {} \nabla _{{\varvec{w}}}{{\,\textrm{SW}\,}}_p^p(\mu _{{\varvec{w}}}, \mu _{{{\varvec{v}}}^{(i)}}) \\{} & {} \quad = \int _\mathbb {D}\nabla _{{\varvec{w}}}{{\,\textrm{W}\,}}_p^p\left( \sum _{j = 1}^ N w_j\delta _{\mathcal {S}_{\varvec{\psi }}({{\varvec{x}}}_j)}, \sum _{j = 1}^ N v_j^{(i)}\delta _{\mathcal {S}_{\varvec{\psi }}({{\varvec{x}}}_j)}\right) \, \textrm{d}u_\mathbb {D}({\varvec{\psi }}). \end{aligned}$$

As the integrand of the last equation is handled in Theorem 5.3, we can thus compute $\nabla \mathcal {E}$ and devise a stochastic gradient descent, as synthesized in Algorithm 1, where we use P projections per descent step as above.

The complexity of the computation of the gradient (41) is ${\mathcal {O}}(N\log N)$, as we need to sort the points $(t_j)_{j=1}^{N}$ and the vector ${{\varvec{u}}}$, and all other operations are done in linear time. The projection ${{\,\textrm{proj}\,}}_{\Delta _N}:\mathbb {R}^N\rightarrow \Delta _N$ on the probability simplex can be computed in complexity ${\mathcal {O}}(N\log (N))$ using the algorithm from [76], see also [19] for further numerical approaches. Therefore, one iteration of Algorithm 1 has the arithmetic complexity ${\mathcal {O}}(MN (\log (N)+d)P)$. However, we want to point out that the number N of points of a fixed grid generally grows with the dimension d.
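For illustration, a projection onto the probability simplex in ${\mathcal {O}}(N\log (N))$ can be sketched as follows. This is the widely used sort-based scheme and not necessarily the algorithm of [76]; the function name `proj_simplex` is hypothetical.

```python
import numpy as np

def proj_simplex(y):
    """Euclidean projection of y onto {w : w >= 0, sum(w) = 1} via sorting."""
    y = np.asarray(y, dtype=float)
    u = np.sort(y)[::-1]                       # entries in decreasing order
    css = np.cumsum(u) - 1.0
    k = np.arange(1, len(y) + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]   # last index with a positive entry
    theta = css[rho] / (rho + 1)               # shift that makes the result sum to 1
    return np.maximum(y - theta, 0.0)
```

In a descent iteration, the weight vector would be updated by a gradient step followed by this projection, so that the iterates remain in $\Delta _N$.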

5.2 Radon Wasserstein Barycenters

Radon Wasserstein barycenters are obtained by first computing the 1D barycenter (5) for every slice, stacking them together and then applying the pseudoinverse of the respective Radon transform, cf. [16, 43]. Denoting by $\mathcal {Z}:\mathcal {P}(\mathbb {X}) \rightarrow \mathcal {P}(\mathbb {D}\times \mathbb {I})$ the generalized slicing transformation, we set for $\mu _m\in \mathcal {P}_{\textrm{ac}}(\mathbb {X})$ the Radon barycenter

$$\begin{aligned}{} & {} {{\,\textrm{Bary}\,}}_{\mathbb {X}}^{\mathcal {Z}}(\mu _m, \lambda _m)_{m=1}^{M} \nonumber \\{} & {} \quad :=\mathcal {Z}^\dagger \left( \left( {{\,\textrm{Bary}\,}}_\mathbb {I}(\lambda _m,(\mathcal {S}_{\varvec{\psi }})_\#\mu _m)_{m=1}^{M} \right) _{{\varvec{\psi }}\in \mathbb {D}} \right) , \end{aligned}$$

(42)

where $\mathcal {Z}^\dagger $ is the pseudoinverse whose argument is viewed as a density function on $\mathbb {D}\times \mathbb {I}$. In general, it is not clear if the pseudoinverse yields again a nonnegative function, which then gives a probability density. On $\mathbb {S}^{d-1}$, it is fulfilled for the parallel slicing $\mathcal {V}$ by Theorem 3.2, but not for the semicircular slicing, see [58, sect. 6.2].

The discretization is based on a fixed support. We describe the parallel slicing case $\mathcal {Z}=\mathcal {U}$ analogously to the semicircular case $\mathcal {Z}=\mathcal {W}$ from [58]. Let ${\varvec{\psi }}_p \in \mathbb {S}^{d-1}$, $p\in \llbracket P\rrbracket $, be the nodes of a quadrature rule with weights ${{\varvec{w}}}\in \Delta _P$, and some grid $t_\ell $, $\ell \in \llbracket L\rrbracket $, on the interval $\mathbb {I}$. We denote the density function of $\mu _m$ by $f^{\mu _m}$ and assume that we are given $f^{\mu _m}({\varvec{\psi }}_p)$ for $p\in \llbracket P\rrbracket $. Firstly, we approximate $\mathcal {U}_{{\varvec{\psi }}_p} f^{\mu _m}(t_\ell )$ via the singular value decomposition (24) for fixed truncation degree $D\in \mathbb {N}$ by

$$\begin{aligned} \mathcal {U}_{{\varvec{\psi }}_p} f^{\mu _m}(t_\ell )\approx & {} \sum _{n=0}^D \sum _{j=1}^{N_{n, d}} \lambda _{n, d}^{\mathcal {U}} Y_{n, d}^j({\varvec{\psi }}_p) {\widetilde{P}}_{n, d}(t_\ell ) \\{} & {} \sum _{i=1}^{P} f^{\mu _m}({\varvec{\psi }}_i) \overline{Y_{n, d}^j({\varvec{\psi }}_i)} w_i. \end{aligned}$$

Secondly, we compute the density of the one-dimensional barycenters (5) of the measures $\mathcal {U}_{{\varvec{\psi }}_p}\mu _m$. In particular, we set $g({\varvec{\psi }}_p,\cdot )$ as the density function of

$$\begin{aligned} {{\,\textrm{CDT}\,}}^{-1}_{\omega }\left( \sum _{m=1}^{M} \lambda _m {{\,\textrm{CDT}\,}}_\omega [\mathcal {U}_{{\varvec{\psi }}_p}\mu _m] \right) . \end{aligned}$$
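This per-slice step can be sketched numerically. The following illustration works directly with quantile functions and bypasses the reference measure $\omega $, under the assumption that the composition above amounts to averaging the quantile functions of the input measures with the weights $\lambda _m$; it is not the algorithm of [44], and the function name `barycenter_density_1d` is hypothetical.

```python
import numpy as np

def barycenter_density_1d(t, densities, lambdas, n_quantiles=2000):
    """Approximate density of the 1D Wasserstein barycenter on the grid t.

    densities: list of nonnegative arrays sampled on the increasing grid t,
    lambdas: barycentric weights summing to one.
    Returns bin centers and a histogram-based density estimate.
    """
    t = np.asarray(t, dtype=float)
    r = (np.arange(n_quantiles) + 0.5) / n_quantiles    # quantile levels in (0, 1)
    q_bar = np.zeros(n_quantiles)
    for lam, f in zip(lambdas, densities):
        cdf = np.cumsum(f)
        cdf = cdf / cdf[-1]                             # normalized discrete CDF
        q_bar += lam * np.interp(r, cdf, t)             # quantile function of the slice
    # the averaged quantiles are (approximately) equally weighted samples of the barycenter
    hist, edges = np.histogram(q_bar, bins=len(t), range=(t[0], t[-1]), density=True)
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, hist
```

The histogram at the end is only one simple way to return to a density on the grid; any other density estimate from the averaged quantiles would serve the same purpose in this sketch.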

Using again the singular value decomposition Theorem 4.2, we note that the Moore–Penrose pseudoinverse [25] of $\mathcal {U}$ is given by

$$\begin{aligned}{} & {} \mathcal {U}^\dagger :{\text {Range}}(\mathcal {U})\oplus {\text {Range}}(\mathcal {U})^\perp \rightarrow L^2(\mathbb {S}^{d-1}),\\{} & {} \mathcal {U}^\dagger g = \sum _{n=0}^{\infty } \sum _{k=1}^{N_{n,d}} \frac{1}{\lambda _{n,d}^{\mathcal {U}}} \, \left\langle g, Y_{n,d}^k\, {\widetilde{P}}_{n,d} \right\rangle \, Y^k_{n,d}. \end{aligned}$$

Finally, we discretize the Moore–Penrose pseudoinverse to approximate the density of the desired barycenter ${{\,\textrm{Bary}\,}}_{\mathbb {S}^{d-1}}^{\mathcal {U}}$ by

$$\begin{aligned} \mathcal {U}^\dag g({\varvec{\psi }}_p)\approx & {} \sum _{n=0}^D \sum _{j=1}^{N_{n, d}} \frac{1}{\lambda _{n,d}^{\mathcal {U}}} Y_{n, d}^j({\varvec{\psi }}_p) \frac{1}{L}\\{} & {} \sum _{i=1}^{P} \sum _{\ell =1}^{L} g({\varvec{\psi }}_i, t_\ell ) \overline{Y_{n, d}^j({\varvec{\psi }}_i)} {\widetilde{P}}_{n, d}(t_\ell ) w_i. \end{aligned}$$

We analyze the complexity for $\mathbb {S}^2$. The sums over n and j constitute a non-uniform spherical Fourier transform and the sum over i its adjoint, which can both be computed efficiently in $\mathcal {O}(D^2 \log (D)+P)$ steps, see [46] and [56, sect. 9.6]. The CDT (3) and its inverse (4) have linear complexity and can be computed with the algorithm [44]. Therefore, we obtain an overall complexity of $\mathcal {O}((D^2 \log (D)+P)LM)$. Since the number of points is usually $P\sim D^2$, the complexity grows slower than for the algorithms of Sect. 5.1.

6 Numerical Results

In this section, we present numerical computations of sliced barycenters between two measures on the sphere and visualize them on $\mathbb {S}^2$. On general $\mathbb {S}^{d-1}$, we only look at free-support barycenters, which do not require grids that become unhandy in higher dimension, and test their convergence behavior and execution times. We compare two notions of slicing: our parallel slicing (20) and the semicircular slicing of Remark 3.8, each for the two different notions of free- and fixed-support sliced Wasserstein barycenters from Sect. 5.1 and the Radon barycenters from Sect. 5.2. Further, we compare the fixed-support and Radon barycenters with the entropy-regularized version [55] of the Wasserstein barycenter (2) computed with PythonOT [27]. Finally, we consider the barycenters on $\textrm{SO}(3)$ in Sect. 6.4. Our code is available online.

6.1 Free-Support Sliced Wasserstein Barycenters on the Sphere

For the free-support discretization of Sect. 5.1.1, we compute the parallelly sliced Wasserstein barycenter (PSB) ${{\,\textrm{Bary}\,}}^{{{\,\textrm{PSW}\,}}}_{\mathbb {S}^2}$ with the stochastic gradient descent (40) and the semicircular sliced Wasserstein barycenter (SSB) ${{\,\textrm{Bary}\,}}^{{{\,\textrm{SSW}\,}}}_{\mathbb {S}^2}$, see Remark 3.8, with the algorithm [14], which uses a similar gradient descent scheme.

The von Mises–Fisher (vMF) distribution with center $\varvec{\eta }\in \mathbb {S}^2$ and concentration $\kappa > 0$ has the density function

$$\begin{aligned} f(\varvec{\xi }) = \frac{\kappa }{4\pi \sinh \kappa }\, \textrm{e}^{\kappa \left\langle \varvec{\xi },\varvec{\eta }\right\rangle },\qquad \varvec{\xi }\in \mathbb {S}^2. \end{aligned}$$

(43)
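For experiments of this kind, the density (43) and samples from it can be generated as in the following sketch. It is an illustration with hypothetical function names, not the authors' code. The density is evaluated in the algebraically equivalent form $\frac{\kappa }{2\pi (1-\textrm{e}^{-2\kappa })}\, \textrm{e}^{\kappa (\left\langle \varvec{\xi },\varvec{\eta }\right\rangle - 1)}$ to avoid large exponentials, and the sampler inverts the cumulative distribution function of $\left\langle \varvec{\xi },\varvec{\eta }\right\rangle $, which is available in closed form on $\mathbb {S}^2$.

```python
import numpy as np

def vmf_density_s2(xi, eta, kappa):
    """vMF density (43) on the 2-sphere at the rows of xi, in a numerically stable form."""
    norm_const = kappa / (2.0 * np.pi * (-np.expm1(-2.0 * kappa)))
    return norm_const * np.exp(kappa * (np.asarray(xi) @ eta - 1.0))

def sample_vmf_s2(eta, kappa, n, rng):
    """Draw n samples of the vMF distribution on the 2-sphere with unit center eta."""
    eta = np.asarray(eta, dtype=float)
    u = rng.uniform(size=n)
    # inverse CDF of t = <xi, eta>, whose density is proportional to exp(kappa * t)
    t = 1.0 + np.log(u + (1.0 - u) * np.exp(-2.0 * kappa)) / kappa
    phi = rng.uniform(0.0, 2.0 * np.pi, size=n)
    # orthonormal basis (e1, e2) of the tangent plane at eta
    helper = np.array([1.0, 0.0, 0.0]) if abs(eta[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    e1 = np.cross(eta, helper)
    e1 /= np.linalg.norm(e1)
    e2 = np.cross(eta, e1)
    s = np.sqrt(np.clip(1.0 - t**2, 0.0, None))
    return (t[:, None] * eta
            + s[:, None] * (np.cos(phi)[:, None] * e1 + np.sin(phi)[:, None] * e2))
```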

In the first setting, we consider the sliced Wasserstein barycenters of two vMF distributions with centers on the equator shifted by $90^\circ $ and concentration $\kappa = 100$, represented by $N=200$ samples each. We use 1000 iterations, $P=500$ slices, a step size of $\tau =40$ for the PSB algorithm and 80 for the SSB algorithm, and we take samples of the uniform distribution $u_{\mathbb {S}^2}$ as initialization $X^0$. The computed sliced Wasserstein barycenters are displayed in Fig. 2. We observe that the SSB is slightly more extended toward the poles.

In the second setting, we consider vMF distributions that are highly concentrated near the poles with $\kappa = 400$, to illustrate the observations of Remark 5.1. The resulting sliced Wasserstein barycenters are shown in Fig. 3. The PSB is seemingly uniform on the sphere, which corresponds to the initial distribution $X^0$. This is consistent with the observation that all measures on the sphere are PSB of two antipodal Dirac measures. Conversely, the SSB is supported on a ring around the equator.

The third setting is with two “croissant” measures spanning from the South Pole to the North Pole and rotated from each other by an angle of $120^\circ $. The resulting PSB and SSB in Fig. 4 are quite similar, which illustrates the fact that in many cases the two notions of barycenters seem more or less to coincide.

Convergence We study the convergence of the algorithms using the same measures as in Fig. 2, but with $N=50$ samples. Figure 5 shows the evolution of the loss function (36) and the step norm, which is the $L^2$ norm of the step $X^{l+1} - X^{l}$, depending on the iteration. We show the results on $\mathbb {S}^{d-1}$ for $d=3$ and $d=10$, where we use $\tau =20$ for PSB and $\tau =50$ for SSB. The two losses are rescaled so that the initial losses coincide, as they cannot be compared in absolute value. Despite this, these loss evolutions remain difficult to compare, as they highly depend on the chosen step size $\tau $. However, we observed that for appropriate step sizes $\tau $ (i.e., that avoid oscillatory behavior around the minimizer), the PSB algorithm converges faster on $\mathbb {S}^2$. On $\mathbb {S}^{9}$, both algorithms show a similar speed of convergence. This change in dimension might be explained by the fact that a uniform measure on $\mathbb {S}^{d-1}$ (which is our initial distribution) is projected by the semicircular slicing to a uniform measure on the circle, while the parallel slicing becomes more concentrated around 0 on the unit interval $\mathbb {I}$ for larger d.

Complexity All tests were performed on an Intel Core i7-10700 with 32 GB memory. Figure 6 shows the execution time depending on the number N of points of each given measure and the number P of slices, both for a fixed number of 20 iterations without stopping criterion. We observe that the PSB algorithm is between 40 and 100 times faster than the SSB algorithm. This comes from the fact that the SSB requires solving an OT problem on the torus $\mathbb {T}$, which is more difficult than on the real line. In terms of the evolution along N, the two algorithms (PSB and SSB) have a complexity of ${\mathcal {O}}(P N(\log (N)+d))$, which is consistent with our observations. The dependence on P is linear, but experiments with bigger values of P would be required to confirm this.

The execution times barely grow with increasing dimension d of the sphere. The only parts of the algorithm that depend on d are the inner products in $\mathbb {R}^d$, which are well parallelized, and the generation of uniform samples on $\mathbb {S}^{d-1}$, which explains the stronger effect for larger number of slices P. While in Fig. 6 the relative differences are higher for the generally faster PSB algorithm, the absolute differences are usually larger for the SSB.

6.2 Fixed-Support Sliced Wasserstein Barycenters

We test the fixed-support sliced barycenter algorithm from Sect. 5.1.2 and compare the resulting barycenters with the free-support ones. As input measures, we take a vMF distribution with $\kappa = 30$ and a “smiley” distribution, see Fig. 7. We use the gradient descent Algorithm 1 with a grid of $150\times 50$ points on the sphere, $P=100$ slices, 500 iterations, the uniform distribution as initialization and an empirically chosen step size $\tau = 0.005\, (1 + k/20)^{-1/2}$ in the kth iteration. We noticed that the results highly depend on the step size.

The resulting fixed-support PSB is displayed in Fig. 7, along with the free-support PSB and SSB from Sect. 5.1.1 and the regularized Wasserstein barycenter from PythonOT with regularization parameter 0.05. For the free-support sliced Wasserstein barycenters (PSB and SSB), the two input measures are sampled with $N=200$ points and the resulting barycenters are represented using kernel density estimation [31] with the density of the vMF distributions (43) as kernel function. We notice that all barycenters look similar.

6.3 Radon Wasserstein Barycenters on the Sphere

We compare the Radon PSB ${{\,\textrm{Bary}\,}}^{\mathcal {U}}_{\mathbb {S}^2}$ of Sect. 5.2 with the Radon SSB ${{\,\textrm{Bary}\,}}^{\mathcal {W}}_{\mathbb {S}^2}$. For the latter, we apply the algorithm [58]. Both use the truncation degree $D=120$ in (42) and $P=29282$ slices, which equals the number of points on the $121\times 242$ grid on $\mathbb {S}^2$. Figure 8 shows the barycenters of vMF distributions concentrated at the poles. As above, the Wasserstein barycenter is computed with the POT library with the regularization parameter 0.05 (or 0.01 for the “smiley” test). The regularized Wasserstein barycenter is somewhat “between” the Radon PSB and SSB. Different from Fig. 3, the Radon PSB is concentrated on a ring around the equator, which might be explained by the fact that, unlike in Remark 5.1, the measures are not Dirac measures. Furthermore, the computation of the Radon PSB is much faster than that of the Radon SSB. Moreover, we compare the Radon barycenters of the “croissant” shape in Fig. 9 as well as the “smiley” in Fig. 10. Again, the Wasserstein barycenter seems to be between the Radon PSB and SSB.

6.4 Sliced Barycenters on the Rotation Group

We compute free-support sliced Wasserstein barycenters on $\textrm{SO}(3)$ with the stochastic gradient descent method outlined in paragraph 5.1.1.2. Similar to the spherical setting, we start with two given measures with $N=100$ points each. Since there is no other slicing approach on $\textrm{SO}(3)$, we compare it with a Wasserstein barycenter that is obtained by computing an optimal transport map for the 1-Wasserstein distance (1) using PythonOT and applying the logarithm map to project it to the manifold $\textrm{SO}(3)$.

We visualize a rotation $Q=\textrm{R}_{{{\varvec{n}}}}(\omega )\in \textrm{SO}(3)$, see (13), as the point $\tan (\frac{\omega }{4})\, {{\varvec{n}}}\in \mathbb {R}^3$, where $\omega \in [0,\pi ]$ is the angle and ${{\varvec{n}}}\in \mathbb {S}^2$ is the rotation axis, cf. [24, p. 1633]. The resulting free-support barycenters between two input measures are depicted in Fig. 11. In (a) and (b), the measures are closer together, the angle between their centers is approximately 92 degrees, and the two notions of barycenters are similar. In (c) and (d), the input measures are almost opposite to each other with a distance of 164 degrees; then the sliced barycenter is supported on a curve with high clustering between the inputs, and the projected Wasserstein barycenter is more strongly localized around two locations.
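The visualization map can be sketched as follows. This is an illustrative NumPy routine with a hypothetical name; it recovers the angle from the trace and the axis from the skew-symmetric part of the matrix, which is numerically ill-conditioned for angles close to $\pi $, so that case is only handled by raising an error here.

```python
import numpy as np

def rotation_to_point(Q):
    """Map a rotation matrix Q with angle omega and axis n to tan(omega/4) * n."""
    cos_omega = np.clip((np.trace(Q) - 1.0) / 2.0, -1.0, 1.0)
    omega = np.arccos(cos_omega)                       # rotation angle in [0, pi]
    # the skew-symmetric part of Q equals sin(omega) times the cross-product matrix of n
    axis = np.array([Q[2, 1] - Q[1, 2], Q[0, 2] - Q[2, 0], Q[1, 0] - Q[0, 1]])
    norm = np.linalg.norm(axis)                        # equals 2 * sin(omega)
    if norm < 1e-12:
        if omega < 1e-6:
            return np.zeros(3)                         # identity: axis is irrelevant
        raise ValueError("axis not recoverable from the skew part for omega near pi")
    return np.tan(omega / 4.0) * axis / norm
```

Since $\omega \le \pi $, all rotations are mapped into the closed unit ball of $\mathbb {R}^3$, with the identity at the origin.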

7 Conclusions

We investigated a new approach for the sliced Wasserstein distance of spherical measures and proved that this parallel slicing provides a rotationally invariant metric on $\mathcal {P}(\mathbb {S}^{d-1})$ that induces the same topology as the Wasserstein distance. We provided numerical algorithms for the computation of the respective sliced barycenters, both with free and fixed support, that are considerably faster than for the semicircular slicing, while producing comparable results in most cases, except when the input measures are highly concentrated around antipodal points.

Extending our method to the rotation group $\textrm{SO}(3)$, we proved the metric properties of the proposed sliced Wasserstein distance based on a new Radon transform on $\textrm{SO}(3)$ and its singular value decomposition. An extensive numerical evaluation of the latter transform is planned, but out of the scope of this paper. Further, it will be interesting to incorporate our slicing approach into gradient flows on ${\mathbb {S}}^{d-1}$ for $d \gg 2$.

Data Availability

Code is available at https://github.com/leo-buecher/Sliced-OT-Sphere

References

Abramowitz, M., Stegun, I.A. (eds.): Handbook of Mathematical Functions. National Bureau of Standards, Washington, DC (1972)
Absil, P.-A., Mahony, R., Sepulchre, R.: Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton (2008). https://doi.org/10.1515/9781400830244
Agueh, M., Carlier, G.: Barycenters in the Wasserstein space. SIAM J. Math. Anal. 43(2), 904–924 (2011). https://doi.org/10.1137/100805741
Altekrüger, F., Hertrich, J., Steidl, G.: Neural Wasserstein gradient flows for maximum mean discrepancies with Riesz kernels. In: Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J. (eds.) Proceedings of the 40th International Conference on Machine Learning, pp. 664–690. PMLR (2023). https://proceedings.mlr.press/v202/altekruger23a.html
Ambrosio, L., Gigli, N., Savaré, G.: Gradient flows in metric spaces and in the space of probability measures. Lectures in Mathematics ETH Zürich. Birkhäuser, Basel (2005). https://doi.org/10.1007/b137080
Atkinson, K., Han, W.: Spherical Harmonics and Approximations on the Unit Sphere: An Introduction. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-25983-8
Ba, F.A., Quellmalz, M.: Accelerating the Sinkhorn algorithm for sparse multi-marginal optimal transport via fast Fourier transforms. Algorithms 15(9), 311 (2022). https://doi.org/10.3390/a15090311
Beier, F., Beinert, R., Steidl, G.: On a linear Gromov-Wasserstein distance. IEEE Trans. Image Process. 31, 7292–7305 (2022). https://doi.org/10.1109/TIP.2022.3221286
Beier, F., von Lindheim, J., Neumayer, S., Steidl, G.: Unbalanced multi-marginal optimal transport. J. Math. Imaging Vis. (2022). https://doi.org/10.1007/s10851-022-01126-7
Berens, H., Butzer, P.L., Pawelke, S.: Limitierungsverfahren von Reihen mehrdimensionaler Kugelfunktionen und deren Saturationsverhalten. Publ. Res. Inst. Math. Sci. 4, 201–268 (1968). https://doi.org/10.2977/prims/1195194875
Birdal, T., Arbel, M., Şimşekli, U., Guibas, L.J.: Synchronizing probability measures on rotations via optimal transport. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1566–1576 (2020). https://doi.org/10.1109/CVPR42600.2020.00164
Boman, J., Lindskog, F.: Support theorems for the Radon transform and Cramér–Wold theorem. J. Theor. Probab. 22, 683–710 (2009). https://doi.org/10.1007/s10959-008-0151-0
Bonet, C.: Leveraging Optimal Transport via Projections on Subspaces for Machine Learning Applications. PhD thesis, Université Bretagne Sud (2023)
Bonet, C., Berg, P., Courty, N., Septier, F., Drumetz, L., Pham, M.-T.: Spherical sliced-Wasserstein. In: International Conference on Learning Representations (2023). https://openreview.net/forum?id=jXQ0ipgMdU
Bonet, C., Chapel, L., Drumetz, L., Courty, N.: Hyperbolic sliced-Wasserstein via geodesic and horospherical projections. In: Doster, T., Emerson, T., Kvinge, H., Miolane, N., Papillon, M., Rieck, B., Sanborn, S. (eds.) Proceedings of 2nd Annual Workshop on Topology, Algebra, and Geometry in Machine Learning (TAG-ML), pp. 334–370. PMLR (2023). https://proceedings.mlr.press/v221/bonet23a.html
Bonneel, N., Rabin, J., Peyré, G., Pfister, H.: Sliced and Radon Wasserstein barycenters of measures. J. Math. Imaging Vis. 51(1), 22–45 (2015). https://doi.org/10.1007/s10851-014-0506-3
Bonnotte, N.: Unidimensional and Evolution Methods for Optimal Transportation. PhD thesis, Université Paris Sud (2013)
Borgwardt, S.: An LP-based, strongly-polynomial 2-approximation algorithm for sparse Wasserstein Barycenters. Oper. Res. Int. J. 22, 1511–1551 (2022). https://doi.org/10.1007/s12351-020-00589-z
Condat, L.: Fast projection onto the simplex and the $l_1$ ball. Math. Program. 158(1–2), 575–585 (2016). https://doi.org/10.1007/s10107-015-0946-6
Cui, L., Qi, X., Wen, C., Lei, N., Li, X., Zhang, M., Gu, X.: Spherical optimal transportation. Comput. Aided Des. 115, 181–193 (2019). https://doi.org/10.1016/j.cad.2019.05.024
Cuturi, M.: Sinkhorn distances: Lightspeed computation of optimal transport. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, Volume 26. Curran Associates, Inc. (2013). https://papers.nips.cc/paper_files/paper/2013/hash/af21d0c97db2e27e13572cbf59eb343d-Abstract.html
Dai, F., Xu, Y.: Approximation Theory and Harmonic Analysis on Spheres and Balls. Springer Monographs in Mathematics. Springer, New York (2013). https://doi.org/10.1007/978-1-4614-6660-4
Delon, J., Salomon, J., Sobolevski, A.: Fast transport optimization for Monge costs on the circle. SIAM J. Appl. Math. 70(7), 2239–2258 (2010). https://doi.org/10.1137/090772708
Ehler, M., Gräf, M., Neumayer, S., Steidl, G.: Curve based approximation of measures on manifolds by discrepancy minimization. Found. Comput. Math. 21(6), 1595–1642 (2021). https://doi.org/10.1007/s10208-021-09491-2
Engl, H.W., Hanke, M., Neubauer, A.: Regularization of Inverse Problems, volume 375 of Mathematics and Its Applications. Kluwer, Dordrecht (1996)
Fan, J., Zhang, Q., Taghvaei, A., Chen, Y.: Variational Wasserstein gradient flow. In: Chaudhuri, K., Jegelka, S., Song, L., Szepesvari, C., Niu, G., Sabato, S. (eds.) Proceedings of the 39th International Conference on Machine Learning, pp. 6185–6215. PMLR (2022)
Flamary, R., Courty, N., Gramfort, A., Alaya, M.Z., Boisbunon, A., Chambon, S., Chapel, L., Corenflos, A., Fatras, K., Fournier, N., Gautheron, L., Gayraud, N.T., Janati, H., Rakotomamonjy, A., Redko, I., Rolet, A., Schutz, A., Seguy, V., Sutherland, D.J., Tavenard, R., Tong, A., Vayer, T.: POT: Python optimal transport. J. Mach. Learn. Res. 22(78), 1–8 (2021)
Funk, P.: Über Flächen mit lauter geschlossenen geodätischen Linien. Math. Ann. 74(2), 278–300 (1913). https://doi.org/10.1007/BF01456044
Gehér, G.P., Hrušková, A., Titkos, T., Virosztek, D.: Isometric rigidity of Wasserstein spaces over Euclidean spheres (2023). arXiv:2308.05065
Groemer, H.: On a spherical integral transformation and sections of star bodies. Monatsh. Math. 126(2), 117–124 (1998). https://doi.org/10.1007/BF01473582
Article MathSciNet Google Scholar
Hall, P., Watson, G., Cabrera, J.: Kernel density estimation with spherical data. Biometrika 74(4), 751–62 (1987). https://doi.org/10.1093/biomet/74.4.751
Hamfeldt, B., Turnquist, A.: A convergence framework for optimal transport on the sphere. Numer. Math. 151, 627–657 (2022). https://doi.org/10.1007/s00211-022-01292-1
Han, R.: Sliced Wasserstein distance between probability measures on Hilbert spaces (2023). arXiv:2307.05802
Helgason, S.: Integral Geometry and Radon Transforms. Springer, New York (2011). https://doi.org/10.1007/978-1-4419-6055-9
Hertrich, J., Wald, C., Altekrüger, F., Hagemann, P.: Generative sliced MMD flows with Riesz kernels. ICLR (2024)
Hielscher, R.: The Radon Transform on the Rotation Group–Inversion and Application to Texture Analysis. Dissertation, Technische Universität Bergakademie Freiberg (2007). https://nbn-resolving.org/urn:nbn:de:bsz:105-3614018
Hielscher, R., Potts, D., Prestin, J., Schaeben, H., Schmalz, M.: The Radon transform on $SO(3)$: A Fourier slice theorem and numerical inversion. Inverse Prob. 24, 025011 (2008). https://doi.org/10.1088/0266-5611/24/2/025011
Hielscher, R., Potts, D., Quellmalz, M.: An SVD in spherical surface wave tomography. In: Hofmann, B., Leitao, A., Zubelli, J.P. (eds.) New Trends in Parameter Identification for Mathematical Models, pp. 121–144. Birkhäuser (2018). https://doi.org/10.1007/978-3-319-70824-9_7
Hielscher, R., Quellmalz, M.: Optimal mollifiers for spherical deconvolution. Inverse Prob. 31(8), 085001 (2015). https://doi.org/10.1088/0266-5611/31/8/085001
Hielscher, R., Quellmalz, M.: Reconstructing a function on the sphere from its means along vertical slices. Inverse Probl. Imaging 10(3), 711–739 (2016). https://doi.org/10.3934/ipi.2016018
Kim, Y.-H., Pass, B.: Wasserstein Barycenters over Riemannian manifolds. Adv. Math. 307, 640–683 (2017). https://doi.org/10.1016/j.aim.2016.11.026
Knight, P.A.: The Sinkhorn–Knopp algorithm: convergence and applications. SIAM J. Matrix Anal. Appl. 30(1), 261–275 (2008). https://doi.org/10.1137/060659624
Kolouri, S., Nadjahi, K., Simsekli, U., Badeau, R., Rohde, G.K.: Generalized sliced Wasserstein distances. In Wallach, H., Larochelle, H., Beygelzimer, A., d’Alché Buc, F., Fox, E., Garnett, R. (eds.) Advances in Neural Information Processing Systems 32 (NeurIPS 2019) (2019)
Kolouri, S., Park, S.R., Rohde, G.K.: The Radon cumulative distribution transform and its application to image classification. IEEE Trans. Image Process. 25(2), 920–34 (2016). https://doi.org/10.1109/TIP.2015.2509419
Korotin, A., Selikhanovych, D., Burnaev, E.: Neural optimal transport. In The Eleventh International Conference on Learning Representations (2023). https://openreview.net/forum?id=d8CBRlWNkqH
Kunis, S., Potts, D.: Fast spherical Fourier algorithms. J. Comput. Appl. Math. 161, 75–98 (2003). https://doi.org/10.1016/S0377-0427(03)00546-6
Lee, J.M.: Introduction to Smooth Manifolds, volume 218 of Grad. Texts in Math. Springer, New York (2012). https://doi.org/10.1007/978-1-4419-9982-5
Loeper, G.: Regularity of optimal maps on the sphere: the quadratic cost and the reflector antenna. Arch. Ration. Mech. Anal. 199(1), 269–289 (2010). https://doi.org/10.1007/s00205-010-0330-x
Louis, A.K., Riplinger, M., Spiess, M., Spodarev, E.: Inversion algorithms for the spherical Radon and cosine transform. Inverse Prob. 27(3), 035015 (2011). https://doi.org/10.1088/0266-5611/27/3/035015
McRae, A.T.T., Cotter, C.J., Budd, C.J.: Optimal-transport-based mesh adaptivity on the plane and sphere using finite elements. SIAM J. Sci. Comput. 40(2), A1121–A1148 (2018). https://doi.org/10.1137/16M1109515
Morawiec, A.: Orientations and Rotations. Springer, Berlin (2004). https://doi.org/10.1007/978-3-662-09156-2
Natterer, F., Wübbeling, F.: Mathematical Methods in Image Reconstruction. SIAM, Philadelphia, PA (2000). https://doi.org/10.1137/1.9780898718324.fm
Nguyen, K., Ren, T., Nguyen, H., Rout, L., Nguyen, T.M., Ho, N.: Hierarchical sliced Wasserstein distance. In The Eleventh International Conference on Learning Representations (2023). https://openreview.net/forum?id=CUOaVn6mYEj
Park, S.R., Kolouri, S., Kundu, S., Rohde, G.K.: The cumulative distribution transform and linear pattern classification. Appl. Comput. Harmon. Anal. 45(3), 616–641 (2018). https://doi.org/10.1016/j.acha.2017.02.002
Peyré, G., Cuturi, M.: Computational optimal transport. Found. Trends Mach. Learn. 11(5–6), 355–607 (2019). https://doi.org/10.1561/2200000073
Plonka, G., Potts, D., Steidl, G., Tasche, M.: Numerical Fourier Analysis. Applied and Numerical Harmonic Analysis, 2nd edition Birkhäuser, Basel (2023). https://doi.org/10.1007/978-3-031-35005-4
Quellmalz, M.: Reconstructing Functions on the Sphere from Circular Means. Dissertation. Universitätsverlag Chemnitz (2019). https://nbn-resolving.org/urn:nbn:de:bsz:ch1-qucosa2-384068
Quellmalz, M., Beinert, R., Steidl, G.: Sliced optimal transport on the sphere. Inverse Prob. 39(10), 105005 (2023). https://doi.org/10.1088/1361-6420/acf156
Quellmalz, M., Hielscher, R., Louis, A.K.: The cone-beam transform and spherical convolution operators. Inverse Prob. 34(10), 105006 (2018). https://doi.org/10.1088/1361-6420/aad679
Quellmalz, M., Weissinger, L., Hubmer, S., Erchinger, P.D.: A frame decomposition of the Funk-Radon transform. In: Calatroni, L., Donatelli, M., Morigi, S., Prato, M., Santacesaria, M. (eds.) Scale Space and Variational Methods in Computer Vision, SSVM 2023, pp. 42–54. Springer (2023). https://doi.org/10.1007/978-3-031-31975-4_4
Rabin, J., Delon, J., Gousseau, Y.: Transportation distances on the circle. J. Math. Imaging Vis. 41, 147–167 (2011). https://doi.org/10.1007/s10851-011-0284-0
Rabin, J., Peyré, G., Delon, J., Bernot, M.: Wasserstein barycenter and its application to texture mixing. In Bruckstein, A., ter Haar Romeny, B., Bronstein, A., Bronstein, M. (eds), Scale Space and Variational Methods in Computer Vision, SSVM 2011, pp. 435–446. Springer (2012). https://doi.org/10.1007/978-3-642-24785-9_37
Rubin, B.: Generalized Minkowski–Funk transforms and small denominators on the sphere. Fract. Calc. Appl. Anal. 3(2), 177–203 (2000)
Rubin, B.: The vertical slice transform on the unit sphere. Fract. Calculus Appl. Anal. 22(4), 899–917 (2019). https://doi.org/10.1515/fca-2019-0049
Rustamov, K.P.: On approximation of functions on the sphere. Izv. RAN. Ser. Mat. 57(5), 127–148 (1993). https://doi.org/10.1070/IM1994v043n02ABEH001566
Rustamov, R.M., Majumdar, S.: Intrinsic sliced Wasserstein distances for comparing collections of probability distributions on manifolds and graphs (2023). https://proceedings.mlr.press/v202/rustamov23a.html
Santambrogio, F.: Optimal Transport for Applied Mathematicians, volume 87 of Progress in Nonlinear Differential Equations and Their Applications. Birkhäuser, Cham (2015). https://doi.org/10.1007/978-3-319-20828-2
Schneider, R.: Functions on a sphere with vanishing integrals over certain subspheres. J. Math. Anal. Appl. 26, 381–384 (1969). https://doi.org/10.1016/0022-247X(69)90160-7
Staib, M., Claici, S., Solomon, J.M., Jegelka, S.: Parallel streaming Wasserstein barycenters. In: Guyon, I., Von Luxburg, U., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30 (NIPS 2017) (2017). https://proceedings.neurips.cc/paper/2017/hash/253f7b5d921338af34da817c00f42753-Abstract.html
Theveneau, M., Keriven, N.: Stability of entropic Wasserstein Barycenters and application to random geometric graphs. In $29^{\circ }$ Colloque sur le traitement du signal et des images, pages 93–96. GRETSI - Groupe de Recherche en Traitement du Signal et des Images (2023). https://gretsi.fr/data/colloque/pdf/2023_keriven1083.pdf
Ungar, P.: Freak theorem about functions on a sphere. J. Lond. Math. Soc. 1(1), 100–103 (1954). https://doi.org/10.1112/jlms/s1-29.1.100
van den Boogaart, K.G., Hielscher, R., Prestin, J., Schaeben, H.: Kernel-based methods for inversion of the Radon transform on ${{\rm SO}}(3)$ and their applications to texture analysis. J. Comput. Appl. Math. 199, 122–140 (2007). https://doi.org/10.1016/j.cam.2005.12.003
Varshalovich, D., Moskalev, A., Khersonskii, V.: Quantum Theory of Angular Momentum. World Scientific Publishing, Singapore (1988). https://doi.org/10.1142/0270
Vilenkin, N.J.: Special Functions and the Theory of Group Representations. AMS, Providence, RI (1968)
Villani, C.: Topics in Optimal Transportation. Number 58 in Graduate Studies in Mathematics. American Mathematical Society, Providence (2003). https://doi.org/10.1090/gsm/058
Wang, W., Carreira-Perpiñán, M.Á.: Projection onto the probability simplex: An efficient algorithm with a simple proof, and an application (2013). arXiv:1309.1541
Warner, F.W.: Foundations of Differentiable Manifolds and Lie Groups. Graduate Texts in Mathematics. Springer, New York (1983). https://doi.org/10.1007/978-1-4757-1799-0
Weller, H., Browne, P., Budd, C., Cullen, M.: Mesh adaptation on the sphere using optimal transport and the numerical solution of a Monge-Ampère type equation. J. Comput. Phys. 308, 102–123 (2016). https://doi.org/10.1016/j.jcp.2015.12.018
Zangerl, G., Scherzer, O.: Exact reconstruction in photoacoustic tomography with circular integrating detectors II: Spherical geometry. Math. Methods Appl. Sci. 33(15), 1771–1782 (2010). https://doi.org/10.1002/mma.1266

Acknowledgements

We gratefully acknowledge the funding by the German Research Foundation (DFG): STE 571/19-1, project number 495365311, within the Austrian Science Fund (FWF) SFB 10.55776/F68: “Tomography Across the Scales.” L.B. acknowledges funding from the TU Berlin, Institute of Mathematics, during his internship in 2023. For open access purposes, the author has applied a CC BY public copyright license to any author-accepted manuscript version arising from this submission.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Institute of Mathematics, Technische Universität Berlin, Straße des 17. Juni 136, 10623, Berlin, Germany
Michael Quellmalz, Léo Buecher & Gabriele Steidl
CentraleSupélec, Université Paris-Saclay, Paris, France
Léo Buecher

Authors

Michael Quellmalz
Léo Buecher
Gabriele Steidl

Contributions

MQ, LB and GS contributed to the conception and writing of the manuscript. LB did the coding for the sliced Wasserstein barycenters and MQ for the Radon Wasserstein barycenters.

Corresponding author

Correspondence to Michael Quellmalz.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Metric Properties of the Parallelly Sliced Wasserstein Distance

In the following, we bound the parallelly sliced Wasserstein distance ${{\,\textrm{PSW}\,}}_p$ and the spherical Wasserstein distance ${{\,\textrm{W}\,}}_p$ against each other by relating both to the Euclidean case [17, Sect. 5.1].

Lemma A.1

The geodesic distance (7) on the sphere and the Euclidean distance are related via

$$\begin{aligned} \left\Vert \varvec{\xi }-\varvec{\eta } \right\Vert \le d(\varvec{\xi },\varvec{\eta }) \le \frac{\pi }{2} \left\Vert \varvec{\xi }-\varvec{\eta } \right\Vert ,\qquad \forall \varvec{\xi },\varvec{\eta }\in \mathbb {S}^{d-1}. \end{aligned}$$

Proof

We first show that

$$\begin{aligned} \sqrt{2-2x} \le \arccos (x) \le \frac{\pi }{2} \sqrt{2-2x},\qquad \forall x\in \mathbb {I}. \end{aligned}$$

(44)

For $x\in \mathbb {I}$, we have by [1, 4.4.2]

$$\begin{aligned} \arccos (x)= & {} \int _{x}^{1} \frac{1}{\sqrt{1-t^2}} \, \textrm{d}t = \int _{x}^{1} \frac{1}{\sqrt{1+t}\, \sqrt{1-t}} \, \textrm{d}t \\\ge & {} \int _{x}^{1} \frac{1}{\sqrt{2-2t}} \, \textrm{d}t = \sqrt{2-2x}, \end{aligned}$$

which is the first inequality of (44). Analogously, we have for $x\in [0,1]$ that

$$\begin{aligned} \arccos (x) = \int _{x}^{1} \frac{1}{\sqrt{1-t^2}} \, \textrm{d}t \le \int _{x}^{1} \frac{1}{\sqrt{1-t}} \, \textrm{d}t = 2\sqrt{1-x}. \end{aligned}$$

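Since $\sqrt{2}\le \frac{\pi }{2}$, this gives $\arccos (x)\le 2\sqrt{1-x} = \sqrt{2}\,\sqrt{2-2x}\le \frac{\pi }{2}\sqrt{2-2x}$ for $x\in [0,1]$. On $[-1,0]$, the function $\arccos $ is convex, the right-hand side $\frac{\pi }{2}\sqrt{2-2x}$ is concave, and the inequality holds at both endpoints, with equality at $x=-1$ because $\arccos (-1)=\pi $. Hence, we obtain the second inequality of (44) for all $x\in [-1,1]$. The assertion follows from the fact that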

$$\begin{aligned} \left\Vert \varvec{\xi }-\varvec{\eta } \right\Vert = \sqrt{\left\Vert \varvec{\xi } \right\Vert ^2+\left\Vert \varvec{\eta } \right\Vert ^2-2\left\langle \varvec{\xi },\varvec{\eta }\right\rangle } = \sqrt{2-2\left\langle \varvec{\xi },\varvec{\eta }\right\rangle } \end{aligned}$$

and $ d(\varvec{\xi },\varvec{\eta }) = \arccos (\left\langle \varvec{\xi },\varvec{\eta }\right\rangle ) $ for all $\varvec{\xi },\varvec{\eta }\in \mathbb {S}^{d-1}$ by (7). $\square $
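The two inequalities of Lemma A.1 can be checked numerically. The following is a minimal sketch that samples random point pairs on $\mathbb {S}^{d-1}$ and compares the chordal and geodesic distances; the function names, the sampling scheme and the tolerance are illustrative assumptions and not part of the paper.

```python
import math
import random


def random_unit_vector(dim, rng):
    """Draw a point on the unit sphere S^{dim-1} by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def chordal(xi, eta):
    """Euclidean distance ||xi - eta||."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, eta)))


def geodesic(xi, eta):
    """Geodesic distance arccos(<xi, eta>), with the inner product clipped to [-1, 1]."""
    dot = sum(a * b for a, b in zip(xi, eta))
    return math.acos(max(-1.0, min(1.0, dot)))


def check_lemma_a1(dim=3, trials=2000, seed=0, tol=1e-7):
    """Return True if ||xi-eta|| <= d(xi,eta) <= (pi/2) ||xi-eta|| holds for all samples."""
    rng = random.Random(seed)
    for _ in range(trials):
        xi = random_unit_vector(dim, rng)
        eta = random_unit_vector(dim, rng)
        e, g = chordal(xi, eta), geodesic(xi, eta)
        if not (e <= g + tol and g <= math.pi / 2 * e + tol):
            return False
    return True
```

The upper bound is attained for antipodal points, where the chordal distance is $2$ and the geodesic distance is $\pi $.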

We extend a spherical measure $\mu \in \mathcal {P}(\mathbb {S}^{d-1})$ to a measure ${\tilde{\mu }}\in \mathcal {P}(\mathbb {R}^d)$ that is supported on $\mathbb {S}^{d-1}$ by setting

$$\begin{aligned} {\tilde{\mu }}(B) :=\mu (B\cap \mathbb {S}^{d-1}), \qquad \forall B\in \mathcal {B}(\mathbb {R}^d). \end{aligned}$$

Lemma A.2

Let $\mu ,\nu \in \mathcal {P}(\mathbb {S}^{d-1})$ be spherical measures with extensions ${\tilde{\mu }},{\tilde{\nu }}\in \mathcal {P}(\mathbb {R}^d)$. Then the following inequalities between the spherical Wasserstein distance ${{\,\textrm{W}\,}}_p(\mu ,\nu )$ and the Euclidean Wasserstein distance ${{\,\textrm{W}\,}}_p({\tilde{\mu }},{\tilde{\nu }})$ on $\mathbb {R}^d$ hold:

$$\begin{aligned} {{\,\textrm{W}\,}}_p({\tilde{\mu }},{\tilde{\nu }}) \le {{\,\textrm{W}\,}}_p(\mu ,\nu ) \le \frac{\pi }{2} {{\,\textrm{W}\,}}_p({\tilde{\mu }},{\tilde{\nu }}). \end{aligned}$$

Proof

The Euclidean Wasserstein distance on $\mathbb {X}=\mathbb {R}^d$ is given in (1) with $d(\varvec{x},\varvec{y}) = \left\Vert \varvec{x}- \varvec{y} \right\Vert $. Any transport plan ${\tilde{\gamma }} \in \Pi ({\tilde{\mu }},{\tilde{\nu }})$ is supported only on $\mathbb {S}^{d-1}\times \mathbb {S}^{d-1}$. Hence, its restriction to $\mathbb {S}^{d-1}\times \mathbb {S}^{d-1}$ yields a transport plan in $\Pi (\mu ,\nu )$. Conversely, a transport plan $\gamma \in \Pi (\mu ,\nu )$ can be extended to $\mathbb {R}^d$ by setting it to zero outside the sphere. Hence, we have

$$\begin{aligned} {{\,\textrm{W}\,}}_p^p(\mu , \nu )= & {} \inf _{\gamma \in \Pi (\mu , \nu )} {\int _{\mathbb {S}^{d-1}\times \mathbb {S}^{d-1}}} d({{\varvec{x}}},{{\varvec{y}}})^p \, \textrm{d}\gamma ({{\varvec{x}}},{{\varvec{y}}})\\= & {} \inf _{{\tilde{\gamma }}\in \Pi ({\tilde{\mu }},{\tilde{\nu }})} {\int _{\mathbb {R}^{d}\times \mathbb {R}^{d}}} d({{\varvec{x}}},{{\varvec{y}}})^p \, \textrm{d}{\tilde{\gamma }}({{\varvec{x}}},{{\varvec{y}}}). \end{aligned}$$

The claim follows by Lemma A.1. $\square $

Proof of Theorem 3.7

We first show (31) via applying the respective result from the Euclidean space. We briefly recall sliced OT on $\mathbb {R}^d$, see [16], with the slicing operator $ \mathcal {S}^{\mathbb {R}^d}_{{\varvec{\psi }}} :\mathbb {R}^d\rightarrow \mathbb {R}$, ${{\varvec{x}}}\mapsto \left\langle {\varvec{\psi }},{{\varvec{x}}}\right\rangle $ for ${\varvec{\psi }}\in \mathbb {S}^{d-1}$ and the sliced Wasserstein distance

$$\begin{aligned}{} & {} {{\,\textrm{RSW}\,}}_p^p(\mu ,\nu ) := \int _{\mathbb {S}^{d-1}} {{\,\textrm{W}\,}}_p^p\left( (\mathcal {S}^{\mathbb {R}^d}_{{\varvec{\psi }}})_\# \mu ,(\mathcal {S}^{\mathbb {R}^{d}}_{{\varvec{\psi }}})_\# \nu \right) \, \textrm{d}u_{\mathbb {S}^{d-1}} ({\varvec{\psi }}), \\{} & {} \qquad \mu ,\nu \in \mathcal {P}(\mathbb {R}^d). \end{aligned}$$

Comparing the Euclidean sliced Wasserstein distance ${{\,\textrm{RSW}\,}}$ with the spherical sliced Wasserstein distance (30), we see that

$$\begin{aligned} {{\,\textrm{PSW}\,}}_p(\mu ,\nu ) = \textrm{RSW}_p({\tilde{\mu }},{\tilde{\nu }}). \end{aligned}$$

(45)

By [17, thm. 5.1.5], there exist constants ${\tilde{c}}_{d,p}, {\tilde{C}}_{d,p}$ such that for all measures ${\tilde{\mu }},{\tilde{\nu }} \in \mathcal {P}(\mathbb {R}^d)$ which are supported in a ball of fixed radius $R>0$ we have

$$\begin{aligned} \textrm{RSW}_p({\tilde{\mu }},{\tilde{\nu }})\le & {} {\tilde{c}}_{d,p} {{\,\textrm{W}\,}}_p({\tilde{\mu }},{\tilde{\nu }})\\\le & {} {\tilde{C}}_{d,p} R^{1-\frac{1}{p(d+1)}}\, \textrm{RSW}_p({\tilde{\mu }},{\tilde{\nu }})^{\frac{1}{p(d+1)}}. \end{aligned}$$

As ${\tilde{\nu }}$ and ${\tilde{\mu }}$ are by construction supported in a ball of radius $1+\varepsilon $ for any $\varepsilon >0$, the validity of (31) follows by invoking (45) and Lemma A.2.

Next we prove the metric properties. The symmetry and the triangle inequality follow from the corresponding properties of the Wasserstein distance and of the $L^p$ norm on $\mathbb {S}^{d-1}$. The positive definiteness and the equivalence to the spherical Wasserstein distance follow from (31). The rotational invariance of ${{\,\textrm{PSW}\,}}$ follows since $\mathcal {U}$ is rotationally invariant. $\square $
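Relation (45) suggests a simple way to approximate ${{\,\textrm{PSW}\,}}_p^p$ for discrete measures: project the support points onto random directions ${\varvec{\psi }}$ and average the one-dimensional Wasserstein distances. The following is a minimal Monte Carlo sketch under the assumption that both measures have the same number of equally weighted atoms, so that the one-dimensional optimal transport reduces to sorting. The function names and the random sampling of directions are illustrative choices, not the quadrature used in the paper.

```python
import math
import random


def random_direction(dim, rng):
    """Draw a direction psi uniformly from the unit sphere S^{dim-1}."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def wasserstein_1d_pp(xs, ys, p):
    """W_p^p between two empirical measures on the line with equally many uniform atoms."""
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes the same number of atoms")
    return sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def psw_pp_monte_carlo(points_mu, points_nu, p=2, n_slices=2000, seed=0):
    """Monte Carlo estimate of PSW_p^p: average of 1D distances of the slices <psi, x>."""
    rng = random.Random(seed)
    dim = len(points_mu[0])
    total = 0.0
    for _ in range(n_slices):
        psi = random_direction(dim, rng)
        proj_mu = [sum(a * b for a, b in zip(psi, x)) for x in points_mu]
        proj_nu = [sum(a * b for a, b in zip(psi, y)) for y in points_nu]
        total += wasserstein_1d_pp(proj_mu, proj_nu, p)
    return total / n_slices
```

For the antipodal Dirac measures $\delta _{{{\varvec{e}}}^3}$ and $\delta _{-{{\varvec{e}}}^3}$ on $\mathbb {S}^2$ and $p=2$, each slice contributes $4\psi _3^2$, so the estimate should approach $4/3$ as the number of slices grows.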

Proofs From Sect. 4

Proof of Theorem 4.2

Let $n\in \mathbb {N}_0$ and $j,k\in \{-n,\dots ,n\}$. Using the product identity [36, cor. 2.11]

$$\begin{aligned} D_n^{j,k}({{\varvec{P}}}{{\varvec{Q}}}) = \sum _{\ell =-n}^{n} D_n^{j,\ell }({{\varvec{P}}}) D_n^{\ell ,k}({{\varvec{Q}}}) \qquad \forall {{\varvec{P}}},{{\varvec{Q}}}\in \textrm{SO}(3), \end{aligned}$$

(46)

we have

$$\begin{aligned} \mathcal {T}D_n^{j, k}({{\varvec{Q}}}, \omega )= & {} \frac{1}{4\pi ^2}(1-\cos (\omega ))\,\\{} & {} \sum _{\ell =-n}^{n} D_n^{j, \ell }({{\varvec{Q}}}) \int _{\mathbb {S}^2} D_n^{\ell , k}(\textrm{R}_{\varvec{\xi }}(\omega )) \, \textrm{d}\sigma _{\mathbb {S}^2}(\varvec{\xi }). \end{aligned}$$

We write $\varvec{\xi }={{\,\mathrm{\Phi }\,}}(\varphi ,\vartheta )$ with the spherical coordinates (10). Since

$$\begin{aligned} \textrm{R}_{{{\,\mathrm{\Phi }\,}}(\varphi ,\vartheta )}(\omega )= & {} \textrm{R}_{{{\varvec{e}}}^3}(\varphi ) \textrm{R}_{{{\varvec{e}}}^2}(\vartheta ) \textrm{R}_{{{\varvec{e}}}^3}(\omega ) \textrm{R}_{{{\varvec{e}}}^2}(-\vartheta ) \textrm{R}_{{{\varvec{e}}}^3}(-\varphi ) \\= & {} {{\,\mathrm{\Psi }\,}}(\varphi ,\vartheta ,0) {{\,\mathrm{\Psi }\,}}(\omega ,-\vartheta ,-\varphi ), \end{aligned}$$

we have by (16) and (46)

$$\begin{aligned} D_n^{j,k}(\textrm{R}_{{{\,\mathrm{\Phi }\,}}(\varphi ,\vartheta )}(\omega ))= & {} \sum _{\ell =-n}^{n} D_n^{j,\ell }({{\,\mathrm{\Psi }\,}}(\varphi ,\vartheta ,0))\, \\{} & {} \textrm{e}^{-\textrm{i}\ell \omega }\, D_n^{\ell ,k}({{\,\mathrm{\Psi }\,}}(0,-\vartheta ,-\varphi )), \end{aligned}$$

cf. [73, § 4.5]. Hence, we obtain

$$\begin{aligned} \mathcal {T}D_n^{j,k}({{\varvec{Q}}},\omega )= & {} \frac{1}{4\pi ^2}(1-\cos (\omega )) \\{} & {} \sum _{\ell =-n}^{n} D_n^{j,\ell }({{\varvec{Q}}}) \sum _{m=-n}^{n} \int _{0}^{\pi } \int _{\mathbb {T}} D_n^{\ell ,m}({{\,\mathrm{\Psi }\,}}(\varphi ,\vartheta ,0)) \\{} & {} \textrm{e}^{-\textrm{i}m \omega } D_n^{m,k}({{\,\mathrm{\Psi }\,}}(0,-\vartheta ,-\varphi )) \, \textrm{d}\varphi \sin (\vartheta ) \, \textrm{d}\vartheta . \end{aligned}$$

With the symmetry $ D_n^{m,k}({{\varvec{Q}}}) = \overline{D_n^{k,m}({{\varvec{Q}}}^\top )} $ and (16), we see that

$$\begin{aligned} \mathcal {T}D_n^{j,k}({{\varvec{Q}}},\omega )= & {} \frac{1}{4\pi ^2}(1-\cos (\omega )) \\{} & {} \sum _{\ell =-n}^{n} D_n^{j,\ell }({{\varvec{Q}}}) \sum _{m=-n}^{n} \textrm{e}^{-\textrm{i}m \omega } \int _{0}^{\pi } \int _{\mathbb {T}}\\{} & {} \textrm{e}^{-\textrm{i}\ell \varphi } d_n^{\ell ,m}(\cos (\vartheta )) d_n^{k,m}(\cos (\vartheta )) \textrm{e}^{\textrm{i}k \varphi } \, \textrm{d}\varphi \sin (\vartheta ) \, \textrm{d}\vartheta . \end{aligned}$$

With the orthogonality of the exponentials and the d-functions in (18), we obtain

$$\begin{aligned} \mathcal {T}D_n^{j, k}({{\varvec{Q}}}, \omega ) = (1-\cos (\omega ))\, \frac{1}{(2n+1)\pi } D_n^{j, k}({{\varvec{Q}}}) \sum _{\ell =-n}^{n} \textrm{e}^{-\textrm{i}\ell \omega }. \end{aligned}$$

The closed form of the Dirichlet kernel

$$\begin{aligned} \sum _{\ell =-n}^{n} \textrm{e}^{-\textrm{i}\ell \omega } = \frac{\sin ((n+\frac{1}{2}) \omega )}{\sin (\frac{\omega }{2})} \end{aligned}$$

and the half angle formula $ (1-\cos (\omega )) = 2\sin (\omega /2)^2 $ yield

$$\begin{aligned} \mathcal {T}D_n^{j, k}({{\varvec{Q}}}, \omega ) = \frac{2}{(2n+1)\pi } D_n^{j, k}({{\varvec{Q}}})\, \sin \left( (n+\tfrac{1}{2}) \omega \right) \, \sin (\tfrac{\omega }{2}), \end{aligned}$$

which implies (34). The orthogonality of $F_n^{j,k}$ follows from the orthonormality (17) of the rotational harmonics ${\widetilde{D}}_n^{j,k}$. Using the identity $(\sin (\frac{\omega }{2}))^2 = (1-\cos (\omega ))/2$ and the orthogonality of the cosine, we obtain

$$\begin{aligned} \int _{0}^{\pi } \left( \sin \left( (n+\tfrac{1}{2}) \omega \right) \right) ^2\, \left( \sin (\tfrac{\omega }{2})\right) ^2 \, \textrm{d}\omega = {\left\{ \begin{array}{ll} \pi /4, &{} n \in \mathbb {N}, \\ 3\pi /8, &{} n=0. \end{array}\right. } \quad \square \end{aligned}$$
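Both the closed form of the Dirichlet kernel and the value of the last integral are easy to confirm numerically. The sketch below does so with the Python standard library; the function names and the midpoint quadrature are illustrative assumptions.

```python
import cmath
import math


def dirichlet_sum(n, omega):
    """Left-hand side: sum_{l=-n}^{n} exp(-i l omega); the imaginary parts cancel."""
    return sum(cmath.exp(-1j * l * omega) for l in range(-n, n + 1)).real


def dirichlet_closed_form(n, omega):
    """Right-hand side: sin((n + 1/2) omega) / sin(omega / 2), omega not a multiple of 2 pi."""
    return math.sin((n + 0.5) * omega) / math.sin(omega / 2)


def squared_sine_integral(n, steps=4000):
    """Midpoint-rule approximation of int_0^pi sin((n + 1/2) w)^2 sin(w / 2)^2 dw."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        w = (i + 0.5) * h
        total += (math.sin((n + 0.5) * w) * math.sin(0.5 * w)) ** 2
    return h * total
```

Comparing `squared_sine_integral(n)` with $\pi /4$ for $n\ge 1$ and with $3\pi /8$ for $n=0$ reproduces the case distinction above.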

Proof of Theorem 4.7

This proof has a similar structure to [58, Thm. 3.7]. Let $\mu ,\nu \in \mathcal {M}(\textrm{SO}(3))$ such that $\mathcal {T}\mu =\mathcal {T}\nu $. For $g\in C(\textrm{SO}(3)\times [0,\pi ])$, we have by Proposition 3.5

$$\begin{aligned} \left\langle \mu , \mathcal {T}^*g\right\rangle = \left\langle \nu , \mathcal {T}^*g\right\rangle . \end{aligned}$$

We show that $\{\mathcal {T}^*g: g\in C(\textrm{SO}(3)\times [0,\pi ])\}$ is a dense subset of $C(\textrm{SO}(3))$, which implies $\mu =\nu $.

The Sobolev space $H^s(\textrm{SO}(3))$ with $s\ge 0$ is defined as the completion of $C^\infty (\textrm{SO}(3))$ with respect to the Sobolev norm

$$\begin{aligned} \left\Vert f \right\Vert _{H^s(\textrm{SO}(3))}^2 := \sum _{n=0}^\infty \left( n+\tfrac{1}{2}\right) ^{2s} \sum _{j,k=-n}^{n} \frac{8\pi ^2}{2n+1} | \langle f, D_n^{j,k}\rangle |^2. \end{aligned}$$

(47)

Let $s>2$, then $H^s(\textrm{SO}(3))$ is dense in $C(\textrm{SO}(3))$, cf. [36, Lem. 2.22]. Let $f\in H^s(\textrm{SO}(3))$. Since $\mathcal {T}$ is injective on $L^2(\textrm{SO}(3))$ by Theorem 4.2, we have $f = \mathcal {T}^* g$ if and only if $\mathcal {T}f = \mathcal {T}\mathcal {T}^* g$. In the following, we show that

$$\begin{aligned} g :=(\mathcal {T}\mathcal {T}^*)^{-1} \mathcal {T}f \end{aligned}$$

is in $C(\textrm{SO}(3)\times [0,\pi ])$. Then we obtain $f = \mathcal {T}^* g$, which shows that $H^s(\textrm{SO}(3))\subset \mathcal {T}^*(C(\textrm{SO}(3)\times [0,\pi ]))$ and therefore the assertion.

Since $\mathcal {T}^*$ has the same singular functions as $\mathcal {T}$ and the conjugate singular values, we obtain by the singular value decomposition (34) that

$$\begin{aligned} (\mathcal {T}\mathcal {T}^*)^{-1} \mathcal {T}f = \sum _{n=0}^{\infty } \sum _{j,k=-n}^{n} \frac{1}{\lambda _n^{\mathcal {T}}}\, \left\langle f, D_n^{j,k}\right\rangle _{L^2(\textrm{SO}(3))}\, F_n^{j,k}. \end{aligned}$$

(48)

We want to show that the right-hand side of (48) converges uniformly on $\textrm{SO}(3)\times [0,\pi ]$. Let $({{\varvec{Q}}},\omega ) \in \textrm{SO}(3)\times [0,\pi ]$ and $N\in \mathbb {N}$. Inserting $\lambda _n^{\mathcal {T}}$ from Theorem 4.2, we have

$$\begin{aligned}&\left| \sum _{n=0}^{\infty } \sum _{j,k=-n}^{n} \frac{1}{\lambda _n^{\mathcal {T}}}\, \left\langle f, D_n^{j,k}\right\rangle _{L^2(\textrm{SO}(3))}\, F_n^{j,k}({{\varvec{Q}}},\omega ) \right. \\&\quad \left. -\sum _{n=0}^{N-1} \sum _{j,k=-n}^{n} \frac{1}{\lambda _n^{\mathcal {T}}}\, \left\langle f, D_n^{j,k}\right\rangle _{L^2(\textrm{SO}(3))}\, F_n^{j,k}({{\varvec{Q}}},\omega ) \right| \\&\quad \le \frac{1}{2} \sum _{n=N}^{\infty } \sum _{j,k=-n}^{n} \left( n+\tfrac{1}{2}\right) \, \left| \left\langle f, D_n^{j,k}\right\rangle _{L^2(\textrm{SO}(3))}\right| \left| {\widetilde{D}}_n^{j,k}({{\varvec{Q}}})\right| \\&\quad \le \frac{1}{2} \sqrt{\sum _{n=N}^{\infty } \sum _{j,k=-n}^{n} \left( n+\tfrac{1}{2}\right) ^{2s}\, \left| \left\langle f, D_n^{j,k}\right\rangle _{L^2(\textrm{SO}(3))}\right| ^2} \\&\qquad \times \sqrt{\sum _{n=N}^{\infty } \left( n+\tfrac{1}{2}\right) ^{2-2s} \sum _{j,k=-n}^{n}\left| {\widetilde{D}}_n^{j,k}({{\varvec{Q}}})\right| ^2}, \end{aligned}$$

where we made use of the Cauchy–Schwarz inequality. In the last expression, the first factor is bounded by the Sobolev norm (47). For the second factor, the addition theorem [36, Thm. 2.14] yields

$$\begin{aligned}{} & {} \sum _{n=N}^{\infty } \left( n+\tfrac{1}{2}\right) ^{2-2s} \sum _{j,k=-n}^{n} \left| {\widetilde{D}}_n^{j,k}({{\varvec{Q}}})\right| ^2 \\{} & {} \quad = \sum _{n=N}^{\infty } \left( n+\tfrac{1}{2}\right) ^{2-2s} \frac{(2n+1)(n+1)}{8\pi ^2} <\infty \end{aligned}$$

since $s>2$. Hence, the right-hand side of (48) converges uniformly to a continuous function on $\textrm{SO}(3)\times [0,\pi ]$, which finally implies that g is continuous. $\square $

Relation of the Rotation Group with the 3-sphere

We show a relation of the sliced Wasserstein distance (35) with an analogue of ${{\,\textrm{PSW}\,}}$ on the sphere $\mathbb {S}^3$, making use of the representation of $\textrm{SO}(3)$ via unit quaternions, see [51, §2.6]. The algebra of quaternions $\mathbb {H}$ consists of vectors ${{\varvec{q}}}= (q_0,q_1,q_2,q_3) = (q_0,{{\varvec{q}}}')\in \mathbb {R}^4$, where ${{\varvec{q}}}'=(q_1,q_2,q_3)$ is called the vector part, with the standard addition and the multiplication

$$\begin{aligned} {{\varvec{q}}}\diamond {{\varvec{r}}}:=(r_0q_0 - {{\varvec{q}}}' \cdot {{\varvec{r}}}', q_0{{\varvec{r}}}' + r_0{{\varvec{q}}}' + {{\varvec{q}}}'\times {{\varvec{r}}}'), \end{aligned}$$

(49)

where $\cdot $ is the scalar product and $\times $ is the cross product in $\mathbb {R}^3$. The unit quaternions $\{{{\varvec{q}}}\in \mathbb {H}\mid q_0^2+q_1^2+q_2^2+q_3^2=1\}$ can be identified with $\mathbb {S}^3$. The inverse of ${{\varvec{q}}}\in \mathbb {S}^3$ with respect to $\diamond $ is ${\bar{{{\varvec{q}}}}} = (q_0,-q_1,-q_2,-q_3)$. The map

$$\begin{aligned} \phi :\mathbb {S}^3\rightarrow \textrm{SO}(3),\quad {{\varvec{q}}}\mapsto \textrm{R}_{{{{\varvec{q}}}'}/{\sqrt{1-q_0^2}}}(2\arccos (q_0)) \end{aligned}$$

is surjective and satisfies $\phi ^{-1}(\phi ({{\varvec{q}}}))=\{{{\varvec{q}}},-{{\varvec{q}}}\}$ for all ${{\varvec{q}}}\in \mathbb {S}^3$. It is a homomorphism in the sense that $\phi ({{\varvec{q}}}\diamond {{\varvec{r}}}) = \phi ({{\varvec{q}}})\phi ({{\varvec{r}}})$. By [36, (2.6)], the integral on $\textrm{SO}(3)$ is transformed via

$$\begin{aligned} \begin{aligned} \int _{\textrm{SO}(3)} f({{\varvec{Q}}}) \, \textrm{d}\sigma _{\textrm{SO}(3)}({{\varvec{Q}}})&= 4 \int _{-1}^{1} \int _{\mathbb {S}^2} f\circ \phi (q_0,{{\varvec{q}}}') \, \sqrt{1-q_0^2} \, \textrm{d}\sigma _{\mathbb {S}^2}\left( \frac{{{\varvec{q}}}'}{\sqrt{1-q_0^2}} \right) \, \textrm{d}q_0 \\&\overset{(9)}{=} 4 \int _{\mathbb {S}^3} f\circ \phi ({{\varvec{q}}}) \, \textrm{d}\sigma _{\mathbb {S}^3}({{\varvec{q}}}). \end{aligned} \end{aligned}$$

(50)
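The homomorphism property $\phi ({{\varvec{q}}}\diamond {{\varvec{r}}}) = \phi ({{\varvec{q}}})\phi ({{\varvec{r}}})$ and the two-to-one property $\phi ({{\varvec{q}}})=\phi (-{{\varvec{q}}})$ can be checked numerically. The following sketch implements the product (49) and the map $\phi $, assuming that $\textrm{R}_{\varvec{\xi }}(\omega )$ is the right-handed rotation about $\varvec{\xi }$ given by the Rodrigues formula. All function names are hypothetical helpers introduced for illustration.

```python
import math
import random


def quat_mul(q, r):
    """Quaternion product (49): (r0 q0 - q'.r', q0 r' + r0 q' + q' x r')."""
    q0, q1, q2, q3 = q
    r0, r1, r2, r3 = r
    return (
        r0 * q0 - (q1 * r1 + q2 * r2 + q3 * r3),
        q0 * r1 + r0 * q1 + (q2 * r3 - q3 * r2),
        q0 * r2 + r0 * q2 + (q3 * r1 - q1 * r3),
        q0 * r3 + r0 * q3 + (q1 * r2 - q2 * r1),
    )


def rotation_about_axis(axis, omega):
    """Rodrigues formula for the rotation about the unit vector `axis` by the angle omega."""
    x, y, z = axis
    c, s = math.cos(omega), math.sin(omega)
    t = 1.0 - c
    return [
        [c + t * x * x, t * x * y - s * z, t * x * z + s * y],
        [t * x * y + s * z, c + t * y * y, t * y * z - s * x],
        [t * x * z - s * y, t * y * z + s * x, c + t * z * z],
    ]


def phi(q):
    """Map a unit quaternion q to the rotation about q'/|q'| by the angle 2 arccos(q0)."""
    q0 = max(-1.0, min(1.0, q[0]))
    norm_vec = math.sqrt(sum(c * c for c in q[1:]))
    if norm_vec < 1e-12:  # q = (+-1, 0, 0, 0) is mapped to the identity
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    axis = [c / norm_vec for c in q[1:]]
    return rotation_about_axis(axis, 2.0 * math.acos(q0))


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def max_abs_diff(a, b):
    """Largest entrywise deviation between two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))


def random_unit_quaternion(rng):
    """Draw a point on S^3 by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)
```

For random unit quaternions, `max_abs_diff(phi(quat_mul(q, r)), matmul(phi(q), phi(r)))` should vanish up to rounding errors.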

We denote the set of even probability measures on $\mathbb {S}^3$ by

$$\begin{aligned} \mathcal {P}_{\textrm{even}}(\mathbb {S}^3) :=\{\mu \in \mathcal {P}(\mathbb {S}^3) \mid \mu (B) = \mu (-B) \,\forall B\in \mathcal {B}(\mathbb {S}^3)\}. \end{aligned}$$

Theorem C.1

Let $\mu ,\nu \in \mathcal {P}(\textrm{SO}(3))$ and $p\in [1,\infty )$. We define $ c:\mathbb {I}\rightarrow [0,\pi ], $ $ c(t) :=2\arccos \left| t\right| . $ Then

$$\begin{aligned} {{\,\textrm{SOSW}\,}}_p^p(\mu ,\nu ) = \int _{\mathbb {S}^3} {{\,\textrm{W}\,}}_p^p \left( c_\#\mathcal {U}_{{{\varvec{q}}}}(\mu \circ \phi ), c_\#\mathcal {U}_{{{\varvec{q}}}}(\nu \circ \phi )\right) \, \textrm{d}u_{\mathbb {S}^3}({{\varvec{q}}}). \end{aligned}$$

(51)

Proof

Let $\varvec{\xi }\in \mathbb {S}^3$, ${{\varvec{Q}}}\in \textrm{SO}(3)$ and ${{\varvec{q}}}\in \phi ^{-1}({{\varvec{Q}}})$. By the homomorphism property of $\phi $, we have

$$\begin{aligned} d_{{\varvec{Q}}}\circ \phi (\varvec{\xi }) = \angle ({{\varvec{Q}}}^\top \phi (\varvec{\xi })) = \angle (\phi ({\bar{{{\varvec{q}}}}} \diamond \varvec{\xi })). \end{aligned}$$

We note that the push-forward $\phi _\#$ is bijective from $\mathcal {P}_{\textrm{even}}(\mathbb {S}^3)$ to $\mathcal {P}(\textrm{SO}(3))$, and that its inverse is given by $\mu \mapsto \mu \circ \phi \in \mathcal {P}_{\textrm{even}}(\mathbb {S}^3)$. Since $\angle (\phi ({{\varvec{r}}})) = 2\arccos \left| r_0\right| $ for any ${{\varvec{r}}}\in \mathbb {S}^3$, we obtain by (49)

$$\begin{aligned} d_{{\varvec{Q}}}\circ \phi (\varvec{\xi })= & {} 2\arccos \left| q_0 \xi _0+{{\varvec{q}}}'\cdot \varvec{\xi }'\right| = 2\arccos \left| {{\varvec{q}}}\cdot \varvec{\xi }\right| \\= & {} c\circ \mathcal {S}_{{\varvec{q}}}(\varvec{\xi }). \end{aligned}$$

Since $\phi _\# (\mu \circ \phi ) = \mu $, we obtain

$$\begin{aligned} {{\,\textrm{SOSW}\,}}_p^p(\mu ,\nu )&\overset{(35)}{=} \frac{1}{8\pi ^2} \int _{\textrm{SO}(3)} {{\,\textrm{W}\,}}_p^p\left( (d_{{\varvec{Q}}}\circ \phi )_\# (\mu \circ \phi ),\right. \\&\quad \left. (d_{{\varvec{Q}}}\circ \phi )_\# (\nu \circ \phi ) \right) \, \textrm{d}\sigma _{\textrm{SO}(3)}({{\varvec{Q}}}) \\&\overset{(50)}{=} \frac{4}{8\pi ^2} \int _{\mathbb {S}^3} {{\,\textrm{W}\,}}_p^p\left( (c\circ \mathcal {S}_{{\varvec{q}}})_\# (\mu \circ \phi ),\right. \\&\quad \left. (c\circ \mathcal {S}_{{\varvec{q}}})_\# (\nu \circ \phi ) \right) \, \textrm{d}\sigma _{\mathbb {S}^3}({{\varvec{q}}}). \end{aligned}$$

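Since $u_{\mathbb {S}^3} = \sigma _{\mathbb {S}^3}/(2\pi ^2)$ is the normalized surface measure and $(c\circ \mathcal {S}_{{\varvec{q}}})_\# = c_\#\mathcal {U}_{{\varvec{q}}}$, this is (51). $\square $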
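The key identity of this proof, $\angle ({{\varvec{Q}}}^\top \phi (\varvec{\xi })) = 2\arccos \left| {{\varvec{q}}}\cdot \varvec{\xi }\right| $ for ${{\varvec{Q}}}=\phi ({{\varvec{q}}})$, can be verified numerically. The sketch below assumes the standard conversion of a unit quaternion to a rotation matrix as a realization of $\phi $ and computes the rotation angle from the trace; the function names are illustrative.

```python
import math
import random


def quaternion_to_matrix(q):
    """Standard rotation matrix of a unit quaternion q = (w, x, y, z)."""
    w, x, y, z = q
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]


def rotation_angle_between(P, Q):
    """Rotation angle of Q^T P, using trace(Q^T P) = sum_ij Q_ij P_ij."""
    trace = sum(Q[i][j] * P[i][j] for i in range(3) for j in range(3))
    return math.acos(max(-1.0, min(1.0, (trace - 1.0) / 2.0)))


def sliced_angle(q, xi):
    """c(S_q(xi)) = 2 arccos|<q, xi>| for unit quaternions q and xi."""
    dot = sum(a * b for a, b in zip(q, xi))
    return 2.0 * math.acos(min(1.0, abs(dot)))


def random_unit_quaternion(rng):
    """Draw a point on S^3 by normalizing a Gaussian vector."""
    v = [rng.gauss(0.0, 1.0) for _ in range(4)]
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)
```

Both sides agree up to rounding errors for random pairs $({{\varvec{q}}},\varvec{\xi })$, and the result does not change when ${{\varvec{q}}}$ or $\varvec{\xi }$ is replaced by its negative, which reflects the evenness of $\mu \circ \phi $.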

The right-hand side of (51) mimics the parallelly sliced Wasserstein distance (30) on $\mathbb {S}^3$ between $\mu \circ \phi $ and $\nu \circ \phi $, except for the additional transformation c. We want to point out that this equivalence in Theorem C.1 holds only for even measures on $\mathbb {S}^3$, since we always have $\mu \circ \phi \in \mathcal {P}_{\textrm{even}}(\mathbb {S}^3)$.

Sliced Wasserstein Distances for Antipodal Point Measures

We study sliced Wasserstein barycenters of two antipodal Dirac measures $\mu _1 = \delta _{{{\varvec{e}}}^3}$ and $\mu _2 = \delta _{-{{\varvec{e}}}^3}$ on the sphere $\mathbb {S}^2$, as presented in Remark 5.1. This case exhibits some differences between the Wasserstein distance, the parallelly sliced Wasserstein distance and the semicircular sliced Wasserstein distance. We define the equator $C :=\{\varvec{\xi }\in \mathbb {S}^2\mid \xi _3=0\}$.

Proposition D.1

The 2-Wasserstein barycenters of the two antipodal Dirac measures $\mu _1$ and $\mu _2$ are the probability measures $\nu \in \mathcal {P}(\mathbb {S}^2)$ whose support is included in the equator C.

Proof

The Wasserstein barycenters of $\mu _1$ and $\mu _2$ on the sphere are the measures in $\mathcal {P}(\mathbb {S}^2)$ minimizing $\mathcal {E}_{{{\,\textrm{W}\,}}}(\nu ) :=\frac{1}{2}{{\,\textrm{W}\,}}_2^2(\nu , \mu _1) + \frac{1}{2}{{\,\textrm{W}\,}}_2^2(\nu , \mu _2)$. Let $\nu \in \mathcal {P}(\mathbb {S}^2)$. With the map ${{\,\textrm{zen}\,}}:\mathbb {S}^2\rightarrow [0,\pi ]$, $\varvec{\xi }\mapsto \arccos (\xi _3)$, we have

$$\begin{aligned} {{\,\textrm{W}\,}}_2^2(\nu , \delta _{{{\varvec{e}}}^3}) = \int _{\mathbb {S}^2} d_{\mathbb {S}^2}({\varvec{\psi }}, {{\varvec{e}}}^3)^2 \, \textrm{d}\nu ({\varvec{\psi }}) = \int _0^\pi t^2 \, \textrm{d}({{\,\textrm{zen}\,}})_\#\nu (t) \end{aligned}$$

and similarly

$$\begin{aligned} {{\,\textrm{W}\,}}_2^2(\nu , \delta _{-{{\varvec{e}}}^3}) = \int _0^\pi (\pi - t)^2 \ \, \textrm{d}({{\,\textrm{zen}\,}})_\#\nu (t). \end{aligned}$$

Hence,

$$\begin{aligned} \mathcal {E}_{{{\,\textrm{W}\,}}}(\nu ) = \frac{1}{2}\int _0^\pi \big [t^2 + (\pi - t)^2\big ]\, \textrm{d}({{\,\textrm{zen}\,}})_\#\nu (t). \end{aligned}$$

The integrand $t^2 + (\pi - t)^2$ has the unique minimizer $t=\frac{\pi }{2}$, where it takes the value $\frac{\pi ^2}{2}$. Therefore, $\mathcal {E}_{{{\,\textrm{W}\,}}}(\nu )\ge \frac{\pi ^2}{4}$ for all $\nu \in \mathcal {P}(\mathbb {S}^2)$ with equality if and only if $({{\,\textrm{zen}\,}})_\#\nu = \delta _{\frac{\pi }{2}}$, i.e., if and only if $\nu (C) = 1$. $\square $
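
As a worked check of the minimization step in the proof of Proposition D.1, completing the square gives

$$\begin{aligned} \frac{1}{2}\big [t^2 + (\pi - t)^2\big ] = \left( t - \frac{\pi }{2}\right) ^2 + \frac{\pi ^2}{4}, \qquad t\in [0,\pi ], \end{aligned}$$

so that $\mathcal {E}_{{{\,\textrm{W}\,}}}(\nu ) = \frac{\pi ^2}{4} + \int _0^\pi \left( t - \frac{\pi }{2}\right) ^2 \, \textrm{d}({{\,\textrm{zen}\,}})_\#\nu (t)$. The remaining integral is nonnegative and vanishes exactly when $({{\,\textrm{zen}\,}})_\#\nu $ is concentrated at $t=\frac{\pi }{2}$.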

However, this observation does not apply to the parallelly sliced Wasserstein barycenters.

Proposition D.2

All probability measures on the sphere are parallelly sliced Wasserstein barycenters of $\mu _1$ and $\mu _2$.

Proof

Let $\nu \in \mathcal {P}(\mathbb {S}^2)$. We have

$$\begin{aligned} \mathcal {E}_V(\nu ) :={}&\frac{1}{2}{{\,\textrm{PSW}\,}}_2^2(\nu , \delta _{{{\varvec{e}}}^3}) + \frac{1}{2}{{\,\textrm{PSW}\,}}_2^2(\nu , \delta _{-{{\varvec{e}}}^3})\\ ={}&\frac{1}{2}\int _{\mathbb {S}^2}\Big [{{\,\textrm{W}\,}}_2^2\big (\mathcal {U}_{\varvec{\psi }}\nu , \mathcal {U}_{\varvec{\psi }}\delta _{{{\varvec{e}}}^3}\big ) + {{\,\textrm{W}\,}}_2^2\big (\mathcal {U}_{\varvec{\psi }}\nu , \mathcal {U}_{\varvec{\psi }}\delta _{-{{\varvec{e}}}^3}\big )\Big ]\\&\quad \, \textrm{d}u_{\mathbb {S}^2}({\varvec{\psi }}). \end{aligned}$$

Using that $\mathcal {U}_{\varvec{\psi }}\delta _{\pm {{\varvec{e}}}^3} = \delta _{\left\langle {\varvec{\psi }}, \pm {{\varvec{e}}}^3\right\rangle } = \delta _{\pm \psi _3}$ and $\mathcal {U}_{\varvec{\psi }}\nu = (\mathcal {S}_{\varvec{\psi }})_\#\nu $ for all ${\varvec{\psi }}\in \mathbb {S}^2$, we have

$$\begin{aligned} \mathcal {E}_V(\nu ) ={}&\frac{1}{2}\int _{\mathbb {S}^2} \left[ \int _{-1}^1\left| t - \left\langle {\varvec{\psi }}, {{\varvec{e}}}^3\right\rangle \right| ^2 \, \textrm{d}(\mathcal {S}_{{\varvec{\psi }}})_\#\nu (t)\right. \\&\quad \left. + \int _{-1}^1\left| t + \left\langle {\varvec{\psi }}, {{\varvec{e}}}^3\right\rangle \right| ^2 \, \textrm{d}(\mathcal {S}_{{\varvec{\psi }}})_\#\nu (t)\right] \, \textrm{d}u_{\mathbb {S}^2}({\varvec{\psi }})\\ ={}&\frac{1}{2}\int _{\mathbb {S}^2} \int _{-1}^1\Big [2t^2 + 2 \left\langle {\varvec{\psi }}, {{\varvec{e}}}^3\right\rangle ^2\Big ] \, \textrm{d}(\mathcal {S}_{{\varvec{\psi }}})_\#\nu (t)\, \textrm{d}u_{\mathbb {S}^2}({\varvec{\psi }}). \end{aligned}$$

Using (9) and the rotation invariance of the spherical integral, we have

$$\begin{aligned} \int _{\mathbb {S}^2}\left\langle {\varvec{\psi }}, \varvec{\xi }\right\rangle ^2\, \textrm{d}u_{\mathbb {S}^2}({\varvec{\psi }}) = \frac{1}{2}\int _{-1}^1 t^2\, \textrm{d}t = \frac{1}{3} \qquad \forall \varvec{\xi }\in \mathbb {S}^2. \end{aligned}$$

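Hence, writing $\int _{-1}^1 t^2 \, \textrm{d}(\mathcal {S}_{{\varvec{\psi }}})_\#\nu (t) = \int _{\mathbb {S}^2} \left\langle {\varvec{\psi }}, \varvec{\xi }\right\rangle ^2 \, \textrm{d}\nu (\varvec{\xi })$ and exchanging the order of integration, we obtain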

$$\begin{aligned} \mathcal {E}_V(\nu ) = \int _{\mathbb {S}^2} \int _{\mathbb {S}^2} \left[ \left\langle {\varvec{\psi }}, \varvec{\xi }\right\rangle ^2 + \left\langle {\varvec{\psi }}, {{\varvec{e}}}^3\right\rangle ^2 \right] \, \textrm{d}u_{\mathbb {S}^2}({\varvec{\psi }}) \, \textrm{d}\nu (\varvec{\xi }) = \frac{2}{3}. \end{aligned}$$

$\square $
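
The statement of Proposition D.2 can be illustrated numerically. The following is a minimal sketch, not the authors' implementation: it assumes a discrete measure $\nu$ with equal weights on finitely many points, uses the fact that the 2-Wasserstein distance to a Dirac measure on the line is a plain second moment, and replaces the integral over $\mathbb {S}^2$ by a Monte Carlo average. The function names `random_sphere_point` and `psw_barycenter_energy` are hypothetical.

```python
import math
import random


def random_sphere_point(rng):
    """Draw a point uniformly from S^2 by normalizing a Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 1e-12:
            return [c / norm for c in v]


def psw_barycenter_energy(support, n_dirs=20_000, seed=0):
    """Monte Carlo estimate of E_V(nu) for nu uniform on the points in `support`.

    For each direction psi, the sliced Diracs sit at +psi_3 and -psi_3, and
    W_2^2 between the sliced nu and a Dirac at a is the mean of (t_i - a)^2
    over the projections t_i = <psi, xi_i>.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_dirs):
        psi = random_sphere_point(rng)
        acc = 0.0
        for xi in support:
            t = sum(p * x for p, x in zip(psi, xi))
            acc += 0.5 * ((t - psi[2]) ** 2 + (t + psi[2]) ** 2)
        total += acc / len(support)
    return total / n_dirs


if __name__ == "__main__":
    pole = [[0.0, 0.0, 1.0]]
    scattered = [[1.0, 0.0, 0.0], [0.0, 0.6, 0.8], [0.0, 0.0, -1.0]]
    # Both estimates should be close to 2/3, up to Monte Carlo error.
    print(psw_barycenter_energy(pole), psw_barycenter_energy(scattered))
```

Up to sampling error, the estimate does not depend on where the support points lie, which is what the proposition asserts.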

Let us now consider the semicircular sliced Wasserstein distance. Its slicing operator is much harder to manipulate, and we did not manage to determine the semicircular sliced Wasserstein barycenters of $\mu _1$ and $\mu _2$. However, we can show that the observation made in Proposition D.2 does not hold for the ${{\,\textrm{SSW}\,}}$ distance.

Proposition D.3

The uniform probability measure on the sphere is not a semicircular sliced 2-Wasserstein barycenter of $\mu _1$ and $\mu _2$. In particular, for the uniform probability measure $\chi $ on the equator C and the uniform probability measure $u_{\mathbb {S}^2}$ on $\mathbb {S}^2$, we have

$$\begin{aligned}{} & {} \tfrac{1}{2}{{\,\textrm{SSW}\,}}_2^2(\mu _1, \chi ) + \tfrac{1}{2} {{\,\textrm{SSW}\,}}_2^2(\mu _2, \chi ) \\{} & {} \quad < \tfrac{1}{2}{{\,\textrm{SSW}\,}}_2^2(\mu _1, u_{\mathbb {S}^2}) + \tfrac{1}{2} {{\,\textrm{SSW}\,}}_2^2(\mu _2, u_{\mathbb {S}^2}). \end{aligned}$$

Proof

Recall the circle $\mathbb {T}=\mathbb {R}/ 2\pi \mathbb {Z}$. For any $x \in \mathbb {R}$, we denote by $[x] = x + 2\pi \mathbb {Z}$ the equivalence class of x, and for any $\gamma \in \mathbb {T}$ we denote by ${\tilde{\gamma }}$ its representative in $[0, 2\pi )$. Let $\mu \in \mathcal {P}(\mathbb {S}^2)$. The semicircular sliced Wasserstein barycenters are the minimizers of the functional

$$\begin{aligned} \mathcal {E}_{S}(\mu ) :=\frac{1}{2} {{\,\textrm{SSW}\,}}_2^2(\mu , \delta _{{{\varvec{e}}}^3}) + \frac{1}{2} {{\,\textrm{SSW}\,}}_2^2(\mu , \delta _{-{{\varvec{e}}}^3}), \end{aligned}$$

where

$$\begin{aligned} {{\,\textrm{SSW}\,}}_2^2(\mu , \delta _{{{\varvec{e}}}^3}) = \int _{\mathbb {S}^2} {{\,\textrm{W}\,}}_2^2(\mathcal {A}_{{\varvec{\psi }}\#}\mu , \mathcal {A}_{{\varvec{\psi }}\#}\delta _{{{\varvec{e}}}^3})\, \textrm{d}u_{\mathbb {S}^2}({\varvec{\psi }}) \end{aligned}$$

and the slicing operator $\mathcal {A}_{\varvec{\psi }}$ is given in Remark 3.8. Let ${\varvec{\psi }}= \Phi (\varphi , \theta ) \in \mathbb {S}^2$. Since we integrate over $\mathbb {S}^2$, we can assume ${\varvec{\psi }}\notin \{\pm {{\varvec{e}}}^3\}$. Then we have $(\mathcal {A}_{\varvec{\psi }})_\#\delta _{{{\varvec{e}}}^3} = \delta _{[\pi ]}$ and $(\mathcal {A}_{\varvec{\psi }})_\# \delta _{-{{\varvec{e}}}^3} = \delta _{[0]}$. Therefore,

$$\begin{aligned} {{\,\textrm{W}\,}}_2^2(\mathcal {A}_{{\varvec{\psi }}\#}\mu , \mathcal {A}_{{\varvec{\psi }}\#}\delta _{{{\varvec{e}}}^3})= & {} \int _\mathbb {T}|{\tilde{\gamma }} - \pi |^2 \, \textrm{d}\mathcal {A}_{{\varvec{\psi }}\#}\mu (\gamma ) \nonumber \\= & {} \int _{\mathbb {S}^2} \left| {\tilde{\mathcal {A}}}_{\varvec{\psi }}(\varvec{\xi }) - \pi \right| ^2 \, \textrm{d}\mu (\varvec{\xi }). \end{aligned}$$

(52)

Using spherical coordinates $\varvec{\xi }= {{\,\mathrm{\Phi }\,}}(\alpha ,\beta )$, we see that

$$\begin{aligned} \mathcal {A}_{{\varvec{\psi }}}(\varvec{\xi })= & {} {{\,\textrm{azi}\,}}\big ({{\,\mathrm{\Psi }\,}}(0,-\theta ,-\varphi ) {{\,\mathrm{\Phi }\,}}(\alpha ,\beta )\big ) \\= & {} {{\,\textrm{azi}\,}}\big ({{\,\mathrm{\Psi }\,}}(0,-\theta ,-\varphi +\alpha ) {{\,\mathrm{\Phi }\,}}(0,\beta )\big ) \\= & {} \mathcal {A}_{{{\,\mathrm{\Phi }\,}}(\varphi -\alpha ,\theta )} ({{\,\mathrm{\Phi }\,}}(0,\beta )). \end{aligned}$$

Hence, with the substitution $\varphi \mapsto \varphi -\alpha $, we have

$$\begin{aligned} {{\,\textrm{SSW}\,}}_2^2(\mu ,\delta _{{{\varvec{e}}}^3})&= \frac{1}{4\pi } \int _{\mathbb {S}^2} \int _{0}^{\pi } \int _{0}^{2\pi }\left| {\tilde{\mathcal {A}}}_{{{\,\mathrm{\Phi }\,}}(\varphi -\alpha ,\theta )} ({{\,\mathrm{\Phi }\,}}(0,\beta )) - \pi \right| ^2 \\&\quad \sin (\theta ) \, \textrm{d}\varphi \, \textrm{d}\theta \, \textrm{d}\mu ({{\,\mathrm{\Phi }\,}}(\alpha ,\beta ))\\&= \frac{1}{4\pi } \int _{\mathbb {S}^2} \int _{0}^{\pi } \int _{0}^{2\pi } \left| {\tilde{\mathcal {A}}}_{{{\,\mathrm{\Phi }\,}}(\varphi ,\theta )} ({{\,\mathrm{\Phi }\,}}(0,\beta )) - \pi \right| ^2 \\&\quad \sin (\theta ) \, \textrm{d}\varphi \, \textrm{d}\theta \, \textrm{d}\mu ({{\,\mathrm{\Phi }\,}}(\alpha ,\beta ))\\&= \frac{1}{4\pi } \int _{\mathbb {S}^2} \int _{0}^{\pi } \int _{0}^{2\pi } \left| {\tilde{\mathcal {A}}}_{{{\,\mathrm{\Phi }\,}}(\varphi ,\theta )} ({{\,\mathrm{\Phi }\,}}(0,\beta )) - \pi \right| ^2\\&\quad \sin (\theta ) \, \textrm{d}\varphi \, \textrm{d}\theta \, \textrm{d}{{\,\textrm{zen}\,}}_\#\mu (\beta ). \end{aligned}$$

Introducing the functions

$$\begin{aligned}{} & {} F_1:[0, \pi ]\rightarrow \mathbb {R}, \\{} & {} \beta \mapsto \frac{1}{4\pi } \int _0^{2\pi }\int _0^\pi \left| {\tilde{\mathcal {A}}}_{\Phi (-\varphi , \theta )}(\Phi (0, \beta )) - \pi \right| ^2 \sin (\theta ) \, \textrm{d}\theta \, \textrm{d}\varphi , \end{aligned}$$

and

$$\begin{aligned} F :[0, \pi ]\rightarrow \mathbb {R}, \quad \beta \mapsto \tfrac{1}{2} F_1(\beta ) + \tfrac{1}{2} F_1(\pi -\beta ), \end{aligned}$$

we obtain by using the symmetry that

$$\begin{aligned} \mathcal {E}_S(\mu )= & {} \tfrac{1}{2} {{\,\textrm{SSW}\,}}_2^2(\mu , \delta _{{{\varvec{e}}}^3}) + \tfrac{1}{2} {{\,\textrm{SSW}\,}}_2^2(\mu , \delta _{-{{\varvec{e}}}^3})\\= & {} \int _0^\pi F(\beta ) \, \textrm{d}{{\,\textrm{zen}\,}}_\#\mu (\beta ). \end{aligned}$$

Let $\beta \in [0,\pi ]$. We study $f_\beta (\varphi , \theta ) :={\tilde{\mathcal {A}}}_{\Phi (-\varphi , \theta )}(\Phi (0, \beta ))$ and first identify some symmetries. For $\varphi \in (0, 2\pi ) {\setminus } \{\pi \}$ and $\theta \in (0, \pi )$, we have

$$\begin{aligned}{} & {} f_\beta (2\pi - \varphi , \theta ) = 2\pi - f_\beta (\varphi , \theta ) \quad \text { or }\\{} & {} f_\beta (2\pi - \varphi , \theta ) = f_\beta (\varphi , \theta ). \end{aligned}$$

In both cases, $(f_\beta (2\pi - \varphi , \theta ) - \pi )^2 = (f_\beta (\varphi , \theta ) - \pi )^2$. Moreover, for $\varphi , \theta \in (0, \pi )$,

$$\begin{aligned}{} & {} f_\beta (\varphi , \pi - \theta ) = f_\beta (\pi - \varphi , \theta )\quad \text { and }\nonumber \\{} & {} f_{\pi - \beta }(\varphi , \theta ) = \pi - f_\beta (\pi - \varphi , \theta ). \end{aligned}$$

(53)

Hence, we have

$$\begin{aligned} F_1(\beta )&= \frac{1}{\pi } \int _0^\pi \int _0^{\frac{\pi }{2}} (f_\beta (\varphi , \theta ) - \pi )^2 \sin (\theta ) \, \textrm{d}\theta \, \textrm{d}\varphi \quad \text {and}\\ F_1(\pi -\beta )&= \frac{1}{\pi }\int _0^\pi \int _0^{\frac{\pi }{2}} f_\beta (\varphi , \theta )^2 \sin (\theta ) \, \textrm{d}\theta \, \textrm{d}\varphi . \end{aligned}$$

Eventually, F is given by

$$\begin{aligned} F(\beta ) = \frac{1}{\pi }\int _0^\pi \int _0^{\frac{\pi }{2}} \left( f_\beta (\varphi , \theta ) - \frac{\pi }{2}\right) ^2 \sin (\theta ) \, \textrm{d}\theta \, \textrm{d}\varphi \ + \frac{\pi ^2}{4}, \end{aligned}$$

where, for $\beta , \varphi , \theta \in (0, \pi )$, we have

$$\begin{aligned} f_\beta (\varphi , \theta ) = \frac{\pi }{2} - \arctan \left( \cos (\theta )\cot (\varphi ) - \frac{\sin (\theta )}{\sin (\varphi )}\cot (\beta )\right) . \end{aligned}$$
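
The two identities in (53) can be checked numerically against this closed form. The sketch below is only a sanity check under the assumption that the closed form holds for all $\beta , \varphi , \theta \in (0, \pi )$ as stated. It evaluates both identities on an interior grid and reports the largest deviation. The names `f_beta` and `max_symmetry_defect` are hypothetical.

```python
import math


def f_beta(beta, phi, theta):
    """Closed form of f_beta(phi, theta) for beta, phi, theta in (0, pi)."""
    return math.pi / 2 - math.atan(
        math.cos(theta) / math.tan(phi)
        - math.sin(theta) / math.sin(phi) / math.tan(beta)
    )


def max_symmetry_defect(n=24):
    """Largest violation of the two identities in (53) on an interior grid.

    The grid uses midpoints of n equal subintervals of (0, pi); with n even,
    none of the nodes equals 0, pi/2 or pi.
    """
    grid = [math.pi * (k + 0.5) / n for k in range(n)]
    worst = 0.0
    for beta in grid:
        for phi in grid:
            for theta in grid:
                # First identity: f_beta(phi, pi - theta) = f_beta(pi - phi, theta)
                d1 = f_beta(beta, phi, math.pi - theta) - f_beta(beta, math.pi - phi, theta)
                # Second identity: f_{pi - beta}(phi, theta) = pi - f_beta(pi - phi, theta)
                d2 = f_beta(math.pi - beta, phi, theta) - (
                    math.pi - f_beta(beta, math.pi - phi, theta)
                )
                worst = max(worst, abs(d1), abs(d2))
    return worst


if __name__ == "__main__":
    # The defect should be at the level of floating-point rounding.
    print(max_symmetry_defect())
```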

Let us now focus on the two particular cases of the proposition. Let $\chi $ be a measure supported on the equator. By the symmetry (53), we have

$$\begin{aligned} \mathcal {E}_S(\chi ) = F\left( \frac{\pi }{2}\right)&= \frac{1}{\pi }\int _0^\pi \int _0^{\frac{\pi }{2}} \left( f_{\frac{\pi }{2}}(\varphi , \theta ) - \frac{\pi }{2}\right) ^2 \\&\quad \sin (\theta ) \, \textrm{d}\theta \, \textrm{d}\varphi \ + \frac{\pi ^2}{4}\\&= \frac{2}{\pi }\int _0^{\frac{\pi }{2}} \int _0^{\frac{\pi }{2}} \left( \arctan \left( \frac{\tan (\varphi )}{\cos (\theta )}\right) - \frac{\pi }{2}\right) ^2 \\&\quad \sin (\theta ) \, \textrm{d}\theta \, \textrm{d}\varphi \ + \frac{\pi ^2}{4}. \end{aligned}$$

Since $\arctan \left( \frac{\tan (\varphi )}{\cos (\theta )}\right) > \varphi $ for any $\varphi , \theta \in (0, \frac{\pi }{2})$, and $t \mapsto \left( t-\frac{\pi }{2}\right) ^2$ is strictly decreasing on $[0, \frac{\pi }{2}]$, we obtain

$$\begin{aligned} \mathcal {E}_S(\chi )< & {} \frac{2}{\pi }\int _0^{\frac{\pi }{2}} \int _0^{\frac{\pi }{2}} \left( \varphi - \frac{\pi }{2}\right) ^2 \sin (\theta ) \, \textrm{d}\theta \, \textrm{d}\varphi \\{} & {} + \frac{\pi ^2}{4} = \frac{\pi ^2}{12} + \frac{\pi ^2}{4} = \frac{\pi ^2}{3}. \end{aligned}$$

For the uniform measure $u_{\mathbb {S}^2}$ on the sphere, we have $\mathcal {A}_{{\varvec{\psi }}\#}u_{\mathbb {S}^2} = u_\mathbb {T}$ for any ${\varvec{\psi }}\in \mathbb {S}^2$. From the first equality of (52), we obtain

$$\begin{aligned} {{\,\textrm{SSW}\,}}_2^2(u_{\mathbb {S}^2}, \delta _{{{\varvec{e}}}^3})&= \frac{1}{2\pi } \int _{\mathbb {S}^2} \int _0^{2\pi } |t - \pi |^2 \, \textrm{d}t \, \textrm{d}u_{\mathbb {S}^2}({\varvec{\psi }}) = \frac{\pi ^2}{3}. \end{aligned}$$

By symmetry, the same value is obtained for $\delta _{-{{\varvec{e}}}^3}$, so $\mathcal {E}_S(u_{\mathbb {S}^2}) = \frac{\pi ^2}{3}$, and therefore, $\mathcal {E}_S(u_{\mathbb {S}^2}) > \mathcal {E}_S(\chi )$. $\square $
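
The strict inequality of Proposition D.3 can also be observed numerically. The following sketch evaluates the double integral for $\mathcal {E}_S(\chi ) = F\left( \frac{\pi }{2}\right) $ and the one-dimensional integral for the uniform measure with a simple midpoint rule. The quadrature rule, the grid size and the function names `energy_equator` and `energy_uniform` are choices made for this illustration and do not come from the paper.

```python
import math


def energy_equator(n=400):
    """Midpoint-rule value of E_S(chi) = F(pi/2).

    Approximates (2/pi) * int_0^{pi/2} int_0^{pi/2}
    (arctan(tan(phi)/cos(theta)) - pi/2)^2 sin(theta) dtheta dphi + pi^2/4.
    """
    h = (math.pi / 2) / n
    nodes = [(k + 0.5) * h for k in range(n)]  # interior nodes only
    total = 0.0
    for phi in nodes:
        for theta in nodes:
            a = math.atan(math.tan(phi) / math.cos(theta))
            total += (a - math.pi / 2) ** 2 * math.sin(theta)
    return 2 / math.pi * total * h * h + math.pi ** 2 / 4


def energy_uniform(n=400):
    """Midpoint-rule value of (1/(2 pi)) * int_0^{2 pi} (t - pi)^2 dt."""
    h = 2 * math.pi / n
    return sum(((k + 0.5) * h - math.pi) ** 2 for k in range(n)) * h / (2 * math.pi)


if __name__ == "__main__":
    # The first value should lie strictly between pi^2/4 and the second value,
    # and the second value should be close to pi^2/3.
    print(energy_equator(), energy_uniform(), math.pi ** 2 / 3)
```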

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Quellmalz, M., Buecher, L. & Steidl, G. Parallelly Sliced Optimal Transport on Spheres and on the Rotation Group. J Math Imaging Vis (2024). https://doi.org/10.1007/s10851-024-01206-w

Received: 06 February 2024
Accepted: 03 July 2024
Published: 26 July 2024
DOI: https://doi.org/10.1007/s10851-024-01206-w
