1 Introduction and basic notation

In modern mathematical language, the p-th power of the Wasserstein distance between two probability measures over \({\mathbb {R}}^d\), namely \(\mu \) and \(\nu \), is defined as

$$\begin{aligned} W_{p}^p(\mu ,\nu )=\min _{\pi \in \Pi (\mu ,\nu )}\int _{{\mathbb {R}}^d\times {\mathbb {R}}^d}l_p^p(x,y)d\pi , \end{aligned}$$
(1)

where \(l_p^p(x,y)=\sum _{i=1}^d \mid x_i-y_i\mid ^p\) and \(\Pi (\mu ,\nu )\) is the set of measures over \({\mathbb {R}}^{d}\times {\mathbb {R}}^d\) whose marginals are \(\mu \) and \(\nu \), [1].

Due to its ability to capture the weak topology of the space of probability measures, the family of \(W_p^p\) distances has found a natural home in many applied fields, such as Computer Vision [2, 3], generative models [4,5,6], and clustering [7, 8]. For this reason, much effort has been spent on finding cheap ways to compute the value of \(W_p^p\) given two measures. When \(\mu \) and \(\nu \) are discrete measures, the minimization problem (1) can be cast as a Linear Programming (LP) problem. Due to the separability of the \(l_p^p\) cost functions, it is possible to lower the complexity of these LP problems [9, 10]; however, for many applied tasks, this is still not enough to make \(W_p^p\) an efficient alternative to other metrics. Therefore, several cheap-to-compute alternatives to the Wasserstein distance have been proposed: some approaches rely on adding an entropy regularization term [11,12,13] to the objective of (1), while other approaches consider topologically equivalent alternatives, such as the Fourier-based metrics [14, 15]. Another successful alternative is the Sliced Wasserstein Distance (SWD) [16, 17]. The SWD computes the distance between two measures by comparing their projections onto all possible 1-dimensional subspaces of \({\mathbb {R}}^d\). Since the Wasserstein distance between measures supported over a line can be computed through an explicit formula, the SWD can be computed without solving a minimization problem.
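As a purely illustrative aside (not part of the paper's argument), the following Python sketch shows the expensive baseline that the alternatives above try to avoid: for two discrete uniform measures, problem (1) reduces to an assignment problem and can be solved exactly. The function name wasserstein_pp and the sample data are ours.

```python
# Illustrative sketch: exact W_p^p between two equal-size uniform point clouds.
# For uniform weights the optimal plan in (1) is a permutation, so a linear
# assignment solver returns the exact value.
import numpy as np
from scipy.optimize import linear_sum_assignment


def wasserstein_pp(x, y, p=2):
    """W_p^p between the uniform measures on the rows of x and y (same size),
    with the separable ground cost l_p^p(x, y) = sum_i |x_i - y_i|^p of (1)."""
    cost = np.sum(np.abs(x[:, None, :] - y[None, :, :]) ** p, axis=-1)
    rows, cols = linear_sum_assignment(cost)
    return cost[rows, cols].mean()


rng = np.random.default_rng(0)
mu_pts = rng.normal(size=(200, 2))
nu_pts = rng.normal(loc=1.0, size=(200, 2))
print("W_2^2 =", wasserstein_pp(mu_pts, nu_pts, p=2))
```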

In this note, we propose a general methodology to relate the original Wasserstein distances to the SWDs. First, we show that, when both the probability measures are supported over \({\mathbb {R}}^2\), the \(W_2^2\) distance can be represented as follows

$$\begin{aligned} W_2^2(\mu ,\nu )=\dfrac{1}{\pi }\int _0^{2\pi }\bigg (\int _{\mathbb {R}}W_2^2(\mu _{\mid x^{(\theta )}_2},\zeta ^{(\theta )}_{\mid x^{(\theta )}_2})d\mu _2^{(\theta )}\bigg ) d\theta , \end{aligned}$$

where \(\{\zeta ^{(\theta )}\}_{\theta \in [0,2\pi ]}\) is a suitable family of measures on \({\mathbb {R}}^2\) (see Theorem 2). We call this identity the Radiant formula and use it to find equivalence bounds between the classic Wasserstein distances and their sliced counterparts. We then extend these results to the case \(p\ne 2\) and use the Knothe-Rosenblatt heuristic transportation plan [18, 19] to provide an upper bound on the absolute error between the SWD and \(W^p_p\). Finally, we extend our results to any \({\mathbb {R}}^d\) with \(d\ge 2\).

2 Our contribution

For the sake of clarity, we first introduce our results for measures supported over \({\mathbb {R}}^2\) and then extend our findings to the higher dimensional setting in a dedicated subsection. In what follows, we denote with \({\mathcal {P}}({\mathbb {R}}^d)\) the set of Borel probability measures over \({\mathbb {R}}^d\).

The Radiant formula

As a starting point of our discussion, we show that any \(W_p^p\) distance can be computed by summing the averages of one-dimensional Wasserstein distances between the conditionals of \(\mu \) and those of two suitable probability measures.

Proposition 1

Let \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^2)\) and \(p\ge 1\). Then, there exists a pair of measures \((\zeta ,\eta )\) such that \(\zeta \in \Pi (\nu _1,\mu _2)\), \(\eta \in \Pi (\mu _1,\nu _2)\), and

$$\begin{aligned} W^p_{p}(\mu ,\nu )=\int _{\mathbb {R}}W^p_p(\mu _{\mid x_1},\eta _{\mid x_1})d\mu _1+\int _{\mathbb {R}}W^p_p(\mu _{\mid x_2},\zeta _{\mid x_2})d\mu _2, \end{aligned}$$
(2)

where \(\lambda _i\) is the marginal of \(\lambda \) on the i-th coordinate and \(\lambda _{\mid x_i}\) is the conditional law of \(\lambda \) given \(x_i\).

Proof

Let \(\gamma \in {\mathcal {P}}({\mathbb {R}}^2\times {\mathbb {R}}^2)\) be an optimal transportation plan between \(\mu \) and \(\nu \). We then have that

$$\begin{aligned} W_p^p(\mu ,\nu )&=\int _{{\mathbb {R}}^2\times {\mathbb {R}}^2}\Big (\mid x_1-y_1 \mid ^p+\mid x_2-y_2 \mid ^p \Big )d\gamma \\&=\int _{{\mathbb {R}}^2\times {\mathbb {R}}}\mid x_1-y_1 \mid ^p df + \int _{{\mathbb {R}}\times {\mathbb {R}}^2}\mid x_2-y_2 \mid ^p dg, \end{aligned}$$

where f is the marginal of \(\gamma \) on \((x_1,x_2,y_1)\) and g is the marginal of \(\gamma \) over \((x_1,x_2,y_2)\). Finally, let \(\eta \) and \(\zeta \) be the marginals of \(\gamma \) over \((x_1,y_2)\) and \((y_1,x_2)\), respectively. Since \(\gamma \) is optimal, from [20], we have that

$$\begin{aligned} \int _{{\mathbb {R}}^2\times {\mathbb {R}}}\mid x_1-y_1 \mid ^p df=\int _{{\mathbb {R}}}W_{p}^p(\mu _{\mid x_2},\zeta _{\mid x_2})d\mu _2. \end{aligned}$$
(3)

Similarly, we have that \(\int _{{\mathbb {R}}^2\times {\mathbb {R}}}\mid x_2-y_2 \mid ^p dg=\int _{{\mathbb {R}}}W_{p}^p(\mu _{\mid x_1},\eta _{\mid x_1})d\mu _1\), which concludes the proof. \(\square \)

Let us set \(V_\theta =\{v_\theta ,v_{\theta ^\perp }\}\), where \(v_{\theta }=(\cos (\theta ),\sin (\theta ))\) and \(v_{\theta ^\perp }=(-\sin (\theta ),\cos (\theta ))\). We notice that, for every \(\theta \in [0,2\pi ]\), \(V_\theta \) is the basis of \({\mathbb {R}}^2\) obtained by applying a \(\theta \)-counterclockwise rotation to the canonical basis \(V=\{e_1,e_2\}\). In what follows, we denote by \((x^{(\theta )}_1,x^{(\theta )}_2)\) the coordinates of \({\mathbb {R}}^2\) with respect to the basis \(V_\theta \). Moreover, we denote by \(\mu _1^{(\theta )}\) and \(\mu _2^{(\theta )}\) the marginals of \(\mu \) on \(x_1^{(\theta )}\) and \(x_2^{(\theta )}\), respectively. In this framework, given \(p\in [1,\infty )\), the Sliced Wasserstein Distance is defined as follows

$$\begin{aligned} SW_p^p(\mu ,\nu )=\dfrac{1}{2\pi }\int _{0}^{2\pi }W_p^p(\mu ^{(\theta )}_1,\nu ^{(\theta )}_1)d\theta . \end{aligned}$$
(4)

Finally, we denote by \(R_\theta :{\mathbb {R}}^2\rightarrow {\mathbb {R}}^2\) the rotation that satisfies \(R_\theta (e_1)=v_\theta \) and \(R_\theta (e_2)=v_{\theta ^\perp }\), and by \(\mu ^{(\theta )}:=(R_\theta )_\#\mu \) the push-forward of \(\mu \) through \(R_\theta \). Notice that, according to our notation, the marginal of \(\mu ^{(\theta )}\) over the first coordinate coincides with \(\mu ^{(\theta )}_1\).
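Before moving on, we note that, for empirical measures, each \(W_p^p(\mu _1^{(\theta )},\nu _1^{(\theta )})\) in (4) is a one-dimensional transport problem solved by sorting. The following sketch (ours, with a uniform grid of angles standing in for the integral over \(\theta \)) is meant only to make the notation concrete.

```python
# Illustrative sketch of (4): SW_p^p in the plane for equal-size uniform
# samples, discretizing the integral over theta with a uniform grid and using
# the sorted (monotone) coupling, which is optimal in one dimension.
import numpy as np


def w_pp_1d(a, b, p=2):
    # One-dimensional W_p^p between equal-size uniform samples.
    return np.mean(np.abs(np.sort(a) - np.sort(b)) ** p)


def sliced_w_pp(x, y, p=2, n_angles=360):
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    vals = []
    for t in thetas:
        v = np.array([np.cos(t), np.sin(t)])      # direction v_theta
        vals.append(w_pp_1d(x @ v, y @ v, p))     # W_p^p(mu_1^(theta), nu_1^(theta))
    return np.mean(vals)                          # average over theta, as in (4)


rng = np.random.default_rng(1)
x = rng.normal(size=(300, 2))
y = rng.normal(loc=(1.0, -0.5), size=(300, 2))
print("SW_2^2 ≈", sliced_w_pp(x, y, p=2))
```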

Theorem 2

(The Radiant Formula) Let \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^2)\). Then there exists a family of measures \(\{\zeta ^{(\theta )}\}_{\theta \in [0,2\pi ]}\) such that, for every \(\theta \in [0,2\pi ]\), it holds \(\zeta ^{(\theta )}\in \Pi (\mu _2^{(\theta )},\nu _1^{(\theta )})\) and

$$\begin{aligned} W_2^2(\mu ,\nu )=\dfrac{1}{\pi }\int _0^{2\pi }\bigg (\int _{\mathbb {R}}W_2^2(\mu _{\mid x^{(\theta )}_2},\zeta ^{(\theta )}_{\mid x^{(\theta )}_2})d\mu _2^{(\theta )}\bigg ) d\theta , \end{aligned}$$
(5)

where \(\zeta ^{(\theta )}_{\mid x^{(\theta )}_2}\) is the conditional law of \(\zeta ^{(\theta )}\) given \(x^{(\theta )}_2\). Moreover, if both \(\mu \) and \(\nu \) are absolutely continuous, the family \(\{\zeta ^{(\theta )}\}_{\theta \in [0,2\pi ]}\) is unique.

Proof

First, we notice that the \(W_2^2\) distance between two measures \(\mu \) and \(\nu \) is preserved if we apply a rotation to \({\mathbb {R}}^2\). Indeed, if \(\gamma \) is an optimal transportation plan between \(\mu \) and \(\nu \), the plan \((R_\theta ,R_\theta )_{\#}\gamma \) is optimal between \((R_\theta )_{\#}\mu \) and \((R_\theta )_{\#}\nu \). This is due to the fact that \(l_2^2(x,y)=l_2^2(R_\theta (x),R_\theta (y))\) for every \(x,y\in {\mathbb {R}}^2\) and for every \(\theta \in [0,2\pi ]\).

Given \(\theta \in [0,2\pi ]\), Proposition 1 provides a pair of measures, namely \(\eta ^{(\theta )}\) and \(\zeta ^{(\theta )}\), such that

$$\begin{aligned} W^2_{2}(\mu ,\nu )=\int _{\mathbb {R}}W^2_2(\mu _{\mid x^{(\theta )}_1},\eta ^{(\theta )}_{\mid x^{(\theta )}_1})d\mu ^{(\theta )}_1+\int _{\mathbb {R}}W^2_2(\mu _{\mid x^{(\theta )}_2},\zeta ^{(\theta )}_{\mid x^{(\theta )}_2})d\mu ^{(\theta )}_2. \end{aligned}$$
(6)

Since a \(\frac{\pi }{2}\)-counterclockwise rotation swaps the basis vectors \(v_\theta \) and \(v_{\theta ^\perp }\) of any basis \(V_\theta \), we have that \(\zeta ^{(\theta +\frac{\pi }{2})}=\eta ^{(\theta )}\). Thus, we have

$$\begin{aligned} \int _{\mathbb {R}}W^2_2(\mu _{\mid x^{(\theta )}_1},\eta ^{(\theta )}_{\mid x^{(\theta )}_1})d\mu ^{(\theta )}_1 =\int _{\mathbb {R}}W^2_2(\mu _{\mid x_2^{(\theta +\frac{\pi }{2})}},\zeta _{\mid x_2^{(\theta +\frac{\pi }{2})}}^{(\theta +\frac{\pi }{2})})d\mu _2^{(\theta +\frac{\pi }{2})}, \end{aligned}$$
(7)

for each \(\theta \in [0,2\pi ]\), where the angle \(\theta +\frac{\pi }{2}\) is understood modulo \(2\pi \). By substituting (7) into (6), we find

$$\begin{aligned} W^2_2(\mu ,\nu )&=\int _{\mathbb {R}}W^2_2(\mu _{\mid x^{(\theta )}_2},\zeta _{\mid x^{(\theta )}_2}^{(\theta )})d\mu _2^{(\theta )} + \int _{\mathbb {R}}W^2_2(\mu _{\mid x_2^{(\theta +\frac{\pi }{2})}},\zeta _{\mid x_2^{(\theta +\frac{\pi }{2})}}^{(\theta +\frac{\pi }{2})})d\mu _2^{(\theta +\frac{\pi }{2})}. \end{aligned}$$
(8)

Since (8) holds true for every \(\theta \in [0,2\pi ]\), we can average over \(\theta \) and retrieve the Radiant formula

$$\begin{aligned} W^2_2&(\mu ,\nu )=\dfrac{1}{2\pi }\int _{[0,2\pi ]}W^2_2(\mu ,\nu )d\theta \\&=\dfrac{1}{2\pi }\int _{[0,2\pi ]}\Big (\int _{\mathbb {R}}W^2_2(\mu _{\mid x^{(\theta )}_2},\zeta _{\mid x^{(\theta )}_2}^{(\theta )})d\mu _2^{(\theta )} + \int _{\mathbb {R}}W^2_2(\mu _{\mid x^{(\theta +\frac{\pi }{2})}_2},\zeta _{\mid x^{(\theta +\frac{\pi }{2})}_2}^{(\theta +\frac{\pi }{2})})d\mu _2^{(\theta +\frac{\pi }{2})}\Big ) d\theta \\&= \dfrac{1}{\pi }\int _{[0,2\pi ]} \int _{\mathbb {R}}W^2_2(\mu _{\mid x_2^{(\theta )}},\zeta _{\mid x^{(\theta )}_2}^{(\theta )})d\mu _2^{(\theta )} d\theta . \end{aligned}$$

Finally, the uniqueness result follows from the uniqueness of the transportation plan between absolutely continuous probability measures, [21]. \(\square \)

Fig. 1

An example of the family \(\{\zeta ^{(\theta )}\}_{\theta \in [0,2\pi ]}\) for two Dirac deltas (we represent \(\mu \) with the red dot and \(\nu \) with the blue dot). The white dots represent a different \(\zeta ^{(\theta )}\) for different values of \(\theta \). The arrows represent the different flows \(f^{(\theta )}\) and \(g^{(\theta )}\). Arrows with the same colour connect \(\mu \) to \(\zeta ^{(\theta )}\) and \(\zeta ^{(\theta +\frac{\pi }{2})}\) (color figure online)

In Fig. 1, we give a visual example of the family \(\{\zeta ^{(\theta )}\}_{\theta \in [0,2\pi ]}\) for two Dirac deltas.

To prove Theorem 2, we made use of the rotation invariance of \(W_2^2\). This property, however, does not hold for \(W_p^p\) when \(p\ne 2\), which prevents us from expressing \(W_p^p\) through a radiant formula. We bypass this issue by defining a rotation-averaged version of the \(W_p\) distance as follows

$$\begin{aligned} RW_p^p(\mu ,\nu ):=\frac{1}{2\pi }\int _{0}^{2\pi }W_p^p(\mu ^{(\theta )},\nu ^{(\theta )})d\theta . \end{aligned}$$
(9)

We notice that \(RW_2(\mu ,\nu )=W_2(\mu ,\nu )\) for every \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^2)\).
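For discrete measures this identity can be checked numerically. The sketch below is ours and purely illustrative: it approximates the integral in (9) with a grid of angles and solves each rotated problem with an assignment solver; it also shows that \(RW_p^p\) and \(W_p^p\) differ in general when \(p\ne 2\).

```python
# Illustrative sketch of (9): RW_p^p estimated by averaging W_p^p over a grid
# of rotation angles.  For p = 2 each rotated value equals W_2^2 (rotation
# invariance of the squared Euclidean cost); for p != 2 the two quantities
# generally differ.
import numpy as np
from scipy.optimize import linear_sum_assignment


def w_pp(x, y, p):
    cost = np.sum(np.abs(x[:, None, :] - y[None, :, :]) ** p, axis=-1)
    r, c = linear_sum_assignment(cost)
    return cost[r, c].mean()


def rw_pp(x, y, p, n_angles=36):
    vals = []
    for t in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
        R = np.array([[np.cos(t), -np.sin(t)],
                      [np.sin(t),  np.cos(t)]])
        vals.append(w_pp(x @ R.T, y @ R.T, p))    # W_p^p(mu^(theta), nu^(theta))
    return np.mean(vals)


rng = np.random.default_rng(2)
x, y = rng.normal(size=(100, 2)), rng.normal(loc=1.0, size=(100, 2))
print("p=2:", rw_pp(x, y, 2), "vs", w_pp(x, y, 2))   # equal up to round-off
print("p=3:", rw_pp(x, y, 3), "vs", w_pp(x, y, 3))   # different in general
```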

Proposition 3

The function \(RW_p\) defined in (9) is a distance over \({\mathcal {P}}({\mathbb {R}}^2)\), and it is invariant under rotation of the coordinates. Furthermore, it holds

$$\begin{aligned} RW_p^p(\mu ,\nu )\le \frac{K_p}{2\pi } W^p_{1,p}(\mu ,\nu ) \le n^{(\frac{p}{2}-1)_+}\frac{K_p}{2\pi } W^p_{p}(\mu ,\nu ), \end{aligned}$$
(10)

where \((\circ )_+\) is the positive part function, \(K_p:=2\int _0^{2\pi }\mid \cos (x)\mid ^p dx\), \(n\) denotes the dimension of the ambient space (here \(n=2\)), and

$$\begin{aligned} W^p_{1,p}(\mu ,\nu ):=\min _{\pi \in \Pi (\mu ,\nu )}\int _{{\mathbb {R}}^d\times {\mathbb {R}}^d}\mid \mid x-y \mid \mid ^p_2\,d\pi . \end{aligned}$$

Thus, up to a constant, \(RW_p^p(\mu ,\nu )\) is dominated by \(W_p^p\) and \(W_{1,p}^p\). Moreover, the first upper bound is tight.

Proof

We divide the proof of the proposition into three pieces.

\(RW_p\) is invariant under rotations. This follows from the fact that \(RW_p\) is defined as the average of the costs with respect to all the possible choices of coordinates. Indeed, given \(\phi \in [0,2\pi ]\), let \(\mu ^{(\phi )}=(R_\phi )_\#\mu \) and \(\nu ^{(\phi )}=(R_\phi )_\#\nu \). Then, it holds \((\mu ^{(\phi )})^{(\theta )}=\mu ^{(\theta +\phi )}\); thus

$$\begin{aligned} RW_p^p(\mu ^{(\phi )},\nu ^{(\phi )})&=\dfrac{1}{2\pi }\int _0^{2\pi }W_p^p(\mu ^{(\phi +\theta )},\nu ^{(\phi +\theta )})d\theta \\&=\dfrac{1}{2\pi }\int _0^{2\pi }W_p^p(\mu ^{(\theta ')},\nu ^{(\theta ')})d\theta ' =RW_p^p(\mu ,\nu ), \end{aligned}$$

where we used the change of variable \(\theta '=\theta +\phi \) together with the \(2\pi \)-periodicity of \(\theta \mapsto W_p^p(\mu ^{(\theta )},\nu ^{(\theta )})\).

\(RW_p\) is a distance. First, notice that \(RW_p\) is symmetric since \(W_p\) is symmetric. Similarly, if \(\mu =\nu \), we have that \(W_p^p(\mu ^{(\theta )},\nu ^{(\theta )})=0\) for every \(\theta \), thus \(RW_p^p(\mu ,\nu )=0\). Conversely, since \(W_p^p(\mu ,\nu )\ge 0\), we have that \(RW_p^p(\mu ,\nu )=0\) if and only if \(W_p^p(\mu ^{(\theta )},\nu ^{(\theta )})=0\) for almost every \(\theta \in [0,2\pi ]\), hence \(\mu =\nu \). To conclude, we prove the triangle inequality. Let \(\mu ,\nu \), and \(\zeta \) be elements of \({\mathcal {P}}({\mathbb {R}}^2)\). From Minkowski's inequality [21], we have that

$$\begin{aligned} RW_p(\mu ,\nu )&=\Bigg (\frac{1}{2\pi }\int _{0}^{2\pi }W_p^p(\mu ^{(\theta )},\nu ^{(\theta )})d\theta \Bigg )^{\frac{1}{p}}\\&\le \Bigg (\frac{1}{2\pi }\int _{0}^{2\pi }(W_p(\mu ^{(\theta )},\zeta ^{(\theta )})+W_p(\zeta ^{(\theta )},\nu ^{(\theta )}))^pd\theta \Bigg )^{\frac{1}{p}}\\&\le \Bigg (\frac{1}{2\pi }\int _{0}^{2\pi }W^p_p(\mu ^{(\theta )},\zeta ^{(\theta )})d\theta \Bigg )^{\frac{1}{p}} + \Bigg (\frac{1}{2\pi }\int _{0}^{2\pi }W^p_p(\zeta ^{(\theta )},\nu ^{(\theta )})d\theta \Bigg )^{\frac{1}{p}}\\&=RW_p(\mu ,\zeta )+RW_p(\zeta ,\nu ), \end{aligned}$$

which concludes the second part of the proof.

\(RW_p\) is dominated by \(W_p\) and \(W_{1,p}\). Let us consider \(x,y\in {\mathbb {R}}^2\). Let \(\rho \) and \(\phi \) be the polar coordinates of \(x-y\), so that \(x-y=\rho (\cos (\phi ),\sin (\phi ))\). We then have

$$\begin{aligned} x_1-y_1=\rho \cos (\phi ) \quad \quad \text {and} \quad \quad x_2-y_2=\rho \sin (\phi ). \end{aligned}$$

We thus infer

$$\begin{aligned} \mid x_1-y_1\mid ^p + \mid x_2-y_2 \mid ^p =\rho ^p(\mid \cos (\phi ) \mid ^p + \mid \sin (\phi ) \mid ^p). \end{aligned}$$

Let \(\gamma \in \Pi (\mu ,\nu )\) be an optimal transportation plan between \(\mu \) and \(\nu \) with respect to the p-th power of the Euclidean metric, that is \(d(x,y)=\mid \mid x-y \mid \mid ^p_2\), so that

$$\begin{aligned} W_{1,p}^p(\mu ,\nu ):=\int _{{\mathbb {R}}^2\times {\mathbb {R}}^2}\mid \mid x-y \mid \mid ^p_2d\gamma . \end{aligned}$$

Finally, we have

$$\begin{aligned} RW_p^p(\mu ,\nu )&=\frac{1}{2\pi }\int _{0}^{2\pi }\big (\min _{\pi \in \Pi (\mu ^{(\theta )},\nu ^{(\theta )})}\int _{{\mathbb {R}}^2\times {\mathbb {R}}^2}l_p^p(x,y)d\pi \big )d\theta \\&=\frac{1}{2\pi }\int _{0}^{2\pi }\big (\min _{\pi \in \Pi (\mu ,\nu )}\int _{{\mathbb {R}}^2\times {\mathbb {R}}^2}l_p^p(R_\theta (x),R_\theta (y))d\pi \big )d\theta \\&\le \frac{1}{2\pi }\int _{0}^{2\pi }\int _{{\mathbb {R}}^2\times {\mathbb {R}}^2}\rho ^p(\mid \cos (\phi +\theta )\mid ^p+\mid \sin (\phi +\theta )\mid ^p)d\gamma d\theta \\&= \frac{1}{2\pi }\int _{{\mathbb {R}}^2\times {\mathbb {R}}^2}\rho ^p\int _{0}^{2\pi }(\mid \cos (\phi +\theta )\mid ^p+\mid \sin (\phi +\theta )\mid ^p)d\theta d\gamma \\&=\frac{K_p}{2\pi }\int _{{\mathbb {R}}^2\times {\mathbb {R}}^2}\rho ^p d\gamma = \frac{K_p}{2\pi } W^p_{1,p}(\mu ,\nu ), \end{aligned}$$

where

$$\begin{aligned} K_p=2\int _0^{2\pi }\mid \cos (\theta )\mid ^pd\theta . \end{aligned}$$

To conclude the proof, we recall the classic inequality

$$\begin{aligned} \mid \mid x-y\mid \mid ^p_2\le n^{(\frac{p}{2}-1)_+}\mid \mid x-y\mid \mid _p^p. \end{aligned}$$

The tightness of the first inequality in (10) follows by considering two Dirac deltas. \(\square \)
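As an illustrative aside (ours, not part of the proof), the first inequality in (10) can be sanity-checked on random point clouds: the constant \(K_p\) is obtained by numerical quadrature and \(W_{1,p}^p\) is evaluated with the Euclidean ground cost used above.

```python
# Illustrative check of the first inequality in (10):
#   RW_p^p(mu, nu) <= (K_p / 2 pi) * W_{1,p}^p(mu, nu),
# for discrete uniform samples, with K_p = 2 * int_0^{2 pi} |cos|^p.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import linear_sum_assignment


def ot_cost(x, y, cost_fn):
    C = cost_fn(x[:, None, :], y[None, :, :])
    r, c = linear_sum_assignment(C)
    return C[r, c].mean()


def rw_pp(x, y, p, n_angles=72):
    lp = lambda a, b: np.sum(np.abs(a - b) ** p, axis=-1)
    vals = []
    for t in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
        R = np.array([[np.cos(t), -np.sin(t)], [np.sin(t), np.cos(t)]])
        vals.append(ot_cost(x @ R.T, y @ R.T, lp))
    return np.mean(vals)


p = 3
rng = np.random.default_rng(3)
x, y = rng.normal(size=(80, 2)), rng.normal(loc=1.5, size=(80, 2))

K_p = 2.0 * quad(lambda t: np.abs(np.cos(t)) ** p, 0.0, 2.0 * np.pi)[0]
w_1p = ot_cost(x, y, lambda a, b: np.sum((a - b) ** 2, axis=-1) ** (p / 2))
print("RW_p^p ≈", rw_pp(x, y, p))
print("(K_p/2pi) W_{1,p}^p ≈", K_p / (2.0 * np.pi) * w_1p)
```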

Since \(RW_p^p\) is a rotation invariant distance, we are able to express it through a radiant formula.

Theorem 4

Let \(p\ge 1\). Let \(\mu \) and \(\nu \) be two measures supported over \({\mathbb {R}}^2\).

Then, there exists a family of measures \(\{\zeta ^{(\theta )}\}_{\theta \in [0,2\pi ]}\) such that

$$\begin{aligned} RW_p^p(\mu ,\nu )=\frac{1}{\pi }\int _{0}^{2\pi }\int _{{\mathbb {R}}}W_p^p(\mu _{\mid x^{(\theta )}_2},\zeta _{ \mid x^{(\theta )}_2}^{(\theta )})d\mu _{2}^{(\theta )}d\theta . \end{aligned}$$
(11)

Proof

Given any \(\theta \in [0,2\pi ]\), from Proposition 1, we have that there exists a couple of measures \(\zeta ^{(\theta )}\) and \(\eta ^{(\theta )}\) such that

$$\begin{aligned} W_p^p(\mu ^{(\theta )},\nu ^{(\theta )})=\int _{{\mathbb {R}}}W_{p}^p(\mu ^{(\theta )}_{\mid x^{(\theta )}_2},\zeta ^{(\theta )}_{\mid x^{(\theta )}_2})d\mu ^{(\theta )}_2 + \int _{{\mathbb {R}}}W_{p}^p(\mu ^{(\theta )}_{\mid x^{(\theta )}_1},\eta ^{(\theta )}_{\mid x^{(\theta )}_1})d\mu ^{(\theta )}_1. \end{aligned}$$

By averaging over \(\theta \) and using the same \(\frac{\pi }{2}\)-shift argument as in the proof of Theorem 2, we obtain (11). \(\square \)

Relation with the sliced Wasserstein distance

We now highlight how the Radiant Formula allows us to retrieve bounds on the Sliced Wasserstein distance in terms of the classic Wasserstein distance.

Theorem 5

Given two probability measures \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^2)\), we have

$$\begin{aligned} SW_2^2(\mu ,\nu )\le \frac{1}{2} W^2_2(\mu ,\nu ). \end{aligned}$$

Moreover, the bound is tight. Similarly, it holds

$$\begin{aligned} SW_p^p(\mu ,\nu )\le n^{(\frac{p}{2}-1)_+}\frac{K_p}{\pi } W^p_{p}(\mu ,\nu ). \end{aligned}$$

Proof

It follows from the convexity of the \(W_2^2\) distance [21, Theorem 4.8]. Indeed, from the Radiant Formula (5) we know that

$$\begin{aligned} W_2^2(\mu ,\nu )=\dfrac{1}{\pi }\int _{0}^{2\pi }\int _{{\mathbb {R}}} W^2_2(\mu _{\mid x_2^{(\theta )}},\zeta _{\mid x_2^{(\theta )}}^{(\theta )})d\mu _2^{(\theta )} \, d\theta \end{aligned}$$

then, applying [21, Theorem 4.8] with the mixing measure \(\mu ^{(\theta )}_2\) and the two families of measures \(\mu _{\mid x_2^{(\theta )}}\) and \(\zeta _{\mid x_2^{(\theta )}}^{(\theta )}\), we conclude

$$\begin{aligned} W_2^2(\mu ,\nu )&=\dfrac{1}{\pi }\int _{0}^{2\pi }\int _{{\mathbb {R}}} W^2_2(\mu _{\mid x_2^{(\theta )}},\zeta _{\mid x_2^{(\theta )}}^{(\theta )})d\mu _2^{(\theta )} d\theta \nonumber \\&\ge \dfrac{1}{\pi }\int _{0}^{2\pi } W_2^2(\mu _1^{(\theta )},\nu _1^{(\theta )})d\theta = 2\,SW_2^2(\mu ,\nu ) \end{aligned}$$
(12)

where the inequality in (12) follows from convexity together with the fact that each \(\zeta ^{(\theta )}\in \Pi (\nu ^{(\theta )}_1,\mu ^{(\theta )}_2)\), so that the conditional measures average to \(\mu _1^{(\theta )}\) and \(\nu _1^{(\theta )}\), respectively. To prove the tightness, it suffices to consider the measures \(\mu =\delta _{(0,0)}\) and \(\nu =\delta _{(1,1)}\).

By the same argument, applied to \(RW_p^p\) through Theorem 4 and Proposition 3, we infer the bound on \(SW_p^p\). \(\square \)
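A quick numerical illustration of Theorem 5 (ours, for discrete uniform samples): the inequality \(SW_2^2\le \frac{1}{2}W_2^2\) is verified on random point clouds, and equality is observed, up to discretization, for the pair of Dirac deltas used in the tightness argument.

```python
# Illustrative check of Theorem 5: SW_2^2 <= (1/2) W_2^2 in the plane, with
# equality for mu = delta_(0,0) and nu = delta_(1,1).
import numpy as np
from scipy.optimize import linear_sum_assignment


def w22(x, y):
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    r, c = linear_sum_assignment(C)
    return C[r, c].mean()


def sw22(x, y, n_angles=720):
    vals = []
    for t in np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False):
        v = np.array([np.cos(t), np.sin(t)])
        vals.append(np.mean((np.sort(x @ v) - np.sort(y @ v)) ** 2))
    return np.mean(vals)


rng = np.random.default_rng(4)
x, y = rng.normal(size=(200, 2)), rng.normal(loc=(2.0, 0.0), size=(200, 2))
print("SW_2^2 ≈", sw22(x, y), " (1/2) W_2^2 =", 0.5 * w22(x, y))

# Tightness: two Dirac deltas (single-point clouds).
d0, d1 = np.zeros((1, 2)), np.ones((1, 2))
print("Diracs:", sw22(d0, d1), "vs", 0.5 * w22(d0, d1))
```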

Finally, we use the Knothe-Rosenblatt transportation plan [19, 22] to bound the absolute error incurred when the Sliced Wasserstein Distance is used in place of the Wasserstein Distance.

Theorem 6

Given \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^2)\), it holds

$$\begin{aligned} \mid W_2^2(\mu ,\nu )-SW_2^2(\mu ,\nu ) \mid \le \dfrac{1}{2\pi }\int _0^{2\pi }\int _{{\mathbb {R}}}W_2^2((\zeta ^{(\theta )}_{KR})_{\mid y_1^{(\theta )}},\nu _{\mid y_1^{(\theta )}})d\nu _1^{(\theta )}d\theta , \end{aligned}$$
(13)

where \(\zeta _{KR}^{(\theta )}\) is the marginal of the Knothe-Rosenblatt transportation plan on the coordinates \((x^{(\theta )}_2,y^{(\theta )}_1)\). Furthermore, the upper bound in (13) is tight.

Proof

Given \(\theta \in [0,2\pi ]\), let \(V_\theta \) be the rotated basis introduced above and let \(\gamma _{KR}^{(\theta )}\) and \(\zeta _{KR}^{(\theta )}\) be the Knothe-Rosenblatt transportation plan between \(\mu ^{(\theta )}\) and \(\nu ^{(\theta )}\) and its projection over \((x_2^{(\theta )},y_1^{(\theta )})\), respectively. Then, by definition of the Knothe-Rosenblatt transportation plan, we have

$$\begin{aligned} W^2_2(\mu ,\nu )\le W^2_{KR}(\mu ^{(\theta )},\nu ^{(\theta )}) = W_2^2(\mu _1^{(\theta )},\nu _1^{(\theta )})+\int _{{\mathbb {R}}}W_2^2((\zeta ^{(\theta )}_{KR})_{\mid y_1^{(\theta )}},\nu _{\mid y_1^{(\theta )}})d\nu _1^{(\theta )},\nonumber \\ \end{aligned}$$
(14)

where \(W^2_{KR}(\mu ^{(\theta )},\nu ^{(\theta )})\) is the cost of the Knothe-Rosenblatt transportation plan according to the squared Euclidean distance. Since for every \(\theta \in [0,2\pi ]\) it holds \(W_2^2(\mu _1^{(\theta )},\nu _1^{(\theta )})\le W_2^2(\mu ,\nu )\), we infer

$$\begin{aligned} 0\le W_2^2(\mu ,\nu )-W_2^2(\mu _1^{(\theta )},\nu _1^{(\theta )})\le \int _{{\mathbb {R}}}W_2^2((\zeta ^{(\theta )}_{KR})_{\mid y_1^{(\theta )}},\nu _{\mid y_1^{(\theta )}})d\nu _1^{(\theta )} \end{aligned}$$
(15)

for any \(\theta \in [0,2\pi )\). Finally, by taking the average over \([0,2\pi )\), we find

$$\begin{aligned} \dfrac{1}{2\pi }\int _0^{2\pi }&\Big (W_2^2(\mu ,\nu )-W_2^2(\mu _1^{(\theta )},\nu _1^{(\theta )})\Big )d\theta \le \dfrac{1}{2\pi }\int _0^{2\pi }\int _{{\mathbb {R}}}W_2^2((\zeta ^{(\theta )}_{KR})_{\mid y_1^{(\theta )}},\nu _{\mid y_1^{(\theta )}})d\nu _1^{(\theta )}d\theta , \end{aligned}$$

and, hence

$$\begin{aligned} \mid W_2^2(\mu ,\nu )-SW_2^2(\mu ,\nu )\mid \;\le \dfrac{1}{2\pi }\int _0^{2\pi }\int _{{\mathbb {R}}}W_2^2((\zeta ^{(\theta )}_{KR})_{\mid y_1^{(\theta )}},\nu _{\mid y_1^{(\theta )}})d\nu _1^{(\theta )}d\theta , \end{aligned}$$

which concludes the first part of the proof.

To prove the tightness, it suffices to consider \(\mu =\delta _{(0,0)}\) and any measure \(\nu \). In this case, the Knothe-Rosenblatt transportation plan between \(\mu \) and \(\nu \) is optimal, thus the inequality in (15) is an equality for every \(\theta \in [0,2\pi ]\). \(\square \)
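For equal-size uniform samples whose points have pairwise distinct coordinates, the Knothe-Rosenblatt coupling reduces to matching the two clouds after lexicographic sorting. The sketch below (ours, under that assumption) computes its cost and checks the first inequality in (14), namely that the Knothe-Rosenblatt cost dominates \(W_2^2\).

```python
# Illustrative sketch of the Knothe-Rosenblatt heuristic for equal-size
# uniform samples with distinct coordinates: sort both clouds
# lexicographically (first coordinate, then the second) and match them in
# order; the resulting cost W_KR^2 upper-bounds W_2^2, as in (14).
import numpy as np
from scipy.optimize import linear_sum_assignment


def w22(x, y):
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    r, c = linear_sum_assignment(C)
    return C[r, c].mean()


def kr_cost(x, y):
    # np.lexsort uses the last key as the primary one, hence the reversal.
    xs = x[np.lexsort(x.T[::-1])]
    ys = y[np.lexsort(y.T[::-1])]
    return np.mean(np.sum((xs - ys) ** 2, axis=-1))


rng = np.random.default_rng(5)
x, y = rng.normal(size=(150, 2)), rng.normal(loc=(1.0, 1.0), size=(150, 2))
print("W_2^2 =", w22(x, y), " <=  W_KR^2 =", kr_cost(x, y))
```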

The extension to higher dimensional spaces

To conclude, we extend our results to the case in which the measures are supported over a higher dimensional space. We denote with d the dimension of the space, so that \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^d)\). Moreover, let \({\mathcal {R}}_d\) be the set of all the rotations from \({\mathbb {R}}^d\) to \({\mathbb {R}}^d\). Given \(R\in {\mathcal {R}}_d\), let us define \(e^{(R)}_i=R(e_i)\), where \(e_i\) is the i-th vector of the canonical basis of \({\mathbb {R}}^d\). It is easy to see that \(\{e_i^{(R)}\}_{i=1,\dots ,d}\) is an orthonormal basis of \({\mathbb {R}}^d\); we use \(x^{(R)}_1,x_2^{(R)},\dots ,x^{(R)}_d\) to denote the coordinates of \({\mathbb {R}}^d\) with respect to the basis \(\{e^{(R)}_i\}\). Finally, we let \(\rho \) be the uniform probability distribution over \({\mathcal {R}}_d\); since the set of rotations can be identified with a compact subset of the orthogonal matrices, this measure is well-defined.

Given a couple of measures \(\mu \) and \(\nu \), the SWD is defined as follows

$$\begin{aligned} SW_p^p(\mu ,\nu )=\dfrac{1}{\mid {\mathbb {S}}^{(d-1)}\mid }\int _{{\mathbb {S}}^{(d-1)}} W^p_p(\mu _v,\nu _v) dv, \end{aligned}$$
(16)

where \({\mathbb {S}}^{(d-1)}\) is the set of all directions in \({\mathbb {R}}^d\) and \(\mu _v\) is the marginal of \(\mu \) over the span of \(v\in {\mathbb {S}}^{(d-1)}\). In what follows, we consider the equivalent formulation

$$\begin{aligned} SW_p^p(\mu ,\nu )=\int _{{\mathcal {R}}_d} W^p_p(\mu _1^{(R(e_1))},\nu _1^{(R(e_1))}) d\rho . \end{aligned}$$
(17)

Indeed, given \(v,v'\in {\mathbb {S}}^{(d-1)}\), let us define \(T_v=\{R\in {\mathcal {R}}_d\;\;\text {s.t.}\;\; R(e_1)=v\}\) and, similarly, \(T_{v'}=\{R\in {\mathcal {R}}_d\;\;\text {s.t.}\;\; R(e_1)=v'\}\). Then, it holds

$$\begin{aligned} \rho (T_{v})=\rho (T_{v'}). \end{aligned}$$

Indeed, let \(S\in {\mathcal {R}}_d\) be such that \(S(v)=v'\); we then define \({\mathcal {S}}:T_v\rightarrow T_{v'}\) as follows

$$\begin{aligned} {\mathcal {S}}(R)=S\circ R. \end{aligned}$$

It is easy to see that \({\mathcal {S}}\) is a bijection. Moreover, since it is induced by a rotation, its determinant is equal to 1; this, combined with the fact that \(\rho \) is the uniform distribution, allows us to retrieve (17). Thus, in the following, when we refer to the SWD, we will refer to the one defined in (17).
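The equivalence between (16) and (17) can also be observed numerically: sampling uniform directions on the sphere, or sampling rotations and keeping \(R(e_1)\), yields the same Monte Carlo estimate of \(SW_2^2\). The sketch below is ours and purely illustrative; the QR-based recipe for uniformly distributed rotations is a standard trick that we assume without proof.

```python
# Illustrative sketch: estimating SW_2^2 in R^d either from uniform random
# directions on the sphere (definition (16)) or from R(e_1) with R a uniform
# random rotation (definition (17)); the two estimates agree up to Monte
# Carlo error.
import numpy as np


def w22_1d(a, b):
    s = np.sort(a) - np.sort(b)
    return np.mean(s ** 2)


def random_rotation(d, rng):
    # QR of a Gaussian matrix, with the signs of diag(R) absorbed, samples the
    # orthogonal group uniformly; flipping one column if needed forces
    # det = +1, i.e. a proper rotation.
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    Q = Q * np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    return Q


d, n_proj = 5, 2000
rng = np.random.default_rng(6)
x, y = rng.normal(size=(300, d)), rng.normal(loc=1.0, size=(300, d))

dirs = rng.standard_normal((n_proj, d))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)        # uniform on the sphere
est_sphere = np.mean([w22_1d(x @ v, y @ v) for v in dirs])

rot_vals = []
for _ in range(n_proj):
    v = random_rotation(d, rng)[:, 0]                      # R(e_1), as in (17)
    rot_vals.append(w22_1d(x @ v, y @ v))
est_rot = np.mean(rot_vals)

print("via directions:", est_sphere, " via rotations:", est_rot)
```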

First, it is easy to see that Proposition 1 and its proof can be readily adapted to the \({\mathbb {R}}^d\) case. In particular, we have the following.

Corollary 1

For every \(p\ge 1\) and for every \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^d)\) there exists a family of d measures \(\{\zeta _i\}_{i=1,\dots ,d}\subset {\mathcal {P}}({\mathbb {R}}^d)\) such that

$$\begin{aligned} W_p^p(\mu ,\nu )=\sum _{i=1}^d\int _{{\mathbb {R}}^{d-1}}W_p^p(\mu _{\mid x_{-i}},(\zeta _i)_{\mid x_{-i}})d\mu _{-i} \end{aligned}$$
(18)

and \(\mu _{-i}=(\zeta _{i})_{-i}\) for every \(i=1,\dots ,d\), where \(x_{-i}\in {\mathbb {R}}^{d-1}\) is the vector x without the i-th coordinate and \(\mu _{-i}\) is the marginal of the measure \(\mu \) on all the coordinates but the i-th one, i.e. on \((x_1,\dots ,x_{i-1},x_{i+1},\dots ,x_d)\).

Proof

It follows from the same argument used in the proof of Proposition 1. Indeed, let \(\gamma \) be an optimal transportation plan between \(\mu \) and \(\nu \). We then define \(\zeta _i\) as the projection of \(\gamma \) on the coordinates \((x_1,\dots ,x_{i-1},y_i,x_{i+1},\dots ,x_d)\) for every \(i\in \{1,2,\dots ,d\}\). Then, using again the characterization shown in [20], we infer (18) and thus the thesis. \(\square \)

Using the characterization of Corollary 1, we are able to extend the Radiant Formula to the higher dimensional case. The same goes for the bounds presented in Theorems 5 and 6.

Theorem 7

Given any couple of measures \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^d)\), it holds

$$\begin{aligned} W_2^2(\mu ,\nu )=d\int _{{\mathcal {R}}_d}\int _{{\mathbb {R}}^{d-1}}W_2^2(\mu _{\mid x^{(R)}_{-1}},(\zeta ^{(R)}_{1})_{\mid x^{(R)}_{-1}})d\mu ^{(R)}_{-1}d\rho . \end{aligned}$$
(19)

Furthermore, for every couple of measures \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^d)\), it holds \(\frac{1}{d}W_2^2(\mu ,\nu )\ge SW_2^2(\mu ,\nu )\) and

$$\begin{aligned} \mid W_2^2(\mu ,\nu )-SW_2^2(\mu ,\nu )\mid \le \int _{{\mathcal {R}}_d}\int _{{\mathbb {R}}}W_2^2((\zeta ^{(R)}_{KR})_{\mid y_1^{(R)}},\nu _{\mid y_1^{(R)}})d\nu _{1}^{(R)}d\rho , \end{aligned}$$
(20)

where \(\zeta ^{(R)}_{KR}\) is the marginal of the Knothe-Rosenblatt transportation plan over the coordinates \((y^{(R)}_1,x^{(R)}_2,\dots ,x^{(R)}_d)\) and \(\nu _1^{(R)}\) is the marginal of \(\nu \) over \(y_1^{(R)}\).

Both the inequality \(\frac{1}{d}W_2^2(\mu ,\nu )\ge SW_2^2(\mu ,\nu )\) and the bound (20) are tight.

Moreover, analogous bounds can be inferred for \(RW_p^p\).

Proof

For the sake of simplicity, we just show how to extend the proof of Theorem 2 to prove identity (19). Indeed, the proof of (20) follows by applying the same argument to the proof of Theorem 6. The same goes for the bounds on \(RW_p^p\).

For every \(R\in {\mathcal {R}}_d\), Corollary 1 allows us to find a set of d measures \(\{\zeta _i^{(R)}\}_{i=1,\dots ,d}\) such that

$$\begin{aligned} W_2^2(\mu ,\nu )=\sum _{i=1}^d\int _{{\mathbb {R}}^{d-1}}W_2^2(\mu _{\mid x^{(R)}_{-i}},(\zeta ^{(R)}_{i})_{\mid x^{(R)}_{-i}})d\mu ^{(R)}_{-i}. \end{aligned}$$

By taking the average with respect to \(\rho \) of both sides of the equation we get

$$\begin{aligned} W_2^2(\mu ,\nu )=\int _{{\mathcal {R}}_d}W_2^2(\mu ,\nu )d\rho =\int _{{\mathcal {R}}_d}\sum _{i=1}^d\int _{{\mathbb {R}}^{d-1}}W_2^2(\mu _{\mid x^{(R)}_{-i}},(\zeta ^{(R)}_{i})_{\mid x^{(R)}_{-i}})d\mu ^{(R)}_{-i}d\rho . \end{aligned}$$

Let \(S_i\in {\mathcal {R}}_d\) be a rotation such that \(S_i(e_1)=e_i\) holds. Then we have \(e^{(R)}_i=R(S_i(e_1))\) and, therefore,

$$\begin{aligned} W_2^2(\mu ,\nu )&=\int _{{\mathcal {R}}_d}\sum _{i=1}^d\int _{{\mathbb {R}}^{d-1}}W_2^2(\mu _{\mid x^{(R\circ S_i)}_{-1}},(\zeta ^{(R\circ S_i)}_{1})_{\mid x^{(R\circ S_i)}_{-1}})d\mu ^{(R\circ S_i)}_{-1}d\rho \\&=d\int _{{\mathcal {R}}_d}\int _{{\mathbb {R}}^{d-1}}W_2^2(\mu _{\mid x^{(R)}_{-1}},(\zeta ^{(R)}_{1})_{\mid x^{(R)}_{-1}})d\mu ^{(R)}_{-1}d\rho , \end{aligned}$$

where the last equality comes from the fact that every \(S_i\) induces a bijective map from \({\mathcal {R}}_d\) to \({\mathcal {R}}_d\) whose determinant is equal to 1 and \(\rho \) is a uniform distribution.

Using convexity, we retrieve the bound \(\frac{1}{d}W_2^2(\mu ,\nu )\ge SW_2^2(\mu ,\nu )\). \(\square \)
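As a small numerical illustration (ours), the bound \(\frac{1}{d}W_2^2\ge SW_2^2\) can be checked on point clouds in \({\mathbb {R}}^d\), using random directions for the sliced distance and an assignment solver for \(W_2^2\).

```python
# Illustrative check of the bound (1/d) W_2^2 >= SW_2^2 from Theorem 7
# for discrete uniform measures in R^d.
import numpy as np
from scipy.optimize import linear_sum_assignment


def w22(x, y):
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    r, c = linear_sum_assignment(C)
    return C[r, c].mean()


def sw22(x, y, n_proj, rng):
    d = x.shape[1]
    dirs = rng.standard_normal((n_proj, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return np.mean([np.mean((np.sort(x @ v) - np.sort(y @ v)) ** 2) for v in dirs])


d = 4
rng = np.random.default_rng(7)
x, y = rng.normal(size=(200, d)), rng.normal(loc=1.0, size=(200, d))
print("SW_2^2 ≈", sw22(x, y, 4000, rng), " (1/d) W_2^2 =", w22(x, y) / d)
```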

Finally, we study a more general class of Sliced Wasserstein distances. Let \({\mathcal {H}}_k\) be the set of all the k-dimensional subspaces of \({\mathbb {R}}^d\). Given \(\mu ,\nu \in {\mathcal {P}}({\mathbb {R}}^d)\), we define the k-Sliced Wasserstein Distance as

$$\begin{aligned} \Big (SW^{(k)}_2(\mu ,\nu )\Big )^2=\int _{{\mathcal {H}}_k}W_2^2(\mu _{H},\nu _{H})d{\textbf{H}}, \end{aligned}$$
(21)

where \(\mu _H\) (\(\nu _H\)) is the marginal of \(\mu \) (respectively, \(\nu \)) over the subspace \(H\in {\mathcal {H}}_k\) and \({\textbf{H}}\) is the uniform probability distribution over the space of k-dimensional subspaces. Roughly speaking, the k-Sliced Wasserstein Distance is a variant of SW that projects the two measures over all the k-dimensional subspaces of \({\mathbb {R}}^d\) rather than over all the 1-dimensional ones. This class of metrics is a natural extension of the \({\mathcal {S}}_k\) metrics introduced in [23].
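For discrete measures, \(SW^{(k)}_2\) can be approximated by projecting both point clouds onto random k-dimensional subspaces. The sketch below is ours and purely illustrative: it uses the first k columns of random rotations as orthonormal bases of the subspaces and compares the estimate with \(\frac{k}{d}W_2^2\), the quantity appearing in the bound proved in Theorem 8 below.

```python
# Illustrative sketch of (21): estimate (SW_2^(k))^2 by averaging W_2^2
# between the projections of the two clouds onto random k-dimensional
# subspaces, and compare it with (k/d) W_2^2.
import numpy as np
from scipy.optimize import linear_sum_assignment


def w22(x, y):
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)
    r, c = linear_sum_assignment(C)
    return C[r, c].mean()


def random_rotation(d, rng):
    Q, R = np.linalg.qr(rng.standard_normal((d, d)))
    Q = Q * np.sign(np.diag(R))
    if np.linalg.det(Q) < 0:
        Q[:, 0] = -Q[:, 0]
    return Q


def k_sliced_w22(x, y, k, n_sub, rng):
    d = x.shape[1]
    vals = []
    for _ in range(n_sub):
        P = random_rotation(d, rng)[:, :k]   # orthonormal basis of R(H_0)
        vals.append(w22(x @ P, y @ P))       # W_2^2 between the projections
    return np.mean(vals)


d, k = 5, 2
rng = np.random.default_rng(8)
x, y = rng.normal(size=(150, d)), rng.normal(loc=1.0, size=(150, d))
print("(SW_2^(k))^2 ≈", k_sliced_w22(x, y, k, 200, rng),
      " (k/d) W_2^2 =", (k / d) * w22(x, y))
```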

Theorem 8

Let \(\mu \) and \(\nu \) be two probability measures over \({\mathbb {R}}^d\) and let k be an integer such that \(k<d\).

Then, it holds

$$\begin{aligned} SW^{(k)}_2(\mu ,\nu )\le \sqrt{\frac{k}{d}}W_2(\mu ,\nu ). \end{aligned}$$
(22)

Moreover, the bound is tight.

Proof

Let us consider a k-dimensional subspace \(H\in {\mathcal {H}}_k\).

Moreover, let \({\mathcal {T}}:{\mathcal {R}}_d\rightarrow {\mathcal {H}}_k\) be defined as

$$\begin{aligned} {\mathcal {T}}(R)=R(H_0) \end{aligned}$$

where \(H_0\in {\mathcal {H}}_k\) is the k-dimensional subspace generated by \(\{e_1,\dots ,e_k\}\), i.e.

$$\begin{aligned} H_0=\{x\in {\mathbb {R}}^d\;{\textbf {s.t.}}\; x_{k+1}=x_{k+2}=\dots =x_{d}=0\}. \end{aligned}$$

Given \(H,H'\in {\mathcal {H}}_k\), let \(S\in {\mathcal {R}}_d\) be a rotation such that \(S(H)=H'\). Then, for every \(R'\in {\mathcal {T}}^{-1}(H)\), the rotation \(R'':=S\circ R'\) belongs to \({\mathcal {T}}^{-1}(H')\); indeed, \(R''(H_0)=S(R'(H_0))=S(H)=H'\). In particular, S induces a bijective map between \({\mathcal {T}}^{-1}(H)\) and \({\mathcal {T}}^{-1}(H')\).

Again, since the map is induced by a rotation and \({\textbf{H}}\) is a uniform distribution, we have that \(\rho ({\mathcal {T}}^{-1}(H))=\rho ({\mathcal {T}}^{-1}(H'))\). In particular, given a function \(g:{\mathcal {H}}_k\rightarrow {\mathbb {R}}\), it holds

$$\begin{aligned} \int _{{\mathcal {R}}_d}g({\mathcal {T}}(R))d\rho =\int _{{\mathcal {H}}_k}g(H)d{\textbf{H}}. \end{aligned}$$
(23)

Given \(R\in {\mathcal {R}}_d\), Corollary 1 provides a set of d measures in \({\mathcal {P}}({\mathbb {R}}^d)\), namely \(\zeta ^{(R)}_i\), such that

$$\begin{aligned} W_2^2(\mu ,\nu )=\sum _{i=1}^d\int _{{\mathbb {R}}^{d-1}}W_2^2(\mu _{\mid x^{(R)}_{-i}},(\zeta _i^{(R)})_{\mid x^{(R)}_{-i}})d\mu ^{(R)}_{-i}. \end{aligned}$$

For every \(i\in \{1,2,\dots ,d\}\), let \(S_i\in {\mathcal {R}}_d\) be such that \(S_i(e_1)=e_i\). By definition, it holds true that \(e_i^{(R)}=S_i(e_1)^{(R)}=R(S_i(e_1))=e^{(R\circ S_i)}_1\).

Thus, we get

$$\begin{aligned} W_2^2(\mu ,\nu )&=\int _{{\mathcal {R}}_d}\sum _{i=1}^d\int _{{\mathbb {R}}^{d-1}}W_2^2(\mu _{\mid x^{(R)}_{-i}},(\zeta _i^{(R)})_{\mid x^{(R)}_{-i}})d\mu ^{(R)}_{-i}d\rho \\&=d\int _{{\mathcal {R}}_d}\int _{{\mathbb {R}}^{d-1}}W_2^2(\mu _{\mid x^{(R)}_{-1}},(\zeta _1^{(R)})_{\mid x^{(R)}_{-1}})d\mu ^{(R)}_{-1}d\rho , \end{aligned}$$

where the last equality follows from the fact that, for every \(i\in \{1,2,\dots ,d\}\), the function \({\mathcal {S}}_i:R\rightarrow R\circ S_i\) is bijective and its Jacobian’s determinant is equal to 1.

Given \(R\in {\mathcal {R}}_d\) and its associated basis \(\{e^{(R)}_1,\dots ,e^{(R)}_d\}\) of \({\mathbb {R}}^d\), we have that

$$\begin{aligned} l_2^2(x,y)=c^{(R)}_{1:k}(x,y)+c^{(R)}_{k:d}(x,y), \end{aligned}$$

where \(c^{(R)}_{1:k}(x,y):=\sum _{i=1}^k(x^{(R)}_i-y^{(R)}_i)^2\) and \(c^{(R)}_{k:d}(x,y):=\sum _{i=k+1}^d(x^{(R)}_i-y^{(R)}_i)^2\). Again, by the same argument as in Corollary 1, we can then find a pair of measures \(\Psi \in \Pi (\mu ^{(R)}_{>k},\nu ^{(R)}_{\le k})\) and \(\Theta \in \Pi (\mu ^{(R)}_{\le k},\nu ^{(R)}_{> k})\) such that

$$\begin{aligned} W_2^2(\mu ,\nu )&=\int _{{\mathbb {R}}^{d-k}}W_{c^{(R)}_{1:k}}^2(\mu _{\mid x^{(R)}_{k+1},\dots ,x^{(R)}_d},\Psi _{\mid x^{(R)}_{k+1},\dots ,x^{(R)}_d})d\mu ^{(R)}_{>k}\\&+ \int _{{\mathbb {R}}^{k}}W_{c^{(R)}_{k:d}}^2(\mu _{\mid x^{(R)}_{1},\dots ,x^{(R)}_k},\Theta _{\mid x^{(R)}_{1},\dots ,x^{(R)}_k})d\mu ^{(R)}_{\le k}, \end{aligned}$$

where \(\mu ^{(R)}_{\le k}\) and \(\mu ^{(R)}_{> k}\) are the marginals of \(\mu \) over the first k coordinates and the last \((d-k)\) coordinates, respectively.

Since, for every \(R\in {\mathcal {R}}_d\) and for every \((x^{(R)}_{k+1},\dots ,x^{(R)}_{d})\), the cost \(c^{(R)}_{1:k}\) is itself separable, we can further split the Wasserstein distance \(W_{c^{(R)}_{1:k}}^2(\mu _{\mid x^{(R)}_{k+1},\dots ,x^{(R)}_d},\Psi _{\mid x^{(R)}_{k+1},\dots ,x^{(R)}_d})\) and obtain

$$\begin{aligned} \int _{{\mathbb {R}}^{d-k}}&W_{c^{(R)}_{1:k}}^2(\mu _{\mid x^{(R)}_{k+1},\dots ,x^{(R)}_d},\Psi _{\mid x^{(R)}_{k+1},\dots ,x^{(R)}_d})d\mu ^{(R)}_{>k}\\&=\sum _{i=1}^k \int _{{\mathbb {R}}^{d-1}} W_2^2(\mu _{\mid x^{(R)}_{-i}},(\zeta _i^{(R)})_{\mid x^{(R)}_{-i}})d\mu _{-i}^{(R)}. \end{aligned}$$

Thus, by convexity, we infer that

$$\begin{aligned} \sum _{i=1}^k \int _{{\mathbb {R}}^{d-1}} W_2^2(\mu _{\mid x^{(R)}_{-i}},(\zeta _i^{(R)})_{\mid x^{(R)}_{-i}})d\mu _{-i}^{(R)}&=\int _{{\mathbb {R}}^{d-k}}W_{c^{(R)}_{1:k}}^2(\mu _{\mid x^{(R)}_{k+1},\dots ,x^{(R)}_d},\Psi _{\mid x^{(R)}_{k+1},\dots ,x^{(R)}_d})d\mu ^{(R)}_{>k}\\&\ge W_2^2(\mu _H,\nu _H), \end{aligned}$$

where \(H=R(H_0)\). If we take the average over all the possible rotations, we get

$$\begin{aligned} \int _{{\mathcal {R}}_d}\sum _{i=1}^k \int _{{\mathbb {R}}^{d-1}} W_2^2(\mu _{\mid x^{(R)}_{-i}},(\zeta _i^{(R)})_{\mid x^{(R)}_{-i}})d\mu _{-i}^{(R)}d\rho \ge \int _{{\mathcal {R}}_d}W_2^2(\mu _{{\mathcal {T}}(R)},\nu _{{\mathcal {T}}(R)})d\rho . \end{aligned}$$

Using identity (23), we get

$$\begin{aligned} \int _{{\mathcal {R}}_d}\sum _{i=1}^k \int _{{\mathbb {R}}^{d-1}} W_2^2(\mu _{\mid x^{(R)}_{-i}},(\zeta _i^{(R)})_{\mid x^{(R)}_{-i}})d\mu _{-i}^{(R)}d\rho \ge \int _{{\mathcal {H}}_k}W_2^2(\mu _H,\nu _H)d{\textbf{H}}. \end{aligned}$$

By the same argument used before, we can simplify the sum

$$\begin{aligned} k\int _{{\mathcal {R}}_d}\int _{{\mathbb {R}}^{d-1}} W_2^2(\mu _{\mid x^{(R)}_{-1}},(\zeta _1^{(R)})_{\mid x^{(R)}_{-1}})d\mu _{-1}^{(R)}d\rho&\ge \int _{{\mathcal {H}}_k}W_2^2(\mu _{H},\nu _{H})d{\textbf{H}}\\&=(SW_2^{(k)}(\mu ,\nu ))^2. \end{aligned}$$

Finally, we have that

$$\begin{aligned} W_2^2(\mu ,\nu )=d\int _{{\mathcal {R}}_d}\int _{{\mathbb {R}}^{d-1}}W_2^2(\mu _{\mid x^{(R)}_{-1}},(\zeta _1^{(R)})_{\mid x^{(R)}_{-1}})d\mu ^{(R)}_{-1}d\rho \ge \frac{d}{k}(SW_2^{(k)}(\mu ,\nu ))^2, \end{aligned}$$

which allows us to conclude (22).

Lastly, the tightness property follows by considering two Dirac deltas. \(\square \)

3 Conclusion

In this note, we proposed an alternative representation of the Wasserstein distance, showcased its significant connections with the classic Sliced Wasserstein Distance, and used them to prove bounds on Sliced Wasserstein Distances in terms of the classic Wasserstein Distances. Furthermore, we used the Knothe-Rosenblatt heuristic to prove a bound on the absolute error between the two.

The field of Sliced Wasserstein Distances keeps flourishing and producing alternative versions [23,24,25]. We believe the Radiant Formula is a useful tool for studying the relationships between existing sliced-type distances and new ones.