An Optimal Transport approach for the Schrödinger bridge problem and convergence of Sinkhorn algorithm

This paper exploits the equivalence between the Schrödinger bridge problem and entropy-penalized optimal transport in order to find a different approach to the duality, in the spirit of optimal transport. This approach yields a priori estimates which are consistent in the limit when the regularization parameter goes to zero. In particular, we find a new proof of the existence of maximizing entropic potentials and, therefore, of the existence of a solution to the Schrödinger system. Our method also extends to the case of more than two marginals: we provide an alternative proof of the convergence of the Sinkhorn algorithm with two marginals, and we show that the Sinkhorn algorithm converges in the multi-marginal case.


Introduction
Let (X, d_X) and (Y, d_Y) be Polish spaces, c : X × Y → R be a cost function, and ρ_1 ∈ P(X), ρ_2 ∈ P(Y) be probability measures. We consider the Schrödinger bridge problem

OT_ε(ρ_1, ρ_2) = inf_{γ ∈ Π(ρ_1, ρ_2)} ε KL(γ|K),   (1.1)

where K is the so-called Gibbs kernel associated with the cost c, namely K = e^{−c(x,y)/ε} ρ_1 ⊗ ρ_2. The functional KL(γ|K) in (1.1) is the Kullback-Leibler divergence between the measures γ and K, defined as

KL(γ|K) = ∫_{X×Y} log(dγ/dK) dγ  if γ ≪ K,  and KL(γ|K) = +∞ otherwise.

Here, by abuse of notation, we also denote by γ the Radon-Nikodym derivative of γ with respect to the product measure ρ_1 ⊗ ρ_2. Geometrically speaking, when we interpret the Kullback-Leibler divergence as a distance, problem (1.1) defines the so-called Kullback-Leibler projection of K onto the set Π(ρ_1, ρ_2).
In this paper, we are interested in the following questions: (1) What is the regularity of the entropic potentials a_ε and b_ε?
(2) Can we understand the structure and regularity of the minimizer in (1.1) if we consider the Schrödinger Bridge problem with N given marginals ρ 1 , ρ 2 , . . . , ρ N instead of 2?
The answers to questions (1) and (2) rely on the Kantorovich duality formulation of (1.1) and its extension to the multi-marginal setting: we will exploit the parallel with optimal transport to also give a new (variational) proof of the existence of a solution to the Schrödinger system. We believe this contribution is important in its own right, since the only previously available proofs pass through abstract closure results for functions of "sum type".
The multi-marginal Schrödinger bridge problem, to be introduced in Section 4, has recently been considered in the literature from different viewpoints (e.g. [5,6,16,18,20,26,41,42]), for instance in connection with Wasserstein barycenters, matching problems in economics, the time-discretisation of the Euler equations, and Density Functional Theory in computational chemistry.
Finally, we want to mention that G. Carlier and M. Laborde in [18] show the well-posedness (existence, uniqueness and smooth dependence with respect to the data) of the multi-marginal Schrödinger system in L^∞ (see (4.8) in Section 4) via local and global inverse function theorems. Their approach is different from, and their result orthogonal to, the study presented in this paper; moreover, their result is restricted to measures ρ_i which are absolutely continuous with respect to some reference measure, with density bounded from above and below.

Computational aspects and connection with Optimal Transport Theory
In many applications, the method of choice for numerically computing (1.1) is the so-called Iterative Proportional Fitting Procedure (IPFP) or Sinkhorn algorithm [62]. The aim of the Sinkhorn algorithm is to construct the measure γ_ε realizing the minimum in (1.1) by fixing the shape of the guess as γ^n_ε = a_n(x) b_n(y) K (since this is the actual shape of the minimizer) and then alternately updating either a_n or b_n, matching one of the marginals of the guess to the target marginal ρ_1 or ρ_2. The IPFP sequences (a_n)_{n∈N} and (b_n)_{n∈N} are thus defined iteratively by

b_n(y) = 1 / ( ∫_X k(x, y) a_{n−1}(x) dρ_1(x) ),   a_n(x) = 1 / ( ∫_Y k(x, y) b_n(y) dρ_2(y) ),   (1.4)

where k(x, y) = e^{−c(x,y)/ε} is the density of the Gibbs kernel, as in (1.2).
While a_n and b_n are approximations of the solution of the Schrödinger system, the sequence of probability measures γ^n_ε = a_n(x) b_n(y) K approximates the minimizer γ_ε. We stress that the IPFP can be easily generalized to the multi-marginal setting, whose discussion will be detailed in Section 4.
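To make the iteration concrete, the following is a minimal discrete sketch of (1.4) in NumPy; the supports, weights, cost, value of ε and iteration count are illustrative choices, not data from the paper.

```python
import numpy as np

# Minimal discrete sketch of the IPFP/Sinkhorn iteration (1.4).
# Supports, weights, cost, eps and the iteration count are illustrative.
rng = np.random.default_rng(0)
n, m, eps = 40, 50, 0.1
x = rng.uniform(0, 1, n)                    # support of rho1
y = rng.uniform(0, 1, m)                    # support of rho2
rho1 = np.full(n, 1.0 / n)                  # uniform marginal weights
rho2 = np.full(m, 1.0 / m)
cost = (x[:, None] - y[None, :]) ** 2       # c(x, y) = |x - y|^2
k = np.exp(-cost / eps)                     # Gibbs kernel density

a = np.ones(n)
for _ in range(2000):
    b = 1.0 / (k.T @ (a * rho1))            # b_n(y) = 1 / int k(x,y) a_{n-1}(x) drho1
    a = 1.0 / (k @ (b * rho2))              # a_n(x) = 1 / int k(x,y) b_n(y) drho2

# gamma_n = a_n b_n k rho1 (x) rho2 should (approximately) lie in Pi(rho1, rho2)
gamma = a[:, None] * b[None, :] * k * np.outer(rho1, rho2)
err1 = np.abs(gamma.sum(axis=1) - rho1).max()
err2 = np.abs(gamma.sum(axis=0) - rho2).max()
```

After the a-update the first marginal is matched exactly by construction; the residual on the second marginal measures how far the iteration is from convergence.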
(3) Can one prove convergence of the Sinkhorn algorithm in the two- and several-marginal cases?
In the two-marginal case, the IPFP scheme was introduced by R. Sinkhorn [62]. The convergence of the iterates (1.4) was proven by J. Franklin and J. Lorenz [35] in the discrete case and by L. Rüschendorf [60] in the continuous one. An alternative proof based on the Franklin-Lorenz approach via the Hilbert metric was also provided by Y. Chen, T. Georgiou and M. Pavon [21], which in particular leads to a linear convergence rate of the procedure (in the Hilbert metric).
Despite the different approaches and theoretical guarantees obtained for the 2-marginal problem, in the multi-marginal case the situation changes completely. Although numerical evidence suggests convergence and stability of the Sinkhorn algorithm for a general class of cost functions [5,16,26], theoretical results guaranteeing convergence and stability were unknown (in [60] it is claimed that the method extends to the multi-marginal case, but to our knowledge this had not been carried out).
One of the contributions of this paper is to give convergence results for the Sinkhorn algorithm in the multi-marginal setting. In our approach we exploit the regularity of the entropic potentials to prove, by compactness, the convergence of the IPFP scheme (1.4).
Connection with Optimal Transport Theory: the problem (1.1) allows us to devise very efficient numerical schemes approximating solutions to the Monge-Kantorovich formulation of optimal transport and its many generalizations. Indeed, notice that we can rewrite (1.1) as the Monge-Kantorovich functional with cost c plus an entropic regularization term,

OT_ε(ρ_1, ρ_2) = min_{γ ∈ Π(ρ_1, ρ_2)} { ∫_{X×Y} c dγ + ε KL(γ|ρ_1 ⊗ ρ_2) }.   (1.5)

In particular, one can show [16,49,53] that if (γ_ε)_{ε≥0} is a sequence of minimizers of the above problem, then γ_ε converges, as ε → 0, to a solution of the optimal transport problem (ε = 0). More precisely, defining the functionals

C_k(γ) = ∫_{X×Y} c dγ + ε_k KL(γ|ρ_1 ⊗ ρ_2),   C_0(γ) = ∫_{X×Y} c dγ,   with ε_k → 0,

in [16,50,53] it is shown that the sequence of functionals (C_k)_{k∈N} Γ-converges to C_0 with respect to the weak convergence of measures. In particular the minima and the minimal values converge; so, if c(x, y) = d(x, y)^p, then

lim_{ε→0} OT_ε(ρ_1, ρ_2) = W_p(ρ_1, ρ_2)^p,

where W_p(ρ_1, ρ_2) is the p-Wasserstein distance between ρ_1 and ρ_2.

Figure 1: Support of the optimal coupling γ_ε in (1.1) for the one-dimensional distance-squared cost with different values of ε = 10^{−1}, 1, 10, 10^2, 10^3: the densities are ρ_1 ∼ N(0, 5) (blue) and ρ_2 = (1/2)η_1 + (1/2)η_2, a Gaussian mixture (red), where η_1 ∼ N(−2, 0.8) and η_2 ∼ N(2, 0.7). The numerical computations are done using the POT library [34].
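The ε → 0 behaviour described above can be illustrated numerically under simple assumptions: for two 1D discrete measures with the same number of uniform-weight atoms, the exact quadratic OT cost is given by the monotone matching, and the transport cost of the entropic minimizer decreases towards it as ε shrinks. All data and parameter values below are illustrative.

```python
import numpy as np

# Illustration (under the stated assumptions): the transport cost of the
# entropic minimizer approaches the unregularized optimal cost as eps -> 0.
n = 8
x = np.linspace(0.0, 1.0, n)                # sorted atoms of rho1
y = 0.8 * x + 0.1                           # sorted atoms of rho2
rho = np.full(n, 1.0 / n)                   # uniform weights on both sides
cost = (x[:, None] - y[None, :]) ** 2
w2 = np.mean((x - y) ** 2)                  # exact OT cost: monotone matching

gaps = []
for eps in (0.4, 0.1, 0.025):
    k = np.exp(-cost / eps)
    a = np.ones(n)
    for _ in range(20000):                  # Sinkhorn, as in (1.4)
        b = 1.0 / (k.T @ (a * rho))
        a = 1.0 / (k @ (b * rho))
    gamma = a[:, None] * b[None, :] * k * np.outer(rho, rho)
    gaps.append(float((gamma * cost).sum() - w2))
```

The gap between the entropic transport cost and the exact cost is nonnegative and shrinks as ε decreases, consistent with the Γ-convergence statement above.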
In the context of Optimal Transport Theory, the entropic regularization was introduced by A. Galichon and B. Salanié [36] to solve matching problems in economics, and by M. Cuturi [25] in the context of machine learning and data science. Both seminal papers sparked renewed attention to the theoretical aspects of (1.5) and had a strong impact on the imaging, data science and machine learning communities, due to the efficiency of the Sinkhorn algorithm.
The Sinkhorn algorithm provides an efficient and scalable approximation to optimal transport. In particular, for an appropriate choice of parameters, the Sinkhorn algorithm is in fact a near-linear time approximation scheme for computing OT distances between discrete measures [2]. However, as studied in [30,64], the Wasserstein distance suffers from the so-called curse of dimensionality. We refer to the recent book [27] by M. Cuturi and G. Peyré for a complete presentation of, and references on, computational optimal transport.

Main contributions
In order to study the regularity of the entropic potentials, we introduce the dual (Kantorovich) functional D_ε, defined in Section 2. The Kantorovich duality of (1.1) is given by the following variational problem (see Proposition 2.12):

OT_ε(ρ_1, ρ_2) = sup { D_ε(u, v) : u ∈ L^exp_ε(ρ_1), v ∈ L^exp_ε(ρ_2) }.   (1.6)

The Entropy-Kantorovich duality (1.6) appeared, for instance, in [18,33,42,43,44,50]. The first contributions of this paper are (i) to prove the existence of maximizers u* and v* (up to translation) in (1.6) in natural spaces; (ii) to show that the Entropy-Kantorovich potentials inherit the same regularity as the cost function (see the precise statements in Proposition 2.4 and Theorem 2.8).
We then link u* and v* to the solution of the Schrödinger problem; as a byproduct of our results we are able to provide an alternative proof of the convergence of the Sinkhorn algorithm in the 2-marginal case via a purely optimal-transport approach (Theorem 3.1), seeing it as an alternating maximization procedure. The strength of this proof is that it can be easily generalized to the multi-marginal setting (Theorem 4.7).

Summary of results and main ideas of the proofs
Our approach follows ideas from optimal transport and relies on the study of the dual (Kantorovich) problem (1.6) of (1.1). Analogously to the optimal transport case, if one assumes some regularity (boundedness, uniform continuity, concavity) of the cost function c, then one obtains the same type of regularity for the entropic potentials u and v.
The relation between solutions of the dual problem (1.6) and the entropic potentials solving the Schrödinger system was already pointed out by C. Léonard [50]. To our knowledge, the direct proof of existence of maximizers in (1.6) is a new result.
Our approach to the existence of Entropy-Kantorovich potentials follows the direct method of the Calculus of Variations. The key idea in the argument is to define a generalized notion of c-transform suited to the Schrödinger bridge case, namely the (c, ε)-transform. The main duality result, in the most general case where we assume only that c is bounded, is given by Theorem 2.8, stated below.
One can show that this operation is well defined and, moreover, D_ε(u, u^{(c,ε)}) ≥ D_ε(u, v) for every v, with D_ε(u, u^{(c,ε)}) = D_ε(u, v) if and only if v = u^{(c,ε)} (Lemma 2.6). If we additionally assume regularity of the cost function c, for instance that c is ω-continuous, or merely that it is bounded, then the (c, ε)-transform is actually a compact operator, respectively from L^1(ρ_1) to C(Y) and from L^∞(ρ_1) to L^p(ρ_2) (Proposition 2.4).

IPFP/Sinkhorn algorithm: as a byproduct of the above approach to the duality, we present an alternative proof of the convergence of the IPFP/Sinkhorn algorithm. The main idea of our proof is that we can rewrite the IPFP iteration by substituting a_n = exp(u_n/ε) and b_n = exp(v_n/ε); in these new variables the iteration becomes

v_n = (u_{n−1})^{(c,ε)},   u_n = (v_n)^{(c,ε)}.

In particular, we can interpret the IPFP in optimal transportation terms: at each step the Sinkhorn iterations (1.4) amount to taking the (c, ε)-transform alternately, and therefore the IPFP can be seen as an alternating maximization procedure for the dual problem. Using the aforementioned compactness, it is then easy to show that u_n and v_n converge to the optimal solution of the Kantorovich dual problem as n → +∞.
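The maximizing property of the (c, ε)-transform and the alternating-maximization viewpoint can be checked numerically on discrete measures. The sketch below works under illustrative assumptions: in particular the explicit form used for D_ε (marginal integrals minus ε times the exponential coupling term) is one standard convention, and additive constants do not affect the comparisons.

```python
import numpy as np
from scipy.special import logsumexp

# Discrete (c, eps)-transform in log domain, and a check that
# v = u^(c,eps) maximizes v -> D_eps(u, v) (Lemma 2.6-type property).
rng = np.random.default_rng(1)
n, m, eps = 25, 35, 0.3
rho1 = rng.uniform(0.5, 1.5, n); rho1 /= rho1.sum()
rho2 = rng.uniform(0.5, 1.5, m); rho2 /= rho2.sum()
c = rng.uniform(0.0, 1.0, (n, m))           # bounded Borel cost

def t_y(u):
    # u^(c,eps)(y) = -eps log \int exp((u(x) - c(x,y))/eps) drho1(x)
    return -eps * logsumexp((u[:, None] - c) / eps + np.log(rho1)[:, None], axis=0)

def t_x(v):
    # the transform in the other direction
    return -eps * logsumexp((v[None, :] - c) / eps + np.log(rho2)[None, :], axis=1)

def D(u, v):
    # one standard form of the dual functional (additive constants omitted)
    coupling = np.exp((u[:, None] + v[None, :] - c) / eps)
    return u @ rho1 + v @ rho2 - eps * (rho1 @ coupling @ rho2)

u = rng.uniform(-1.0, 1.0, n)
v_star = t_y(u)
best = D(u, v_star)                          # D(u, u^(c,eps))
worse = max(D(u, rng.uniform(-1.0, 1.0, m)) for _ in range(20))
improved = D(t_x(v_star), v_star)            # one more alternating step
```

Each alternating step takes a (c, ε)-transform and can only increase D_ε, which is exactly the mechanism behind the convergence proof sketched above.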
Theorem (convergence of the IPFP; see Theorem 3.1). Let (X, d_X) and (Y, d_Y) be Polish metric spaces, ρ_1 ∈ P(X) and ρ_2 ∈ P(Y) be probability measures, and c : X × Y → R be a Borel bounded cost. If (a_n)_{n∈N} and (b_n)_{n∈N} are the IPFP sequences defined in (1.4), then there exist λ_n > 0 such that a_n/λ_n → a in L^p(ρ_1) and λ_n b_n → b in L^p(ρ_2), 1 ≤ p < +∞, where a, b solve the Schrödinger system. In particular, the sequence γ_n = a_n b_n K converges in L^p(ρ_1 ⊗ ρ_2) to the minimizer γ^ε_opt of (1.1), 1 ≤ p < +∞.
We recall that the argument in the original proof of convergence of the Sinkhorn algorithm [35] (also in [21]) relies on endowing the projection cone of the Sinkhorn iterations with the Hilbert metric. The authors show that the Sinkhorn iterates are a contraction in this metric and therefore the procedure converges. This proof has the advantage of automatically providing the rate of convergence of the iterates; however, it does not extend easily to the several-marginal case.
Our approach, instead, can be extended to obtain existence and convergence results also in the multi-marginal setting:

Theorem (convergence of the multi-marginal IPFP; see Theorem 4.7). Let (X_i, d_i) be Polish metric spaces and ρ_i ∈ P(X_i) be probability measures, for i ∈ {1, . . . , N}, and let c : X_1 × · · · × X_N → R be a Borel bounded cost. If (a^n_i)_{n∈N} are the multi-marginal IPFP sequences defined in (4.9), then there exist λ^n_i > 0 with ∏_i λ^n_i = 1 such that a^n_i/λ^n_i → a_i in L^p(ρ_i), where (a_1, . . . , a_N) solve the Schrödinger system. In particular, the sequence γ^n converges to the optimal coupling γ^ε_{N,opt} solving the multi-marginal Schrödinger bridge problem, to be defined in (4.2).

Organization of the paper
The remaining part of the paper is organized as follows: Section 2 contains the main structural results of the paper, namely Proposition 2.4 and Theorem 2.8. In particular, we define the main tools for showing the existence of maximizers of the Entropy-Kantorovich problem and prove regularity results for the Entropy-Kantorovich potentials via the (c, ε)-transform.
In Section 3, we apply the above results to prove convergence of the Sinkhorn algorithm purely via the compactness argument and the alternating maximization procedure (Theorem 3.1) and, in Section 4, we extend the main results of the paper to the multi-marginal Schrödinger bridge problem, including convergence of the Sinkhorn algorithm in the multi-marginal case (Theorem 4.7).

The role of the reference measures
In this subsection we make a technical remark, discussing the role of the reference measures m_1 and m_2. We stress that all the results of the paper extend to a more general entropic optimal transport problem, where the penalization occurs with respect to some reference measures m_1, m_2.
For ε > 0, we may in particular look at the problem

S_ε(ρ_1, ρ_2; m_1, m_2) = inf_{γ ∈ Π(ρ_1, ρ_2)} ε KL(γ|K),   where K = e^{−c/ε} m_1 ⊗ m_2 is the Gibbs kernel.

While having a general reference measure is quite useful in some situations (for example, the Schrödinger problem is set with m_1 = m_2 = L^d), in others it is the opposite, for example when ρ_1, ρ_2 are sums of Dirac masses. In those cases it is much more natural to consider m_1 = ρ_1 and m_2 = ρ_2; notice that with this choice we have S_ε(ρ_1, ρ_2; ρ_1, ρ_2) = OT_ε(ρ_1, ρ_2). Now we will see that in fact OT_ε is a universal reduction for S_ε, meaning that we can always assume m_1 = ρ_1 and m_2 = ρ_2; moreover, whenever one of the two sides is finite, the minimizers of S_ε and OT_ε are the same.
Proof. The key equality in proving this lemma is that, whenever γ ∈ Π(ρ_1, ρ_2), one has

KL(γ|m_1 ⊗ m_2) = KL(γ|ρ_1 ⊗ ρ_2) + KL(ρ_1|m_1) + KL(ρ_2|m_2).   (1.9)

While this equality is clear whenever all the terms are finite, we refer to Lemma 1.6 below for a complete proof covering every case. From this equality we easily get the conclusions.
Proof. First we assume that the right-hand side of (1.9) is finite; in particular this implies γ ≪ ρ_1 ⊗ ρ_2, ρ_1 ≪ m_1 and ρ_2 ≪ m_2. In particular we get γ ≪ m_1 ⊗ m_2 and, denoting g_1 = dρ_1/dm_1 and g_2 = dρ_2/dm_2, we can infer

dγ/d(m_1 ⊗ m_2)(x, y) = dγ/d(ρ_1 ⊗ ρ_2)(x, y) · g_1(x) g_2(y).

We can now compute

KL(γ|m_1 ⊗ m_2) = ∫ log( dγ/d(ρ_1 ⊗ ρ_2) ) dγ + ∫ log g_1(x) dγ + ∫ log g_2(y) dγ = KL(γ|ρ_1 ⊗ ρ_2) + KL(ρ_1|m_1) + KL(ρ_2|m_2),

where in the last step we used the marginal conditions on γ.

We assume now that the left-hand side of (1.9) is finite. Thanks to the fact that KL(F_#µ|F_#ν) ≤ KL(µ|ν) for every measurable map F, we immediately have that KL(ρ_1|m_1) and KL(ρ_2|m_2) are finite, and in particular ρ_1 ≪ m_1 and ρ_2 ≪ m_2. Now let us introduce f = dγ/d(m_1 ⊗ m_2), g_1 = dρ_1/dm_1 and g_2 = dρ_2/dm_2; let us then consider any measurable set A ⊆ X × Y with (ρ_1 ⊗ ρ_2)(A) = 0, and let B = B_x ∪ B_y, where B_x = {g_1 = 0} × Y and B_y = X × {g_2 = 0}. By the marginal conditions, we have γ(B_x) = ρ_1({g_1 = 0}) = 0 and similarly γ(B_y) = 0, which imply γ(B) = 0. In particular we have γ(A) = γ(A \ B) = 0, since (ρ_1 ⊗ ρ_2)(A) = ∫_A g_1 g_2 d(m_1 ⊗ m_2) = 0 and g_1 g_2 > 0 outside B, so that (m_1 ⊗ m_2)(A \ B) = 0 and γ ≪ m_1 ⊗ m_2. This proves that γ ≪ ρ_1 ⊗ ρ_2, and so we can perform the same calculation as before to conclude.
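The decomposition (1.9) can be sanity-checked on discrete measures; the sketch below uses arbitrary illustrative data.

```python
import numpy as np

# Numerical check of KL(gamma | m1 x m2)
#   = KL(gamma | rho1 x rho2) + KL(rho1 | m1) + KL(rho2 | m2)
# for a coupling gamma with marginals rho1, rho2 (discrete, illustrative data).
rng = np.random.default_rng(2)
n, m = 6, 7

def kl(p, q):
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

gamma = rng.uniform(0.1, 1.0, (n, m)); gamma /= gamma.sum()
rho1, rho2 = gamma.sum(axis=1), gamma.sum(axis=0)   # marginals of gamma
m1 = rng.uniform(0.1, 1.0, n); m1 /= m1.sum()       # reference measures
m2 = rng.uniform(0.1, 1.0, m); m2 /= m2.sum()

lhs = kl(gamma, np.outer(m1, m2))
rhs = kl(gamma, np.outer(rho1, rho2)) + kl(rho1, m1) + kl(rho2, m2)
```

The two sides agree up to floating-point error, because the cross terms ∫ log g_1 dγ and ∫ log g_2 dγ reduce to the marginal relative entropies.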

Regularity of Entropic-Potentials and dual problem
In this section we treat the case where c : X × Y → R is a Borel bounded cost; of course, everything extends to the case c ∈ L^∞(ρ_1 ⊗ ρ_2). Some of the results extend naturally to unbounded costs (for example (i), (ii), (v) in Proposition 2.4), but we prefer to keep the setting uniform.

Entropy-Transform and a priori estimates
We start by defining the Entropy-Transform. First, let us define the space L exp ε , which will be the natural space for the dual problem.
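Concretely, one may take the following definition, consistent with the positivity and integrability constraints discussed below (the precise displayed formula is a reconstruction):

```latex
L^{\exp}_{\varepsilon}(X,\rho_1) \;:=\; \Big\{\, u \colon X \to [-\infty,+\infty) \ \text{measurable} \ : \ 0 < \int_X e^{u(x)/\varepsilon}\,\mathrm{d}\rho_1(x) < +\infty \,\Big\}.
```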
For simplicity, we will use the notation L^exp_ε(ρ_1) instead of L^exp_ε(X, ρ_1). Notice that it is possible for u ∈ L^exp_ε(X, ρ_1) to attain the value −∞ on a set of positive measure, but not ρ_1-almost everywhere, because of the positivity constraint ∫_X e^{u/ε} dρ_1 > 0. On the other hand, u ∈ L^exp_ε(X, ρ_1) implies u_+ ∈ L^p(ρ_1) for every p ≥ 1, where u_+(x) := max{u(x), 0} denotes the positive part of u.
Whenever it is clear from the context, we denote v^{(c,ε)} = F^{(c,ε)}(v) and u^{(c,ε)} = F^{(c,ε)}(u), in analogy with the classical c-transform.
Moreover, we get a lower bound for the above quantity using c ≥ −‖c‖_∞. Therefore u^{(c,ε)} ∈ L^exp_ε(ρ_2) and in particular λ_{u^{(c,ε)}} ≤ −λ_u + ‖c‖_∞; the other inequality follows from a similar calculation, and the same holds for v^{(c,ε)}, which proves (ii).
Some of the following properties were already known for the softmax operator: see for example [38], where they are used to obtain a posteriori regularity of the potentials; to our knowledge, however, they had never been used to obtain a priori results.
Further properties of the (c, ε)-transform are cleverly used in [31] in order to obtain a new proof of the Caffarelli contraction theorem [12].
Proof. Of course we have that (ii) implies (i); let us prove directly (ii).
(ii) Let u ∈ L^exp_ε(ρ_1) and y_1, y_2 ∈ Y. We can assume without loss of generality that u^{(c,ε)}(y_1) ≥ u^{(c,ε)}(y_2); in that case the conclusion follows. (iii) This is a direct consequence of Lemma 2.3 (i). (iv) We first prove that F^{(c,ε)} is 1-Lipschitz. In fact, letting u, ũ ∈ L^∞(ρ_1), we can perform, for every y ∈ Y, a calculation very similar to the one in (ii); it then remains to prove part (b) of the criterion of Proposition 5.1.
Let us consider γ = ρ_1 ⊗ ρ_2. Since c ∈ L^∞(γ), by Lusin's theorem, for every σ > 0 there exists N^σ ⊂ X × Y, with γ(N^σ) < σ, such that the restriction of c to (N^σ)^c is uniformly continuous, with modulus of continuity ω_σ. We now mimic what we did in (ii), this time also keeping track of the remainder terms.
For each y ∈ Y, we consider the slice of N^σ above y, N^σ_y = {x ∈ X : (x, y) ∈ N^σ}; then we consider the set N^σ_b of bad y ∈ Y, where the slice N^σ_y is too big. In particular, by definition, if y ∉ N^σ_b we have ρ_1(N^σ_y) ≤ √σ but, thanks to Fubini and the condition γ(N^σ) < σ, we also have ρ_2(N^σ_b) ≤ √σ. Let us now consider y, y′ ∉ N^σ_b, and let us denote X* = X \ (N^σ_y ∪ N^σ_{y′}). Then for every x ∈ X* we have |c(x, y) − c(x, y′)| ≤ ω_σ(d(y, y′)). We can assume without loss of generality that u^{(c,ε)}(y) ≥ u^{(c,ε)}(y′). Now we denote A = 2 e^{2(‖c‖_∞ + ‖u‖_∞)/ε} and, thanks to the fact that e^a + b ≤ e^{a+b} whenever a, b ≥ 0, we obtain the desired estimate. Then (keeping in mind also (iii), and that A depends only on ‖u‖_∞), also part (b) of Proposition 5.1 is satisfied for F^{(c,ε)}(B), for every bounded set B ⊂ L^∞(ρ_1), which grants the compactness of F^{(c,ε)}.
(v) In this case we are assuming that Y is a geodesic space and that there exists K ∈ R such that the K-concavity inequality holds for c along each constant-speed geodesic (y_t)_{t∈[0,1]}. Then, setting f_t(x) = e^{(u(x)−c(x,y_t))/ε}, the K-concavity inequality for c, used together with Hölder's inequality, gives the conclusion.

Remark 2.5 (Entropic c-transform for S_ε(ρ_1, ρ_2; m_1, m_2)). It is possible to define the entropic c-transform also for the Schrödinger problem S_ε(ρ_1, ρ_2; m_1, m_2) with reference measures m_1 ∈ P(X) and m_2 ∈ P(Y). In this case the (c, ε)-transforms with reference measures are in fact the (c, ε)-transforms conjugated by the addition of a function. In particular, we get exactly the same estimates as in Lemma 2.3, up to translating in the appropriate manner. For example, if u ∈ L^exp_ε(m_1), we would then have u^{(c,ε)}_*(y) − ε log(ρ_2(y)) ∈ L^∞(m_2).

Dual problem
Let u ∈ L^exp_ε(ρ_1), v ∈ L^exp_ε(ρ_2) and consider the Entropy-Kantorovich functional D_ε(u, v). What are the minimal assumptions on u, v for D_ε(u, v) to make sense? First of all, if u_+ ∈ L^1(ρ_1) and v_+ ∈ L^1(ρ_2) then D_ε(u, v) < ∞; moreover, in order to have D_ε(u, v) > −∞ we need u ∈ L^exp_ε(ρ_1) and v ∈ L^exp_ε(ρ_2), which is then a natural assumption (since we want to compute the supremum of D_ε).
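One standard explicit form of the Entropy-Kantorovich functional, consistent with the duality (1.6) and with the finiteness discussion above (the additive normalization in ε is a convention and may differ from the one intended here), is:

```latex
D_\varepsilon(u,v) \;=\; \int_X u \,\mathrm{d}\rho_1 \;+\; \int_Y v \,\mathrm{d}\rho_2 \;-\; \varepsilon \int_{X\times Y} e^{\frac{u(x)+v(y)-c(x,y)}{\varepsilon}} \,\mathrm{d}\rho_1\otimes\mathrm{d}\rho_2 .
```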
Theorem 2.8. Let (X, d_X) and (Y, d_Y) be Polish spaces, c : X × Y → R be a Borel bounded cost, ρ_1 ∈ P(X), ρ_2 ∈ P(Y) be probability measures and ε > 0. Consider the problem (2.8). Then the supremum in (2.8) is attained by a unique couple (u_0, v_0) (up to the trivial transformation (u, v) → (u + a, v − a)). In particular we have u_0 ∈ L^∞(ρ_1) and v_0 ∈ L^∞(ρ_2); moreover, we can choose the maximizers such that ‖u_0‖_∞, ‖v_0‖_∞ ≤ (3/2)‖c‖_∞.

Proof. We show that the supremum on the right-hand side of (2.8) is attained. Let (u_n)_{n∈N} ⊂ L^exp_ε(ρ_1) and (v_n)_{n∈N} ⊂ L^exp_ε(ρ_2) be maximizing sequences. Due to Lemma 2.7, we can suppose that u_n ∈ L^∞(ρ_1), v_n ∈ L^∞(ρ_2) and ‖u_n‖_∞, ‖v_n‖_∞ ≤ (3/2)‖c‖_∞. Then by the Banach-Alaoglu theorem there exist subsequences (u_{n_k}) and (v_{n_k}) such that u_{n_k} ⇀ u and v_{n_k} ⇀ v. In particular, u_{n_k} + v_{n_k} − c ⇀ u + v − c. Since t → e^t is a convex function, the exponential term in D_ε is weakly lower semicontinuous, while the linear terms pass to the limit; so (u, v) is a maximizer of D_ε. By construction, we also have u ∈ L^∞(ρ_1) and v ∈ L^∞(ρ_2). Finally, the strict concavity of D_ε and Lemma 2.6 imply that the maximizer is unique and, in particular, v = u^{(c,ε)}.

Corollary 2.9. Let (X, d_X, m_1), (Y, d_Y, m_2) be Polish metric measure spaces, c : X × Y → R be a Borel bounded cost function, and ρ_1 ∈ P(X), ρ_2 ∈ P(Y) be probability measures such that KL(ρ_1|m_1) + KL(ρ_2|m_2) < ∞. Consider the dual functional D̃_ε. Then the supremum

sup { D̃_ε(u, v) : u ∈ L^exp_ε(m_1), v ∈ L^exp_ε(m_2) }

is attained by a unique couple (u_0, v_0).

Proof. The proof follows by the change of variables T : (u, v) → (u − ε log ρ_1, v − ε log ρ_2), which is such that D̃_ε(u, v) = D_ε(T(u, v)) + ε KL(ρ_1|m_1) + ε KL(ρ_2|m_2), together with Theorem 2.8. Alternatively, one can apply the same arguments as in Theorem 2.8, using the entropic c-transform u^{(c,ε)}_{m_1} described in Remark 2.5.
In the following proposition an important concept will be that of bivariate transformation. Given the Gibbs measure K and two nonnegative measurable functions a(x) and b(y), we define the bivariate transformation of K through a and b as

κ(a, b) := a(x) b(y) · K;   (2.9)

this is still a (possibly infinite) measure.
Proof. First of all we can assume γ ≪ K, otherwise the right-hand side would be +∞ and the inequality trivially verified; then, denoting (with a slight abuse of notation) by γ(x, y) the density of γ with respect to K, we conclude using the pointwise inequality

t s + ε t ln t − ε t ≥ −ε e^{−s/ε},   with equality if and only if t = e^{−s/ε}.

Notice in particular that, as we wanted, there is equality if and only if γ = e^{(u(x)+v(y)−c(x,y))/ε} · ρ_1 ⊗ ρ_2 = κ(e^{u/ε}, e^{v/ε}).
Notice that in proving 3 ⇒ 4 we incidentally proved that γ * is the (unique) minimizer.
Finally, we conclude this section by giving a short proof of the duality between (1.1) and (2.8).
Proposition 2.12 (General duality). Let ε > 0, (X, d_X) and (Y, d_Y) be Polish metric spaces, c : X × Y → R be a bounded cost function, and ρ_1 ∈ P(X), ρ_2 ∈ P(Y) be probability measures. Then duality holds:

OT_ε(ρ_1, ρ_2) = max { D_ε(u, v) : u ∈ L^exp_ε(ρ_1), v ∈ L^exp_ε(ρ_2) }.

Proof. From Theorem 2.8 we have the existence of a maximizing pair of potentials (u*, v*), which in particular satisfies condition 1 of Proposition 2.11; this, together with point 4 in Proposition 2.11, proves the duality.
By a similar argument, one can show that duality also holds for the functional S_ε(ρ_1, ρ_2; m_1, m_2).

Corollary 2.13. Let ε > 0, (X, d_X, m_1) and (Y, d_Y, m_2) be Polish metric measure spaces, c : X × Y → R be a bounded cost function, and ρ_1 ∈ P(X), ρ_2 ∈ P(Y) be probability measures. Then duality holds:

S_ε(ρ_1, ρ_2; m_1, m_2) = max { D̃_ε(u, v) : u ∈ L^exp_ε(m_1), v ∈ L^exp_ε(m_2) }.

Convergence of the Sinkhorn / IPFP Algorithm
In this section, we give an alternative proof of the convergence of the Sinkhorn algorithm. The aim of the Iterative Proportional Fitting Procedure (IPFP, also known as the Sinkhorn algorithm) is to construct the measure γ_ε realizing the minimum in (1.1) by alternately matching one marginal of the current guess to the target marginals ρ_1 and ρ_2: this leads to the construction of the IPFP sequences (a_n)_{n∈N} and (b_n)_{n∈N}, defined in (1.4).
We now consider the new variables u_n := ε ln(a_n) and v_n := ε ln(b_n): we can then rewrite the system (1.4) as

v_n(y) = −ε log ∫_X k(x, y) e^{u_{n−1}(x)/ε} dρ_1(x),   u_n(x) = −ε log ∫_Y k(x, y) e^{v_n(y)/ε} dρ_2(y).

In other words, using the (c, ε)-transform and the expression of k given in (1.2), v_n = (u_{n−1})^{(c,ε)} and u_n = (v_n)^{(c,ε)}.
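This change of variables can be verified numerically: on discrete data, the log-domain iteration via (c, ε)-transforms reproduces the kernel-domain scalings exactly. The sketch below uses illustrative data and sizes.

```python
import numpy as np
from scipy.special import logsumexp

# Check that v_n = (u_{n-1})^(c,eps), u_n = (v_n)^(c,eps) matches the
# kernel-domain IPFP via a_n = e^{u_n/eps}, b_n = e^{v_n/eps}.
rng = np.random.default_rng(3)
n, m, eps = 15, 20, 0.2
rho1 = np.full(n, 1.0 / n)
rho2 = np.full(m, 1.0 / m)
c = rng.uniform(0.0, 1.0, (n, m))
k = np.exp(-c / eps)

a = np.ones(n); u = np.zeros(n)             # a_0 = 1  <->  u_0 = 0
for _ in range(50):
    b = 1.0 / (k.T @ (a * rho1))            # kernel domain, as in (1.4)
    a = 1.0 / (k @ (b * rho2))
    # log domain: the same two steps written as (c, eps)-transforms
    v = -eps * logsumexp((u[:, None] - c) / eps + np.log(rho1)[:, None], axis=0)
    u = -eps * logsumexp((v[None, :] - c) / eps + np.log(rho2)[None, :], axis=1)

gap_a = np.max(np.abs(a - np.exp(u / eps)) / np.abs(a))
gap_b = np.max(np.abs(b - np.exp(v / eps)) / np.abs(b))
```

For small ε the log-domain form is the numerically stable one, since `logsumexp` avoids overflow until the final exponentiation.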
Theorem 3.1. Let (X, d_X) and (Y, d_Y) be Polish metric spaces, ρ_1 ∈ P(X) and ρ_2 ∈ P(Y) be probability measures and c : X × Y → R be a Borel bounded cost. If (a_n)_{n∈N} and (b_n)_{n∈N} are the IPFP sequences defined in (1.4), then there exists a sequence of positive real numbers (λ_n)_{n∈N} such that a_n/λ_n → a in L^p(ρ_1) and λ_n b_n → b in L^p(ρ_2), where (a, b) solve the Schrödinger problem. In particular, the sequence γ_n = a_n b_n k, where k is defined in (1.2), converges in L^p(ρ_1 ⊗ ρ_2) to γ^ε_opt, the density of the minimizer of (1.1) with respect to ρ_1 ⊗ ρ_2, for any 1 ≤ p < +∞.
Proof. Let (a_n)_{n∈N} and (b_n)_{n∈N} be the IPFP sequences defined in (1.4). Let us write a_n = e^{u_n/ε}, b_n = e^{v_n/ε}; in these new variables, as noticed above, the iteration can be written with the help of the (c, ε)-transform. Notice that, as soon as n ≥ 2, we have u_n ∈ L^∞(ρ_1) and v_n ∈ L^∞(ρ_2), thanks to the regularizing properties of the (c, ε)-transform proven in Lemma 2.3; moreover, thanks to (2.6) and Proposition 2.12, the sequence D_ε(u_n, v_n) is nondecreasing and bounded above by OT_ε(ρ_1, ρ_2). By the same argument used in the proof of Lemma 2.7, it is then easy to prove that for each n ≥ 2 there exists ℓ_n ∈ R such that ‖u_n − ℓ_n‖_∞, ‖v_n + ℓ_n‖_∞ ≤ (3/2)‖c‖_∞. Now, thanks to Proposition 2.4, the sequences u_n − ℓ_n and v_n + ℓ_n are precompact in every L^p, 1 ≤ p < ∞; let us consider any limit point (u, v). Then we have a subsequence (u_{n_k}, v_{n_k}) such that u_{n_k} → u, v_{n_k} → v in L^p and u_{n_k+1} = (v_{n_k})^{(c,ε)} (or the opposite). Using the continuity in L^p of the (c, ε)-transform, and the fact that a nondecreasing and bounded sequence has vanishing increments, we obtain u = v^{(c,ε)}. Analogously, we obtain v = u^{(c,ε)} by the same calculation with the potentials (u_{n_k+2}, v_{n_k+2}). Now we can use Proposition 2.11: the implication 2 ⇒ 1 proves that (u, v) is a maximizer. In particular a = e^{u/ε}, b = e^{v/ε} solve the Schrödinger system and, taking λ_n = e^{ℓ_n/ε}, we get the convergence result for a_n and b_n, using that the exponential is Lipschitz on bounded sets.
In order to prove the convergence of the plans, it is sufficient to note that we get for free u_n + v_n → u + v in L^p(ρ_1 ⊗ ρ_2), since the translations cancel. Again, the fact that the exponential is Lipschitz on bounded sets, together with the boundedness of k, lets us conclude that in fact γ_n → γ^ε_opt in L^p(ρ_1 ⊗ ρ_2) for every 1 ≤ p < ∞.
Remark 3.2. Notice that, as soon as we have additional hypotheses on the smoothness of the cost function c, we can use precompactness of the translated sequences u_n − ℓ_n and v_n + ℓ_n (ℓ_n being the normalizing translations) in a stronger topology, obtaining stronger convergence. For example, if c is uniformly continuous, we get uniform convergence instead of strong L^p convergence.

Multi-marginal Schrödinger Problem
In this section we generalize the results obtained previously to the Schrödinger problem with more than two marginals, including a proof of convergence of the Sinkhorn algorithm in the several-marginal case.
We point out that the entropic regularization of the multi-marginal transport problem leads to a problem of multi-dimensional matrix scaling [35,59]. An important example in this setting is the entropy-regularized Wasserstein barycenter introduced by M. Agueh and G. Carlier in [1]. The Wasserstein barycenter defines a non-linear interpolation between several probability measures, generalizing the Euclidean barycenter, and turns out to be equivalent to a multi-marginal problem with the Gangbo-Święch cost [37], that is, c(x_1, . . . , x_N) = Σ_{i<j} |x_i − x_j|^2. In the next section we extend to the multi-marginal setting the notions and properties of the entropic c-transform developed in Section 2. As a consequence, we generalize the proof of convergence of the IPFP.

Entropy-Transform
Analogously to definitions (2.1) and (2.2) in Section 2.1, given potentials u_j ∈ L^exp_ε(ρ_j) for j ≠ i, we define the entropic c-transform in the i-th variable as

u_i^{(c,ε)}(x_i) = −ε log ( ∫_{Π_{j≠i} X_j} e^{(Σ_{j≠i} u_j(x_j) − c(x_1,...,x_N))/ε} ⊗_{j≠i} dρ_j ).

There is also the possibility of reducing ourselves to the case N = 2: notice that if one considers the spaces X_i and Y_i = Π_{j≠i} X_j, then c is also a natural cost function on X_i × Y_i. We can then consider ρ_i as a measure on X_i and ⊗_{j≠i} ρ_j as a measure on Y_i. In this way we are able to construct an entropic c-transform F^{(c,ε)} associated to this 2-marginal problem, and it is clear that it coincides with the transform defined above. The following lemma extends Lemma 2.3 to the multi-marginal setting. We omit the proof since it follows by similar arguments.

Entropy-Kantorovich Duality
We introduce the dual functional D^N_ε : L^exp_ε(ρ_1) × · · · × L^exp_ε(ρ_N) → R. In the sequel we will use the invariance by translation of the dual problem, and thus we introduce the following projection operator.

Lemma 4.3. Consider the operator P normalizing the translations. Then the following properties hold.

Proof. (i) In order to prove D^N_ε(P(u)) = D^N_ε(u), we first observe that both the linear part and the exponential part of D^N_ε are unchanged by translations summing to zero. (ii) The inequality is non-trivial only if u_i ∈ L^∞(ρ_i). In this case we obviously have inf u_i ≤ λ_{u_i} ≤ sup u_i and in particular ‖u_i − λ_{u_i}‖_∞ ≤ osc(u_i). This already proves the bound for i < N; the case i = N follows analogously.

This projection operator allows us to generalize Lemma 2.7:

Lemma 4.4. Let u_i ∈ L^exp_ε(ρ_i) for every i = 1, . . . , N. Then there exist u*_i ∈ L^exp_ε(ρ_i), i = 1, . . . , N, such that D^N_ε(u*) ≥ D^N_ε(u) and ‖u*_i‖_∞ ≤ 3‖c‖_∞ for every i = 1, . . . , N.
Proof. Let us construct a sequence of potentials by successive (c, ε)-transforms and consider u* = P(u^N). First of all we notice that, by the multi-marginal analogue of Lemma 2.6, each transform does not decrease D^N_ε, so that D^N_ε(u*) = D^N_ε(u^N) ≥ D^N_ε(u). It is then clear by construction that for every i = 1, . . . , N we have u^N_i = u^i_i and in particular, by Lemma 4.2 (iv), osc(u^N_i) ≤ 2‖c‖_∞. Moreover, thanks to (4.4), it is easy to see that ‖u*_i‖_∞ ≤ 3‖c‖_∞.

Similarly to Theorem 2.8 and Propositions 2.11 and 2.12, the next theorem and the following proposition state the existence of a maximizer and the Entropy-Kantorovich duality in the multi-marginal case, along with the complementarity conditions. Since the proofs follow the same lines as in the case N = 2, without significant changes, we omit them.
Theorem 4.5. For every i ∈ {1, . . . , N}, let (X_i, d_i) be a Polish metric space, ρ_i ∈ P(X_i) be a probability measure, and let c : X_1 × · · · × X_N → R be a bounded cost function. Then for every ε > 0: (i) the dual functional D^N_ε is well defined on its domain; (ii) the supremum is attained, up to trivial transformations, by a unique N-tuple (u^0_1, . . . , u^0_N) and in particular we have u^0_i ∈ L^∞(ρ_i) for all i ∈ {1, . . . , N}; moreover, if we consider γ^{0,N} = e^{(Σ_i u^0_i(x_i) − c)/ε} ρ_1 ⊗ · · · ⊗ ρ_N, then γ^{0,N} is the minimizer of (4.2); (iii) duality holds.
Proposition 4.6 (Equivalence and complementarity conditions). Let ε > 0 and, for every i ∈ {1, . . . , N}, let (X_i, d_i) be a Polish metric space, ρ_i ∈ P(X_i) be a probability measure, and c : X_1 × · · · × X_N → R be a bounded cost function. Then, given u*_i ∈ L^exp_ε(ρ_i) for every i = 1, . . . , N, the conditions 1-3 below are equivalent; moreover, in those cases γ*, as defined in 3, is also the (unique) minimizer of problem (4.2). Condition 3 in Proposition 4.6 corresponds to the multi-marginal Schrödinger system studied in [18]. Our approach, being purely variational, allows us to study the convergence of the Sinkhorn algorithm in the several-marginal case in a way similar to the previous section.

Convergence of the IPFP / Sinkhorn algorithm for several marginals
The goal of this subsection is to prove the convergence of the IPFP/Sinkhorn algorithm in the multi-marginal setting. Analogously to (1.4), we define recursively the sequences (a^n_j)_{n∈N}, j ∈ {1, . . . , N}: starting from a^0_j ≡ 1 and updating the indices cyclically, at each step one factor is updated so as to match the corresponding marginal,

a^n_j(x_j) = 1 / ( ∫_{Π_{i≠j} X_i} k(x_1, . . . , x_N) Π_{i≠j} a^n_i(x_i) ⊗_{i≠j} dρ_i(x_i) ),   ∀n ∈ N.   (4.9)

Also here, by writing a^n_j = exp(u^n_j/ε) for all j ∈ {1, . . . , N}, one can rewrite the IPFP sequences (4.9) in terms of entropic (c, ε)-transforms. Then the proof of convergence of the IPFP in the multi-marginal case follows a method similar to the one used in Theorem 3.1.
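The cyclic multi-marginal iteration can be sketched for N = 3 discrete marginals as follows, under illustrative assumptions (the pairwise quadratic cost, sizes and data are arbitrary choices; the update matches one marginal per step, in the spirit of (4.9)).

```python
import numpy as np

# Multi-marginal IPFP sketch, N = 3: one scaling factor is updated per step,
# cyclically, so that the corresponding marginal of the coupling matches rho_j.
rng = np.random.default_rng(4)
N, sizes, eps = 3, (8, 9, 10), 0.5
xs = [rng.uniform(0, 1, s) for s in sizes]
rhos = [np.full(s, 1.0 / s) for s in sizes]

X0, X1, X2 = np.meshgrid(xs[0], xs[1], xs[2], indexing="ij")
cost = (X0 - X1) ** 2 + (X0 - X2) ** 2 + (X1 - X2) ** 2   # pairwise quadratic cost
K = (np.exp(-cost / eps)
     * rhos[0][:, None, None] * rhos[1][None, :, None] * rhos[2][None, None, :])

a = [np.ones(s) for s in sizes]
for it in range(1500):
    j = it % N                                             # cyclic update
    G = K * a[0][:, None, None] * a[1][None, :, None] * a[2][None, None, :]
    other = tuple(i for i in range(N) if i != j)
    a[j] = a[j] * rhos[j] / G.sum(axis=other)              # match j-th marginal

G = K * a[0][:, None, None] * a[1][None, :, None] * a[2][None, None, :]
errs = [np.abs(G.sum(axis=tuple(i for i in range(N) if i != j)) - rhos[j]).max()
        for j in range(N)]
```

Each update fixes one marginal exactly; convergence means that all N marginal residuals vanish simultaneously, which is the content of Theorem 4.7.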
By the boundedness of ‖v^n_i‖_∞ and the compactness in Lemma 4.2 (iv), there exists a subsequence k_n such that v^{k_n}_i converges in L^p to some v_i for every i = 1, . . . , N; by the pigeonhole principle, at least one residue class modulo N is attained infinitely often by the sequence k_n, and without loss of generality we suppose that this residue class is 0, up to restricting to the subsequence such that k_n ≡ 0 (mod N).

Remark on the multi-marginal problem S_ε(ρ_1, . . . , ρ_N; m_1, . . . , m_N): more generally, we could also consider the multi-marginal Schrödinger problem with reference measures m_i ∈ P(X_i), i = 1, . . . , N. For simplicity, we denote ρ = (ρ_1, . . . , ρ_N) and m = (m_1, . . . , m_N). Then the functional S_ε(ρ; m) is defined by S_ε(ρ; m) = min_{γ ∈ Π_N(ρ_1, . . . , ρ_N)} ε KL(γ|m_1 ⊗ · · · ⊗ m_N).
Analogously to the two-marginal case, the duality results, the existence and regularity of entropic potentials, as well as the convergence of the Sinkhorn algorithm, can be extended to this case. We omit the details, since the proofs follow by similar arguments.
Proof. Let us fix ε > 0 and consider a sequence σ_n → 0 such that Σ_{n=1}^∞ µ(N^{σ_n}) ≤ ε; then define ω_n = ω_{σ_n} and N_ε := ∪_n N^{σ_n}; in particular we have µ(N_ε) ≤ ε and

|f(x) − f(x′)| ≤ ω_n(d(x, x′)) + β_{σ_n}   ∀x, x′ ∉ N_ε, ∀n ∈ N.   (5.1)

Let us define ω_ε(t) = inf_n {ω_n(t) + β_{σ_n}}: by (5.1), f is ω_ε-continuous outside N_ε. We can verify that ω_ε is a non-degenerate modulus of continuity: it is nondecreasing, being an infimum of nondecreasing functions. Moreover, for every ε′ > 0 we can choose n big enough so that β_{σ_n} < ε′/2 and then t small enough so that ω_n(t) < ε′/2; in this way we have ω_ε(t) ≤ ω_n(t) + β_{σ_n} < ε′. In particular ω_ε(t) → 0 as t → 0.

Now we conclude by a diagonal argument: let us consider a sequence (f^0_n)_{n∈N} ⊆ F and a sequence ε_k → 0. We want to find a subsequence converging strongly in L^p. We iteratively extract a subsequence (f^k_n) of (f^{k−1}_n) converging uniformly outside N_{ε_k} (thanks to Ascoli-Arzelà) to some function f^k, which is defined only outside N_{ε_k}. Then let us define f(x) := f^k(x) whenever x ∉ N_{ε_k}. First of all, f is well defined: if x ∉ N_{ε_k} and x ∉ N_{ε_j} with j > k, then f^k_n(x) → f^k(x) but, since (f^j_n) is a subsequence of (f^k_n), we also have f^j_n(x) → f^k(x); however, by definition f^j_n(x) → f^j(x), and so f^j(x) = f^k(x). Moreover, it is clear that ‖f‖_∞ ≤ M, since this is true for every f^0_n thanks to property (a).

Now we consider the diagonal sequence g_n = f^n_n, which is a subsequence of (f^0_n). Let us fix ε > 0 and choose k such that ε_k < ε^p; then let n_0 > k be such that |f^k_n − f| ≤ ε on X \ N_{ε_k} for every n ≥ n_0. For n large enough, g_n = f^k_{n′} for some n′ ≥ n_0, and in particular

∫_X |g_n − f|^p dµ ≤ (2M)^p µ(N_{ε_k}) + ε^p µ(X) ≤ ε_k (2M)^p + ε^p µ(X) ≤ ε^p (2^p M^p + 1).

In particular g_n → f in L^p, and so we are done.