1 Introduction

We consider the following multi-marginal entropy-transport problem

$$\begin{aligned} I_{\varepsilon }[\rho ] = \inf _{\gamma \in \Pi ^{\mathrm {sym}}_N(\rho )}( C_0[\gamma ] + \varepsilon E[\gamma ]), \end{aligned}$$
(1.1)

where \(C_0[\gamma ] = \int _{X^N} c\,{\mathrm d}\gamma \) is the transportation cost associated with a cost function c, \(E[\gamma ]\) is the entropy, and \(\varepsilon \ge 0\) is a parameter; see Sect. 2 for details. We consider the setting where \((X,d,{\mathfrak {m}})\) is a Polish measure space and \(\rho {\mathfrak {m}}\in {\mathcal {P}}^{ac}(X)\) is a probability measure that is absolutely continuous with respect to the reference measure \({\mathfrak {m}}\). An element \(\gamma \in \Pi ^{\mathrm {sym}}_N(\rho )\) is called a symmetric coupling (or transport plan); that is, it is a symmetric probability measure on \(X^N\) having all marginals equal to \(\rho {\mathfrak {m}}\).

We are interested in a class of repulsive cost functions \(c:X^N\rightarrow {\mathbb {R}}\cup \lbrace +\infty \rbrace \) of the form

$$\begin{aligned} c(x_1,\ldots , x_N)=\sum _{1\le i<j\le N}f(d(x_i,x_j)),\quad \text { for all } \, (x_1,\ldots ,x_N)\in X^N. \end{aligned}$$

We assume \(f:]0,\infty [\rightarrow {\mathbb {R}}\) to be a continuous, decreasing function that tends to \(+\infty \) as \(d(x_i,x_j)\rightarrow 0\). Examples of such cost functions include the Coulomb cost \(f(z) = 1/\vert z\vert \), the Riesz cost \(f(z) = 1/\vert z\vert ^s\) with \(\max \lbrace n-2, 0\rbrace \le s\le n\) (in \({\mathbb {R}}^n\)), and the logarithmic cost \(f(z) = -\log (\vert z\vert )\). We observe that when \(\varepsilon =0\), this entropy-transport problem reduces to the classical multi-marginal optimal transport problem with repulsive costs [4, 6, 7, 12].

The motivation of this paper comes from both theory and numerics. For repulsive cost functions, the entropy term in (1.1) plays the role of a regularizer allowing the numerical computation of a solution \(\gamma \) of the multi-marginal optimal transport problem \(I_{0}[\rho ]\), see [2]. Numerical experiments suggest that when the regularization parameter \(\varepsilon \) goes to 0, the minimizer \(\gamma _{\varepsilon }\) converges to a minimizer of \(I_{0}[\rho ]\) having minimal entropy among the minimizers of \(I_{0}[\rho ]\).

From a theoretical viewpoint, this type of functional has direct relevance in Density Functional Theory. By choosing the parameter \(\varepsilon \) carefully, the functional (1.1) provides a lower bound for the Hohenberg–Kohn functional in Density Functional Theory [15, 24, 27]. This is an immediate consequence of the logarithmic Sobolev inequality.

The entropy-transport problem has appeared previously in the literature in the attractive case, in particular when \(c(x_1,x_2) = d(x_1,x_2)^2\). Below we briefly mention some connections of the entropy-transport problem with other fields and point out its relevance in the Coulomb case.

Brief comments on some applications of the entropy-transport

Optimal transport and Sinkhorn algorithm: The entropy-transport problem (1.1) was introduced by Cuturi [9] in order to compute numerically the optimal transport plan for the squared distance cost in the 2-marginal case via the Sinkhorn algorithm. Due to its reasonable computational cost, it has been applied to a wide range of problems in various research areas, including Information Theory, Computer Graphics, Statistical Inference, Machine Learning, and Mean-Field Games. The entropic regularization method was also considered in the (attractive) multi-marginal case in the so-called barycenter problem introduced by Agueh and Carlier [1] (see also [5, 11]) and in numerical methods for the time discretization of Brenier’s relaxed formulation of the incompressible Euler equation [3]. For a thorough presentation of the computational aspects we refer to Cuturi and Peyré’s book [25].
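To make the role of the entropic regularization concrete, here is a minimal numerical sketch (not taken from the cited sources) of the Sinkhorn iteration for the discrete 2-marginal problem; the grid, cost matrix and regularization strength are illustrative choices.

```python
import numpy as np

def sinkhorn(mu, nu, C, eps, n_iter=500):
    """Minimal Sinkhorn iteration for discrete 2-marginal entropic OT.

    mu, nu : marginal weights (1-D arrays summing to 1)
    C      : cost matrix; eps : regularization strength.
    Returns the entropic-regularized coupling gamma."""
    K = np.exp(-C / eps)                # Gibbs kernel
    u = np.ones_like(mu)
    v = np.ones_like(nu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)              # enforce the second marginal
        u = mu / (K @ v)                # enforce the first marginal
    return u[:, None] * K * v[None, :]

# toy example: squared-distance cost on a uniform grid
x = np.linspace(0.0, 1.0, 50)
C = (x[:, None] - x[None, :]) ** 2
mu = np.full(50, 1.0 / 50)
nu = np.full(50, 1.0 / 50)
gamma = sinkhorn(mu, nu, C, eps=0.1)
# after the final u-update the first marginal is matched exactly
assert np.allclose(gamma.sum(axis=1), mu)
```

This is the plain scaling scheme; practical libraries such as POT implement refined variants (for instance log-domain stabilization for small \(\varepsilon \)).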

Second-order calculus on RCD spaces: Gigli and Tamanini [8, 17] studied the entropic-transport problem on a class of metric spaces with (Riemannian) Ricci curvature bounded from below (2-marginals case, \(c(x_1,x_2) = d(x_1,x_2)^2\)). The entropic regularization procedure was crucial for establishing a second-order differential structure in that setting.

Schrödinger problem: In 1926, E. Schrödinger introduced the (linear) Schrödinger equations describing the non-relativistic evolution of a single particle in an electric field with potential energy, and also established an equivalence between such equations and a system of diffusion equations [26]. Roughly speaking, the variational problem (see (1.1) with \(X = C([0,1],{\mathbb {R}}^d)\) and \(N=2\)) arises in Schrödinger’s manuscript while studying the limit \(k\rightarrow \infty \) (\(N=2\)) of the empirical measures associated to the evolution of k i.i.d. Brownian motions. We refer the reader to Léonard’s survey [21] for technical details and historical notes.

Lower bound on the Hohenberg–Kohn functional in density functional theory: This is the particular case where the entropy-transport problem with Coulomb cost comes into play. It has been shown in [24, 27] that the functional (1.1) provides a lower bound for computing the ground state energy of the Hohenberg–Kohn functional [4, 6, 7, 12, 22]. Below we give a brief description of the result. Notice that in this context \(X = {\mathbb {R}}^d\) and \({\mathfrak {m}}\) is the Lebesgue measure on \({\mathbb {R}}^d\).

Assume that \(\gamma \in \Pi _N(\rho )\) is such that \(\sqrt{\gamma } \in H^1({\mathbb {R}}^{dN})\). This is the case, for example, when \(\gamma (x_1,\ldots ,x_N) = \vert \psi (x_1,\ldots ,x_N)\vert ^2\), where \(\psi \in H^1({\mathbb {R}}^{dN})\) is a ground-state wave function solving the N-electron Schrödinger equation (see [6, 7, 12, 15, 27] for details). Then, we can define the Hohenberg–Kohn functional by

$$\begin{aligned}&{\tilde{F}}_{\hbar }^{HK}[\rho ] \\&\quad = \inf _{\gamma \in \Pi _N(\rho ), \sqrt{\gamma } \in H^1({\mathbb {R}}^{dN})}\bigg \lbrace \dfrac{\hbar ^2}{2}\int _{{\mathbb {R}}^{dN}}\vert \nabla \sqrt{\gamma }\vert ^2 dx_1\ldots dx_N + \int _{{\mathbb {R}}^{dN}}\sum _{1\le i<j\le N}\dfrac{1}{\vert x_i-x_j\vert }d\gamma \bigg \rbrace . \end{aligned}$$

Now, as a consequence of the logarithmic Sobolev inequality for the Lebesgue measure [18], the following result holds: if \(\rho {\mathcal {L}}^d\in {\mathcal {P}}({\mathbb {R}}^d)\) and \(\sqrt{\gamma }\in H^1({\mathbb {R}}^{dN})\) then

$$\begin{aligned} C_{\varepsilon }[\rho ] \le {\tilde{F}}_{\hbar }^{HK}[\rho ], \quad \text { with } \varepsilon = \pi \hbar ^2/2. \end{aligned}$$
Fig. 1

The dependence of the minimizer of the entropic-transport problem (1.1) on the entropic parameter \(\varepsilon \) for the one-dimensional Coulomb cost, \(N=2\) and \(\rho \sim N(0,5)\). The pictures show part of the support of the optimal coupling \(\gamma _{\varepsilon }\) around the origin. From left to right: \(\varepsilon = 10^4,10^{-2},10^{-3},10^{-4},10^{-5}\)

1.1 Examples of optimal entropy couplings

Let us present some computational examples of minimizers of \(I_\varepsilon [\rho ]\) illustrating the role of the parameter \(\varepsilon \). Before this, we recall a result on the characterization of minimizers in the one-dimensional case [10]. In particular, it shows that the minimizer of \(I_0[\rho ]\) is concentrated on finitely many graphs and is thus singular with respect to the product reference measure.

Theorem 1.1

[10] Let \(\mu \in {\mathcal {P}}({\mathbb {R}})\) be an absolutely continuous probability measure and let \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) be a strictly convex, non-increasing function bounded from below. Then there exists a unique optimal symmetric plan \(\gamma \in {\Pi _N^{\mathrm {sym}} (\mu )}\) that solves

$$\begin{aligned} \min _{ \gamma \in \Pi _N^{\mathrm {sym}}(\mu ) } \int _{{\mathbb {R}}^{N} } \sum _{1 \le i < j \le N} f(|x_j-x_i|) \, {\mathrm d}\gamma . \end{aligned}$$

Moreover, this plan is induced by an optimal cyclical map T, that is, \(\gamma _{\mathrm {sym}}=\left( \gamma _T\right) ^S\), where \(\gamma _T=(\mathrm {id},T,T^{(2)} , \ldots , T^{(N-1)})_{\sharp } \mu \). An explicit optimal cyclical map is

$$\begin{aligned} T(x) ={\left\{ \begin{array}{ll} F_{\mu }^{-1} (F_{\mu }(x) + 1/N) \qquad &{} \text { if }F_{\mu }(x) \le (N-1)/N \\ F_{\mu }^{-1} ( F_{\mu }(x) + 1/N - 1 ) &{} \text { otherwise.} \end{array}\right. } \end{aligned}$$

Here \(F_{\mu }(x)=\mu (( -\infty , x])\) is the distribution function of \(\mu \), and \(F_{\mu }^{-1}\) is its lower semicontinuous left inverse.
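In an explicit setting the cyclical map of Theorem 1.1 can be sketched directly: it shifts the quantile level \(F_\mu (x)\) by \(1/N\), wrapping around when the level exceeds 1. The helper `cyclical_map` below and the uniform-measure sanity check are our own illustrative choices.

```python
def cyclical_map(F, F_inv, N):
    """Optimal cyclical map of Theorem 1.1: shift the quantile level
    F(x) by 1/N, wrapping around when it exceeds 1."""
    def T(x):
        q = F(x) + 1.0 / N
        return F_inv(q) if q <= 1.0 else F_inv(q - 1.0)
    return T

# sanity check with mu = uniform([0,1]), where F(x) = x and F_inv(q) = q;
# then T is a rotation by 1/N, and iterating it N times returns to x
N = 3
T = cyclical_map(lambda x: x, lambda q: q, N)
y = 0.9
for _ in range(N):
    y = T(y)
assert abs(y - 0.9) < 1e-12
```

The wrap-around makes T a cyclical map: \(T^{(N)} = \mathrm {id}\) \(\mu \)-almost everywhere, as required for the plan \((\mathrm {id},T,\ldots ,T^{(N-1)})_{\sharp }\mu \) to be symmetrizable.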

1.1.1 One-dimensional entropic-transport with Coulomb cost and a Gaussian measure

Let \(\rho \) be the normal distribution on the real line with zero mean and standard deviation \(\sigma = 5\). We compute numerically the solution of the entropic-transport problem with Coulomb cost on the real line using the Sinkhorn algorithm [9]. Notice that by Theorem 1.1, we know that the minimizer of \(I_0[\rho ]\) is concentrated on a graph. See Fig. 1 for an illustration of the computational results. Our code is based on the Python implementation available in the POT library [14].
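A self-contained sketch of this experiment (independent of POT; grid size, truncation and \(\varepsilon \) are illustrative choices) runs the Sinkhorn scaling with the Gibbs kernel \(e^{-c/\varepsilon }\) for the Coulomb cost; setting the kernel to zero on the diagonal encodes the value \(c=+\infty \) there.

```python
import numpy as np

# discretize rho ~ N(0, 5^2) on a grid (truncated to [-15, 15])
n, eps = 100, 0.5
x = np.linspace(-15.0, 15.0, n)
rho = np.exp(-x**2 / (2.0 * 5.0**2))
rho /= rho.sum()

# Gibbs kernel for the Coulomb cost c(x, y) = 1/|x - y|;
# the zero diagonal encodes the +infinity cost on the diagonal
diff = np.abs(x[:, None] - x[None, :])
K = np.zeros((n, n))
off = diff > 0
K[off] = np.exp(-1.0 / (eps * diff[off]))

# Sinkhorn scaling for the 2-marginal problem
u = np.ones(n)
v = np.ones(n)
for _ in range(5000):
    v = rho / (K.T @ u)
    u = rho / (K @ v)
gamma = u[:, None] * K * v[None, :]

# the optimal coupling is repulsive: it carries no mass on the diagonal
assert gamma.diagonal().sum() == 0.0
```

Plotting the large entries of `gamma` reproduces, qualitatively, the supports shown in Fig. 1: as \(\varepsilon \) decreases, the coupling concentrates on the graphs described by Theorem 1.1.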

1.2 Organization of the paper

In Sect. 2, we introduce the setting and study sufficient conditions for the existence of minimizers for the entropy-transport problem (1.1). Section 3 is devoted to the proof of the \(\Gamma \)-convergence of the entropic-transport functional \(C_{\varepsilon }[\gamma ]\) to the multi-marginal optimal transport cost with repulsive costs \(C_0[\gamma ]\). In Sect. 4, we study the Kantorovich duality for the entropic-transport problem.

1.3 The strategy of the main proof and some technical remarks

The main result of this paper is Theorem 3.1, in which we prove the \(\Gamma \)-convergence of the entropic-regularized functional \(C_{\varepsilon }[\gamma ]\) to \(C_0[\gamma ]\). The technical difficulty in dealing with the \(\Gamma \)-convergence comes from the fact that while the entropic part \(E[\gamma ]\) favours minimizers \(\gamma \) that are as spread out as possible with respect to \({\mathfrak {m}}\), a minimizer of the cost \(C_{0}[\gamma ]\) can be very singular and have infinite entropy.

We divide the proof into two parts. Part (I), the \(\liminf \)-inequality, follows essentially from the lower semicontinuity of the costs \(C_0[\gamma ]\) and \(C_{\varepsilon }[\gamma ]\), which is obtained from the assumption \(\rho \log \rho \in L_{\mathfrak {m}}^1(X)\) on the marginal measure \(\rho {\mathfrak {m}}\), giving a lower bound on the entropy. Part (II), the \(\limsup \)-inequality, is more involved. In Sect. 3.2, we construct a block approximation \(\gamma '_n\) of a coupling \(\gamma \) with \(C_{0}[\gamma ] <+\infty \). This construction is done in several steps, since we need a competitor \(\gamma '_n\) such that \(E[\gamma '_n] <\infty \) and \(\gamma '_n \in \Pi ^{\mathrm {sym}}_N(\rho )\). The main idea and the rigorous construction are given in Sect. 3.2.

Furthermore, we point out that our construction can deal with the case when the space X is a domain in \({\mathbb {R}}^d\), answering a question raised in [3]. There, the \(\Gamma \)-convergence was proven using convolutions, an approach that does not seem easy to implement on domains or on general metric spaces.

Related works: A proof of the \(\Gamma \)-convergence of (1.1) to the Monge–Kantorovich problem for \(c(x,y) = d(x,y)^p\) first appeared in [20, 23] via probabilistic methods. In [5], G. Carlier, V. Duval, G. Peyré and B. Schmitzer provided an alternative, more analytical proof, carrying out a similar block approximation procedure for the two-marginal squared-distance cost in Euclidean space and for the Wasserstein barycenter problem.

2 The entropy-regularized repulsive costs

Let \((X,d)\) be a Polish space and \({\mathfrak {m}}\) be a reference measure on X. We denote by \({\mathcal {P}}(X)\) the set of Borel probability measures on X, and by \({\mathcal {P}}^{ac}(X)\) the set of Borel probability measures on X that are absolutely continuous with respect to \({\mathfrak {m}}\). We denote by \({\mathfrak {m}}_{N}\) the product measure \({\mathfrak {m}}\otimes {\mathfrak {m}}\otimes \cdots \otimes {\mathfrak {m}}\). This is the reference measure we use on the product space \(X^N\). On \(X^N\) we use the sup-metric, which we denote by \(d_N\).

The class of cost functions \(c:X^N\rightarrow {\mathbb {R}}\cup \{+\infty \}\) of our interest is given by functions of the form

$$\begin{aligned} c(x_1,\ldots , x_N)=\sum _{1\le i<j\le N}f(d(x_i,x_j)),\quad \text { for all }(x_1,\ldots ,x_N)\in X^N, \end{aligned}$$

where \(f:[0,\infty [\rightarrow {\mathbb {R}}\cup \{+\infty \}\) satisfies the following conditions

$$\begin{aligned}&f|_{]0,\infty [}\text { is continuous, decreasing and } \end{aligned}$$
(F1)
$$\begin{aligned}&\lim _{t\rightarrow 0+}f(t)=+\infty . \end{aligned}$$
(F2)

Above and from now on, we denote by \((x_1,\ldots , x_N)\) points in \(X^N\), so \(x_i\in X\) for each i.

We denote by

$$\begin{aligned} \Pi _N(\rho )=\left\{ \gamma \in {\mathcal {P}}(X^N)~|~\mathtt {pr}^i_\sharp \gamma =\rho \text { for all }i\in \{1,\ldots ,N\}\right\} \end{aligned}$$

the set of couplings or transport plans, where \(\mathtt {pr}^i\) is the projection

$$\begin{aligned} \mathtt {pr}^i(x_1,\ldots ,x_i,\ldots ,x_N)=x_i~~~\text {for all }(x_1,\ldots , x_i,\ldots ,x_N)\in X^N. \end{aligned}$$

A measure \(\gamma \in {\mathcal {P}}(X^N)\) is symmetric if

$$\begin{aligned} \int _{X^N}\phi (x_1,\ldots ,x_N)\,{\mathrm d}\gamma = \int _{X^N} \phi (\overline{\sigma }(x_1,\ldots ,x_N))\,{\mathrm d}\gamma , \text { for all } \phi \in {\mathcal {C}}(X^N) \end{aligned}$$

for all permutations \(\overline{\sigma }\) of the N symbols \((x_1,\ldots , x_N)\). We denote by \({\mathcal {P}}^{\mathrm {sym}}(X^N)\) the set of symmetric probability measures in \(X^N\), and by

$$\begin{aligned} \Pi ^{\mathrm {sym}}_N(\rho ) := \Pi _N(\rho )\cap {\mathcal {P}}^{\mathrm {sym}}(X^N) \end{aligned}$$

the set of symmetric couplings of \(\rho \).

Let us also introduce the notation for symmetrising measures. If \(\gamma \) is a Borel measure on \(X^N\), we denote by \(\gamma ^S\) the symmetrized measure

$$\begin{aligned} \gamma ^S:=\frac{1}{N!}\sum _{\sigma \in {\mathcal {S}}_N}\sigma _\sharp \gamma , \end{aligned}$$

where \({\mathcal {S}}_N\) is the set of permutations of the N coordinates \((x_1,\ldots , x_N)\).
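In a discrete setting, where a coupling is an N-dimensional array and permuting the coordinates corresponds to transposing axes, the symmetrization above can be sketched as follows (the helper and the toy data are our own illustrative choices):

```python
import numpy as np
from itertools import permutations

def symmetrize(gamma):
    """gamma^S = (1/N!) * sum over coordinate permutations of the
    pushforwards of gamma, for a discrete N-marginal coupling."""
    N = gamma.ndim
    perms = list(permutations(range(N)))
    return sum(np.transpose(gamma, p) for p in perms) / len(perms)

# a non-symmetric 3-marginal coupling on a 2-point space
rng = np.random.default_rng(0)
g = rng.random((2, 2, 2))
g /= g.sum()
gs = symmetrize(g)

# gamma^S is invariant under every coordinate permutation,
# so in particular all of its marginals coincide
assert np.allclose(gs, np.transpose(gs, (1, 0, 2)))
assert np.allclose(gs.sum(axis=(1, 2)), gs.sum(axis=(0, 2)))
```

This is exactly the operation used in Theorem 1.1 to pass from the plan \(\gamma _T\) induced by the cyclical map to the symmetric plan \((\gamma _T)^S\).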

We define the functional \(C_0[\gamma ]\) to be the cost related to the coupling \(\gamma \)

$$\begin{aligned} C_0[\gamma ]=\int _{X^N}c(x_1,\ldots ,x_N)\,{\mathrm d}\gamma (x_1,\ldots , x_N). \end{aligned}$$

Because of the symmetry of the cost c, we immediately have

Proposition 2.1

For every \(\rho \in {\mathcal {P}}(X)\), we have that

$$\begin{aligned} \inf _{ \gamma \in \Pi _N(\rho ) }C_0[\gamma ] = \inf _{ \gamma \in \Pi _N^{\mathrm {sym}}(\rho ) }C_0[\gamma ]. \end{aligned}$$
(2.1)

Moreover, if the infimum is attained on one side of the above equality, then it is attained on both sides.

Given \(\varepsilon \ge 0\), we denote by \(C_\varepsilon [\gamma ]\) the entropy-regularized cost

$$\begin{aligned} C_\varepsilon [\gamma ]=C_0[\gamma ]+\varepsilon E[\gamma ],\quad \text { for all }\gamma \in \Pi ^{\mathrm {sym}}_N(\rho ), \end{aligned}$$
(2.2)

where the entropy \(E[\gamma ]:{\mathcal {P}}(X^N)\rightarrow {\mathbb {R}}\cup \lbrace -\infty ,+\infty \rbrace \) is defined as

$$\begin{aligned} E[\gamma ]={\left\{ \begin{array}{ll}\int _{X^N}\rho _\gamma \log \rho _\gamma \,{\mathrm d}{\mathfrak {m}}_{N}&{}\text { if }\gamma \ll {\mathfrak {m}}_{N}\\ +\infty &{}\text { otherwise}\end{array}\right. }. \end{aligned}$$
(2.3)

The notation \(\rho _\gamma \) stands for the Radon-Nikodym derivative of \(\gamma \) with respect to the reference measure \({\mathfrak {m}}_{N}\) and \(\gamma \ll {\mathfrak {m}}_{N}\) means that \(\gamma \) is absolutely continuous with respect to the reference measure \({\mathfrak {m}}_{N}\). Let \(\rho {\mathfrak {m}}\in {\mathcal {P}}^{ac}(X)\). In this paper we are interested in the following infimum

$$\begin{aligned} I_\varepsilon [\rho {\mathfrak {m}}]:=\inf _{\gamma \in \Pi ^{\mathrm {sym}}_N(\rho )} C_\varepsilon [\gamma ]. \end{aligned}$$
(2.4)

In order to guarantee the lower semicontinuity for \(C_\varepsilon [\cdot ]\), we will assume \(\rho \log \rho \in L_{\mathfrak {m}}^1(X)\). This will take care of the entropy part \(E[\cdot ]\) of the cost. In order to establish the lower semicontinuity for the functional \(C_0[\cdot ]\), we assume that the measure \(\rho \) satisfies the following two conditions:

$$\begin{aligned}&\lim _{r\rightarrow 0}\sup _{x\in X}\rho (B(x,r))<\frac{1}{N(N-1)^2}~~~\text {and} \end{aligned}$$
(A)
$$\begin{aligned}&\int _{X\setminus B(o,r_0)}f\left( 2d(x,o)\right) \,{\mathrm d}\rho (x)> - \infty ~~~\text {for some }o\in X. \end{aligned}$$
(B)

Above, by an abuse of notation, we have denoted the measure \(\rho {\mathfrak {m}}\) simply by its density \(\rho \); we will use the same abbreviation in the rest of the paper when there is no risk of confusion. Condition (B) is analogous to requiring, in the case of the quadratic cost, that the marginal measures have finite second moments. Condition (A) guarantees that the cost is finite.

If we endow the spaces \({\mathcal {P}}(X^N)\) and \({\mathcal {P}}(X)\) with \(w^*\)-topology then, by Prokhorov’s theorem, any subset of \({\mathcal {P}}(X)\) (or \({\mathcal {P}}(X^N)\)) is tight if and only if it is relatively compact.

Remark 2.2

(Entropy-transport seen as a Kullback–Leibler divergence) If \(\mu \) and \(\nu \) are measures on a set X, the Kullback–Leibler divergence of \(\mu \) with respect to \(\nu \) is defined as

$$\begin{aligned} \hbox {KL}[\mu \,|\,\nu ]={\left\{ \begin{array}{ll} \int _X\log \left( \frac{{\mathrm d}\mu }{{\mathrm d}\nu }\right) {\mathrm d}\mu &{}\text { if }\mu \ll \nu \\ +\infty &{}\text { otherwise}\end{array}\right. }. \end{aligned}$$

Now, if both measures \(\mu \) and \(\nu \) are absolutely continuous with respect to some reference measure R of the space X with densities \(\rho _\mu \) and \(\rho _\nu \), respectively, we can write:

$$\begin{aligned} \hbox {KL}[\mu \,|\,\nu ]={\left\{ \begin{array}{ll} \int _X\rho _\mu \log \left( \frac{\rho _\mu }{\rho _\nu }\right) {\mathrm d}R&{}\text { if }\mu \ll \nu \\ +\infty &{}\text { otherwise}\end{array}\right. }. \end{aligned}$$

Considering the entropy-regularized multi-marginal optimal transport (MOT) problem, we see that the cost functional \(C_{\varepsilon }[\gamma ]\) can be written alternatively as the Kullback–Leibler divergence between \(\gamma \) and a kernel \(\kappa \) defined below

$$\begin{aligned} C_{\varepsilon }[\gamma ]= \varepsilon \hbox {KL}[\gamma \,|\,\kappa ] = \varepsilon \int _{X^N}\rho _{\gamma } \log \bigg (\dfrac{\rho _{\gamma }}{\rho _{\kappa }}\bigg )\,{\mathrm d}{\mathfrak {m}}_{N}, \end{aligned}$$

where \(\kappa = e^{-c/\varepsilon }{\mathfrak {m}}_N\).
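The identity \(C_{\varepsilon }[\gamma ]=\varepsilon \hbox {KL}[\gamma \,|\,\kappa ]\) can be checked directly in a discrete setting, with the counting measure playing the role of \({\mathfrak {m}}_N\); the cost matrix and coupling below are arbitrary illustrative data.

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 5, 0.7
c = 3.0 * rng.random((n, n))     # a finite cost matrix
gamma = rng.random((n, n))
gamma /= gamma.sum()             # a discrete probability coupling

# C_eps = C_0 + eps * E, with the counting measure as reference
C0 = (c * gamma).sum()
E = (gamma * np.log(gamma)).sum()

# eps * KL(gamma | kappa) with kappa = exp(-c/eps)
kappa = np.exp(-c / eps)
kl = (gamma * np.log(gamma / kappa)).sum()

assert np.isclose(C0 + eps * E, eps * kl)
```

Expanding \(\log (\rho _\gamma /\rho _\kappa ) = \log \rho _\gamma + c/\varepsilon \) shows why the two expressions agree term by term.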

For the most part of this paper, we have chosen the reference measure to be \({\mathfrak {m}}_N\). However, as the following lemma shows, we could also take the reference measure to be \((\rho {\mathfrak {m}})^{\otimes N}\), since the minimizers of the entropy-regularized MOT problem (2.4) do not depend on the choice of the reference measure, at least if there exists a minimizer with finite cost. To state the lemma, let us introduce the notion of relative entropy: for a reference measure R on a Polish space Y and for each \(\gamma \in {\mathcal {P}}(Y)\), we denote by \(E[\gamma \,|\,R]\) the relative entropy of \(\gamma \) with respect to R, defined as

$$\begin{aligned} E[\gamma \,|\,R]= {\left\{ \begin{array}{ll} \int _Y\log \left( \frac{{\mathrm d}\gamma }{{\mathrm d}R}\right) \,{\mathrm d}\gamma &{}\text { if }\gamma \ll R\\ +\infty &{}\text { otherwise}\end{array}\right. }. \end{aligned}$$

Now we may consider two, a priori different, entropy-regularized MOT problems: the one introduced in (2.4)

$$\begin{aligned} I_\epsilon [\rho {\mathfrak {m}}]=\inf _{\gamma \in \Pi ^{\mathrm {sym}}_N(\rho )} C_\varepsilon [\gamma ]=:I_\epsilon [\rho {\mathfrak {m}}\,|\,{\mathfrak {m}}], \end{aligned}$$
(2.5)

and the problem with the reference measure chosen to be \((\rho {\mathfrak {m}})^{\otimes N}\)

$$\begin{aligned} I_\epsilon [\rho {\mathfrak {m}}\,|\,\rho {\mathfrak {m}}]:=\inf _{\gamma \in \Pi ^{\mathrm {sym}}_N(\rho )}\left( C_0[\gamma ]+E[\gamma \,|\,(\rho {\mathfrak {m}})^{\otimes N}]\right) . \end{aligned}$$
(2.6)

The following Lemma 2.3 is used only to pass from the compact case to the general case in the duality Theorem 4.2. The proof in [11] can be applied directly to prove Lemma 2.3.

Lemma 2.3

Let \((X,d,{\mathfrak {m}})\) be a Polish measure space, \(\rho {\mathfrak {m}}\in {\mathcal {P}}(X)\) a measure satisfying (A) and (B), and c a cost function satisfying (F1) and (F2). Now for all \(\epsilon > 0\) we have

$$\begin{aligned} I_\epsilon [\rho {\mathfrak {m}}\,|\,{\mathfrak {m}}]=I_\epsilon [\rho {\mathfrak {m}}\,|\,\rho {\mathfrak {m}}]+N\epsilon \hbox {KL}[\rho {\mathfrak {m}}\,|\,{\mathfrak {m}}]=I_\epsilon [\rho {\mathfrak {m}}\,|\,\rho {\mathfrak {m}}]+N\epsilon \int _X\rho \log \rho {\mathrm d}{\mathfrak {m}}. \end{aligned}$$
(2.7)

Moreover, whenever at least one side of the equality above is finite, the problems (2.5) and (2.6) have the same minimizers.
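Since the transport term \(C_0\) is the same on both sides, the content of (2.7) is the entropy identity \(E[\gamma \,|\,{\mathfrak {m}}_N]=E[\gamma \,|\,(\rho {\mathfrak {m}})^{\otimes N}]+N\hbox {KL}[\rho {\mathfrak {m}}\,|\,{\mathfrak {m}}]\) for \(\gamma \in \Pi _N(\rho )\). A discrete sanity check of this identity (N = 2, counting reference measure, an illustrative coupling mixing the product and the diagonal):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 2
rho = rng.random(6)
rho /= rho.sum()

# a symmetric coupling with both marginals rho: a mix of the
# independent coupling rho (x) rho and the diagonal coupling (id, id)_# rho
gamma = 0.5 * np.outer(rho, rho) + 0.5 * np.diag(rho)

E_m = (gamma * np.log(gamma)).sum()                          # entropy w.r.t. counting measure
E_rho = (gamma * np.log(gamma / np.outer(rho, rho))).sum()   # relative entropy w.r.t. (rho m)^{(x)2}
kl = (rho * np.log(rho)).sum()                               # KL(rho m | m), m = counting measure

assert np.isclose(E_m, E_rho + N * kl)
```

The identity follows by expanding \(\log ({\mathrm d}\gamma /{\mathrm d}{\mathfrak {m}}_N) = \log ({\mathrm d}\gamma /{\mathrm d}(\rho {\mathfrak {m}})^{\otimes N}) + \sum _i \log \rho (x_i)\) and using that all marginals of \(\gamma \) equal \(\rho {\mathfrak {m}}\).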

2.1 Some properties of the entropy functional

Let us start by noting that the minimum of the entropy is attained by the product measure and that its value is not \(-\infty \).

Proposition 2.4

Let \((X,d, {\mathfrak {m}})\) be a Polish metric measure space, and let \(\rho {\mathfrak {m}}\in {\mathcal {P}}^{ac}(X)\) with \(\rho \log \rho \in L_{\mathfrak {m}}^1(X)\). Then

$$\begin{aligned} \min _{\gamma \in \Pi ^{\mathrm {sym}}_N(\rho )}E[\gamma ] = \int _{X^N}\bigg (\otimes ^N_{i=1}\rho \bigg )\log \bigg (\otimes ^N_{i=1}\rho \bigg ) \,{\mathrm d}{\mathfrak {m}}_{N} = N \int _X \rho \log \rho \,{\mathrm d}{\mathfrak {m}}> -\infty . \end{aligned}$$

Proof

As we will see, the minimality is an immediate consequence of Jensen’s inequality. Let \(\gamma \in \Pi _N(\rho )\). Then

$$\begin{aligned} E[\gamma ]&= \int _{X^N}\rho _{\gamma }\log (\rho _\gamma )\,{\mathrm d}{\mathfrak {m}}_{N} = \int _{X^N}\frac{\rho _{\gamma }}{\otimes ^N_{i=1}\rho }\left( \log \left( \frac{\rho _{\gamma }}{\otimes ^N_{i=1}\rho }\right) + \log \left( \otimes ^N_{i=1}\rho \right) \right) \otimes ^N_{i=1}\rho \,{\mathrm d}{\mathfrak {m}}_{N}\\&\ge \bigg (\int _{X^N}\rho _{\gamma }\,{\mathrm d}{\mathfrak {m}}_{N}\bigg )\log \bigg (\int _{X^N}\rho _{\gamma }\,{\mathrm d}{\mathfrak {m}}_{N}\bigg ) + \int _{X^N}\rho _{\gamma }\log \left( \otimes ^N_{i=1}\rho \right) \,{\mathrm d}{\mathfrak {m}}_{N} \\&= 0 + E[\otimes ^N_{i=1}\rho ]. \end{aligned}$$

\(\square \)
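The inequality of Proposition 2.4 is easy to observe numerically: any discrete coupling with marginals \(\rho \) has entropy at least \(N\sum \rho \log \rho \). Below, a random coupling with the prescribed marginals is produced by iterative proportional fitting, an illustrative construction of our own.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 8
rho = rng.random(n)
rho /= rho.sum()

# build a random coupling with both marginals rho by alternately
# rescaling rows and columns (iterative proportional fitting)
g = rng.random((n, n))
for _ in range(500):
    g *= (rho / g.sum(axis=1))[:, None]
    g *= (rho / g.sum(axis=0))[None, :]

E_gamma = (g * np.log(g)).sum()            # entropy of the coupling
E_product = 2 * (rho * np.log(rho)).sum()  # entropy of rho (x) rho (N = 2)
assert E_gamma >= E_product - 1e-9         # the product measure minimizes the entropy
```

The gap \(E[\gamma ]-E[\otimes ^N\rho ]\) is exactly the relative entropy of \(\gamma \) with respect to \(\otimes ^N(\rho {\mathfrak {m}})\), which is nonnegative by Jensen's inequality, as in the proof above.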

Using Proposition 2.4 we immediately get the lower semicontinuity of the entropy functional by representing the entropy as the relative entropy with respect to the probability measure \(\otimes _{i=1}^N(\rho {\mathfrak {m}})\). See for instance [28, Lemma 4.1] for the lower semicontinuity of the entropy when the reference measure is finite.

Corollary 2.5

Let \((X,d, {\mathfrak {m}})\) be a Polish metric measure space, and let \(\rho {\mathfrak {m}}\in {\mathcal {P}}^{ac}(X)\) with \(\rho \log \rho \in L_{\mathfrak {m}}^1(X)\). Then \(E[\cdot ]\) is lower semicontinuous in the set \(\Pi ^{\mathrm {sym}}_N(\rho )\).

Now we are ready to prove the existence of the minimizers for entropy-regularized MOT:

Proposition 2.6

Let \((X,d,{\mathfrak {m}})\) be a Polish metric measure space. Assume that the measure \(\rho {\mathfrak {m}}\in {\mathcal {P}}^{ac}(X)\) satisfies \(\rho \log \rho \in L_{\mathfrak {m}}^1(X)\) along with Conditions (A) and (B). Assume that \(c:X^N\rightarrow {\mathbb {R}}\cup \lbrace +\infty \rbrace \) satisfies the conditions (F1) and (F2). Then, for each \(\varepsilon \ge 0\), there exists a minimizer \(\gamma \in \Pi ^{\mathrm {sym}}_N(\rho )\) for the entropic-regularized cost \(C_{\varepsilon }[\gamma ]\).

Proof

We notice that the set \(\Pi ^{\mathrm {sym}}_N(\rho )\) is compact in the \(w^*\)-topology [19]. The functional E is lower semicontinuous by Corollary 2.5, and in our setting the lower semicontinuity of \(C_0\) is proven as a part of the proof of [16, Proposition 3.1]. Since for each \(\varepsilon \ge 0\) the functional \(C_\varepsilon \) is lower semicontinuous, we conclude that it has a minimizer in the set \(\Pi ^{\mathrm {sym}}_N(\rho )\). \(\square \)

3 The \(\Gamma \)-convergence of entropic-regularized cost

Now let us turn to the \(\Gamma \)-convergence. From now on, \((\tau _n)_{n\in {\mathbb {N}}}\) is any sequence of positive real numbers decreasing to zero. Let us introduce the following functionals: for each \(n\in {\mathbb {N}}\)

$$\begin{aligned} {\mathcal {C}}_n:{\mathcal {P}}^{\mathrm {sym}}(X^N)\rightarrow {\mathbb {R}}\cup \{+\infty \},~~ {\mathcal {C}}_n[\gamma ]={\left\{ \begin{array}{ll}C_{\tau _n}[\gamma ]&{}\text { if }\gamma \in \Pi _N(\rho )\\ +\infty &{}\text { otherwise}\end{array}\right. } \end{aligned}$$

and

$$\begin{aligned} {\mathcal {C}}:{\mathcal {P}}^{\mathrm {sym}}(X^N)\rightarrow {\mathbb {R}}\cup \{+\infty \},~~ {\mathcal {C}}[\gamma ]={\left\{ \begin{array}{ll}C_0[\gamma ]&{}\text { if }\gamma \in \Pi _N(\rho )\\ +\infty &{}\text { otherwise}\end{array}\right. }. \end{aligned}$$

The goal of this section is to prove that the sequence \(({\mathcal {C}}_n)_{n\in {\mathbb {N}}}\) \(\Gamma \)-converges to \({\mathcal {C}}\) in the space \({\mathcal {P}}^{\mathrm {sym}}(X^N)\).

Theorem 3.1

Let \((X,d,{\mathfrak {m}})\) be a Polish metric measure space. Let \(\rho \in {\mathcal {P}}^{ac}(X)\) with \(\rho \log \rho \in L_{\mathfrak {m}}^1(X)\) satisfying (A) and (B). Then the sequence \(({\mathcal {C}}_n)_{n\in {\mathbb {N}}}\) \(\Gamma \)-converges to \({\mathcal {C}}\) in the space \({\mathcal {P}}^{\mathrm {sym}}(X^N)\).

Let us fix \(\gamma \in {\mathcal {P}}^{\mathrm {sym}}(X^N)\). We need to show that

$$\begin{aligned}&\text {For each sequence }(\gamma _n)_{n\in {\mathbb {N}}}\text { that converges to }\gamma \\&\text {we have }\liminf _{n\rightarrow \infty } {\mathcal {C}}_n[\gamma _n]\ge {\mathcal {C}}[\gamma ]\text {, and } \end{aligned}$$
(I)
$$\begin{aligned}&\text {There exists a sequence }(\gamma _n)_{n\in {\mathbb {N}}}\text { that converges to }\gamma \text { and }\\&\limsup _{n\rightarrow \infty }{\mathcal {C}}_n[\gamma _n]\le {\mathcal {C}}[\gamma ]. \end{aligned}$$
(II)

The proof of Theorem 3.1 is divided into two parts. The proof of the first part, the liminf-inequality (I), is short and is established in the next subsection. The remainder of this section is then divided into subsections in which the second part, the limsup-inequality (II), is proven.

3.1 Proof of condition (I)

We fix a sequence \((\gamma _n)_{n\in {\mathbb {N}}}\) that converges to \(\gamma \). If \(\gamma \notin \Pi _N(\rho )\), then, since the set \(\Pi _N(\rho )\) is compact, for large indices we also have \(\gamma _n\notin \Pi _N(\rho )\); thus both sides of inequality (I) are \(+\infty \), and we are done. Hence we may assume that \(\gamma \) and the \(\gamma _n\) are elements of the set \(\Pi _N(\rho )\). Since now \(\gamma _n\in \Pi _N(\rho )\), the claim (I) follows from the lower semicontinuity of \(\gamma \mapsto \int c\,{\mathrm d}\gamma \) and from the entropy lower bound shown in Proposition 2.4. \(\square \)

3.2 Constructing an approximation of the coupling \(\gamma \)

First of all, we need to construct an approximation of \(\gamma \) only in the case where \(C_0[\gamma ] < \infty \): if this is not the case, then any sequence \((\gamma _n)\) converging to \(\gamma \) can be used to prove Condition (II). The idea of the construction is to redefine a large part of \(\gamma \) to be a product measure on finitely many Borel sets with small diameter. In order not to increase the cost too much, the Borel sets we use have to be far away from the diagonal compared to their diameter. We call the part of the measure defined in this way the core part of the approximation. For the rest of the measure, we take another finite combination of product measures. However, this time the sets need not have small (or even bounded) diameter, but only small measure. This part will be called the remainder part of the approximation.

We start the construction by taking out a small part of \(\gamma \) that will later be used to deal with the remainder part of the approximation. For this we take a sequence of radii defined as \(r_n = 1/n\). Since \(C_0[\gamma ] < \infty \), there exists a point \(x =(x_1,\ldots , x_N)\in \mathop {\mathrm{spt}}\nolimits (\gamma )\) with

$$\begin{aligned} x_i \ne x_j \quad \text { if }1 \le i < j \le N. \end{aligned}$$

Moreover, since \(\gamma \in \Pi _N(\rho )\) and \(\rho \) satisfies (A), we have

$$\begin{aligned}&\gamma (\{(y_1,\ldots , y_N) \in X^N \,| \, y_i \ne x_j \text { for all }i,j\}) \\&\quad \ge 1 - \sum _{i\ne j} \gamma (\{(y_1,\ldots , y_N) \in X^N \,| \, y_i = x_j \}) \\&\quad = 1 - \sum _{i\ne j} \rho (\{x_j\}) > 1 - N(N-1) \frac{1}{N(N-1)^2} = 1 - \frac{1}{N-1} \ge 0. \end{aligned}$$

Thus, using again \(C_0[\gamma ] < \infty \), there exists another point \(x'=(x_{N+1},\ldots , x_{2N}) \in \mathop {\mathrm{spt}}\nolimits (\gamma )\), so that

$$\begin{aligned} x_i \ne x_j \quad \text { if }1 \le i < j \le 2N. \end{aligned}$$

From now on, we consider \(x,x'\) fixed. In particular, for \(n \in {\mathbb {N}}\) sufficiently large we have

$$\begin{aligned} d(x_i,x_j)>r_n \quad \text { if }1 \le i < j \le 2N. \end{aligned}$$
(3.1)

Let us denote by

$$\begin{aligned} B_n:=B(x,\tfrac{r_n}{10})\quad \text {and}\quad B_n':=B(x',\tfrac{r_n}{10}) \end{aligned}$$

the balls around x and \(x'\) with radii \(r_n/10\) in the sup-metric of the product space. So,

$$\begin{aligned} y=(y_1,\ldots ,y_N)\in B_n \end{aligned}$$

if and only if

$$\begin{aligned} d(x_i,y_i)<\tfrac{r_n}{10}\quad \text { for all }i\in \{1,\ldots , N\}, \end{aligned}$$

and analogously for \(B_n'\) with the relevant index modifications.

Let us now define

$$\begin{aligned} \gamma _{B_n}=\left( \frac{\gamma \big |_{B_n} }{\gamma (B_n)} \right) ^S\quad \text {and}\quad \gamma _{B_n'}=\left( \frac{\gamma \big |_{B_n'} }{\gamma (B_n')} \right) ^S. \end{aligned}$$

Observe that \(\gamma _{B_n}\) and \(\gamma _{B_n'}\) are symmetric probability measures. Since the marginals of a symmetric measure are the same, we may denote by \(\rho _{B_n}\) the marginal of \(\gamma _{B_n}\) and similarly by \(\rho _{B_n'}\) the marginal of \(\gamma _{B_n'}\). Let us further denote \({{\tilde{B}}}_n:=\mathop {\mathrm{spt}}\nolimits \gamma _{B_n}\), \({{\tilde{B}}}_n':=\mathop {\mathrm{spt}}\nolimits \gamma _{B_n'}\) and

$$\begin{aligned} \varepsilon _n := \frac{1}{N}\min \left\{ \gamma ({{\tilde{B}}}_n),\gamma ({{\tilde{B}}}_n'),r_n,\frac{r_n}{f(2r_n/5)}\right\} . \end{aligned}$$
(3.2)

We then define a measure

$$\begin{aligned} \gamma _{0,n}:=&\gamma \big |_{X^N\setminus ({{\tilde{B}}}_n\cup {{\tilde{B}}}_n')} +\frac{\gamma ({{\tilde{B}}}_n)-\varepsilon _n}{\gamma ({{\tilde{B}}}_n)}\gamma \big |_{{{\tilde{B}}}_n}+\frac{\gamma ({{\tilde{B}}}_n')-\varepsilon _n}{\gamma ({{\tilde{B}}}_n')}\gamma \big |_{{{\tilde{B}}}_n'}. \end{aligned}$$

The idea behind the measure \(\gamma _{0,n}\) is that we have chopped off a small part of the measure around the points x and \(x'\) (symmetrically) for later use. Since we are working with a singular cost, we still need to take out a small neighbourhood of the diagonals before approximating by product measures. We do this now.

We fix a compact \(K_n\subset X\) such that

$$\begin{aligned} \gamma _{0,n}(X^N\setminus K_n^N)<\frac{\varepsilon _n}{2} \end{aligned}$$
(3.3)

and take a small enough \(\delta _n \in (0,r_n)\) so that

$$\begin{aligned}&\gamma _{0,n}(D_{\delta _n})< \frac{\varepsilon _n}{2},\nonumber \\&D_{\alpha }:=\{(x_1,\ldots ,x_N)\in X^N~|~d(x_i,x_j)<\alpha \text { for some }i\ne j\}\, \end{aligned}$$
(3.4)

where \(D_{\alpha }\) is the \(\alpha \)-neighbourhood of the pairwise diagonals. Using \(K_n\) and \(\delta _n\) we then define

$$\begin{aligned} \gamma _{1,n}:=\gamma _{0,n}|_{K_n^N\setminus D_{\delta _n}}. \end{aligned}$$
(3.5)

The measure \(\gamma _{1,n}\) is now the core part of the measure that we approximate. We denote by \(\rho _{1,n}\) the marginals of the symmetric measure \(\gamma _{1,n}\).

Let us then approximate the measure \(\gamma _{1,n}\). We take \(\lambda _n \in (0,\delta _n/n)\) so that

$$\begin{aligned} |f(r)-f(s)| < \varepsilon _n \qquad \text {for all }r,s \in [\delta _n/2,2\mathop {\mathrm{diam}}\nolimits (K_n)]\text { with }|r-s| \le 2\lambda _n. \end{aligned}$$
(3.6)

Such \(\lambda _n\) exists by the uniform continuity of f on the compact set \([\delta _n/2,2\mathop {\mathrm{diam}}\nolimits (K_n)]\). Since the set \(K_n\) is compact, we may fix a finite Borel partition \(\{B_n^i\}_{i=1}^{M_n}\) of the set \(\mathop {\mathrm{spt}}\nolimits (\rho _{1,n})\) such that

$$\begin{aligned}&\mathop {\mathrm{diam}}\nolimits (B_n^i)<\lambda _n\quad \text {and}\quad \rho _{1,n}(B_n^i) > 0 \quad \text { for all }i\in \{1,\ldots , M_n\}. \end{aligned}$$

We are now ready to define the core part approximants \(\gamma _{1,n}^a\) as

$$\begin{aligned} \gamma _{1,n}^a=\sum _{(k_1,\ldots ,k_N)\in \{1,\ldots ,M_n\}^N}\frac{\gamma _{1,n}(B_n^{k_1}\times \cdots \times B_n^{k_N})}{\rho _{1,n}(B_n^{k_1})\cdots \rho _{1,n}(B_n^{k_N})}\rho _{1,n}|_{B_n^{k_1}}\otimes \cdots \otimes \rho _{1,n}|_{B_n^{k_N}}. \end{aligned}$$
(3.7)
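In discrete form, formula (3.7) replaces the coupling inside each 'cube' \(B_n^{k_1}\times \cdots \times B_n^{k_N}\) by a product of marginal restrictions carrying the same mass. A minimal numerical sketch for \(N=2\) (our own names, not the paper's):

```python
import numpy as np

# Discrete sketch of the block-product approximation (3.7) for N = 2:
# inside each product cell B^i x B^j, the coupling is replaced by a product
# of marginal restrictions, scaled to keep the cell mass. Illustrative only.

def block_product_approx(gamma, cells):
    rho = gamma.sum(axis=1)                       # common marginal (gamma is symmetric)
    approx = np.zeros_like(gamma)
    for I in cells:
        for J in cells:
            mass = gamma[np.ix_(I, J)].sum()      # gamma(B^i x B^j)
            denom = rho[I].sum() * rho[J].sum()   # rho(B^i) * rho(B^j)
            if denom > 0:
                approx[np.ix_(I, J)] = (mass / denom) * np.outer(rho[I], rho[J])
    return approx
```

As in the proof below, the approximant keeps both the marginal \(\rho _{1,n}\) and the cell masses of \(\gamma _{1,n}\).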

Now let us handle the main part of the remainder of the measure, namely the measure

$$\begin{aligned} \gamma _{2,n}:=\gamma _{0,n}|_{D_{\delta _n}\cup (X^N\setminus K_n^N)}. \end{aligned}$$

Because \(\gamma _{0,n}\) and the set onto which we restrict it are symmetric, so is \(\gamma _{2,n}\). We may thus denote its marginals by \(\rho _{2,n}\).

In order to determine which part of the remaining marginal measure should be coupled where, we define a partition \(\{A_{i,n}\}_{i=1}^N\) of the space X by setting, for all \(i\in \{1,\ldots , N-1\}\)

$$\begin{aligned} A_{i,n}:=\{y\in X~|~d(x_i,y)\le \tfrac{r_n}{2}\}, \end{aligned}$$

and

$$\begin{aligned} A_{N,n}:=X\setminus \bigcup _{i=1}^{N-1}A_{i,n}. \end{aligned}$$

Condition (3.1) guarantees that the sets \(A_{i,n}\) are pairwise disjoint.

Now we approximate \(\gamma _{2,n}\) by the measure

$$\begin{aligned} \gamma _{2,n}^a:=N\left( \sum _{i=1}^N\eta _{n,i}\right) ^S, \end{aligned}$$

where for all i the measure \(\eta _{n,i}\) is the product

$$\begin{aligned} \eta _{n,i} := \left( \bigotimes _{k=1}^{i-1}\frac{\rho _{B_n}\big |_{B(x_k,r_n/10)}}{\rho _{B_n}(B(x_k,r_n/10))}\right) \otimes \rho _{2,n}\big |_{A_{i,n}} \otimes \left( \bigotimes _{k=i+1}^N\frac{\rho _{B_n}\big |_{B(x_k,r_n/10)}}{\rho _{B_n}(B(x_k,r_n/10))}\right) . \end{aligned}$$

By the definition of the sets \(A_{i,n}\), for every \((y_1,\ldots ,y_N) \in \mathop {\mathrm{spt}}\nolimits (\gamma _{2,n}^a)\) we have for each \(i \ne j\)

$$\begin{aligned} d(y_i,y_j) \ge |d(y_i,x_j) - d(x_j,y_j)| > \frac{r_n}{2}-\frac{r_n}{10} = \frac{2r_n}{5}, \end{aligned}$$
(3.8)

where we have assumed (which we can do without loss of generality) that \(y_j \in {\overline{B}}(x_j,r_n/10)\).

With the measure \(\gamma _{2,n}^a\) we have coupled the marginals of the measure \(\gamma _{2,n}\) with suitable parts of the marginals of the reserved measure that was taken out around the point x. In doing so, we have used the marginals of this reserved part unevenly. To handle the rest of the reserved measure around the point x, we now use the reserved measure around the point \(x'\). Thus, we need to redefine the coupling for the part of the marginal given by

$$\begin{aligned} \rho _{3,n} := (\texttt {pr}^1)_\sharp \frac{\varepsilon _n}{\gamma ({{\tilde{B}}}_n)}\gamma \big |_{{{\tilde{B}}}_n} +\rho _{2,n} - (\texttt {pr}^1)_\sharp \gamma _{2,n}^a. \end{aligned}$$

We define it as

$$\begin{aligned} \gamma _{3,n}^a := \left( \sum _{i=1}^N\phi _{n,i} \right) ^S, \end{aligned}$$

where each \(\phi _{n,i}\) is defined as

$$\begin{aligned} \phi _{n,i} := \left( \bigotimes _{k=N+1}^{N+i-1}\frac{\rho _{B_n}\big |_{B(x_k,r_n/10)}}{\rho _{B_n}(B(x_k,r_n/10))}\right) \otimes \rho _{3,n} \otimes \left( \bigotimes _{k=N+i+1}^{2N}\frac{\rho _{B_n}\big |_{B(x_k,r_n/10)}}{\rho _{B_n}(B(x_k,r_n/10))}\right) . \end{aligned}$$

Since \(\mathop {\mathrm{spt}}\nolimits (\rho _{3,n}) \subset \mathop {\mathrm{spt}}\nolimits (\rho _{B_n})\), we have that for every \((y_1,\ldots ,y_N) \in \mathop {\mathrm{spt}}\nolimits (\gamma _{3,n}^a)\) and each \(i \ne j\)

$$\begin{aligned} d(y_i,y_j) \ge d(x_{k(i)},x_{k(j)})- d(y_i,x_{k(i)}) - d(x_{k(j)},y_j) > {r_n}-\frac{r_n}{10}-\frac{r_n}{10} = \frac{4r_n}{5},\nonumber \\ \end{aligned}$$
(3.9)

where \(k(i)\ne k(j)\) are the indices for which \(y_j \in {\overline{B}}(x_{k(j)},r_n/10)\) and \(y_i \in {\overline{B}}(x_{k(i)},r_n/10)\).

What remains is the part of the measure around \(x'\) that was not used for \(\gamma _{3,n}^a\). Since \(\gamma _{3,n}^a\) used the marginals from this part of the reserved measure evenly, we may simply couple the rest by a measure

$$\begin{aligned} \gamma _{4,n}^a := b\left( \bigotimes _{k=N+1}^{2N}\frac{\rho _{B_n}\big |_{B(x_k,r_n/10)}}{\rho _{B_n}(B(x_k,r_n/10))}\right) ^S, \end{aligned}$$

where b is the appropriate scaling constant. As for the previous remainder part, we have that for every \((y_1,\ldots ,y_N) \in \mathop {\mathrm{spt}}\nolimits (\gamma _{4,n}^a)\) and each \(i \ne j\) the inequality (3.9) holds.

Now we are ready to define the full approximation as

$$\begin{aligned} \gamma _n' = \gamma _{1,n}^a + \gamma _{2,n}^a + \gamma _{3,n}^a + \gamma _{4,n}^a. \end{aligned}$$

By construction \(\gamma _n' \in \Pi _N^\mathrm {sym}(\rho )\).

3.3 Narrow convergence of the approximations

Let us now prove that the sequence \((\gamma _n')_n\) converges narrowly to \(\gamma \). One could argue via the Wasserstein distance; here we instead verify the definition of narrow convergence directly.

Lemma 3.2

The sequence \((\gamma _n')_n\) narrowly converges to \(\gamma \).

Proof

Let \(\varphi \in C_b(X^N)\) and \(\varepsilon > 0\). We need an index \(N_0\in {\mathbb {N}}\) such that

$$\begin{aligned} \left|\int _{X^N}\varphi \,{\mathrm d}\gamma -\int _{X^N}\varphi \,{\mathrm d}\gamma _n'\right|<\varepsilon \text { for all }n\ge N_0. \end{aligned}$$
(3.10)

Let us denote \(M:=\sup _{x\in X^N}|\varphi (x)|\); we may assume that \(M>0\). Since \(\rho \) is inner regular, we can fix a compact set \(K\subset X\) such that

$$\begin{aligned} \rho (X\setminus K)<\frac{\varepsilon }{12NM}. \end{aligned}$$

Since \(\gamma \in \Pi _N(\rho )\), we now have

$$\begin{aligned} \gamma (X^N\setminus K^N)<\frac{\varepsilon }{12M}. \end{aligned}$$

The function \(\varphi \), when restricted to \(K^N\), is uniformly continuous. Hence there exists \(\delta > 0\) so that

$$\begin{aligned} |\varphi (x)-\varphi (y)|< \frac{\varepsilon }{12}\text { for all }x,y\in K^N\text { for which }d_N(x,y)<\delta . \end{aligned}$$
(3.11)

Now, let \(N_0 \in {\mathbb {N}}\) be so large that

$$\begin{aligned} \sqrt{N}\lambda _n< \delta \qquad \text {and} \qquad 6M\varepsilon _n < \frac{\varepsilon }{6}~~~\text {for all }n\ge N_0. \end{aligned}$$

Let us show that this choice of \(N_0\) satisfies (3.10). First we note that for all \(n\ge N_0\) we have

$$\begin{aligned} \left|\int _{X^N}\varphi \,{\mathrm d}\gamma -\int _{X^N}\varphi \,{\mathrm d}\gamma _n'\right|&\le \left|\int _{X^N}\varphi \,{\mathrm d}\gamma -\int _{X^N}\varphi \,{\mathrm d}\gamma _{1,n}\right|+\left|\int _{X^N}\varphi \,{\mathrm d}\gamma _{1,n}-\int _{X^N}\varphi \,{\mathrm d}\gamma _n'\right|\nonumber \\&\le \left|\int _{X^N}\varphi \,{\mathrm d}\gamma -\int _{X^N}\varphi \,{\mathrm d}\gamma _{1,n}\right|+\left|\int _{X^N}\varphi \,{\mathrm d}\gamma _{1,n}-\int _{X^N}\varphi \,{\mathrm d}\gamma _{1,n}^a\right|\nonumber \\&\quad +\left|\int _{X^N}\varphi \,{\mathrm d}(\gamma _{2,n}^a+\gamma _{3,n}^a+\gamma _{4,n}^a)\right|\nonumber \\&<\left|\int _{X^N}\varphi \,{\mathrm d}\gamma _{1,n}-\int _{X^N}\varphi \,{\mathrm d}\gamma _{1,n}^a\right|+\frac{\varepsilon }{6}, \end{aligned}$$
(3.12)

where in the last inequality we have used the following facts: \(\gamma (X^N)-\gamma _{1,n}(X^N)<3\varepsilon _n\) for all n, and for the remainder part of the measure \(\gamma _n'\) we have

$$\begin{aligned} (\gamma _{2,n}^a + \gamma _{3,n}^a + \gamma _{4,n}^a)(X^N) < 3\varepsilon _n~~~\text { for all }n\in {\mathbb {N}}. \end{aligned}$$

It remains to show that for all \(n\ge N_0\) we have

$$\begin{aligned} \left| \int _{X^N} \varphi \,{\mathrm d}\gamma _{1,n} - \int _{X^N} \varphi \,{\mathrm d}\gamma _{1,n}^a \right| < \frac{5\varepsilon }{6}. \end{aligned}$$
(3.13)

We first estimate the integrals in the set \(K^N\). Let us fix, for each \((k_1,\ldots , k_N)\in \{1,\ldots ,M_n\}^N\) for which the set

$$\begin{aligned} (B_n^{k_1}\times \cdots \times B_n^{k_N})\cap K^N \end{aligned}$$

is nonempty, an element

$$\begin{aligned} z_{k_1,\ldots ,k_N}\in (B_n^{k_1}\times \cdots \times B_n^{k_N})\cap K^N. \end{aligned}$$

Now we have, for a fixed \((k_1,\ldots , k_N)\), denoting for simplicity

$$\begin{aligned}&z_0:=z_{k_1,\ldots , k_N}, Q=B_n^{k_1}\times \cdots \times B_n^{k_N},~\gamma =\gamma _{1,n},~\gamma _a=\gamma _{1,n}^a,\\&\quad \bigg |\int _{Q\cap K^N}\varphi (z)\,{\mathrm d}\gamma -\int _{Q\cap K^N}\varphi (z)\,{\mathrm d}\gamma _a\bigg |\\&\quad \le \left|\int _{Q\cap K^N}\varphi (z)\,{\mathrm d}\gamma -\varphi (z_0)\gamma (Q\cap K^N)\right|+\left|\varphi (z_0)\gamma _a(Q\cap K^N)-\int _{Q\cap K^N}\varphi (z)\,{\mathrm d}\gamma _a\right|\\&\qquad +\left|\varphi (z_0)\gamma (Q\cap K^N)-\varphi (z_0)\gamma _a(Q\cap K^N)\right|\\&\quad \le \int _{Q\cap K^N}|\varphi (z)-\varphi (z_0)|\,{\mathrm d}\gamma +\int _{Q\cap K^N}|\varphi (z_0)-\varphi (z)|\,{\mathrm d}\gamma _a\\&\qquad +M|\gamma (Q\cap K^N)-\gamma _a(Q\cap K^N)|\\&\quad \overset{a)}{<}\gamma (Q\cap K^N)\cdot \frac{\varepsilon }{12}+\gamma _a(Q\cap K^N)\cdot \frac{\varepsilon }{12} +M|\gamma (Q\cap K^N)-\gamma _a(Q\cap K^N)|\\&\quad =\gamma (Q\cap K^N)\cdot \frac{\varepsilon }{12}+\gamma _a(Q\cap K^N)\cdot \frac{\varepsilon }{12}\\&\qquad +M|\gamma (Q\cap K^N)-\gamma (Q)+\gamma _a(Q)-\gamma _a(Q\cap K^N)|\\&\quad \le \gamma (Q\cap K^N)\cdot \frac{\varepsilon }{12}+\gamma _a(Q\cap K^N)\cdot \frac{\varepsilon }{12}+M\gamma (Q\setminus K^N)+M\gamma _a(Q\setminus K^N), \end{aligned}$$

where in a) we have used (3.11), and in the equality following it the fact that the total measures of \(\gamma \) and \(\gamma _a\) coincide on the 'cubes' Q. Summing the estimate above over all cubes \(Q=B_n^{k_1}\times \cdots \times B_n^{k_N}\), \((k_1,\ldots , k_N)\in \{1,\ldots ,M_n\}^N\), gives

$$\begin{aligned}&\bigg |\int _{K^N} \varphi \,{\mathrm d}\gamma _{1,n} - \int _{K^N} \varphi \,{\mathrm d}\gamma _{1,n}^a \bigg |\nonumber \\&\qquad< \gamma _{1,n}(K^N)\cdot \frac{\varepsilon }{12}+\gamma _{1,n}^a(K^N)\cdot \frac{\varepsilon }{12}+M\gamma _{1,n}(X^N\setminus K^N)+M\gamma _{1,n}^a(X^N\setminus K^N)\nonumber \\&\qquad \overset{a)}{<}\frac{\varepsilon }{12}+\frac{\varepsilon }{12}+\frac{\varepsilon }{12}+\frac{\varepsilon }{12}=\frac{\varepsilon }{3}, \end{aligned}$$
(3.14)

where in inequality a) we have used the fact that \(\rho (X\setminus K)<\frac{\varepsilon }{12NM}\) and, since the marginals of \(\gamma _{1,n}\) and \(\gamma _{1,n}^a\) are restrictions of \(\rho \), we can bound both \(\gamma _{1,n}(X^N\setminus K^N)\) and \(\gamma _{1,n}^a(X^N\setminus K^N)\) by \(\frac{\varepsilon }{12M}\). For the same reason, we have

$$\begin{aligned} \left|\int _{X^N\setminus K^N}\varphi \,{\mathrm d}\gamma _{1,n}-\int _{X^N\setminus K^N}\varphi \,{\mathrm d}\gamma _{1,n}^a\right|<2M\cdot \frac{\varepsilon }{12M}=\frac{\varepsilon }{6}. \end{aligned}$$
(3.15)

Combining estimates (3.14) and (3.15) gives

$$\begin{aligned} \left|\int _{X^N}\varphi \,{\mathrm d}\gamma _{1,n}-\int _{X^N}\varphi \,{\mathrm d}\gamma _{1,n}^a\right|<\frac{\varepsilon }{3}+\frac{\varepsilon }{6}=\frac{\varepsilon }{2}<\frac{5\varepsilon }{6}, \end{aligned}$$

proving (3.13). \(\square \)

3.4 Convergence of the cost functional

In order to prove the \(\Gamma \)-limsup inequality (II), we need the cost \(C_0[\cdot ]\) to converge along the approximating sequence \((\gamma _n')_n\). We prove this in the following lemma.

Lemma 3.3

We have \(C_0[\gamma _n'] \rightarrow C_0[\gamma ]\) as \(n \rightarrow \infty \).

Proof

Let us first consider the remainder part. Recall that for all \(n \in {\mathbb {N}}\) we have

$$\begin{aligned} (\gamma _n'-\gamma _{1,n}^a)(X^N) = (\gamma _{2,n}^a + \gamma _{3,n}^a + \gamma _{4,n}^a)(X^N) < 3\varepsilon _n. \end{aligned}$$

Thus, using the lower bounds (3.8) and (3.9) for distances in the support of the remainder part, and the definition (3.2) of \(\varepsilon _n\), we get

$$\begin{aligned} \int _{X^N} c\,{\mathrm d}(\gamma _n'-\gamma _{1,n}^a) \le \frac{N(N-1)}{2}f\left( \frac{2r_n}{5}\right) 3\varepsilon _n \le \frac{3(N-1)}{2}r_n \rightarrow 0 \end{aligned}$$
(3.16)

as \(n \rightarrow \infty \). By the absolute continuity of the integral, recalling that \(C_0[\gamma ]<\infty \), we get

$$\begin{aligned} \int _{X^N} c\,{\mathrm d}(\gamma -\gamma _{1,n}) \rightarrow 0 \end{aligned}$$
(3.17)

as \(n \rightarrow \infty \).

Let us now estimate the core part of the approximation. By the construction (3.7) of \(\gamma _{1,n}^a\) and the choice (3.6) of \(\lambda _n\), we have

$$\begin{aligned} \left| \int _{X^N} c\,{\mathrm d}\gamma _{1,n}^a - \int _{X^N} c\,{\mathrm d}\gamma _{1,n}\right| \le \int _{X^N} \frac{N(N-1)}{2} \varepsilon _n\,{\mathrm d}\gamma _{1,n} < \frac{N(N-1)}{2} \varepsilon _n. \end{aligned}$$
(3.18)

Combining (3.16), (3.17) and (3.18) we get

$$\begin{aligned} \left| C_0[\gamma _n'] - C_0[\gamma ] \right| \le&\left| \int _{X^N} c\,{\mathrm d}(\gamma _n'-\gamma _{1,n}^a) \right| + \left| \int _{X^N} c\,{\mathrm d}\gamma _{1,n}^a - \int _{X^N} c\,{\mathrm d}\gamma _{1,n}\right| \\&+ \left| \int _{X^N} c\,{\mathrm d}(\gamma -\gamma _{1,n}) \right| \rightarrow 0 \end{aligned}$$

as \(n \rightarrow \infty \). \(\square \)

3.5 Finiteness of the entropy for the approximations

Next we show that the entropy is finite for the approximating sequence. Notice that, in order to prove (II), we do not need a better estimate on the entropy.

Lemma 3.4

For each \(n\in {\mathbb {N}}\) we have \(E[\gamma _n'] < \infty \).

Proof

In order to see the finiteness of the entropy, it suffices to notice that each \(\gamma _n'\) is a sum of finitely many measures \(({\tilde{\gamma }}_{n,k})_{k=1}^{N_n}\) each of which is of the form \({\tilde{\gamma }}_{n,k} = {\tilde{\rho }}_1^k{\mathfrak {m}}\otimes \cdots \otimes {\tilde{\rho }}_N^k{\mathfrak {m}}\) with \({\tilde{\rho }}_i^k \ll \rho \) and \(\frac{{\mathrm d}{\tilde{\rho }}_i^k}{{\mathrm d}\rho } \le 1\). Indeed, by Proposition 2.4, the entropy is always bounded from below, and so we can make a crude estimate:

$$\begin{aligned} E[\gamma _n']&= \int _{X^N} \log \left( \sum _{k=1}^{N_n}\frac{{\mathrm d}{\tilde{\gamma }}_{n,k}}{{\mathrm d}{\mathfrak {m}}}\right) \,{\mathrm d}\left( \sum _{k=1}^{N_n} {\tilde{\gamma }}_{n,k}\right) \\&\le \log (N_n) + \sum _{k=1}^{N_n} \int _{X^N}\log \left( \frac{{\mathrm d}{\tilde{\gamma }}_{n,k}}{{\mathrm d}{\mathfrak {m}}}\right) \,{\mathrm d}{\tilde{\gamma }}_{n,k} < \infty . \end{aligned}$$

\(\square \)

3.6 Proof of condition (II)

We are now ready to prove the \(\Gamma \)-\(\limsup \) inequality (II). By Lemma 3.2 we already know that \((\gamma _n')_n\) converges to \(\gamma \). However, \({\mathcal {C}}_n[\gamma _n']\) need not converge to \({\mathcal {C}}[\gamma ]\). This can be remedied by slowing down the convergence of \((\gamma _n')_n\): we repeat each measure sufficiently (but finitely) many times before moving on to the next one. To this end, we define k(n) for every \(n \in {\mathbb {N}}\) as

$$\begin{aligned} k(n) = \min \left( n,\max \left( 1, \sup \left\{ k\in {\mathbb {N}}\,|\,\sqrt{\tau _n} E[\gamma _j'] < 1 \text { for all }j \le k\right\} \right) \right) . \end{aligned}$$

By definition, \(1 \le k(n) \le n\). Moreover, since for every \(j\in {\mathbb {N}}\) we have \(E[\gamma _j'] < \infty \) by Lemma 3.4 and \(\tau _n \rightarrow 0\) by definition, we have that \(k(n) \rightarrow \infty \) as \(n\rightarrow \infty \). Thus, defining \(\gamma _n = \gamma _{k(n)}'\), for large enough \(n\in {\mathbb {N}}\) we have

$$\begin{aligned} {\mathcal {C}}_n[\gamma _n] = C_0[\gamma _{k(n)}'] + \tau _nE[\gamma _{k(n)}'] < C_0[\gamma _{k(n)}'] + \sqrt{\tau _n}. \end{aligned}$$

Recalling that by Lemma 3.3 we have \(C_0[\gamma _{k(n)}'] \rightarrow C_0[\gamma ]\), we conclude the proof. \(\square \)
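The index-slowing device above can be illustrated in code. This is a toy sketch with made-up sequences \(\tau _n = 1/n^2\) and \(E[\gamma _j'] = j\); it is not part of the proof.

```python
# Toy sketch of the slowed-down diagonal sequence: given tau_n -> 0 and
# finite entropies E_j = E[gamma_j'], the index k(n) advances only while
# sqrt(tau_n) * E_j < 1, so that tau_n * E_{k(n)} < sqrt(tau_n).
# All names and data are illustrative.

def k_of_n(n, tau, E):
    k = 0
    for j in range(1, n + 1):
        if (tau[n] ** 0.5) * E[j] < 1.0:
            k = j                       # condition holds for all indices up to j
        else:
            break
    return max(1, min(n, k))            # clamp to 1 <= k(n) <= n

tau = {n: 1.0 / n**2 for n in range(1, 101)}   # any sequence with tau_n -> 0
E = {j: float(j) for j in range(1, 101)}       # any finite entropies
ks = [k_of_n(n, tau, E) for n in range(1, 101)]
```

With these data, \(k(n)=n-1\) for \(n\ge 2\), so \(k(n)\rightarrow \infty \) while the entropy penalty \(\tau _n E[\gamma _{k(n)}']\) stays below \(\sqrt{\tau _n}\).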

In Proposition 2.6 the existence of a minimizer for the entropy-regularized cost was established. Now that we know that measures \(\gamma \) for which \(C_0[\gamma ]<\infty \) can be approximated by measures with not only finite cost but also finite entropy, we can say more:

Corollary 3.5

Let \((X,d,{\mathfrak {m}})\) be a Polish metric measure space. Assume that \(\rho {\mathfrak {m}}\in {\mathcal {P}}^{ac}(X)\) satisfies \(\rho \log \rho \in L_{\mathfrak {m}}^1(X)\) and Conditions (A) and (B). Assume that \(c:X^N\rightarrow {\mathbb {R}}\cup \lbrace +\infty \rbrace \) satisfies Conditions (F1) and (F2). Then, for each \(\varepsilon > 0\), there exists a unique minimizer \(\gamma \in \Pi ^{\mathrm {sym}}_N(\rho )\) of the entropy-regularized cost \(C_{\varepsilon }\).

Proof

Our marginal measure satisfies Conditions (A) and (B), so there exists a measure \(\gamma \in \Pi ^{\mathrm {sym}}_N(\rho )\) that minimizes \(C_0\) with \(C_0[\gamma ]<\infty \). Note that this measure may have infinite entropy. However, by the approximation result established in the proof of Condition (II) above, there exists a measure \(\gamma '\in \Pi ^{\mathrm {sym}}_N(\rho )\) such that \(C_\varepsilon [\gamma ']<\infty \). The uniqueness claim now follows, since the functional \(\gamma \mapsto C_\varepsilon [\gamma ]\) is strictly convex for \(\varepsilon >0\). \(\square \)

4 Entropic-Kantorovich duality for Coulomb-type costs

We start by recalling the classical Fenchel–Rockafellar Theorem. We refer to the book of I. Ekeland and R. Témam [13, Theorem 4.2] for a more complete presentation and references.

Theorem 4.1

(Fenchel–Rockafellar) Let \({\mathcal {X}}\) and \({\mathcal {Y}}\) be Banach spaces and \(A:{\mathcal {X}}\rightarrow {\mathcal {Y}}\) be linear and continuous. Let \(F:{\mathcal {X}}\rightarrow {\mathbb {R}}\cup \lbrace +\infty \rbrace \) and \(G:{\mathcal {Y}}\rightarrow {\mathbb {R}}\cup \lbrace +\infty \rbrace \) be proper and convex functions. Then

$$\begin{aligned} \inf \big \lbrace F[x] + G[Ax] ~\big |~ x \in {\mathcal {X}}\big \rbrace = \sup \big \lbrace -F^{*}[-A^* \gamma ] - G^{*}[\gamma ] ~\big |~ \gamma \in {\mathcal {Y}}^* \big \rbrace \end{aligned}$$

where \(A^*:{\mathcal {Y}}^*\rightarrow {\mathcal {X}}^{*}\) denotes the adjoint operator of A.

Next we prove the Entropic-Kantorovich duality for the problem (2.4).

Theorem 4.2

(Entropic Duality for repulsive costs) Let \((X,d,{\mathfrak {m}})\) be a Polish measure space. Suppose that \(\rho {\mathfrak {m}}\in {\mathcal {P}}(X)\) is such that (A) and (B) hold and \(\rho \log \rho \in L_{\mathfrak {m}}^1(X)\), and that \(c:X^N\rightarrow [0,\infty ]\) is a cost function of the form

$$\begin{aligned} c(x_1,\ldots , x_N)=\sum _{1\le i<j\le N}f(d(x_i,x_j)),\quad \text { for all } \, (x_1,\ldots ,x_N)\in X^N, \end{aligned}$$

where \(f:[0,+\infty [\rightarrow [0,\infty ]\) is a function satisfying (F1) and (F2). Then, for \(\varepsilon > 0\), the following duality holds:

$$\begin{aligned} \min _{\gamma \in \Pi _N(\rho )}C_\varepsilon [\gamma ]&= \sup _{u_i\in C_b(X)}\left\{ \sum _{i=1}^N\int _Xu_i{\mathrm d}\rho {\mathfrak {m}}-\varepsilon \int _{X^N}\exp \left( \frac{u_1\oplus \cdots \oplus u_N-c}{\varepsilon }\right) \,{\mathrm d}{\mathfrak {m}}_{N}\right\} +\varepsilon \\&=\sup _{u\in C_b(X)}\left\{ N\int _Xu{\mathrm d}\rho {\mathfrak {m}}-\varepsilon \int _{X^N}\exp \left( \frac{u\oplus \cdots \oplus u-c}{\varepsilon }\right) \,{\mathrm d}{\mathfrak {m}}_{N}\right\} +\varepsilon , \end{aligned}$$

where \(v_1\oplus \cdots \oplus v_N\) denotes the operator \((v_1\oplus \cdots \oplus v_N)(x_1,\ldots , x_N) = v_1(x_1)+ \cdots + v_N(x_N)\).
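Before the proof, the easy inequality behind the theorem (the primal value dominates every dual value, including the additive \(+\varepsilon \)) can be sanity-checked numerically. The sketch below works on a finite grid with \(N=2\), uses the reference measure \(\rho \otimes \rho \) as in the noncompact reformulation later in the proof, and all data (grid, cost truncation, potentials) are hypothetical.

```python
import numpy as np

# Numerical sanity check of weak duality for Theorem 4.2 with N = 2 on a
# finite grid, reference measure rho x rho. The diagonal cost is truncated
# to a large finite value; everything here is illustrative.

rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 8)                 # grid points in X = [0, 1]
rho = rng.random(8)
rho /= rho.sum()                             # discrete marginal rho

d = np.abs(x[:, None] - x[None, :])
c = np.where(d > 0, 1.0 / np.maximum(d, 1e-12), 1e6)  # Coulomb-type cost, truncated on the diagonal
eps = 0.5

def primal(G):
    # C_eps[G | rho] = <c, G> + eps * entropy of G relative to rho x rho
    ref = np.outer(rho, rho)
    return (c * G).sum() + eps * (G * np.log(G / ref)).sum()

def dual(u):
    # D_rho(u) = 2 <u, rho> - eps * int exp((u + u - c)/eps) d(rho x rho) + eps
    s = (u[:, None] + u[None, :] - c) / eps
    return 2.0 * (u * rho).sum() - eps * (np.exp(s) * np.outer(rho, rho)).sum() + eps

G = np.outer(rho, rho)                       # an admissible coupling in Pi_2(rho)
for _ in range(5):
    u = rng.normal(size=8)
    assert primal(G) >= dual(u) - 1e-9       # primal >= dual for every test potential
```

This checks only the trivial direction; the content of Theorem 4.2 is that the supremum over u actually attains the minimum.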

Proof

First let us assume that X is a compact space. We set \({\mathcal {X}}= (C_b(X))^N\) and \({\mathcal {Y}}=C_b(X^N)\), where \(C_b(X)\) is the space of continuous and bounded functions on X, and similarly for \(X^N\). By the Riesz representation theorem, the dual of the space \({\mathcal {Y}}\) is the space \({\mathcal {M}}(X^N)\) of signed regular Borel measures on \(X^N\). Thus, we may define the Legendre–Fenchel transform \(G^{*}\) of a functional \(G:{\mathcal {Y}}\rightarrow {\mathbb {R}}\cup \lbrace +\infty \rbrace \) by

$$\begin{aligned} G^{*}:{\mathcal {M}}(X^N)\rightarrow {\mathbb {R}}\cup \lbrace +\infty \rbrace , \quad G^{*}[\pi ] = \sup _{\psi \in C_b(X^N)} \bigg \lbrace \int _{X^N}\psi \,{\mathrm d}\pi - G[\psi ] \bigg \rbrace . \end{aligned}$$

We define the functionals

$$\begin{aligned} F:(C_b(X))^N\rightarrow {\mathbb {R}}\cup \{+\infty \},~~(u_1,\ldots , u_N)\mapsto -\sum _{i=1}^N\int _Xu_i\,{\mathrm d}\rho {\mathfrak {m}}\end{aligned}$$

and

$$\begin{aligned} G:C_b(X^N)\rightarrow {\mathbb {R}}\cup \{+\infty \},~~\psi \mapsto \varepsilon \int _{X^N}e^{\tfrac{1}{\varepsilon }(\psi -c)}\,{\mathrm d}{\mathfrak {m}}_{N}, \end{aligned}$$

and the operator

$$\begin{aligned} A:(C_b(X))^N\rightarrow C_b(X^N),~~(u_1,\ldots , u_N)\mapsto u_1\oplus \cdots \oplus u_N. \end{aligned}$$

Now, F and G are proper and convex functionals and A is a linear and continuous operator. Hence, we may apply the Fenchel–Rockafellar Theorem 4.1 to get

$$\begin{aligned} \inf \{F[x]+G[Ax]~|~x\in {\mathcal {X}}\}=\sup \{-G^*[\gamma ]-F^*[-A^*\gamma ]~|~\gamma \in {\mathcal {Y}}^*\}. \end{aligned}$$

This gives (since for every set S we have \(\inf (S)=-\sup (-S)\) and \(\sup S=-\inf (-S)\))

$$\begin{aligned} \inf \{G^*[\gamma ]+F^*[-A^*\gamma ]~|~\gamma \in {\mathcal {Y}}^*\}=\sup \{-F[x]-G[Ax]~|~x\in {\mathcal {X}}\}. \end{aligned}$$

It remains to show that the above expression has exactly the form of our duality claim. The claim that the right-hand sides correspond to each other follows immediately from our choices of \({\mathcal {X}}\), F, and G. So, it remains to show that

$$\begin{aligned} \inf \{G^*[\gamma ]+F^*[-A^*\gamma ]~|~\gamma \in {\mathcal {Y}}^*\}=\min _{\gamma \in \Pi _N(\rho )}C_\varepsilon [\gamma ]-\varepsilon . \end{aligned}$$
(4.1)

To prove it, let \(\gamma \in {\mathcal {M}}(X^N)\). Now we have

$$\begin{aligned} F^*[-A^*\gamma ]&= \sup \bigg \lbrace \int _{X^N}\sum ^N_{i=1}u_i(x_i)\,{\mathrm d}\gamma - \sum ^N_{i=1}\int _{X}u_i(x_i)\,{\mathrm d}\rho {\mathfrak {m}}(x_i) ~\bigg |~ (u_1,\ldots ,u_N)\in C_b(X)^N \bigg \rbrace \\&={\left\{ \begin{array}{ll}0 &{}\text { if }\gamma \in \Pi _N(\rho )\\ +\infty &{}\text { otherwise}\end{array}\right. }. \end{aligned}$$

Let us then compute \(G^*[\gamma ]\):

$$\begin{aligned} G^*[\gamma ]=\sup _{\psi \in C_b(X^N)} \left\{ \int _{X^N}\psi \,{\mathrm d}\gamma - \varepsilon \int _{X^N}e^{\tfrac{1}{\varepsilon }(\psi -c)}\,{\mathrm d}{\mathfrak {m}}_{N} \right\} . \end{aligned}$$

If \(\gamma \) is not absolutely continuous with respect to \({\mathfrak {m}}_{N}\), we have \(G^*[\gamma ]=+\infty \). If \(\gamma \ll {\mathfrak {m}}_{N}\), then the supremum (that appears in the definition of \(G^{*}[\gamma ]\)) is realized at \(\psi =\varepsilon \log \rho _\gamma +c\), where \(\rho _\gamma :=\frac{{\mathrm d}\gamma }{{\mathrm d}{\mathfrak {m}}_{N}}\); this holds also if the function \(\rho _\gamma \) is not continuous, since it can be approximated by a sequence of continuous functions. Thus, for \(\gamma \ll {\mathfrak {m}}_{N}\) we get

$$\begin{aligned} G^*[\gamma ]&=\int _{X^N}\left( \rho _\gamma \cdot \psi -\varepsilon e^{\frac{1}{\varepsilon }(\psi -c)}\right) \,{\mathrm d}{\mathfrak {m}}_{N}\\&=\int _{X^N}(\varepsilon \rho _\gamma \log \rho _\gamma +c\rho _\gamma -\varepsilon \rho _\gamma )\,{\mathrm d}{\mathfrak {m}}_{N}. \end{aligned}$$

Hence, if \(\gamma \in \Pi _N(\rho )\), we have

$$\begin{aligned} G^*[\gamma ]=C_0[\gamma ]+\varepsilon E[\gamma ]-\varepsilon . \end{aligned}$$

This concludes the duality proof when X is a compact space.

The noncompact case Due to Lemma 2.3, it suffices to prove the claim in the case where the reference measure is \(\rho {\mathfrak {m}}\) instead of \({\mathfrak {m}}\); the finiteness of the measure \(\rho {\mathfrak {m}}\) now gives access to inner regularity and to the approximability by compact sets. We will for simplicity denote \(\rho :=\rho {\mathfrak {m}}\).

The claim is

$$\begin{aligned} \min _{\gamma \in \Pi _N(\rho )}C_\varepsilon [\gamma ]=\sup _{u\in C_b(X)}\left\{ N\int _Xu{\mathrm d}\rho -\varepsilon \int _{X^N}\exp \left( \frac{u\oplus \cdots \oplus u-c}{\varepsilon }\right) \,{\mathrm d}\rho ^{\otimes N}\right\} +\varepsilon . \end{aligned}$$

For simplicity, let us denote

$$\begin{aligned}D_\rho (u):=\left\{ N\int _Xu{\mathrm d}\rho -\varepsilon \int _{X^N}\exp \left( \frac{u\oplus \cdots \oplus u-c}{\varepsilon }\right) \,{\mathrm d}\rho ^{\otimes N}\right\} +\varepsilon ~~~\text {for all }u\in C_b(X).\end{aligned}$$

We may assume that \(\sup _{u\in C_b(X)}D_\rho (u)>-\infty \); indeed, since we can test with the function \(u\equiv 0\), this always holds for cost functions that are bounded from below.

Let us make explicit the dependence of the primal functional on the reference measure by writing \(\gamma \mapsto C_\varepsilon [\gamma \,|\,\mu ]\) when the reference measure on the space X is \(\mu \). Thus the original notation \(\gamma \mapsto C_\varepsilon [\gamma ]\) corresponds to \(\gamma \mapsto C_\varepsilon [\gamma \,|\,{\mathfrak {m}}]\).

Since the measures \(\rho \) and \(\gamma \) are inner regular, there exists a sequence \((K_n)_{n\in {\mathbb {N}}}\) of compact subsets of X such that

$$\begin{aligned} \rho (K_n)\rightarrow \rho (X)\qquad \text {and}\qquad \gamma (K_n^N)\rightarrow \gamma (X). \end{aligned}$$

Let us denote \(\gamma _n:=\frac{1}{\gamma (K_n^N)}\gamma \big |_{K_n^N}\) and \(\rho _n:=\frac{1}{\rho (K_n)}\rho \big |_{K_n}\). Let us also denote by \(\gamma _n^{\text {min}}\) the minimizer of the problem \(I_\varepsilon [\rho _n]\). Since \(\gamma \) is the minimizer of the problem \(I_\varepsilon [\rho ]\) and since (due to the absolute continuity of the integral and the continuity of the function \(t\mapsto t\log t\))

$$\begin{aligned} \lim _{n\rightarrow \infty }|C_\varepsilon [\gamma _n\,|\,\rho _n]- C_\varepsilon [\gamma \,|\,\rho ]|=0, \end{aligned}$$
(4.2)

we have

$$\begin{aligned} \lim _{n\rightarrow \infty }|C_\varepsilon [\gamma _n\,|\,\rho _n]-C_\varepsilon [\gamma _n^{\text {min}}\,|\,\rho _n]|=0. \end{aligned}$$
(4.3)

By the duality result proven above for compact spaces, we have for all \(n \in {\mathbb {N}}\)

$$\begin{aligned} \sup _{u\in C_b(K_n)}D_{\rho _n}(u)=C_\varepsilon [\gamma _n^{\text {min}}\,|\,\rho _n]. \end{aligned}$$

Again, due to the absolute continuity of the integral, we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\left|\sup _{u\in C_b(K_n)}D_{\rho _n}(u)-\sup _{u\in C_b(X)}D_\rho (u)\right|=0. \end{aligned}$$

Putting these conditions together, we get for all \(n\in {\mathbb {N}}\)

$$\begin{aligned}&\left| C_\varepsilon [\gamma \,|\,\rho ]-\sup _{u\in C_b(X)}D_\rho (u)\right| \\&\quad \le \left| C_\varepsilon [\gamma \,|\,\rho ]-C_\varepsilon [\gamma _n\,|\,\rho _n]\right| +\left| C_\varepsilon [\gamma _n\,|\,\rho _n]-C_\varepsilon [\gamma _n^{\text {min}}\,|\,\rho _n]\right| \\&\qquad +\left| C_\varepsilon [\gamma _n^{\text {min}}\,|\,\rho _n]-\sup _{u\in C_b(K_n)}D_{\rho _n}(u)\right| +\left| \sup _{u\in C_b(K_n)}D_{\rho _n}(u)-\sup _{u\in C_b(X)}D_{\rho }(u)\right| . \end{aligned}$$

The claim follows by letting \(n\rightarrow \infty \). \(\square \)