1 Introduction

In recent years, Optimal Transport and its link with Ricci curvature in Riemannian geometry have attracted a considerable amount of attention. The extensive modern book by Villani [57] is one of the main references on this topic. However, while a lot is now known in the Riemannian setting (and more generally in geodesic spaces), very little is known so far in discrete spaces, such as finite graphs or finite Markov chains. A notable exception is the family of notions of (discrete) Ricci curvature proposed recently by several authors (unfortunately, even there, a satisfactory, universally agreed upon, resolution is still lacking); see Bonciocat-Sturm [6], Erbar-Maas [13], Hillion [19], Joulin [23], Lin-Yau [30], Maas [32], Mielke [38], Ollivier [39], and the recent works on the displacement convexity of entropy by Hillion [20], Lehec [26] and Léonard [29].

In particular, the notions of transport inequalities, HWI inequalities, interpolating paths on the space of measures, and displacement convexity of entropy are yet to be properly introduced, analyzed and understood in discrete spaces. This is the chief aim of the present paper, and of a companion paper [17]. Due to its theoretical as well as applied appeal, this subject lies at the intersection of many areas of mathematics, such as Calculus of Variations, Probability Theory, Convex Geometry and Analysis, as well as Combinatorial Optimization.

In order to present our results, let us first introduce some of the relevant notions in the continuous framework of geodesic spaces, see [57].

A complete, separable metric space \((\mathcal X ,d)\) is said to be a geodesic space if, for all \(x_0,x_1 \in \mathcal X \), there exists at least one path \(\gamma :[0,1] \rightarrow \mathcal X \) such that \(\gamma (0)=x_0\), \(\gamma (1)=x_1\) and

$$\begin{aligned} d(\gamma (s),\gamma (t))=|t-s|d(x_0,x_1), \qquad \forall s,t \in [0,1]. \end{aligned}$$

Such a path is then called a constant speed geodesic between \(x_0\) and \(x_1\).

Then, for \(p \ge 1\), let \(\mathcal P _p(\mathcal X )\) be the set of Borel probability measures on \(\mathcal X \) having a finite \(p\)-th moment, namely

$$\begin{aligned} \mathcal P _p(\mathcal X ) := \left\{ \mu \hbox { Borel probability measure}: \int \limits _\mathcal X d(x_o,x)^p \mu (dx) < + \infty \right\} \,, \end{aligned}$$

where \(x_o \in \mathcal X \) is arbitrary (\(\mathcal P _p(\mathcal X )\) does not depend on the choice of the point \(x_o\)) and define the following \(L_p\)-Wasserstein distance: for \(\nu _0,\nu _1 \in \mathcal P _p(\mathcal X )\), set

$$\begin{aligned} W_p(\nu _0,\nu _1):= \left( \inf _{\pi \in \Pi (\nu _0,\nu _1)} \left\{ \int \int d(x,y)^p \,d\pi (x,y) \right\} \right) ^{1/p}\,, \end{aligned}$$
(1.1)

where \(\Pi (\nu _0,\nu _1)\) is the set of couplings of \(\nu _0\) and \(\nu _1\).
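When the underlying space is finite, the infimum in (1.1) is a finite-dimensional linear program over couplings, so \(W_p\) can be computed directly. A minimal numerical sketch (our own illustration; the helper name and toy data are ours, and it assumes scipy is available):

```python
# A minimal sketch: W_p on a finite metric space, solving the linear
# program (1.1) over couplings with scipy's LP solver.
import numpy as np
from scipy.optimize import linprog

def wasserstein_p(D, nu0, nu1, p=1):
    """D: distance matrix; nu0, nu1: probability vectors."""
    n, m = D.shape
    cost = (D ** p).flatten()                # cost c(x, y) = d(x, y)^p
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0     # sum_y pi(x, y) = nu0(x)
    for j in range(m):
        A_eq[n + j, j::m] = 1.0              # sum_x pi(x, y) = nu1(y)
    res = linprog(cost, A_eq=A_eq, b_eq=np.concatenate([nu0, nu1]),
                  bounds=(0, None))
    return res.fun ** (1.0 / p)

# Two Dirac masses at the endpoints of a 3-vertex path graph: W_1 = 2.
D = np.array([[0., 1., 2.], [1., 0., 1.], [2., 1., 0.]])
print(wasserstein_p(D, np.array([1., 0., 0.]), np.array([0., 0., 1.])))
```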

The metric space \((\mathcal{P }_{p}(\mathcal X ), W_p)\) is canonically associated with the original metric space \((\mathcal X ,d)\). Namely, for \(p>1\), \((\mathcal{P }_{p}(\mathcal X ), W_p)\) is geodesic if and only if \((\mathcal X ,d)\) is geodesic, see [54].

A remarkable and powerful fact is that, when \(\mathcal X \) is a Riemannian manifold, one can relate the Ricci curvature of the space to the convexity of entropy along geodesics [8, 31, 36, 45, 53, 56]. More precisely, under the Bakry-Emery \(\mathrm{CD}(K, \infty )\) condition (see e.g. [2]), namely when the space \((\mathcal X ,d,\mu )\) satisfies \(\mathrm Ric +\mathrm Hess \,V\ge K\), where \(\mu (dx)=e^{-V(x)}\,dx\), one can prove that for all \(\nu _0,\nu _1\in \mathcal P _{2}(\mathcal X )\) whose supports are included in the support of \(\mu \), there exists a constant speed \({W}_{2}\)-geodesic \(\{\nu _t\}_{t\in [0,1]}\) from \(\nu _0\) to \(\nu _1\) such that

$$\begin{aligned} H(\nu _t|\mu ) \le (1-t) H(\nu _0|\mu ) + t H(\nu _1|\mu ) - \frac{K}{2}t(1-t) {W}_2^2(\nu _0,\nu _1) \qquad \forall t\in [0,1], \nonumber \\ \end{aligned}$$
(1.2)

where \(H(\nu |\mu )\) denotes the relative entropy of \(\nu \) with respect to \(\mu \). Equation (1.2) is known as the \(K\)-displacement convexity of entropy. In fact, a converse statement also holds: if the entropy is \(K\)-displacement convex, then the Ricci curvature is bounded below by \(K\). This equivalence was used as a guideline for the definition of the notion of curvature in geodesic spaces by Sturm-Lott-Villani in their celebrated works [31, 54, 55].

Moreover, it is known that the \(K\)-displacement convexity of entropy is a very strong notion that implies many well-known inequalities in Convex Geometry and in Probability Theory, such as the Brunn-Minkowski inequality, the Prékopa-Leindler inequality, Talagrand’s transport-entropy inequality, HWI inequality, log-Sobolev inequality etc., see [57].

The question one would like to address is whether one can extend the above theory to discrete settings such as finite graphs, equipped with a set of probability measures on the vertices and with a natural graph distance.

Let us mention two main obstructions. Firstly, \({W}_{2}\)-geodesics do not exist in discrete settings (the reader can verify this fact by considering two nearest neighbors \(x,y\) in a graph \(G=(V,E)\) and trying to construct a constant speed geodesic between the two Dirac measures \(\delta _x, \delta _y\) at the vertices \(x\) and \(y\)). Secondly, the following Talagrand transport-entropy inequality

$$\begin{aligned} W_2^2(\nu _0,\mu ) \le C\ H(\nu _0|\mu ), \qquad \forall \nu _0 \in \mathcal P _2(V)\, \end{aligned}$$
(1.3)

(for a suitable constant \(C>0\)) does not hold in discrete settings unless \(\mu \) is a Dirac measure! From these simple observations, we deduce that \({W}_{2}\) is well adapted neither for defining the path \(\{\nu _t\}_{t\in [0,1]}\) nor for measuring the defect/excess in the convexity of entropy in a discrete context.

In this paper, our contribution is to introduce the notion of an interpolating path \(\{\nu _t\}_{t\in [0,1]}\) and of a weak transport cost \(\widetilde{\mathcal{T }}_{2}\) (which, in a sense, goes back to Marton [33, 34]). These will in turn help us derive the desired displacement convexity results on finite graphs.

Before presenting our results, we give a brief state of the art of the field (to the best of our knowledge).

Ollivier and Villani [40] prove that, on the hypercube \(\Omega _n=\{0,1\}^n\), for any probability measures \(\nu _0, \nu _1\), there exists a probability measure \(\nu _{1/2}\) (concentrated on the set of mid-points, see [40] for a precise definition) such that

$$\begin{aligned} H(\nu _{1/2}|\mu ) \le \frac{1}{2} H(\nu _0|\mu ) + \frac{1}{2} H(\nu _1|\mu ) - \frac{1}{80n} W_1^2(\nu _0,\nu _1), \end{aligned}$$

where \(\mu \equiv 1/2^n\) is the uniform measure and \(W_1\) is defined with the Hamming distance. They observe that this in turn implies a curved Brunn-Minkowski inequality on \(\Omega _n\). The constant \(1/n\) encodes, in some sense, the discrete Ricci curvature of the hypercube, in accordance with the various definitions of discrete Ricci curvature (see above for references).

Maas [32] introduces a pseudo Wasserstein distance \(\mathcal W _2\) that corresponds to the geodesic distance on the set, \(\mathcal P (\Omega _n)\), of probability measures on the hypercube \(\Omega _n\), equipped with a Riemannian metric. (In fact, his construction is more general and applies to a wide class of Markov kernels on finite graphs.) This metric is such that the continuous time random walk on the graph becomes a gradient flow of the function \(H(\cdot |\mu )\). This is further developed by Erbar and Maas [13] who prove, inter alia, that if \(\{\nu _t\}_{t\in [0,1]}\) is a geodesic from \(\nu _0\) to \(\nu _1\), then

$$\begin{aligned} H(\nu _t|\mu ) \le (1-t) H(\nu _0|\mu ) + t H(\nu _1|\mu ) - \frac{1}{n}t(1-t) \mathcal W _2^2(\nu _0,\nu _1), \qquad \forall t\in [0,1], \end{aligned}$$

where \(\mu \equiv 1/2^n\) is the uniform measure. Independently, Mielke [38] also obtains similar results. As a consequence of their displacement convexity property, these authors derive versions of log-Sobolev, HWI and Talagrand’s transport-entropy inequalities (involving \(\mathcal W _2\) and \(W_1\) distances) with sharp constants. Very recent works of Erbar [12] and Gigli-Maas [15] derive further results with the pseudo metric, demonstrating that the metric also works, in a certain sense, in continuous settings.

In a different direction (at the level of functional inequalities), besides the study of the log-Sobolev inequality which is now classical (see e.g. [1, 48]), Sammer and the last named author [49, 50] studied Talagrand’s inequality in discrete spaces, with \(W_1\) on the left hand side of (1.3). They also derived a discrete analogue of the Otto-Villani result [41]: that a modified log-Sobolev inequality implies the \(W_1\)-type Talagrand inequality. Connected to this, a few years ago, following seminal work of Bobkov and Ledoux [3], several researchers independently realized that modified versions of logarithmic Sobolev inequalities helped capture refined information that was lost while working with the classic log-Sobolev inequality of Gross. In the discrete setting of finite Markov chains, one such modified log-Sobolev inequality has been instrumental in capturing the rate of convergence to equilibrium in the (relative) entropy sense, see e.g. [5, 7, 10, 14, 16, 46, 48]. The current state of knowledge in identifying precise sufficient criteria to derive bounds on the entropy decay (or on the corresponding modified log-Sobolev constants) is unfortunately rather meagre. This is an independent motivation for our efforts at developing the discrete aspects of the displacement convexity property and related notions.

Now we describe some of the main results of the present paper. First, we shall introduce the notion of an interpolating path \(\{\nu _t^\pi \}_{t\in [0,1]}\), on the set of probability measures on graphs, between two arbitrary probability measures \(\nu _0, \nu _1\). In fact, we define a family of interpolating paths, depending on a parameter \(\pi \in \Pi (\nu _0,\nu _1)\), which is a coupling of \(\nu _0, \nu _1\). The construction of this interpolating path is inspired by a certain binomial interpolation due to Johnson [22], see also [19, 20, 21]. In particular, we shall prove that such an interpolating path, for a properly chosen coupling \(\pi ^*\) (namely, an optimal coupling for \(W_1\)), is actually a \(W_1\) constant speed geodesic: i.e., \(W_1(\nu _t^{\pi ^*},\nu _s^{\pi ^*})=|t-s|W_1(\nu _0,\nu _1)\) for all \(s,t \in [0,1]\), with \(W_1\) defined using the graph distance \(d\) (see Proposition 2.2 below). Such a family enjoys a tensorization property (see Lemma 2.3) that is crucial in our derivation of the displacement convexity property on products of graphs.

Indeed, we shall prove the following tensorization of the displacement convexity of entropy along the interpolating path \(\{\nu _t^\pi \}_{t\in [0,1]}\). This is one of our main results (see Theorem 1.1 below). In order to state the result, we define here the notion of a quadratic cost, which we will elaborate on in later sections.

Let \(G=(V,E)\) be a (finite) connected, undirected graph, and let \(\mathcal P (V)\) denote the set of probability measures on the vertex set \(V\). Given two probability measures \(\nu _0\) and \(\nu _1\) on \(V\), let \(\Pi (\nu _0, \nu _1)\) denote the set of couplings (joint distributions) of \(\nu _0\) and \(\nu _1\).

Given \(\pi \in \Pi (\nu _0,\nu _1)\), consider the probability kernels \(p\) and \(\bar{p}\) defined by: \(p(x,y)=\delta _x(y)\) (the Dirac mass at \(x\) evaluated at site \(y\)) if \(\nu _0(x)=0\), \(\bar{p}(y,x)=\delta _y(x)\) if \(\nu _1(y)=0\), and otherwise

$$\begin{aligned} \pi (x,y)=\nu _0(x)p(x,y) = \nu _{1}(y)\bar{p}(y,x), \ \ x,y \in V, \end{aligned}$$

and set

$$\begin{aligned} I_2(\pi ):= \sum _{x \in V} \left( \sum _{y \in V} d(x,y) p(x,y) \right) ^2 \nu _0(x), \qquad \bar{I}_2(\pi ):= \sum _{y \in V} \left( \sum _{x \in V} d(x,y) \bar{p}(y,x) \right) ^2 \nu _1(y)\nonumber \\ \end{aligned}$$
(1.4)
$$\begin{aligned} J_2(\pi ):= \left( \sum _{x \in V} \sum _{y \in V} d(x,y)\pi (x,y) \right) ^2. \end{aligned}$$
(1.5)
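For concreteness, the functionals (1.4) and (1.5) are straightforward to evaluate for a coupling given as a matrix. A small Python sketch (the helper names are ours); note that when \(\nu _0(x)=0\) the convention \(p(x,\cdot )=\delta _x\) makes the corresponding term vanish, so such \(x\) can simply be skipped:

```python
# A small sketch: the quantities I_2, \bar I_2 of (1.4) and J_2 of (1.5)
# for a coupling pi and a distance matrix D.
import numpy as np

def I2(pi, D):
    nu0 = pi.sum(axis=1)                       # first marginal of pi
    total = 0.0
    for x in range(len(nu0)):
        if nu0[x] > 0:
            p_x = pi[x] / nu0[x]               # kernel p(x, .)
            total += (D[x] @ p_x) ** 2 * nu0[x]
    return total

def I2_bar(pi, D):
    return I2(pi.T, D.T)                       # roles of nu_0, nu_1 exchanged

def J2(pi, D):
    return float((D * pi).sum()) ** 2          # (sum_{x,y} d(x,y) pi(x,y))^2
```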

Let \(\mu \) be a (reference) probability measure in \(\mathcal P (V)\) charging all points (i.e. such that \(\mu (x)>0\) for all \(x\in V\)). We say a graph \(G\), equipped with the distance \(d\) and the probability measure \(\mu \), satisfies the displacement convexity property (of entropy), if there exists a \(C = C(G, d, \mu ) > 0\), so that for any \(\nu _0, \nu _1 \in \mathcal P (V)\), there exists a \(\pi \in \Pi (\nu _0, \nu _1)\) satisfying:

$$\begin{aligned} H(\nu _t^\pi | \mu ) \le (1-t)H(\nu _0 | \mu ) + tH(\nu _1 | \mu ) - C t(1-t)(I_2(\pi )+\bar{I}_2(\pi ))\,, \quad \forall t\in [0,1]. \end{aligned}$$

The quantity \(I_2(\pi )\) goes back to Marton [33, 34], in her definition of the following transport cost, which we call the weak transport cost:

$$\begin{aligned} \widetilde{W}_2^2(\nu _0,\nu _1) := \inf _{\pi \in \Pi (\nu _0,\nu _1)} I_2(\pi ) + \inf _{\pi \in \Pi (\nu _0,\nu _1)} \bar{I}_2(\pi ). \end{aligned}$$

For more on this Wasserstein-type distance, see [11, 35, 51]. The precise statement of our tensorization theorem is as follows. For a graph, by the graph distance between two vertices, we mean the length of a shortest path between the two vertices.

Theorem 1.1

For \(i\in \{1,\ldots ,n\}\), let \(\mu ^i\) be a probability measure on \(G_i=(V_i, E_i)\), with the graph distance \(d_i\), that charges all points. Assume also that for each \(i\in \{1,\ldots ,n\}\) there is a constant \(C_i\ge 0\) such that for all probability measures \(\nu _0,\nu _1\) on \(V_i\), there exists \(\pi = \pi ^i \in \Pi (\nu _0,\nu _1)\) such that

$$\begin{aligned} H(\nu _t^\pi | \mu ^i) \le (1\!-\!t)H(\nu _0 | \mu ^{i}) \!+\! tH(\nu _1 | \mu ^{i}) \!-\! C_it(1-t)(I_2(\pi )+\bar{I}_2(\pi )) \quad \forall t\in [0,1]. \end{aligned}$$

Then the product probability measure \(\mu =\mu ^1\otimes \cdots \otimes \mu ^n\) defined on the Cartesian product \(G=(V,E)=G_1 \Box \cdots \Box G_n\) (see Sect. 1.1 for a precise definition) verifies the following property: for all probability measures \(\nu _0,\nu _1\) on \(V\), there exists \(\pi = \pi ^{(n)}\in \Pi (\nu _0,\nu _1)\) satisfying,

$$\begin{aligned} H(\nu _t^\pi | \mu ) \le (1\!-\!t)H(\nu _0 | \mu ) \!+\! tH(\nu _1 | \mu ) \!-\! Ct(1-t)(I_2^{(n)}(\pi )+\bar{I}^{(n)}_2(\pi )) \quad \forall t\in [0,1], \end{aligned}$$

where \(C=\min _i C_i\),

$$\begin{aligned} I_2^{(n)}(\pi ) := \sum _{x \in V_1 \times \cdots \times V_{n}} \sum _{i=1}^n \left( \sum _{y \in V_1 \times \cdots \times V_{n}} d_i(x_i,y_i) \frac{\pi (x,y)}{\nu _0(x)} \right) ^2 \nu _0(x), \end{aligned}$$
(1.6)

and

$$\begin{aligned} \bar{I}_2^{(n)}(\pi ) := \sum _{y \in V_1 \times \cdots \times V_{n}} \sum _{i=1}^n \left( \sum _{x \in V_1 \times \cdots \times V_{n}} d_i(x_i,y_i) \frac{\pi (x,y)}{\nu _1(y)} \right) ^2 \nu _1(y). \end{aligned}$$
(1.7)

(and with \(I_2(\pi ):=I_2^{(1)}(\pi )\) and similarly for \(\bar{I}_2(\pi )\)). The same proposition holds replacing \(I_2(\pi )+\bar{I}_2(\pi )\) by \(J_2(\pi )\) and \(I_2^{(n)}(\pi )+\bar{I}^{(n)}_2(\pi )\) by \(J_2^{(n)}(\pi )\), where

$$\begin{aligned} J_2^{(n)}(\pi ) := \sum _{i=1}^n \left( \sum _{x,y \in V_1 \times \cdots \times V_{n}} d_i(x_i,y_i) \pi (x,y) \right) ^2. \end{aligned}$$
(1.8)

In particular, as a consequence of the above tensorization theorem, we shall prove that, given two probability measures \(\nu _0, \nu _1\) on the hypercube \(\Omega _n=\{0,1\}^n\), there exists a coupling \(\pi \) such that

$$\begin{aligned} H(\nu _t^\pi |\mu ) \le (1-t) H(\nu _0|\mu ) + t H(\nu _1|\mu ) - \frac{1}{2}t(1-t) \widetilde{W}_2^{(n)}(\nu _0,\nu _1)^2 \,, \quad \forall t\in [0,1] \nonumber \\ \end{aligned}$$
(1.9)

where \(\mu \) is any product of non-trivial Bernoulli measures and \(\widetilde{W}_2^{(n)}(\nu _0,\nu _1)^2 := \inf _{\pi \in \Pi (\nu _0,\nu _1)} I_2^{(n)}(\pi ) + \inf _{\pi \in \Pi (\nu _0,\nu _1)} \bar{I}_2^{(n)}(\pi )\). As is easy to see, the weak transport cost \(\widetilde{W}_2\) is weaker than \(W_2\), but stronger than \(W_1\). Moreover, \(\widetilde{W}_2^2(\nu _0,\nu _1) \ge \frac{2}{n} W_1^2(\nu _0,\nu _1)\) (see below), so that (1.9) captures, in a sense, a discrete Ricci curvature of the hypercube (see [40] and references therein).

As a by-product of the displacement convexity property above, we shall derive a series of consequences. More precisely, we shall first derive a so-called HWI inequality.

Proposition 1.2

Let \(\mu \) be a probability measure charging all points on \(V^n\), the vertex set of the product graph \(G^n=G \Box \cdots \Box G\). Assume that \(\mu \) verifies the following displacement convexity inequality: there is some \(c>0\) such that for any probability measures \(\nu _0, \nu _1\) on \(V^n\), there exists a coupling \(\pi \in \Pi (\nu _0,\nu _1)\) such that

$$\begin{aligned} H(\nu _t^\pi |\mu ) \le (1\!-\!t) H(\nu _0|\mu ) \!+\! t H(\nu _1|\mu ) \!-\!ct(1-t) (I_2^{(n)} (\pi )+\bar{I}_2^{(n)}(\pi )) \quad \forall t \in [0,1]. \end{aligned}$$

Then \(\mu \) verifies

$$\begin{aligned} H(\nu _0|\mu )&\le H(\nu _1|\mu ) \nonumber \\&\quad +\, \sqrt{\sum _{x\in V^n} \sum _{i=1}^n \left[ \sum _{z\in N_i(x)} \left( \log \frac{\nu _0(x)}{\mu (x)} - \log \frac{\nu _0(z)}{\mu (z)} \right) \right] _{+}^2\nu _0(x)}\sqrt{I_2^{(n)}(\pi )}\nonumber \\&\quad -\, c(I_2^{(n)}(\pi ) + \bar{I}_2^{(n)}(\pi )), \end{aligned}$$
(1.10)

for the same \(\pi \in \Pi (\nu _0,\nu _1)\) as above, where \(N_i(x)\) is the set of neighbors of \(x\) in the \(i\)-th direction: \(N_i(x)=\{z \in V^n ; d(x,z)=1\text { and } x_i\ne z_i\}\).

On the hypercube \(\Omega _n=\{0,1\}^n\), the latter implies the following log-Sobolev-type inequality [that can be seen as a reinforcement of a discrete modified log-Sobolev inequality (see Corollary 5.4)]: if \(\mu \) is a product of non-trivial Bernoulli measures, for any \(f :\Omega _n \rightarrow (0,\infty )\), it holds

$$\begin{aligned} \mathrm{Ent }_{\mu }(f)\le \frac{1}{2} \sum _{x\in \Omega _n} \sum _{i=1}^n \left[ \log f(x) - \log f(\sigma _i(x)) \right] _{+}^2f(x)\mu (x) - \frac{1}{2} \widetilde{W}_2^2 (f\mu |\mu ), \end{aligned}$$

where \(\sigma _i(x)=(x_1,\dots ,x_{i-1},1-x_i,x_{i+1},\dots ,x_n)\) is the vector \(x=(x_1,\dots ,x_n)\) with the \(i\)-th coordinate flipped, and the constant \(1/2\) (in front of the Dirichlet form) is optimal.
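Dropping the nonnegative subtracted term \(\frac{1}{2} \widetilde{W}_2^2 (f\mu |\mu )\) leaves the weaker inequality \(\mathrm{Ent }_{\mu }(f)\le \frac{1}{2} \sum _{x}\sum _{i} [\log f(x) - \log f(\sigma _i(x))]_+^2 f(x)\mu (x)\), which is easy to test numerically. A toy Python check (our own sketch; \(\mu \) uniform on \(\Omega _3\) and a random positive \(f\)):

```python
# A toy numerical check of the weaker inequality above (transport term
# dropped) on Omega_3 with mu uniform and a random positive f.
import numpy as np
from itertools import product

n = 3
cube = list(product((0, 1), repeat=n))
mu = {x: 2.0 ** -n for x in cube}
rng = np.random.default_rng(0)
f = {x: float(rng.uniform(0.5, 2.0)) for x in cube}

mean = sum(f[x] * mu[x] for x in cube)
ent = sum(f[x] * np.log(f[x] / mean) * mu[x] for x in cube)

def flip(x, i):                       # sigma_i: flip the i-th coordinate
    return x[:i] + (1 - x[i],) + x[i + 1:]

dirichlet = 0.5 * sum(
    max(np.log(f[x]) - np.log(f[flip(x, i)]), 0.0) ** 2 * f[x] * mu[x]
    for x in cube for i in range(n)
)
print(ent <= dirichlet, ent, dirichlet)
```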

From this, by means of the Central Limit Theorem, the above reinforced modified log-Sobolev inequality actually leads to the usual logarithmic Sobolev inequality of Gross [18] for the standard Gaussian, with the optimal constant (see Corollary 5.5).

In a different direction, we also prove that the displacement convexity along the interpolating path \(\{\nu _t^\pi \}_{t\in [0,1]}\) implies a discrete Prékopa-Leindler Inequality (Theorem 6.3), which in turn, as in the continuous setting, implies a logarithmic Sobolev inequality and a (weak) transport-entropy inequality of the Talagrand-type:

$$\begin{aligned} \widetilde{W}_2^2 (\nu |\mu ) \le C\ H(\nu |\mu ),\quad \forall \nu \, \end{aligned}$$

for a suitable constant \(C>0\).

These implications and inequalities are studied in further detail—their various links with the concentration of measure phenomenon and with other functional inequalities—in the companion paper [17].

We may summarize the various implications that we prove in the following diagram:

[Diagram: the displacement convexity property implies the HWI and discrete Prékopa-Leindler inequalities, which in turn imply (modified) log-Sobolev and weak transport-entropy (Talagrand-type) inequalities.]

In summary, our paper develops various theoretical objects of much current interest (the interpolating path \(\{\nu _t^\pi \}_{t\in [0,1]}\), the weak transport cost \(\widetilde{W}_2\), the displacement convexity property and its consequences) in a discrete context. Our concrete examples include the complete graph and the hypercube. However, our theory applies to other graphs (not necessarily product type) that we will collect in a forthcoming paper. Also, we believe that our results open a wide class of new problems and new directions of investigation in Probability Theory, Convex Geometry and Analysis.

Finally, we mention that, during the final preparation of this work, we learned that Erwan Hillion independently introduced the same kind of interpolating path, but between a Dirac mass at a fixed point \(o \in G\) of the graph and an arbitrary measure (hence without a coupling \(\pi \)), and derived a certain displacement convexity property along the interpolation [20]. In [20], the author also deals with the \(f \cdot g\) decomposition introduced by Léonard [29].

Our presentation follows the table of contents below.

1.1 Notation

Throughout the paper we shall use the following notation.

Graphs \(G=(V,E)\) will denote a finite, connected, undirected graph with vertex set \(V\) and edge set \(E\). For any two vertices \(x\) and \(y\) of \(G\), \(x \sim y\) means that \(x\) and \(y\) are nearest neighbors (for the graph structure of \(G\)), i.e. \((x,y) \in E\). We use \(d\) for the graph distance defined below.

Given two graphs \(G_1=(V_1,E_1)\), \(G_2=(V_2,E_2)\), with graph distances \(d_1\), \(d_2\) respectively, we set \(G_1\, \Box \, G_2 = (V_1 \times V_2, E_1\, \Box \, E_2)\) for the Cartesian product of the two graphs, equipped with the \(\ell ^1\) distance \(d(x,y)=d_1(x_1,y_1)+d_2(x_2,y_2)\), for all \(x=(x_1,x_2), y=(y_1,y_2) \in V_1 \times V_2\). More precisely, \(((x_1,x_2),(y_1,y_2)) \in E_1\, \Box \, E_2\) if either \(x_1=y_1\) and \(x_2 \sim y_2\), or \(x_1 \sim y_1\) and \(x_2=y_2\). The Cartesian product of \(G\) with itself will simply be denoted by \(G^2\), and more generally by \(G^n\), for all \(n\ge 2.\)
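A tiny Python sketch (ours) of this construction, with graphs given as adjacency lists:

```python
# Adjacency of the Cartesian product G1 box G2: (u, v) ~ (w, v) when
# u ~ w in G1, and (u, v) ~ (u, w) when v ~ w in G2.
def box_product(adj1, adj2):
    return {
        (u, v): [(w, v) for w in adj1[u]] + [(u, w) for w in adj2[v]]
        for u in adj1 for v in adj2
    }

P2 = {0: [1], 1: [0]}             # a single edge (two-point graph)
square = box_product(P2, P2)      # the 4-cycle, i.e. the hypercube Omega_2
print(square[(0, 0)])             # [(1, 0), (0, 1)]
```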

Paths and geodesics A path \(\gamma =(x_0,x_1,\dots ,x_n)\) (of \(G\)) is an oriented sequence of vertices of \(G\) satisfying \(x_{i-1} \sim x_i\) for any \(i=1,\dots ,n\). Such a path starts at \(x_0\) and ends at \(x_n\) and is said to be of length \(|\gamma |=n\). The graph distance \(d(x,y)\) between two vertices \(x,y \in G\) is the minimal length of a path connecting \(x\) to \(y\). Any path of length \(n=d(x,y)\) between \(x\) and \(y\) is called a geodesic between \(x\) and \(y\). By construction, any geodesic is self-avoiding. We will denote by \(\Gamma (x,y)\) the set of all geodesics from \(x\) to \(y\).

We will say that a path \(\gamma =(x_0,x_1,\ldots ,x_n)\) crosses the vertex \(z\in V\), if there is some \(k\) such that \(z=x_k\). In this case, we will write \(z\in \gamma .\) Given \(z \in V\), we set \(C(z)=\{(x,y) \hbox { such that } z \in \gamma \hbox { for some } \gamma \in \Gamma (x,y)\}\) for the set of couples such that some geodesic joining them goes through \(z\). Conversely, if \(z\) belongs to some geodesic between \(x\) and \(y\), we shall write \(z\in [\![x,y]\!]\) and say that \(z\) is between \(x\) and \(y\). Finally, for all \(x,y,z\in V\), we will denote by \(\Gamma (x,z,y)\), the set of geodesics \(\gamma \in \Gamma (x,y)\) such that \(z\in \gamma \). This set is nonempty if and only if \(z\in [\![x,y]\!]\).

Probability measures and couplings We write \(\mathcal{P }(V)\) for the set of probability measures on \(V\). Given a probability measure \(\nu \in \mathcal{P }(V)\) and a function \(f :V \rightarrow \mathbb{R }\), \(\nu (f)=\sum _{z \in V} \nu (z)f(z)\) denotes the mean value of \(f\) with respect to \(\nu \). We may also use the alternative notation \(\nu (f)=\int f(x) \,\nu (dx) = \int f(x)\,d\nu (x)=\int f\,d\nu \).

Let \(\nu ,\mu \in \mathcal P (V)\); the relative entropy of \(\nu \) with respect to \(\mu \) is defined by

$$\begin{aligned} H(\nu |\mu )= \left\{ \begin{array}{ll} \int \frac{d\nu }{d\mu } \log \frac{d\nu }{d\mu } \,d\mu &{}\quad \hbox {if } \nu \ll \mu \\ +\infty &{}\quad \hbox {otherwise} \end{array} \right. \end{aligned}$$

where \(\nu \ll \mu \) means that \(\nu \) is absolutely continuous with respect to \(\mu \), and \(\frac{d\nu }{d\mu }\) denotes the density of \(\nu \) with respect to \(\mu \). Also, the total-variation distance is defined by \(\Vert \nu - \mu \Vert _{TV}:=\sum _{x \in V} |\nu (x) - \mu (x)|\).

Given a density \(f :V \rightarrow (0,\infty )\) with respect to a given probability measure \(\mu \) (i.e. \(\mu (f)=1\)), we shall use the following notation for the relative entropy of \(f\mu \) with respect to \(\mu \):

$$\begin{aligned} \mathrm{Ent }_\mu (f) := H(f\mu |\mu ) = \int f \log f d\mu . \end{aligned}$$

If \(f:V \rightarrow (0,\infty )\) is no longer a density, then \(\mathrm{Ent }_\mu (f) := \int f \log (f/\mu (f))\,d\mu \).

Given two graphs \(G_1=(V_1,E_1)\) and \(G_2=(V_2,E_2)\) and a probability measure \(\mu \in \mathcal P (V_1 \times V_2)\) on the product, we disintegrate \(\mu \) as follows: let \(\mu ^1\) be the first marginal of \(\mu \), i.e. \(\mu ^1(x_1)=\sum _{x_2 \in V_2} \mu (x_1,x_2)=\mu (x_1,V_2)\), for all \(x_1 \in V_1\), and set \(\mu ^2(x_2|x_1)\) so that

$$\begin{aligned} \mu (x_1,x_2)=\mu ^1(x_1) \mu ^2(x_2 |x_1), \qquad \forall (x_1,x_2) \in V_1 \times V_2, \end{aligned}$$
(1.11)

with the convention that \(\mu ^2(\cdot |x_1)=\delta _{x_1}(\cdot )\) (the Dirac mass at site \(x_1\)) if \(\mu ^1(x_1)=0\). Equation (1.11) will be referred to as the disintegration formula of \(\mu \).

Recall that a coupling \(\pi \) of two probability measures \(\mu \) and \(\nu \) in \(\mathcal P (V)\) is a probability measure on \(V^2\) so that \(\mu \) and \(\nu \) are its first and second marginals, respectively: i.e. \(\pi (x,V)=\mu (x)\) and \(\pi (V,y)=\nu (y)\), for all \(x, y \in V\). Given \(\mu , \nu \in \mathcal P (V)\), the set of all couplings of \(\mu \) and \(\nu \) will be denoted by \(\Pi (\mu ,\nu )\).

Moreover, given two probability measures \(\mu \) and \(\nu \) in \(\mathcal P (V)\), we denote by \(P(\mu ,\nu )\) the set of probability kernels \(p\) such that

$$\begin{aligned} \sum _{x \in V} \mu (x)p(x,y) = \nu (y), \qquad \forall y \in V. \end{aligned}$$

By construction, given \(p \in P(\mu ,\nu )\), one defines a coupling \(\pi \in \Pi (\mu ,\nu )\) by setting \(\pi (x,y)=\mu (x) p(x,y)\), \(x,y \in V\). Conversely, given a coupling \(\pi \in \Pi (\mu ,\nu )\), we canonically construct a kernel \(p \in P(\mu ,\nu )\) by setting \(p(x,y)=\pi (x,y)/\mu (x)\) when \(\mu (x) \ne 0\) and \(p(x,y)=\delta _x(y)\) otherwise.
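A minimal Python helper (ours) making this correspondence concrete:

```python
# Coupling <-> kernel correspondence, with the Dirac convention
# p(x, .) = delta_x when mu(x) = 0 (see Warning 1 below).
import numpy as np

def coupling_to_kernel(pi, mu):
    p = np.zeros_like(pi)
    for x in range(len(mu)):
        if mu[x] > 0:
            p[x] = pi[x] / mu[x]   # p(x, y) = pi(x, y) / mu(x)
        else:
            p[x, x] = 1.0          # p(x, .) = delta_x
    return p

def kernel_to_coupling(p, mu):
    return mu[:, None] * p         # pi(x, y) = mu(x) p(x, y)
```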

Warning 1: In the sequel, it will always be understood, although not explicitly stated, that \(p(x,y)=\delta _x(y)\) if \(\mu (x)=0\) and similarly in the disintegration formula (1.11).

Warning 2: Throughout, we will use the French notation \(C_n^k := \genfrac(){0.0pt}{}{n}{k} = \frac{n!}{k!(n-k)!}\) for the binomial coefficients.

2 A notion of a path on the set of probability measures on graphs

The aim of this section is to define a class of paths \(\{\{\nu _t^\pi \}_{t \in [0,1]}, \pi \in \Pi (\nu _0,\nu _1)\}\), between probability measures \(\nu _0, \nu _1\), on graphs. As proved below, for some optimal \(\pi ^*\), the path \(\{\nu _t^{\pi ^*}\}_{t \in [0,1]}\) is a geodesic, in the space of probability measures equipped with the Wasserstein distance \(W_1\) (see below). It has the nice feature of allowing tensorization.

2.1 Construction

Inspired by [22], we will first construct an interpolating path between two Dirac measures \(\delta _x\) and \(\delta _y\), for arbitrary \(x,y \in V\), on the set of probability measures \(\mathcal{P }(V)\). Fix \(x,y \in V\) and denote by \(\Gamma \) a geodesic chosen uniformly at random in \(\Gamma (x,y)\). Also, for any \(t \in [0,1]\), let \(N_t \sim \mathcal B (d(x,y),t)\) be a binomial random variable with parameters \(d(x,y)\) and \(t\), independent of \(\Gamma \) (observe that \(N_0=0\) and \(N_1=d(x,y)\)). Then denote by \(X_t=\Gamma _{N_t}\) the random position on \(\Gamma \) after \(N_t\) jumps starting from \(x\). Finally, write \(\nu _t^{x,y}\) for the law of \(X_t\).

By construction, \(\nu _t^{x,y}\) is clearly a path from \(\delta _x\) to \(\delta _y\). Moreover, for all \(z\in V\), we have

$$\begin{aligned} \nu _t^{x,y}(z)&= \sum _{\gamma \in \Gamma (x,y)} \mathbb{P }(X_t=z|\Gamma =\gamma ) \mathbb{P }(\Gamma =\gamma )\\&= \sum _{\gamma \in \Gamma (x,y)} C_{d(x,y)}^{d(x,z)} t^{d(x,z)} (1-t)^{d(y,z)} \frac{ {1\!\!1}_{z\in \gamma }}{|\Gamma (x,y)|}. \end{aligned}$$

Therefore

$$\begin{aligned} \nu _t^{x,y}(z)= C_{d(x,y)}^{d(x,z)} t^{d(x,z)} (1-t)^{d(y,z)}\; \frac{|\Gamma (x,z,y)|}{|\Gamma (x,y)|}. \end{aligned}$$

For all \(z\) between \(x\) and \(y\) we observe that

$$\begin{aligned} |\Gamma (x,z,y)|=|\Gamma (x,z)| \times |\Gamma (z,y)|, \end{aligned}$$
(2.1)

since there is a one-to-one correspondence between the sets of geodesics from \(x\) to \(z\) and from \(z\) to \(y\), and the set of geodesics from \(x\) to \(y\) that cross the vertex \(z\), just by gluing the path from \(x\) to \(z\) to the path from \(z\) to \(y\), and by using that \(d(x,y)=d(x,z)+d(z,y)\). Therefore \(\nu _t^{x,y}\) takes the form

$$\begin{aligned} \nu _t^{x,y}(z)= C_{d(x,y)}^{d(x,z)} t^{d(x,z)} (1-t)^{d(y,z)}\; \frac{|\Gamma (x,z)| \times |\Gamma (z,y)|}{|\Gamma (x,y)|} {1\!\!1}_{z\in [\![x,y]\!]}. \end{aligned}$$
(2.2)

Observe that, for any \(x,y \in V\) and any \(t \in (0,1)\), \(\nu _t^{x,y} = \nu _{1-t}^{y,x}\).
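For small graphs, (2.2) can be evaluated directly by counting geodesics. A brute-force Python sketch (ours; the recursion is naive and meant only for toy examples):

```python
# The interpolation (2.2) on a connected graph given as an adjacency list:
# BFS distances, geodesic counting, then nu_t^{x,y}.
from collections import deque
from math import comb

def distances(adj, src):
    dist = {src: 0}
    q = deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def n_geodesics(adj, x, y):
    dist = distances(adj, x)
    def rec(u):   # number of geodesics from x to u
        return 1 if u == x else sum(rec(v) for v in adj[u] if dist[v] == dist[u] - 1)
    return rec(y)

def nu_t(adj, x, y, t):
    dx, dy = distances(adj, x), distances(adj, y)
    n, total = dx[y], n_geodesics(adj, x, y)
    out = {}
    for z in adj:
        if dx[z] + dy[z] == n:    # z lies on some geodesic from x to y
            w = comb(n, dx[z]) * t ** dx[z] * (1 - t) ** dy[z]
            out[z] = w * n_geodesics(adj, x, z) * n_geodesics(adj, z, y) / total
    return out

# The 4-cycle: two geodesics between the opposite vertices 0 and 2.
adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(nu_t(adj, 0, 2, 0.5))       # each vertex gets mass 1/4
```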

Remark 2.1

In the construction above of the interpolation \(\nu ^{x,y}_t\), the choice of the binomial random variable for the number \(N_t\) of jumps might seem somewhat ad hoc; however, in Proposition 2.5 below, we show that in fact the choice is necessary for \(\nu ^{x,y}_t\) to tensorise over a (Cartesian) product of graphs.

Given the family \(\{\nu _t^{x,y}\}_{x,y}\), we can now construct a path from any measure \(\nu _0 \in \mathcal P (V)\) to any measure \(\nu _1\in \mathcal P (V)\). Namely, given a coupling \(\pi \in \Pi (\nu _0,\nu _1)\) of \(\nu _0\) and \(\nu _1\), we define

$$\begin{aligned} \nu _t^\pi (\,\cdot \,)= \sum _{(x,y) \in V^2} \pi (x,y)\nu _t^{x,y}(\,\cdot \,), \quad \forall t \in [0,1]. \end{aligned}$$
(2.3)

By construction we have \(\nu _0^\pi =\nu _0\) and \(\nu _1^\pi =\nu _1\). Furthermore, observe that, if \(\nu _0=\delta _x\) and \(\nu _1=\delta _y\), then necessarily \(\pi = \delta _x \otimes \delta _y\) and thus \(\nu _t^\pi =\nu _t^{x,y}\).

We end Sect. 2.1 with two specific examples.

2.1.1 The complete graph \(K_n\)

Let \(K_n\) be the complete graph on \(n\) vertices. Given any two distinct vertices \(x,y \in K_n\), there exists exactly one geodesic from \(x\) to \(y\), namely \(\Gamma (x,y)=\{(x,y)\}\). Hence, by the construction of \(\nu _t^{x,y}\), we have

$$\begin{aligned} \nu _t^{x,y}(z)=0 \;\forall z \ne x,y; \quad \nu _t^{x,y}(x)= 1-t, \quad \hbox {and} \quad \nu _t^{x,y}(y)= t. \end{aligned}$$
(2.4)

Therefore, for any coupling \(\pi \) with marginals \(\nu _0\) and \(\nu _1\) (two given probability measures on \(K_n\)), we have for any \(z \in K_n\),

$$\begin{aligned} \nu _t^\pi (z)&= \sum _{(x,y) \in C(z)} \nu _t^{x,y}(z) \pi (x,y) = \sum _{y \in K_n} \nu _t^{z,y}(z) \pi (z,y) + \sum _{x \in K_n} \nu _t^{x,z}(z) \pi (x,z) \\&= (1-t) \sum _{y \in K_n} \pi (z,y) + t \sum _{x \in K_n} \pi (x,z) = (1-t) \nu _0(z) + t \nu _1(z). \end{aligned}$$

In conclusion, on the complete graph, \(\nu _t^\pi \) is a simple linear interpolation between \(\nu _0\) and \(\nu _1\) that does not depend on \(\pi \), namely \(\{\{\nu _t^\pi \}_{t \in [0,1]}, \pi \in \Pi (\nu _0,\nu _1)\} = \{\{(1-t)\nu _0 + t\nu _1\}_{t \in [0,1]}\}\).

2.1.2 The \(n\)-dimensional hypercube \(\Omega _n\)

Consider the \(n\)-dimensional hypercube \(\Omega _n=\{0,1\}^n\), whose edges consist of pairs of vertices that differ in precisely one coordinate. The graph distance here coincides with the Hamming distance:

$$\begin{aligned} d(x,y)= \sum _{i=1}^n {1\!\!1}_{x_i \ne y_i},\quad x,y \in \Omega _n. \end{aligned}$$

Then, one observes that \(|\Gamma (x,y)|=d(x,y)!\) (since, in order to move from \(x\) to \(y\) in the shortest way, one just needs to choose the order in which to flip the \(d(x,y)\) coordinates where \(x\) and \(y\) differ, i.e. the moves from \(x_i\) to \(1-x_i\)). It follows from (2.2) that, as soon as \(z\) belongs to a geodesic from \(x\) to \(y\),

$$\begin{aligned} \nu _t^{x,y}(z) = C_{d(x,y)}^{d(x,z)} t^{d(x,z)}(1-t)^{d(y,z)} \frac{d(x,z)! d(y,z)!}{d(x,y)!} = t^{d(x,z)}(1-t)^{d(y,z)}, \end{aligned}$$

and \(\nu _t^{x,y}(z)=0\) if \(z\) does not belong to a geodesic from \(x\) to \(y\).

Given two probability measures on \(\Omega _n\), and a coupling \(\pi \) on \(\Omega _n \times \Omega _n\), we can finally define

$$\begin{aligned} \nu _t^\pi (z) = \sum _{(x,y)\in \Omega _n^2} t^{d(x,z)}(1-t)^{d(y,z)} {1\!\!1}_{z\in [\![x,y]\!]} \pi (x,y). \end{aligned}$$
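This formula is immediate to implement; a toy Python snippet (ours), testing \(z\in [\![x,y]\!]\) via \(d(x,z)+d(z,y)=d(x,y)\):

```python
# nu_t^pi on Omega_n for a coupling pi given as a dict of pairs of
# 0/1-tuples; betweenness is tested with the Hamming distance.
from itertools import product

def hamming(a, b):
    return sum(ai != bi for ai, bi in zip(a, b))

def nu_t_pi(pi, n, t):
    out = {}
    for z in product((0, 1), repeat=n):
        s = 0.0
        for (x, y), w in pi.items():
            if hamming(x, z) + hamming(z, y) == hamming(x, y):  # z in [[x,y]]
                s += t ** hamming(x, z) * (1 - t) ** hamming(y, z) * w
        out[z] = s
    return out

pi = {((0, 0), (1, 1)): 1.0}      # coupling of two Dirac masses on Omega_2
print(nu_t_pi(pi, 2, 0.25))       # masses 0.5625, 0.1875, 0.1875, 0.0625
```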

2.2 Geodesics for \(W_1\)

Next we prove that, when \(\pi \) is well chosen, \((\nu _t^\pi )_{t \in [0,1]}\) is a geodesic from \(\nu _0\) to \(\nu _1\) on the set of probability measures \(\mathcal P (V)\) equipped with the Wasserstein \(L_{1}\)-distance \(W_{1}\).

Given two probability measures \(\mu \) and \(\nu \) in \(\mathcal P (V)\), recall that

$$\begin{aligned} W_1(\mu ,\nu )= \inf _{\pi \in \Pi (\mu ,\nu )} \int \int d(x,y)\,\pi (dx\,dy)=\inf _{X\sim \mu , Y\sim \nu } \mathbb{E }[d(X,Y)]. \end{aligned}$$

The following result asserts that \((\nu _t^{\pi ^*})_{t \in [0,1]}\) is actually a geodesic for \(W_1\) when \(\pi ^*\) is an optimal coupling. For simplicity we assume here that \(V\) is finite so that \(\pi ^*\) always exists (but is not necessarily unique).

Proposition 2.2

Assume that \(V\) is finite. Then, for any probability measures \(\nu _0,\nu _1 \in \mathcal P (V)\), it holds

$$\begin{aligned} W_1(\nu _s^{\pi ^*},\nu _t^{\pi ^*}) = |t-s| W_1(\nu _0,\nu _1) \qquad \forall s,t \in [0,1] \end{aligned}$$

where \(\pi ^*\) is an optimal coupling in the definition of \(W_1(\nu _0,\nu _1)\) and where \(\nu _t^{\pi ^*}\) is defined in (2.3).

Proof

Fix two probability measures \(\nu _0\), \(\nu _1 \in \mathcal P (V)\) and let \(\pi ^*\) be an optimal coupling in the definition of \(W_1(\nu _0,\nu _1)\) (since \(V\) is finite, \(\Pi (\nu _0,\nu _1)\) is compact and such a \(\pi ^*\) exists). For brevity, set \(\nu _t:= \nu _t^{\pi ^*}\).

First, we claim that it is enough to prove that

$$\begin{aligned} W_1(\nu _s,\nu _t) \le (t-s) W_1(\nu _0,\nu _1), \qquad \forall s,t \in [0,1] \hbox { with } s \le t. \end{aligned}$$
(2.5)

Indeed, assume (2.5); then, recalling that \(W_1\) is a distance (see e.g. [57]), the triangle inequality gives

$$\begin{aligned} W_1(\nu _0,\nu _1)&\le W_1(\nu _0,\nu _s) + W_1(\nu _s,\nu _t) + W_1(\nu _t,\nu _1)\\&\le s W_1(\nu _0,\nu _1)+ (t-s) W_1(\nu _0,\nu _1) + (1-t) W_1(\nu _0,\nu _1) \\&= W_1(\nu _0,\nu _1). \end{aligned}$$

Hence, all the inequalities used above are actually equalities, which guarantees the conclusion of the proposition and hence the claim.

Now, we prove (2.5). Let \((X,Y)\) be a random couple with law \(\pi ^*\). Fix \(s \le t\); it suffices to construct a random couple \((X_s, X_t)\) with marginal laws \(\nu _s\) and \(\nu _t\) so that

$$\begin{aligned} \mathbb{E }[d(X_s,X_t)]\le (t-s)\mathbb{E }[d(X,Y)]= (t-s) W_1(\nu _0,\nu _1). \end{aligned}$$

Let us remark that, combined with the first part of the proof, such a couple \((X_s, X_t)\) will in fact realize

$$\begin{aligned} \mathbb{E }[d(X_s,X_t)]=W_1(\nu _s,\nu _t). \end{aligned}$$

Let \(\bigl ((U_s^i,V_t^i)\bigr )_{i\ge 1}\) be an independent identically distributed sequence of random couples in \(\{0,1\}^2\), independent of \(X\) and \(Y\). We choose the law of \((U_s^1,V_t^1)\) given by

$$\begin{aligned} \mathbb{P }((U_s^1,V_t^1)=(0,0))=1-t,\quad \mathbb{P }((U_s^1,V_t^1)=(0,1))=t-s, \end{aligned}$$
$$\begin{aligned} \quad \mathbb{P }((U_s^1,V_t^1)=(1,0))=0,\quad \mathbb{P }((U_s^1,V_t^1)=(1,1))=s, \end{aligned}$$

so that \( U_s^1\) and \(V_t^1\) are Bernoulli random variables with respective parameters \(s\) and \(t\), and we have

$$\begin{aligned} \mathbb{E }(|U_s^1-V_t^1|)=(t-s). \end{aligned}$$

Given \((X,Y)=(x,y)\), with \(x,y\in V\), let \((N_s,N_t)\) denote the random couple defined by

$$\begin{aligned} N_s= \sum _{i=1}^{d(x,y)} U_s^i, \quad N_t= \sum _{i=1}^{d(x,y)} V_t^i. \end{aligned}$$

Then the laws of \(N_s\) and \(N_t\) given \((X,Y)=(x,y)\) are respectively \(\mathcal B (d(x,y),s)\) and \(\mathcal B (d(x,y),t)\), the binomial distributions with parameters \((d(x,y),s)\) and \((d(x,y),t)\).

Finally, given \((X,Y)=(x,y)\), with \(x,y\in V\), let \(\Gamma \) denote a random geodesic chosen uniformly in \(\Gamma (x,y)\), independently of the sequence \(\left( (U_s^i,V_t^i)\right) _{i\ge 1}\), and let \(X_s= \Gamma _{N_s}\) and \(X_t= \Gamma _{N_t}\) be the random positions on \(\Gamma \) after \(N_s\) and \(N_t\) jumps, respectively. By definition, the laws of \(X_s\) and \(X_t\) are respectively \(\nu _s\) and \(\nu _t\), and one has \(d(X_s,X_t)= |N_s-N_t|\). Moreover, according to this construction, one has

$$\begin{aligned} \mathbb{E }[ d(X_s,X_t)]&= \mathbb{E }\left[ |N_s-N_t| \right] = \mathbb{E }\left[ \left| \sum _{i=1}^{d(X,Y)} U_s^i-\sum _{i=1}^{d(X,Y)} V_t^i\right| \right] \\&\le \mathbb{E }\left[ \sum _{i=1}^{d(X,Y)} \left| U_s^i-V_t^i\right| \right] = \mathbb{E }\left[ \sum _{i=1}^{d(X,Y)} \mathbb{E }\left[ \left| U_s^i-V_t^i\right| \right] \right] \\&= (t-s) \mathbb{E }[d(X,Y)]. \end{aligned}$$

This completes the proof of (2.5) and Proposition 2.2. \(\square \)
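The coupling used in this proof is easy to simulate. A Monte Carlo sketch (ours), estimating \(\mathbb{E }|N_s-N_t|\), which for this particular coupling equals \((t-s)\,d(x,y)\):

```python
# Sample the pairs (U_s^i, V_t^i) with the joint law used in the proof
# and estimate E|N_s - N_t| for a fixed distance d = d(x, y).
import random

def sample_UV(s, t):
    u = random.random()
    if u < 1 - t:
        return (0, 0)              # probability 1 - t
    if u < 1 - s:
        return (0, 1)              # probability t - s
    return (1, 1)                  # probability s

def estimate(d, s, t, n_samples=200_000):
    acc = 0
    for _ in range(n_samples):
        Ns = Nt = 0
        for _ in range(d):
            u, v = sample_UV(s, t)
            Ns += u
            Nt += v
        acc += abs(Ns - Nt)
    return acc / n_samples

print(estimate(d=5, s=0.3, t=0.7))  # close to (0.7 - 0.3) * 5 = 2.0
```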

2.3 Tensoring property

In this section we prove that the path \((\nu _t^{x,y})_{t\in [0,1]}\) constructed in Sect. 2.1 does tensorise. This will be crucial in deriving the displacement convexity of the entropy on product spaces. Moreover we shall prove that, in order to have this tensoring property, the law of the random variable \(N_t\) introduced in the construction of the path \((\nu _t^{x,y})_{t\in [0,1]}\), must be, modulo a change of time, a binomial (see Proposition 2.5 below). The tensoring property of the path \((\nu _t^{x,y})_{t\in [0,1]}\) is the following.

Lemma 2.3

Let \(G_1=(V_1,E_1)\), \(G_2=(V_2,E_2)\) be two graphs and let \(G=G_1\, \Box \, G_2\) be their Cartesian product. Then, for any \(x=(x_1,x_2)\), \(y=(y_1,y_2)\) and \(z=(z_1,z_2)\) in \(V_1 \times V_2\),

$$\begin{aligned} \nu ^{x,y}_t(z)=\nu ^{x_1,y_1}_t(z_1)\nu ^{x_2,y_2}_t(z_2). \end{aligned}$$

Proof

Fix \(x=(x_1,x_2)\), \(y=(y_1,y_2)\) and \(z=(z_1,z_2)\) in \(V_1 \times V_2\). Then, we observe that, given two geodesics, one from \(x_1\) to \(y_1\), and one from \(x_2\) to \(y_2\), one can construct exactly \(C_{d(x,y)}^{d(x_1,y_1)}\) different geodesics from \(x\) to \(y\) (by choosing the \(d(x_1,y_1)\) positions where to change the first coordinate, according to the geodesic joining \(x_1\) to \(y_1\), and thus changing the second coordinate in the remaining \(d(x_2,y_2)=d(x,y)-d(x_1,y_1)\) positions, according to the geodesic joining \(x_2\) to \(y_2\)). This construction exhausts all the geodesics from \(x\) to \(y\). Hence,

$$\begin{aligned} |\Gamma (x,y)| = C_{d(x,y)}^{d(x_1,y_1)} |\Gamma (x_1,y_1)| \times |\Gamma (x_2,y_2)|. \end{aligned}$$
(2.6)

Observe also that \(z\) belongs to some geodesic from \(x\) to \(y\) if and only if \(z_1\) and \(z_2\) belong respectively to some geodesic from \(x_1\) to \(y_1\), and from \(x_2\) to \(y_2\). Therefore, by (2.1), it follows that

$$\begin{aligned} |\Gamma (x,z,y)| = C_{d(x,z)}^{d(x_1,z_1)}C_{d(z,y)}^{d(z_1,y_1)}|\Gamma (x_1,z_1,y_1)| \times |\Gamma (x_2,z_2,y_2)|. \end{aligned}$$

So, it holds that

$$\begin{aligned} \nu _t^{x,y}(z)&= C_{d(x,y)}^{d(x,z)} t^{d(x,z)} (1-t)^{d(y,z)}\; \frac{|\Gamma (x,z,y)| }{|\Gamma (x,y)|} \\&= \frac{C_{d(x,y)}^{d(x,z)} C_{d(x,z)}^{d(x_1,z_1)} C_{d(y,z)}^{d(y_1,z_1)} }{C_{d(x,y)}^{d(x_1,y_1)}} t^{d(x_1,z_1)} (1-t)^{d(y_1,z_1)} \frac{|\Gamma (x_1,z_1,y_1)| }{|\Gamma (x_1,y_1)|} t^{d(x_2,z_2)}\\&\quad \times (1-t)^{d(y_2,z_2)} \frac{|\Gamma (x_2,z_2,y_2)|}{|\Gamma (x_2,y_2)|} \\&= \nu ^{x_1,y_1}_t(z_1)\nu ^{x_2,y_2}_t(z_2), \end{aligned}$$

where we used that \(d(x,z)=d(x_1,z_1)+d(x_2,z_2)\), and similarly for \(d(y,z)\), and the fact (that the reader can easily verify) that

$$\begin{aligned} \frac{C_{d(x,y)}^{d(x,z)} C_{d(x,z)}^{d(x_1,z_1)} C_{d(y,z)}^{d(y_1,z_1)} }{C_{d(x,y)}^{d(x_1,y_1)}} = C_{d(x_1,y_1)}^{d(x_1,z_1)} C_{d(x_2,y_2)}^{d(x_2,z_2)}. \end{aligned}$$

\(\square \)
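For completeness, a quick numerical check (ours) of the binomial identity invoked at the end of the proof, written with \(n=d(x,y)\), \(k=d(x,z)\), \(n_1=d(x_1,y_1)\), \(k_1=d(x_1,z_1)\):

```python
# Check C(n,k) C(k,k1) C(n-k,n1-k1) == C(n,n1) C(n1,k1) C(n-n1,k-k1)
# over a range of admissible parameters.
from math import comb

ok = all(
    comb(n, k) * comb(k, k1) * comb(n - k, n1 - k1)
    == comb(n, n1) * comb(n1, k1) * comb(n - n1, k - k1)
    for n in range(8) for k in range(n + 1)
    for n1 in range(n + 1) for k1 in range(min(k, n1) + 1)
    if n1 - k1 <= n - k
)
print(ok)  # True
```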

Remark 2.4

(The hypercube \(\Omega _n\)) In the case of the hypercube \(\Omega _n\), using the tensoring property, one can recover that \(\nu _t^{x,y}(z)=t^{d(x,z)}(1-t)^{d(y,z)}\) as soon as \(z\) belongs to a geodesic from \(x\) to \(y\), and \(0\) otherwise. Indeed, observe that Eq. (2.4) can be rewritten for the two-point space as follows, for all coordinates:

$$\begin{aligned} \nu _t^{x_i,y_i}(z_i)= {1\!\!1}_{\{x_i,y_i\}}(z_i) t^{d(x_i,z_i)}(1-t)^{d(y_i,z_i)}. \end{aligned}$$

Hence, by Lemma 2.3,

$$\begin{aligned} \nu _t^{x,y}(z) = \prod _{i=1}^n \nu _t^{x_i,y_i}(z_i) = t^{d(x,z)}(1-t)^{d(y,z)}, \end{aligned}$$

as soon as \(z\) belongs to a geodesic from \(x\) to \(y\), and \(0\) otherwise, which proves the claim.

Proposition 2.5

In the construction of \(\nu _t^{x,y}\), \(t \in [0,1]\), use, instead of the binomial, a general random variable \(N_t^{d(x,y)} \in \{0,1,\dots ,d(x,y)\}\), with parameters \(d(x,y)\) and \(t\), satisfying almost surely \(N_0^{d(x,y)} = 0\) and \(N_1^{d(x,y)} = d(x,y)\) (this condition ensures that \(\nu _0^{x,y}=\delta _x\) and \(\nu _1^{x,y}=\delta _y\), namely that \(\nu _t^{x,y}\) is still an interpolation between the two Dirac measures), so that

$$\begin{aligned} \nu _t^{x,y}(z)= \mathbb{P }\left( N_t^{d(x,y)} = d(x,z) \right) \frac{| \Gamma (x,z,y)|}{| \Gamma (x,y)|}. \end{aligned}$$

Let \(G_1=(V_1,E_1)\), \(G_2=(V_2,E_2)\) be two graphs and let \(G=G_1\, \Box \, G_2\) be their Cartesian product. Assume that for any \(x=(x_1,x_2)\), \(y=(y_1,y_2)\) and \(z=(z_1,z_2)\) in \(V_1 \times V_2\),

$$\begin{aligned} \nu ^{x,y}_t(z)=\nu ^{x_1,y_1}_t(z_1)\nu ^{x_2,y_2}_t(z_2) \qquad \forall t \in [0,1]. \end{aligned}$$

Then, there exists a function \(a :[0,1] \rightarrow [0,1]\) with \(a(0)=0\), \(a(1)=1\), such that \(N_t^{d(x,y)} \sim \mathcal B (d(x,y),a(t))\).

Proof

Following the proof of Lemma 2.3 we have,

$$\begin{aligned} \nu _t^{x,y}(z)&= \mathbb{P }\left( N_t^{d(x,y)} = d(x,z) \right) \frac{|\Gamma (x,z,y) |}{| \Gamma (x,y)|} \\&= \frac{C_{d(x,z)}^{d(x_1,z_1)} C_{d(y,z)}^{d(y_1,z_1)} }{C_{d(x,y)}^{d(x_1,y_1)}} \mathbb{P }\left( N_t^{d(x,y)} = d(x,z) \right) \; \frac{| \Gamma (x_1,z_1,y_1) |}{| \Gamma (x_1,y_1)|} \; \frac{| \Gamma (x_2,z_2,y_2) |}{| \Gamma (x_2,y_2)|} \,. \end{aligned}$$

On the other hand,

$$\begin{aligned} \nu ^{x_1,y_1}_t(z_1) = \mathbb{P }\left( N_t^{d(x_1,y_1)} = d(x_1,z_1) \right) \frac{| \Gamma (x_1,z_1,y_1) |}{| \Gamma (x_1,y_1)|} \end{aligned}$$

and

$$\begin{aligned} \nu ^{x_2,y_2}_t(z_2) = \mathbb{P }\left( N_t^{d(x_2,y_2)} = d(x_2,z_2) \right) \frac{| \Gamma (x_2,z_2,y_2)| }{| \Gamma (x_2,y_2)|}. \end{aligned}$$

Hence, the identity \(\nu ^{x,y}_t(z)=\nu ^{x_1,y_1}_t(z_1)\nu ^{x_2,y_2}_t(z_2)\) ensures that

$$\begin{aligned} \frac{C_{d(x,z)}^{d(x_1,z_1)} C_{d(y,z)}^{d(y_1,z_1)} }{C_{d(x,y)}^{d(x_1,y_1)}} \mathbb{P }\left( N_t^{d(x,y)} = d(x,z) \right) = \mathbb{P }\left( N_t^{d(x_1,y_1)} = d(x_1,z_1) \right) \mathbb{P }\left( N_t^{d(x_2,y_2)} = d(x_2,z_2) \right) \end{aligned}$$

for any \(z_1\in [\![x_1,y_1]\!]\), \(z_2\in [\![x_2,y_2]\!]\).

Now, observe that

$$\begin{aligned} \frac{C_{d(x,z)}^{d(x_1,z_1)} C_{d(y,z)}^{d(y_1,z_1)} }{C_{d(x,y)}^{d(x_1,y_1)}} = \frac{ C_{d(x_1,y_1)}^{d(x_1,z_1)} C_{d(x_2,y_2)}^{d(x_2,z_2)}}{C_{d(x,y)}^{d(x,z)}}. \end{aligned}$$

Hence, the latter can be rewritten as

$$\begin{aligned} \frac{\mathbb{P }\left( N_t^{d(x,y)} = d(x,z) \right) }{C_{d(x,y)}^{d(x,z)}} = \frac{\mathbb{P }\left( N_t^{d(x_1,y_1)} = d(x_1,z_1) \right) }{C_{d(x_1,y_1)}^{d(x_1,z_1)}} \times \frac{ \mathbb{P }\left( N_t^{d(x_2,y_2)} = d(x_2,z_2) \right) }{C_{d(x_2,y_2)}^{d(x_2,z_2)}}. \end{aligned}$$

Set, for simplicity, for any \(n,k\) with \(0 \le k \le n\) and any \(t \in [0,1]\)

$$\begin{aligned} p_{n,k}(t) := \frac{\mathbb{P }\left( N_t^{n} = k \right) }{C_n^k}. \end{aligned}$$

When there is no confusion, we will drop the dependence on \(t\), using the simpler notation \(p_{n,k}\). We end up with the following induction formula

$$\begin{aligned} p_{n,k} = p_{n_1,k_1} \cdot p_{n-n_1,k-k_1} \end{aligned}$$
(2.7)

for any integers \(k_1,n_1, k,n\) satisfying the following conditions

$$\begin{aligned} k,n_1 \le n, \qquad k_1 \le \min (k,n_1), \quad \hbox {and} \quad n_1-k_1 \le n-k. \end{aligned}$$

(We set \(n=d(x,y)\), \(n_1=d(x_1,y_1)\), \(k=d(x,z)\) and \(k_1=d(x_1,z_1)\).)

Observe that, since by assumption \(N_t^{d(x,y)} \in \{0,1,\dots ,d(x,y)\}\), necessarily, when \(x=y\), \(N_t^0 = 0\) (deterministically) for any \(t\). Hence \(p_{0,0}=1\).

The special choice \(n_1=1\), \(k_1=0\) in (2.7) leads to

$$\begin{aligned} p_{n,k} = p_{1,0} \cdot p_{n-1,k}. \end{aligned}$$
(2.8)

Set \(b=b(t)=p_{1,0}(t)\) (that might be \(0\)). From (2.8) we deduce that

$$\begin{aligned} p_{n,k} = b^{n-k} p_{k,k}. \end{aligned}$$

Finally, the special choice \(n=k\), \(n_1=k_1=k-1\), in (2.7), ensures that

$$\begin{aligned} p_{k,k} = p_{k-1,k-1} \cdot p_{1,1}. \end{aligned}$$

Since \(p_{1,0}+p_{1,1} = 1\), the latter reads as

$$\begin{aligned} p_{k,k} = p_{1,1}^k = (1-b)^k. \end{aligned}$$

It follows that

$$\begin{aligned} p_{n,k} = b^{n-k} (1-b)^k \qquad \forall n, \; \forall k \le n. \end{aligned}$$

Now set \(a(t)=1-b(t)\) to end up with

$$\begin{aligned} \mathbb{P }\left( N_t^{n} = k \right) = C_n^k a^k (1-a)^{n-k} \,, \end{aligned}$$

which guarantees that \(N_t^{d(x,y)}\) is indeed a binomial random variable with parameters \(d(x,y)\) and \(a(t)\).

To end the proof, it suffices to observe that \(N_0^{d(x,y)}=0\) implies \(a(0)=0\), and that \(N_1^{d(x,y)}=d(x,y)\) implies \(a(1)=1\). \(\square \)
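One can also verify the induction (2.7) directly for the binomial: with \(p_{n,k}(t) = \mathbb{P }(N_t^n=k)/C_n^k = t^k(1-t)^{n-k}\), a toy check (ours):

```python
# p_{n,k}(t) = t^k (1-t)^(n-k) satisfies p_{n,k} = p_{n1,k1} p_{n-n1,k-k1}.
t = 0.37
p = lambda n, k: t ** k * (1 - t) ** (n - k)
ok = all(
    abs(p(n, k) - p(n1, k1) * p(n - n1, k - k1)) < 1e-12
    for n in range(8) for k in range(n + 1)
    for n1 in range(n + 1) for k1 in range(min(k, n1) + 1)
    if n1 - k1 <= n - k
)
print(ok)  # True
```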

3 Weak transport cost

In this section we recall the notion of a discrete Wasserstein-type distance, called a weak transport cost, introduced and studied in [33, 52] and developed further in [17], and collect some useful facts from [17]. We also recall the notion of a Knothe-Rosenblatt coupling, which will play a crucial role in proving the displacement convexity property of the entropy on product spaces.

3.1 Definition and first properties

For the notion of a weak transport cost, first recall the definition of \(P(\nu _0,\nu _1)\) introduced in Sect. 1.1.

Definition 3.1

Let \(\nu _0, \nu _1 \in \mathcal P (V)\). Then, the weak transport cost \(\widetilde{\mathcal{T }}_{2}(\nu _1 |\nu _0 )\) between \(\nu _0\) and \(\nu _1\) is defined as

$$\begin{aligned} \widetilde{\mathcal{T }}_{2}(\nu _1 |\nu _0 ):=\inf _{p \in P(\nu _0,\nu _1)} \sum _{x \in V} \left( \sum _{y \in V} d(x,y) p(x,y) \right) ^2 \nu _0(x). \end{aligned}$$

It can be shown that

$$\begin{aligned} (\nu _{0},\nu _{1})\mapsto \sqrt{\widetilde{\mathcal{T }}_2(\nu _{1}|\nu _{0})} +\sqrt{\widetilde{\mathcal{T }}_2(\nu _{0}|\nu _{1})} \end{aligned}$$

is a distance on \(\mathcal P (V)\), see [17].

Recall the definition of \(I_2(\pi ), \bar{I}_2(\pi )\) and \(J_2(\pi )\) from (1.4) and (1.5) in the introduction, and observe that

$$\begin{aligned} \widetilde{\mathcal{T }}_{2}(\nu _{1}|\nu _{0})=\inf _{\pi \in \Pi (\nu _{0},\nu _{1})}I_{2}(\pi ). \end{aligned}$$

Also, define

$$\begin{aligned} \hat{\mathcal{T }}_{2}(\nu _0,\nu _1 ) :=\inf _{\pi \in \Pi (\nu _{0},\nu _{1})}J_{2}(\pi ), \end{aligned}$$

and observe that \(\hat{\mathcal{T }}_{2}(\nu _0,\nu _1 )=W_1^2(\nu _{0},\nu _{1})\) where \(W_1\) is the usual \(L_1\)-Wasserstein distance associated to the distance \(d\).

When \(d\) is the Hamming distance \(d(x,y)={1\!\!1}_{x \ne y}\), \(x,y \in V\), the weak transport cost and the \(L_1\)-Wasserstein distance take an explicit form. This is stated in the next lemma. We give the proof for completeness.

Lemma 3.2

([17]) Let \(\nu _0, \nu _1,\mu \in \mathcal P (V)\) and assume that \(\mu \) charges all the points. Denote by \(f_0\) and \(f_1\) the relative densities of \(\nu _0\) and \(\nu _1\) with respect to \(\mu \). Assume that \(d(x,y)={1\!\!1}_{x \ne y}\), \(x,y \in V\). Then it holds

$$\begin{aligned} \widetilde{\mathcal{T }}_2(\nu _1 |\nu _0 ) = \int \limits _{\{f_0>0\}} \left[ 1-\frac{f_1}{f_0} \right] _+^2 f_0\,d\mu \end{aligned}$$

where \([X]_+=\max (X,0)\), and

$$\begin{aligned} \sqrt{\hat{\mathcal{T }}_{2}(\nu _0,\nu _1 )}=\int \left[ f_0-f_1 \right] _+ \,d\mu =\frac{1}{2} \int \left| f_0-f_1 \right| \,d\mu = \frac{1}{2} \Vert \nu _0-\nu _1\Vert _{TV} \end{aligned}$$

with \(\Vert \cdot \Vert _{TV}\), the total variation norm.

Remark 3.3

Observe that \(\widetilde{\mathcal{T }}_2(\nu _1 |\nu _0 )\) does not depend on \(\mu \).

Proof

For any \(\pi \in \Pi (\nu _{0},\nu _{1})\) and any \(x\in V\) with \(\nu _0(x)>0\), one has

$$\begin{aligned} 1- \sum _{y \in V} d(x,y)p(x,y)=\frac{\pi (x,x)}{\nu _0(x)}\le \frac{\min (\nu _0(x),\nu _1(x))}{\nu _0(x)}=\min \left( \frac{f_1(x)}{f_0(x)},1\right) , \end{aligned}$$

and therefore

$$\begin{aligned} \left[ 1-\frac{f_1(x)}{f_0(x)} \right] _+\le \sum _{y \in V} d(x,y)p(x,y). \end{aligned}$$

By integrating with respect to the measure \(\nu _0\) and then optimizing over all \(\pi \in \Pi (\nu _{0},\nu _{1})\), it follows that

$$\begin{aligned} \int \left[ f_0-f_1 \right] _+ \,d\mu \le \sqrt{\hat{\mathcal{T }}_{2}(\nu _0,\nu _1 )}, \end{aligned}$$

and

$$\begin{aligned} \int \limits _{\{f_0>0\}} \left[ 1-\frac{f_1}{f_0} \right] _+^2 f_0\,d\mu \le \widetilde{\mathcal{T }}_2(\nu _1 |\nu _0 ). \end{aligned}$$

Equality is attained by choosing \(\pi ^* \in \Pi (\nu _{0},\nu _{1})\) defined by

$$\begin{aligned} \pi ^*(x,y) = \nu _0(x)p^*(x,y)&= {1\!\!1}_{x= y}\min (\nu _0(x),\nu _1(x))\nonumber \\&+\,{1\!\!1}_{x\ne y}\frac{[\nu _0(x)-\nu _1(x)]_+[\nu _1(y)-\nu _0(y)]_+}{ \sum _{z \in V} [\nu _1(z)-\nu _0(z)]_+},\qquad \end{aligned}$$
(3.1)

since \(\sum _{y \in V} d(x,y)p^*(x,y)=\left[ 1-\frac{f_1(x)}{f_0(x)} \right] _+.\) \(\square \)
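As an illustration of Lemma 3.2 (our own toy computation; the three-point data are ours), one can compare the closed form with \(I_2(\pi ^*)\) for the explicit coupling (3.1):

```python
# Weak transport cost for the Hamming distance: closed form of Lemma 3.2
# versus I_2 of the explicit optimal coupling (3.1).
import numpy as np

nu0 = np.array([0.5, 0.3, 0.2])
nu1 = np.array([0.2, 0.2, 0.6])

# Closed form: sum_x [1 - nu1(x)/nu0(x)]_+^2 nu0(x)  (mu cancels, Remark 3.3).
closed = sum(max(1 - nu1[x] / nu0[x], 0.0) ** 2 * nu0[x] for x in range(3))

# The coupling (3.1).
pos = np.maximum(nu1 - nu0, 0.0)
pi = np.diag(np.minimum(nu0, nu1))
for x in range(3):
    for y in range(3):
        if x != y:
            pi[x, y] = max(nu0[x] - nu1[x], 0.0) * pos[y] / pos.sum()

D = 1.0 - np.eye(3)                    # Hamming distance matrix
I2_star = sum((D[x] @ (pi[x] / nu0[x])) ** 2 * nu0[x] for x in range(3))
print(closed, I2_star)                 # both equal 0.2133...
```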

3.2 The Knothe-Rosenblatt coupling

In this subsection, we recall a general method, due to Knothe and Rosenblatt [24, 47], which enables one to construct couplings between probability measures on product spaces.

Consider two graphs \(G_1=(V_1,E_1)\) and \(G_2=(V_2,E_2)\) and two probability measures \(\nu _0,\nu _1 \in \mathcal P (V_1 \times V_2)\). The disintegration formulas of \(\nu _0, \nu _1\) (recall (1.11)) read

$$\begin{aligned} \nu _0(x_1,x_2)=\nu _0^1(x_1)\nu _0^2(x_2|x_1) \qquad \hbox {and} \qquad \nu _1(y_1,y_2)=\nu _1^1(y_1)\nu _1^2(y_2|y_1). \end{aligned}$$
(3.2)

Let \(\pi ^1 \in \mathcal P ( V_1^2)\) be a coupling of \(\nu _0^1\), \(\nu _1^1\), and for all \((x_{1},y_{1}) \in V_{1}^2\) let \(\pi ^2(\,\cdot \,|x_1,y_1) \in \mathcal P (V_2^2)\) be a coupling of \(\nu _0^2(\,\cdot \, | x_1)\) and \(\nu _1^2(\,\cdot \, | y_1)\), \(x_1, y_1 \in V_1\). We are now in a position to define the Knothe-Rosenblatt coupling.

Definition 3.4

(Knothe-Rosenblatt coupling) Let \(\nu _0,\nu _1 \in \mathcal P (V_1 \times V_2)\), and consider a family of couplings \(\pi ^1, \{\pi ^2(\,\cdot \,|x_{1},y_{1})\}_{x_{1},y_{1}}\) as above; the coupling \(\hat{\pi }\in \mathcal P ([V_1 \times V_2]^2)\), defined by

$$\begin{aligned} \hat{\pi }((x_1,x_2),(y_1,y_2)) := \pi ^1(x_1,y_1) \pi ^2(x_2,y_2|x_1,y_1), \quad (x_1,x_2),(y_1,y_2) \in V_1 \times V_2 \end{aligned}$$

is called the Knothe-Rosenblatt coupling of \(\nu _0, \nu _1\) associated with the family of couplings

$$\begin{aligned} \left\{ \pi ^1, \{\pi ^2(\, \cdot \, |x_{1}, y_{1}) \}_{x_{1}, y_{1}} \right\} . \end{aligned}$$

It is easy to check that the Knothe-Rosenblatt coupling is indeed a coupling of \(\nu _0, \nu _1\). Note that it is usually required that the couplings \(\pi ^1,\{\pi ^2(\,\cdot \,|x_{1},y_{1})\}_{x_{1},y_{1}}\) are optimal for some weak transport cost, but we will not make this assumption in what follows.

The preceding construction can easily be generalized to products of \(n\) graphs. Consider \(n\) graphs \(G_1=(V_1,E_1), \dots , G_n=(V_n,E_n)\), and two probability measures \(\nu _0, \nu _1 \in \mathcal P (V_1 \times \cdots \times V_n)\) admitting the following disintegration formulas: for all \(x=(x_1,\dots ,x_n), y=(y_1,\dots ,y_n) \in V_1 \times \dots \times V_n\),

$$\begin{aligned} \nu _0(x)&= \nu _{0}^1(x_1) \nu _{0}^2(x_{2}|x_1)\nu _{0}^{3}(x_{3}|x_{1},x_{2}) \cdots \nu _{0}^{n}(x_n|x_1,\ldots ,x_{n-1}),\\ \nu _1(y)&= \nu _{1}^1(y_1) \nu _{1}^{2}(y_{2}|y_1)\nu _{1}^{3}(y_{3}|y_{1},y_{2}) \cdots \nu _{1}^{n}(y_n|y_1,\ldots ,y_{n-1}). \end{aligned}$$

For all \(i=1, \ldots , n\), let \(\pi ^i(\,\cdot \, | x_{1},\ldots ,x_{i-1},y_{1},\ldots ,y_{i-1}) \in \mathcal P (V_i^2)\) be a coupling of \(\nu _{0}^{i}(\,\cdot \,|x_{1},\dots ,x_{i-1})\) and \(\nu _{1}^{i}(\,\cdot \,|y_{1},\dots ,y_{i-1})\). The Knothe-Rosenblatt coupling \(\hat{\pi }\in \mathcal P ([V_1 \times \dots \times V_n]^2)\) between \(\nu _{0}\) and \(\nu _{1}\) is then defined by

$$\begin{aligned} \hat{\pi }(x,y) = \pi ^1(x_1,y_1) \pi ^{2}(x_{2},y_{2}|x_1,y_1) \cdots \pi ^{n}(x_{n},y_{n}|x_1,\dots ,x_{n-1},y_1,\dots ,y_{n-1}), \end{aligned}$$

for all \(x=(x_{1},x_{2},\ldots ,x_{n})\) and \(y=(y_{1},y_{2},\ldots ,y_{n}).\)
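A small Python sketch (ours) of this construction for two coordinates, using independent couplings at every stage (Definition 3.4 does not require optimality):

```python
# Knothe-Rosenblatt coupling on V1 x V2: glue a coupling of the first
# marginals with couplings of the conditional laws.
import numpy as np

def disintegrate(nu):
    """First marginal and conditional laws of nu on V1 x V2; assumes every
    slice has positive mass (otherwise use the Dirac convention)."""
    m = nu.sum(axis=1)
    return m, nu / m[:, None]

def knothe_rosenblatt(nu0, nu1):
    m0, c0 = disintegrate(nu0)
    m1, c1 = disintegrate(nu1)
    pi1 = np.outer(m0, m1)                  # independent coupling of marginals
    hat = np.zeros(nu0.shape + nu1.shape)   # indexed by (x1, x2, y1, y2)
    for x1 in range(nu0.shape[0]):
        for y1 in range(nu1.shape[0]):
            pi2 = np.outer(c0[x1], c1[y1])  # coupling of the conditional laws
            hat[x1, :, y1, :] = pi1[x1, y1] * pi2
    return hat

nu0 = np.array([[0.1, 0.4], [0.3, 0.2]])
nu1 = np.array([[0.25, 0.25], [0.25, 0.25]])
hat = knothe_rosenblatt(nu0, nu1)
print(np.allclose(hat.sum(axis=(2, 3)), nu0),
      np.allclose(hat.sum(axis=(0, 1)), nu1))   # True True
```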

3.3 Tensorization

Another useful property of the weak transport cost defined above is that it tensorises in the following sense. For \(1\le i\le n\), let \(G_i=(V_i,E_i)\) be a graph with the associated distance \(d_i\). Recall the definition of \(I_2^{(n)}, \bar{I}_2^{(n)}\) and \(J_2^{(n)}\) given in (1.6), (1.7) and (1.8). Then, given two probability measures \(\nu _0, \nu _1\) in \(\mathcal P (V_1 \times \cdots \times V_{n})\), define

$$\begin{aligned} \widetilde{\mathcal{T }}_{2}^{(n)}(\nu _1 |\nu _0 ):= \inf _{\pi \in \Pi (\nu _0,\nu _1)}I_2^{(n)}(\pi ) \end{aligned}$$

and

$$\begin{aligned} \hat{\mathcal{T }}_{2}^{(n)}(\nu _0,\nu _1 ):= \inf _{\pi \in \Pi (\nu _0,\nu _1)} J_2^{(n)}(\pi ). \end{aligned}$$

Using the notation of Sect. 3.2 above, we can state the result.

Proposition 3.5

Let \(\nu _0, \nu _1\) be in \(\mathcal P (V_1 \times \cdots \times V_n)\), and consider a family of couplings \(\pi ^1 \in \Pi (\nu _{0}^1,\nu _{1}^1)\) and \(\pi ^i(\,\cdot \,| x_{1},\ldots ,x_{i-1},y_{1},\ldots ,y_{i-1}) \in \Pi (\nu _{0}^i(\,\cdot \,| x_{1},\ldots ,x_{i-1}), \nu _{1}^i(\,\cdot \,| y_{1},\ldots ,y_{i-1}))\) for all \(i\in \{2,\ldots ,n\}\) and \((x_{1},\ldots ,x_{n}),(y_{1},\ldots ,y_{n}) \in V_{1}\times \cdots \times V_{n}\), as above. Then,

$$\begin{aligned} I_2^{(n)}(\hat{\pi }) \le I_2(\pi ^1 ) + \sum _{i=2}^{n}\sum _{x,y \in V_1\times \cdots \times V_{n}} {\hat{\pi }}(x,y) I_2(\pi ^i(\,\cdot \,|x_{1},\ldots ,x_{i-1},y_{1},\ldots ,y_{i-1})), \end{aligned}$$

where \(\hat{\pi }\) is the Knothe-Rosenblatt coupling of \(\nu _0\) and \(\nu _1\) associated with the family of couplings above. The same holds for \(\bar{I}_{2}^{(n)}\) and \( J_2^{(n)}(\pi )\).

In particular, if the couplings \(\pi ^1\) and \(\pi ^i(\,\cdot \,| x_{1},\ldots ,x_{i-1},y_{1},\ldots ,y_{i-1})\) are assumed to achieve the infimum in the definition of the weak transport costs between \(\nu _{0}^1\) and \(\nu _{1}^1\), and between \(\nu _{0}^i(\,\cdot \,| x_{1},\ldots ,x_{i-1})\) and \(\nu _{1}^i(\,\cdot \,| y_{1},\ldots ,y_{i-1})\) for all \(i\in \{2,\ldots ,n\}\), respectively, we immediately get the following tensorization for \(\widetilde{\mathcal{T }}_{2}\):

$$\begin{aligned} \widetilde{\mathcal{T }}_{2}^{(n)}(\nu _{1}|\nu _{0}) \le \widetilde{\mathcal{T }}_{2}(\nu _{1}^1|\nu _{0}^1) + \sum _{i=2}^{n}\sum _{\genfrac{}{}{0.0pt}{}{x,y \in }{V_1\times \cdots \times V_{n}}} {\hat{\pi }}(x,y) \widetilde{\mathcal{T }}_{2}(\nu _{1}^i(\cdot | y_{1},\ldots ,y_{i-1}) | \nu _{0}^i(\cdot | x_{1},\ldots ,x_{i-1})). \end{aligned}$$
(3.3)

In an obvious way, the same kind of conclusion holds replacing \(\widetilde{\mathcal{T }}_{2}\) by \(\hat{\mathcal{T }}_{2}\).

Proof

In this proof, we will use the following shorthand notation: if \(x\in V\) and if \(1\le k\le n\), we will denote by \(x_{1 : k}\) the subvector \((x_{1},x_{2},\ldots ,x_{k})\in V_1\times \cdots \times V_k.\)

Define the kernels \(\hat{p}(\,\cdot \,,\,\cdot \,)\), \(p^1(\,\cdot \,,\,\cdot \,)\) and \(p^{k}(\,\cdot \,,\,\cdot \, | x_{1 : k-1}, y_{1:k-1})\) by the formulas

$$\begin{aligned} \hat{\pi }(x,y)&= \hat{p}(x,y)\nu _{0}(x)\\ \pi ^1(x_{1},y_{1})&= p^1(x_{1},y_{1})\nu _{0}^1(x_{1}),\\ \pi ^k(x_{k},y_{k} | x_{1:k-1}, y_{1:k-1})&= p^k(x_{k},y_{k} | x_{1:k-1},y_{1:k-1})\nu _{0}^k(x_{k}|x_{1:k-1}),\quad \forall 1<k\le n. \end{aligned}$$

By the definition of the Knothe-Rosenblatt coupling \(\hat{\pi }\), it holds

$$\begin{aligned} \hat{p}(x,y) = \prod _{k=2}^{n} p^k(x_{k},y_{k} | x_{1:k-1},y_{1:k-1})\times p^1(x_{1},y_{1}). \end{aligned}$$

As a result, for all \(i\in \{2,\ldots ,n\}\),

$$\begin{aligned} \left( \sum _{y } d_i(x_i,y_i) \hat{p}(x,y) \right) ^2&= \left( \sum _{y_{1:i}} d_i(x_i,y_i) \prod _{k=2}^{i} p^k(x_{k},y_{k} | x_{1:k-1},y_{1:k-1})p^1(x_{1},y_{1})\right) ^2\\&\le \sum _{y_{1:i-1}} \prod _{k=2}^{i-1} p^k(x_{k},y_{k} | x_{1:k-1}, y_{1:k-1})p^1(x_{1},y_{1})\\&\quad \times \left( \sum _{y_{i}} d_i(x_i,y_i) p^i(x_{i},y_{i} | x_{1:i-1},y_{1:i-1})\right) ^2 \end{aligned}$$

where the inequality comes from Jensen’s inequality. Therefore,

$$\begin{aligned}&\sum _{x} \left( \sum _{y} d_i(x_i,y_i) \hat{p}(x,y) \right) ^2\nu _{0}(x) \\&\le \sum _{x_{1:i-1}}\sum _{y_{1:i-1}} \prod _{k=2}^{i-1} \pi ^k(x_{k},y_{k} | x_{1:k-1},y_{1:k-1})\pi ^1(x_{1},y_{1}) \sum _{x_{i}}\nu _{0}^{i}(x_{i}|x_{1:i-1})\\&\quad \times \left( \sum _{y_{i}} d_i(x_i,y_i) p^i(x_{i},y_{i} |x_{1:i-1},y_{1:i-1})\right) ^2\\&= \sum _{x_{1:i-1}}\sum _{y_{1:i-1}} \prod _{k=2}^{i-1} \pi ^k(x_{k},y_{k} | x_{1:k-1},y_{1:k-1})\pi ^1(x_{1},y_{1}) I_{2}(\pi ^{i}(\,\cdot \,|x_{1:i-1},y_{1:i-1})) \\&= \sum _{x,y} \hat{\pi }(x,y)I_{2}(\pi ^{i}(\,\cdot \,|x_{1:i-1},y_{1:i-1})). \end{aligned}$$

Moreover

$$\begin{aligned} \sum _{x} \left( \sum _{y} d_1(x_1,y_1) \hat{p}(x,y) \right) ^2\nu _{0}(x)= \sum _{x,y} \hat{\pi }(x,y)I_{2}(\pi ^{1}). \end{aligned}$$

Summing all these inequalities gives the announced tensorization formula.

The proof for \(\bar{I}_{2}^{(n)}\) and \(J_2^{(n)}\) is identical and left to the reader. \(\square \)
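Since the statement of Proposition 3.5 is notation-heavy, the following minimal numerical sketch may help: it builds the Knothe-Rosenblatt coupling for \(n=2\) on the product of two two-point graphs, taking for \(\pi ^1\) and \(\pi ^2(\,\cdot \,|x_1,y_1)\) the (always admissible) independent couplings, and checks the tensorization bound for \(I_2^{(2)}\). The Python encoding (array layout, helper names such as `I2n`) is ours, not the paper's.

```python
# Minimal sketch of Proposition 3.5 for n = 2 on ({0,1} x {0,1}, Hamming).
import itertools
import numpy as np

rng = np.random.default_rng(0)
V = [0, 1]
d = lambda a, b: float(a != b)

def rand_prob(shape):
    p = rng.random(shape)
    return p / p.sum()

nu0, nu1 = rand_prob((2, 2)), rand_prob((2, 2))   # measures on V x V

marg1 = lambda nu: nu.sum(axis=1)                 # law of the first coordinate
cond2 = lambda nu, a: nu[a] / nu[a].sum()         # law of the 2nd given 1st = a

def I2(pi, first):
    # one-dimensional weak cost: sum_a first(a) * ( sum_b d(a,b) pi(b|a) )^2
    return sum(first[a] * sum(d(a, b) * pi[a, b] / first[a] for b in V)**2
               for a in V)

pi1 = np.outer(marg1(nu0), marg1(nu1))            # independent couplings
pi2 = {(x1, y1): np.outer(cond2(nu0, x1), cond2(nu1, y1))
       for x1 in V for y1 in V}

# Knothe-Rosenblatt coupling of nu0 and nu1 built from this family
hat = {(x, y): pi1[x[0], y[0]] * pi2[(x[0], y[0])][x[1], y[1]]
       for x in itertools.product(V, repeat=2)
       for y in itertools.product(V, repeat=2)}

def I2n(hat, nu0):
    tot = 0.0
    for i in range(2):
        for x in itertools.product(V, repeat=2):
            s = sum(d(x[i], y[i]) * hat[(x, y)] / nu0[x]
                    for y in itertools.product(V, repeat=2))
            tot += nu0[x] * s**2
    return tot

lhs = I2n(hat, nu0)
rhs = I2(pi1, marg1(nu0)) + sum(
    hat[(x, y)] * I2(pi2[(x[0], y[0])], cond2(nu0, x[0]))
    for x in itertools.product(V, repeat=2)
    for y in itertools.product(V, repeat=2))
assert lhs <= rhs + 1e-12
```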

4 Displacement convexity property of the entropy

Using the weak transport cost defined in the previous section, we can now derive a displacement convexity property of the entropy on graphs. More precisely, we will derive such a property for the complete graph. Then we will prove that our definition of \(\nu _t^\pi \) allows the displacement convexity to tensorise. As a consequence, we will be able to derive such a property on the \(n\)-dimensional hypercube.

4.1 The complete graph

Consider the complete graph \(K_n\), or equivalently any graph \(G\) equipped with the Hamming distance \(d(x,y)={1\!\!1}_{x \ne y}\) (in the definition of the weak transport cost). Recall the definition of \(\nu _t^\pi \) given in (2.3), and that we proved, in Sect. 2.1.1, that \(\nu _t^\pi =(1-t)\nu _0+t\nu _1\) for any choice of coupling \(\pi \). Then, the following holds.

Proposition 4.1

(Displacement convexity on the complete graph) Let \(\nu _0\),\(\nu _1\), \(\mu \in \mathcal P (K_n)\) be three probability measures. Assume that \(\nu _0,\nu _1\) are absolutely continuous with respect to \(\mu \). Then for any \(t \in [0,1]\),

$$\begin{aligned} H(\nu _t|\mu )\le (1-t)H(\nu _0|\mu )+tH(\nu _1|\mu )-\frac{t(1-t)}{2}\left( \widetilde{\mathcal{T }}_2(\nu _1|\nu _0)+\widetilde{\mathcal{T }}_2(\nu _0|\nu _1)\right) \!,\nonumber \\ \end{aligned}$$
(4.1)

and

$$\begin{aligned} H(\nu _t|\mu )\le (1-t)H(\nu _0|\mu )+tH(\nu _1|\mu )-\frac{t(1-t)}{2} \Vert \nu _0 - \nu _1\Vert _{TV}^2, \end{aligned}$$
(4.2)

where \(\nu _t=(1-t)\nu _0+t\nu _1\).

Proof

Our aim is simply to bound from below the second order derivative of \(t \mapsto F(t):=H(\nu _t|\mu )\). Denote by \(f_0\) and \(f_1\) the respective densities of \(\nu _0\) and \(\nu _1\) with respect to \(\mu \) and for simplicity set \(f_t:=(1-t)f_0+tf_1\). We have \(F(t)= \int f_t \log f_t d\mu \). Thus \(F'(t)=\int _{f_t>0} \log f_t\,d(\nu _1-\nu _0)\). In turn

$$\begin{aligned} F''(t)&= \int \limits _{\{f_t >0\}} \frac{(f_0-f_1)^2}{f_t}\,d\mu = \int \limits _{\{f_t >0\}} \frac{[f_0-f_1]_+^2}{f_t}\,d\mu + \int \limits _{\{f_t >0\}} \frac{[f_1-f_0]_+^2}{f_t}\,d\mu \\&\ge \int \limits _{\{f_0 >0\}} \frac{[f_0-f_1]_+^2}{f_0}\,d\mu + \int \limits _{\{f_1 >0\}} \frac{[f_1-f_0]_+^2}{f_1}\,d\mu \\&= \int \limits _{\{f_0 >0\}} \left[ 1-\frac{f_1}{f_0}\right] _+^2 f_0\,d\mu \!+\! \int \limits _{\{f_1 >0\}} \left[ 1-\frac{f_0}{f_1}\right] _+^2f_1\,d\mu \!=\! \widetilde{\mathcal{T }}_2(\nu _1|\nu _0)+\widetilde{\mathcal{T }}_2(\nu _0|\nu _1), \end{aligned}$$

where, in the last line, we used Lemma 3.2. As a consequence, the function \( G :t\mapsto F(t)-\frac{t^2}{2}\left( \widetilde{\mathcal{T }}_2(\nu _1|\nu _0)+\widetilde{\mathcal{T }}_2(\nu _0|\nu _1)\right) \) is convex on \([0,1],\) so that \(G(t)\le (1-t)G(0)+tG(1)\) which gives precisely, after some algebra, the first desired inequality.

For the second inequality, since \(\int \left( \sqrt{f_t}\right) ^2 d\mu = \int f_t\,d\mu =1\), applying the Cauchy-Schwarz inequality yields

$$\begin{aligned} F''(t)&= \int \limits _{\{f_t >0\}} \left( \frac{|f_0-f_1|}{\sqrt{f_t}}\right) ^2\,d\mu \int \left( \sqrt{f_t}\right) ^2\,d\mu \ge \left( \int |f_0-f_1|\,d\mu \right) ^2 \\&= \Vert \nu _0-\nu _1\Vert _{TV}^2. \end{aligned}$$

Hence the map \(G:t\mapsto F(t)-\frac{t^2}{2}\Vert \nu _0-\nu _1\Vert _{TV}^2\) is convex on \([0,1]\) which leads to the desired inequality. \(\square \)
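Both inequalities of Proposition 4.1 are easy to test numerically, using the closed forms for the weak transport costs identified in the proof via Lemma 3.2. The brute-force sketch below (a sanity check of ours, with our own variable names) does so on random data.

```python
# Check of (4.1) and (4.2) on a complete graph with m vertices.
import numpy as np

rng = np.random.default_rng(1)
m = 5
mu = rng.random(m); mu /= mu.sum()
nu0 = rng.random(m); nu0 /= nu0.sum()
nu1 = rng.random(m); nu1 /= nu1.sum()
f0, f1 = nu0 / mu, nu1 / mu

def H(nu):
    return float(np.sum(nu * np.log(nu / mu)))

T10 = np.sum(np.maximum(1 - f1 / f0, 0)**2 * f0 * mu)   # ~T_2(nu1|nu0)
T01 = np.sum(np.maximum(1 - f0 / f1, 0)**2 * f1 * mu)   # ~T_2(nu0|nu1)
TV = np.sum(np.abs(f0 - f1) * mu)                       # ||nu0 - nu1||_TV

for t in np.linspace(0.01, 0.99, 25):
    nut = (1 - t) * nu0 + t * nu1
    lin = (1 - t) * H(nu0) + t * H(nu1)
    assert H(nut) <= lin - 0.5 * t * (1 - t) * (T10 + T01) + 1e-10
    assert H(nut) <= lin - 0.5 * t * (1 - t) * TV**2 + 1e-10
```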

Remark 4.2

(Pinsker inequality) Inequality (4.2) is a reinforcement of the well-known Csiszár-Kullback-Pinsker inequality (see e.g. [1, Theorem 8.2.7], [9, 25, 42]) which asserts that

$$\begin{aligned} \Vert \nu _0-\nu _1\Vert _{TV}^2 \le 2 H(\nu _1|\nu _0). \end{aligned}$$

Indeed, take \(\mu =\nu _0\) in (4.2), use that \(H(\nu _t|\mu ) \ge 0\), divide by \(t\) and let \(t \rightarrow 0\) to obtain the above inequality. The Csiszár-Kullback-Pinsker inequality and its generalizations are known to have many applications in Probability theory, Analysis and Information theory, see [57, Page 636] for a review.
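This inequality is cheap to test; the short script below (purely illustrative and ours) uses the convention \(\Vert \nu _0-\nu _1\Vert _{TV}=\sum _x|\nu _0(x)-\nu _1(x)|\) adopted in this paper.

```python
# Numerical test of the Csiszar-Kullback-Pinsker inequality.
import numpy as np

rng = np.random.default_rng(8)
for _ in range(1000):
    nu0 = rng.random(4); nu0 /= nu0.sum()
    nu1 = rng.random(4); nu1 /= nu1.sum()
    TV = np.sum(np.abs(nu0 - nu1))
    H = np.sum(nu1 * np.log(nu1 / nu0))    # H(nu1 | nu0)
    assert TV**2 <= 2 * H + 1e-12
```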

Remark 4.3

(Comparison) We now compare the two lower bounds appearing in Proposition 4.1, namely the weak transport costs of (4.1) and the total variation term of (4.2). For the two-point space it is easy to check that the ratio

$$\begin{aligned} \frac{\widetilde{\mathcal{T }}_2(\nu _1|\nu _0)+\widetilde{\mathcal{T }}_2(\nu _0|\nu _1)}{\Vert \nu _0-\nu _1\Vert _{TV}^2} \end{aligned}$$

is not uniformly bounded above over all probability measures \(\nu _0\) and \(\nu _1\). On the other hand, we claim that

$$\begin{aligned} \frac{\widetilde{\mathcal{T }}_2(\nu _1|\nu _0)+\widetilde{\mathcal{T }}_2(\nu _0|\nu _1)}{\Vert \nu _0-\nu _1\Vert _{TV}^2} \ge \frac{1}{2}\,, \qquad \forall \nu _0, \nu _1 \end{aligned}$$
(4.3)

which implies that (4.1) is stronger than (4.2), up to a constant 2. We also provide an example below which shows that we cannot exactly recover (4.2) using (4.1).

Let us prove the claim, and more precisely that the following holds

$$\begin{aligned} \widetilde{\mathcal{T }}_2(\nu _1|\nu _0)+\widetilde{\mathcal{T }}_2(\nu _0|\nu _1) \ge \frac{\Vert \nu _0-\nu _1\Vert _{TV}^2}{1+\frac{\Vert \nu _0-\nu _1\Vert _{TV}}{2}} \ge \frac{1}{2}\Vert \nu _0-\nu _1\Vert _{TV}^2. \end{aligned}$$
(4.4)

This is a consequence of Cauchy-Schwarz inequality, namely, we have

$$\begin{aligned} \widetilde{\mathcal{T }}_2(\nu _1|\nu _0)+\widetilde{\mathcal{T }}_2(\nu _0|\nu _1) \ge \frac{\left( \int [f_1-f_0]_+ d\mu \right) ^2}{\nu _1(f_1\ge f_0)}+\frac{\left( \int [f_0-f_1]_+ d\mu \right) ^2}{\nu _0(f_0> f_1)}. \end{aligned}$$

Since \( \Vert \nu _0-\nu _1\Vert _{TV}= 2\int [f_1-f_0]_+ d\mu = 2(\nu _1(f_1\ge f_0)-\nu _0(f_1\ge f_0))\), we get

$$\begin{aligned} \widetilde{\mathcal{T }}_2(\nu _1|\nu _0)+\widetilde{\mathcal{T }}_2(\nu _0|\nu _1) \ge \inf _{u\in [0,1]}\frac{(1+\frac{\Vert \nu _0-\nu _1\Vert _{TV}}{2})\Vert \nu _0-\nu _1\Vert _{TV}^2}{4u(1+\frac{\Vert \nu _0-\nu _1\Vert _{TV}}{2}-u)} = \frac{\Vert \nu _0-\nu _1\Vert _{TV}^2}{1+\frac{\Vert \nu _0-\nu _1\Vert _{TV}}{2}}. \end{aligned}$$

We now give a non-trivial example achieving equality in the first inequality of (4.4), thus confirming that (4.1) cannot exactly recover (4.2): let \(\nu _0\) and \(\nu _1\) be the two probability measures on the two-point space \(\{0,1\}\) defined by \(\nu _1(1)=\nu _0(0)=3/4\) and \(\nu _1(0)=\nu _0(1)=1/4\). Then

$$\begin{aligned} \Vert \nu _0-\nu _1\Vert _{TV}=2(\nu _1(1)-\nu _0(1))=1, \end{aligned}$$

and

$$\begin{aligned} \widetilde{\mathcal{T }}_2(\nu _1|\nu _0)+\widetilde{\mathcal{T }}_2(\nu _0|\nu _1)= \frac{(\nu _1(1)-\nu _0(1))^2}{\nu _1(1)}+\frac{(\nu _0(0)-\nu _1(0))^2}{\nu _0(0)}=2/3, \end{aligned}$$

which gives the (claimed) equality in (4.4).
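The arithmetic of this example can be verified in a few lines (a check of ours):

```python
# Verification of the two-point example nu0 = (3/4, 1/4), nu1 = (1/4, 3/4).
nu0, nu1 = (0.75, 0.25), (0.25, 0.75)
TV = abs(nu0[0] - nu1[0]) + abs(nu0[1] - nu1[1])                   # = 1
S = (nu1[1] - nu0[1])**2 / nu1[1] + (nu0[0] - nu1[0])**2 / nu0[0]  # = 2/3
assert abs(TV - 1.0) < 1e-12 and abs(S - 2.0 / 3.0) < 1e-12
assert abs(S - TV**2 / (1 + TV / 2)) < 1e-12                       # equality in (4.4)
```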

4.2 Tensorization of the displacement convexity property

In this section we prove that if the displacement convexity property of the entropy holds on \(n\) graphs \(G_1=(V_1,E_1)\), ..., \(G_n=(V_n,E_n)\), equipped with probability measures \(\mu _1,\ldots ,\mu _n\) and graph distances \(d_1,\ldots ,d_n\) respectively, then the displacement convexity of the entropy holds on their Cartesian product equipped with \(\mu _1\otimes \cdots \otimes \mu _n\), with respect to the tensorised transport costs \(I_2^{(n)}\) and \(\bar{I}_2^{(n)}\); i.e. we prove Theorem 1.1, which is one of our main theorems. At the end of the section, we apply this property to the specific example of the hypercube.

Proof of Theorem 1.1

In this proof, we use the notation and definitions introduced in Sects. 3.2 and 3.3. Fix \(\nu _0,\nu _1 \in \mathcal P (V)\) and write the following disintegration formulas

$$\begin{aligned} \nu _0(x)&= \nu _0^1(x_1)\prod _{i=2}^{n} \nu _0^i(x_i | x_{1 : i-1}),\qquad \forall x=(x_1,\ldots ,x_n)\in V\\ \nu _1(y)&= \nu _1^1(y_1)\prod _{i=2}^{n} \nu _1^i(y_i | y_{1 : i-1}),\qquad \forall y=(y_1,\ldots ,y_n)\in V, \end{aligned}$$

where we recall that \(x_{1:i-1}=(x_{1},\ldots ,x_{i-1})\in V_{1}\times \cdots \times V_{i-1}.\)

By assumption, for every \(x,y \in V\), there are couplings \(\pi ^1\in \mathcal P (V_1\times V_1)\) and \(\pi ^i(\,\cdot \, | x_{1:i-1},y_{1:i-1}) \in \mathcal P (V_i\times V_i)\) such that

$$\begin{aligned} \pi ^1\in \Pi (\nu _0^1,\nu _1^1)\quad \text {and}\quad \pi ^i(\,\cdot \, | x_{1:i-1},y_{1:i-1})\in \Pi (\nu _0^i(\,\cdot \, |x_{1:i-1}),\nu _1^i(\,\cdot \, |y_{1:i-1})), \end{aligned}$$

and for which the following inequalities hold

$$\begin{aligned} H(\nu _t^{1}|\mu ^1)&\le (1-t)H(\nu _0^1 | \mu ^1) + tH(\nu _1^1 | \mu ^1) - C_1t(1-t)R_2(\pi ^1),\\ H(\nu _t^{i, x_{1:i-1},y_{1:i-1}}|\mu ^i)&\le (1-t)H(\nu _0^i(\,\cdot \, |x_{1:i-1}) | \mu ^i) + tH(\nu _1^i(\,\cdot \, | y_{1:i-1}) | \mu ^i) \\&\quad - C_it(1-t)R_2(\pi ^i(\,\cdot \,|x_{1:i-1}, y_{1:i-1})), \end{aligned}$$

where \(R_2:=I_2+\bar{I}_2\), \(\nu _t^{1}:=\nu _t^{\pi ^1}\), and \(\nu _t^{i, x_{1:i-1},y_{1:i-1}}=\nu _t^{\pi ^i(\,\cdot \, | x_{1:i-1},y_{1:i-1})}.\)

Now, consider the Knothe-Rosenblatt coupling \(\hat{\pi }\in \Pi (\nu _0,\nu _1)\) constructed from the couplings \(\pi ^1\) and \(\pi ^i(\,\cdot \,| x_{1:i-1}, y_{1:i-1}),\) \(x,y\in V\) and denote by \(\gamma _t\) the path \(\nu _t^{\hat{\pi }}\in \mathcal P (V)\) connecting \(\nu _0\) to \(\nu _1.\)

Let us consider the disintegration of \(\gamma _t\) with respect to its marginals:

$$\begin{aligned} \gamma _t(z)=\gamma _t^1(z_1)\gamma _t^{2}(z_{2}|z_1)\cdots \gamma _t^{n}(z_n|z_1,\ldots ,z_{n-1}). \end{aligned}$$

We claim that there exist non-negative coefficients \(\alpha _{t}^{i}(x_{1:i-1},y_{1:i-1},z_{1:i-1})\) such that

$$\begin{aligned} \sum _{x_{1:i-1},y_{1:i-1}}\alpha _{t}^i(x_{1:i-1},y_{1:i-1},z_{1:i-1})=1 \end{aligned}$$

and such that for all \(i\in \{2,\ldots ,n\}\) it holds

$$\begin{aligned} \gamma _{t}^i(\,\cdot \,|z_{1:i-1}) = \sum _{x_{1:i-1}, y_{1:i-1}} \nu _{t}^{i,x_{1:i-1}, y_{1:i-1}}(\,\cdot \,)\alpha _{t}^{i}(x_{1:i-1},y_{1:i-1},z_{1:i-1}). \end{aligned}$$

Indeed, by definition one has \(\gamma _t(z)=\sum _{x,y\in V} \nu _t^{x,y}(z)\hat{\pi }(x,y).\) So, using the fact that, according to Lemma 2.3, \(\nu _t^{x,y}(z)=\prod _{k=1}^n\nu _t^{x_k,y_k}(z_k)\), we see that

$$\begin{aligned} \sum _{u \in V : u_{1:i}=z_{1:i}} \gamma _t(u)&= \sum _{x,y\in V}\left( \sum _{u \in V : u_{1:i}=z_{1:i}}\nu _t^{x,y}(u)\right) \hat{\pi }(x,y)\\&= \sum _{x,y\in V} \prod _{k=1}^i \nu _t^{x_k,y_k}(z_k)\hat{\pi }(x,y)\\&= \sum _{x_{1:i},y_{1:i}} \prod _{k=1}^i \nu _t^{x_k,y_k}(z_k)\pi ^k(x_k,y_k |x_{1:k-1}, y_{1:k-1})\\&= \sum _{x_{1:i-1},y_{1:i-1}} \nu _t^{i, x_{1:i-1}, y_{1:i-1}}(z_i)\prod _{k=1}^{i-1} \nu _t^{x_k,y_k}(z_k)\pi ^k(x_k,y_k|x_{1:k-1}, y_{1:k-1}). \end{aligned}$$

From this it follows that

$$\begin{aligned} \gamma _{t}^i(z_i | z_{1:i-1})&= \dfrac{\sum _{u \in V : u_{1:i}=z_{1:i}} \gamma _t(u)}{\sum _{u \in V : u_{1:i-1}=z_{1:i-1}}\gamma _t(u)} \\&= \dfrac{\sum _{x_{1:i-1},y_{1:i-1}} \nu _t^{i, x_{1:i-1}, y_{1:i-1}}(z_i) \prod _{k=1}^{i-1} \nu _t^{x_k,y_k}(z_k)\pi ^k(x_k,y_k |x_{1:k-1}, y_{1:k-1})}{\sum _{x_{1:i-1},y_{1:i-1}} \prod _{k=1}^{i-1} \nu _t^{x_k,y_k}(z_k) \pi ^k(x_k,y_k|x_{1:k-1}, y_{1:k-1})}\\&=: \sum _{x_{1:i-1},y_{1:i-1}} \nu _t^{i, x_{1:i-1}, y_{1:i-1}}(z_i) \alpha _{t}^i(x_{1:i-1}, y_{1:i-1}, z_{1:i-1}), \end{aligned}$$

using obvious notation, from which the claim follows. Similarly, for all \(z_1\in V_1\), it holds \(\gamma ^1_t(z_1)=\nu ^1_t(z_1).\) The following equality will be useful below:

$$\begin{aligned} \alpha _{t}^i(x_{1:i-1}, y_{1:i-1}, z_{1:i-1}) = \frac{\displaystyle \prod \nolimits _{k=1}^{i-1} \nu _t^{x_k,y_k}(z_k)\pi ^k(x_k,y_k |x_{1:k-1}, y_{1:k-1})}{\displaystyle \sum \nolimits _{u \in V : u_{1:i-1}=z_{1:i-1}}\gamma _t(u)}. \end{aligned}$$
(4.5)

Now, let us recall the well known disintegration formula for the relative entropy: if \(\gamma \in \mathcal P (V)\) is absolutely continuous with respect to \(\mu \), then it holds

$$\begin{aligned} H(\gamma |{\mu }) = H(\gamma ^{1}| \mu ^{1}) + \sum _{i=2}^{n} \sum _{z\in V} H(\gamma ^{i}(\,\cdot \,| z_{1:i-1}) | \mu ^{i}) \gamma (z). \end{aligned}$$
(4.6)

Applying (4.6) to \(\gamma _{t}\) and using the (classical) convexity of the relative entropy \(\nu \mapsto H(\nu |\mu )\), we get

$$\begin{aligned} H(\gamma _{t}|\mu )&= H(\gamma _{t}^1 | \mu ^1) + \sum _{i=2}^{n} \sum _{z\in V} H(\gamma _{t}^{i}(\,\cdot \,| z_{1:i-1}) | \mu ^{i}) \gamma _{t}(z)\\&\le H(\nu _{t}^1 |\mu ^1) +\sum _{i=2}^{n} \sum _{z\in V} \sum _{\small \begin{array}{c} x_{1:i-1},\\ y_{1:i-1} \end{array}} \alpha _{t}^i(x_{1:i-1},y_{1:i-1},z_{1:i-1}) H(\nu _{t}^{i, x_{1:i-1},y_{1:i-1}} | \mu ^{i}) \gamma _{t}(z). \end{aligned}$$

Now we deal with each term in the sum separately. Fix \(i \in \{2,\dots ,n\}\). We have

$$\begin{aligned}&\sum _{z\in V} \sum _{\small \begin{array}{c} x_{1:i-1},\\ y_{1:i-1} \end{array}} \alpha _{t}^i(x_{1:i-1},y_{1:i-1},z_{1:i-1}) H(\nu _{t}^{i, x_{1:i-1},y_{1:i-1}} | \mu ^{i}) \gamma _{t}(z) \\&= \sum _{z_{1:i-1}} \sum _{\small \begin{array}{c} x_{1:i-1},\\ y_{1:i-1} \end{array}} \alpha _{t}^i(x_{1:i-1},y_{1:i-1},z_{1:i-1}) H(\nu _{t}^{i, x_{1:i-1},y_{1:i-1}} | \mu ^{i}) \sum _{\small \begin{array}{c} u\in V:\\ u_{1:i-1}=z_{1:i-1} \end{array}}\gamma _{t}(u)\\&= \sum _{z_{1:i-1}}\sum _{\small \begin{array}{c} x_{1:i-1},\\ y_{1:i-1}\end{array}} H(\nu _{t}^{i, x_{1:i-1},y_{1:i-1}} | \mu ^{i}) \prod _{k=1}^{i-1}\nu _{t}^{x_{k},y_{k}}(z_{k})\pi ^k(x_{k},y_{k} | x_{1:k-1}, y_{1:k-1}) \qquad (\text {by } (4.5))\\&= \sum _{\small \begin{array}{c} x_{1:i-1},\\ y_{1:i-1} \end{array}} H(\nu _{t}^{i, x_{1:i-1},y_{1:i-1}} | \mu ^{i}) \prod _{k=1}^{i-1}\pi ^k(x_{k},y_{k} | x_{1:k-1}, y_{1:k-1}) \qquad (\text {integrating over } z_k)\\&= \sum _{x, y} H(\nu _{t}^{i, x_{1:i-1},y_{1:i-1}} | \mu ^{i}) \prod _{k=1}^n\pi ^k(x_{k},y_{k} | x_{1:k-1}, y_{1:k-1}). \end{aligned}$$

Therefore, \(H(\gamma _{t}|\mu ) \le H(\nu _{t}^1 |\mu ^1) +\sum _{i=2}^{n} \sum _{x, y} H(\nu _{t}^{i, x_{1:i-1},y_{1:i-1}} | \mu ^{i}) \hat{\pi }(x,y)\). Now, applying the assumed displacement convexity inequalities and setting \(C:=\min _{1\le i\le n}C_i\), we get

$$\begin{aligned} H(\gamma _{t}|\mu )&\le (1-t) \left[ H(\nu _{0}^1 |\mu ^1) +\sum _{i=2}^{n} \sum _{x, y} H(\nu _{0}^{i}(\,\cdot \,|x_{1:i-1}) | \mu ^{i}) \hat{\pi }(x,y) \right] \\&\quad +\, t\left[ H(\nu _{1}^1 | \mu ^1) +\sum _{i=2}^{n} \sum _{x, y} H(\nu _{1}^{i}(\,\cdot \,| y_{1:i-1}) | \mu ^{i}) \hat{\pi }(x,y)\right] \\&\quad -\, Ct(1-t)\left[ R_{2}(\pi ^1) + \sum _{i=2}^{n} \sum _{x,y}R_{2}(\pi ^i(\,\cdot \,| x_{1:i-1}, y_{1:i-1}))\hat{\pi }(x,y)\right] \\&= (1-t) \left[ H(\nu _{0}^1 |\mu ^1) +\sum _{i=2}^{n} \sum _{x} H(\nu _{0}^{i}(\,\cdot \,|x_{1:i-1}) | \mu ^{i})\nu _{0}(x) \right] \\&\quad +\, t\left[ H(\nu _{1}^1 | \mu ^1) +\sum _{i=2}^{n} \sum _{y} H(\nu _{1}^{i}(\,\cdot \,| y_{1:i-1}) | \mu ^{i}) \nu _{1}(y)\right] \\&\quad -\, Ct(1-t)\left[ R_{2}(\pi ^1) + \sum _{i=2}^{n}\sum _{x,y} R_{2}(\pi ^i(\,\cdot \,| x_{1:i-1}, y_{1:i-1}))\hat{\pi }(x,y)\right] \\&\le (1-t)H(\nu _{0}|\mu ) + tH(\nu _{1}|\mu ) - Ct(1-t) (I_{2}^{(n)}(\hat{\pi })+\bar{I}_{2}^{(n)}(\hat{\pi })), \end{aligned}$$

where the last inequality follows from the disintegration equality (4.6) for the relative entropy and from the tensorization inequality given in Proposition 3.5. \(\square \)

As an application of Theorem 1.1, we derive the displacement convexity of entropy property on the hypercube.

Corollary 4.4

(Displacement convexity on the hypercube) Let \(\mu \) be a non-trivial Bernoulli measure on \(\{0,1\}\) and define its \(n\)-fold product \(\mu ^{\otimes n}\) on \(\Omega _n = \{0,1\}^n\). For any \(\nu _0, \nu _1 \in \mathcal P (\Omega _n)\), there exists a \(\pi \in \Pi (\nu _{0},\nu _{1})\) such that for any \(t \in [0,1]\),

$$\begin{aligned} H(\nu _t^{\pi }|\mu ^{\otimes n}) \le (1-t)H(\nu _0|\mu ^{\otimes n})+tH(\nu _1|\mu ^{\otimes n}) -\frac{t(1-t)}{2}\left( I_2^{(n)}(\pi )+ \bar{I}_{2}^{(n)}(\pi ) \right) ,\nonumber \\ \end{aligned}$$
(4.7)

and there exists \(\pi \in \Pi (\nu _{0},\nu _{1})\) such that for any \(t \in [0,1]\),

$$\begin{aligned} H(\nu _t^{\pi }|\mu ^{\otimes n}) \le (1-t)H(\nu _0|\mu ^{\otimes n})+tH(\nu _1|\mu ^{\otimes n}) -{2t(1-t)}J_2^{(n)}(\pi ). \end{aligned}$$
(4.8)

Proof

According to Proposition 4.1, for all \(\nu _0,\nu _1 \in \mathcal P (\{0,1\})\), it holds

$$\begin{aligned} H(\nu _t|\mu ) \le (1-t)H(\nu _0|\mu )+tH(\nu _1|\mu ) -\frac{t(1-t)}{2}\left( \widetilde{\mathcal{T }}_{2}(\nu _1|\nu _0)+\widetilde{\mathcal{T }}_{2}(\nu _0|\nu _1) \right) ,\qquad \forall t\in [0,1], \end{aligned}$$

with \(\nu _t =(1-t)\nu _0 + t\nu _1\). As we have seen in the proof of Lemma 3.2, the coupling \(\pi \) defined by (3.1) is optimal for both \(\widetilde{\mathcal{T }}_{2}(\nu _1|\nu _0)\) and \(\widetilde{\mathcal{T }}_{2}(\nu _0|\nu _1)\). Since on the two-point space \(\nu _t=\nu _t^\pi \) is independent of \(\pi \), the preceding inequality can be rewritten as follows:

$$\begin{aligned} H(\nu _t^\pi |\mu ) \le (1-t)H(\nu _0|\mu )+tH(\nu _1|\mu ) -\frac{t(1-t)}{2}\left( I_2(\pi )+\bar{I}_2(\pi ) \right) ,\quad \forall t\in [0,1]. \end{aligned}$$

Therefore, we are in a position to apply Theorem 1.1, and to conclude that \(\mu ^{\otimes n}\) verifies the announced displacement convexity property (4.7).

Similarly, by Lemma 3.2, the displacement convexity property (4.2) ensures that for all \(\nu _0,\nu _1 \in \mathcal P (\{0,1\})\)

$$\begin{aligned} H(\nu _t^\pi |\mu ) \le (1-t)H(\nu _0|\mu )+tH(\nu _1|\mu ) -{2t(1-t)}J_2(\pi ),\qquad \forall t\in [0,1]. \end{aligned}$$

The result then follows from Theorem 1.1. \(\square \)

Let \(\pi \) be a coupling of \(\nu _0,\nu _1 \in \mathcal P (\Omega _n)\). By the Cauchy-Schwarz inequality, we have

$$\begin{aligned} J_2^{(n)}(\pi )&= \sum _{i=1}^n \left( \sum _{x,y \in \Omega _n} {1\!\!1}_{x_i \ne y_i} \pi (x,y) \right) ^2 \ge \frac{1}{n} \left( \sum _{x,y \in \Omega _n} \sum _{i=1}^n {1\!\!1}_{x_i \ne y_i} \pi (x,y) \right) ^2 \\&= \frac{1}{n} \left( \sum _{x,y \in \Omega _n} d(x,y) \pi (x,y) \right) ^2 \\&\ge \frac{1}{n} W_1 (\nu _1,\nu _0)^2. \end{aligned}$$

We immediately deduce from Corollary 4.4 the following weaker result.

Corollary 4.5

Let \(\mu \) be a probability measure on \(\{0,1\}\) and define its \(n\)-fold product \(\mu ^{\otimes n}\) on \(\Omega _n = \{0,1\}^n\). For any \(\nu _0,\nu _1 \in \mathcal P (\Omega _n)\), there exists \(\pi \in \Pi (\nu _0,\nu _1)\) such that for \(t \in [0,1]\),

$$\begin{aligned} H(\nu _t^{\pi }|\mu ^{\otimes n}) \le (1-t)H(\nu _0|\mu ^{\otimes n})+tH(\nu _1|\mu ^{\otimes n}) -\frac{2t(1-t)}{n} W_1 (\nu _1,\nu _0)^2. \end{aligned}$$

The constant \(1/n\) encodes, in some sense, the discrete Ricci curvature of the hypercube, in accordance with the various definitions of discrete Ricci curvature mentioned in the introduction.
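The Cauchy-Schwarz step above is easy to test numerically. The sketch below computes \(W_1\) on the \(2\)-dimensional hypercube by a small linear program (assuming SciPy is available; the encoding of couplings as \(4\times 4\) arrays is ours) and checks \(J_2^{(n)}(\pi )\ge W_1^2/n\) for the product coupling.

```python
# Sketch of J_2^{(n)}(pi) >= W_1^2 / n on the 2-cube.
import itertools
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(6)
pts = list(itertools.product([0, 1], repeat=2))
nu0 = rng.random(4); nu0 /= nu0.sum()
nu1 = rng.random(4); nu1 /= nu1.sum()
D = np.array([[sum(a != b for a, b in zip(x, y)) for y in pts] for x in pts])

# W_1(nu1, nu0): minimize <D, pi> subject to the marginal constraints
A_eq = np.zeros((8, 16)); b_eq = np.concatenate([nu0, nu1])
for i in range(4):
    A_eq[i, 4 * i:4 * i + 4] = 1.0      # row sums of pi equal nu0
    A_eq[4 + i, i::4] = 1.0             # column sums of pi equal nu1
W1 = linprog(D.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None)).fun

pi = np.outer(nu0, nu1)                 # the bound holds for any coupling
J2 = sum(sum(pi[a, b] for a in range(4) for b in range(4)
             if pts[a][i] != pts[b][i])**2 for i in range(2))
assert J2 >= W1**2 / 2 - 1e-9
```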

Remark 4.6

Since \(\widetilde{\mathcal{T }}_2\) is defined as an infimum, one can replace, for free, the term \(I_2^{(n)}(\pi )\) by \(\widetilde{\mathcal{T }}_2^{(n)}(\nu _1|\nu _0)\) in (4.7). Moreover, if one chooses \(\nu _0=\mu ^{\otimes n}\) and uses that \(H(\nu _t^{\pi }|\mu ^{\otimes n}) \ge 0\), one easily derives from (4.7) the following transport-entropy inequality:

$$\begin{aligned} \widetilde{\mathcal{T }}_2^{(n)}(\nu |\mu ^{\otimes n})+\widetilde{\mathcal{T }}_2^{(n)}(\mu ^{\otimes n}|\nu ) \le 2 H(\nu |\mu ^{\otimes n}), \quad \forall \nu \in \mathcal P (\Omega _n). \end{aligned}$$

See [17] for more on such an inequality (on graphs). Note that the above argument is general and that one can always derive from the displacement convexity of the entropy some Talagrand-type transport-entropy inequality.

5 HWI type inequalities on graphs

As already stated in the introduction, the displacement convexity of entropy property is usually (i.e., in continuous space settings) the strongest property in the following hierarchy:

$$\begin{aligned} \hbox {Displacement convexity } \Rightarrow \hbox { HWI } \Rightarrow \hbox {Log Sobolev}. \end{aligned}$$

In this section, applying an argument based on the differentiation property of \(\nu _t^\pi \), we derive HWI and log-Sobolev type inequalities from the displacement convexity property.

We shall start with the aforementioned differentiation property of the path \(\nu _t^\pi \). Then, we derive a general statement on products of graphs that allows one to obtain a symmetric HWI inequality from the displacement convexity property of the entropy. As a consequence, we get a new symmetric HWI inequality on the hypercube that implies a modified log-Sobolev inequality on the hypercube. This modified log-Sobolev inequality also implies, by means of the Central Limit Theorem, the classical log-Sobolev inequality for the standard Gaussian measure, with the optimal constant.

Then we move to another HWI type inequality, involving the Dirichlet form \(\mathcal E _\mu (f,\log f)\), based on Eq. (5.1), which is available on the complete graph.

5.1 Differentiation property

A second property of the path defined in (2.2) and (2.3) is the following time differentiation property.

For any \(z\) on a given geodesic \(\gamma \) from \(x\) to \(y\): if \(z\ne y\), let \(\gamma _+(z)\) denote the (unique) vertex on \(\gamma \) at distance \(d(z,y)-1\) from \(y\) (and thus at distance \(d(x,z)+1\) from \(x\)); similarly, if \(z\ne x\), let \(\gamma _-(z)\) denote the vertex on \(\gamma \) at distance \(d(z,y)+1\) from \(y\) (and hence at distance \(d(x,z)-1\) from \(x\)). In other words, following the geodesic \(\gamma \) from \(x\) toward \(y\), \(\gamma _-(z)\) is the vertex just anterior to \(z\), and \(\gamma _+(z)\) the vertex just posterior to \(z\).

For any real function \(f\) on \(V\), we also define two related notions of gradient along \(\gamma \): for all \(z\in \gamma \), \(z\ne y\),

$$\begin{aligned} \nabla _\gamma ^+f(z)= f(\gamma _+(z))-f(z), \end{aligned}$$

and for all \(z\in \gamma \), \(z\ne x\),

$$\begin{aligned} \nabla _\gamma ^-f(z)= f(z)-f(\gamma _-(z)). \end{aligned}$$

By convention, we put \(\nabla ^-_\gamma f(x)=\nabla _\gamma ^+f(y)=0\), and \(\nabla _\gamma ^+f(z)=\nabla ^-_\gamma f(z)=0,\) if \(z\notin \gamma .\) Let \(\nabla _\gamma f\) denote the following convex combination of these two gradients:

$$\begin{aligned} \nabla _\gamma f(z)= \frac{d(y,z)}{d(x,y)} \nabla _\gamma ^+f(z) + \frac{d(x,z)}{d(x,y)} \nabla _\gamma ^-f(z). \end{aligned}$$

Observe that, although not explicitly stated, \(\nabla _\gamma \) depends on \(x\) and \(y\). Finally, for all \(z\in [\![x,y]\!]\), we define

$$\begin{aligned} \nabla _{x,y}f (z) = \frac{1}{|\Gamma (x,z,y)| }\sum _{\gamma \in \Gamma (x,z,y)} \nabla _\gamma f(z), \end{aligned}$$

and when \(z\notin [\![x,y]\!]\), we set \(\nabla _{x,y}f(z)=0.\)

Proposition 5.1

For any function \(f :V \rightarrow \mathbb{R }\) and all \(x,y \in V\), it holds

$$\begin{aligned} \frac{\partial }{\partial t} \nu _t^{x,y}(f) =d(x,y) \nu _t^{x,y} ( \nabla _{x,y}f ). \end{aligned}$$

As a direct consequence of the above differentiation property, we are able to give an explicit expression of the derivative (with respect to time) of the relative entropy of \(\nu _t^\pi \) with respect to an arbitrary reference measure.

Corollary 5.2

Let \(\nu _0\), \(\nu _1\) and \(\mu \) be three probability measures on \(V\). Assume that \(\mu (x)>0\) for all \(x\) in \(V\). Then, for any coupling \(\pi \in \Pi (\nu _0,\nu _1)\), it holds

$$\begin{aligned} \frac{\partial }{\partial t} H(\nu _t^\pi | \mu )_{|_{t=0}} = \sum _{\genfrac{}{}{0.0pt}{}{x,z \in V:}{z \sim x}} \left( \log \frac{\nu _0(z)}{\mu (z)} - \log \frac{\nu _0(x)}{\mu (x)} \right) \sum _{y \in V} d(x,y)\frac{|\Gamma (x,z,y)|}{|\Gamma (x,y)|} \pi (x,y). \end{aligned}$$

The proof of Corollary 5.2 can be found below. Before that, we illustrate Corollary 5.2 with the example of the complete graph.

5.2 Example of the complete graph \(K_n\)

Let \(K_n\) be the complete graph with \(n\) vertices. Recall that \(\nu _t^\pi =(1-t)\nu _0+t\nu _1\) (see Sect. 2.1.1). Then, under the assumptions of Corollary 5.2, since for \(x\ne y\) one has \(d(x,y)=|\Gamma (x,y)|=1\) while, for \(z\sim x\), \(|\Gamma (x,z,y)|={1\!\!1}_{z=y}\), we have

$$\begin{aligned} \frac{\partial }{\partial t} H(\nu _t^\pi | \mu )_{|_{t=0}}&= \sum _{x \in K_n} \sum _{z \sim x} (\log f(z) - \log f(x) ) \pi (x,z)\\&= \sum _{z \in K_n} \log f(z) \nu _1(z) - \sum _{x \in K_n} f(x) \log f(x) \mu (x) \end{aligned}$$

where we set for simplicity \(f=\nu _0/\mu \). On the other hand, since \(f\) is a density with respect to \(\mu \),

$$\begin{aligned} - \mathcal E _\mu (f, \log f)&:= - \frac{1}{2} \sum _{x,z \in K_n} (\log f(z) - \log f(x) )(f(z) -f(x)) \mu (x) \mu (z) \\&= \sum _{z \in K_n} \log f(z) \mu (z) - \sum _{x \in K_n} f(x) \log f(x) \mu (x). \end{aligned}$$

Hence, if \(\nu _1 = \mu \equiv 1/n\) is the uniform measure on \(K_n\) (that charges all the points), we can conclude that

$$\begin{aligned} \frac{\partial }{\partial t} H(\nu _t^\pi | \mu )_{|_{t=0}} = - \mathcal E _\mu (f, \log f). \end{aligned}$$
(5.1)

Note that, when \(\mu \equiv 1/n\), \(\mathcal E _\mu \) corresponds to the Dirichlet form associated to the uniform chain on the complete graph (each point jumps to any point with probability \(1/n\)).
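Identity (5.1) can be checked by finite differences. The following brute-force script (ours, assuming nothing beyond NumPy) compares a numerical derivative of \(t\mapsto H(\nu _t|\mu )\) at \(t=0\) with \(-\mathcal E _\mu (f,\log f)\) for the uniform measure on \(K_m\).

```python
# Sanity check of (5.1) on K_m with the uniform measure.
import numpy as np

rng = np.random.default_rng(2)
m = 6
mu = np.full(m, 1.0 / m)                       # uniform measure on K_m
nu0 = rng.random(m); nu0 /= nu0.sum()
f = nu0 / mu                                   # density of nu0 w.r.t. mu

def H(nu):
    return float(np.sum(nu * np.log(nu / mu)))

# Dirichlet form E_mu(f, log f) of the chain jumping uniformly at random
E = 0.5 * sum((f[y] - f[x]) * (np.log(f[y]) - np.log(f[x])) * mu[x] * mu[y]
              for x in range(m) for y in range(m))

eps = 1e-6
deriv = (H((1 - eps) * nu0 + eps * mu) - H(nu0)) / eps   # nu_t = (1-t)nu0 + t mu
assert abs(deriv + E) < 1e-4
```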

In order to prove Proposition 5.1, we need some preparation. Recall that \(\mathcal B (n,t)\) denotes the binomial law with parameters \(n\) and \(t\), and that, for any function \(h :\{0,1,\ldots ,n\} \rightarrow \mathbb{R }\), \(\mathcal B (n,t)(h)=\sum _{k=0}^n h(k)C_n^kt^{k}(1-t)^{n-k}\).

Lemma 5.3

Let \(n\in \mathbb{N }^*\) and \(t\in [0,1]\). For any function \(h :\{0,1,\ldots ,n\}\rightarrow \mathbb{R }\) it holds

$$\begin{aligned} \frac{\partial }{\partial t} \mathcal B (n,t) (h) \!=\! \sum _{k=0}^n \left[ (h(k\!+\!1)\!-\!h(k))(n-k)+(h(k)-h(k-1))k\right] \,C_n^kt^k(1-t)^{n-k}, \end{aligned}$$

with the convention that \(h(-1)=h(n+1)=0.\)

Proof of Lemma 5.3

By differentiating in \(t\), we have

$$\begin{aligned} \frac{\partial }{\partial t} \mathcal B (n,t)(h) = \sum _{k=0}^n h(k)kC_n^kt^{k-1}(1-t)^{n-k} - \sum _{k=0}^n h(k)(n-k)C_n^kt^{k}(1-t)^{n-k-1}. \end{aligned}$$

Now, using that \(1=t+(1-t)\) and that \(kC_n^k= (n-k+1) C_n^{k-1}\), we get

$$\begin{aligned} k C_{n}^{k} t^{k-1}(1-t)^{n-k} = kC_{n}^{k} t^{k}(1-t)^{n-k} + (n-k+1)C_{n}^{k-1} t^{k-1}(1-t)^{n-k+1}, \end{aligned}$$

with the convention that \(C_{n}^{-1}=0\). Similarly, using that \((n-k)C_n^{k}= (k+1) C_n^{k+1}\), we have

$$\begin{aligned} (n\!-\!k) C_{n}^{k} t^{k}(1\!-\!t)^{n-k-1} \!=\! (n-k) C_{n}^{k} t^{k}(1-t)^{n-k}+(k+1)C_{n}^{k+1} t^{k+1}(1-t)^{n-k-1}. \end{aligned}$$

Hence,

$$\begin{aligned} \frac{\partial }{\partial t}\mathcal B (n,t)(h)&= \sum _{k=0}^n h(k) (n-k+1)C_{n}^{k-1} t^{k-1}(1-t)^{n-k+1}\\&\quad -\sum _{k=0}^n h(k)(n-k)C_{n}^{k} t^{k}(1-t)^{n-k} +\sum _{k=0}^n h(k)kC_{n}^{k} t^{k}(1-t)^{n-k}\\&\quad -\sum _{k=0}^n h(k)(k+1)C_{n}^{k+1}t^{k+1}(1-t)^{n-k-1}\\&= \sum _{k=0}^{n} (h(k+1)-h(k))(n-k)C_n^kt^k(1-t)^{n-k} \\&\quad + \sum _{k=0}^{n} (h(k)-h(k-1))kC_n^kt^k(1-t)^{n-k}, \end{aligned}$$

with the convention that \(h(-1)=h(n+1)=0\). \(\square \)
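A small numerical check of Lemma 5.3 (our own sketch) compares the stated expression with a central finite difference of \(t\mapsto \mathcal B (n,t)(h)\):

```python
# Lemma 5.3: derivative formula versus finite differences.
from math import comb
import numpy as np

rng = np.random.default_rng(3)
n, t = 7, 0.37
h = rng.random(n + 1)

def B(s):
    return sum(h[k] * comb(n, k) * s**k * (1 - s)**(n - k) for k in range(n + 1))

def derivative_formula(s):
    # the conventions h(-1) = h(n+1) = 0 are harmless: their weights vanish
    tot = 0.0
    for k in range(n + 1):
        up = (h[k + 1] - h[k]) * (n - k) if k < n else 0.0
        down = (h[k] - h[k - 1]) * k if k > 0 else 0.0
        tot += (up + down) * comb(n, k) * s**k * (1 - s)**(n - k)
    return tot

eps = 1e-6
assert abs((B(t + eps) - B(t - eps)) / (2 * eps) - derivative_formula(t)) < 1e-6
```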

We were informed by Hillion [19] that the above elementary lemma also appears in his thesis. We are now in a position to prove Proposition 5.1.

Proof of Proposition 5.1

Set \(n=d(x,y)\) and let \(\Gamma \) be a random variable uniformly distributed on \(\Gamma (x,y)\) and \(N_t\) be a random variable with Binomial law \(\mathcal B (n,t)\) independent of \(\Gamma \). By definition \(\nu _t^{x,y}\) is the law of \(X_t = \Gamma _{N_t}.\) Using the independence, we have

$$\begin{aligned} \nu _t^{x,y}(f)= \mathbb{E }\left[ f(X_t)\right] = \sum _{k=0}^n h(k) C^k_nt^k(1-t)^{n-k}, \end{aligned}$$

with \(h(k)=\mathbb{E }[f(\Gamma _k)]\), \(k=0,1\dots ,n\). According to Lemma 5.3, we thus get

$$\begin{aligned} \frac{\partial }{\partial t}\nu _t^{x,y}(f)&= \sum _{k=0}^n \left[ (h(k+1)-h(k))(n-k)+(h(k)-h(k-1))k\right] \,C_n^kt^k(1-t)^{n-k}\\&= \mathbb{E }\left[ (h(N_t+1)-h(N_t))(n-N_t)+(h(N_t)-h(N_t-1))N_t \right] \\&= \mathbb{E }\left[ (f(\Gamma _{N_t+1})-f(\Gamma _{N_t}))d(\Gamma _{N_t},y)+(f(\Gamma _{N_t})-f(\Gamma _{N_t-1}))d(x,\Gamma _{N_t}) \right] \\&= \mathbb{E }\left[ (f(\Gamma ^+(X_t))-f(X_t))d(X_t,y)+(f(X_t)-f(\Gamma ^-(X_t)))d(x,X_t) \right] \\&= \mathbb{E }\left[ d(x,y)\nabla _\Gamma f(X_t) \right] \!. \end{aligned}$$

Finally, observe that the law of \(\Gamma \) knowing \(X_t=z\in [\![x,y]\!]\) is uniform on \(\Gamma (x,z,y).\) Indeed,

$$\begin{aligned} \mathbb{P }(\Gamma =\gamma ,\ X_t=z)= \mathbb{P }(\Gamma = \gamma ,\ \gamma _{N_t}=z)&= \mathbb{P }(\Gamma = \gamma ,\ N_t=d(x,z),\ z\in \gamma ) \\&= \frac{{1\!\!1}_{\Gamma (x,z,y)}(\gamma )}{|\Gamma (x,y)|}\mathbb{P }(N_t=d(x,z)). \end{aligned}$$

On the other hand,

$$\begin{aligned} \mathbb{P }(X_t=z)=\nu _t^{x,y}(z)=\mathbb{P }(N_t=d(x,z)) \frac{|\Gamma (x,z,y)|}{|\Gamma (x,y)|}, \end{aligned}$$

which proves the claim. By the definition of \(\nabla _{x,y}f\), it thus follows that

$$\begin{aligned} \frac{\partial }{\partial t}\nu _t^{x,y}(f) = d(x,y)\,\nu _t^{x,y} (\nabla _{x,y}f), \end{aligned}$$

which completes the proof. \(\square \)
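Proposition 5.1 can also be illustrated on the simplest non-trivial example, the path graph \(0-1-2\) with \(x=0\), \(y=2\) and a single geodesic; the minimal sketch below (ours) checks the identity by finite differences.

```python
# Proposition 5.1 on the path 0 - 1 - 2 with x = 0, y = 2, d(x,y) = 2.
import numpy as np

rng = np.random.default_rng(7)
f = rng.random(3)
t = 0.3

def nu(s):
    # law of Gamma_{N_s}, N_s ~ Binomial(2, s), along the geodesic (0, 1, 2)
    return np.array([(1 - s)**2, 2 * s * (1 - s), s**2])

grad = np.array([
    f[1] - f[0],                                # at x: only the forward gradient
    0.5 * (f[2] - f[1]) + 0.5 * (f[1] - f[0]),  # interior: convex combination
    f[2] - f[1],                                # at y: only the backward gradient
])
eps = 1e-6
deriv = (nu(t + eps) @ f - nu(t - eps) @ f) / (2 * eps)
assert abs(deriv - 2.0 * (nu(t) @ grad)) < 1e-6
```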

Proof of Corollary 5.2

For simplicity, let \(F=\log (\nu _0/\mu )\). We observe that, since \(\sum _{z \in V} \frac{\partial }{\partial t} \nu _t^\pi (z)=0\), by Proposition 5.1 (recall that \(\nu _0^\pi =\nu _0\) and \(\nu _0^{x,y}=\delta _x\) by construction),

$$\begin{aligned} \frac{\partial }{\partial t} H(\nu _t^\pi | \mu )_{|_{t=0}}&= \frac{\partial }{\partial t} \left( \sum _{z \in V} \nu _t^\pi (z) \log \frac{\nu _t^\pi (z)}{\mu (z)} \right) _{|_{t=0}} = \frac{\partial }{\partial t} \nu _t^\pi (F)_{|_{t=0}} \\&= \sum _{(x,y) \in V^2} \pi (x,y) \frac{\partial }{\partial t} \nu _t^{x,y}( F ) \\&= \sum _{(x,y) \in V^2} \pi (x,y) d(x,y) \nabla _{x,y} F (x). \end{aligned}$$

By the definition of the gradient, for any \(\gamma \in \Gamma (x,y)\), it holds \(\nabla _\gamma F(x) = \nabla _\gamma ^+ F(x)\). Thus, by the definition of \(\nabla _{x,y}F\), we get

$$\begin{aligned} \frac{\partial }{\partial t} H(\nu _t^\pi | \mu )_{|_{t=0}} = \sum _{(x,y) \in V^2} \frac{\pi (x,y) d(x,y)}{|\Gamma (x,y)|} \sum _{\gamma \in \Gamma (x,y)} \nabla _\gamma ^+ F (x). \end{aligned}$$

Now, observe that for \((x,y)\in V^2\) given, it holds

$$\begin{aligned} \sum _{\gamma \in \Gamma (x,y)}\nabla _\gamma ^+F(x)=\sum _{\gamma \in \Gamma (x,y)}\left( F(\gamma _+(x))-F(x)\right) =\sum _{z\sim x} (F(z)-F(x)) |\Gamma (x,z,y)|\,, \end{aligned}$$

completing the proof. \(\square \)

5.3 Symmetric HWI inequality for products of graphs

The aim of this section is to prove Proposition 1.2 and to derive a certain reinforced log-Sobolev inequality (see below for a brief justification of the name) in the discrete setting, and as a consequence, the classical log-Sobolev inequality of Gross on the (continuous) line, with the optimal constant.

Proof of Proposition 1.2

The displacement convexity inequality ensures that for all \(t \in (0,1]\),

$$\begin{aligned} H(\nu _0|\mu )\le H(\nu _1|\mu )- \frac{H(\nu _t^\pi | \mu ) - H(\nu _0|\mu )}{t} -c(1-t) (I_2^{(n)} (\pi )+\bar{I}_2^{(n)}(\pi )). \end{aligned}$$

As \(t\) goes to 0, this yields

$$\begin{aligned} H(\nu _0|\mu )\le H(\nu _1|\mu ) - \frac{\partial }{\partial t}H(\nu _t^\pi |\mu )_{|t=0}- c(I_2^{(n)} (\pi )+\bar{I}_2^{(n)}(\pi )), \end{aligned}$$

where \(\pi \in \Pi (\nu _0,\nu _1)\). According to Corollary 5.2, it holds

$$\begin{aligned} - \frac{\partial }{\partial t} H(\nu _t^\pi |\mu )_{|t=0}&= \sum _{\genfrac{}{}{0.0pt}{}{x,z \in V^n:}{z \sim x}} \left( \log \frac{\nu _0(x)}{\mu (x)} - \log \frac{\nu _0(z)}{\mu (z)} \right) \sum _{y \in V^n} d(x,y)\frac{|\Gamma (x,z,y)|}{|\Gamma (x,y)|} \pi (x,y)\\&\le \sum _{x\in V^n} \sum _{i=1}^n \left[ \sum _{z\in N_i(x)} \left( \log \frac{\nu _0(x)}{\mu (x)} - \log \frac{\nu _0(z)}{\mu (z)} \right) \right] _{+} \\&\quad \times \sum _{y \in V^n} d(x,y)\frac{|\Gamma (x,z,y)|}{|\Gamma (x,y)|} \pi (x,y). \end{aligned}$$

According to (2.6), by induction on \(n\ge 1\), we get that for all \(u,y\in V^n\),

$$\begin{aligned} |\Gamma (u,y)|=\frac{d(u,y)!}{\prod _{j=1}^n d(u_j,y_j)!} \prod _{j=1}^n |\Gamma (u_j,y_j)|. \end{aligned}$$

Applying this formula with \(u=z\in N_i(x)\) for some \(i\in \{1,\ldots ,n\}\) and \(u=x\), we get that for all \(y\) such that \(z\in [\![x,y]\!]\), it holds

$$\begin{aligned} \frac{|\Gamma (x,z,y)|}{|\Gamma (x,y)|}=\frac{|\Gamma (z,y)|}{|\Gamma (x,y)|}=\frac{d(z,y)!}{d(x,y)!} \frac{d(x_i,y_i)!}{d(z_i,y_i)!}\frac{|\Gamma (z_i,y_i)|}{|\Gamma (x_i,y_i)|}=\frac{d(x_i,y_i)}{d(x,y)}\frac{|\Gamma (z_i,y_i)|}{|\Gamma (x_i,y_i)|},\nonumber \\ \end{aligned}$$
(5.2)

using that \(x_j=z_j\) for all \(j\ne i\) and the relations \(d(x,y)=1\,+\,d(z,y)\) and \(d(x_i,y_i)=1+d(z_i,y_i).\) Therefore, when \(z\in N_i(x)\),

$$\begin{aligned} \sum _{y \in V^n} d(x,y)\frac{|\Gamma (x,z,y)|}{|\Gamma (x,y)|} \pi (x,y)&= \sum _{y \in V^n} d(x_i,y_i)\frac{|\Gamma (x_i,z_i,y_i)|}{|\Gamma (x_i,y_i)|} \pi (x,y)\\&\le \sum _{y \in V^n} d(x_i,y_i) \pi (x,y). \end{aligned}$$

Plugging this inequality into the expression for \(-\frac{\partial }{\partial t}H(\nu _t^\pi |\mu )_{|t=0}\) yields:

$$\begin{aligned} -\frac{\partial }{\partial t}H(\nu _t^\pi |\mu )_{|t=0}&\le \sum _{x\in V^n} \sum _{i=1}^n \left[ \sum _{z\in N_i(x)} \left( \log \frac{\nu _0(x)}{\mu (x)} - \log \frac{\nu _0(z)}{\mu (z)} \right) \right] _{+} \sum _{y \in V^n} d(x_i,y_i)\pi (x,y)\\&\le \sum _{x\in V^n} \sum _{i=1}^n\left[ \sum _{z\in N_i(x)} \left( \log \frac{\nu _0(x)}{\mu (x)} - \log \frac{\nu _0(z)}{\mu (z)} \right) \right] _{+} \sum _{y \in V^n} d(x_i,y_i)\frac{\pi (x,y)}{\nu _0(x)}\,\nu _0(x)\\&\le \sqrt{\sum _{x\in V^n} \sum _{i=1}^n \left[ \sum _{z\in N_i(x)} \left( \log \frac{\nu _0(x)}{\mu (x)} - \log \frac{\nu _0(z)}{\mu (z)} \right) \right] _{+}^2\nu _0(x)}\sqrt{I_2^{(n)}(\pi )}, \end{aligned}$$

where the last line follows from the Cauchy-Schwarz inequality. This completes the proof. \(\square \)

We proceed with the announced reinforced log-Sobolev inequality and its consequences.

Choose \(\nu _1=\mu \) in (1.10) and set \(f=\nu _0/\mu \). Then, using the elementary inequality \(\sqrt{ab}\le a/(2\varepsilon ) + \varepsilon b/2\), valid for all \(a,b\ge 0\) and \(\varepsilon >0\), we immediately get the following corollary.

Corollary 5.4

(Reinforced log-Sobolev) Under the same assumptions as in Proposition 1.2, for all \(f :V^n \rightarrow (0,\infty )\) with \(\mu (f)=1\) and all \(0<\varepsilon \le 2c\), it holds that

$$\begin{aligned} \mathrm{Ent }_\mu (f)&\le \frac{1}{2\varepsilon } \sum _{x\in V^n} \sum _{i=1}^n \left[ \sum _{z\in N_i(x)} \left( \log f(x) - \log f(z) \right) \right] _{+}^2 f(x)\mu (x)\nonumber \\&\quad -\left( c-\frac{\varepsilon }{2}\right) \widetilde{\mathcal{T }}_2(\mu |f\mu )-c\, \widetilde{\mathcal{T }}_2(f\mu |\mu ). \end{aligned}$$
(5.3)

Inequality (5.3) can be seen as a reinforcement of a (discrete) modified log-Sobolev inequality. The next corollary deals with the special case of the discrete cube.

Corollary 5.5

(Reinforced log-Sobolev on \(\Omega _n\) and Gross’ Inequality) Let \(\mu \) be a non-trivial Bernoulli measure on \(\{0,1\}\). Then, for any \(n\) and any \(f :\Omega _n \rightarrow (0,\infty )\) with \(\mu ^{\otimes n}(f)=1\), it holds

$$\begin{aligned} \mathrm{Ent }_{\mu ^{\otimes n}}(f)\le \frac{1}{2} \sum _{x\in \Omega _n} \sum _{i=1}^n \left[ \log f(x) - \log f(\sigma _i(x)) \right] _{+}^2f(x)\mu ^{\otimes n}(x) - \frac{1}{2} \widetilde{\mathcal{T }}_2(f\mu ^{\otimes n}|\mu ^{\otimes n}), \nonumber \\ \end{aligned}$$
(5.4)

where \(\sigma _i(x)=(x_1,\dots ,x_{i-1},1-x_i,x_{i+1},\dots ,x_n)\) is the neighbor of \(x=(x_1, \dots , x_n)\) whose \(i\)-th coordinate differs from that of \(x\).

As a consequence, for any \(n\) and any \(g :\mathbb R ^n \rightarrow \mathbb R \) smooth enough, it holds

$$\begin{aligned} \mathrm{Ent }_{\gamma _n}(e^g)\le \frac{1}{2} \int |\nabla g|^2 e^g d\gamma _n \end{aligned}$$
(5.5)

where \(\gamma _n\) is the standard Gaussian measure on \(\mathbb R ^n\), and \(|\nabla g|\) is the length of the gradient of \(g\).

Remark 5.6

Note that the constant \(1/2\) in the above log-Sobolev inequality for the standard Gaussian is optimal, see e.g. [1, Chapter 1].

Proof of Corollary 5.5

By Corollary 4.4 and Proposition 1.2, Inequality (5.3) holds with \(c=1/2\). Observe that \(N_i(x)=\{\sigma _i(x)\}\), where \(\sigma _i(x)=(x_1,\dots ,x_{i-1},1-x_i,x_{i+1},\dots ,x_n)\) is the neighbor of \(x=(x_1,\dots ,x_n)\) whose \(i\)-th coordinate differs from that of \(x\). Taking \(\varepsilon =1\) in Corollary 5.4 gives

$$\begin{aligned} \mathrm{Ent }_{\mu ^{\otimes n}}(f)\le \frac{1}{2} \sum _{x\in \Omega _n} \sum _{i=1}^n \left[ \log f(x) - \log f(\sigma _i(x)) \right] _{+}^2f(x)\mu ^{\otimes n}(x) - \frac{1}{2} \widetilde{\mathcal{T }}_2(f\mu ^{\otimes n}|\mu ^{\otimes n}), \end{aligned}$$

which is the first part of the corollary.

For the second part, we shall apply the Central Limit Theorem. Our starting point is the following modified log-Sobolev inequality on the hypercube:

$$\begin{aligned} \mathrm{Ent }_{\mu ^{\otimes n}}(f)\le \frac{1}{2} \sum _{x\in \Omega _n} \sum _{i=1}^n \left[ \log f(x) - \log f(\sigma _i(x)) \right] _{+}^2f(x)\mu ^{\otimes n}(x) \end{aligned}$$
(5.6)

that holds for all product probability measures on the hypercube \(\Omega _n=\{0,1\}^n\) and all dimensions \(n\ge 1\); it follows from (5.4) simply by dropping the nonnegative transport term.

First we observe that, by the tensorization property of the log-Sobolev inequality (see e.g. [1, Chapter 1]), we only need to prove Gross’ Inequality (5.5) in dimension one (\(n=1\)). Then, thanks to a result by Miclo [37], we know that extremal functions in the log-Sobolev inequality, in dimension one, are monotone. Hence, we can assume that \(g\) is non-decreasing (the non-increasing case can be treated similarly). Furthermore, for convenience, we first assume that the function \(g :\mathbb R \rightarrow \mathbb R \) is smooth and bounded.

Let \(\mu _p\) be the Bernoulli probability measure with parameter \(p\in (0,1)\). We apply (5.6) to the function \(f=e^{G_n}\), with

$$\begin{aligned} G_n(x)={ g \left( \frac{\sum _{i=1}^n x_i - np}{\sqrt{ np(1-p)}} \right) }, \qquad x \in \Omega _n, \end{aligned}$$

so that \(\mathrm{Ent }_{\mu _p^{\otimes n}}\left( e^{G_n}\right) \) tends to \(\mathrm{Ent }_{\gamma _1}(e^g)\) by the Central Limit Theorem. It remains to identify the limit, when \(n\) tends to infinity, of the Dirichlet form [in the right-hand side of (5.6)]. Let \(\bar{x}^iy_i\) denote the vector \((x_1,\ldots ,x_{i-1},y_i,x_{i+1},\ldots ,x_{n})\). Then,

$$\begin{aligned} \sum _{x_i\in \{0,1\}} [G_n(x)-G_n(\sigma _i(x))]_+^2 e^{G_n(x)}\,\mu _p(x_i)&= p [G_n(\bar{x}^i1)-G_n(\bar{x}^i0)]_+^2 e^{G_n(\bar{x}^i1)}\\&\quad +\, (1-p) [G_n(\bar{x}^i0)-G_n(\bar{x}^i1)]_+^2 e^{G_n(\bar{x}^i0)}. \end{aligned}$$

Now, since

$$\begin{aligned} \frac{\sum _{i=1}^n x_i - np}{\sqrt{ np(1-p)}} - \frac{\sum _{j\ne i} x_j- (n-1)p}{\sqrt{(n-1)p(1-p)}}&= \frac{x_i}{\sqrt{ np(1-p)}}+\frac{1}{\sqrt{ p(1-p)}}\sum _{j\ne i} x_j\left( \frac{1}{\sqrt{n}}-\frac{1}{\sqrt{n-1}}\right) \\&\quad -\,\frac{p}{\sqrt{ p(1-p)}}\left( \sqrt{n}-\sqrt{n-1}\right) \\&= \frac{x_i}{\sqrt{ np(1-p)}}-\frac{\sum _{j\ne i} x_j}{ \sqrt{ p(1-p)} \left( {\sqrt{n}}+{\sqrt{n-1}}\right) {\sqrt{n}}{\sqrt{n-1}}} \\&\quad -\,\frac{p}{\sqrt{ p(1-p)}\left( \sqrt{n}+\sqrt{n-1}\right) } =O\left( \frac{1}{\sqrt{n}}\right) , \end{aligned}$$

by a Taylor Expansion, we have

$$\begin{aligned} G_n(\bar{x}^i1)-G_n(\bar{x}^i0)=\frac{1}{\sqrt{ np(1-p)}}\,g'\left( \frac{\sum _{j\ne i} x_j-p ({n-1})}{\sqrt{(n-1)p(1-p)}}\right) +O\left( \frac{1}{n} \right) . \end{aligned}$$

Setting \(\displaystyle y_i(x)=\frac{\sum _{j\ne i} x_j-p( {n-1})}{\sqrt{(n-1)p(1-p)}}\), it follows that

$$\begin{aligned} \sum _{x_i\in \{0,1\}}[G_n(x)-G_n(\sigma _i(x))]_+^2 e^{G_n(x)}\,\mu _p(x_i) = \frac{ g'\left( y_i(x)\right) ^2 e^{g(y_i(x))}}{n(1-p)} + O\left( \frac{1}{n^{3/2}}\right) . \end{aligned}$$

Now, since all \(y_i(x)\)’s have the same law under \(\mu _p^{\otimes n}\), it follows that

$$\begin{aligned} \sum _{x\in \Omega _n} \sum _{i=1}^n[G_n(x)-G_n(\sigma _i(x))]_+^2 e^{G_n(x)}\,\mu _p^{\otimes n}(x)&= \!\! \sum _{x\in \Omega _n} \! \frac{g'\left( y_1(x) \right) ^2e^{g(y_1(x))}}{1-p} \mu _p^{ \otimes n}(x) \\&\quad +\, O\left( \frac{1}{\sqrt{n}}\right) . \end{aligned}$$

The desired result follows by the Central Limit Theorem, by letting \(p\) go to \(0\) (which optimizes the factor \(1/(1-p)\) over \(p\in (0,1)\)), and finally by a standard density argument. This ends the proof. \(\square \)

5.4 The complete graph

Combining the differentiation property (5.1) with the displacement convexity on the complete graph of Proposition 4.1, we prove the following result.

Proposition 5.7

(HWI type inequality on the complete graph) Let \(\mu \equiv 1/n\) be the uniform measure on the complete graph \(K_n\). Then, for any \(f :V(K_n) \rightarrow (0,\infty )\) with \(\int fd\mu =1\), it holds

$$\begin{aligned} \mathrm{Ent }_\mu (f) \le \mathcal E _\mu (f,\log f) - \frac{1}{2} \left( \widetilde{\mathcal{T }}_2 (\mu |f \mu ) + \widetilde{\mathcal{T }}_2 (f \mu |\mu ) \right) \!, \end{aligned}$$

where

$$\begin{aligned} \mathcal E _\mu (f,\log f) := \frac{1}{2} \sum _{x,y \in K_n} (f(y)-f(x))(\log f(y) - \log f(x))\mu (x)\mu (y) \end{aligned}$$

corresponds to the Dirichlet form associated to the Markov chain on \(K_n\) that jumps uniformly at random from any vertex to any vertex (i.e. with transition probabilities \(K(x,y)=\mu (y)=1/n\), for any \(x\), \(y \in V(K_n)\)).

Proof

We follow the same line of proof as in Proposition 1.2. Fix \(f :V(K_n) \rightarrow (0,\infty )\) with \(\int fd\mu =1\). By Proposition 4.1, applied to \(\nu _1 = \mu \) (which implies that \(H(\nu _1|\mu )=0\)) and \(\nu _0= f\mu \), we have

$$\begin{aligned} H(\nu _t | \mu ) \le (1-t) H(\nu _0|\mu ) - \frac{t(1-t)}{2} \left( \widetilde{\mathcal{T }}_2 (\nu _1|\nu _0) + \widetilde{\mathcal{T }}_2 (\nu _0|\nu _1) \right) \!, \end{aligned}$$

where \(\nu _t=(1-t)\nu _0 + t\nu _1\). Hence, as \(t\) goes to 0, we get

$$\begin{aligned} \int f \log f d\mu =H(\nu _0|\mu ) \le -\frac{\partial }{\partial t} H(\nu _t | \mu )_{|_{t=0}} - \frac{1}{2} \left( \widetilde{\mathcal{T }}_2 (\nu _1|\nu _0) + \widetilde{\mathcal{T }}_2 (\nu _0|\nu _1) \right) \!. \end{aligned}$$

The expected result follows from (5.1). \(\square \)

In the case of the two-point space, one can deal with any Bernoulli measure (not only the uniform one as in the case of the complete graph).

Proposition 5.8

(HWI for the two-point space) Let \(\mu \) be a Bernoulli-\(p\), \(p \in (0,1)\), measure on the two-point space \(\Omega _1=\{0,1\}\). Then, for any \(f :\Omega _1 \rightarrow (0,\infty )\) with \(\mu (f)=1\), it holds

$$\begin{aligned} \mathrm{Ent }_\mu (f) \le \mathcal E _\mu (f,\log f) - \frac{1}{2} \max \left\{ \widetilde{\mathcal{T }}_2 (\mu |f \mu ) + \widetilde{\mathcal{T }}_2 (f \mu |\mu ), \Vert f\mu -\mu \Vert _{TV}^2\right\} \end{aligned}$$

where,

$$\begin{aligned} \mathcal E _\mu (f,\log f) = p(1-p)(f(1)-f(0))(\log f(1) - \log f(0)). \end{aligned}$$

See Remark 4.3 for a comparison between \(\widetilde{\mathcal{T }}_2 (\mu |f \mu ) \,+\, \widetilde{\mathcal{T }}_2 (f \mu |\mu )\) and \(\Vert f\mu -\mu \Vert _{TV}^2\).

Proof

Reasoning as above, Proposition 4.1, applied to \(\nu _1 = \mu \) and \(\nu _0= f\mu \), implies

$$\begin{aligned} \mathrm{Ent }_\mu (f) \le - \frac{\partial }{\partial t} H(\nu _t | \mu )_{|_{t=0}} - \frac{1}{2}\max \left\{ \widetilde{\mathcal{T }}_2 (\mu |f \mu ) + \widetilde{\mathcal{T }}_2 (f \mu |\mu ), \Vert f\mu -\mu \Vert _{TV}^2\right\} , \end{aligned}$$

where \(\nu _t = (1-t)f\mu + t \mu \). Set \(q=1-p\). Since \(H(\nu _t |\mu ) = [(1-t)f(0)q + t q] \log [(1-t)f(0)+t] + [(1-t)f(1)p + t p] \log [(1-t)f(1)+t]\), it immediately follows that

$$\begin{aligned} \frac{\partial }{\partial t} H(\nu _t | \mu )_{|_{t=0}}&= q(1-f(0))\log f(0) \!+\! q(1\!-\!f(0)) \!+\! p(1\!-\!f(1))\log f(1) \!+\! p(1\!-\!f(1)) \\&= q(1-f(0))\log f(0) + p(1-f(1))\log f(1) \end{aligned}$$

where the second equality is a consequence of the fact that \(p+q=1=\mu (f)=qf(0)+pf(1)\). Using again that \(1=qf(0)+pf(1)\), we observe that

$$\begin{aligned} q(1-f(0))\log f(0) = pq(f(1) - f(0))\log f(0) \end{aligned}$$

and

$$\begin{aligned} p(1-f(1))\log f(1) = -pq(f(1)-f(0)) \log f(1), \end{aligned}$$

from which the expected result follows. \(\square \)
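Proposition 5.8 lends itself to a brute-force numerical test; the sketch below (ours, with our own parametrization of densities with \(\mu (f)=1\)) runs it over random Bernoulli measures.

```python
# Brute-force test of Proposition 5.8 on the two-point space.
import numpy as np

rng = np.random.default_rng(4)
for _ in range(500):
    p = rng.uniform(0.05, 0.95); q = 1.0 - p
    f1 = rng.uniform(0.1, 1.0 / p)             # f0 is then forced by mu(f) = 1
    f0 = (1.0 - p * f1) / q
    if f0 <= 1e-3:
        continue
    mu, f = np.array([q, p]), np.array([f0, f1])
    Ent = np.sum(f * np.log(f) * mu)
    E = p * q * (f1 - f0) * (np.log(f1) - np.log(f0))
    T = (np.sum(np.maximum(1 - 1 / f, 0)**2 * f * mu)   # ~T_2(mu | f mu)
         + np.sum(np.maximum(1 - f, 0)**2 * mu))        # ~T_2(f mu | mu)
    TV = np.sum(np.abs(f - 1.0) * mu)
    assert Ent <= E - 0.5 * max(T, TV**2) + 1e-12
```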

6 Prékopa-Leindler type inequality

In this section we show by a duality argument that the displacement convexity property implies a discrete version of the Prékopa-Leindler inequality. (This argument goes back to Lehec [27], in the context of Brascamp-Lieb inequalities.) Then we show that this Prékopa-Leindler inequality allows one to recover the discrete modified log-Sobolev inequality (5.6) and a weak version of the transport-entropy inequality of Remark 4.6.

Let us first recall the statement of the usual Prékopa-Leindler inequality.

Theorem 6.1

(Prékopa-Leindler [28, 43, 44]) Let \(n\in \mathbb{N }^*\) and \(t\in [0,1]\). For all triples \((f,g,h)\) of measurable functions on \(\mathbb{R }^n\) such that

$$\begin{aligned} h((1-t)x+ty)\ge (1-t)f(x)+tg(y),\qquad \forall x,y\in \mathbb{R }^n, \end{aligned}$$

it holds

$$\begin{aligned} \int e^{h(z)}\,dz \ge \left( \int e^{f(x)}\,dx\right) ^{1-t}\left( \int e^{g(y)}\,dy\right) ^t. \end{aligned}$$

Using the identity (with \(\Vert \cdot \Vert \) denoting the Euclidean norm),

$$\begin{aligned} \frac{1}{2}\Vert (1-t)x+ty\Vert _2^2=(1-t)\frac{\Vert x\Vert _2^2}{2}+t\frac{\Vert y\Vert _2^2}{2}-t(1-t)\frac{\Vert x-y\Vert _2^2}{2}, \qquad x,y \in \mathbb R ^n, \end{aligned}$$

one can equivalently recast the preceding result as an inequality for the Gaussian distribution.

Theorem 6.2

(Prékopa-Leindler: the Gaussian case) Let \(\gamma _n\) be the standard normal distribution on \(\mathbb{R }^n\) and \(t\in [0,1]\). For all triples \((f,g,h)\) of measurable functions on \(\mathbb{R }^n\) such that

$$\begin{aligned} h((1-t)x+ty)\ge (1-t)f(x)+tg(y)-\frac{t(1-t)}{2}\Vert x-y\Vert _2^2,\qquad \forall x,y\in \mathbb{R }^n, \nonumber \\ \end{aligned}$$
(6.1)

it holds that

$$\begin{aligned} \int e^{h(z)}\,\gamma _n(dz) \ge \left( \int e^{f(x)}\,\gamma _n(dx)\right) ^{1-t}\left( \int e^{g(y)}\,\gamma _n(dy)\right) ^t. \end{aligned}$$

The next result shows that a discrete Prékopa-Leindler inequality can be derived from the displacement convexity property of the relative entropy.

Theorem 6.3

(Prékopa-Leindler (discrete version)) Let \(n\in \mathbb{N }^*\), \(t\in [0,1]\) and \(\mu \in \mathcal P (V^n)\). Suppose that \(\mu \) verifies the following property: for any \(\nu _0, \nu _1 \in \mathcal P (V^n)\), there exists a coupling \(\pi \in \Pi (\nu _0,\nu _1)\) such that

$$\begin{aligned} H(\nu _t^\pi |\mu ) \le (1-t) H(\nu _0|\mu ) + t H(\nu _1|\mu ) - ct(1-t) I_2^{(n)}(\pi ). \end{aligned}$$
(6.2)

If \((f,g,h)\) is a triple of functions on \(V^n\) such that, for all \(x\in V^n\) and all \(m\in \mathcal P (V^n)\),

$$\begin{aligned} \int \int h(z) \,\nu _t^{x,y}(dz) m(dy)&\ge (1-t)f(x) + t \int g (y)\,m(dy)-ct(1-t) \nonumber \\&\quad \times \sum _{i=1}^n \left( \int d(x_i,y_i)\,m(dy)\right) ^2, \end{aligned}$$
(6.3)

then it holds

$$\begin{aligned} \int e^{h(z)}\,\mu (dz) \ge \left( \int e^{f(x)}\,\mu (dx) \right) ^{1-t} \left( \int e^{g(y)} \,\mu (dy) \right) ^t. \end{aligned}$$

Proof

Let \(n \in \mathbb N ^*\), \(f, g, h : V^n \rightarrow \mathbb{R }\), \(\mu \in \mathcal P (V^n)\), \(t \in [0,1]\) and \(c \in (0, \infty )\) satisfy the hypotheses of the theorem. Given \(\nu _0, \nu _1 \in \mathcal P (V^n)\), let \(\pi \) be such that (6.2) holds and let \(p\) be such that \(\pi (x,y)=\nu _0(x)p(x,y)\), \(x,y \in V^n\).

Then, integrate (6.3) in the variable \(x\) with respect to \(\nu _0\), with \(m(y)=p(x,y)\), so that (recalling (2.3))

$$\begin{aligned} \int h \,d\nu _t^{\pi } \ge (1-t) \int f\,d\nu _0 + t \int g\,d\nu _1 -ct(1-t) I_2^{(n)}(\pi ). \end{aligned}$$

Together with (6.2), we end up with

$$\begin{aligned} \int h \,d\nu _t^{\pi } - H(\nu _t^\pi |\mu ) \ge (1-t) \left( \int f d\nu _0 - H(\nu _0|\mu ) \right) + t \left( \int g\,d\nu _1 - H(\nu _1|\mu )\right) . \end{aligned}$$

The result follows by optimization, since by duality, for any \(\alpha :V^n \rightarrow \mathbb{R }\),

$$\begin{aligned} \sup _{m \in \mathcal P (V^n)} \left\{ \int \alpha \,dm - H(m|\mu ) \right\} = \log \int e^\alpha \,d\mu . \end{aligned}$$

This ends the proof. \(\square \)
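The duality step is classical (Donsker-Varadhan) and easy to confirm numerically: the supremum is attained at the Gibbs measure \(e^\alpha \mu /Z\). The short script below is our own illustrative sketch.

```python
# Duality: sup_m { m(alpha) - H(m|mu) } = log mu(e^alpha).
import numpy as np

rng = np.random.default_rng(5)
N = 6
mu = rng.random(N); mu /= mu.sum()
alpha = rng.normal(size=N)
Z = float(np.sum(np.exp(alpha) * mu))
mstar = np.exp(alpha) * mu / Z                 # the Gibbs maximizer

def objective(m):
    return float(np.sum(alpha * m) - np.sum(m * np.log(m / mu)))

assert abs(objective(mstar) - np.log(Z)) < 1e-12
for _ in range(200):                           # random m never does better
    m = rng.random(N); m /= m.sum()
    assert objective(m) <= np.log(Z) + 1e-12
```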

An immediate corollary is a Prékopa-Leindler inequality on the discrete hypercube.

Corollary 6.4

Let \(\mu \) be a probability measure on \(\{0,1\}\), \(n\in \mathbb{N }^*\) and \(t\in [0,1]\). For every triple \((f,g,h)\) satisfying (6.3) with \(c=1/2\), it holds

$$\begin{aligned} \int e^{h(z)}\,\mu ^{\otimes n}(dz) \ge \left( \int e^{f(x)}\,\mu ^{\otimes n}(dx) \right) ^{1-t} \left( \int e^{g(y)} \,\mu ^{\otimes n}(dy) \right) ^t. \end{aligned}$$

It is well known that Talagrand’s transport-entropy inequality and the logarithmic Sobolev inequality for the Gaussian measure are both consequences of the Prékopa-Leindler inequality of Theorem 6.2 [4]. Similarly, the discrete version of the Prékopa-Leindler inequality implies the modified logarithmic Sobolev inequality of Corollary 5.4 and the transport-entropy inequality, associated with the weak transport cost \(\widetilde{\mathcal{T }}_2\), of Remark 4.6.

Corollary 6.5

Assume that the following Prékopa-Leindler inequality holds: for all \(t\in (0,1)\) and all triples of functions \((f,g,h)\) on \(V^n\) such that, for all \(x\in V^n\) and all \(m\in \mathcal P (V^n)\),

$$\begin{aligned} \int \int h(z) \,\nu _t^{x,y}(dz) m(dy)&\ge (1-t)f(x) + t \int g (y)\,m(dy)-ct(1-t) \\&\quad \times \sum _{i=1}^n \left( \int d(x_i,y_i)\,m(dy)\right) ^2, \end{aligned}$$

it holds that

$$\begin{aligned} \int e^{h(z)}\,\mu (dz) \ge \left( \int e^{f(x)}\,\mu (dx) \right) ^{1-t} \left( \int e^{g(y)} \,\mu (dy) \right) ^t. \end{aligned}$$

Then one has, for all functions \(h :V^n \rightarrow \mathbb R \),

$$\begin{aligned} \mathrm{Ent }_\mu (e^h)\le \frac{1}{4c} \sum _{x\in V^n} \sum _{i=1}^n \left[ \sum _{z\in N_i(x)} \left( h(x) - h(z) \right) \right] _{+}^2e^{h(x)}\mu (x), \end{aligned}$$

and for all probability measures \(\nu \) absolutely continuous with respect to \(\mu \),

$$\begin{aligned} c\,\widetilde{\mathcal{T }}_2(\mu |\nu )\le H(\nu |\mu ), \end{aligned}$$
(6.4)
$$\begin{aligned} c\,\widetilde{\mathcal{T }}_2(\nu |\mu )\le H(\nu |\mu ). \end{aligned}$$
(6.5)

Proof

We first prove the transport-entropy inequalities (6.4) and (6.5). Let \(k\) be a function on \(V^n\) (necessarily bounded, since \(V\) is finite). We apply the discrete Prékopa-Leindler inequality with \(h=0\), \(g=-(1-t)k\) and \(f=tQk\), with \(Qk\) defined so that the condition (6.3) holds: for all \(x\in V^n\),

$$\begin{aligned} Qk(x)=\inf _{m\in \mathcal P (V^n) } \left\{ \int k ( y)\,m(dy) + c \sum _{i=1}^n \left( \int d(x_i,y_i)\,m(dy) \right) ^2 \right\} . \end{aligned}$$

Therefore, one has for all \(t\in (0,1)\),

$$\begin{aligned} \left( \int e^{tQk}d\mu \right) ^{1/t}\left( \int e^{-(1-t)k}\,d\mu \right) ^{1/(1-t)}\le 1. \end{aligned}$$

As \(t\) goes to 1, we get for all functions \(k\) on \(V^n\),

$$\begin{aligned} \int e^{Qk}d\mu \le e^{\mu (k)}, \end{aligned}$$

and this is known to be a dual form of the transport-entropy inequality (6.4) (see [17]). Similarly as \(t\) goes to 0, we get for all functions \(k\) on \(V^n\),

$$\begin{aligned} \int e^{-k}d\mu \le e^{-\mu (Qk)}, \end{aligned}$$

which is a dual form of the transport-entropy inequality (6.5).

Let us now turn to the proof of the modified discrete logarithmic Sobolev inequality. Fix a bounded function \(h:V^n\rightarrow \mathbb{R }\) and choose \(g=th\) and \(f=h+tR_th\) with \(R_t h \) designed so that condition (6.3) holds. Namely, for all \(x\in V^n\),

$$\begin{aligned} R_t h(x)=\inf _m&\left\{ \frac{1}{t(1-t)}\left( \int \int h(z)\nu _t^{x,y}(dz)\,m(dy) - (1-t) h(x) \right) \right. \\&\quad -\left. \frac{t}{1-t} \int h ( y)\,m(dy) + c \sum _{i=1}^n \left( \int d(x_i,y_i)\,m(dy) \right) ^2 \right\} \,, \end{aligned}$$

where the infimum runs over all probability measures \(m \in \mathcal P (V^n)\). Then the Prékopa-Leindler inequality reads

$$\begin{aligned} \int e^h d\mu \ge \left( \int e^h e^{t R_t h} d\mu \right) ^{1-t} \left( \int e^{th} d\mu \right) ^t, \end{aligned}$$

which can be rewritten as

$$\begin{aligned} 1 \ge \left( \int e^{t R_t h} d\mu _h \right) ^{1/t} \left( \int e^{(t-1)h} d\mu _h\right) ^{1/(1-t)}, \end{aligned}$$

with \(d\mu _h=\frac{e^h}{\int e^h\,d\mu }\,d\mu .\) Letting \(t\) go to \(0\), we easily deduce (leaving some details to the reader) that,

$$\begin{aligned} \int \Bigg (\liminf _{t \rightarrow 0} R_t h\Bigg ) e^h d\mu \le \int e^h d\mu \log \int e^h d\mu \,. \end{aligned}$$

This can equivalently be written as

$$\begin{aligned} \mathrm{Ent }_\mu (e^h) \le \int \Bigg (h - \liminf _{t \rightarrow 0} R_t h\Bigg )e^h d\mu . \end{aligned}$$

We conclude using the following claim. \(\square \)

Claim 6.6

For all \(x \in V^n\), we have

$$\begin{aligned} h(x)-\liminf _{t \rightarrow 0} R_t h(x) \le \frac{1}{4c} \sum _{i=1}^n \left[ \sum _{z\in N_i(x)} \left( h(x) - h(z) \right) \right] _+^2. \end{aligned}$$

Proof of Claim 6.6

By a Taylor expansion and by Proposition 5.1, for all \(x,y\in V^n\) ,

$$\begin{aligned} \int h(z)\nu _t^{x,y}(dz)&= \nu _t^{x,y}(h) = \nu _0^{x,y}(h)+t d(x,y) \nu _0^{x,y}\left( \nabla _{x,y} h\right) + o(t) \\&= h(x)+t d(x,y) \nabla _{x,y} h(x)+ o(t), \end{aligned}$$

with the quantity \(o(t)\) independent of \(y\) since \(h\) is bounded. Now, from the definition of the sets \(N_i(x)\), \(i\in \{1,\ldots ,n\}\) and using the identity (5.2), one has

$$\begin{aligned} \nabla _{x,y} h(x)&= \frac{1}{|\Gamma (x,y)|} \sum _{\gamma \in \Gamma (x,y)}\left( h(\gamma _+(x))-h(x)\right) = \sum _{z\in V^n,\, z\sim x} \left( h(z)-h(x)\right) \frac{|\Gamma (x,z,y)|}{|\Gamma (x,y)|}\\&= \sum _{i=1}^n \sum _{z\in N_i(x)} \left( h(z)-h(x)\right) \frac{d(x_i,y_i)|\Gamma (x_i,z_i,y_i)|}{d(x,y)|\Gamma (x_i,y_i)|}. \end{aligned}$$

Therefore

$$\begin{aligned} h(x)- R_t h(x)&= \sup _m \left\{ \int \sum _{i=1}^n \sum _{z\in N_i(x)} \left( h(x)-h(z)\right) d(x_i,y_i)\frac{|\Gamma (x_i,z_i,y_i)|}{|\Gamma (x_i,y_i)|} \,m(dy)\right. \\&-\left. c \sum _{i=1}^n \left( \int d(x_i,y_i)\,m(dy) \right) ^2 \right\} +o(1)\\&\le \sum _{i=1}^n \sup _m \left\{ \left[ \sum _{z\in N_i(x)} \left( h(x)-h(z)\right) \right] _+ \int d(x_i,y_i) m(dy)\right. \\&-\left. c\left( \int d(x_i,y_i)\,m(dy) \right) ^2 \right\} +o(1)\\&\le \sum _{i=1}^n \sup _{v\ge 0} \left\{ v \left[ \sum _{z\in N_i(x)} \left( h(x)-h(z)\right) \right] _+ - cv^2 \right\} +o(1) \\&= \frac{1}{4c}\sum _{i=1}^n \left[ \sum _{z\in N_i(x)} \left( h(x)-h(z)\right) \right] _+^2+o(1). \end{aligned}$$

The claim follows by letting \(t\) go to 0. \(\square \)