1 Introduction

Given a random variable X of law \(\mu \) taking values on a metric space \((\Omega ,d_\Omega )\) and a function \(f:\Omega \rightarrow {\mathbb {R}}\), a concentration of measure inequality quantifies the probability that the random variable f(X) deviates from its mean or its median. Since the early age of the theory, concentration inequalities have seen many new methods, refinements and exciting applications to various areas of mathematics [1,2,3]. Among the different classes of concentration inequalities, Gaussian concentration is arguably the most standard one: the measure \(\mu \) is said to be sub-Gaussian if there exist constants \(K,\kappa >0\) such that, for all \(A\subseteq \Omega \) with \(\mu (A)\ge 1/2\) we have for any \(r\ge 0\)

$$\begin{aligned} \mu \left( \{x\in \Omega :\,d_\Omega (x,A)<r\} \right) \ge 1-Ke^{-\kappa r^2}\,. \end{aligned}$$
(1)

In her seminal work [4], Marton made the beautiful observation that the above behavior can be obtained as a consequence of a transportation cost inequality: if there exists \(c>0\) such that, for any probability measure \(\nu \ll \mu \),

$$\begin{aligned} W_1(\nu ,\mu )\le \sqrt{c\,S(\nu \Vert \mu )}\,, \end{aligned}$$
(TC(c))

then (1) holds with constants \(\kappa =\frac{1}{c}\) and \(K=1\) for all \(r>\sqrt{c\ln 2}\). Here, \(S(\nu \Vert \mu )\) refers to the relative entropy between the measures \(\nu \) and \(\mu \), whereas the quantity \(W_1(\mu ,\nu )\) in (TC(c)) is the Wasserstein distance between the two measures \(\mu ,\nu \):

$$\begin{aligned} W_1(\mu ,\nu ):=\sup _{\Vert f\Vert _L\le 1}\,\big |{\mathbb {E}}_\mu [f]-{\mathbb {E}}_\nu [f]\big |\,. \end{aligned}$$

Later, Bobkov and Götze [5] proved that transportation cost inequalities are in fact equivalent to the property of sub-Gaussianity: more precisely, (TC(c)) holds if and only if for all Lipschitz functions \(f:\Omega \rightarrow {\mathbb {R}}\),

$$\begin{aligned} {\mathbb {P}}_\mu \Big (\big |f(X)-{\mathbb {E}}_\mu [f(X)]\big |>t \Big )\le 2\,e^{-\frac{t^2}{c}}\,,\forall t\ge 0\,. \end{aligned}$$

One of the main advantages of transportation cost inequalities is their tensorization property: assume that \(\mu \) satisfies \({\text {TC}}(c)\), then \(\mu ^{\otimes n}\) satisfies \({\text {TC}}(nc)\) for all \(n\in {\mathbb {N}}\), where the set \(\Omega ^n\) is provided with the metric

$$\begin{aligned} d_n(x^n,y^n):=\sum _{i=1}^n\,d_{\Omega }(x_i,y_i)\,. \end{aligned}$$

Perhaps the simplest example of that sort is given by taking \(\Omega ^n=[d]^n\) endowed with the Hamming distance \(d_H\). In the case \(n=1\), the corresponding Wasserstein distance reduces to the total variation, and \({\text {TC}}(1/2)\) holds for any measure \(\mu \), since it simply reduces to Pinsker’s inequality. For \(n\ge 1\), \(\mu ^{\otimes n}\) satisfies \({\text {TC}}(n/2)\).
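As a numerical sanity check of the two statements above, the following Python sketch (standard library only) tests Pinsker's inequality on randomly drawn distributions on \([d]\), and compares the exact two-sided tail of the Hamming weight of n fair bits, which is 1-Lipschitz for \(d_H\), with the bound \(2e^{-t^2/c}\) for \(c=n/2\). The helper names, the sample size and the choices \(d=4\), \(n=20\) are illustrative assumptions and are not taken from the references.

```python
import math
import random


def relative_entropy(nu, mu):
    """S(nu || mu) in nats, assuming supp(nu) is contained in supp(mu)."""
    return sum(p * math.log(p / q) for p, q in zip(nu, mu) if p > 0)


def total_variation(nu, mu):
    """Total variation distance, i.e. W_1 on [d] with the Hamming distance (n = 1)."""
    return 0.5 * sum(abs(p - q) for p, q in zip(nu, mu))


def random_distribution(d, rng):
    """Random full-support probability vector on [d] (illustrative sampling scheme)."""
    weights = [rng.random() + 1e-3 for _ in range(d)]
    total = sum(weights)
    return [w / total for w in weights]


def binomial_two_sided_tail(n, t):
    """Exact P(|sum_i X_i - n/2| > t) for n independent fair bits X_i."""
    favourable = sum(math.comb(n, k) for k in range(n + 1) if abs(k - n / 2) > t)
    return favourable / 2 ** n


rng = random.Random(0)
d = 4  # assumed alphabet size
pairs = ((random_distribution(d, rng), random_distribution(d, rng)) for _ in range(1000))
# TC(1/2) for n = 1: total variation <= sqrt(S / 2) (Pinsker's inequality).
pinsker_ok = all(
    total_variation(nu, mu) <= math.sqrt(0.5 * relative_entropy(nu, mu)) + 1e-12
    for nu, mu in pairs
)

n = 20  # assumed number of bits
c = n / 2  # constant of TC(n/2) for the product measure
tail_ok = all(
    binomial_two_sided_tail(n, t) <= 2 * math.exp(-t * t / c) for t in range(n + 1)
)
print(pinsker_ok, tail_ok)
```

For \(c=n/2\) the tail bound coincides with Hoeffding's inequality for sums of independent bits, so the check only illustrates the constants and is not a proof.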

While the theory of concentration inequalities for i.i.d. random variables is by now well understood, things become more challenging when the random variables are allowed to depend on each other [3, 6]. One way to extend concentration bounds to weakly dependent random variables is to assume that their joint law \(\mu \) satisfies the so-called Dobrushin uniqueness condition [6]. Dobrushin’s uniqueness condition plays an important role in the study of Gibbs measures in the one-phase region; however, it often turns out to be a very strong requirement on the measure \(\mu \). More recently, Marton gave an attempt at extending the i.i.d. theory beyond the mere Gibbs setting [7]. Her main result consists in a logarithmic Sobolev inequality for a generic measure \(\mu \)—well known to imply transportation cost inequalities—under the so-called Dobrushin–Shlosman mixing condition [8], the latter condition being weaker than Dobrushin’s uniqueness condition. As mentioned in [9], such paths to establish Gaussian concentration suffer from the difficulty of deriving explicit constants. Moreover, the result of [7] also relies on the crucial assumption that the measure \(\mu \) has full support.

Recently, concentration inequalities have attracted much attention in the communities of random matrix theory, quantum information theory and operator algebras [10,11,12,13,14,15,16,17,18]. In [14], a quantum Wasserstein distance of order 1 (or quantum \(W_1\) distance) was defined on the set of the quantum states of n qudits with the property that it reduces to the classical Wasserstein distance on \([d]^n\) for states that are diagonal in the computational basis. This quantum generalization of the Wasserstein distance is based on the notion of neighboring states. Two quantum states of n qudits are neighboring if they differ only in one qudit, i.e., if they coincide after that qudit is discarded. The quantum \(W_1\) distance is then the distance induced by the maximum norm that assigns distance at most one to every pair of neighboring states [14, Definition 4]. This norm is called the quantum \(W_1\) norm and is denoted by \(\left\| \cdot \right\| _{W_1}\). The quantum \(W_1\) norm proposed in Ref. [14] admits a dual formulation in terms of a quantum generalization of the Lipschitz constant. Denoting by \(\mathcal {O}_n\) the set of the observables of n qudits, the Lipschitz constant of the observable \(H\in \mathcal {O}_n\) is defined as [14, Sect. V]

$$\begin{aligned} \Vert H\Vert _L:&=2\max _{i\in [n]}\min \left\{ \left\| H- H^{(i)}\right\| _\infty :H^{(i)}\right. \nonumber \\&\left. \in \mathcal {O}_n\text { does not act on the}~ i\text {-th qudit}\right\} \,. \end{aligned}$$
(2)

Then, the quantum \(W_1\) distance between the states \(\rho \) and \(\omega \) can also be expressed as [14, Sect. V]

$$\begin{aligned} \left\| \rho -\omega \right\| _{W_1} =\max \left\{ \mathrm {Tr}\left[ \left( \rho -\omega \right) H\right] :\,H\in \mathcal {O}_n,\,\Vert H\Vert _L\le 1\right\} \,. \end{aligned}$$
(3)
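For intuition, the Lipschitz constant (2) can be evaluated by brute force in the classical special case. The sketch below assumes that H is diagonal in the computational basis with diagonal entries f(x), and that in this case the minimization in (2) is attained by a diagonal \(H^{(i)}\), so that (2) equals the largest change of f when a single coordinate of x is modified. The function names are hypothetical and the enumeration is only feasible for very small n.

```python
import itertools


def hamming_lipschitz_constant(f, d, n):
    """Largest change of f over [d]^n when exactly one coordinate is modified.

    Assumption: this is the value of the Lipschitz constant (2) for the
    observable that is diagonal in the computational basis with entries f(x).
    """
    best = 0.0
    for x in itertools.product(range(d), repeat=n):
        for i in range(n):
            for new_value in range(d):
                y = x[:i] + (new_value,) + x[i + 1:]
                best = max(best, float(abs(f(x) - f(y))))
    return best


def hamming_weight(x):
    """Number of nonzero entries of the string x."""
    return sum(1 for xi in x if xi != 0)


weight_constant = hamming_lipschitz_constant(hamming_weight, d=2, n=3)
scaled_constant = hamming_lipschitz_constant(lambda x: 3.0 * x[0], d=2, n=3)
print(weight_constant, scaled_constant)
```

The Hamming weight changes by at most one under a single-site modification, while the second function depends on the first site only, with oscillation 3.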

Moreover, in [14] it was shown that \({\text {TC}}(n/2)\) holds for any tensor product \(\omega =\omega _1\otimes \cdots \otimes \omega _n\) of quantum states, hence extending Marton’s original inequality with the exact same constant: for any state \(\rho \) of n qudits,

$$\begin{aligned} \left\| \rho -\omega _1\otimes \cdots \otimes \omega _n\right\| _{W_1}\le \sqrt{\frac{n}{2}\,S(\rho \Vert \omega _1\otimes \cdots \otimes \omega _n)}\,, \end{aligned}$$

where \(S(\rho \Vert \omega ):=\mathrm {Tr}[\rho \,(\ln \rho -\ln \omega )]\) denotes Umegaki’s relative entropy between the states \(\rho \) and \(\omega \).

Main results: In this paper, we prove that any of the following conditions implies a transportation cost inequality:

  (i) A non-commutative Dobrushin uniqueness condition (Sect. 3);

  (ii) A generalization of Ollivier’s coarse Ricci curvature bound (Sect. 4);

  (iii) A modified logarithmic Sobolev inequality condition (Sect. 5);

  (iv) A condition of local indistinguishability of the state (Sect. 6).

Each of these methods comes with its strengths and weaknesses:

  (i) The non-commutative Dobrushin uniqueness condition implies a nontrivial transportation cost inequality at any temperature (see Remark 1), but the scaling of the constant c with the number of subsystems is optimal only in one dimension (see Remark 2).

  (ii) The coarse Ricci curvature bound provides TC inequalities for essentially any geometry, but it is only valid above a threshold temperature that depends on the locality of the Hamiltonian. Furthermore, such threshold temperature is in practice strictly larger than the true critical temperature (see Proposition 9).

  (iii) Quantum modified logarithmic Sobolev inequalities are typically more difficult to prove than their classical counterparts and are currently only proven to hold in specific cases. However, for one-dimensional systems, a recently derived modified logarithmic Sobolev inequality [19, 20] provides us with TC (up to polylogarithmic overhead) at any positive temperature.

  (iv) The condition of local indistinguishability of the state applies to regular lattices. Although the condition can be checked for classical systems, we do not yet have a way to prove it in the quantum setting.

We conclude the article with two natural applications of our bounds. First, we derive Gaussian concentration bounds for a large class of Lipschitz observables whenever the state \(\omega \) is the Gibbs state of a commuting Hamiltonian at large enough temperature (Sect. 7). Second, we discuss the use of the transportation cost inequality in proving the equivalence between the microcanonical and the canonical ensembles and an exponential improvement over the weak Eigenstate Thermalization Hypothesis (Sect. 8).

2 Notations and Basic Definitions

Given a finite set V, we denote by \(\mathcal {H}_V=\bigotimes _{v\in V}\mathcal {H}_v\) the Hilbert space of \(n=|V|\) qudits (i.e., \(\mathcal {H}_v\equiv {\mathbb {C}}^d\) for all \(v\in V\)) and by \(\mathcal {B}_V\) the algebra of linear operators on \(\mathcal {H}_V\). \(\mathcal {O}_V\) corresponds to the space of self-adjoint linear operators on \(\mathcal {H}_V\), whereas \(\mathcal {O}^T_V\subset \mathcal {O}_V\) is the subspace of traceless self-adjoint linear operators. \(\mathcal {O}_V^+\) denotes the cone of positive semidefinite linear operators on \(\mathcal {H}_V\) and \(\mathcal {S}_V\subset \mathcal {O}_V^+\) denotes the set of quantum states. We denote by \(\mathcal {P}_V\) the set of probability measures on \([d]^V\). For any subset \(A\subseteq V\), we use the standard notations \(\mathcal {O}_A, \mathcal {S}_A\ldots \) for the corresponding objects defined on subsystem A. Given a state \(\rho \in \mathcal {S}_V\), we denote by \(\rho _A\) its marginal onto the subsystem A. For any \(X\in \mathcal {O}_V\), we denote by \(\Vert X\Vert _1\) its trace norm. The identity on \(\mathcal {O}_{v}\), \(v\in V\), is denoted by \({\mathbb {I}}_v\).

Given two states \(\rho ,\,\omega \in \mathcal {S}_V\) such that \({\text {supp}}(\rho )\subseteq {\text {supp}}(\omega )\), their quantum relative entropy is defined as [21,22,23]

$$\begin{aligned} S(\rho \Vert \omega )=\mathrm {Tr}\big [\rho \,(\ln \rho -\ln \omega ) \big ]\,. \end{aligned}$$
(4)

Whenever \(\rho =\rho _{AB}\) is a bipartite state and \(\omega =\rho _A\otimes \rho _B\), their relative entropy reduces to the mutual information

$$\begin{aligned} I(A;B)_\rho :=S(\rho _{AB}\Vert \rho _A\otimes \rho _B)\,. \end{aligned}$$
(5)

In the next sections, we also utilize the measured relative entropy [24,25,26,27]

$$\begin{aligned} S_{{\mathbb {M}}}(\rho \Vert \omega ):=\sup _{(\mathcal {X},M)}\,S(P_{\rho ,M}\Vert P_{\omega ,M})\,, \end{aligned}$$
(6)

where the supremum above is over all positive operator valued measures M that map the input quantum state to a probability distribution on a finite set \(\mathcal {X}\) with probability mass function given by \(P_{\rho ,M}(x)=\mathrm {Tr}\rho M(x)\).

In this paper, we study inequalities relating the \(W_1\) distance between two states to their relative entropy. More precisely, for a fixed state \(\omega \in \mathcal {S}_V\), we are interested in upper bounding the best constant \(C(\omega )>0\) such that, for all \(\rho \in \mathcal {S}_V\) with \({\text {supp}}(\rho )\subseteq {\text {supp}}(\omega )\),

$$\begin{aligned} \left\| \rho -\omega \right\| _{W_1}\le \,\sqrt{C(\omega )\,S(\rho \Vert \omega )}\,. \end{aligned}$$
(7)

In general, given a constant \(c\ge C(\omega )\), we refer to the above inequality for \(C(\omega )\) replaced by c as a transportation cost inequality, denoted by \({\text {TC}}(c)\). As mentioned in the introduction, the following holds [14, Theorem 2]:

Proposition 1

For any product state \(\omega \in \mathcal {S}_V\),

$$\begin{aligned} C(\omega ) \le \frac{|V|}{2}\,. \end{aligned}$$
(8)

In the next sections, we aim at recovering the linear dependence of the constant \(C(\omega )\) on the size \(n=|V|\) of the system under various measures of independence.

We will need the following properties of the quantum \(W_1\) distance:

Proposition 2

[14, Proposition 2]. The quantum \(W_1\) distance coincides with the trace distance for quantum states that differ in only one site, i.e., for any \(X\in \mathcal {O}_V^T\) such that \({\mathrm {Tr}}_vX = 0\) for some \(v\in V\) we have

$$\begin{aligned} \left\| X\right\| _{W_1} = \frac{1}{2}\left\| X\right\| _1\,. \end{aligned}$$
(9)

Proposition 3

[14, Proposition 5]. The quantum \(W_1\) distance between two quantum states that differ only in the region \(A\subseteq V\) is at most \(2\left| A\right| \) times their trace distance, i.e., for any \(X\in \mathcal {O}_V^T\) such that \({\mathrm {Tr}}_AX=0\) we have

$$\begin{aligned} \left\| X\right\| _{W_1} \le \left| A\right| \left\| X\right\| _1\,. \end{aligned}$$
(10)

Proposition 4

(Tensorization [14, Proposition 4]). The quantum \(W_1\) distance is additive with respect to the tensor product, i.e., let \(A,\,B\) be disjoint subsets of V. Then, for any \(\rho _A,\,\sigma _A\in \mathcal {S}_A\) and any \(\rho _B,\,\sigma _B\in \mathcal {S}_B\) we have

$$\begin{aligned} \left\| \rho _A\otimes \rho _B - \sigma _A\otimes \sigma _B\right\| _{W_1} = \left\| \rho _A - \sigma _A\right\| _{W_1} + \left\| \rho _B - \sigma _B\right\| _{W_1}\,. \end{aligned}$$
(11)

Proposition 5

[14, Proposition 13]. Let \(\Phi :\mathcal {O}_V\rightarrow \mathcal {O}_V\) be a quantum channel. For any \(v\in V\), let \(A_v\subseteq V\) be the light-cone of the site v, i.e., the minimum subset of V such that \({\mathrm {Tr}}_{A_v}\Phi (X) = 0\) for any \(X\in \mathcal {O}_V\) such that \({\mathrm {Tr}}_vX=0\). Then, \(\Phi \) can expand the quantum \(W_1\) distance by at most twice the size of the largest light-cone, i.e., for any \(X\in \mathcal {O}_V^T\) we have

$$\begin{aligned} \left\| \Phi (X)\right\| _{W_1} \le 2\max _{v\in V}\left| A_v\right| \left\| X\right\| _{W_1}\,. \end{aligned}$$
(12)

Proposition 6

[14, Proposition 15]. For any \(H\in \mathcal {O}_V\) and any \(v\in V\), we have

$$\begin{aligned} \left\| H - {\mathbb {I}}_v\otimes \frac{1}{d}\,{\mathrm {Tr}}_vH\right\| _\infty \le \left\| H\right\| _L\,. \end{aligned}$$
(13)

Proposition 7

[14, Corollary 1]. For any \(\rho ,\,\sigma \in \mathcal {S}_V\),

$$\begin{aligned} \left\| \rho - \sigma \right\| _{W_1} \ge \frac{1}{2}\sum _{v\in V}\left\| \rho _v - \sigma _v\right\| _1\,, \end{aligned}$$
(14)

and equality holds whenever both \(\rho \) and \(\sigma \) are product states.

Theorem 1

(\(W_1\) continuity of the entropy [14, Theorem 1]). For any \(\rho ,\,\sigma \in \mathcal {S}_V\), we have

$$\begin{aligned} \left| S(\rho ) - S(\sigma )\right| \le g\left( \left\| \rho -\sigma \right\| _{W_1}\right) + \left\| \rho -\sigma \right\| _{W_1}\ln \left( d^2\left| V\right| \right) \,, \end{aligned}$$
(15)

where for any \(t\ge 0\)

$$\begin{aligned} g(t) = \left( t+1\right) \ln \left( t+1\right) - t\ln t\,. \end{aligned}$$
(16)
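The right-hand side of (15) is straightforward to evaluate. The following Python sketch implements g from (16) and the resulting bound. The function names are hypothetical, and g(0) is set to 0 by continuity, which is an assumption about how the formula is meant to be read at \(t=0\).

```python
import math


def g(t):
    """g(t) = (t + 1) ln(t + 1) - t ln t, extended by continuity with g(0) = 0."""
    if t < 0:
        raise ValueError("g is only defined for t >= 0")
    if t == 0:
        return 0.0
    return (t + 1) * math.log(t + 1) - t * math.log(t)


def entropy_continuity_bound(w1, d, n):
    """Right-hand side of (15) for W_1 distance w1 and n qudits of local dimension d."""
    return g(w1) + w1 * math.log(d * d * n)


# Illustrative values: 10 qubits, states at W_1 distance 1.
print(entropy_continuity_bound(1.0, d=2, n=10))
```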

3 Dobrushin Uniqueness Condition

In this section, we consider a spin chain and prove the transportation cost inequality under a quantum generalization of Dobrushin’s uniqueness condition [6]. Such condition is formulated in terms of the conditional probability distributions of the state of a subset of V conditioned on the state of a second disjoint subset of V. Therefore, formulating a quantum version of Dobrushin’s uniqueness condition requires a quantum counterpart of the conditional probability distribution. In the classical setting, given two random variables X and Y taking values in finite sets and with joint probability distribution \(\omega _{XY}\), the conditional probability distribution \(\omega _{Y|X}\) of Y given X with probability mass function

$$\begin{aligned} \omega _{Y|X=x}(y) = \frac{\omega _{XY}(x,y)}{\omega _X(x)} \end{aligned}$$
(17)

represents the knowledge that we have on Y when we know only the value of X. We can associate with such conditional distribution the stochastic map \(\Phi _{X\rightarrow XY}\) that has as input a probability distribution \(p_X\) for X and as output the joint probability distribution of XY with probability mass function given by

$$\begin{aligned} \Phi _{X\rightarrow XY}(p_X)(x,y) = \omega _{Y|X=x}(y)\,p_X(x) = \frac{\omega _{XY}(x,y)}{\omega _X(x)}\,p_X(x)\,. \end{aligned}$$
(18)
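A minimal Python sketch of the classical map (18) follows. It assumes that distributions are stored as dictionaries and that \(\omega _X\) has full support on the values of x appearing in the joint distribution; the function names and the numerical example are hypothetical.

```python
def marginal_x(joint):
    """Marginal on X of a joint pmf given as {(x, y): probability}."""
    marginal = {}
    for (x, _y), weight in joint.items():
        marginal[x] = marginal.get(x, 0.0) + weight
    return marginal


def conditional_map(omega_xy, p_x):
    """Apply (18): (x, y) -> omega_{Y|X=x}(y) * p_X(x).

    Assumes omega_X(x) > 0 for every x that appears in omega_xy.
    """
    omega_x = marginal_x(omega_xy)
    return {
        (x, y): weight / omega_x[x] * p_x.get(x, 0.0)
        for (x, y), weight in omega_xy.items()
    }


omega_xy = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}
# Feeding the marginal omega_X returns the joint distribution omega_XY.
recovered = conditional_map(omega_xy, marginal_x(omega_xy))
# A different input keeps its own X-marginal and inherits omega_{Y|X}.
pushed = conditional_map(omega_xy, {0: 0.9, 1: 0.1})
print(recovered, pushed)
```

The first call illustrates the classical counterpart of the recovery property: applying the map to \(\omega _X\) gives back \(\omega _{XY}\).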

In the quantum setting, we consider a bipartite quantum system AB and a joint quantum state \(\omega _{AB}\) of AB. The quantum counterpart of the stochastic map (18) is called quantum recovery map [28, 29], and its action on a quantum state \(\rho _A\) of A is

$$\begin{aligned} \Phi _{A\rightarrow AB}(\rho _A) = \int _{\mathbb {R}}\omega _{AB}^{\frac{1-it}{2}}\,\omega _A^\frac{it-1}{2}\,\rho _A\,\omega _A^{-\frac{1+it}{2}}\,\omega _{AB}^{\frac{1+it}{2}}\,\mathrm{d}\mu _0(t)\,, \end{aligned}$$
(19)

where \(\mu _0\) is the probability distribution on \({\mathbb {R}}\) with density

$$\begin{aligned} \mathrm{d}\mu _0(t) = \frac{\pi \,\mathrm{d}t}{2\left( \cosh (\pi t)+1\right) }\,. \end{aligned}$$
(20)
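That (20) defines a probability distribution can be checked numerically. The sketch below integrates the density with a midpoint rule on a truncated interval; the cutoff and the number of steps are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math


def mu0_density(t):
    """Density of mu_0 in (20): pi / (2 (cosh(pi t) + 1))."""
    return math.pi / (2.0 * (math.cosh(math.pi * t) + 1.0))


def integrate_mu0(cutoff=20.0, steps=40_000):
    """Midpoint-rule integral of the density over [-cutoff, cutoff]."""
    h = 2.0 * cutoff / steps
    return h * sum(mu0_density(-cutoff + (j + 0.5) * h) for j in range(steps))


total_mass = integrate_mu0()
print(total_mass)
```

The density decays like \(\pi e^{-\pi |t|}\), so the mass outside the truncation window is negligible at this precision.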

We stress that (19) reduces to (18) whenever \(\rho _A\), \(\omega _A\) and \(\omega _{AB}\) commute. If A is in the state \(\omega _A\), the recovery map \(\Phi _{A\rightarrow AB}\) recovers the joint state \(\omega _{AB}\), i.e., \(\Phi _{A\rightarrow AB}(\omega _A) = \omega _{AB}\). The relevance of the recovery map comes from the recoverability theorem [29], which states that \(\Phi _{A\rightarrow AB}\) can recover a generic joint state \(\rho _{AB}\) from its marginal \(\rho _A\) if removing the subsystem B does not significantly decrease the relative entropy between \(\rho \) and \(\omega \). More precisely, for any quantum state \(\rho _{AB}\) of AB we have

$$\begin{aligned} S(\rho _{AB}\Vert \omega _{AB}) - S(\rho _A\Vert \omega _A) \ge S_{{\mathbb {M}}}(\rho _{AB}\Vert \Phi _{A\rightarrow AB}(\rho _A))\,. \end{aligned}$$
(21)

We consider the setting where V is partitioned as

$$\begin{aligned} V = A_1\sqcup \cdots \sqcup A_m\,. \end{aligned}$$
(22)

For any \(i\in [m]\), we denote with \(A_1^i\) the union \(A_1\sqcup \cdots \sqcup A_i\). The recoverability theorem implies the following Lemma 1, which we will employ several times:

Lemma 1

For any \(\rho ,\,\omega \in \mathcal {S}_V\), we have

$$\begin{aligned}&S(\rho \Vert \omega ) \ge \frac{1}{2m}\left( \sum _{i=1}^m\left\| \rho _{A_1^i} - \Phi _{A_1^{i-1}\rightarrow A_1^i}(\rho _{A_1^{i-1}})\right\| _1\right) ^2\,, \end{aligned}$$
(23)
$$\begin{aligned}&\left( \frac{1}{2m}\sum _{i=1}^m\left\| \rho _{A_1^i} - \Phi _{A_1^{i-1}\rightarrow A_1^i}(\rho _{A_1^{i-1}})\right\| _1\right) ^2 \le 1-\exp \left( -\frac{S(\rho \Vert \omega )}{m}\right) \,, \end{aligned}$$
(24)

where \(\Phi _{A_1^{i-1}\rightarrow A_1^i}\) are the recovery maps associated with \(\omega \).

Proof

The recoverability inequality (21) and Pinsker’s inequality imply for any \(i\in [m]\)

$$\begin{aligned}&S(\rho _{A_1^i}\Vert \omega _{A_1^i}) - S(\rho _{A_1^{i-1}}\Vert \omega _{A_1^{i-1}}) \ge S_{\mathbb {M}}\left( \rho _{A_1^i}\left\| \Phi _{A_1^{i-1}\rightarrow A_1^i}(\rho _{A_1^{i-1}})\right. \right) \nonumber \\&\quad \ge \frac{1}{2}\left\| \rho _{A_1^i} - \Phi _{A_1^{i-1}\rightarrow A_1^i}(\rho _{A_1^{i-1}})\right\| _1^2\,. \end{aligned}$$
(25)

Summing (25) over i and using the convexity of the square function yields

$$\begin{aligned} S(\rho \Vert \omega )&\ge \frac{1}{2}\sum _{i=1}^m\left\| \rho _{A_1^i} - \Phi _{A_1^{i-1}\rightarrow A_1^i}(\rho _{A_1^{i-1}})\right\| _1^2 \nonumber \\&\ge \frac{1}{2m}\left( \sum _{i=1}^m\left\| \rho _{A_1^i} - \Phi _{A_1^{i-1}\rightarrow A_1^i}(\rho _{A_1^{i-1}})\right\| _1\right) ^2\,. \end{aligned}$$
(26)

The claim (23) follows.

With an analogous proof, applying the improved Pinsker’s inequality

$$\begin{aligned} \frac{1}{4}\left\| \sigma - \tau \right\| _1^2 \le 1 - e^{-S_{\mathbb {M}}(\sigma \Vert \tau )} \end{aligned}$$
(27)

and the convexity of the function \(t\mapsto -\ln \left( 1-t^2\right) \), we get

$$\begin{aligned} S(\rho \Vert \omega ) \ge -m\ln \left( 1-\left( \frac{1}{2m}\sum _{i=1}^m\left\| \rho _{A_1^i} - \Phi _{A_1^{i-1}\rightarrow A_1^i}(\rho _{A_1^{i-1}})\right\| _1\right) ^2\right) \,. \end{aligned}$$
(28)

The claim (24) follows. \(\square \)
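For the reader's convenience, the two convexity steps can be spelled out as follows, with the shorthand \(a_i\) (introduced here only for this purpose) for the trace norms appearing in (25). Setting \(a_i:=\left\| \rho _{A_1^i} - \Phi _{A_1^{i-1}\rightarrow A_1^i}(\rho _{A_1^{i-1}})\right\| _1\in [0,2]\), Jensen's inequality for \(t\mapsto t^2\) and for \(t\mapsto -\ln \left( 1-t^2\right) \), which is convex on [0, 1), gives

$$\begin{aligned} \frac{1}{2}\sum _{i=1}^m a_i^2&\ge \frac{m}{2}\left( \frac{1}{m}\sum _{i=1}^m a_i\right) ^2 = \frac{1}{2m}\left( \sum _{i=1}^m a_i\right) ^2\,,\nonumber \\ \sum _{i=1}^m -\ln \left( 1-\frac{a_i^2}{4}\right)&\ge -m\ln \left( 1-\left( \frac{1}{2m}\sum _{i=1}^m a_i\right) ^2\right) \,. \end{aligned}$$

The left-hand sides are lower bounds for \(S(\rho \Vert \omega )\) obtained by summing over i the telescoping differences of relative entropies, bounded from below via Pinsker's inequality and via (27), respectively.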

The following property of the recovery map will be fundamental:

Lemma 2

Let \(\omega _{ABC}\) be a joint state of the tripartite quantum system ABC. Let us assume that \(\omega _{ABC}\) is Markovian, i.e.,

$$\begin{aligned} I(A;C|B)_\omega = 0\,. \end{aligned}$$
(29)

Then, the recovery map \(\Phi _{AB\rightarrow ABC}\) associated with \(\omega _{ABC}\) does not act on the subsystem A.

Proof

From the characterization of the states that saturate the strong subadditivity [30], the Hilbert space \(\mathcal {H}_B\) of B has a decomposition

$$\begin{aligned} \mathcal {H}_B = \bigoplus _{i=1}^k \mathcal {H}_{B_i^L}\otimes \mathcal {H}_{B_i^R}\,, \end{aligned}$$
(30)

where the Hilbert spaces \(\left\{ \mathcal {H}_{B_i^L}\right\} _{i=1}^k\) and \(\left\{ \mathcal {H}_{B_i^R}\right\} _{i=1}^k\) are pairwise orthogonal, and \(\omega _{ABC}\) can be expressed as

$$\begin{aligned} \omega _{ABC} = \bigoplus _{i=1}^k p_i\,\omega _{AB_i^L}^{(i)}\otimes \omega _{B_i^RC}^{(i)}\,, \end{aligned}$$
(31)

where p is a probability distribution on [k], and each \(\omega _{AB_i^L}^{(i)}\) or \(\omega _{B_i^RC}^{(i)}\) is a quantum state with support in the corresponding \(\mathcal {H}_A\otimes \mathcal {H}_{B_i^L}\) or \(\mathcal {H}_{B_i^R}\otimes \mathcal {H}_C\). We have for any quantum state \(\rho _{AB}\) of AB

$$\begin{aligned} \Phi _{AB\rightarrow ABC}(\rho _{AB}) = \int _{\mathbb {R}}\omega _{ABC}^{\frac{1-it}{2}}\,\omega _{AB}^\frac{it-1}{2}\,\rho _{AB}\,\omega _{AB}^{-\frac{1+it}{2}}\,\omega _{ABC}^{\frac{1+it}{2}}\,\mathrm{d}\mu _0(t)\,. \end{aligned}$$
(32)

We have for any \(z\in {\mathbb {C}}\) that

$$\begin{aligned} \omega _{ABC}^z\,\omega _{AB}^{-z} = \bigoplus _{i=1}^k\left( \omega _{B_i^RC}^{(i)}\right) ^z\left( \omega _{B_i^R}^{(i)}\right) ^{-z} \end{aligned}$$
(33)

does not act on A, and the claim follows choosing \(z = \left( 1-it\right) /2\). \(\square \)

3.1 Markovian Case

In this subsection, we assume that \(\omega \in \mathcal {S}_V\) is a one-dimensional quantum Markov state. More precisely, let \(\left\{ A_1,\,\ldots ,\,A_m\right\} \) be a partition of V and let \(K = \max \left( \left| A_1\right| ,\,\ldots ,\,\left| A_m\right| \right) \). Then, we assume that

$$\begin{aligned} I(A_i;A_1^{i-2}|A_{i-1})_\omega = 0 \end{aligned}$$
(34)

for any \(i=3,\,\ldots ,\,m\). For any \(i\in [m]\), let \(\Phi _i\) be the recovery map (19) associated with \(\omega _{A_1^i}\) that recovers \(A_i\) from \(A_1^{i-1}\). From Lemma 2, \(\Phi _i\) acts only on \(A_{i-1}\), i.e., it is a map \(\Phi _i:\mathcal {O}_{A_{i-1}}\rightarrow \mathcal {O}_{A_{i-1}A_i}\). We also define

$$\begin{aligned} {\tilde{\Phi }}_i = {\mathrm {Tr}}_{A_{i-1}}\circ \Phi _i:\mathcal {O}_{A_{i-1}}\rightarrow \mathcal {O}_{A_i}\,. \end{aligned}$$
(35)

We can now state the main result of this section:

Theorem 2

Let us assume that for any \(i\in [m]\), \({\tilde{\Phi }}_i\) is a contraction with respect to the trace norm for all pairs of quantum states of \(A_1^{i-1}\) that differ only on the subsystem \(A_{i-1}\), i.e., that coincide after discarding \(A_{i-1}\). More precisely, we assume that there exists \(0\le \eta <1\) such that for any \(i\in [m]\) and any \(X\in \mathcal {O}_{A_1^{i-1}}^T\) with \({\mathrm {Tr}}_{A_{i-1}}X=0\) we have

$$\begin{aligned} \left\| {\tilde{\Phi }}_i(X)\right\| _1 \le \eta \left\| X\right\| _1\,. \end{aligned}$$
(36)

Then, we have

$$\begin{aligned} C(\omega ) \le 2m\,K^2\left( \frac{1}{1-\eta }+1\right) ^2\,. \end{aligned}$$
(37)

Furthermore, for any \(\rho \in \mathcal {S}_V\) we have

$$\begin{aligned} \left\| \rho - \omega \right\| _{W_1} \le K\left( \frac{1}{1-\eta }+1\right) 2m\sqrt{1-e^{-\frac{S(\rho \Vert \omega )}{m}}}\,. \end{aligned}$$
(38)

Proof

Let \(\rho \in \mathcal {S}_V\). On the one hand, we have from Lemma 3

$$\begin{aligned} \left\| \rho - \omega \right\| _{W_1}&= \left\| \sum _{i=1}^m(\Phi _m\circ \cdots \circ \Phi _{i+1})(\rho _{A_1^i} - \Phi _i(\rho _{A_1^{i-1}}))\right\| _{W_1}\nonumber \\&\le \sum _{i=1}^m\left\| (\Phi _m\circ \cdots \circ \Phi _{i+1})(\rho _{A_1^i} - \Phi _i(\rho _{A_1^{i-1}}))\right\| _{W_1}\nonumber \\&\le K\left( \frac{1}{1-\eta }+1\right) \sum _{i=1}^m\left\| \rho _{A_1^i} - \Phi _i(\rho _{A_1^{i-1}})\right\| _1\,. \end{aligned}$$
(39)

On the other hand, we have from (23) of Lemma 1

$$\begin{aligned} S(\rho \Vert \omega ) \ge \frac{1}{2m}\left( \sum _{i=1}^m\left\| \rho _{A_1^i} - \Phi _i(\rho _{A_1^{i-1}})\right\| _1\right) ^2\,, \end{aligned}$$
(40)

and the claim (37) follows. The claim (38) follows by employing (24) in place of (23). \(\square \)
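To compare the two bounds of Theorem 2, the following Python sketch evaluates the right-hand sides of \(\left\| \rho - \omega \right\| _{W_1}\le \sqrt{C\,S(\rho \Vert \omega )}\), with C the right-hand side of (37), and of (38). The function names and the parameter values are hypothetical and serve only as an illustration.

```python
import math


def prefactor(K, eta):
    """Common factor K (1 / (1 - eta) + 1) of (37) and (38)."""
    return K * (1.0 / (1.0 - eta) + 1.0)


def w1_bound_from_37(S, m, K, eta):
    """sqrt(C * S) with C = 2 m K^2 (1 / (1 - eta) + 1)^2 as in (37)."""
    return prefactor(K, eta) * math.sqrt(2.0 * m * S)


def w1_bound_from_38(S, m, K, eta):
    """Right-hand side of (38); expm1 keeps precision for small S / m."""
    return prefactor(K, eta) * 2.0 * m * math.sqrt(-math.expm1(-S / m))


# Illustrative parameters: m = 10 blocks of size K = 1, contraction eta = 0.5.
m, K, eta = 10, 1, 0.5
small = (w1_bound_from_37(0.1, m, K, eta), w1_bound_from_38(0.1, m, K, eta))
large = (w1_bound_from_37(1000.0, m, K, eta), w1_bound_from_38(1000.0, m, K, eta))
print(small, large)
```

With these values, the square-root bound is the smaller one for small relative entropy, while (38) never exceeds \(2mK\left( \frac{1}{1-\eta }+1\right) \) and therefore becomes the smaller one for large relative entropy.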

Remark 1

Condition (36) holds for some \(\eta <1\) iff \({\tilde{\Phi }}_i\) strictly decreases the trace distance between any two quantum states that differ only in the subsystem \(A_{i-1}\) on which \({\tilde{\Phi }}_i\) acts, i.e., that coincide after discarding \(A_{i-1}\). We expect this condition to hold for any strictly positive temperature.

Remark 2

An example of a quantum state satisfying (34) is a Gibbs state of a nearest-neighbor Hamiltonian on the D-dimensional cubic lattice \(\Lambda = \left\{ 0,\,\ldots ,\,L\right\} ^D\), where \(x,\,y\in \Lambda \) are neighbors iff \(\left\| x-y\right\| _1=1\). We can then choose \(m = L + 1\) and

$$\begin{aligned} A_i = \left\{ x\in \Lambda : x_1 = i\right\} \,,\qquad i=0,\,\ldots ,\,L\,, \end{aligned}$$
(41)

with

$$\begin{aligned} K = \left( L+1\right) ^{D-1}\,, \end{aligned}$$
(42)

and get from Theorem 2

$$\begin{aligned} C(\omega ) \le 2\left( L+1\right) ^{2D-1}\left( \frac{1}{1-\eta }+1\right) ^2 = 2\left| V\right| ^\frac{2D-1}{D}\left( \frac{1}{1-\eta }+1\right) ^2\,. \end{aligned}$$
(43)

We stress that, assuming that \(\eta \) remains bounded away from 1, we get \(C(\omega ) = O(|V|)\) iff \(D=1\), i.e., for one-dimensional systems.
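The scaling in (43) can be made concrete with a short Python sketch, which evaluates the bound of Theorem 2 for the slab decomposition (41) and divides it by the number of sites \((L+1)^D\). The function names and the value of \(\eta \) are hypothetical.

```python
def remark2_bound(L, D, eta):
    """Right-hand side of (43): 2 m K^2 (1 / (1 - eta) + 1)^2 with m = L + 1, K = (L + 1)^(D - 1)."""
    m = L + 1
    K = (L + 1) ** (D - 1)
    return 2.0 * m * K ** 2 * (1.0 / (1.0 - eta) + 1.0) ** 2


def bound_per_site(L, D, eta):
    """Bound (43) divided by the number of sites |V| = (L + 1)^D."""
    return remark2_bound(L, D, eta) / (L + 1) ** D


eta = 0.5  # assumed contraction coefficient, kept fixed as L grows
chain = [bound_per_site(L, 1, eta) for L in (9, 99, 999)]
square = [bound_per_site(L, 2, eta) for L in (9, 99, 999)]
print(chain, square)
```

For \(D=1\) the ratio stays constant in L, whereas for \(D=2\) it grows linearly with \(L+1\), in agreement with the exponent \((2D-1)/D\) in (43).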

Remark 3

We can choose

$$\begin{aligned} \eta&= \max \left\{ \left\| {\tilde{\Phi }}_i(X) - \omega _{A_i}\otimes {\mathrm {Tr}}_{A_{i-1}}X\right\| _1:i\in [m]\,,\;X\in \mathcal {O}_{A_1^{i-1}}\,,\;\left\| X\right\| _1=1\right\} \nonumber \\&\le \max _{i\in [m]}\left\| {\tilde{\Phi }}_i - \omega _{A_i}\otimes {\mathrm {Tr}}_{A_{i-1}}\right\| _\diamond \,, \end{aligned}$$
(44)

where \(\omega _{A_i}\otimes {\mathrm {Tr}}_{A_{i-1}}:\mathcal {O}_{A_{i-1}}\rightarrow \mathcal {O}_{A_i}\) is the quantum channel that replaces the input quantum state with \(\omega _{A_i}\) and

$$\begin{aligned} \left\| \Phi \right\| _\diamond = \sup \left\{ \left\| (\Phi \otimes {\mathbb {I}}_{\mathcal {B}(\mathcal {H})})(X)\right\| _1:X\in \mathcal {B}(\mathcal {H}^{\otimes 2})\,,\;\left\| X\right\| _1=1\right\} \end{aligned}$$
(45)

denotes the diamond norm of the linear map \(\Phi \) on \(\mathcal {B}(\mathcal {H})\).

Proposition 8

Let \(\omega \in \mathcal {S}_V\) satisfy (34), and assume

$$\begin{aligned} a = \max _{i\in [m-1]} S_\infty (\omega _{A_i}\otimes \omega _{A_{i+1}}\Vert \omega _{A_i A_{i+1}})<\frac{1}{2}\,, \end{aligned}$$
(46)

where

$$\begin{aligned} S_\infty (\rho \Vert \sigma ) = \ln \inf \left\{ \lambda \in {\mathbb {R}} : \rho \le \lambda \,\sigma \right\} \end{aligned}$$
(47)

denotes the quantum max-divergence [31] between the quantum states \(\rho \) and \(\sigma \). Then, we can choose in (36)

$$\begin{aligned} \eta = \sqrt{2\,a}\,. \end{aligned}$$
(48)

Proof

From Remark 3, we can choose

$$\begin{aligned} \eta= & {} \max _{i\in [m]}\max _{|\psi _i\rangle }\left\| {\tilde{\Phi }}_i(|\psi _i\rangle \langle \psi _i|) - \omega _{A_i}\otimes {\mathrm {Tr}}_{A_{i-1}}|\psi _i\rangle \langle \psi _i|\right\| _1 \nonumber \\\le & {} \max _{i\in [m]}\max _{|\psi _i\rangle }\left\| \Phi _i(|\psi _i\rangle \langle \psi _i|) - \omega _{A_i}\otimes |\psi _i\rangle \langle \psi _i|\right\| _1\,, \end{aligned}$$
(49)

where each \(|\psi _i\rangle \) is a unit vector in \(\mathcal {H}_{A_1^{i-1}}\). We have from Pinsker’s inequality

$$\begin{aligned} \left\| \Phi _i(|\psi _i\rangle \langle \psi _i|) - |\psi _i\rangle \langle \psi _i|\otimes \omega _{A_i}\right\| _1 \le \sqrt{2\,S_{{\mathbb {M}}}(|\psi _i\rangle \langle \psi _i|\otimes \omega _{A_i}\Vert \Phi _i(|\psi _i\rangle \langle \psi _i|))}\,. \end{aligned}$$
(50)

(46) implies

$$\begin{aligned} \ln \omega _{A_{i-1}A_i} \ge \ln \omega _{A_{i-1}} + \ln \omega _{A_i} - a\,. \end{aligned}$$
(51)

From the characterization of the states that saturate the strong subadditivity [30], we get

$$\begin{aligned} \ln \omega _{A_1^{i-1}} + \ln \omega _{A_{i-1}A_i} = \ln \omega _{A_{i-1}} + \ln \omega _{A_1^i}\,, \end{aligned}$$
(52)

therefore, (51) can be rewritten as

$$\begin{aligned} \ln \omega _{A_1^i} \ge \ln \omega _{A_1^{i-1}} + \ln \omega _{A_i} - a\,. \end{aligned}$$
(53)

Choosing in (25) \(\rho _{A_1^i} = |\psi _i\rangle \langle \psi _i|\otimes \omega _{A_i}\) we get with the help of (53)

$$\begin{aligned}&S_{{\mathbb {M}}}(|\psi _i\rangle \langle \psi _i|\otimes \omega _{A_i}\Vert \Phi _i(|\psi _i\rangle \langle \psi _i|)) \nonumber \\\le & {} \langle \psi _i|\left( \ln \omega _{A_1^{i-1}} - {\mathrm {Tr}}_{A_i}\left[ \omega _{A_i}\ln \omega _{A_1^i} \right] \right) |\psi _i\rangle - S(A_i)_\omega \le a\,, \end{aligned}$$
(54)

and the claim follows. \(\square \)
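In the commuting case, the max-divergence (47) between two probability distributions reduces to the logarithm of the largest ratio of their probabilities, so that the quantity in (46) and the resulting \(\eta \) in (48) can be computed directly. The Python sketch below does this for a single pair of neighboring classical systems; the function names and the example distribution are hypothetical, and in (46) the maximum over all neighboring pairs would still have to be taken.

```python
import math


def max_divergence(p, q):
    """Classical S_inf(p || q) = ln max_x p(x) / q(x); infinite if supp(p) is not in supp(q)."""
    ratios = []
    for x, px in p.items():
        if px > 0:
            qx = q.get(x, 0.0)
            if qx == 0:
                return math.inf
            ratios.append(px / qx)
    return math.log(max(ratios))


def pair_correlation_parameter(omega_xy):
    """S_inf(omega_X x omega_Y || omega_XY) for a joint pmf {(x, y): probability}."""
    omega_x, omega_y = {}, {}
    for (x, y), weight in omega_xy.items():
        omega_x[x] = omega_x.get(x, 0.0) + weight
        omega_y[y] = omega_y.get(y, 0.0) + weight
    product = {(x, y): omega_x[x] * omega_y[y] for x in omega_x for y in omega_y}
    return max_divergence(product, omega_xy)


# Illustrative weakly correlated pair of bits.
omega_xy = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 0.3}
a = pair_correlation_parameter(omega_xy)
eta = math.sqrt(2.0 * a)  # choice (48), meaningful only if a < 1/2
print(a, eta)
```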

Remark 4

Condition (36) is reminiscent of the so-called Dobrushin uniqueness condition (see [6, Theorem 4]).

3.2 Non-Markovian States

Here, we prove an alternative version of Theorem 2 where the Markov condition (34) is replaced by exponential decay of correlations.

Theorem 3

Let \(V=[n]\) be a one-dimensional lattice, and let \(\omega \in \mathcal {S}_V\). For any \(i\in [n]\), let \(\Phi _i\) be the recovery map associated with \(\omega _{1\ldots i}\) that recovers the site i from the sites \(1\ldots i-1\). We assume that \(\omega \) has exponentially decaying correlations, in the sense that there exist \(C\ge 0\) and \(0\le \eta <1\) such that for any \(i\in [n]\), any \(k=0,\,\ldots ,\,\max (i,\,n-i)\) and any \(\tau \in \mathcal {S}_{1\ldots i}\),

$$\begin{aligned} \left\| {\mathrm {Tr}}_{i-k+1\ldots i+k}(\Phi _n\circ \cdots \circ \Phi _{i+1})(\tau _{1\ldots i}) - \tau _{1\ldots i-k}\otimes \omega _{i+k+1\ldots n}\right\| _1 \le C\,\eta ^k\,. \end{aligned}$$
(55)

We also assume that for any \(i\in [n]\), any \(k=0,\,\ldots ,\,i-1\) and any \(\tau \in \mathcal {S}_{1\ldots i-1}\)

$$\begin{aligned} \left\| {\mathrm {Tr}}_{i-k\ldots i}\Phi _i(\tau _{1\ldots i-1}) - \tau _{1\ldots i-k-1}\right\| _1 \le C\,\eta ^k\,. \end{aligned}$$
(56)

Then,

$$\begin{aligned} C(\omega ) \le 8\,n\left( 2+\frac{C+1}{1-\eta }-\frac{\ln \left( C^2\,n\right) }{2\ln \eta }\right) ^2\,. \end{aligned}$$
(57)

Proof

From (23) of Lemma 1, we have for any \(\rho \in \mathcal {S}_V\)

$$\begin{aligned} S(\rho \Vert \omega ) \ge \frac{1}{2n}\left( \sum _{i=1}^n\left\| \rho _{1\ldots i} - \Phi _i(\rho _{1\ldots i-1})\right\| _1\right) ^2\,. \end{aligned}$$
(58)

We have

$$\begin{aligned} \left\| \rho - \omega \right\| _{W_1}&= \left\| \sum _{i=1}^n(\Phi _n\circ \cdots \circ \Phi _{i+1})(\rho _{1\ldots i} - \Phi _i(\rho _{1\ldots i-1}))\right\| _{W_1}\nonumber \\&\le \sum _{i=1}^n\left\| (\Phi _n\circ \cdots \circ \Phi _{i+1})(\rho _{1\ldots i} - \Phi _i(\rho _{1\ldots i-1}))\right\| _{W_1}\,. \end{aligned}$$
(59)

For any \(i\in [n]\), we have from Lemma 5

$$\begin{aligned}&\left\| (\Phi _n\circ \cdots \circ \Phi _{i+1})(\rho _{1\ldots i} - \Phi _i(\rho _{1\ldots i-1}))\right\| _{W_1}\nonumber \\&\quad \le 2\sum _{k=0}^{\max \{i,\,n-i\}}\left\| {\mathrm {Tr}}_{i-k+1\ldots i+k}(\Phi _n\circ \cdots \circ \Phi _{i+1})(\rho _{1\ldots i} - \Phi _i(\rho _{1\ldots i-1}))\right\| _1\nonumber \\&\quad \le 2\sum _{k=0}^{\max \{i,\,n-i\}}\left( \left\| \rho _{1\ldots i-k} - {\mathrm {Tr}}_{i-k+1\ldots i}\Phi _i(\rho _{1\ldots i-1})\right\| _1\right. \nonumber \\&\qquad \left. + C\,\eta ^k\left\| \rho _{1\ldots i} - \Phi _i(\rho _{1\ldots i-1})\right\| _1\right) \,. \end{aligned}$$
(60)

We have for any \(k_0\in \left\{ 0,\,\ldots ,\,i-1\right\} \)

$$\begin{aligned}&\sum _{k=0}^{i-1}\left\| \rho _{1\ldots i-k} - {\mathrm {Tr}}_{i-k+1\ldots i}\Phi _i(\rho _{1\ldots i-1})\right\| _1\nonumber \\&\quad \le \sum _{k=0}^{k_0}\left\| \rho _{1\ldots i} - \Phi _i(\rho _{1\ldots i-1})\right\| _1 + \sum _{k=k_0+1}^{i-1}\left\| \rho _{1\ldots i-k} - \omega _{1\ldots i-k} \right. \nonumber \\&\quad \quad \left. - {\mathrm {Tr}}_{i-k+1\ldots i}\Phi _i(\rho _{1\ldots i-k} - \omega _{1\ldots i-k})\right\| _1\nonumber \\&\quad \le \left( k_0+1\right) \left\| \rho _{1\ldots i} - \Phi _i(\rho _{1\ldots i-1})\right\| _1 + C\sum _{k=k_0+1}^{i-1}\eta ^{k-1}\left\| \rho _{1\ldots i-k} - \omega _{1\ldots i-k}\right\| _1\nonumber \\&\quad \le \left( k_0+1\right) \left\| \rho _{1\ldots i} - \Phi _i(\rho _{1\ldots i-1})\right\| _1 + \frac{C\,\eta ^{k_0}}{1-\eta }\left\| \rho -\omega \right\| _1\,, \end{aligned}$$
(61)

therefore

$$\begin{aligned}&\left\| (\Phi _n\circ \cdots \circ \Phi _{i+1})(\rho _{1\ldots i} - \Phi _i(\rho _{1\ldots i-1}))\right\| _{W_1}\nonumber \\&\quad \le 2\left( \left( k_0+1+\frac{C}{1-\eta }\right) \left\| \rho _{1\ldots i} - \Phi _i(\rho _{1\ldots i-1})\right\| _1 + \frac{C\,\eta ^{k_0}}{1-\eta }\left\| \rho -\omega \right\| _1\right) \,, \end{aligned}$$
(62)

and

$$\begin{aligned} \left\| \rho - \omega \right\| _{W_1}&\le 2\left( \left( k_0+1+\frac{C}{1-\eta }\right) \sum _{i=1}^n\left\| \rho _{1\ldots i} - \Phi _i(\rho _{1\ldots i-1})\right\| _1 + \frac{n\,C\,\eta ^{k_0}}{1-\eta }\left\| \rho -\omega \right\| _1\right) \nonumber \\&\quad \le 2\left( k_0+1+C\,\frac{1+\eta ^{k_0}\sqrt{n}}{1-\eta }\right) \sqrt{2\,n\,S(\rho \Vert \omega )}\,, \end{aligned}$$
(63)

and the claim follows choosing

$$\begin{aligned} k_0 = \left\lceil -\frac{\ln \left( C^2\,n\right) }{2\ln \eta }\right\rceil \,. \end{aligned}$$
(64)

\(\square \)
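The arithmetic behind the choice (64) can be sketched as follows. This is only a consistency check, under the assumptions that \(0<\eta <1\), that \(C^2\,n\ge 1\) (so that \(k_0\ge 0\)), and that \(C(\omega )\) denotes the best constant in \(\left\| \rho -\omega \right\| _{W_1}^2\le C(\omega )\,S(\rho \Vert \omega )\); the admissible range of \(k_0\) in (61) is not re-examined here. Since \(\ln \eta <0\), (64) gives

$$\begin{aligned} k_0\ln \eta \le -\frac{1}{2}\ln \left( C^2\,n\right) \quad \Rightarrow \quad C\,\eta ^{k_0}\sqrt{n}\le 1\,,\qquad k_0\le 1-\frac{\ln \left( C^2\,n\right) }{2\ln \eta }\,. \end{aligned}$$

Inserting both estimates into (63) yields

$$\begin{aligned} \left\| \rho - \omega \right\| _{W_1}&\le 2\left( k_0+1+\frac{C+C\,\eta ^{k_0}\sqrt{n}}{1-\eta }\right) \sqrt{2\,n\,S(\rho \Vert \omega )}\nonumber \\&\le 2\left( 2+\frac{C+1}{1-\eta }-\frac{\ln \left( C^2\,n\right) }{2\ln \eta }\right) \sqrt{2\,n\,S(\rho \Vert \omega )}\,, \end{aligned}$$

and squaring both sides produces the prefactor \(4\cdot 2\,n=8\,n\) of (57).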

3.3 Auxiliary Lemmas

Lemma 3

Under the hypotheses of Theorem 2, for any \(i\in [m]\) and any \(X\in \mathcal {O}_{A_1^i}^T\) such that

$$\begin{aligned} {\mathrm {Tr}}_{A_{i-1}A_i}X=0 \end{aligned}$$
(65)

we have

$$\begin{aligned} \left\| (\Phi _m\circ \cdots \circ \Phi _{i+1})(X)\right\| _{W_1} \le K\left( \frac{1}{1-\eta }+1\right) \left\| X\right\| _1\,. \end{aligned}$$
(66)

Proof

We have from Lemma 4 and from the contractivity of the trace distance

$$\begin{aligned} \left\| (\Phi _m\circ \cdots \circ \Phi _{i+1})(X)\right\| _{W_1}&\le \left| A_{i-1}\right| \left\| (\Phi _m\circ \cdots \circ \Phi _{i+1})(X)\right\| _1 \nonumber \\&\quad + \left\| (\Phi _m\circ \cdots \circ \Phi _{i+1})\left( {\mathrm {Tr}}_{A_{i-1}}X\right) \right\| _{W_1}\nonumber \\&\le K\left\| X\right\| _1 + \left\| (\Phi _m\circ \cdots \circ \Phi _{i+1})\left( {\mathrm {Tr}}_{A_{i-1}}X\right) \right\| _{W_1}\,. \end{aligned}$$
(67)

We have

$$\begin{aligned}&\left\| (\Phi _m\circ \cdots \circ \Phi _{i+1})\left( {\mathrm {Tr}}_{A_{i-1}}X\right) \right\| _{W_1}\nonumber \\&\quad \le \left| A_i\right| \left\| (\Phi _m\circ \cdots \circ \Phi _{i+1})\left( {\mathrm {Tr}}_{A_{i-1}}X\right) \right\| _1 \nonumber \\&\quad \quad + \left\| {\mathrm {Tr}}_{A_i}(\Phi _m\circ \cdots \circ \Phi _{i+1})\left( {\mathrm {Tr}}_{A_{i-1}}X\right) \right\| _{W_1}\nonumber \\&\quad \le K\left\| {\mathrm {Tr}}_{A_{i-1}}X\right\| _1 + \left\| (\Phi _m\circ \cdots \circ \Phi _{i+2}\circ {\tilde{\Phi }}_{i+1})\left( {\mathrm {Tr}}_{A_{i-1}}X\right) \right\| _{W_1}\,. \end{aligned}$$
(68)

Iterating the procedure, we get

$$\begin{aligned}&\left\| (\Phi _m\circ \cdots \circ \Phi _{i+1})\left( {\mathrm {Tr}}_{A_{i-1}}X\right) \right\| _{W_1}\nonumber \\&\quad \le K\left( \left\| {\mathrm {Tr}}_{A_{i-1}}X\right\| _1 + \left\| {\tilde{\Phi }}_{i+1}\left( {\mathrm {Tr}}_{A_{i-1}}X\right) \right\| _1 \right. \nonumber \\&\quad \quad \left. + \cdots + \left\| ({\tilde{\Phi }}_m\circ \cdots \circ {\tilde{\Phi }}_{i+1})\left( {\mathrm {Tr}}_{A_{i-1}}X\right) \right\| _1\right) \nonumber \\&\quad \quad + \left\| {\mathrm {Tr}}_{A_m}({\tilde{\Phi }}_m\circ \cdots \circ {\tilde{\Phi }}_{i+1})\left( {\mathrm {Tr}}_{A_{i-1}}X\right) \right\| _{W_1}\nonumber \\&\quad \le K\left( 1+\eta +\cdots + \eta ^{m-i}\right) \left\| {\mathrm {Tr}}_{A_{i-1}}X\right\| _1 + \left\| {\mathrm {Tr}}_{A_{i-1}A_i}X\right\| _1\nonumber \\&\quad \le \frac{K}{1-\eta }\left\| {\mathrm {Tr}}_{A_{i-1}}X\right\| _1 \le \frac{K}{1-\eta }\left\| X\right\| _1\,, \end{aligned}$$
(69)

where the last two inequalities follow from (36) and (65), respectively. The claim follows. \(\square \)
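For completeness, the final combination of (67) and (69) can be written out explicitly. The sketch below assumes \(0\le \eta <1\), so that the geometric sum in (69) is bounded by its infinite series, and uses (65) to drop the term \(\left\| {\mathrm {Tr}}_{A_{i-1}A_i}X\right\| _1\):

$$\begin{aligned} \sum _{j=0}^{m-i}\eta ^j&= \frac{1-\eta ^{m-i+1}}{1-\eta }\le \frac{1}{1-\eta }\,,\nonumber \\ \left\| (\Phi _m\circ \cdots \circ \Phi _{i+1})(X)\right\| _{W_1}&\le K\left\| X\right\| _1 + \frac{K}{1-\eta }\left\| X\right\| _1 = K\left( \frac{1}{1-\eta }+1\right) \left\| X\right\| _1\,. \end{aligned}$$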

Lemma 4

For any \(X\in \mathcal {O}_V^T\) and any \(A\subseteq V\),

$$\begin{aligned} \left\| X\right\| _{W_1} \le \left| A\right| \left\| X\right\| _1 + \left\| {\mathrm {Tr}}_AX\right\| _{W_1}\,. \end{aligned}$$
(70)

Proof

Without loss of generality, we can assume that \(V=[n]\) and \(A = [k]\) for some \(k\in [n]\). We have

$$\begin{aligned} \left\| X\right\| _{W_1}&\le \left\| X - \frac{{\mathbb {I}}}{d}\otimes {\mathrm {Tr}}_1X\right\| _{W_1} + \left\| \frac{{\mathbb {I}}}{d}\otimes {\mathrm {Tr}}_1X\right\| _{W_1} \nonumber \\&= \frac{1}{2}\left\| X - \frac{{\mathbb {I}}}{d}\otimes {\mathrm {Tr}}_1X\right\| _1 + \left\| {\mathrm {Tr}}_1X\right\| _{W_1}\nonumber \\&\le \left\| X\right\| _1 + \left\| {\mathrm {Tr}}_1X\right\| _{W_1}\,, \end{aligned}$$
(71)

where the equality follows from Propositions 2 and 4 and the last inequality follows from the triangle inequality for the trace norm and its contractivity with respect to partial traces. By induction, we get

$$\begin{aligned} \left\| X\right\| _{W_1}&\le \left( \left\| X\right\| _1 + \cdots + \left\| {\mathrm {Tr}}_{1\ldots k-1}X\right\| _1\right) + \left\| {\mathrm {Tr}}_{1\ldots k}X\right\| _{W_1}\nonumber \\&\le k\left\| X\right\| _1 + \left\| {\mathrm {Tr}}_{1\ldots k}X\right\| _{W_1}\,, \end{aligned}$$
(72)

and the claim follows. \(\square \)

Lemma 5

Let \(V=[n]\). Then, for any \(X\in \mathcal {O}_V^T\),

$$\begin{aligned} \left\| X\right\| _{W_1} \le \left\| X\right\| _1 + \left\| {\mathrm {Tr}}_1X\right\| _1 + \cdots + \left\| {\mathrm {Tr}}_{1\ldots n-1}X\right\| _1\,. \end{aligned}$$
(73)

Proof

Follows from Lemma 4. \(\square \)
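In more detail, one possible way to read this step is to take \(A=V=[n]\), i.e., \(k=n\), in the first line of (72). Assuming, as the notation suggests, that \(\mathcal {O}_V^T\) consists of traceless operators, the remaining \(W_1\) term is the norm of the scalar \({\mathrm {Tr}}\,X=0\) and drops out. With the convention \({\mathrm {Tr}}_{1\ldots 0}X:=X\),

$$\begin{aligned} \left\| X\right\| _{W_1} \le \sum _{k=0}^{n-1}\left\| {\mathrm {Tr}}_{1\ldots k}X\right\| _1 + \left\| {\mathrm {Tr}}_{1\ldots n}X\right\| _{W_1} = \sum _{k=0}^{n-1}\left\| {\mathrm {Tr}}_{1\ldots k}X\right\| _1\,. \end{aligned}$$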

4 Curvature Bound

In the seminal paper [32], Ollivier introduced a generalization of the notion of curvature to generic, possibly discrete, metric spaces. In his framework, the curvature of a metric space \((\Omega ,d)\) endowed with a classical stochastic map P acting on the probability measures on \(\Omega \) is defined as the following contraction property of the Wasserstein distance \(W_1\): for any two probability measures \(\mu _1,\mu _2\),

$$\begin{aligned} W_1(P(\mu _1),P(\mu _2))\le \big (1-\kappa \big )\,W_1(\mu _1,\mu _2)\,. \end{aligned}$$
(74)

The constant \(\kappa >0\) is called the coarse Ricci curvature of the triple \((\Omega ,d,P)\). In particular, it is easy to verify that the existence of a positive coarse Ricci curvature implies the uniqueness of the invariant measure \(\nu \) for the Markov kernel P. Moreover, it was recently proven in [33] that Ollivier’s coarse Ricci curvature provides an upper bound on the constant of the transportation cost inequality for the measure \(\nu \), hence recovering the results from the smooth Riemannian setting.
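The uniqueness claim can be checked in one line. The following sketch assumes that \(\nu _1\) and \(\nu _2\) are two invariant measures with \(W_1(\nu _1,\nu _2)<\infty \) (for instance, on a bounded metric space) and applies (74) to them:

$$\begin{aligned} W_1(\nu _1,\nu _2) = W_1(P(\nu _1),P(\nu _2)) \le \big (1-\kappa \big )\,W_1(\nu _1,\nu _2) \quad \Rightarrow \quad \kappa \,W_1(\nu _1,\nu _2)\le 0\,, \end{aligned}$$

so that \(W_1(\nu _1,\nu _2)=0\) and therefore \(\nu _1=\nu _2\) whenever \(\kappa >0\).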

Here, inspired by the works of [32] and [33], we prove that a contraction of the Lipschitz constant under a certain quantum channel constructed from the Petz recovery maps of the Gibbs state \(\omega \) can be used to conclude that \(\omega \) satisfies a transportation cost inequality. In particular, we do not need to assume that the underlying graph is \({\mathbb {Z}}\), in contrast with Sect. 3. Let \(G=(V,E)\) be a hypergraph with \(n=|V|\), and let \(H:=\sum _{A\in E}h_A\) be a Hamiltonian whose local terms \(h_A\) pairwise commute and are supported on the hyperedges \(A\in E\). For a given site \(v\in V\), we recall the composition of the partial trace \(\mathrm {Tr}_v\) on v with the rotated Petz recovery map of v:

$$\begin{aligned} \Psi _v(\rho )=\Phi _v\circ \mathrm {Tr}_v(\rho )=\int _{{\mathbb {R}}}\omega ^{\frac{1-it}{2}}\omega _{v^c}^{\frac{-1+it}{2}}\,(\rho _{v^c}\otimes I_v)\,\omega _{v^c}^{\frac{-1-it}{2}}\omega ^{\frac{1+it}{2}}\,\mathrm{d}\mu _0(t) \end{aligned}$$
(75)

for the probability density \(\mu _0(t):=\frac{\pi }{2}\big (\cosh (\pi t)+1\big )^{-1}\). Note that since we assumed \(\omega \) to be the Gibbs state of a commuting Hamiltonian, the map \(\Psi _v\) acts non-trivially only on the neighborhood of v

$$\begin{aligned} N_v:=\bigcup \left\{ A\in E : v\in A\right\} \,. \end{aligned}$$
(76)

We also introduce the quantum channel

$$\begin{aligned} \Psi =\frac{1}{n}\sum _{v\in V}\Psi _v\,. \end{aligned}$$
(77)

We assume that \(\Psi \) is a contraction with respect to the \(W_1\) norm, i.e., that

$$\begin{aligned} \left\| \Psi \right\| _{W_1\rightarrow W_1} = \max _{\Delta \in \mathcal {O}_V^T}\frac{\left\| \Psi (\Delta )\right\| _{W_1}}{\left\| \Delta \right\| _{W_1}} \le 1-\frac{\kappa }{n} \end{aligned}$$
(78)

for some \(\kappa >0\), in analogy with (74). This contraction property was already derived in Ollivier’s original article [32] as a generalization of Dobrushin’s uniqueness condition. Here, we first prove that this condition implies the transportation cost inequality for the Gibbs state \(\omega \equiv \omega _\beta :=e^{-\beta H}/\mathrm {Tr}\,e^{-\beta H}\):

Theorem 4

With the conditions of the previous paragraph, we have

$$\begin{aligned} C(\omega _\beta ) \le 2n\,\frac{N^2}{\left( 1-e^{-\kappa }\right) ^2}\,, \end{aligned}$$
(79)

where \(N:=\max _{v\in V}|N_v|\).

Proof

We have for any state \(\rho \in \mathcal {S}_V\)

$$\begin{aligned} \left\| \rho - \omega _\beta \right\| _{W_1}\le \sum _{i=1}^n \left\| \Psi ^{i-1}(\rho )-\Psi ^i(\rho )\right\| _{W_1}+\left\| \Psi ^n(\rho )-\omega _\beta \right\| _{W_1}\,. \end{aligned}$$
(80)

The last term can be controlled by \(\left\| \rho -\omega _\beta \right\| _{W_1}\) thanks to the contraction (78):

$$\begin{aligned} \left\| \Psi ^n(\rho )-\omega _\beta \right\| _{W_1}\le \left( 1-\frac{\kappa }{n}\right) ^n\left\| \rho -\omega _\beta \right\| _{W_1}\le e^{-\kappa }\left\| \rho -\omega _\beta \right\| _{W_1}\,. \end{aligned}$$
(81)

On the other hand, the sum on the right-hand side of (80) can be controlled as follows:

$$\begin{aligned} \sum _{i=1}^n\left\| \Psi ^{i-1}(\rho )-\Psi ^i(\rho )\right\| _{W_1}&\le \frac{1}{n}\sum _{i=1}^n\sum _{v\in V} \left\| \Psi _v(\Psi ^{i-1}(\rho ))-\Psi ^{i-1}(\rho )\right\| _{W_1}\nonumber \\&\le \frac{N}{n} \sum _{i=1}^n\sum _{v\in V}\left\| \Psi _v(\Psi ^{i-1}(\rho ))-\Psi ^{i-1}(\rho )\right\| _1\,, \end{aligned}$$
(82)

where the last inequality follows by Proposition 3. Proceeding as in the proof of Lemma 1, by the joint use of Pinsker’s inequality with the recoverability bound followed by the data processing inequality we can further bound the trace distances above so that

$$\begin{aligned} \sum _{i=1}^n\left\| \Psi ^{i-1}(\rho )-\Psi ^i(\rho )\right\| _{W_1}&\le \frac{N}{n}\,\sum _{i=1}^n\sum _{v\in V}\sqrt{2\,S_{{\mathbb {M}}}(\Psi ^{i-1}(\rho )\Vert \Psi _v(\Psi ^{i-1}(\rho )))}\nonumber \\&\le {N}\sqrt{2\sum _{i=1}^n\sum _{v\in V}{S_{{\mathbb {M}}}(\Psi ^{i-1}(\rho )\Vert \Psi _v(\Psi ^{i-1}(\rho )))}}\nonumber \\&\le N\sqrt{2\sum _{i=1}^n\sum _{v\in V}\left( S(\Psi ^{i-1}(\rho )\Vert \omega _\beta )-S(\mathrm {Tr}_v\Psi ^{i-1}(\rho )\Vert \mathrm {Tr}_v\omega _\beta )\right) }\nonumber \\&\le N\sqrt{2\sum _{i=1}^n\sum _{v\in V}\left( S(\Psi ^{i-1}(\rho )\Vert \omega _\beta )-S(\Psi _v(\Psi ^{i-1}(\rho ))\Vert \omega _\beta )\right) }\nonumber \\&\overset{(1)}{\le } N \sqrt{2n\sum _{i=1}^n\left( S(\Psi ^{i-1}(\rho )\Vert \omega _\beta )-S(\Psi (\Psi ^{i-1}(\rho ))\Vert \omega _\beta )\right) }\nonumber \\&\le N\sqrt{2n\,S(\rho \Vert \omega _\beta )}\,. \end{aligned}$$
(83)

Inequality (1) above uses the concavity of the entropy, so that for any state \(\rho \)

$$\begin{aligned} \frac{1}{n}\sum _{v\in V}\left( S(\rho \Vert \omega _\beta )-S(\Psi _v(\rho )\Vert \omega _\beta )\right)&=S(\rho \Vert \omega _\beta )+\frac{1}{n} \sum _{v\in V} S(\Psi _v(\rho ))\nonumber \\&\quad +\frac{1}{n}\sum _{v\in V}\mathrm {Tr}\left[ \Psi _v(\rho )\ln \omega _\beta \right] \nonumber \\&\le S(\rho \Vert \omega _\beta )+ S(\Psi (\rho ))+\mathrm {Tr}\left[ \Psi (\rho )\ln \omega _\beta \right] \nonumber \\&=S(\rho \Vert \omega _\beta )-S(\Psi (\rho )\Vert \omega _\beta )\,. \end{aligned}$$
(84)
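Two elementary steps in (83) may be worth spelling out; the notation \(a_{i,v}\ge 0\) below is introduced only for this sketch and stands for the relative entropy terms under the square roots. The second inequality of (83) is the Cauchy–Schwarz inequality over the \(n^2\) pairs \((i,v)\), and the last one is a telescoping sum combined with the non-negativity of the relative entropy:

$$\begin{aligned} \frac{N}{n}\sum _{i=1}^n\sum _{v\in V}\sqrt{2\,a_{i,v}}&\le \frac{N}{n}\,\sqrt{n^2}\,\sqrt{2\sum _{i=1}^n\sum _{v\in V}a_{i,v}} = N\sqrt{2\sum _{i=1}^n\sum _{v\in V}a_{i,v}}\,,\nonumber \\ \sum _{i=1}^n\left( S(\Psi ^{i-1}(\rho )\Vert \omega _\beta )-S(\Psi ^{i}(\rho )\Vert \omega _\beta )\right)&= S(\rho \Vert \omega _\beta )-S(\Psi ^{n}(\rho )\Vert \omega _\beta )\le S(\rho \Vert \omega _\beta )\,. \end{aligned}$$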

Plugging (81) and (83) into (80), the result follows. \(\square \)
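The final rearrangement can be sketched as follows, assuming that \(C(\omega _\beta )\) denotes the best constant in \(\left\| \rho -\omega _\beta \right\| _{W_1}^2\le C(\omega _\beta )\,S(\rho \Vert \omega _\beta )\). The bound (81) relies on \(\Psi (\omega _\beta )=\omega _\beta \) and on the elementary inequality \(1-x\le e^{-x}\), which gives \(\left( 1-\frac{\kappa }{n}\right) ^n\le e^{-\kappa }\). Combining (80), (81) and (83),

$$\begin{aligned} \left\| \rho - \omega _\beta \right\| _{W_1}&\le N\sqrt{2n\,S(\rho \Vert \omega _\beta )} + e^{-\kappa }\left\| \rho -\omega _\beta \right\| _{W_1}\nonumber \\ \Rightarrow \quad \left\| \rho - \omega _\beta \right\| _{W_1}^2&\le \frac{2n\,N^2}{\left( 1-e^{-\kappa }\right) ^2}\,S(\rho \Vert \omega _\beta )\,, \end{aligned}$$

which is (79).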

It remains to prove that (78) is satisfied at high enough temperature.

Proposition 9

There exists an inverse temperature \(\beta _c>0\) such that for all \(\beta <\beta _c\), (78) holds for some constant \(\kappa (\beta )>0\). In particular, whenever \(N>1\), one can choose

$$\begin{aligned} \beta _c=(5N\max _{A\in E}\Vert h_A\Vert _\infty )^{-1}W\Big (\frac{1}{16d^3}\Big )\,, \end{aligned}$$
(85)

where W denotes the Lambert W function, defined as the inverse of \(x\mapsto xe^x\).

Proof

We have

$$\begin{aligned} \left\| \Psi \right\| _{W_1\rightarrow W_1} = \max _{\Delta \in \mathcal {O}_V^T}\frac{\left\| \Psi (\Delta )\right\| _{W_1}}{\left\| \Delta \right\| _{W_1}}\,. \end{aligned}$$
(86)

Any \(\Delta \in \mathcal {O}_V^T\) can be expressed as [14, Sect. III]

$$\begin{aligned} \Delta = \sum _{v\in V}\Delta _v \end{aligned}$$
(87)

such that for any \(v\in V\), \(\Delta _v\in \mathcal {O}_V^T\) satisfies \({\mathrm {Tr}}_v\Delta _v=0\) and

$$\begin{aligned} \left\| \Delta \right\| _{W_1} = \sum _{v\in V}\left\| \Delta _v\right\| _{W_1} = \frac{1}{2}\sum _{v\in V}\left\| \Delta _v\right\| _1\,. \end{aligned}$$
(88)

Therefore, we have

$$\begin{aligned} \left\| \Psi \right\| _{W_1\rightarrow W_1} = \max _{v\in V}\max \left\{ \left\| \Psi (\Delta _v)\right\| _{W_1}:\Delta _v\in \mathcal {O}_V^T,\,{\mathrm {Tr}}_v\Delta _v=0,\,\left\| \Delta _v\right\| _1=2\right\} \,. \end{aligned}$$
(89)

We have

$$\begin{aligned} \left\| \Psi (\Delta _v)\right\| _{W_1}&\le \left\| \Psi (\Delta _v) - \frac{{\mathbb {I}}_v}{d}\otimes {\mathrm {Tr}}_v\Psi (\Delta _v)\right\| _{W_1} + \left\| \frac{{\mathbb {I}}_v}{d}\otimes {\mathrm {Tr}}_v\Psi (\Delta _v)\right\| _{W_1} \nonumber \\&= \frac{1}{2}\left\| \Psi (\Delta _v) - \frac{{\mathbb {I}}_v}{d}\otimes {\mathrm {Tr}}_v\Psi (\Delta _v)\right\| _1 + \left\| {\mathrm {Tr}}_v\Psi (\Delta _v)\right\| _{W_1}\nonumber \\&\le \frac{1}{2}\left\| \Psi (\Delta _v)\right\| _1 + \frac{1}{2}\left\| {\mathrm {Tr}}_v\Psi (\Delta _v)\right\| _1 + \left\| {\mathrm {Tr}}_v\Psi (\Delta _v)\right\| _{W_1}\,, \end{aligned}$$
(90)

where the equality follows from Propositions 2 and 4. Since \({\mathrm {Tr}}_v\Delta _v=0\), we have \(\Psi _v(\Delta _v) = 0\), and

$$\begin{aligned} \frac{1}{2}\left\| \Psi (\Delta _v)\right\| _1 \le \frac{1}{2n}\sum _{w\in V{\setminus } v}\left\| \Psi _w(\Delta _v)\right\| _1 \le 1 - \frac{1}{n}\,. \end{aligned}$$
(91)
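The count behind (91) can be made explicit. The sketch below uses only the definition (77), the vanishing of the term \(w=v\), the contractivity of the trace norm under the quantum channels \(\Psi _w\), and the normalization \(\left\| \Delta _v\right\| _1=2\) from (89):

$$\begin{aligned} \frac{1}{2}\left\| \Psi (\Delta _v)\right\| _1 = \frac{1}{2n}\Big \Vert \sum _{w\in V{\setminus } v}\Psi _w(\Delta _v)\Big \Vert _1 \le \frac{1}{2n}\sum _{w\in V{\setminus } v}\left\| \Delta _v\right\| _1 = \frac{2\left( n-1\right) }{2n} = 1-\frac{1}{n}\,. \end{aligned}$$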

For any \(w\in V{\setminus } N_v\), we have

$$\begin{aligned} {\mathrm {Tr}}_v\Psi _w(\Delta _v) = \Psi _w({\mathrm {Tr}}_v\Delta _v) = 0\,. \end{aligned}$$
(92)

Then,

$$\begin{aligned} {\mathrm {Tr}}_v\Psi (\Delta _v) = \frac{1}{n}\sum _{w\in N_v{\setminus } v}{\mathrm {Tr}}_v\Psi _w(\Delta _v)\,. \end{aligned}$$
(93)

We have for any \(w\in N_v{\setminus } v\), recalling that \(v\in N_w\),

$$\begin{aligned} {\mathrm {Tr}}_{N_w{\setminus } v}{\mathrm {Tr}}_v\Psi _w(\Delta _v) = {\mathrm {Tr}}_{N_w}\Psi _w(\Delta _v) = {\mathrm {Tr}}_{N_w}\Delta _v = 0\,, \end{aligned}$$
(94)

therefore,

$$\begin{aligned} \left\| {\mathrm {Tr}}_v\Psi _w(\Delta _v)\right\| _{W_1} \le \left( N-1\right) \left\| {\mathrm {Tr}}_v\Psi _w(\Delta _v)\right\| _1\,, \end{aligned}$$
(95)

and

$$\begin{aligned} \left\| {\mathrm {Tr}}_v\Psi (\Delta _v)\right\| _{W_1} \le \frac{N-1}{n}\sum _{w\in N_v{\setminus } v}\left\| {\mathrm {Tr}}_v\Psi _w(\Delta _v)\right\| _1\,. \end{aligned}$$
(96)

Moreover,

$$\begin{aligned} \left\| {\mathrm {Tr}}_v\Psi (\Delta _v)\right\| _1 \le \frac{1}{n}\sum _{w\in N_v{\setminus } v}\left\| {\mathrm {Tr}}_v\Psi _w(\Delta _v)\right\| _1\,. \end{aligned}$$
(97)

Putting together (90), (91), (96) and (97), we get

$$\begin{aligned} \left\| \Psi (\Delta _v)\right\| _{W_1}&\le 1 - \frac{1}{n} + \frac{N-\frac{1}{2}}{n}\sum _{w\in N_v{\setminus } v}\left\| {\mathrm {Tr}}_v\Psi _w(\Delta _v)\right\| _1\nonumber \\&\le 1 - \frac{1}{n} + \frac{N-\frac{1}{2}}{n}\sum _{w\in N_v{\setminus } v}\left( \left\| \omega _w\otimes {\mathrm {Tr}}_{vw}\Delta _v\right\| _1 + 2\left\| \Psi _w - \omega _w\otimes {\mathrm {Tr}}_w\right\| _\diamond \right) \nonumber \\&= 1 - \frac{1}{n} + \frac{2N-1}{n}\sum _{w\in N_v{\setminus } v}\left\| \Psi _w - \omega _w\otimes {\mathrm {Tr}}_w\right\| _\diamond \,, \end{aligned}$$
(98)

where \(\omega _w\otimes {\mathrm {Tr}}_w\) is the quantum channel that replaces the state of the site w with \(\omega _w\). We then have

$$\begin{aligned} \left\| \Psi \right\| _{W_1\rightarrow W_1} \le 1 - \frac{1}{n} + \frac{2N-1}{n}\sum _{w\in N_v{\setminus } v}\left\| \Psi _w - \omega _w\otimes {\mathrm {Tr}}_w\right\| _\diamond \,. \end{aligned}$$
(99)
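Two remarks may help in reading (98) and (99); they are a sketch and read the right-hand side of (99) as a maximum over \(v\in V\). First, the equality in (98) uses \({\mathrm {Tr}}_{vw}\Delta _v={\mathrm {Tr}}_w{\mathrm {Tr}}_v\Delta _v=0\), so that the first term in the bracket vanishes and \(2\left( N-\frac{1}{2}\right) =2N-1\). Second, comparing (99) with (78) shows that the contraction holds with

$$\begin{aligned} \kappa := 1-\left( 2N-1\right) \max _{v\in V}\sum _{w\in N_v{\setminus } v}\left\| \Psi _w - \omega _w\otimes {\mathrm {Tr}}_w\right\| _\diamond \,, \end{aligned}$$

which is positive as soon as the sum of diamond norms is smaller than \(\frac{1}{2N-1}\) for every \(v\in V\).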

We have

$$\begin{aligned}&\left\| \Psi _w - \omega _w\otimes {\mathrm {Tr}}_w\right\| _\diamond \nonumber \\&\quad \le \int _{{\mathbb {R}}}\left\| \omega ^\frac{1-it}{2}\,\omega _{w^c}^\frac{it-1}{2}\left( {\mathbb {I}}_w\otimes {\mathrm {Tr}}_w\left[ \cdot \right] \right) \omega _{w^c}^{-\frac{it+1}{2}}\,\omega ^\frac{1+it}{2} - \omega _w\otimes {\mathrm {Tr}}_w\left[ \cdot \right] \right\| _\diamond \mathrm{d}\mu _0(t)\,. \nonumber \\ \end{aligned}$$
(100)

Since the Hamiltonian terms \(h_A\) commute we have that, given \(H_v:=\sum _{A\ni v}h_A\),

$$\begin{aligned} \omega ^\frac{1-it}{2}\,\omega _{v^c}^\frac{it-1}{2}=e^{-\beta \frac{1-it}{2}H_v}\Big (\mathrm {Tr}_v\big [e^{-\beta H_v}\big ]\Big )^{\frac{it-1}{2}}\,. \end{aligned}$$
(101)

Now,

$$\begin{aligned} \Big \Vert \omega ^{\frac{1-it}{2}}\omega _{v^c}^{\frac{it-1}{2}}-d^{\frac{it-1}{2}}{\mathbb {I}}\Big \Vert _\infty&\le \Big \Vert e^{-\beta \frac{1-it}{2}H_v}-{\mathbb {I}}\Big \Vert _\infty \,\Big \Vert \Big (\mathrm {Tr}_v\big [e^{-\beta H_v}\big ]\Big )^{\frac{it-1}{2}}\Big \Vert _\infty \nonumber \\&\quad + \,\Big \Vert \Big (\mathrm {Tr}_v\big [e^{-\beta H_v}\big ]\Big )^{\frac{it-1}{2}}-d^{\frac{it-1}{2}}{\mathbb {I}}\Big \Vert _\infty \nonumber \\&\overset{(1)}{\le } \beta \,\frac{\sqrt{1+t^2}}{2}\Vert H_v\Vert _\infty \,e^{\beta \frac{\sqrt{1+t^2}}{2}\Vert H_v\Vert _\infty }d^{-\frac{1}{2}}\,e^{\frac{\beta }{2}\Vert H_v\Vert _\infty }\nonumber \\&\quad +\frac{\sqrt{1+t^2}}{2}\,M^{1+\frac{\sqrt{1+t^2}}{2}}\,d\Big \Vert d^{-1}\mathrm {Tr}_v\big [e^{-\beta H_v}\big ]-{\mathbb {I}}\Big \Vert _\infty \nonumber \\&\le {\beta \,\sqrt{1+t^2}\Vert H_v\Vert _\infty }\,d^{1+\frac{\sqrt{1+t^2}}{2}}e^{\beta \Vert H_v\Vert _\infty \big (2+\frac{\sqrt{1+t^2}}{2}\big )}\nonumber \\&\equiv f_v(\beta ,t)\,. \end{aligned}$$
(102)

Inequality (1) above follows from the operator convexity of \(x\mapsto x^{-\frac{1}{2}}\) as well as Lemma 6, where \(M:=\max \{\Vert \mathrm {Tr}_v\big [e^{-\beta H_v}\big ]\Vert _\infty ,\Vert \mathrm {Tr}_v\big [e^{-\beta H_v}\big ]^{-1}\Vert _\infty ,d\}\le d\,e^{\beta \Vert H_v\Vert _\infty }\). Moreover,

$$\begin{aligned} e^{-2\beta \Vert H_v\Vert _\infty }d^{-1}{\mathbb {I}}\le \omega _v\le e^{2\beta \Vert H_v\Vert _\infty }d^{-1}{\mathbb {I}}\quad \Rightarrow \quad \Vert \omega _v-d^{-1}{\mathbb {I}}\Vert _1 \le 2\beta \Vert H_v\Vert _\infty \,e^{2\beta \Vert H_v\Vert _\infty }\,. \end{aligned}$$
(103)
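The implication in (103) can be verified eigenvalue by eigenvalue. In the sketch below, \(x:=2\beta \Vert H_v\Vert _\infty \ge 0\) and \(\lambda _1,\ldots ,\lambda _d\) denote the eigenvalues of \(\omega _v\) (symbols introduced here only for illustration, assuming that \(\omega _v\) acts on a single site of dimension d). The operator inequalities give \(e^{-x}/d\le \lambda _j\le e^{x}/d\), and using \(1-e^{-x}\le e^{x}-1\le x\,e^{x}\),

$$\begin{aligned} \Vert \omega _v-d^{-1}{\mathbb {I}}\Vert _1 = \sum _{j=1}^d\left| \lambda _j-\frac{1}{d}\right| \le d\cdot \frac{e^{x}-1}{d} \le x\,e^{x} = 2\beta \Vert H_v\Vert _\infty \,e^{2\beta \Vert H_v\Vert _\infty }\,. \end{aligned}$$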

Therefore,

$$\begin{aligned}&\left\| \omega ^\frac{1-it}{2}\,\omega _{v^c}^\frac{it-1}{2}\left( {\mathbb {I}}_v\otimes {\mathrm {Tr}}_v\left[ \cdot \right] \right) \omega _{v^c}^{-\frac{it+1}{2}}\,\omega ^\frac{1+it}{2} - \omega _v\otimes {\mathrm {Tr}}_v\left[ \cdot \right] \right\| _\diamond \nonumber \\&\qquad \qquad \le d^{\frac{1}{2}}\big (e^{\beta \Vert H_v\Vert _\infty }+1\big )\,f_v(\beta ,t)+\Vert d^{-1}{\mathbb {I}}-\omega _v\Vert _\infty \nonumber \\&\qquad \qquad \le d^{\frac{1}{2}}\big (e^{\beta \Vert H_v\Vert _\infty }+1\big )\,f_v(\beta ,t)+ 2\beta \Vert H_v\Vert _\infty \,e^{2\beta \Vert H_v\Vert _\infty }\,, \end{aligned}$$
(104)

and the integrand in (100) tends to zero pointwise for \(\beta \rightarrow 0\). On the other hand, we have for any \(t\in {\mathbb {R}}\)

$$\begin{aligned}&\left\| \omega ^\frac{1-it}{2}\,\omega _{v^c}^\frac{it-1}{2}\left( {\mathbb {I}}_v\otimes {\mathrm {Tr}}_v\left[ \cdot \right] \right) \omega _{v^c}^{-\frac{it+1}{2}}\,\omega ^\frac{1+it}{2} - \omega _v\otimes {\mathrm {Tr}}_v\left[ \cdot \right] \right\| _\diamond \nonumber \\&\le \left\| \omega ^\frac{1-it}{2}\,\omega _{v^c}^\frac{it-1}{2}\left( {\mathbb {I}}_v\otimes {\mathrm {Tr}}_v\left[ \cdot \right] \right) \omega _{v^c}^{-\frac{it+1}{2}}\,\omega ^\frac{1+it}{2}\right\| _\diamond + \left\| \omega _v\otimes {\mathrm {Tr}}_v\left[ \cdot \right] \right\| _\diamond \nonumber \\&\le 2\,, \end{aligned}$$
(105)

therefore the integrand in (100) is uniformly bounded. Then, we get for all \(t\in {\mathbb {R}}_+\) that

$$\begin{aligned} \left\| \Psi _v - \omega _v\otimes {\mathrm {Tr}}_v\right\| _\diamond \le d^{\frac{1}{2}}\big (e^{\beta \Vert H_v\Vert _\infty }+1\big )\,f_v(\beta ,t)+ 2\beta \Vert H_v\Vert _\infty \,e^{2\beta \Vert H_v\Vert _\infty } +2\mu _0([-t,t]^c) \end{aligned}$$
(106)

Therefore, for any \(0<\kappa <1\) there exists \(\beta (\kappa )>0\) such that condition (78) is satisfied for all \(0 \le \beta \le \beta (\kappa )\). More precisely, in view of (106) and (98), it is sufficient that

$$\begin{aligned} 4 \, {\beta \,\sqrt{1+t^2} C}\,d^{\frac{3+\sqrt{1+t^2}}{2}}e^{\beta C\big (3+\frac{\sqrt{1+t^2}}{2}\big )} +2\mu _0([-t,t]^c)\le \frac{N-1}{2N-1} \end{aligned}$$
(107)

where \(C:=\sup _v\Vert H_v\Vert _\infty \). Moreover, it is clear that \(\mu _0([-t,t]^c)\le 2e^{-\pi t}\). The result follows after choosing t so that the exponentially decaying term \(4e^{-\pi t}\) accounts for at most half the upper bound and solving (107) for \(\beta _c\), up to some numerical simplifications. \(\square \)
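The tail estimate for \(\mu _0\) used above can be obtained as follows; this is a short sketch based only on the explicit density given after (75). For \(s\ge 0\) we have \(\cosh (\pi s)+1\ge \frac{1}{2}e^{\pi s}\), and the density is even, so for any \(t\ge 0\)

$$\begin{aligned} \mu _0([-t,t]^c) = 2\int _t^\infty \frac{\pi }{2}\,\frac{\mathrm{d}s}{\cosh (\pi s)+1} \le 2\int _t^\infty \pi \,e^{-\pi s}\,\mathrm{d}s = 2\,e^{-\pi t}\,. \end{aligned}$$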

Remark 5

The lower bound (85) can be compared to that in the classical setting [32, Example 17] (see also [34]): there, the author showed that for a Hamiltonian of the form \(U(S):=-\sum _{x\sim y\in G}S(x)S(y)-H\sum _xS(x)\), where \(S(x)\in \{-1,1\}\) denotes the spin configuration at the site x of a graph G, i.e., \(d=2\),

$$\begin{aligned} \beta _c\ge \frac{1}{2}\,\ln \Big (\frac{N+1}{N-1}\Big )\sim _{N\rightarrow \infty }\frac{1}{N}\,, \end{aligned}$$

which shows asymptotic optimality of our result, up to numerical multiplicative constants. For comparison, the exact value of \(\beta _c\) for the Ising model on the regular infinite tree with degree N is known to be equal to \(\frac{1}{2}\ln \big (\frac{N}{N-2}\big )\).
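The asymptotics quoted in this remark can be made explicit with the identity \(\frac{1}{2}\ln \frac{1+x}{1-x}={\text {artanh}}(x)\); the following expansion is a sketch valid for \(N>2\):

$$\begin{aligned} \frac{1}{2}\,\ln \Big (\frac{N+1}{N-1}\Big )&= {\text {artanh}}\Big (\frac{1}{N}\Big ) = \frac{1}{N}+\frac{1}{3N^3}+O\big (N^{-5}\big )\,,\nonumber \\ \frac{1}{2}\,\ln \Big (\frac{N}{N-2}\Big )&= {\text {artanh}}\Big (\frac{1}{N-1}\Big ) = \frac{1}{N-1}+O\big (N^{-3}\big )\,, \end{aligned}$$

so both the classical lower bound and the exact value on the tree behave as \(\frac{1}{N}\) for large N.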

4.1 Auxiliary Lemma

Lemma 6

For any positive definite matrices A, B and all \(z\in {\mathbb {C}}\),

$$\begin{aligned} \Vert A^z-B^z\Vert _\infty \le |z|\,\max \{\Vert A\Vert _\infty ,\Vert A^{-1}\Vert _\infty ,\Vert B\Vert _\infty ,\Vert B^{-1}\Vert _\infty \}^{1+|{\text {Re}}(z)|}\,\Vert A-B\Vert _\infty \,. \end{aligned}$$
(108)

Proof

It suffices to use a linear interpolation between A and B: \(A(s):=sA+(1-s)B\). We have

$$\begin{aligned} A^{z}-B^z&=\int _{0}^1\,\frac{\mathrm{d}}{\mathrm{d}s}A(s)^{z}\,\mathrm{d}s\nonumber \\&=z\,\iint _{[0,1]^2}\,A(s)^{zu}\,\frac{\mathrm{d}}{\mathrm{d}s}\ln (A(s))\,A(s)^{z(1-u)}\,\mathrm{d}s\mathrm{d}u\nonumber \\&=z\iint _{[0,1]^2}\int _0^\infty \,A(s)^{zu}(A(s)+v)^{-1}\,(A-B)(A(s)+v)^{-1}\nonumber \\&\quad A(s)^{z(1-u)}\,\mathrm{d}v\mathrm{d}u\mathrm{d}s\,. \end{aligned}$$
(109)

Then,

$$\begin{aligned} \Vert A^{z}-B^z\Vert _\infty&\le |z|\,\int _{0}^1\int _0^\infty \,\Vert A(s)^{{\text {Re}}(z)}\Vert _\infty \,\Vert (A(s)+v)^{-1}\Vert _\infty ^2\,\Vert A-B\Vert _\infty \,\mathrm{d}v\mathrm{d}s\nonumber \\&\le |z|\,\Vert A-B\Vert _\infty \,M(z)\,\int _0^1\Vert A(s)^{-1}\Vert _\infty \,\mathrm{d}s\nonumber \\&\le |z|\,M(z)\cdot M'\,\Vert A-B\Vert _\infty \,. \end{aligned}$$
(110)

Here, the last inequality uses the operator convexity of \(x\mapsto x^{-1}\), with \(M(z):=\max _{s\in [0,1]}\Vert (sA+(1-s)B)^{{\text {Re}}(z)}\Vert _\infty \) and \(M':=\max \{\Vert A^{-1}\Vert _\infty ,\Vert B^{-1}\Vert _\infty \}\). The result follows from further simple estimates on \(M(z)\). \(\square \)
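The intermediate steps of (110) can be sketched as follows, where \(\lambda (s)>0\) denotes the smallest eigenvalue of \(A(s)\) (a symbol introduced here only for illustration). Imaginary powers of \(A(s)\) are unitary, so \(\Vert A(s)^{zu}\Vert _\infty \,\Vert A(s)^{z(1-u)}\Vert _\infty =\Vert A(s)^{{\text {Re}}(z)}\Vert _\infty \) for all \(u\in [0,1]\). Moreover,

$$\begin{aligned} \int _0^\infty \Vert (A(s)+v)^{-1}\Vert _\infty ^2\,\mathrm{d}v&= \int _0^\infty \frac{\mathrm{d}v}{(\lambda (s)+v)^2} = \frac{1}{\lambda (s)} = \Vert A(s)^{-1}\Vert _\infty \,,\nonumber \\ A(s)^{-1}\le s\,A^{-1}+(1-s)\,B^{-1} \quad&\Rightarrow \quad \Vert A(s)^{-1}\Vert _\infty \le s\,\Vert A^{-1}\Vert _\infty +(1-s)\,\Vert B^{-1}\Vert _\infty \le M'\,. \end{aligned}$$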

5 Modified Logarithmic Sobolev Inequalities

In this section, we pursue a different approach to prove transportation cost inequalities for \(W_1\), namely through the existence of a non-commutative entropic inequality known as the modified logarithmic Sobolev inequality [35, 36]. In order to introduce our main result, we need a variation of the Lipschitz constant that was introduced in [13]. This definition departs from a noncommutative differential structure, which we define below (see [37]):

Definition 1

(Differential structure) A set of operators \(L_k \in \mathcal {O}_V\) and constants \(\omega _k\in {\mathbb {R}}\) define a differential structure \(\{L_k,\omega _k\}_{k\in \mathcal {K}}\) for a full rank state \(\omega \in \mathcal {S}_V\) if

  1. \(\{L_k\}_{k\in \mathcal {K}}=\{L_k^{\dagger }\}_{k\in \mathcal {K}}\);

  2. \(\{L_k\}_{k\in \mathcal {K}}\) consists of eigenvectors of the modular operator \(\Delta _\omega (X):=\omega X\omega ^{-1}\) with

    $$\begin{aligned} \Delta _\omega (L_k)=e^{-\omega _k}L_k\,. \end{aligned}$$
    (111)

Such a differential structure can be used to provide the set of matrices with a Lipschitz constant that is tailored to \(\omega \), see, e.g., [13, 37] for more on this. In order to distinguish that constant from \(\Vert .\Vert _L\), we refer to it as the differential Lipschitz constant and denote it by \({\left| \left| \left| \nabla X \right| \right| \right| }\). It is defined as:

$$\begin{aligned} {\left| \left| \left| \nabla X \right| \right| \right| }:= \left( \sum _{k\in \mathcal {K}} (e^{-\omega _k/2}+e^{\omega _k/2})\Vert \partial _kX\Vert _{\infty }^2\right) ^{1/2}\,, \end{aligned}$$
(112)

where \(\partial _k X\equiv [L_k,X]\). For ease of notation, we will denote the differential structure by the couple \((\nabla ,\omega )\). The notion of a differential structure is also intimately connected to that of the generator of a quantum dynamical semigroup converging to \(\omega \) [37], and properties of that semigroup immediately translate to properties of the metric. This is because the differential structure can be used to define an operator that behaves in an analogous way to the Laplacian on a smooth manifold, which in turn induces a heat semigroup. We refer to [13, 37] for more details on this connection and interpretation.
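As a simple illustration, which is not taken from the references and is meant only as a sanity check of Definition 1, consider the maximally mixed state \(\omega ={\mathbb {I}}/d^n\). Then the modular operator is the identity, every operator is an eigenvector with \(\omega _k=0\), and any family \(\{L_k\}_{k\in \mathcal {K}}\) closed under taking adjoints defines a differential structure. In this case, (112) reduces to

$$\begin{aligned} \Delta _\omega (L_k)=L_k=e^{0}L_k\,,\qquad {\left| \left| \left| \nabla X \right| \right| \right| } = \left( 2\sum _{k\in \mathcal {K}}\Vert [L_k,X]\Vert _{\infty }^2\right) ^{1/2}\,. \end{aligned}$$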

When the state \(\omega \) is a quantum Gibbs state corresponding to a local, commuting Hamiltonian associated with a uniformly bounded interaction defined on a lattice \(V\subset \subset {\mathbb {Z}}^D\), the differential structure \((\nabla ,\omega )\) can be chosen as local. This means that the operators \(L_k\equiv L_{i,\alpha }\) are indexed by a site \(i\in V \) and an index \(\alpha \) of a set \(\Gamma \) whose cardinality only depends on the local dimension d and the locality \(\kappa \) of \(\omega \). Moreover, we assume that the operators \(L_{i,\alpha }\) are supported on a neighborhood \(\mathcal {N}_i\) of site i of diameter \(r\equiv r(\kappa )\) and the corresponding constants \(\omega _{i,\alpha }\) are uniformly bounded: \(\sup _{i\in {\mathbb {Z}}^D}\max _{\alpha \in \Gamma }|\omega _{i,\alpha }|\equiv \Omega <\infty \). The definition in Eq. (112) yields a metric on states by duality:

$$\begin{aligned} W_{\nabla }(\rho ,\omega ):=\sup \limits _{X=X^\dagger ,\, {\left| \left| \left| \nabla X \right| \right| \right| }\le 1}\left| {\text {Tr}}\left( X(\rho -\omega )\right) \right| . \end{aligned}$$

Proposition 10

Given the Gibbs state \(\omega \) of a local commuting Hamiltonian H on \(V\subset \subset {\mathbb {Z}}^D\) with \(|V|=n\) and associated local differential structure \((\nabla ,\omega )\), the following bound holds for all \(\rho \in \mathcal {S}_V\):

$$\begin{aligned} \left\| \rho -\omega \right\| _{W_1}\le C\,\sqrt{n}\,W_{\nabla }(\rho ,\omega )\,, \end{aligned}$$

for some constant C independent of n.

Proof

By duality, it is equivalent to prove that for all \(H\in \mathcal {O}_V\)

$$\begin{aligned} {\left| \left| \left| \nabla H \right| \right| \right| }\le C\,\sqrt{n}\,\Vert H\Vert _L\,. \end{aligned}$$
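The direction of the duality argument that is needed here can be sketched as follows, assuming (as in the earlier sections) that \(\left\| \cdot \right\| _{W_1}\) is dual to the Lipschitz constant, i.e., \(\left\| \rho -\omega \right\| _{W_1}=\sup \{{\text {Tr}}(H(\rho -\omega )): H=H^\dagger ,\,\Vert H\Vert _L\le 1\}\). If the inequality above holds, then for any such H the rescaled operator \(X:=H/(C\sqrt{n})\) satisfies \({\left| \left| \left| \nabla X \right| \right| \right| }\le 1\), and therefore

$$\begin{aligned} {\text {Tr}}\left( H(\rho -\omega )\right) = C\,\sqrt{n}\;{\text {Tr}}\left( X(\rho -\omega )\right) \le C\,\sqrt{n}\,W_{\nabla }(\rho ,\omega )\,. \end{aligned}$$

Taking the supremum over H gives the bound of Proposition 10.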

First, we have

$$\begin{aligned} {\left| \left| \left| \nabla H \right| \right| \right| }&=\Big (\sum _{i\in V}\sum _{\alpha \in \Gamma }(e^{-\omega _{i,\alpha }/2}+e^{\omega _{i,\alpha }/2})\,\Vert [L_{i,\alpha },H]\Vert _\infty ^2\Big )^{\frac{1}{2}} \end{aligned}$$
(113)
$$\begin{aligned}&\le \sqrt{n|\Gamma |}\,\sqrt{2\,e^{\Omega /2}}\,\max _{i\in V}\,\max _{\alpha \in \Gamma }\,\Vert [L_{i,\alpha },H]\Vert _\infty \,. \end{aligned}$$
(114)

Now, since for each pair \((i,\alpha )\), \(L_{i,\alpha } \) is supported on a neighborhood \(\mathcal {N}_i\) of site \(i\in V\),

$$\begin{aligned} \Vert [L_{i,\alpha },H]\Vert _\infty&= \big \Vert [L_{i,\alpha },H-\frac{{\mathbb {I}}_{\mathcal {N}_i}}{d^{|\mathcal {N}_i|}}\otimes {\mathrm {Tr}}_{\mathcal {N}_i}(H)]\big \Vert _\infty \end{aligned}$$
(115)
$$\begin{aligned}&\le 2\,\Vert L_{i,\alpha }\Vert _\infty \,\big \Vert H-\frac{{\mathbb {I}}_{\mathcal {N}_i}}{d^{|\mathcal {N}_i|}}\otimes {\mathrm {Tr}}_{\mathcal {N}_i}(H)\big \Vert _\infty \,. \end{aligned}$$
(116)
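Both steps (115) and (116) are elementary; the following sketch introduces the shorthand \(Y:=H-\frac{{\mathbb {I}}_{\mathcal {N}_i}}{d^{|\mathcal {N}_i|}}\otimes {\mathrm {Tr}}_{\mathcal {N}_i}(H)\) only for illustration. The subtracted operator acts as the identity on \(\mathcal {N}_i\), hence it commutes with \(L_{i,\alpha }\), and the commutator is then bounded with the triangle inequality and the submultiplicativity of the operator norm:

$$\begin{aligned} \Big [L_{i,\alpha },\frac{{\mathbb {I}}_{\mathcal {N}_i}}{d^{|\mathcal {N}_i|}}\otimes {\mathrm {Tr}}_{\mathcal {N}_i}(H)\Big ]=0\,,\qquad \Vert [L_{i,\alpha },Y]\Vert _\infty \le \Vert L_{i,\alpha }Y\Vert _\infty +\Vert Y L_{i,\alpha }\Vert _\infty \le 2\,\Vert L_{i,\alpha }\Vert _\infty \,\Vert Y\Vert _\infty \,. \end{aligned}$$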

Next, by a telescopic sum argument, we can further control the last infinity norm on the right-hand side above as follows: given an arbitrary ordering of the region \(\mathcal {N}_i\),

$$\begin{aligned}&\big \Vert H-\frac{{\mathbb {I}}_{\mathcal {N}_i}}{d^{|\mathcal {N}_i|}}\otimes {\mathrm {Tr}}_{\mathcal {N}_i}(H)\big \Vert _\infty \nonumber \\&\quad \overset{(1)}{\le } \sum _{j=1}^{|\mathcal {N}_i|} \Big \Vert \Big (\frac{{\mathbb {I}}_{1\ldots j-1}}{d^{j-1}}\otimes {\mathrm {Tr}}_{1\ldots j-1}- \frac{{\mathbb {I}}_{1\ldots j}}{d^{j}}\otimes {\mathrm {Tr}}_{1\ldots j}\Big )(H)\Big \Vert _\infty \end{aligned}$$
(117)
$$\begin{aligned}&\quad \overset{(2)}{\le } |\mathcal {N}_i| \max _{j\in \mathcal {N}_i}\,\Vert H-\frac{{\mathbb {I}}_j}{d}\otimes {\mathrm {Tr}}_{j}(H)\Vert _\infty \,, \end{aligned}$$
(118)

where (1) follows from the triangle inequality, whereas (2) follows from the fact that the maps \(\frac{{\mathbb {I}}_{1\ldots j-1}}{d^{j-1}}\otimes {\mathrm {Tr}}_{1\ldots j-1}\) are completely positive and unital, and therefore contract the operator norm. All in all, we have derived the following bound on the differential Lipschitz constant of H:

$$\begin{aligned} {\left| \left| \left| \nabla {H} \right| \right| \right| }&\le \sqrt{n|\Gamma |}\,2\sqrt{2\,e^{\Omega /2}}\,\max _{i\in V}\,\max _{\alpha \in \Gamma }\,\Vert L_{i,\alpha }\Vert _\infty \, |\mathcal {N}_i|\,\max _{j\in \mathcal {N}_i}\,\Vert H-\frac{{\mathbb {I}}_j}{d}\otimes {\mathrm {Tr}}_{j}(H)\Vert _\infty \end{aligned}$$
(119)
$$\begin{aligned}&\overset{(3)}{\le } \frac{d^2-1}{d^2}\sqrt{n|\Gamma |}\,2\sqrt{2\,e^{\Omega /2}}\,\max _{i\in V}\,\max _{\alpha \in \Gamma }\,\Vert L_{i,\alpha }\Vert _\infty \, |\mathcal {N}_i|\,\Vert H\Vert _L\end{aligned}$$
(120)
$$\begin{aligned}&\equiv C\,\sqrt{n}\,\Vert H\Vert _L\,, \end{aligned}$$
(121)

for some constant C independent of n, and where (3) follows from Proposition 6. \(\square \)

The advantage of \(W_1\) as compared to \(W_{\nabla }\) is that it does not depend on the state \(\omega \). On the other hand, the bound derived in Proposition 10 can be used in conjunction with recently proved transportation cost inequalities for \(W_{\nabla }\) through the proof of the existence of a modified logarithmic Sobolev inequality in order to get analogous inequalities for \(W_1\) (see [38] for more details):

Theorem 5

Let \(\omega \) be the Gibbs state of a local commuting Hamiltonian H at inverse temperature \(\beta \) on \(V\subset \subset {\mathbb {Z}}^D\). Then, there exists a critical inverse temperature \(\beta _c\) such that \(C(\omega _\beta )\le C\,n\) for some constant C independent of \(n=|V|\) whenever \(\beta <\beta _c\) if either of the two conditions below is satisfied:

\({\text {(i)}}\):

H is classical;

\({\text {(ii)}}\):

H is a nearest neighbor Hamiltonian.

Moreover, we can drop the assumption of 2-locality in the 1D case, where \(\beta _c=\infty \), at the cost of getting a slightly worsened constant \(C(\omega _\beta )\le Cn{\text {polylog}}(n)\), so that we recover the result of Theorem 2.

Proof

In [19, 20, 38], the existence of local differential structures associated with \(\omega \) that satisfy the so-called modified logarithmic Sobolev inequality was proved under the conditions of the theorem. Moreover, the modified logarithmic Sobolev inequality implies the transportation cost inequality for the differential Wasserstein distance [13]: there exists a constant \(C'\) independent of n such that

$$\begin{aligned} W_{\nabla }(\rho ,\omega )\le \sqrt{C'\,S(\rho \Vert \omega )} \end{aligned}$$
(122)

for all states \(\rho \in \mathcal {S}_V\). This fact in conjunction with Proposition 10 allows us to conclude. \(\square \)

6 Local Indistinguishability

In this section, we provide a transportation cost inequality under a condition of local indistinguishability [39,40,41]. In the classical setting, this condition constitutes a weakening of the Dobrushin–Shlosman mixing condition [8] recently considered by Marton [7]. Moreover, as opposed to the latter, our technique has the benefit of not requiring the local specifications of the state to be uniformly lower bounded by a positive number, at the cost of getting a slightly worsened constant.

6.1 Transportation Cost from Local Indistinguishability

We start by proving our general result in the quantum setting. Here, we assume that the \(n=(2m+1)^D\) qudits are arranged on a D-dimensional regular lattice \(V:=[-m,m]^D\). Before we state our main result, we need to introduce the notion of a non-commutative conditional expectation.

Definition 2

(Conditional expectations) Let \(\mathcal {N}\subseteq \mathcal {B}_V\) be a von Neumann subalgebra of \(\mathcal {B}_V\). A conditional expectation onto \(\mathcal {N}\) is a completely positive unital map \(E_\mathcal {N}^\dagger :\mathcal {B}_V\rightarrow \mathcal {N}\) satisfying

  1. (i)

    for all \(X\in \mathcal {N}\), \(E^\dagger _\mathcal {N}(X)=X\);

  2. (ii)

    for all \(a,b\in \mathcal {N},X\in \mathcal {B}_V\), \(E^\dagger _\mathcal {N}(aXb)=aE^\dagger _\mathcal {N}(X)b\).

We denote by \(E_{\mathcal {N}}\) its adjoint map with respect to the trace inner product, i.e.,

$$\begin{aligned} \mathrm {Tr}(E_{\mathcal {N}}(X)Y)=\mathrm {Tr}(XE_\mathcal {N}^\dagger (Y))\,. \end{aligned}$$

As a simple example, we consider a full-rank state \(\sigma \in \mathcal {S}_V\) and let \((e^{t\mathcal {L}})_{t\ge 0}\) be a quantum Markov semigroup. Under the following detailed balance condition, the limit \(\lim _{t\rightarrow \infty }e^{t\mathcal {L}^\dagger }=E^\dagger _{\mathcal {N}}\) is a conditional expectation onto the algebra \(\mathcal {N}\) of fixed points of the semigroup:

$$\begin{aligned} \forall X,Y\in \mathcal {B}_V,\quad \mathrm {Tr}\big ( \sigma \,X^\dagger \mathcal {L}^\dagger (Y)\big )=\mathrm {Tr}\big ( \sigma \,\mathcal {L}^\dagger (X)^\dagger Y\big )\,. \end{aligned}$$

Next, for a state \(\rho \), the relative entropy with respect to \(\mathcal {N}\) is defined as follows

$$\begin{aligned}S(\rho \Vert \mathcal {N}):=S(\rho \Vert E_{\mathcal {N}}(\rho ))=\inf _{E_{\mathcal {N}}(\sigma )=\sigma } S(\rho \Vert \sigma )\,,\end{aligned}$$

where the infimum is always attained by \(\sigma = E_{\mathcal {N}}(\rho )\). Indeed, for any \(\sigma \) satisfying \(E_{\mathcal {N}}(\sigma )=\sigma \), we have the following chain rule (see [42, Lemma 3.4])

$$\begin{aligned} S(\rho \Vert \sigma )=S(\rho \Vert E_{\mathcal {N}}(\rho ))+S(E_{\mathcal {N}}(\rho )\Vert \sigma )\,. \end{aligned}$$
(123)

Hence, the infimum is attained if and only if \(S(E_{\mathcal {N}}(\rho )\Vert \sigma )=0\).
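As a numerical sanity check of the chain rule (123), the following minimal sketch takes two qubits and the map \(\rho \mapsto \omega _1\otimes \mathrm {Tr}_1(\rho )\) for a fixed full-rank qubit state \(\omega _1\), which we assume here as a simple instance of \(E_{\mathcal {N}}\). The helper names, the random test states and the mixing with the identity (used only to keep the logarithms well conditioned) are illustrative assumptions and not part of the original argument.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_state(dim):
    """Random full-rank density matrix (illustrative test data)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    rho = rho / np.trace(rho).real
    # Mix with the maximally mixed state to stay away from rank deficiency.
    return 0.9 * rho + 0.1 * np.eye(dim) / dim

def log_psd(a):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(a)
    return (v * np.log(w)) @ v.conj().T

def rel_entropy(rho, sigma):
    """Relative entropy S(rho || sigma) in nats."""
    return np.trace(rho @ (log_psd(rho) - log_psd(sigma))).real

def trace_out_first(rho, d=2):
    """Partial trace over the first of two d-level systems."""
    return np.einsum('ikil->kl', rho.reshape(d, d, d, d))

omega_1 = random_state(2)                        # fixed reference state on qubit 1
rho = random_state(4)                            # arbitrary two-qubit state
E_rho = np.kron(omega_1, trace_out_first(rho))   # E(rho)
sigma = np.kron(omega_1, random_state(2))        # satisfies E(sigma) = sigma

lhs = rel_entropy(rho, sigma)
rhs = rel_entropy(rho, E_rho) + rel_entropy(E_rho, sigma)
print(lhs, rhs)  # the two values agree up to rounding
```

Since \(S(E(\rho )\Vert \sigma )\ge 0\), the same computation illustrates that \(S(\rho \Vert E(\rho ))\) never exceeds \(S(\rho \Vert \sigma )\) for an invariant \(\sigma \).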

Definition 3

(Local indistinguishability) Let \(\{\mathcal {N}_C\}_{C\subseteq V}\) be a set of subalgebras of \(\mathcal {B}_V\) such that \(\mathcal {B}_{{C}^c}\subset \mathcal {N}_C\) and \({\mathbf {E}}:=\{E_C\}_{C\subseteq V}\) be a set of compatible conditional expectations \(E^\dagger _C:\mathcal {B}_V\rightarrow \mathcal {N}_C\) acting non-trivially on region C, i.e., they satisfy the property that for any \(C\subseteq C'\), \(E_C\circ E_{C'}=E_{C'}\circ E_C= E_{C'}\). Then, we say that \({\mathbf {E}}\) satisfies local indistinguishability if there exists a fast decaying function \(\varphi :{\mathbb {N}}\rightarrow {\mathbb {R}}\) independent of V such that for all regions \(XYZ\subset V\) with \({\text {dist}}(X,Z)\ge \ell \), and for all states \(\rho \in \mathcal {S}_{V}\),

$$\begin{aligned} \Vert E_{YZ}\circ ( E_{XYZ}- E_{XY})(\rho )\Vert _1\le \,|XYZ|\,\varphi (\ell ) \,. \end{aligned}$$

For instance, take a product state \(\omega \in \mathcal {S}_V\) and for each region \(C\subseteq V\), denote \(E_C(\rho )=\mathrm {Tr}_C(\rho )\otimes \omega _C\). One can easily verify that the maps \(E_C\) are conditional expectations and satisfy the local indistinguishability condition with \(\varphi =0\). We are now ready to state and prove the main theorem of this section. For a strictly decreasing function \(\varphi :{\mathbb {N}}\rightarrow {\mathbb {R}}_+\) and a positive real number \(a>0\), we denote by \(\varphi ^{-1}(a):=\min \{\ell \in {\mathbb {N}}:\varphi (\ell )\le a\}\).
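The product-state example above can be checked numerically. The sketch below is an illustration under our own assumptions (two qubits labelled 0 and 1, random test states, and a hypothetical helper `E(region, rho)`); it verifies idempotence, compatibility, and the vanishing of the local indistinguishability defect for \(X=\{0\}\), \(Y=\emptyset \), \(Z=\{1\}\).

```python
import numpy as np

rng = np.random.default_rng(1)

def random_state(dim):
    """Random density matrix (illustrative test data)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def ptrace(rho, keep, d=2):
    """Reduced operator of a two-qudit operator on qudit `keep` (0 or 1)."""
    r = rho.reshape(d, d, d, d)  # indices (i0, i1, j0, j1)
    return np.einsum('ikil->kl', r) if keep == 1 else np.einsum('kili->kl', r)

omega = [random_state(2), random_state(2)]  # product state omega_0 (x) omega_1

def E(region, rho):
    """E_C(rho) = Tr_C(rho) (x) omega_C for a region C inside {0, 1}."""
    if region == (0,):
        return np.kron(omega[0], ptrace(rho, keep=1))
    if region == (1,):
        return np.kron(ptrace(rho, keep=0), omega[1])
    if region == (0, 1):
        return np.trace(rho) * np.kron(omega[0], omega[1])
    return rho  # empty region: identity map

rho = random_state(4)
idempotent = np.allclose(E((0,), E((0,), rho)), E((0,), rho))
compatible = (np.allclose(E((0,), E((0, 1), rho)), E((0, 1), rho))
              and np.allclose(E((0, 1), E((0,), rho)), E((0, 1), rho)))
# E_{YZ} o (E_{XYZ} - E_{XY}) with X = {0}, Y = {}, Z = {1}
defect = E((1,), E((0, 1), rho) - E((0,), rho))
print(idempotent, compatible, np.abs(defect).max())
```

The defect vanishes up to rounding, consistent with \(\varphi =0\) for product states.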

Theorem 6

Let \({\mathbf {E}}\) be a set of compatible conditional expectations satisfying local indistinguishability with fast decaying function \(\varphi \). Then, for all hypercubes \(V_0\subset V\) and all \(\rho \in \mathcal {S}_V\),

$$\begin{aligned} \left\| \rho -E_{V_0}(\rho )\right\| _{W_1}\le 2\sqrt{20}\,(7\varphi ^{-1}(|V_0|^{-3/2}))^D\sqrt{|V_0|\,S(\rho \Vert E_{V_0}(\rho ))}\,. \end{aligned}$$
(124)

In particular, whenever \(E_{V}(\rho )=\omega _V\in \mathcal {S}_V\) for all states \(\rho \), and assuming the exponential clustering function \(\varphi (\ell ):=\kappa e^{-\ell /\xi }\), the state \(\omega _{V}\) satisfies \({\text {TC}}(c)\) with \(c=O({n}{\text {polylog}}(n))\).
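To see where the polylogarithmic overhead comes from, the following short computation evaluates \(\varphi ^{-1}\) for the exponential clustering function. It is a sketch under the additional assumption \(\kappa \,|V_0|^{3/2}\ge 1\), so that the logarithm is non-negative:

$$\begin{aligned} \kappa e^{-\ell /\xi }\le a \;\Longleftrightarrow \; \ell \ge \xi \ln (\kappa /a)\,,\qquad \text {so that}\qquad \varphi ^{-1}\big (|V_0|^{-3/2}\big )=\Big \lceil \xi \ln \big (\kappa \,|V_0|^{3/2}\big )\Big \rceil \,. \end{aligned}$$

Taking \(V_0=V\) with \(|V|=n\) in (124) and squaring both sides then gives \(c=80\,n\,\big (7\lceil \xi \ln (\kappa \,n^{3/2})\rceil \big )^{2D}=O(n\ln ^{2D}n)\).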

Fig. 1
figure 1

Geometry of the lattice in the proof of Theorem 6 with tiling by regions \(A_+:=\cup _i A_{+,i}\), \(B_+:=\cup _i B_{+,i}\) and \(C:=\cup _i C_i\)

Proof

For the sake of clarity, we provide the proof for \(D=2\) only, although the general case follows similarly. First, we partition the hypercube \(V_0\) into regions \(A_+\), \(B_+\) and C in the same way as done in [41] (see also Fig. 1). Then, by the triangle inequality,

$$\begin{aligned}&\Vert \rho -E_{V_0}(\rho )\Vert _{W_1}\nonumber \\&\quad \le \Vert \rho -E_{A_+}E_{B_-C}(\rho )\Vert _{W_1}+\Vert E_{A_+}E_{B_-C}(\rho )-E_{V_0}(\rho )\Vert _{W_1}\,\nonumber \\&\quad \le \Vert (\mathrm {id}-E_{A_+}E_{B_+}E_{C})(\rho )\Vert _{W_1}+\Vert E_{A_+}(E_{B_+}E_{C}-E_{B_-C})(\rho )\Vert _{W_1}\nonumber \\&\qquad +\Vert (E_{A_+}E_{B_-C}-E_{V_0})(\rho )\Vert _{W_1}\, \nonumber \\&\quad \le (I)+ 2|A_+^{\max }|\,(II)+(III)\,, \end{aligned}$$
(125)

where

$$\begin{aligned} (I):=&\Vert (\mathrm {id}-E_{A_+}E_{B_+}E_{C})(\rho )\Vert _{W_1}\\ (II):=&\Vert (E_{B_+}E_{C}-E_{B_-C})(\rho )\Vert _{W_1}\\ (III):=&\Vert (E_{A_+}E_{B_-C}-E_{V_0})(\rho )\Vert _{W_1} \end{aligned}$$

and where the last bound in (125) follows from Proposition 5 with \(|A_+^{\max }|:=\max _{i}|A_{+,i}|\). Now, we control each of the norms on the right-hand side of (125) separately. First, we set \(E_{C^{(0)}}:=\mathrm {id}\) and \(C^{(i)}:=\cup _{j\le i}C_j\), given an arbitrary ordering of the connected subregions in C, and similarly for the other regions \(A_+\) and \(B_+\). Then,

$$\begin{aligned} (I)\le&\Vert (\mathrm {id}-E_{C})(\rho )\Vert _{W_1}+\Vert (E_{C}-E_{B_+}E_C)(\rho )\Vert _{W_1}\nonumber \\&\quad +\Vert (E_{B_+}E_C-E_{A_+}E_{B_+}E_C)(\rho )\Vert _{W_1} \end{aligned}$$
(126)
$$\begin{aligned} \le&|C^{\max }|\,\sum _{i\in \mathcal {I}_C}\Vert (E_{C^{(i-1)}}-E_{C^{(i)}})(\rho )\Vert _1\nonumber \\&\quad +|B_+^{\max }|\sum _{i\in \mathcal {I}_{B_+}}\Vert (E_{B_+^{(i-1)}}-E_{B_+^{(i)}})(E_{C}(\rho ))\Vert _1\nonumber \\&\quad +|A_+^{\max }|\sum _{i\in \mathcal {I}_{A_+}}\Vert (E_{A_+^{(i-1)}}-E_{A_+^{(i)}})E_{B_+}E_{C}(\rho )\Vert _1\end{aligned}$$
(127)
$$\begin{aligned} \le&|V_0^{\max }|\,\Big \{\,\sqrt{2|\mathcal {I}_C|}\,\sqrt{\sum _{i\in \mathcal {I}_C}\,S(E_{C^{(i-1)}}(\rho )\Vert E_{C^{(i)}}(\rho ))}\nonumber \\&\quad +\,\sqrt{2|\mathcal {I}_{B_+}|}\,\sqrt{\sum _{i\in \mathcal {I}_{B_+}}\,S(E_{B_+^{(i-1)}}(E_C(\rho ))\Vert E_{B_+^{(i)}}(E_C(\rho )))} \nonumber \\&\quad +\sqrt{2|\mathcal {I}_{A_+}|}\,\sqrt{\sum _{i\in \mathcal {I}_{A_+}}\,S(E_{A_+^{(i-1)}}E_{B_+}E_C(\rho )\Vert E_{A_+^{(i)}}E_{B_+}E_C(\rho ))}\Big \}\end{aligned}$$
(128)
$$\begin{aligned} \le&|V_0^{\max }|\,\sqrt{2|\mathcal {I}_{\max }|}\,\Big \{ \sqrt{S(\rho \Vert E_C(\rho ))}+\sqrt{S(E_{C}(\rho )\Vert E_{B_+}E_C(\rho ))}\nonumber \\&\quad +\sqrt{S(E_{B_+}E_C(\rho )\Vert E_{A_+}E_{B_+}E_C(\rho ))}\Big \} \end{aligned}$$
(129)

with

$$\begin{aligned}&|C^{\max }|:=\max _{i}|C_{i}|\\&|V_0^{\max }|:=\max \{|A_+^{\max }|,|B_+^{\max }|,|C^{\max }|\}\\&|\mathcal {I}_{\max }|:=\max \{|\mathcal {I}_{A_+}|, |\mathcal {I}_{B_+}| ,|\mathcal {I}_C|\}\\&\end{aligned}$$

where \(|\mathcal {I}_C|\) denotes the number of connected components in C, and similarly for the other sets. Above, (126) follows by the triangle inequality, (127) by the triangle inequality and Proposition 3, and (128) by Pinsker’s inequality as well as Jensen’s inequality for \(x\mapsto x^{\frac{1}{2}}\). Finally, (129) follows from the chain rule in (123), and the fact that the regions \(C_i\), resp. \(A_{+,i}\), resp. \(B_{+,i}\), do not overlap, so that for instance \(E_{A_{+,i}}E_{A_{+,j}}=E_{A_{+,j}}E_{A_{+,i}}=E_{A_{+,j}\cup A_{+,i}}\). Next, we control the second norm on the right-hand side of (125): using Proposition 3, we have

$$\begin{aligned} \mathrm{(II)}&\le 2|B_+|\,\Vert (E_{B_+}E_{C}-E_{B_-C})(\rho )\Vert _{1}\nonumber \\&= 2|B_+|\,\Vert (E_{B_+}E_C-E_{B_-C})(E_C-E_{B_-C})(\rho )\Vert _1\end{aligned}$$
(130)
$$\begin{aligned}&\le 2|B_+|\,\Vert E_{B_+}E_C-E_{B_-C}\Vert _{1\rightarrow 1}\,\Vert (E_C-E_{B_-C})(\rho )\Vert _1\nonumber \\&\le 2|B_+|\,\max _{\rho '\in \mathcal {S}_V}\,\Vert E_B(E_C-E_{B_-C})(\rho ')\Vert _1\,\Vert E_C(\rho )-E_{B_-C}(\rho )\Vert _1\end{aligned}$$
(131)
$$\begin{aligned}&\le 2\,|B_+|\,|B_-C|\,\varphi (\ell )\,\Vert E_C(\rho )-E_{B_-C}(\rho )\Vert _1 \end{aligned}$$
(132)
$$\begin{aligned}&\le 2\,|B_+|\,|B_-C|\,\varphi (\ell )\,\sqrt{2\,S(E_C(\rho )\Vert E_{B_-C}(\rho ))}\,. \end{aligned}$$
(133)

Above, (130) follows from the compatibility of the conditional expectations, and (131) from \(E_{B_+}\circ E_B=E_{B_+}\) and the monotonicity of the trace distance under such a CPTP map. (132) follows from the condition of local indistinguishability when taking \(X=C\backslash B\), \(Y=B\backslash B_-\) and \(Z=B_-\), and assuming that \({\text {dist}}(X,Z)\ge \ell \). Finally, (133) follows from an application of Pinsker’s inequality. Similarly, we find

$$\begin{aligned} \mathrm{(III)}\le 2\,|A_+|\,|V_0|\,\varphi (\ell )\,\sqrt{2\,S(E_{B_-C}(\rho )\Vert E_{V_0}(\rho ))}\,. \end{aligned}$$
(134)

Then, by inserting (129), (133) and (134) into (125), we have

$$\begin{aligned}&\Vert \rho -E_{V_0}(\rho )\Vert _{W_1}\nonumber \\&\quad \le 2\sqrt{2}\max \big \{ |V_0^{\max }|\,\sqrt{|\mathcal {I}_{\max }|}, |A_+|\,|V_0|\,\varphi (\ell ),|B_+|\,|B_-C|\varphi (\ell )\big \} \nonumber \\&\quad \,\Big \{ \sqrt{S(\rho \Vert E_C(\rho ))}+\sqrt{S(E_{C}(\rho )\Vert E_{B_+}E_C(\rho ))}+\sqrt{S(E_{B_+}E_C(\rho )\Vert E_{A_+}E_{B_+}E_C(\rho ))}\nonumber \\&\qquad + \sqrt{S(E_{B_-C}(\rho )\Vert E_{V_0}(\rho ))}+\sqrt{S(E_C(\rho )\Vert E_{B_-C}(\rho ))}\Big \}\nonumber \\&\quad \le 2\sqrt{10}\max \big \{ |V_0^{\max }|\,\sqrt{|\mathcal {I}_{\max }|}, |A_+|\,|V_0|\,\varphi (\ell ),|B_+|\,|B_-C|\varphi (\ell )\big \}\nonumber \\&\quad \Big (S(\rho \Vert E_C(\rho ))+S(E_{C}(\rho )\Vert E_{B_+}E_C(\rho ))+S(E_{B_+}E_C(\rho )\Vert E_{A_+}E_{B_+}E_C(\rho ))\nonumber \\&\qquad +S(E_C(\rho )\Vert E_{B_-C}(\rho ))+S(E_{B_-C}(\rho )\Vert E_{V_0}(\rho ))\Big )^{\frac{1}{2}}\end{aligned}$$
(135)
$$\begin{aligned}&\quad \le 2 \sqrt{20}\max \big \{ |V_0^{\max }|\,\sqrt{|\mathcal {I}_{\max }|}, |A_+|\,|V_0|\,\varphi (\ell ),|B_+|\,|B_-C|\varphi (\ell )\big \}S(\rho \Vert E_{V_0}(\rho ))^{\frac{1}{2}} \end{aligned}$$
(136)

where (135) is another direct application of Jensen’s inequality for \(x\mapsto x^{\frac{1}{2}}\), whereas (136) follows from two uses of the chain rule (123) after adding the positive term \(S(E_{A_+}E_{B_+}E_C(\rho )\Vert E_{V_0}(\rho ))\) to the square root and a final use of the data processing inequality. The result then follows after choosing the length \(\ell :=\varphi ^{-1}(|V_0|^{-3/2})\) so that

$$\begin{aligned} \max \big \{|A_+|\,|V_0|,\,|B_+|\,|B_-C|\big \}\,\varphi (\ell )\le |V_0|^2\varphi (\ell )\le |V_0^{\max }|\,\sqrt{|\mathcal {I}_{\max }|}\,. \end{aligned}$$

With this choice, and estimating \(|V_0^{\max }|\le (7\ell )^D\), the bound found in (136) can be further controlled by

$$\begin{aligned} \Vert \rho -E_{V_0}(\rho )\Vert _{W_1}&\le 2\sqrt{20}\,(7\ell )^{D}\,\sqrt{|V_0|\,S(\rho \Vert E_{V_0}(\rho ))} \end{aligned}$$
(137)
$$\begin{aligned}&\le 2\sqrt{20}\,(7\varphi ^{-1}(|V_0|^{-3/2}))^D\sqrt{|V_0|\,S(\rho \Vert E_{V_0}(\rho ))}\,. \end{aligned}$$
(138)

\(\square \)
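For the reader's convenience, the passage from (127) to (129) in the proof above can be unpacked as follows for the region C, the other two regions being analogous. This is a sketch of the intermediate steps, written with the shorthand \(\rho _i:=E_{C^{(i)}}(\rho )\), which is introduced here for brevity only. Pinsker's inequality followed by Jensen's inequality gives

$$\begin{aligned} \sum _{i\in \mathcal {I}_C}\Vert \rho _{i-1}-\rho _i\Vert _1\le \sum _{i\in \mathcal {I}_C}\sqrt{2\,S(\rho _{i-1}\Vert \rho _i)}\le \sqrt{2|\mathcal {I}_C|}\,\sqrt{\sum _{i\in \mathcal {I}_C}S(\rho _{i-1}\Vert \rho _i)}\,. \end{aligned}$$

By compatibility, \(E_{C^{(i)}}(\rho _{i-1})=\rho _i\) and \(E_{C^{(i)}}(E_C(\rho ))=E_C(\rho )\), so the chain rule (123) yields \(S(\rho _{i-1}\Vert E_C(\rho ))=S(\rho _{i-1}\Vert \rho _i)+S(\rho _i\Vert E_C(\rho ))\). Iterating this identity from \(\rho _0=\rho \) up to \(\rho _{|\mathcal {I}_C|}=E_C(\rho )\) gives the telescoping sum

$$\begin{aligned} \sum _{i\in \mathcal {I}_C}S(\rho _{i-1}\Vert \rho _i)=S(\rho \Vert E_C(\rho ))\,. \end{aligned}$$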

6.2 Classical Case

In this section, we restrict our analysis to classical conditional expectations and probability measures. In this setting, it is easy to see that the property of local indistinguishability is implied by the following condition. Here, with a slight abuse of notation, we will use the same symbol for a probability measure \(\mu \) on the Borel sets of \([d]^V\) and its corresponding probability mass function.

Definition 4

(Local indistinguishability, classical case.) Let \(\mu \) be a probability measure on \([d]^{V}\), and \(\{\mu _C\}_{C\subseteq V}\) be a set of compatible conditional probability measures \(\mu _C(.|x_{{C}^c})\) acting on the sets \([d]^C\), i.e., they satisfy the property that for any \(C\subseteq C'\), \({\mathbb {E}}_{\mu _C}\circ {\mathbb {E}}_{\mu _{C'}}={\mathbb {E}}_{\mu _{C'}}\circ {\mathbb {E}}_{\mu _C}= {\mathbb {E}}_{\mu _{C'}}\). Then, we say that the measure \(\mu \) satisfies local indistinguishability if there exists a fast decaying function \(\varphi :{\mathbb {N}}\rightarrow {\mathbb {R}}_+\) such that for all regions \(V'=XYZ\subset V\) such that \({\text {dist}}(i,j)\ge \ell \) for any \(i\in X\) and \(j\in Z\),

$$\begin{aligned}&\max _{x_{X}\in [d]^{X}}\,\max _{x_{{V'}^c}\in [d]^{{V'}^c}}\,\, \sum _{y_{V'}} \big |\mu _{Y|X{V'}^c}(y_Y|x_Xx_{{V'}^c})\mu _{XY|ZV'^c}(y_{XY}|y_Zx_{V'^c})\\&\quad -\mu _{V'|{V'}^c}(y_{V'}|x_{{V'}^c})\big |\le |V'| \varphi (\ell ) \,, \end{aligned}$$

where \(\partial Z\) denotes the boundary of Z.

Corollary 1

Let \(\mu \) be a probability measure on \([d]^{V}\) satisfying local indistinguishability with fast decaying function \(\varphi \). Then, for all \(\nu<\!<\mu \),

$$\begin{aligned} W_1(\nu ,\mu )\le 2\sqrt{20}\,(7\varphi ^{-1}(n^{-3/2}))^D\sqrt{n\,S(\nu \Vert \mu )}\,. \end{aligned}$$
(139)

Equivalently, the measure \(\mu \) satisfies the following sub-Gaussian tail: for any function f such that \(\Vert f\Vert _L\le 1\),

$$\begin{aligned} {\mathbb {P}}_\mu \Big (|f(X)-{\mathbb {E}}_\mu [f(X)]|>t \Big )\le 2\,\exp \left( -\frac{t^2}{80n(7\varphi ^{-1}(n^{-3/2}))^{2D}}\right) \,. \end{aligned}$$
(140)
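The constant in the exponent of (140) can be traced back to (139) in one line. Writing (139) in the form \(W_1(\nu ,\mu )\le \sqrt{c\,S(\nu \Vert \mu )}\), which is the form of the transportation cost inequality used throughout, one reads off

$$\begin{aligned} c=\big (2\sqrt{20}\big )^2\,n\,\big (7\varphi ^{-1}(n^{-3/2})\big )^{2D}=80\,n\,\big (7\varphi ^{-1}(n^{-3/2})\big )^{2D}\,, \end{aligned}$$

and the equivalence of Bobkov and Götze [5] recalled in the introduction turns this into the tail bound \(2e^{-t^2/c}\) for 1-Lipschitz functions.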

7 Gaussian Concentration

As mentioned before, the classical transportation cost inequalities for a measure \(\mu \) are equivalent to the sub-Gaussian bounds on the tail probability of any Lipschitz function f of a random variable X drawn according to \(\mu \). One way to see this is by using the variational formulation of the relative entropy in order to bound the Laplace transform of f(X). In the non-commutative setting, this leads to the following characterization of the transportation cost constant \(C(\omega )\):

Proposition 11

For any \(\omega \in \mathcal {S}_V\),

$$\begin{aligned} C(\omega ) = 4\sup _{K\in \mathcal {O}_V}\frac{\ln {\mathrm {Tr}}\exp \left( K + \ln \omega \right) - {\mathrm {Tr}}\left[ \omega \,K\right] }{\left\| K\right\| _L^2}\,, \end{aligned}$$
(141)

and the \(\sup \) can be restricted to \(K\in \mathcal {O}_V\) such that \({\mathrm {Tr}}\left[ \omega \,K\right] =0\).

Proof

Let \({\tilde{C}}(\omega )\) be the right-hand side of (141). On the one hand, let \(K\in \mathcal {O}_V\) satisfy \({\mathrm {Tr}}\left[ \omega \,K\right] = 0\), and let

$$\begin{aligned} \rho = \frac{\exp \left( K + \ln \omega \right) }{{\mathrm {Tr}}\exp \left( K + \ln \omega \right) } \in \mathcal {S}_V\,. \end{aligned}$$
(142)

We have

$$\begin{aligned} \ln {\mathrm {Tr}}\exp \left( K + \ln \omega \right)&= {\mathrm {Tr}}\left[ \rho \,K\right] - S(\rho \Vert \omega ) \\&\le \left\| \rho - \omega \right\| _{W_1}\left\| K\right\| _L - \frac{\left\| \rho - \omega \right\| _{W_1}^2}{C(\omega )} \le \frac{C(\omega )\left\| K\right\| _L^2}{4}\,, \end{aligned}$$
(143)

therefore \({\tilde{C}}(\omega )\le C(\omega )\).

On the other hand, let \(\rho \in \mathcal {S}_V\), and let \(K\in \mathcal {O}_V\) such that

$$\begin{aligned} \left\| K\right\| _L = \frac{2\left\| \rho - \omega \right\| _{W_1}}{{\tilde{C}}(\omega )}\,,\qquad {\mathrm {Tr}}\left[ \omega \,K\right] =0\,,\qquad {\mathrm {Tr}}\left[ \rho \,K\right] = \frac{2\left\| \rho - \omega \right\| _{W_1}^2}{{\tilde{C}}(\omega )}\,. \end{aligned}$$
(144)

We have

$$\begin{aligned} S(\rho \Vert \omega ) \ge {\mathrm {Tr}}\left[ \rho \,K\right] - \ln {\mathrm {Tr}}\exp \left( K + \ln \omega \right) \ge \frac{\left\| \rho - \omega \right\| _{W_1}^2}{{\tilde{C}}(\omega )}\,, \end{aligned}$$
(145)

where the last inequality follows from the definition of \({\tilde{C}}(\omega )\), therefore \(C(\omega )\le {\tilde{C}}(\omega )\), and the claim follows. \(\square \)
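The first equality in (143) and the first inequality in (145) are both instances of the variational formula for the relative entropy. The following sketch checks them numerically on a random four-dimensional example; the dimension, the random test data and the helper names are illustrative assumptions, and matrix functions are computed through the eigendecomposition since all matrices involved are Hermitian.

```python
import numpy as np

rng = np.random.default_rng(2)

def herm_fun(a, f):
    """Apply a scalar function f to a Hermitian matrix via its eigenbasis."""
    w, v = np.linalg.eigh(a)
    return (v * f(w)) @ v.conj().T

def random_state(dim):
    """Random full-rank density matrix (illustrative test data)."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return 0.9 * rho / np.trace(rho).real + 0.1 * np.eye(dim) / dim

def rel_entropy(rho, sigma):
    """Relative entropy S(rho || sigma) in nats."""
    return np.trace(rho @ (herm_fun(rho, np.log) - herm_fun(sigma, np.log))).real

dim = 4
omega = random_state(dim)
g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
K = (g + g.conj().T) / 2
K = K - np.trace(omega @ K).real * np.eye(dim)  # enforce Tr[omega K] = 0

unnormalized = herm_fun(K + herm_fun(omega, np.log), np.exp)
log_Z = np.log(np.trace(unnormalized).real)     # ln Tr exp(K + ln omega)
rho = unnormalized / np.trace(unnormalized).real  # the state in (142)

gibbs_value = np.trace(rho @ K).real - rel_entropy(rho, omega)
other = random_state(dim)
other_value = np.trace(other @ K).real - rel_entropy(other, omega)
print(log_Z, gibbs_value, other_value)
```

The first two printed numbers coincide up to rounding, while the third, computed for an unrelated state, does not exceed them.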

In the tracial setting [10], and more generally whenever \([K,\omega ]=0\), the quantity \(\mathrm {Tr}\exp (K+\ln \omega )\) can be interpreted as the Laplace transform of K in the state \(\omega \), and therefore the equivalence between Gaussian concentration and the transportation cost inequality holds. However, this is no longer true when K and \(\omega \) do not commute, and the following bound can turn out to be strictly stronger than the transportation cost inequality as a consequence of the Golden–Thompson inequality: for any \(K\in \mathcal {O}_V\) such that \(\mathrm {Tr}[\omega K]=0\),

$$\begin{aligned} \ln \mathrm {Tr}\left[ \omega \,e^K\right] \le \,\frac{C'(\omega )}{4} \Vert K\Vert _L^2\,. \end{aligned}$$
(146)

In other words, \(C'(\omega )\ge C(\omega )\). Recently, bounds of the form of (146) were obtained for some subclasses of Lipschitz observables K (typically local observables) when \(\omega \) is the Gibbs state of a (possibly non-commuting) quasi-local Hamiltonian H [43] using cluster expansion techniques. However, the existence of the Gaussian concentration inequality for general Lipschitz observables was left open.
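The comparison between the two constants can be made explicit as follows, reading \(C'(\omega )\) as the smallest constant for which (146) holds for all admissible K (an interpretation we assume here). The Golden–Thompson inequality \(\mathrm {Tr}\,e^{A+B}\le \mathrm {Tr}\big [e^{A}e^{B}\big ]\), applied with \(A=K\) and \(B=\ln \omega \), gives

$$\begin{aligned} \ln {\mathrm {Tr}}\exp \left( K+\ln \omega \right) \le \ln \mathrm {Tr}\left[ \omega \,e^K\right] \le \frac{C'(\omega )}{4}\Vert K\Vert _L^2\,, \end{aligned}$$

and taking the supremum over \(K\in \mathcal {O}_V\) with \(\mathrm {Tr}[\omega K]=0\) in (141) yields \(C(\omega )\le C'(\omega )\).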

Here instead, we pursue a different approach using our transportation cost inequality. In particular, we prove that (146) can be approximately recovered for Gibbs states of commuting Hamiltonians for a larger class of Lipschitz observables than those considered in [43]. For this, we adapt the result of [13, Theorem 8], which was written for \(W_{\nabla }\), to the case of \(W_1\). In this section, we denote by \(X_R\), respectively, \(X_I\) the real, respectively, imaginary parts of an operator \(X\in \mathcal {B}_V\). Given an observable \(O\in \mathcal {O}_V\) with spectral decomposition \(O:=\sum _{\lambda }\lambda P_\lambda \), a state \(\omega \in \mathcal {S}_V\) and a real number \(r\in {\mathbb {R}}\), we denote by

$$\begin{aligned} {\mathbb {P}}_\omega (O\ge r):=\sum _{\lambda \ge r}\mathrm {Tr}(\omega P_\lambda ) \end{aligned}$$
(147)

the probability of getting an eigenvalue \(\lambda \ge r\) when measuring O on the state \(\omega \).
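The definition (147) translates directly into a few lines of code. The sketch below is a minimal illustration: the function name `tail_probability`, the tolerance used to compare eigenvalues with r, and the two-qubit example are our own assumptions.

```python
import numpy as np

def tail_probability(omega, obs, r, tol=1e-12):
    """P_omega(O >= r): total weight Tr(omega P_lambda) over eigenvalues lambda >= r."""
    evals, evecs = np.linalg.eigh(obs)
    # <v_j| omega |v_j> for each eigenvector v_j; summing these over an
    # eigenspace gives Tr(omega P_lambda).
    weights = np.einsum('ij,ik,kj->j', evecs.conj(), omega, evecs).real
    return float(weights[evals >= r - tol].sum())

Z = np.diag([1.0, -1.0])
I2 = np.eye(2)
O = np.kron(Z, I2) + np.kron(I2, Z)  # eigenvalues 2, 0, 0, -2
omega = np.eye(4) / 4                # maximally mixed two-qubit state

p_hi = tail_probability(omega, O, 1.0)    # only the eigenvalue 2 counts
p_mid = tail_probability(omega, O, -1.0)  # eigenvalues 2, 0, 0 count
print(p_hi, p_mid)
```

For the maximally mixed state each eigenvector carries weight 1/4, so the two probabilities are 1/4 and 3/4 respectively.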

Theorem 7

Assume that the full-rank state \(\omega \in \mathcal {S}_V\) satisfies \({\text {TC}}(c)\) for some \(c>0\). Then, for any observable \(O\in \mathcal {O}_V\),

$$\begin{aligned} {\mathbb {P}}_{\omega }\big (|O- \mathrm {Tr}({\omega O})\,{\mathbb {I}}|\ge r \big )\le 2\,\exp \left( -\frac{r^2}{4c\max \big \{\Vert (\omega ^{-\frac{1}{2}}O\omega ^{\frac{1}{2}})_R\Vert _{L}^2,\Vert (\omega ^{-\frac{1}{2}}O\omega ^{\frac{1}{2}})_I\Vert _{L}^2\big \}}\right) \,. \end{aligned}$$
(148)

Whenever \([O,\omega ]=0\) the bound can be tightened into

$$\begin{aligned} {\mathbb {P}}_{\omega }\big (|O- \mathrm {Tr}({\omega O})\,{\mathbb {I}}|\ge r \big )\le 2\,\exp \left( -\frac{r^2}{c \Vert O\Vert _{L}^2}\right) \,. \end{aligned}$$
(149)

Therefore, whenever \(\omega \in \mathcal {S}_V\) corresponds to the Gibbs state of a local commuting Hamiltonian on a hypergraph at inverse temperature \(\beta \), the above bounds hold as long as \(0<\beta < \beta _c\) where \(\beta _c\) is defined in (85).

Proof

Given \(X\in \mathcal {B}_V\), we denote by \(X:=X_R+iX_I\) its decomposition into real and imaginary parts. We also assume that \(\mathrm {Tr}(\omega X)=0\) and \(\Vert X_R\Vert _L,\Vert X_I\Vert _L\le 1\). By assumption, we have that for any \(\rho \in \mathcal {S}_V\)

$$\begin{aligned} \left| \mathrm {Tr}(\rho X)\right| \le \left| \mathrm {Tr}(\rho X_R)\right| +\left| \mathrm {Tr}(\rho X_I)\right| \le 2\left\| \rho -\omega \right\| _{W_1}\le 2\sqrt{c\,S(\rho \Vert \omega )}\,. \end{aligned}$$
(150)

Then, since \(\inf _{\theta >0}\Big (\frac{a}{\theta }+\frac{b\theta }{2}\Big )=\sqrt{2ab}\) for any \(a,b\ge 0\), we have that for all \(\theta >0\):

$$\begin{aligned} \big |\mathrm {Tr}(\rho X)\big |\le \sqrt{2}\,\frac{S(\rho \Vert \omega )}{\theta }+\frac{c\,\theta }{\sqrt{2}}\qquad \Leftrightarrow \qquad \theta \big |\mathrm {Tr}(\rho X)\big |- \frac{c}{\sqrt{2}}\,\theta ^2\le \sqrt{2}\,S(\rho \Vert \omega )\,. \end{aligned}$$
(151)
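For completeness, the step from (150) to (151) can be spelled out as follows. By the arithmetic-geometric mean inequality, \(\frac{a}{\theta }+\frac{b\theta }{2}\ge \sqrt{2ab}\) for all \(\theta >0\), with equality at \(\theta =\sqrt{2a/b}\). Choosing \(a=\sqrt{2}\,S(\rho \Vert \omega )\) and \(b=\sqrt{2}\,c\) gives \(\sqrt{2ab}=2\sqrt{c\,S(\rho \Vert \omega )}\), so that (150) reads

$$\begin{aligned} \big |\mathrm {Tr}(\rho X)\big |\le 2\sqrt{c\,S(\rho \Vert \omega )}=\inf _{\theta >0}\Big (\frac{\sqrt{2}\,S(\rho \Vert \omega )}{\theta }+\frac{c\,\theta }{\sqrt{2}}\Big )\,, \end{aligned}$$

which is the left-hand inequality of (151) for every \(\theta >0\).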

Next, we further upper bound the relative entropy in terms of the maximal divergence \({\widehat{S}}(\rho \Vert \omega ):=\mathrm {Tr}\big [\omega \big (\omega ^{-\frac{1}{2}}\rho \omega ^{-\frac{1}{2}}\big )\ln (\omega ^{-\frac{1}{2}}\rho \omega ^{-\frac{1}{2}}) \big ]\) [44]. Choosing \(\rho =\omega ^{\frac{1}{2}}e^{\theta O}\omega ^{\frac{1}{2}}/\mathrm {Tr}(\omega e^{\theta O})\) for some observable \(O\in \mathcal {O}_V\), we arrive at

$$\begin{aligned} \theta \big |\mathrm {Tr}(\rho X)\big |- \frac{c}{\sqrt{2}}\,\theta ^2\le \sqrt{2}\,\theta \frac{\mathrm {Tr}(\omega e^{\theta O}O)}{\mathrm {Tr}(\omega e^{\theta O})}-\sqrt{2}\ln \big (\mathrm {Tr}(\omega e^{\theta O})\big )\,. \end{aligned}$$
(152)

Next, we choose \(X=\sqrt{2}\omega ^{-\frac{1}{2}}O\omega ^{\frac{1}{2}}\), so that the previous inequality reduces to

$$\begin{aligned} \ln \big (\mathrm {Tr}(\omega e^{\theta O})\big )\le \frac{c}{2}\,\theta ^2\,. \end{aligned}$$
(153)

The above inequality can be interpreted as a bound on the log-Laplace transform of the non-commutative variable O in the state \(\omega \). By a use of Markov’s inequality followed by an optimization over the variable \(\theta >0\), we finally get

$$\begin{aligned} {\mathbb {P}}_\omega \big (\big |O\big |\ge r\big )\le 2e^{-\frac{r^2}{2c}}\,. \end{aligned}$$
(154)
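The Markov step can be sketched as follows for the upper tail, under the normalization of the proof. Since \(e^{\theta (\lambda -r)}\ge 1\) whenever \(\lambda \ge r\) and \(\theta >0\), the definition (147) and the bound (153) give

$$\begin{aligned} {\mathbb {P}}_\omega (O\ge r)=\sum _{\lambda \ge r}\mathrm {Tr}(\omega P_\lambda )\le e^{-\theta r}\sum _{\lambda }e^{\theta \lambda }\,\mathrm {Tr}(\omega P_\lambda )=e^{-\theta r}\,\mathrm {Tr}(\omega e^{\theta O})\le \exp \Big (\frac{c}{2}\,\theta ^2-\theta r\Big )\,. \end{aligned}$$

The exponent is minimized at \(\theta =r/c\), where it equals \(-\frac{r^2}{2c}\). Applying the same argument to \(-O\) and using the union bound accounts for the factor 2 in (154).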

The result follows after simple rescalings. The tightening in the case of an observable commuting with \(\omega \) can be found by following the same steps as the ones above. \(\square \)

In general, there is no way to precisely relate the Lipschitz constants of the real and imaginary parts of \(\omega ^{-\frac{1}{2}}O\omega ^{\frac{1}{2}}\) to the Lipschitz constant of O when \([O,\omega ]\ne 0\). In the next result, we prove, however, that the constants have similar scalings when \(\omega \) is the Gibbs state of a local commuting Hamiltonian.

Lemma 7

Let \(O=\sum _{A\subseteq V} \lambda _A\,O_A\otimes {\mathbb {I}}_{A^c}\) be the decomposition of an observable O in \(\mathcal {O}_V\), where for each subregion A, \(O_A\) is exactly supported in A with \(\Vert O_A\Vert _\infty \le 1\), and \(\lambda _A\in {\mathbb {R}}\). Let further \(\omega \) be the Gibbs state of a geometrically k-local, commuting Hamiltonian \(H_V:=\sum _{B\subset V}h_B\otimes {\mathbb {I}}_{B^c}\) at inverse temperature \(\beta \). Then,

$$\begin{aligned} \Vert (\omega ^{-\frac{1}{2}}O\omega ^{\frac{1}{2}})_{R}\Vert _{L}\,,~~ \Vert (\omega ^{-\frac{1}{2}}O\omega ^{\frac{1}{2}})_{I}\Vert _{L}\le 4\max _{i\in V}\,\sum _{\begin{array}{c} A\subset V\\ A_{\partial k}\ni i \end{array}}\,|\lambda _A|\,\exp \Big (\beta \sum _{B\cap A\ne \emptyset }\Vert h_B\Vert _\infty \Big )\,, \end{aligned}$$
(155)

where \(A_{\partial k}:=\{j\in V:\, {\text {dist}}(j,A)\le k\}\) denotes the k-enlargement of A. In particular, whenever the state \(\omega \) satisfies \({\text {TC}}(c)\) with \(c=O(n)\), any local observable O gives rise to a sub-Gaussian random variable with variance parameter \(O(n)\) when measured in the state \(\omega \).

Proof

We prove the bound for the real part \((\omega ^{-\frac{1}{2}}O\omega ^{\frac{1}{2}})_{R}\) only, since the proof for the imaginary part follows the exact same reasoning. First, by Proposition 6, since for any \(A\subset V\), \((\omega ^{-\frac{1}{2}}O_A\omega ^{\frac{1}{2}})_R\) is supported in region \(A_{\partial k}\), we have that

$$\begin{aligned} \Vert (\omega ^{-\frac{1}{2}}O\omega ^{\frac{1}{2}})_{R}\Vert _{L}&\le 2\,\max _{i\in V}\,\Big \Vert \sum _{\begin{array}{c} A\subset V\\ A_{\partial k}\ni i \end{array}} \lambda _A\,\Big [(\omega ^{-\frac{1}{2}}O_A\omega ^{\frac{1}{2}})_R-\mathrm {Tr}_i\big [(\omega ^{-\frac{1}{2}}O_A\omega ^{\frac{1}{2}})_R \big ]\otimes \frac{{\mathbb {I}}_d}{d} \Big ]\Big \Vert _\infty \end{aligned}$$
(156)
$$\begin{aligned}&\le 4\,\max _{i\in V}\sum _{\begin{array}{c} A\subset V\\ A_{\partial k}\ni i \end{array}}\,|\lambda _A|\,\Vert \omega ^{-\frac{1}{2}}O_A\omega ^{\frac{1}{2}}\Vert _\infty \end{aligned}$$
(157)
$$\begin{aligned}&\le 4\,\max _{i\in V}\sum _{\begin{array}{c} A\subset V\\ A_{\partial k}\ni i \end{array}}\,|\lambda _A|\,\Vert e^{\beta \sum _{B\cap A\ne \emptyset }h_B}O_Ae^{-\beta \sum _{B\cap A\ne \emptyset }h_B}\Vert _\infty \end{aligned}$$
(158)
$$\begin{aligned}&\le 4\,\max _{i\in V}\sum _{\begin{array}{c} A\subset V\\ A_{\partial k}\ni i \end{array}}\,|\lambda _A|\,e^{\beta \sum _{B\cap A\ne \emptyset }\Vert h_B\Vert _\infty }\,. \end{aligned}$$
(159)

The result follows. \(\square \)

7.1 Comparison to Previous Tail Bounds

Our main result can be compared to other recently derived concentration bounds for quantum Gibbs states: in [45, Corollary 5.4], the authors consider a product state \(\rho =\bigotimes _{v\in V}\rho _v\in \mathcal {S}_V\) as well as a Hamiltonian \(H=\sum _{A\in E_{k,m}}h_A\), where the set \(E_{k,m}\) of subsets of V has the following properties: for any \(A\in E_{k,m}\),

  1. (i)

    \(|A|\le k\);

  2. (ii)

    \(\big |\{A'\in E_{k,m}:\,A\cap A'\ne \emptyset \}\big |\le m\).

Under these conditions, they were able to prove that

$$\begin{aligned} {\mathbb {P}}_{\rho }\big (|H-\mathrm {Tr}(\rho H)|\ge r\big )\le 2e^{-\frac{r^2}{4eN^3kn}}\,, \end{aligned}$$

where \(N:=\max _{v\in V}\sum _{A\in E_{k,m}|v\in A}\,1\) is the maximal number of local terms acting non-trivially on a single spin v. A similar bound was previously derived by Kuwahara [46, Theorem 7], under a notion of g-extensivity: a local Hamiltonian H is said to be g-extensive if for every spin v, \(\sum _{A\in E_{k,m}|\,v\in A} \Vert h_A\Vert _\infty \le g\). Under this condition, he shows that

$$\begin{aligned} {\mathbb {P}}_\rho \big ( |H-\mathrm {Tr}(\rho H)|\ge r\big )=\mathcal {O}(1)\,e^{-\frac{r^2}{cn\log (\frac{r}{\sqrt{n}})}}\,, \end{aligned}$$

where c is a \(\mathcal {O}(1)\) constant which depends only on k and g. Although these results recover the Gaussian tails of our Theorem 7 (up to logarithmic overheads), they only work for tensor product states and a subclass of Lipschitz observables. In particular, the tails become trivial whenever the Hamiltonian is a sum of terms acting on non-intersecting regions A of arbitrary size. In contrast, our bound is still non-trivial for this class of observables, since their Lipschitz constant is still \(\mathcal {O}(1)\).

More recently, Kuwahara and Saito derived new concentration bounds for Gibbs states of interacting Hamiltonians in order to study the problem of equivalence of quantum statistical ensembles [43, 47] (see Sect. 8): in [47] first, the authors consider a Gibbs state \(\omega \) of a local Hamiltonian on a D-dimensional regular lattice \({\mathbb {Z}}^D\). They further assume the following \((r_0,\xi )\) clustering: for any operators \(O_A,O_B\) supported on the subsets A and B,

$$\begin{aligned} |\mathrm {Tr}(\omega O_AO_B)-\mathrm {Tr}(\omega O_A)\mathrm {Tr}(\omega O_B)|\le \Vert O_A\Vert _\infty \,\Vert O_B\Vert _\infty \,e^{-{\text {dist}}(A,B)/\xi }, \end{aligned}$$
(160)

whenever \({\text {dist}}(A,B)\ge r_0\). Under this condition, they were able to show in Equation (S.17) (see also [45, Theorem 4.2] for a similar bound)

$$\begin{aligned} {\mathbb {P}}_\omega \big (|O-\mathrm {Tr}(\omega O)|\ge r\big )\le \min \big \{1,(e+3e\xi )\max \big (e^{-(r^2/(cn))^{\frac{1}{D+1}}},\,e^{-\frac{r^2}{c' \ell ^Dn}}\big )\big \}\,, \end{aligned}$$

for some constants \(c,c'\) further depending on \(\xi \) and D, and where \(\ell \) denotes the locality of the observable O. Therefore, although the clustering of correlations is known to hold at high enough temperature [48], the bound is suboptimal for two reasons: firstly, whenever r is small enough, the exponent has the worse scaling \(r^{2/(D+1)}\). Secondly, the bound depends badly on the locality of O, and becomes trivial whenever O is a sum of highly non-local terms. This second limitation also holds for the Gaussian concentration bound found in [43, Corollary 1] for high-temperature Gibbs states of Hamiltonians with long-range interactions. In comparison with the works cited above, our bound always provides a better dependence of the tail on the locality of the observable, albeit under the condition that the Hamiltonian is made of local commuting terms.

8 Equivalence of Statistical Mechanical Ensembles

The three main ensembles employed in quantum statistical mechanics to compute the equilibrium properties of quantum systems are the canonical ensemble, the microcanonical ensemble and the diagonal ensemble. The quantum state associated with the canonical ensemble is the Gibbs state, which describes the physics of a system that is at thermal equilibrium with a large bath at a given temperature. The diagonal and microcanonical ensembles both describe the physics of an isolated quantum system, and the associated states are convex combinations of the eigenstates of the Hamiltonian. The microcanonical ensemble assumes a uniform probability distribution for the energy in a given energy shell. The diagonal ensemble includes all the states that are diagonal in the eigenbasis of the Hamiltonian, and in particular it includes the eigenstates themselves.

For many quantum systems, the microcanonical and canonical ensembles give the same expectation values for local observables if the corresponding states have approximately the same average energy. A lot of effort has been devoted to determining conditions under which the two ensembles are equivalent [49,50,51,52]. The most prominent among such conditions are short-ranged interactions and a finite correlation length, but analytical proofs can be obtained only in the case of regular lattices [52]. The situation is more complex for the diagonal ensemble. The condition under which this ensemble is equivalent to the microcanonical and canonical ensembles is called the Eigenstate Thermalization Hypothesis (ETH) [53,54,55,56,57], and states that the expectation values of local observables on the eigenstates of the Hamiltonian are a smooth function of the energy, i.e., for any given local observable, any two eigenstates with approximately the same energy yield approximately the same expectation value. The ETH is an extremely strong condition on the Hamiltonian and several quantum systems, including all integrable systems, do not satisfy it. A weak version of the ETH has been formulated [47, 58], stating that for any given local observable, most eigenstates in an energy shell yield approximately the same expectation value, or, more precisely, that the fraction of eigenstates yielding expectation values far from the Gibbs state with the same average energy vanishes in the thermodynamic limit. The weak ETH implies the equivalence between the canonical and microcanonical ensembles, but is not sufficient to prove their equivalence with the diagonal ensemble. Under the hypothesis of finite correlation length in the Gibbs state, an analytical proof of the weak ETH is available only for regular lattices [47].

A connection between a transportation cost inequality and the ETH was made by one of the authors in the case of a regular lattice and a nearest neighbour Hamiltonian [38]. Here, we look at the general problem of the equivalence of the statistical mechanical ensembles and of the weak ETH from the perspective of optimal mass transport and show that such equivalence can be formulated as closeness of the respective states in the \(W_1\) distance. The closeness in the \(W_1\) distance implies closeness of the expectation values of all Lipschitz observables, which constitute a significantly larger class than local observables. Therefore, the perspective of optimal mass transport can significantly extend the previous results. Moreover, we will show that the equivalence of the ensembles is intimately linked to the constant of the transportation cost inequality for the Gibbs states.

As in the rest of the paper, we consider a quantum system made of n qudits located at the vertices of a graph with vertex set V. Let us assume that a Gibbs state \(\omega \in \mathcal {S}_V\) satisfies the transportation cost inequality with a constant

$$\begin{aligned} C(\omega ) \le n\,C\,, \end{aligned}$$
(161)

where C does not depend on n. This condition is satisfied under the hypotheses of Theorem 2, Theorem 4 or Theorem 5. We stress that, contrary to the results of Refs. [47, 52], the condition does not require us to restrict to regular lattices, since Theorem 4 does not need this hypothesis. Proposition 12 below implies that any state \(\rho \in \mathcal {S}_V\) is close in \(W_1\) distance to the Gibbs state \(\omega \) with the same average energy, provided that \(\rho \) and \(\omega \) have approximately the same entropy, i.e.,

$$\begin{aligned} S(\omega ) - S(\rho ) \ll n\,. \end{aligned}$$
(162)

Moreover, under the same hypothesis, the average reduced states over one qudit of \(\rho \) and \(\omega \) are close in trace distance.

Proposition 12

Let \(\omega \in \mathcal {S}_V\) be a Gibbs state for the Hamiltonian \(H\in \mathcal {O}_V\). Then, any quantum state \(\rho \in \mathcal {S}_V\) with the same average energy as \(\omega \) satisfies

$$\begin{aligned} \frac{1}{n}\left\| \rho -\omega \right\| _{W_1} \le \sqrt{C\,\frac{S(\omega ) - S(\rho )}{n}}\,. \end{aligned}$$
(163)

Moreover, let \(\Lambda :\mathcal {O}_V\rightarrow \mathcal {O}({\mathbb {C}}^d)\) be the quantum channel that computes the average marginal state over one qudit, i.e., for any \(\rho \in \mathcal {S}_V\),

$$\begin{aligned} \Lambda (\rho ) = \frac{1}{n}\sum _{v\in V}\rho _v\,. \end{aligned}$$
(164)

Then,

$$\begin{aligned} \left\| \Lambda (\rho ) - \Lambda (\omega )\right\| _1 \le 2\sqrt{C\,\frac{S(\omega ) - S(\rho )}{n}}\,. \end{aligned}$$
(165)

Proof

We have from the transportation cost inequality

$$\begin{aligned} \left\| \rho -\omega \right\| _{W_1} \le \sqrt{C(\omega )\,S(\rho \Vert \omega )} = \sqrt{C(\omega )\left( S(\omega ) - S(\rho )\right) }\,, \end{aligned}$$
(166)

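where the last equality follows since \(\ln \omega \) is an affine function of H (\(\omega \) being a Gibbs state) and \(\rho \) and \(\omega \) have the same average energy, so that \({\mathrm {Tr}}\left[ \rho \ln \omega \right] = {\mathrm {Tr}}\left[ \omega \ln \omega \right] \) and therefore \(S(\rho \Vert \omega ) = -S(\rho ) - {\mathrm {Tr}}\left[ \rho \ln \omega \right] = S(\omega ) - S(\rho )\).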

We have from Proposition 7

$$\begin{aligned} \left\| \Lambda (\rho ) - \Lambda (\omega )\right\| _1 \le \frac{1}{n}\sum _{v\in V}\left\| \rho _v - \omega _v\right\| _1 \le \frac{2}{n}\left\| \rho - \omega \right\| _{W_1} \le \frac{2}{n}\sqrt{C(\omega )\left( S(\omega ) - S(\rho )\right) }\,, \end{aligned}$$
(167)

and the claim follows. \(\square \)
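
As an illustration of how Proposition 12 can be applied to observables, recall that the Lipschitz constant \(\Vert \cdot \Vert _L\) is dual to the \(W_1\) norm, so that \(\big |{\mathrm {Tr}}\left[ (\rho -\omega )\,O\right] \big |\le \Vert O\Vert _L\left\| \rho -\omega \right\| _{W_1}\) for any \(O\in \mathcal {O}_V\). Assuming (161), combining this duality with (166) gives the following sketch of a bound on expectation values:

$$\begin{aligned} \frac{1}{n}\,\big |{\mathrm {Tr}}\left[ \rho \,O\right] - {\mathrm {Tr}}\left[ \omega \,O\right] \big | \le \frac{\Vert O\Vert _L}{n}\,\sqrt{n\,C\left( S(\omega ) - S(\rho )\right) } = \Vert O\Vert _L\,\sqrt{C\,\frac{S(\omega ) - S(\rho )}{n}}\,. \end{aligned}$$

In particular, the right-hand side is \(\Vert O\Vert _L\,o(1)\) whenever (162) holds.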

Choosing \(\rho \) to be diagonal in the eigenbasis of the Hamiltonian, Proposition 12 implies that any convex combination of a sufficiently large number of eigenstates is close in \(W_1\) distance to the Gibbs state with the same average energy. This number of eigenstates can even be an exponentially small fraction of the total number of eigenstates appearing in a microcanonical state, since restricting the uniform mixture to a fraction \(e^{-n\epsilon }\) of the eigenstates decreases the entropy by only \(\epsilon \,n\). Therefore, Proposition 12 constitutes an exponential improvement over the weak ETH.
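
To spell out the entropy count behind this statement, suppose, as a simplifying assumption made here only for illustration, that the microcanonical state is the uniform mixture of N eigenstates and that \(\rho \) is the uniform mixture of \(M = e^{-n\epsilon }N\) of them (the symbols N and M are introduced here for convenience). Then

$$\begin{aligned} S(\rho ) = \ln M = \ln N - \epsilon \,n\,. \end{aligned}$$

If, moreover, \(\ln N = S(\omega ) - o(n)\) and \(\rho \) has the same average energy as \(\omega \), then (163) would give

$$\begin{aligned} \frac{1}{n}\left\| \rho -\omega \right\| _{W_1} \le \sqrt{C\left( \epsilon + o(1)\right) }\,, \end{aligned}$$

which is small whenever \(\epsilon \) is small, even though the fraction \(M/N = e^{-n\epsilon }\) is exponentially small in n.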

A natural question is whether the strong ETH can also be captured by the \(W_1\) distance. Unfortunately, the answer is negative. Indeed, proving the strong ETH via optimal mass transport would amount to proving that all the eigenstates of the Hamiltonian are close in \(W_1\) distance to the Gibbs states with the corresponding average energy. However, Theorem 1 implies that any state with low entropy, and in particular any pure state, is far from any state with large entropy, and in particular from a Gibbs state with temperature \(\Omega (1)\). More precisely, for any two states \(\rho ,\,\omega \in \mathcal {S}_V\),

$$\begin{aligned} \left\| \rho -\omega \right\| _{W_1} \ge \frac{S(\omega ) - S(\rho ) - \ln \left( n+1\right) - 1}{\ln \left( d^2n\right) }\,. \end{aligned}$$
(168)

Equation (168) also implies that any quantum state which is close in \(W_1\) distance to the Gibbs state with the same average energy must also have approximately the same entropy, and in this sense Proposition 12 is optimal.
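
For instance, take \(\rho \) pure, so that \(S(\rho )=0\), and suppose, as an illustrative assumption, that the Gibbs state has extensive entropy \(S(\omega ) = s\,n\) for some \(s>0\) independent of n (the symbol s is introduced here only for this example). Then (168) reads

$$\begin{aligned} \left\| \rho -\omega \right\| _{W_1} \ge \frac{s\,n - \ln \left( n+1\right) - 1}{\ln \left( d^2n\right) }\,, \end{aligned}$$

so that the \(W_1\) distance between an eigenstate and such a Gibbs state grows at least as fast as \(n/\ln n\), up to constants.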

8.1 Comparison with Previous Results

To make our result more easily comparable to the literature, let us introduce more formally the microcanonical ensemble: given the decomposition \(H=\sum _E EP(E)\), we define the microcanonical ensemble state

$$\begin{aligned} \omega _{E,\Delta }:=\frac{P(E,\Delta )}{\mathrm {Tr}(P(E,\Delta ))}\,, \end{aligned}$$

where \(P(E,\Delta )\) corresponds to the projection onto the subspace spanned by the eigenvectors whose energy belongs to the interval \((E-\Delta ,E]\).

Corollary 2

Assume the Gibbs state \(\omega \) satisfies \(C(\omega ) \le Cn\). Then, for any Lipschitz observable O,

$$\begin{aligned} \frac{1}{n}\,\big |\mathrm {Tr}(\omega O)-\mathrm {Tr}(\omega _{E,\Delta } O)\big |\le \Vert O\Vert _L\,o_{n\rightarrow \infty }(1)\,. \end{aligned}$$

Proof

In view of Proposition 12, it suffices to control the relative entropy between the microcanonical and canonical ensemble states. We have

$$\begin{aligned} S(\omega _{E,\Delta }\Vert \omega )=\ln \frac{\mathrm {Tr}(e^{-\beta H})}{\mathrm {Tr}(P(E,\Delta ))}+\beta \mathrm {Tr}\Big [H\,\frac{P(E,\Delta )}{\mathrm {Tr}(P(E,\Delta ))}\Big ]&\le \beta E+\ln \Big [\frac{\mathrm {Tr}(e^{-\beta H})}{\mathrm {Tr}(P(E,\Delta ))}\Big ]\,. \end{aligned}$$
(169)
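
The equality in (169) can be checked directly. As a sketch, write the Gibbs state as \(\omega = e^{-\beta H}/\mathrm {Tr}(e^{-\beta H})\), the form used throughout this proof, and note that \(\omega _{E,\Delta }\) is a normalized projector, so that \(S(\omega _{E,\Delta }) = \ln \mathrm {Tr}(P(E,\Delta ))\). Then

$$\begin{aligned} S(\omega _{E,\Delta }\Vert \omega ) = -S(\omega _{E,\Delta }) - \mathrm {Tr}\big [\omega _{E,\Delta }\ln \omega \big ] = -\ln \mathrm {Tr}(P(E,\Delta )) + \beta \,\mathrm {Tr}\big [H\,\omega _{E,\Delta }\big ] + \ln \mathrm {Tr}(e^{-\beta H})\,. \end{aligned}$$

The inequality in (169) then uses that, for \(\beta \ge 0\), every eigenvalue of H in the support of \(P(E,\Delta )\) is at most E.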

Next, we control the ratio \(\frac{\mathrm {Tr}(e^{-\beta H})}{\mathrm {Tr}(P(E,\Delta ))}\). For this, we use an argument which was already used in [47, Equation (S.56)]. First, we found in (149) that

$$\begin{aligned} {\mathbb {P}}_\omega (|H-\mathrm {Tr}(\omega H){\mathbb {I}}|\ge r)\le 2\,e^{-\frac{r^2}{Cn\Vert H\Vert _L^2}}\,. \end{aligned}$$

Therefore, choosing the interval \({\tilde{\Delta }}:=(\mathrm {Tr}(\omega H)-\sqrt{Cn\ln (4)}\Vert H\Vert _L,\mathrm {Tr}(\omega H)+\sqrt{Cn\ln (4)}\Vert H\Vert _L]\), we have

$$\begin{aligned} \mathrm {Tr}\Big [\sum _{E\in {\tilde{\Delta }}}\,\frac{e^{-\beta E}}{\mathrm {Tr}(e^{-\beta H})} P(E)\Big ]\ge \frac{1}{2} \end{aligned}$$
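
The half-width of \({\tilde{\Delta }}\) is chosen precisely so that the tail bound equals \(\frac{1}{2}\). Indeed, setting \(r=\sqrt{Cn\ln (4)}\,\Vert H\Vert _L\) in the concentration inequality above gives

$$\begin{aligned} 2\,e^{-\frac{r^2}{Cn\Vert H\Vert _L^2}} = 2\,e^{-\ln 4} = \frac{1}{2}\,, \end{aligned}$$

so that the Gibbs weight of the energies within distance r of \(\mathrm {Tr}(\omega H)\) is at least \(\frac{1}{2}\).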

Next, we define

$$\begin{aligned} {\tilde{Z}}:=\mathrm {Tr}\Big [\sum _{E\in {\tilde{\Delta }}} e^{-\beta E}P(E)\Big ]\ge \frac{\mathrm {Tr}(e^{-\beta H})}{2} \end{aligned}$$
(170)

Choosing a slightly extended interval \({\tilde{\Delta }}':=(\mathrm {Tr}(\omega H)-\sqrt{Cn\ln (4)}\Vert H\Vert _L-\Delta ,\mathrm {Tr}(\omega H)+\sqrt{Cn\ln (4)}\Vert H\Vert _L+\Delta ]\), we have

$$\begin{aligned} {\tilde{Z}}\le \sum _{\nu \in {\mathbb {Z}}:\,\nu \Delta \in {\tilde{\Delta }}'}\mathrm {Tr}(P(\nu \Delta ,\Delta ))\,e^{-\beta \Delta (\nu -1)} \end{aligned}$$
(171)

Now, for \(E:={\text {argmax}} e^{-\beta E}\mathrm {Tr}(P(E,\Delta ))\), we have

$$\begin{aligned} \mathrm {Tr}(P(\nu \Delta ,\Delta ))\,e^{-\beta \Delta (\nu -1)}\le e^{\beta \Delta }e^{-\beta E}\,\mathrm {Tr}(P(E,\Delta ))\,. \end{aligned}$$

Replacing in (171), we have that

$$\begin{aligned} {\tilde{Z}}\le e^{\beta \Delta }(2+2\sqrt{Cn\ln (4)}\Vert H\Vert _L/\Delta )e^{-\beta E}\mathrm {Tr}(P(E,\Delta )) \end{aligned}$$

Finally, using the lower bound (170), we have that

$$\begin{aligned} \ln \Big [\frac{\mathrm {Tr}(e^{-\beta H})}{\mathrm {Tr}(P(E,\Delta ))}\Big ]\le \beta (\Delta -E)+\ln (4+4\sqrt{Cn\ln (4)}\Vert H\Vert _L/\Delta ) \end{aligned}$$

Therefore, plugging this last bound into (169), we have found that

$$\begin{aligned} S(\omega _{E,\Delta }\Vert \omega )\le \beta \Delta +\ln (4+4\sqrt{Cn\ln (4)}\Vert H\Vert _L/\Delta )\,. \end{aligned}$$

Therefore, \( S(\omega _{E,\Delta }\Vert \omega )=o(n)\) whenever \(\Delta =e^{-o(n)}\), and the result follows.

\(\square \)
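
As a concrete illustration of the final step of the proof, consider the hypothetical choice \(\Delta = e^{-n^{\alpha }}\) with \(0<\alpha <1\), and assume that \(\beta \) and \(\Vert H\Vert _L\) grow at most polynomially in n (both the exponent \(\alpha \) and these growth assumptions are made here only for the example). The bound on the relative entropy then becomes

$$\begin{aligned} S(\omega _{E,\Delta }\Vert \omega )\le \beta \,e^{-n^{\alpha }}+\ln \big (4+4\sqrt{Cn\ln (4)}\,\Vert H\Vert _L\,e^{n^{\alpha }}\big )\le n^{\alpha }+O(\ln n)\,, \end{aligned}$$

and the transportation cost inequality with \(C(\omega )\le Cn\) gives

$$\begin{aligned} \frac{1}{n}\left\| \omega _{E,\Delta }-\omega \right\| _{W_1}\le \sqrt{C\,\frac{S(\omega _{E,\Delta }\Vert \omega )}{n}} = O\big (n^{-\frac{1-\alpha }{2}}\big )\,, \end{aligned}$$

which vanishes as \(n\rightarrow \infty \).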

In [47, Theorem 2], it is shown that, under the \((r_0,\xi )\)-clustering of correlations (160), for any observable \(O:=\sum _{v}O_v\) where each \(O_v\) acts on spin v as well as other spins w with \({\text {dist}}(v,w)\le \ell \) and has \(\Vert O_v\Vert _\infty \le 1\),

$$\begin{aligned} \frac{1}{n}\, \big |\mathrm {Tr}(\omega _{E,\Delta } O)-\mathrm {Tr}(\omega O) \big |\le \frac{1}{\sqrt{n}}\, \max (c_1B_1,c_2B_2), \end{aligned}$$

where \(B_1:=\log (\sqrt{n}/\Delta )^{\frac{D+1}{2}}\), \(B_2:=(\ell ^D\log (\sqrt{n}/\Delta ))^{\frac{1}{2}}\), and the constants \(c_1\) and \(c_2\) depend on \(D,r_0,\xi \) and the locality k of H. Therefore, as long as the energy shell \(\Delta \) is chosen as \(\Delta \sim e^{-\mathcal {O}(n^{\frac{1}{D+1}})}\), the averages of the operator density \(\frac{O}{n}\) in the canonical and microcanonical ensemble states converge to the same number as \(n\rightarrow \infty \). Similar bounds were also derived in [43, Corollary 3] for larger classes of non-local Hamiltonians and observables above some threshold temperature. Corollary 2 constitutes an improvement over these results in two respects. Firstly, it applies to the more general class of Lipschitz observables. Secondly, it allows for a smaller energy shell \(\Delta =e^{-o(n)}\). However, the condition \({\text {TC}}(Cn)\) is currently only known to hold for the smaller class of local commuting Hamiltonians.
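
To see where the threshold on \(\Delta \) in the bound of [47] comes from, consider the illustrative choice \(\Delta = e^{-n^{a}}\) with \(a>0\) (an assumption made here only for the example), and take the exponent of \(B_1\) to be \(\frac{D+1}{2}\), in line with the threshold \(\Delta \sim e^{-\mathcal {O}(n^{\frac{1}{D+1}})}\) quoted above. Then \(\log (\sqrt{n}/\Delta ) = n^{a}+\frac{1}{2}\log n\), and

$$\begin{aligned} \frac{B_1}{\sqrt{n}} = \frac{\big (n^{a}+\frac{1}{2}\log n\big )^{\frac{D+1}{2}}}{\sqrt{n}} \sim n^{\frac{a(D+1)-1}{2}}\,, \end{aligned}$$

which vanishes as \(n\rightarrow \infty \) when \(a<\frac{1}{D+1}\). By comparison, the condition \(\Delta =e^{-o(n)}\) of Corollary 2 is met by any \(a<1\).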