1 Introduction

1.1 Context and motivation

Tracial von Neumann algebras have long been viewed as a non-commutative analog of probability spaces, where the elements of the von Neumann algebra play the role of non-commuting random variables, but it was Voiculescu who pointed out that free products of operator algebras provide an analog of probabilistic independence with its own central limit theorem [73, 74], initiating the discipline of free probability theory. Free probability has since had many applications both to random matrix theory e.g. [75] and to von Neumann algebras e.g. [78]. Many developments in free probability theory have been motivated by information geometry (here by “information geometry” we mean the study of the space \(\mathcal {P}(M)\) of probability measures on a manifold M, both as a metric space with the Wasserstein distance and as a formal Riemannian manifold, as well as the study of entropy and Fisher’s information as functions on \(\mathcal {P}(M)\); see [44, 46, 56, 57]). For instance, Voiculescu introduced free entropy and Fisher information [76, 77, 79] and Biane and Voiculescu [11] defined an analog of the \(L^p\) Wasserstein distance for non-commutative laws (the analog of probability distributions for m-tuples of non-commuting random variables), which was then used in free Talagrand inequalities [11, 22, 36, 37].

Information-geometric ideas have also been used in quantum information theory, another non-commutative analog of probability theory that is distinct from free probability theory, even though it uses similar concepts and terminology. For a survey of quantum information theory, see [83, 84]. To prevent any confusion, in free probability, operators in a tracial von Neumann algebra are viewed as non-commutative random variables (and there is no known analog of multivariable densities), while in quantum information theory, a positive operator with trace 1 in a von Neumann algebra with a (not necessarily bounded) trace is viewed as a density. Hence, for example, a random matrix is typically studied in free probability theory, while a matrix-valued density is typically studied in quantum information theory. Our paper is focused on the free probabilistic framework; however, in Sect. 5, we will draw a connection between free probabilistic optimal couplings and certain aspects of quantum information theory, specifically quantum channels or unital completely positive trace-preserving maps.

In classical information geometry, both the Wasserstein distance and the entropy are intimately related to transport equations (differential equations describing functions which push forward some given probability distribution to another given probability distribution). In the free setting, there has been some success in constructing non-commutative transport of measure for a special type of non-commutative law known as a free Gibbs law associated to a convex potential V [23, 31, 39,40,41]; these ideas have even been generalized beyond the setting of tracial von Neumann algebras [53, 54, 66]. Unfortunately, the transport maps constructed in [23, 39, 40] were not optimal. The transport in [31] was shown to be the gradient of a convex function, so one would expect it to be optimal in light of the classical Monge–Kantorovich duality, but it was not yet clear how to prove this because there was no known non-commutative Monge–Kantorovich duality. The optimality of these couplings was later verified in [41, Remark 9.11] by studying a Legendre transform for (sufficiently regular, uniformly convex) non-commutative functions [41, Lemma 9.10]. This idea was one of the starting points for our current investigation into non-commutative optimal couplings, Legendre transforms, and Monge–Kantorovich duality with minimal regularity assumptions.

One of the challenges in even formulating a Monge–Kantorovich duality for the free setting is to decide what type of convex functions to use. Operator algebras are often thought of as non-commutative analogs of algebras of functions on a topological space or a measure space, but without a clear analog for points of the underlying space. Our approach is to consider functions that can be evaluated on random variables rather than on points, or more precisely, to study functions \(f: L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m \rightarrow {\mathbb {R}}\) where \(\mathcal {A}\) is a tracial von Neumann algebra, \(L^2(\mathcal {A})\) is the non-commutative \(L^2\) space, and the subscript \({{\,\mathrm{sa}\,}}\) indicates the real subspace of self-adjoint elements. The classical analog would be a function \(L^2(\Omega ,P;{\mathbb {R}}^m) \rightarrow {\mathbb {R}}\) where \((\Omega ,P)\) is a probability space, rather than a function \({\mathbb {R}}^m \rightarrow {\mathbb {R}}\). As we discuss in Sect. 1.4, such functions on the space of classical random variables have already found applications to Hamilton–Jacobi equations on the Wasserstein space [26, 29] as well as the master equation on \({\mathbb {R}}^m \times \mathcal {P}({\mathbb {R}}^m)\) in mean field games [14, 27, 28].

As in [31] and [39], we remark that the complexity of classifying von Neumann algebras presents serious obstructions to non-commutative transport theory that simply do not exist in the classical setting. It is a widely used fact in classical probability theory that any two standard Borel probability spaces with no atoms are measurably isomorphic; hence one can always arrange that their random variables are on some canonical probability space. By contrast, McDuff [48] showed that there are uncountably many non-isomorphic tracial von Neumann algebras that are diffuse with trivial center (that is, \(\mathrm {II}_1\) factors). This provides a real obstruction to non-commutative transport of measure, because if \(X = (X_1,\dots ,X_m)\) and \(Y = (Y_1,\dots ,Y_m)\) are m-tuples of self-adjoint non-commutative random variables such that X is expressed as a “function” of Y and vice versa (for some reasonable notion of non-commutative functions), then X and Y generate the same von Neumann algebra. Hence, non-commutative laws which produce non-isomorphic von Neumann algebras simply cannot be transported to each other in an invertible way. Another result of Ozawa [58] (based on group-theoretic results of Gromov [30] and Olshanskii [55]) shows there is no separable \(\mathrm {II}_1\) factor that contains an isomorphic copy of every separable \(\mathrm {II}_1\)-factor. Hence, we cannot even expect that there is some non-commutative law \(\mu \) such that all other non-commutative laws can be expressed as push-forwards of \(\mu \).

These obstructions must inform how we go about defining the convex functions for the Monge–Kantorovich duality, as well as the level of regularity that we expect from an optimal coupling. In fact, in Sect. 5 we make a more explicit connection between optimal couplings and this result of Gromov, Olshanskii, and Ozawa, and we also explore other pathological properties of the non-commutative Wasserstein distance through connections with quantum information theory.

1.2 Main results

Before stating the non-commutative Monge–Kantorovich duality, we establish the following notational conventions; see Sect. 2 for background. By tracial \(\mathrm {W}^*\)-algebra we mean a pair \(\mathcal {A}= (A,\tau )\) where A is a \(\mathrm {W}^*\)-algebra (or von Neumann algebra) and \(\tau : A \rightarrow \mathbb {C}\) is a faithful normal tracial state. In analogy with classical probability, we will denote the underlying algebra A by \(L^\infty (\mathcal {A})\) and the trace by \(\tau _{\mathcal {A}}\) when it is convenient to avoid naming A and \(\tau \) explicitly. We denote by \(L^2(\mathcal {A})\) the Hilbert space obtained from the GNS construction of A and \(\tau \).

We denote by \(L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) the set of m-tuples of self-adjoint elements of \(L^\infty (\mathcal {A})\) and for \(X = (X_1,\dots ,X_m) \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), we write \(\Vert X\Vert _{L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \max _{j=1,\dots ,m} \Vert X_j\Vert _{L^\infty (\mathcal {A})}\). If \(X \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), then \(\mathrm {W}^*(X)\) denotes the \(\mathrm {W}^*\)-algebra generated by X equipped with the appropriate trace.

For each \(X = (X_1,\dots ,X_m) \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), the non-commutative law \(\lambda _X\) is the linear map from the non-commutative polynomial algebra \(\mathbb {C}\langle x_1,\dots ,x_m\rangle \) to \(\mathbb {C}\) given by \(\lambda _X(p) = \tau _{\mathcal {A}}(p(X))\). The space of non-commutative laws (of self-adjoint m-tuples from any tracial \(\mathrm {W}^*\)-algebra) is denoted \(\Sigma _m\). Furthermore, \(\Sigma _{m,R}\) denotes the subspace of those laws \(\lambda _X\) where \(\Vert X\Vert _{L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \le R\) (where \(\mathcal {A}\) is a tracial \(\mathrm {W}^*\)-algebra and \(X \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\)). The weak-\(*\) topology on \(\Sigma _{m,R}\) refers to the topology of pointwise convergence on \(\mathbb {C}\langle x_1,\dots ,x_m\rangle \).
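As a minimal finite-dimensional sketch (our own illustration, assuming \(\mathcal {A}= M_n(\mathbb {C})\) with the normalized trace), the law \(\lambda _X\) can be evaluated on monomials directly. The helper names are hypothetical, and a monomial \(x_{i_1} \cdots x_{i_k}\) is encoded as the 0-based index tuple \((i_1,\dots ,i_k)\). The check illustrates two facts: \(\lambda _X\) is unchanged under unitary conjugation, and by traciality \(\lambda _X(x_1^2x_2^2) - \lambda _X(x_1x_2x_1x_2) = \frac{1}{2}\Vert [X_1,X_2]\Vert _{L^2}^2\), so the law detects non-commutativity.

```python
import numpy as np

def normalized_trace(A):
    """tau(A) = Tr(A) / n, the unique tracial state on M_n(C)."""
    return np.trace(A) / A.shape[0]

def law_on_monomial(X, word):
    """lambda_X(x_{i_1} ... x_{i_k}) = tau(X_{i_1} ... X_{i_k}).

    X is a list of self-adjoint n x n arrays (an m-tuple) and word is a tuple
    of 0-based indices; the empty word () encodes the constant polynomial 1.
    """
    n = X[0].shape[0]
    product = np.eye(n, dtype=complex)
    for i in word:
        product = product @ X[i]
    return normalized_trace(product)

def random_self_adjoint(n, rng):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (A + A.conj().T) / 2

def random_unitary(n, rng):
    # The Q factor of a complex Gaussian matrix is unitary.
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return np.linalg.qr(A)[0]

rng = np.random.default_rng(0)
X = [random_self_adjoint(4, rng) for _ in range(2)]
U = random_unitary(4, rng)
UXU = [U @ Xj @ U.conj().T for Xj in X]

# Same law after unitary conjugation, since the trace is conjugation-invariant.
print(np.isclose(law_on_monomial(X, (0, 1, 0, 1)), law_on_monomial(UXU, (0, 1, 0, 1))))
# Positive gap between x1 x1 x2 x2 and x1 x2 x1 x2 for non-commuting X_1, X_2.
print(law_on_monomial(X, (0, 0, 1, 1)).real - law_on_monomial(X, (0, 1, 0, 1)).real)
```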

Following [11], a coupling of \(\mu \), \(\nu \in \Sigma _m\) is a triple \((\mathcal {A},X,Y)\) where \(\mathcal {A}\) is a tracial \(\mathrm {W}^*\)-algebra and \(X, Y \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) such that \(\lambda _X = \mu \) and \(\lambda _Y = \nu \). The Wasserstein distance \(d_W^{(2)}(\mu ,\nu )\) is the infimum of \(\Vert X - Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\) over all couplings \((\mathcal {A},X,Y)\). We denote by \(C(\mu ,\nu )\) the supremum of \(\langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\) over all couplings \((\mathcal {A},X,Y)\), where \(\langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \sum _{j=1}^m \langle X_j,Y_j\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}}\). We say that a coupling is optimal if it achieves the infimum of \(\Vert X - Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\) or equivalently if it achieves the supremum of \(\langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\). The existence of optimal couplings was observed in [11]. That paper also showed that the non-commutative Wasserstein distance agrees with the classical one in the situation that \(X_1\), ..., \(X_m\) commute and \(Y_1\), ..., \(Y_m\) commute [11, Theorem 1.5].
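For m = 1 these quantities can be computed exactly for single self-adjoint matrices. The sketch below assumes the normalized trace on \(M_n(\mathbb {C})\) and uses the fact that the classical \(W_2\) distance between two uniform distributions on n points of \({\mathbb {R}}\) is attained by matching sorted values; by [11, Theorem 1.5] this is also the non-commutative distance of the spectral laws. The function names are our own, and the random unitary trials only illustrate the lower bound (the Hoffman–Wielandt inequality) for couplings of the form \((M_n(\mathbb {C}), X, UYU^*)\).

```python
import numpy as np

def wasserstein_single(X, Y):
    """d_W^{(2)}(lambda_X, lambda_Y) for single self-adjoint matrices (m = 1).

    Assumption: tau is the normalized trace. The optimal coupling matches
    sorted eigenvalues (eigvalsh returns them in ascending order).
    """
    a, b = np.linalg.eigvalsh(X), np.linalg.eigvalsh(Y)
    return np.sqrt(np.mean((a - b) ** 2))

def C_single(X, Y):
    """C(mu, nu): the largest <X', Y'> over couplings, again by sorted matching."""
    a, b = np.linalg.eigvalsh(X), np.linalg.eigvalsh(Y)
    return np.mean(a * b)

def l2_inner(A, B):
    """<A, B>_{L^2} = tau(A^* B), real when A and B are self-adjoint."""
    return np.real(np.trace(A.conj().T @ B)) / A.shape[0]

def random_self_adjoint(n, rng):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (A + A.conj().T) / 2

rng = np.random.default_rng(0)
X, Y = random_self_adjoint(5, rng), random_self_adjoint(5, rng)
d = wasserstein_single(X, Y)

# d^2 = ||X||^2 + ||Y||^2 - 2 C(mu, nu): minimizing distance = maximizing <X, Y>.
print(np.isclose(d ** 2, l2_inner(X, X) + l2_inner(Y, Y) - 2 * C_single(X, Y)))

# Couplings (M_5(C), X, Q Y Q^*) with Q unitary never beat the sorted matching.
for trial in range(100):
    Q = np.linalg.qr(rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5)))[0]
    Z = X - Q @ Y @ Q.conj().T
    assert np.sqrt(l2_inner(Z, Z)) >= d - 1e-12
```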

As mentioned before, the functions used in the non-commutative Monge–Kantorovich duality are functions on \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) for tracial \(\mathrm {W}^*\)-algebras \(\mathcal {A}\) with separable predual. However, because of Ozawa’s result [58], it is not sufficient to fix a single such tracial \(\mathrm {W}^*\)-algebra, but rather we must consider functions that are defined on \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) for every such \(\mathcal {A}\). We give more precise versions of the definitions in Sect. 3.

Definition 1.1

A tracial \(\mathrm {W}^*\)-function with values in \((-\infty ,\infty ]\) is a collection of functions \(f^{\mathcal {A}}: L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m \rightarrow (-\infty ,+\infty ]\), such that whenever \(\iota : \mathcal {A}\rightarrow \mathcal {B}\) is an inclusion map of tracial \(\mathrm {W}^*\)-algebras, \(f^{\mathcal {A}} = f^{\mathcal {B}} \circ \iota \) (here \(\iota \) is extended to a map \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m \rightarrow L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\)). If \(\mu \in \Sigma _m\) and f is a tracial \(\mathrm {W}^*\)-function, then \(\mu (f)\) is defined as \(f^{\mathcal {A}}(X)\) whenever \(\mathcal {A}\) is a tracial \(\mathrm {W}^*\)-algebra with separable predual and \(X \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) with \(\lambda _X = \mu \); this is well-defined because the compatibility condition gives \(f^{\mathcal {A}}(X) = f^{\mathrm {W}^*(X)}(X)\), and \(\mathrm {W}^*(X)\) is determined up to trace-preserving isomorphism by \(\lambda _X = \mu \).

One example of a tracial \(\mathrm {W}^*\)-function would be

$$\begin{aligned} f^{\mathcal {A}}(X) = {\left\{ \begin{array}{ll} \tau _{\mathcal {A}}(p(X)), &{} \Vert X\Vert _\infty \le R \\ \infty , &{} \text {otherwise,} \end{array}\right. } \end{aligned}$$

where p is a non-commutative polynomial. Tracial \(\mathrm {W}^*\)-functions also include scalar-valued tracial non-commutative smooth functions as in [40] and [41] in the following sense. If \(\phi \) is such a tracial non-commutative smooth function, then \(\phi ^{\mathcal {A}}(X)\) is only a priori defined when \(X \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\); however, in many cases \(\phi \) is Lipschitz with respect to \(\Vert \cdot \Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\) and hence can be extended to a function on \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) which will be a tracial \(\mathrm {W}^*\)-function. However, tracial \(\mathrm {W}^*\)-functions are much more general because they are not assumed to be continuous in any sense.
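To make Definition 1.1 concrete in finite dimensions, the sketch below evaluates the example \(f^{\mathcal {A}}\) on \(M_n(\mathbb {C})\) and checks the compatibility condition \(f^{\mathcal {A}} = f^{\mathcal {B}} \circ \iota \) for the trace-preserving unital inclusion \(\iota : M_n(\mathbb {C}) \rightarrow M_{nk}(\mathbb {C})\), \(A \mapsto A \otimes I_k\). The dictionary encoding of p, the helper names, and taking the real part (which assumes p is self-adjoint, so that \(\tau _{\mathcal {A}}(p(X))\) is real) are our own assumptions.

```python
import numpy as np

def evaluate_polynomial(X, p):
    """Evaluate p(X) for an m-tuple X of n x n matrices.

    Hypothetical encoding: p is a dict {word: coefficient}, where a word is a
    tuple of 0-based indices (i_1, ..., i_k) standing for x_{i_1} ... x_{i_k}.
    """
    n = X[0].shape[0]
    total = np.zeros((n, n), dtype=complex)
    for word, coeff in p.items():
        term = np.eye(n, dtype=complex)
        for i in word:
            term = term @ X[i]
        total += coeff * term
    return total

def f_example(X, p, R):
    """f^A(X) = tau(p(X)) if ||X||_inf <= R, and +inf otherwise, on A = M_n(C)."""
    n = X[0].shape[0]
    op_norm = max(np.linalg.norm(Xj, 2) for Xj in X)  # max_j of the operator norms
    if op_norm > R:
        return np.inf
    # Real part: we assume p is self-adjoint, so tau(p(X)) is real.
    return np.real(np.trace(evaluate_polynomial(X, p))) / n

def include(X, k):
    """Trace-preserving unital inclusion M_n(C) -> M_{nk}(C), A -> A (x) I_k."""
    return [np.kron(Xj, np.eye(k)) for Xj in X]

p = {(0, 0): 1.0, (1, 1): 1.0, (0, 1): 0.5, (1, 0): 0.5}  # x1^2 + x2^2 + (x1 x2 + x2 x1)/2
X = [np.diag([1.0, -1.0]), np.array([[0.0, 1.0], [1.0, 0.0]])]
print(f_example(X, p, R=2.0))              # 2.0
print(f_example(include(X, 3), p, R=2.0))  # 2.0, the same value after inclusion
print(f_example(X, p, R=0.5))              # inf, since ||X||_inf = 1 > 0.5
```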

Definition 1.2

We say that f is E-convex if \(f^{\mathcal {A}}\) is convex and lower semi-continuous on \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) for each \(\mathcal {A}\), and if for every inclusion \(\iota : \mathcal {A}\rightarrow \mathcal {B}\), letting \(E: \mathcal {B}\rightarrow \mathcal {A}\) be the corresponding trace-preserving conditional expectation, we have \(f^{\mathcal {A}}(E[X]) \le f^{\mathcal {B}}(X)\) for \(X \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\). Here we use the notation \(E[X] = (E[X_1],\dots ,E[X_m])\) when \(X = (X_1,\dots ,X_m)\).
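A finite-dimensional toy check of the monotonicity condition in Definition 1.2: take \(\mathcal {B}= M_n(\mathbb {C})\) with the normalized trace and \(\mathcal {A}\) its diagonal subalgebra, for which the trace-preserving conditional expectation E deletes the off-diagonal entries. For a trace function \(f(X) = \sum _j \tau (h(X_j))\) with h convex, the inequality \(f(E[X]) \le f(X)\) follows from Schur’s theorem that the diagonal of a self-adjoint matrix is majorized by its eigenvalues. This sketch only illustrates the inequality for one inclusion and one family of functions; it is not a test of E-convexity in general, and the helper names are hypothetical.

```python
import numpy as np

def trace_function(X, h):
    """f(X) = sum_j tau(h(X_j)) computed from eigenvalues; tau = normalized trace."""
    return sum(np.mean(h(np.linalg.eigvalsh(Xj))) for Xj in X)

def diagonal_expectation(X):
    """Trace-preserving conditional expectation onto the diagonal subalgebra, entrywise in X."""
    return [np.diag(np.diag(Xj)) for Xj in X]

def random_self_adjoint(n, rng):
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (A + A.conj().T) / 2

def fourth_power(t):
    return t ** 4

rng = np.random.default_rng(3)
convex_h = [np.exp, fourth_power, np.abs]
for trial in range(50):
    X = [random_self_adjoint(6, rng) for _ in range(3)]
    EX = diagonal_expectation(X)
    for h in convex_h:
        assert trace_function(EX, h) <= trace_function(X, h) + 1e-10
print("f(E[X]) <= f(X) held in every trial")
```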

Motivation for the definition of E-convexity will be given in Lemmas 1.10 and 1.17.

Proposition 1.3

\(C(\mu ,\nu )\) is equal to the infimum of \(\mu (f) + \nu (g)\) over pairs (f, g) of E-convex tracial \(\mathrm {W}^*\)-functions that satisfy \(f^{\mathcal {A}}(X) + g^{\mathcal {A}}(Y) \ge \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\) for every tracial \(\mathrm {W}^*\)-algebra \(\mathcal {A}\) with separable predual and X, \(Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\). There exists an admissible pair of E-convex functions that achieves the infimum. See Definition 3.22 and Propositions 3.23 and 3.24.
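The inequality \(C(\mu ,\nu ) \le \mu (f) + \nu (g)\) for every admissible pair is the elementary half of Proposition 1.3 and follows directly from the definitions. As a sketch, let \((\mathcal {A},X,Y)\) be any coupling of \(\mu \) and \(\nu \); replacing \(\mathcal {A}\) by \(\mathrm {W}^*(X,Y)\), which has separable predual and changes neither the inner product nor the values of f and g (by the compatibility condition in Definition 1.1), we may assume that \(\mathcal {A}\) has separable predual. Then

$$\begin{aligned} \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} &\le f^{\mathcal {A}}(X) + g^{\mathcal {A}}(Y) &&\text {(admissibility of } (f,g)\text {)} \\ &= \mu (f) + \nu (g) &&\text {(since } \lambda _X = \mu \text { and } \lambda _Y = \nu \text {).} \end{aligned}$$

Taking the supremum over couplings and then the infimum over admissible pairs gives \(C(\mu ,\nu ) \le \inf _{(f,g)} (\mu (f) + \nu (g))\); the reverse inequality and the existence of a minimizing pair require the arguments referenced in Proposition 1.3.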

Another consequence of the classification-related obstructions to non-commutative transport is that we cannot expect too much regularity in general for the E-convex functions associated to an optimal coupling. For instance, suppose two non-commutative laws \(\mu \) and \(\nu \) generate tracial von Neumann algebras that cannot embed into each other. This implies that if (X, Y) is an optimal coupling of these two laws on a tracial \(\mathrm {W}^*\)-algebra \(\mathcal {A}\), then neither of \(\mathrm {W}^*(X)\) and \(\mathrm {W}^*(Y)\) is contained in the other. Thus, even though the non-commutative laws may be diffuse, the situation is similar to that of coupling the classical measures \((1/2)(\delta _{-1} + \delta _1)\) and \((1/3)(\delta _{-1} + \delta _0 + \delta _1)\); in the optimal coupling, neither random variable can be expressed as a function of the other. However, if a pair of E-convex functions associated to an optimal coupling were differentiable, that would imply that X is in the von Neumann algebra generated by Y and vice versa as a consequence of Lemma 3.10.
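The classical example can be checked by hand, but a short script makes the point explicit. Assumptions: both measures are represented on six atoms of mass 1/6, and the optimal coupling is taken to be the monotone (quantile) coupling that pairs sorted atoms, which is optimal for the quadratic cost on the line. The helper `is_function_of` (our own name) tests whether one variable is constant on the level sets of the other.

```python
import numpy as np

# Six atoms of mass 1/6 represent both measures exactly.
X_atoms = np.array([-1, -1, -1, 1, 1, 1])  # (1/2)(delta_{-1} + delta_1)
Y_atoms = np.array([-1, -1, 0, 0, 1, 1])   # (1/3)(delta_{-1} + delta_0 + delta_1)

def monotone_coupling(x, y):
    """Pair sorted atoms; on the line this quantile coupling is optimal for W_2."""
    return np.sort(x), np.sort(y)

def is_function_of(target, source):
    """True if target is constant on each level set of source."""
    return all(len(set(target[source == s])) == 1 for s in set(source))

X, Y = monotone_coupling(X_atoms, Y_atoms)
print(is_function_of(Y, X), is_function_of(X, Y))  # False False
print(np.mean((X - Y) ** 2))                      # squared W_2 distance, equal to 1/3
```

Here X = -1 is paired with both Y = -1 and Y = 0, and Y = 0 is paired with both X = -1 and X = 1, which is exactly the failure of functional dependence in both directions.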

It is natural to ask how close an arbitrary non-commutative optimal coupling is to a coupling where X and Y generate the same von Neumann algebra. As a first application of duality, we show that every optimal coupling can be decomposed into an optimal coupling where the two variables generate the same \(\mathrm {W}^*\)-algebra and some additional orthogonal pieces.

Theorem 1.4

Suppose that \((\mathcal {A},X,Y)\) is an optimal coupling of \(\mu , \nu \in \Sigma _m\). Then there exists a \(\mathrm {W}^*\)-subalgebra \(\mathcal {B}\) of \(\mathcal {A}\) such that the following hold. Let \(E_{\mathcal {B}}: \mathcal {A}\rightarrow \mathcal {B}\) be the trace-preserving conditional expectation, and let \(X' = E_{\mathcal {B}}[X]\) and \(Y' = E_{\mathcal {B}}[Y]\).

  (1) \(X'\) and \(Y'\) each generate \(\mathcal {B}\).

  (2) \((\mathcal {B},X',Y')\) is an optimal coupling of \(\lambda _{X'}\) and \(\lambda _{Y'}\).

  (3) \(X' - Y'\), \(X - X'\), \(Y - Y'\) are mutually orthogonal.

See Theorem 3.25.

Our main results in Sect. 4 concern the displacement interpolation. If \((\mathcal {A},X,Y)\) is an optimal coupling of \(\mu \) and \(\nu \), then the displacement interpolation refers to the family of random variables \(X_t = (1 - t)X + tY\) for \(t \in [0,1]\). The associated laws \(\mu _t = \lambda _{X_t}\) form a metric geodesic in \(\Sigma _m\) with respect to the Wasserstein distance (see Proposition A.22). With the help of non-commutative Legendre transforms and Hopf-Lax semigroups, we will see that the E-convex functions associated to the couplings \((\mathcal {A},X_s,X_t)\) for \(s, t \in (0,1)\) have more regularity than the E-convex functions associated to the original coupling \((\mathcal {A},X,Y)\) (see Proposition 4.12). As a consequence, we obtain the following non-commutative transport result.

Theorem 1.5

Let \((\mathcal {A},X,Y)\) be an optimal coupling of \(\mu , \nu \in \Sigma _m\). Then \(\mathrm {W}^*((1 - t)X + tY) = \mathrm {W}^*(X,Y)\) for all \(t \in (0,1)\). For proof, see Sect. 4.3.
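For m = 1 and commuting (diagonal) matrices, the geodesic property of the displacement interpolation can be seen numerically. Assumptions: X = diag(a) and Y = diag(b) with a and b sorted, so that this is an optimal coupling by the sorted-matching description of the one-dimensional case, and each \(X_t\) has eigenvalues \((1-t)a + tb\), which remain sorted. The check confirms \(d_W^{(2)}(\mu _s,\mu _t) = |t-s|\, d_W^{(2)}(\mu ,\nu )\) in this special case only; it is an illustration, not a substitute for Proposition A.22.

```python
import numpy as np

def w2_uniform(a, b):
    """Classical W_2 between uniform distributions on two equal-size samples."""
    return np.sqrt(np.mean((np.sort(a) - np.sort(b)) ** 2))

def interpolate(a, b, t):
    """Eigenvalues of X_t = (1 - t) X + t Y when X = diag(a) and Y = diag(b)."""
    return (1 - t) * a + t * b

rng = np.random.default_rng(1)
a = np.sort(rng.standard_normal(8))     # spectrum of X, sorted
b = np.sort(rng.uniform(-1.0, 1.0, 8))  # spectrum of Y, sorted
d = w2_uniform(a, b)

for s, t in [(0.0, 1.0), (0.2, 0.7), (0.5, 0.6)]:
    lhs = w2_uniform(interpolate(a, b, s), interpolate(a, b, t))
    print(s, t, np.isclose(lhs, abs(t - s) * d))
```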

For instance, Theorem 1.5 entails that for classical optimal couplings, the \(\sigma \)-algebra generated by \(X_t\) is the same for all \(t \in (0,1)\), a fact that could also be deduced directly from classical optimal transport theory by a similar proof. The reader is encouraged to work out the classical example of \((1/2)(\delta _{-1} + \delta _1)\) and \((1/3)(\delta _{-1} + \delta _0 + \delta _1)\) as motivation.
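The suggested example can be worked out in a few lines, under the standing assumption of six atoms of mass 1/6 and the monotone coupling. On a finite space with equal-mass atoms, a \(\sigma \)-algebra is determined by a partition of the atoms, so it suffices to compare level-set partitions. For \(t \in (0,1)\), \(X_t\) takes the four distinct values \(-1\), \(t-1\), \(1-t\), 1 and generates \(\sigma (X,Y)\), while the endpoints \(X_0 = X\) and \(X_1 = Y\) generate strictly smaller \(\sigma \)-algebras. The helper name `partition` is our own.

```python
import numpy as np

# Optimal (monotone) coupling on six atoms of mass 1/6.
X = np.array([-1.0, -1.0, -1.0, 1.0, 1.0, 1.0])  # (1/2)(delta_{-1} + delta_1)
Y = np.array([-1.0, -1.0, 0.0, 0.0, 1.0, 1.0])   # (1/3)(delta_{-1} + delta_0 + delta_1)

def partition(keys):
    """Level-set partition of the atoms; it determines the generated sigma-algebra."""
    blocks = {}
    for atom, key in enumerate(keys):
        blocks.setdefault(key, []).append(atom)
    return sorted(blocks.values())

joint = partition(zip(X.tolist(), Y.tolist()))  # sigma(X, Y)
print(joint)  # [[0, 1], [2], [3], [4, 5]]

for t in [0.1, 0.25, 0.5, 0.9]:
    Xt = (1 - t) * X + t * Y
    # Rounding guards against floating-point noise in the level sets.
    print(t, partition(np.round(Xt, 12).tolist()) == joint)

# The endpoints generate strictly coarser partitions.
print(partition(X.tolist()), partition(Y.tolist()))
```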

The results of Sect. 5 highlight additional ways in which non-commutative optimal transport theory is significantly more complicated than its classical counterpart; specifically, the negative solution of the Connes embedding problem [42] has a natural interpretation in terms of optimal couplings. We observe that the optimization over couplings in the definition of the Wasserstein distance can be replaced by optimization over what are called factorizable quantum channels in quantum information theory (see Observation 5.5). The results of [32, 42, 52] imply that there exist factorizable quantum channels between finite-dimensional matrix algebras whose factorization requires an infinite-dimensional non-Connes embeddable von Neumann algebra (see Sect. 5.3 for definitions). We then show through Lemma 5.7 that channels with this property must occur as optimizers in the definition of the Wasserstein distance. From the optimal transport point of view, this means that the optimal distance between certain tuples of finite-dimensional matrices cannot even be approximately realized inside a finite-dimensional coupling.

Proposition 1.6

Thanks to [32] and [42], for certain \(n \in {\mathbb {N}}\), there exist non-commutative laws \(\mu \) and \(\nu \) associated to \(n^2\)-tuples in \(M_n(\mathbb {C})\) for which an optimal coupling requires a non-Connes embeddable tracial \(\mathrm {W}^*\)-algebra; see Corollary 5.14. Furthermore, thanks to [52], for every \(n \ge 11\) and \(d \in {\mathbb {N}}\), there exist non-commutative laws \(\mu \) and \(\nu \) associated to \(n^2\)-tuples in \(M_n(\mathbb {C})\) such that if \((\mathcal {A},X,Y)\) is a coupling of \(\mu \) and \(\nu \) that is optimal among couplings on Connes-embeddable tracial \(\mathrm {W}^*\)-algebras, then \(\mathcal {A}\) must have dimension at least d; see Corollary 5.8 and Remark 5.15.

In contrast to classical probability theory, we show that the \(L^2\)-Wasserstein metric does not generate the weak-\(*\) topology on \(\Sigma _{m,R}\). We call the topology on \(\Sigma _{m,R}\) generated by the Wasserstein distance the Wasserstein topology. We characterize when the two topologies agree at some \(\mu \) in terms of the associated tracial \(\mathrm {W}^*\)-algebra (Proposition 5.21) and hence obtain the following results (relying on the work of Connes [17]).

Proposition 1.7

The Wasserstein topology on \(\Sigma _{m,R}\) is strictly stronger than the weak-\(*\) topology; see [11] and Corollary 5.17. Furthermore, let \(\Sigma _{m,R}^{{{\,\mathrm{fin}\,}}}\) denote the set of non-commutative laws \(\lambda _X\) where X comes from \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) with \(\mathcal {A}\) finite-dimensional. Let \(\mu \) be a non-commutative law and let \(\mathcal {A}\) be a tracial \(\mathrm {W}^*\)-algebra with a generating m-tuple X such that \(\lambda _X = \mu \) and \(\Vert X\Vert _{L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \le R\). Then \(\mu \) is in the weak-\(*\) closure of \(\Sigma _{m,R}^{{{\,\mathrm{fin}\,}}}\) if and only if \(\mathcal {A}\) is Connes-embeddable; see Lemma 5.12. Moreover, in this case, the weak-\(*\) and Wasserstein topologies on \(\Sigma _{m,R}\) agree at \(\mu \) if and only if \(\mu \) is in the Wasserstein closure of \(\Sigma _{m,R}^{{{\,\mathrm{fin}\,}}}\), which is equivalent to \(\mathcal {A}\) being approximately finite-dimensional; see Proposition 5.26.

Approximate finite-dimensionality (abbreviated AFD; see Sect. 5.4 for the definition) is the strongest way that a \(\mathrm {W}^*\)-algebra can be approximated by finite-dimensional algebras (besides being finite-dimensional itself), and thus the latter condition is quite restrictive when \(m > 1\). For instance, up to isomorphism there is only one AFD \(\mathrm {II}_1\) factor [70, Sect. XIV.2]. In Sect. 6.1, we explain how Propositions 1.6 and 1.7 pose challenges to studying the large-N convergence of Wasserstein distance for random matrix models.

The results of Propositions 1.6 and 1.7 contrast strongly with the classical situation. Some treatments of optimal transport (e.g. [72, p. 75]) take for granted the fact that finitely supported probability measures are weak-\(*\) dense in the space of probability measures on a compact set. Such approximation arguments do not work in the non-commutative case for several reasons. Due to the negative resolution of the Connes embedding problem [42], the non-commutative laws that can be realized in finite-dimensional algebras are not weak-\(*\) dense. Furthermore, by Proposition 1.7, the weak-\(*\) closure of \(\Sigma _{m,R}^{{{\,\mathrm{fin}\,}}}\) is much larger than its Wasserstein closure (assuming \(m > 1\)). Finally, by Proposition 1.6, even if two laws \(\mu \) and \(\nu \) can be realized in finite-dimensional algebras, an optimal coupling need not be weak-\(*\) approximable by couplings in finite-dimensional algebras.

Because the weak-\(*\) and Wasserstein topologies are different for \(m > 1\), one can deduce that \(\Sigma _{m,R}\) with the Wasserstein distance is not compact (Corollary 5.27). The following even more startling result is a consequence of Gromov, Olshanskii, and Ozawa’s work [58, Theorem 1].

Theorem 1.8

For \(m > 1\) and \(R > 0\), the space \(\Sigma _{m,R}\) is not separable with respect to \(d_W^{(2)}\).

1.3 Organization

The paper is organized as follows:

  • In Sect. 1.4 and Sect. 1.5, we motivate the definition of E-convex functions and the associated duality result in terms of two toy examples, classical probability spaces and \(M_n(\mathbb {C})\).

  • In Sect. 2, we recall standard background on tracial \(\mathrm {W}^*\)-algebras and their interpretation as non-commutative probability spaces for the sake of readers who are not specialists in that topic.

  • In Sect. 3, we describe the properties of E-convex functions and the associated Legendre transform; we prove the non-commutative Monge–Kantorovich duality (Proposition 1.3) and the decomposition theorem for optimal couplings (Theorem 1.4).

  • In Sect. 4, we study the non-commutative analog of inf-convolution and the regularity properties of E-convex and semi-concave functions; we prove Theorem 1.5 and give further detail about the functions associated to the displacement interpolation in Proposition 4.12.

  • In Sect. 5, we connect non-commutative optimal couplings with quantum information theory and prove Proposition 1.6. Then we study the differences between the weak-\(*\) and the Wasserstein topology using a certain stability property (Proposition 5.21) and hence prove Proposition 1.7. Finally, we show non-separability of the Wasserstein space in Sect. 5.5.

  • In Sect. 6.1, we explain how Sect. 5 illustrates the difficulty of studying random matrix optimal transport in the large-N limit. Then Sect. 6.2 sketches a different but analogous theory of non-commutative optimal couplings that uses bimodules and \({\text {UCPT}}\)-maps of tracial \(\mathrm {W}^*\)-algebras.

  • In the appendix Sect. A, we define non-commutative laws and optimal couplings for elements of non-commutative \(L^p\) spaces, and show the existence of \(L^p\) optimal couplings and Wasserstein geodesics.

1.4 Motivation from classical probability

First, we recall the classical Monge–Kantorovich duality. Fix a standard Borel probability space \((\Omega ,P)\) with no atoms. For \(\mu \) and \(\nu \) compactly supported probability measures on \({\mathbb {R}}^m\), a coupling of \(\mu \) and \(\nu \) is a pair (X, Y) of random variables on \(\Omega \) with \(X \sim \mu \) and \(Y \sim \nu \). The classical Wasserstein distance is the infimum of \(\Vert X - Y\Vert _{L^2(\Omega ,P;{\mathbb {R}}^m)}\) over all such couplings, and a coupling is said to be optimal if it achieves this infimum.

Theorem 1.9

(See [72, Theorem 5.10, Particular Case 5.17]) Let (X, Y) be a coupling of two compactly supported measures \(\mu \) and \(\nu \) on \({\mathbb {R}}^m\). Then (X, Y) is optimal if and only if there exists a pair of convex functions \(f, g: {\mathbb {R}}^m \rightarrow {\mathbb {R}}\) satisfying \(f(x) + g(y) \ge \langle x,y\rangle \) for \(x, y \in {\mathbb {R}}^m\) and \({\mathbb {E}}[f(X)] + {\mathbb {E}}[g(Y)] = {\mathbb {E}} \langle X,Y\rangle \). Furthermore, \({\mathbb {E}}[f(X)] + {\mathbb {E}}[g(Y)] = {\mathbb {E}} \langle X,Y\rangle \) implies that Y is almost surely in the subdifferential of f at X and X is almost surely in the subdifferential of g at Y.
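For finitely supported (hence compactly supported) measures, both sides of the duality in Theorem 1.9 become finite linear programs: the coupling problem maximizes \(\sum _{i,j}\pi _{ij}\langle x_i,y_j\rangle \) over weights \(\pi _{ij} \ge 0\) with marginals p and q, and the dual minimizes \(\sum _i p_i f_i + \sum _j q_j g_j\) subject to \(f_i + g_j \ge \langle x_i,y_j\rangle \). The sketch below assumes SciPy’s `linprog` with the HiGHS solver is available and checks that the two optimal values agree. The dual only records f and g on the supports; convex extensions to all of \({\mathbb {R}}^m\) can be obtained from it by a Legendre-transform step, which we do not carry out here.

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich_primal_dual(x, p, y, q):
    """Return (max E<X,Y> over couplings, min sum_i p_i f_i + sum_j q_j g_j).

    x: (k, m) support points of mu with weights p; y: (l, m) support of nu with weights q.
    """
    k, l = len(p), len(q)
    S = x @ y.T  # S[i, j] = <x_i, y_j>

    # Primal: pi[i, j] >= 0, flattened row-major, with prescribed marginals.
    A_eq = np.zeros((k + l, k * l))
    for i in range(k):
        A_eq[i, i * l:(i + 1) * l] = 1.0  # sum_j pi[i, j] = p_i
    for j in range(l):
        A_eq[k + j, j::l] = 1.0           # sum_i pi[i, j] = q_j
    primal = linprog(-S.ravel(), A_eq=A_eq, b_eq=np.concatenate([p, q]),
                     bounds=(0, None), method="highs")

    # Dual: free variables (f_1..f_k, g_1..g_l) with f_i + g_j >= S[i, j],
    # written as -(f_i + g_j) <= -S[i, j] for linprog.
    A_ub = np.zeros((k * l, k + l))
    for i in range(k):
        for j in range(l):
            A_ub[i * l + j, i] = -1.0
            A_ub[i * l + j, k + j] = -1.0
    dual = linprog(np.concatenate([p, q]), A_ub=A_ub, b_ub=-S.ravel(),
                   bounds=(None, None), method="highs")
    return -primal.fun, dual.fun

rng = np.random.default_rng(4)
x, y = rng.standard_normal((5, 2)), rng.standard_normal((4, 2))
p, q = np.full(5, 1 / 5), np.full(4, 1 / 4)
C_primal, C_dual = kantorovich_primal_dual(x, p, y, q)
print(C_primal, C_dual, np.isclose(C_primal, C_dual))
```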

As explained above, E-convex functions will be an analog of functions on \(L^2(\Omega ,P;{\mathbb {R}}^m)\) rather than \({\mathbb {R}}^m\). Every convex function on \({\mathbb {R}}^m\) defines a convex function on \(L^2(\Omega ,P;{\mathbb {R}}^m)\) as follows.

Lemma 1.10

Let \(f: {\mathbb {R}}^m \rightarrow (-\infty ,\infty ]\) be convex and lower semi-continuous. Let \((\Omega ,P)\) be a non-atomic standard Borel probability space with underlying \(\sigma \)-algebra \(\mathcal {F}\). Define

$$\begin{aligned} {\tilde{f}}: L^2(\Omega ,P;{\mathbb {R}}^m) \rightarrow (-\infty ,\infty ], X \mapsto {\mathbb {E}}[f(X)], \end{aligned}$$

which is well-defined because f is bounded below by an affine function and X is integrable, so the negative part of f(X) is integrable. Then

  (1) \({\tilde{f}}(X)\) only depends on the law (probability distribution) of X.

  (2) \({\tilde{f}}\) is convex and lower semi-continuous.

  (3) Suppose that \({\tilde{f}}(X) < \infty \). Then Y is in the subdifferential of \({\tilde{f}}\) at X if and only if Y is in the subdifferential of f at X almost surely.

  (4) \({\tilde{f}}\) is monotone under conditional expectations: If \(\mathcal {G}\) is a sub-\(\sigma \)-algebra of \(\mathcal {F}\), then

    $$\begin{aligned} {\tilde{f}}(E[X|\mathcal {G}]) \le {\tilde{f}}(X). \end{aligned}$$

Sketch of proof

(1) This is immediate.

(2) Convexity of \({\tilde{f}}\) is immediate from convexity of f. To show lower semi-continuity of \({\tilde{f}}\), note that f is bounded below by an affine function, so \(f(x) + |x|^2/2\) is bounded from below by some constant C and thus \(g(x) := f(x) + |x|^2 / 2 - C\) is a nonnegative, convex, lower semi-continuous function. Suppose that \(X_n \rightarrow X\) in \(L^2(\Omega ,P;{\mathbb {R}}^m)\). Passing to a subsequence along which \({\tilde{g}}(X_n)\) converges to \(\liminf _{n \rightarrow \infty } {\tilde{g}}(X_n)\), and then to a further subsequence that converges to X almost surely, lower semi-continuity of g and Fatou’s lemma give \(\liminf _{n \rightarrow \infty } {\tilde{g}}(X_n) \ge {\tilde{g}}(X)\). Since \({\tilde{f}}(X) = {\tilde{g}}(X) - \Vert X\Vert _{L^2(\Omega ,P;{\mathbb {R}}^m)}^2/2 + C\) and the norm is continuous on \(L^2(\Omega ,P;{\mathbb {R}}^m)\), it follows that \({\tilde{f}}\) is also lower semi-continuous.

(3) If Y is in the subdifferential of f at X almost surely and \(Z \in L^2(\Omega ,P;{\mathbb {R}}^m)\), then \(f(Z) \ge f(X) + \langle Z-X,Y\rangle _{{\mathbb {R}}^m}\) almost surely, and thus by taking expectations \({\tilde{f}}(Z) \ge {\tilde{f}}(X) + \langle Z-X,Y\rangle _{L^2(\Omega ,P;{\mathbb {R}}^m)}\). For the converse, let \(S = \{x \in {\mathbb {R}}^m: f(x) < \infty \}\) and fix a countable dense subset \(\Xi \) of S. For each \(n \in {\mathbb {N}}\) and \(\xi \in \Xi \), let \(E_{n,\xi }\) be the event

$$\begin{aligned} E_{n,\xi } = \{f(\xi ) \le f(X) + \langle \xi - X,Y\rangle _{{\mathbb {R}}^m} - 1/n\}. \end{aligned}$$

Because Y is in the subdifferential of \({\tilde{f}}\) at X, we have

$$\begin{aligned} {\tilde{f}}(1_{E_{n,\xi }^c}X + 1_{E_{n,\xi }} \xi ) \ge {\tilde{f}}(X) + \langle 1_{E_{n,\xi }}(\xi - X),Y\rangle _{L^2(\Omega ,P;{\mathbb {R}}^m)}. \end{aligned}$$

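On the other hand, \(f(\xi ) \le f(X) + \langle \xi - X,Y\rangle _{{\mathbb {R}}^m} - 1/n\) on \(E_{n,\xi }\) by definition, and hence

$$\begin{aligned} {\tilde{f}}(1_{E_{n,\xi }^c}X + 1_{E_{n,\xi }} \xi ) \le {\tilde{f}}(X) + \langle 1_{E_{n,\xi }}(\xi - X),Y\rangle _{L^2(\Omega ,P;{\mathbb {R}}^m)} - \frac{1}{n} P(E_{n,\xi }). \end{aligned}$$

Comparing the two displays gives \(P(E_{n,\xi })/n \le 0\), that is, \(P(E_{n,\xi }) = 0\). Since this holds for all \(n \in {\mathbb {N}}\), we have \(f(\xi ) \ge f(X) + \langle \xi - X,Y\rangle _{{\mathbb {R}}^m}\) almost surely for each \(\xi \). Since \(\Xi \) is countable, this inequality holds almost surely for every \(\xi \in \Xi \) simultaneously; moreover, \(f(X) < \infty \) almost surely because \({\tilde{f}}(X) < \infty \). On this event, the inequality extends to every x in the relative interior of S, because f is continuous relative to the affine hull of S at such points and \(\Xi \) is dense in S. It then extends to every \(x \in S\): for any y in the relative interior of S, the points \((1-t)x + ty\) with \(t \in (0,1]\) lie in the relative interior, and convexity together with lower semi-continuity of f gives \(f(x) = \lim _{t \rightarrow 0^+} f((1-t)x + ty)\). For \(x \notin S\) the inequality is trivial, so \(f(x) \ge f(X) + \langle x-X,Y\rangle _{{\mathbb {R}}^m}\) for all \(x \in {\mathbb {R}}^m\).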

(4) This follows from Jensen’s inequality and the existence of regular conditional distributions for standard Borel probability spaces. \(\square \)
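Item (4) of Lemma 1.10 can be sanity-checked on a finite probability space, where a sub-\(\sigma \)-algebra \(\mathcal {G}\) is generated by a partition of the atoms and \(E[X|\mathcal {G}]\) averages X over each block with weights P. This is a minimal sketch under that finite-space assumption (the lemma itself is stated for non-atomic standard Borel spaces); the helper names are hypothetical, and the convex functions used are log-sum-exp and the Euclidean norm on \({\mathbb {R}}^m\).

```python
import numpy as np

def tilde_f(f, X, P):
    """E[f(X)] on a finite probability space: X has shape (N, m), P has shape (N,)."""
    return float(np.sum(P * f(X)))

def conditional_expectation(X, P, blocks):
    """E[X | G] for G generated by a partition of {0, ..., N-1} into index lists."""
    out = np.empty(X.shape, dtype=float)
    for block in blocks:
        w = P[block]
        out[block] = (w[:, None] * X[block]).sum(axis=0) / w.sum()
    return out

def log_sum_exp(X):
    return np.log(np.exp(X).sum(axis=1))

def euclidean_norm(X):
    return np.linalg.norm(X, axis=1)

rng = np.random.default_rng(5)
N, m = 12, 3
P = rng.random(N)
P /= P.sum()
X = rng.standard_normal((N, m))
blocks = [[0, 1, 2, 3], [4, 5], [6, 7, 8, 9, 10, 11]]
EX = conditional_expectation(X, P, blocks)

for f in (log_sum_exp, euclidean_norm):
    print(f.__name__, tilde_f(f, EX, P) <= tilde_f(f, X, P))
```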

Remark 1.11

Similar reasoning shows that if g is the Legendre transform of f on \({\mathbb {R}}^m\), then \({\tilde{g}}\) is the Legendre transform of \({\tilde{f}}\) on \(L^2(\Omega ,P;{\mathbb {R}}^m)\).
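Remark 1.11 concerns Legendre transforms on \(L^2(\Omega ,P;{\mathbb {R}}^m)\); a numerical sketch can at least check the Fenchel–Young inequality \({\tilde{f}}(X) + {\tilde{g}}(Y) \ge \langle X,Y\rangle _{L^2(\Omega ,P;{\mathbb {R}}^m)}\) and its equality case \(Y = \nabla f(X)\) for an explicit pair. Assumptions: \(f(x) = \sum _i x_i^4/4\), whose Legendre transform is \(g(y) = \sum _i \frac{3}{4}|y_i|^{4/3}\), and expectations are taken over a uniform finite sample. This does not verify the supremum in the remark, only the pairing it implies.

```python
import numpy as np

def f(x):
    """f(x) = sum_i x_i^4 / 4, applied to each row of x."""
    return np.sum(x ** 4, axis=-1) / 4

def g(y):
    """Legendre transform of f: g(y) = sum_i (3/4) |y_i|^(4/3)."""
    return np.sum(0.75 * np.abs(y) ** (4 / 3), axis=-1)

def inner(X, Y):
    """<X, Y>_{L^2} on a uniform finite probability space."""
    return np.mean(np.sum(X * Y, axis=-1))

# One-dimensional check that g is the Legendre transform: sup over a fine grid.
grid = np.linspace(-5.0, 5.0, 200001)
for y0 in (-2.0, 0.5, 3.0):
    approx = np.max(y0 * grid - grid ** 4 / 4)
    print(y0, np.isclose(approx, 0.75 * abs(y0) ** (4 / 3), atol=1e-6))

rng = np.random.default_rng(6)
X = rng.standard_normal((2000, 3))
Y = rng.standard_normal((2000, 3))
print(np.mean(f(X)) + np.mean(g(Y)) >= inner(X, Y))                       # Fenchel-Young
print(np.isclose(np.mean(f(X)) + np.mean(g(X ** 3)), inner(X, X ** 3)))  # equality at Y = X^3
```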

Let us call a function \(F: L^2(\Omega ,P;{\mathbb {R}}^m) \rightarrow (-\infty ,\infty ]\) classically E-convex if

  (1) F(X) depends only on the law of X.

  (2) F is convex and lower semi-continuous.

  (3) We have \(F(E[X|\mathcal {G}]) \le F(X)\) for every sub-\(\sigma \)-algebra \(\mathcal {G}\) and every \(X \in L^2(\Omega ,P;{\mathbb {R}}^m)\).

Then we have the following version of Monge–Kantorovich duality using classically E-convex functions on \(L^2(\Omega ,P;{\mathbb {R}}^m)\).

Corollary 1.12

Let (X, Y) be a coupling on \((\Omega ,P)\) of two compactly supported measures \(\mu \) and \(\nu \) on \({\mathbb {R}}^m\). Then (X, Y) is optimal if and only if there exists a pair of classically E-convex functions F, \(G: L^2(\Omega ,P;{\mathbb {R}}^m) \rightarrow (-\infty ,\infty ]\) such that

$$\begin{aligned} F(X') + G(Y') \ge \langle X',Y'\rangle _{L^2(\Omega ,P;{\mathbb {R}}^m)} \text { for all } X', Y' \in L^2(\Omega ,P;{\mathbb {R}}^m), \end{aligned}$$

and

$$\begin{aligned} F(X) + G(Y) = \langle X,Y\rangle _{L^2(\Omega ,P;{\mathbb {R}}^m)}. \end{aligned}$$

Proof

(\(\implies \)) By the classical Monge–Kantorovich duality (Theorem 1.9), there are convex (hence continuous) functions \(f, g: {\mathbb {R}}^m \rightarrow {\mathbb {R}}\) with \(f(x) + g(y) \ge \langle x,y\rangle _{{\mathbb {R}}^m}\) and \({\mathbb {E}}f(X) + {\mathbb {E}} g(Y) = \langle X,Y\rangle _{L^2(\Omega ,P;{\mathbb {R}}^m)}\). Let \(F = {\tilde{f}}\) and \(G = {\tilde{g}}\). By Lemma 1.10, F and G are classically E-convex, and clearly \(F(X) + G(Y) = \langle X,Y\rangle _{L^2(\Omega ,P;{\mathbb {R}}^m)}\). Also, \(F(X') + G(Y') \ge \langle X',Y'\rangle _{L^2(\Omega ,P;{\mathbb {R}}^m)}\), by taking expectations in \(f(X') + g(Y') \ge \langle X',Y'\rangle _{{\mathbb {R}}^m}\).

(\(\impliedby \)) Suppose that \((X',Y')\) is another coupling of \(\mu \) and \(\nu \) on \((\Omega ,P)\). Then

$$\begin{aligned} \langle X',Y'\rangle _{L^2(\Omega ,P;{\mathbb {R}}^m)} \le F(X') + G(Y') = F(X) + G(Y) = \langle X,Y\rangle _{L^2(\Omega ,P;{\mathbb {R}}^m)}, \end{aligned}$$

where in the middle equality we have used that \(F(X) = F(X')\) and \(G(Y) = G(Y')\) since \(X \sim X'\) and \(Y \sim Y'\) in law. Therefore, the coupling (X, Y) is optimal. \(\square \)

Corollary 1.12 is the statement that we will generalize to the non-commutative setting. We remark that although classically E-convex functions are much less concrete than convex functions on \({\mathbb {R}}^m\), Corollary 1.12 still has the power to prove the classical analogs of Theorems 1.4 and 1.5 by exactly the same arguments that we will use in the non-commutative case.

In fact, convex functions on a space of classical random variables have also been used in the theory of mean field games [28]. The theory of mean field games involves the study of the master equation [14, 27], a differential equation for a function \(u(t,x,\mu )\) depending on a time variable t, a space variable x (representing the position of an individual agent), and a measure \(\mu \) (representing the distribution of the positions of a continuum of other agents). We can define a function \({\widehat{u}}\) on \([0,\infty ) \times {\mathbb {R}}^m \times L^2(\Omega ,P;{\mathbb {R}}^m)\) by \({\widehat{u}}(t,x,X) = u(t,x,\mu _X)\), where \(\mu _X\) is the law of X. The first-order regularity conditions needed to solve the master equation are more easily stated in terms of the function \({\widehat{u}}\) on the Hilbert space \({\mathbb {R}}^m \times L^2(\Omega ,P;{\mathbb {R}}^m)\). Moreover, the proof of existence and uniqueness of solutions to Hamilton–Jacobi equations on Wasserstein space \(\mathcal {P}_2({\mathbb {R}}^m)\) [26, 29] relies on the theory of viscosity solutions to Hamilton–Jacobi equations on Hilbert spaces [18, 19, 47, 49].

The inf-convolution techniques that we use in Sect. 4 are an important special case of this theory of Hamilton–Jacobi equations on Hilbert spaces. In fact, part of our motivation was to understand the non-commutative version of Hamilton–Jacobi equations for functions of a random variable. Recent work has connected random matrix theory to viscosity solutions of Hamilton–Jacobi equations [10] and mean field games [15]. However, these connections are restricted to the setting of a single random matrix because they rely heavily on the description of self-adjoint random matrices in terms of their eigenvalues. It would be of great interest to have a theory of viscosity solutions to partial differential equations in several non-commuting variables, as suggested by the study of heat equations in [23, 38, 41] and the Hamilton–Jacobi–Bellman equation in [21, 38].

1.5 Motivation from matrix tuples

In order to motivate some of the ideas of our paper, we explain a toy model of couplings between tuples of \(n \times n\) matrices. Let \(M_n(\mathbb {C})\) denote the space of complex \(n \times n\) matrices. Let \({{\,\mathrm{tr}\,}}_n = (1/n) {{\,\mathrm{Tr}\,}}_n\) be the normalized trace on \(M_n(\mathbb {C})\). We define an inner product on \(M_n(\mathbb {C})\) by

$$\begin{aligned} \langle S,T\rangle _{{{\,\mathrm{tr}\,}}_n} = {{\,\mathrm{tr}\,}}_n(S^*T). \end{aligned}$$

Let \(M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}\) denote the real subspace of self-adjoint matrices. Then \(\langle X,Y\rangle _{{{\,\mathrm{tr}\,}}_n} \in {\mathbb {R}}\) for all \(X, Y \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}\). Every element of \(M_n(\mathbb {C})\) can be uniquely written as \(S + iT\) with \(S, T \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}\), and hence there is a natural identification of the complex inner product space \(M_n(\mathbb {C})\) with the complexification of the real inner product space \(M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}\).

From a non-commutative probability viewpoint, we can view \(M_n(\mathbb {C})\) as an algebra of “random variables” and the normalized trace \({{\,\mathrm{tr}\,}}_n: M_n(\mathbb {C}) \rightarrow \mathbb {C}\) as the “expectation.” To motivate this, suppose \(X \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}\). The empirical spectral distribution of X is the measure \(\mu = \frac{1}{n} \sum _{j=1}^n \delta _{\lambda _j}\) where \(\lambda _1\), ..., \(\lambda _n\) are the eigenvalues of X listed with multiplicity. We then have for every polynomial p that

$$\begin{aligned} {{\,\mathrm{tr}\,}}_n(p(X)) = \int p\,d\mu . \end{aligned}$$

Thus, \(\mu \) is analogous to the distribution of a random variable.
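The identity \({{\,\mathrm{tr}\,}}_n(p(X)) = \int p\,d\mu \) is easy to check numerically. A minimal sketch, assuming NumPy and a one-variable polynomial p given by its coefficient list (the helper names are hypothetical): the left side is computed from the matrix p(X), the right side by averaging p over the eigenvalues of X.

```python
import numpy as np

def random_hermitian(n, rng):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (G + G.conj().T) / 2

def tr_n(T):
    # Normalized trace tr_n = (1/n) Tr_n.
    return np.trace(T) / T.shape[0]

def poly_of_matrix(coeffs, X):
    # p(X) = sum_k coeffs[k] X^k, evaluated by Horner's rule (coeffs in increasing degree).
    P = np.zeros_like(X)
    for c in reversed(coeffs):
        P = P @ X + c * np.eye(X.shape[0])
    return P

def integrate_esd(coeffs, X):
    # int p d(mu) with mu = (1/n) sum_j delta_{lambda_j}: the average of p over the eigenvalues.
    lam = np.linalg.eigvalsh(X)
    return np.mean(np.polyval(coeffs[::-1], lam))   # polyval expects highest degree first

rng = np.random.default_rng(0)
X = random_hermitian(5, rng)
coeffs = [1.0, -2.0, 0.5, 3.0]      # p(x) = 1 - 2x + 0.5x^2 + 3x^3, an arbitrary example
print(tr_n(poly_of_matrix(coeffs, X)).real, integrate_esd(coeffs, X))
```

The two printed numbers agree up to rounding, because \(p(X) = V p(\Lambda ) V^*\) whenever \(X = V \Lambda V^*\) is diagonalized.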

If \(X = (X_1,\dots ,X_m) \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\), the “joint distribution” of \(X_1\), ..., \(X_m\) is not described by a measure on \({\mathbb {R}}^m\), since \(X_1\), ..., \(X_m\) do not commute. Rather we consider the non-commutative law \(\lambda _X\), which is the linear functional on the algebra of m-variable non-commutative polynomials given by

$$\begin{aligned} p \mapsto {{\,\mathrm{tr}\,}}_n(p(X_1,\dots ,X_m)). \end{aligned}$$

It turns out that two tuples X and \(Y \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\) have the same non-commutative law if and only if they are unitarily conjugate.
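The implication (2) \(\Rightarrow \) (1) of Lemma 1.13 below can be observed numerically: conjugating every \(X_j\) by the same unitary leaves the trace of each word unchanged, by cyclicity of the trace. The sketch below is a hypothetical illustration assuming NumPy. It compares \(\lambda _X\) and \(\lambda _{UXU^*}\) on all monomials of degree at most 4, which of course only samples finitely many polynomials.

```python
import itertools
import numpy as np

def random_hermitian(n, rng):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (G + G.conj().T) / 2

def random_unitary(n, rng):
    # The Q factor of a complex Gaussian matrix is unitary (not exactly Haar-distributed; fine here).
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    return Q

def moment(X, word):
    # tr_n(X_{i_1} ... X_{i_l}) for word = (i_1, ..., i_l); the empty word gives tr_n(1) = 1.
    n = X[0].shape[0]
    P = np.eye(n, dtype=complex)
    for i in word:
        P = P @ X[i]
    return np.trace(P) / n

def law_on_words(X, max_len):
    # The non-commutative law lambda_X restricted to all monomials of degree <= max_len.
    m = len(X)
    return {w: moment(X, w) for L in range(max_len + 1)
            for w in itertools.product(range(m), repeat=L)}

rng = np.random.default_rng(0)
n, m = 4, 2
X = [random_hermitian(n, rng) for _ in range(m)]
U = random_unitary(n, rng)
Y = [U @ Xj @ U.conj().T for Xj in X]

lawX, lawY = law_on_words(X, 4), law_on_words(Y, 4)
print(max(abs(lawX[w] - lawY[w]) for w in lawX))    # at rounding-error level
print(moment(X, (0, 0, 1, 1)).real - moment(X, (0, 1, 0, 1)).real)
```

The second printed quantity shows why the law is defined on non-commutative polynomials: expanding the square gives \({{\,\mathrm{tr}\,}}_n(X_1X_1X_2X_2) - {{\,\mathrm{tr}\,}}_n(X_1X_2X_1X_2) = \frac{1}{2}\Vert [X_1,X_2]\Vert _{{{\,\mathrm{tr}\,}}_n}^2\), which vanishes only when \(X_1\) and \(X_2\) commute.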

Lemma 1.13

Let \(X, Y \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\). Then the following are equivalent:

  1. (1)

    \({{\,\mathrm{tr}\,}}_n(p(X)) = {{\,\mathrm{tr}\,}}_n(p(Y))\) whenever p is a non-commutative polynomial in m variables.

  2. (2)

    There exists a unitary U in \(M_n(\mathbb {C})\) such that \(Y_j = UX_j U^*\) for \(j = 1\), ..., m.

This lemma follows from the multivariate version of Specht’s theorem [67] observed by Wiegmann [82] and verified in [43, Theorem 2.2]. This result is closely connected to the invariant theory of matrices [62], and related results have been rediscovered many times, as the survey [65] explains. Moreover, as is well known in the operator algebras community, the lemma can also be deduced from Lemma 2.33 below together with the fact that any two trace-preserving embeddings of a finite-dimensional tracial \(*\)-algebra into \(M_n(\mathbb {C})\) are unitarily conjugate, which is a consequence of the Artin–Wedderburn-type classification of finite-dimensional \(*\)-algebras and their representations (see e.g. [25, Sect. 2]).

We consider the toy problem of optimally coupling two matrix tuples inside \(M_n(\mathbb {C})\) (beware that, because of Proposition 1.6, an optimal coupling inside \(M_n(\mathbb {C})\) is not necessarily optimal among all couplings in tracial \(\mathrm {W}^*\)-algebras). Because of Lemma 1.13, the toy problem reduces to the following: Given \(X, Y \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\), find a unitary U so that \(\Vert UXU^* - Y\Vert _{{{\,\mathrm{tr}\,}}_n}\) is as small as possible, where \(UXU^* = (UX_1U^*, \dots , UX_mU^*)\), and where \(\Vert \cdot \Vert _{{{\,\mathrm{tr}\,}}_n}\) is the normalized Hilbert–Schmidt norm

$$\begin{aligned} \Vert T\Vert _{{{\,\mathrm{tr}\,}}_n} = \left( \sum _{j=1}^m {{\,\mathrm{tr}\,}}_n(T_j^*T_j) \right) ^{1/2}. \end{aligned}$$

This motivates the following definition: For \(X, Y \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\), we say that (X, Y) is an optimal coupling in \(M_n(\mathbb {C})\) if \(\Vert UXU^* - Y\Vert _{{{\,\mathrm{tr}\,}}_n} \ge \Vert X - Y\Vert _{{{\,\mathrm{tr}\,}}_n}\) for every unitary U. The next lemma guarantees existence of optimal couplings.

Lemma 1.14

Let X, \(Y \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\). Then there exists an \(n \times n\) unitary U that minimizes \(\Vert UXU^* - Y\Vert _{{{\,\mathrm{tr}\,}}_n}\). Moreover, every such unitary must satisfy

$$\begin{aligned} \sum _{j=1}^m [UX_jU^*,Y_j] = 0, \end{aligned}$$

where \([S,T] = ST - TS\) is the commutator.

Proof

Existence of a minimizer follows from the fact that the unitary group is compact and \(U \mapsto \Vert UXU^* - Y\Vert _{{{\,\mathrm{tr}\,}}_n}\) is continuous. Now suppose that U is a minimizer and let \(Z = UXU^*\). Let A be a self-adjoint matrix, and consider the unitary \(e^{itA}\) for \(t \in {\mathbb {R}}\). By minimality, we have \(\Vert e^{itA}Ze^{-itA} - Y\Vert _{{{\,\mathrm{tr}\,}}_n}^2 \ge \Vert Z - Y\Vert _{{{\,\mathrm{tr}\,}}_n}^2\). Since \(\Vert e^{itA}Ze^{-itA}\Vert _{{{\,\mathrm{tr}\,}}_n}^2 = \Vert Z\Vert _{{{\,\mathrm{tr}\,}}_n}^2\), expanding the squares shows that the real-valued function \(t \mapsto \langle e^{itA}Ze^{-itA},Y\rangle _{{{\,\mathrm{tr}\,}}_n}\) is maximized at \(t = 0\), so its derivative vanishes there. Differentiating at \(t = 0\) yields

$$\begin{aligned} 0 = \sum _{j=1}^m {{\,\mathrm{tr}\,}}_n((iAZ_j - iZ_jA) Y_j) = \sum _{j=1}^m {{\,\mathrm{tr}\,}}_n(A i(Z_jY_j - Y_jZ_j)) = {{\,\mathrm{tr}\,}}_n \left( A \sum _{j=1}^m i[Z_j,Y_j] \right) . \end{aligned}$$

Since this holds for all \(A \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}\) and \(i\sum _{j=1}^m [Z_j,Y_j]\) is itself self-adjoint, we may take \(A = i\sum _{j=1}^m [Z_j,Y_j]\) to obtain \({{\,\mathrm{tr}\,}}_n(A^2) = 0\). Hence \(A = 0\), that is, \(\sum _{j=1}^m [Z_j,Y_j] = 0\) as desired. \(\square \)
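The first-order computation in this proof suggests a simple numerical method. The derivative at \(t = 0\) of \(t \mapsto \langle e^{itA}Ze^{-itA},Y\rangle _{{{\,\mathrm{tr}\,}}_n}\) is \({{\,\mathrm{tr}\,}}_n(AB)\) with \(B = i\sum _j [Z_j,Y_j]\), so \(A = B\) is an ascent direction, and iterating \(Z \mapsto e^{itB}Ze^{-itB}\) is a discretized gradient (double-bracket) flow. The following is a sketch under stated assumptions, not a method from this paper. It uses NumPy, a fixed step size (the parameters `step` and `iters` are hypothetical), and matrices of operator norm at most 1; for such tuples, a crude Taylor-remainder estimate suggests the objective cannot decrease when \(2mt < 1\). In general it only finds a critical point, which need not be a global maximizer when \(m \ge 2\).

```python
import numpy as np

def inner(X, Y):
    # <X, Y>_{tr_n} = sum_j tr_n(X_j^* Y_j) for m-tuples (lists) of n x n matrices.
    n = X[0].shape[0]
    return sum(np.trace(Xj.conj().T @ Yj) for Xj, Yj in zip(X, Y)).real / n

def commutator_sum(Z, Y):
    # sum_j [Z_j, Y_j]
    return sum(Zj @ Yj - Yj @ Zj for Zj, Yj in zip(Z, Y))

def exp_i(B, t):
    # e^{itB} for Hermitian B, via the spectral decomposition B = V diag(w) V^*.
    w, V = np.linalg.eigh(B)
    return (V * np.exp(1j * t * w)) @ V.conj().T

def ascend(X, Y, step=0.05, iters=5000):
    # Repeatedly conjugate Z by e^{itB} with B = i sum_j [Z_j, Y_j], which increases <Z, Y>.
    n = X[0].shape[0]
    U = np.eye(n, dtype=complex)
    Z = [Xj.astype(complex) for Xj in X]
    history = [inner(Z, Y)]
    for _ in range(iters):
        B = 1j * commutator_sum(Z, Y)
        B = (B + B.conj().T) / 2        # B is Hermitian; this only removes rounding error
        W = exp_i(B, step)
        U = W @ U
        Z = [W @ Zj @ W.conj().T for Zj in Z]
        history.append(inner(Z, Y))
    return U, Z, history

def random_unitary(n, rng):
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    return Q

def random_hermitian(n, rng):
    G = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    H = (G + G.conj().T) / 2
    return H / np.linalg.norm(H, 2)     # scale to operator norm 1

rng = np.random.default_rng(1)

# m = 1: X has eigenvalues -1, 0, 1 in a random basis; Y is diagonal with distinct entries.
Q = random_unitary(3, rng)
X1 = [Q @ np.diag([-1.0, 0.0, 1.0]) @ Q.conj().T]
Y1 = [np.diag([-1.0, 0.5, 1.0]).astype(complex)]
U1, Z1, hist1 = ascend(X1, Y1)
print(hist1[-1], np.round(Z1[0].real, 6))    # objective and the (nearly diagonal) Z

# m = 2: a generic pair of tuples; only convergence toward a critical point is expected.
X2 = [random_hermitian(3, rng) for _ in range(2)]
Y2 = [random_hermitian(3, rng) for _ in range(2)]
U2, Z2, hist2 = ascend(X2, Y2)
print(np.linalg.norm(commutator_sum(Z2, Y2)))
```

For m = 1 the computed Z is numerically diagonal with eigenvalues in the same order as the diagonal of Y, in line with Remark 1.15 below. The value 2/3 used in the check is the classical optimum \(\frac{1}{n}\sum _i a_i b_i\) with both spectra sorted increasingly.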

Remark 1.15

In the case \(m = 1\), this lemma provides an alternative proof of the spectral theorem as follows. Let \(X \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}\). Let Y be a fixed diagonal matrix with distinct diagonal entries \(y_1\), ..., \(y_n\). Let U be a unitary minimizing \(\Vert UXU^* - Y\Vert _{{{\,\mathrm{tr}\,}}_n}\). Then \([UXU^*,Y] = 0\). Any matrix \(A = (a_{i,j})\) that commutes with Y must satisfy \(a_{i,j} y_j = y_i a_{i,j}\), so \(a_{i,j} = 0\) whenever \(i \ne j\); hence A must be diagonal. Therefore, \(UXU^*\) is diagonal.Footnote 2

Next, we describe an analog of the Monge–Kantorovich duality for the setting of matrix tuples.

Lemma 1.16

Let \(X, Y \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\). Then (X, Y) is an optimal coupling in \(M_n(\mathbb {C})\) if and only if there exist functions \(f: M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m \rightarrow {\mathbb {R}}\) and \(g: M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m \rightarrow (-\infty ,\infty ]\) satisfying the following properties:

  1. (1)

    f and g are convex.

  2. (2)

    f and g are unitarily invariant, that is, \(f(UX'U^*) = f(X')\) and \(g(UY'U^*) = g(Y')\) for U unitary and \(X'\), \(Y' \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\).

  3. (3)

    \(f(X') + g(Y') \ge \langle X',Y'\rangle _{{{\,\mathrm{tr}\,}}_n}\) for all \(X'\), \(Y' \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\).

  4. (4)

    \(f(X) + g(Y) = \langle X,Y\rangle _{{{\,\mathrm{tr}\,}}_n}\).

Proof

(\(\implies \)). Let \(\mathcal {U}(M_n(\mathbb {C}))\) be the unitary group. Let

$$\begin{aligned} f(X') = \sup _{U \in \mathcal {U}(M_n(\mathbb {C}))} \langle X',UYU^*\rangle _{{{\,\mathrm{tr}\,}}_n}. \end{aligned}$$

Note that f is convex because it is the supremum of a family of affine functions. Moreover, f is unitarily invariant because we took the supremum over all unitaries U.

Let g be the Legendre transform of f, that is,

$$\begin{aligned} g(Y') = \sup _{X' \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m} \left( \langle Y',X'\rangle _{{{\,\mathrm{tr}\,}}_n} - f(X') \right) . \end{aligned}$$

It is immediate that g is convex, g is unitarily invariant because f is unitarily invariant and the inner product is unitarily invariant, and \(f(X') + g(Y') \ge \langle X',Y'\rangle _{{{\,\mathrm{tr}\,}}_n}\) for all \(X'\), \(Y' \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\). In particular, \(f(X) + g(Y) \ge \langle X,Y\rangle _{{{\,\mathrm{tr}\,}}_n}\).

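On the other hand, the supremum defining f(X) is achieved at \(U = 1\). Indeed, since (X, Y) is an optimal coupling, \(\Vert X - UYU^*\Vert _{{{\,\mathrm{tr}\,}}_n} = \Vert U^*XU - Y\Vert _{{{\,\mathrm{tr}\,}}_n} \ge \Vert X - Y\Vert _{{{\,\mathrm{tr}\,}}_n}\) for every unitary U, and since the norms of X and \(UYU^*\) do not depend on U, this means that \(\langle X,UYU^*\rangle _{{{\,\mathrm{tr}\,}}_n}\) is maximized when \(U = 1\). Hence, \(f(X) = \langle X,Y\rangle _{{{\,\mathrm{tr}\,}}_n}\). Moreover,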

$$\begin{aligned} f(X') \ge \langle X',Y\rangle _{{{\,\mathrm{tr}\,}}_n}, \end{aligned}$$

hence

$$\begin{aligned} g(Y) \le \sup _{X' \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m} \left( \langle X',Y\rangle _{{{\,\mathrm{tr}\,}}_n} - \langle X',Y\rangle _{{{\,\mathrm{tr}\,}}_n} \right) = 0. \end{aligned}$$

Thus, \(f(X) + g(Y) \le \langle X,Y\rangle _{{{\,\mathrm{tr}\,}}_n}\). Hence, \(f(X) + g(Y) = \langle X,Y\rangle _{{{\,\mathrm{tr}\,}}_n}\) as desired.

(\(\Leftarrow \)) Suppose that f and g satisfy (1)–(4). Let U be a unitary. Then

$$\begin{aligned} \langle UXU^*,Y\rangle _{{{\,\mathrm{tr}\,}}_n} \le f(UXU^*) + g(Y) = f(X) + g(Y) = \langle X,Y\rangle _{{{\,\mathrm{tr}\,}}_n}. \end{aligned}$$

Therefore, (X, Y) is optimal. \(\square \)
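For m = 1 the function f from this proof has a closed form. By a classical trace inequality (a von Neumann-type rearrangement inequality for eigenvalues), \(\sup _U {{\,\mathrm{tr}\,}}_n(X'UYU^*) = \frac{1}{n}\sum _i a_i b_i\), where \(a_1 \le \dots \le a_n\) and \(b_1 \le \dots \le b_n\) are the eigenvalues of \(X'\) and Y. The sketch below assumes NumPy and uses hypothetical helper names. It uses this formula to spot-check the properties used in the proof: f dominates each affine function in its defining family, the supremum is attained, f is unitarily invariant and convex, and \(f(Y) = \langle Y,Y\rangle _{{{\,\mathrm{tr}\,}}_n}\) because (Y, Y) is an optimal coupling. No closed form is claimed here for \(m \ge 2\).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

def herm(d):
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (G + G.conj().T) / 2

def unitary(d):
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d)))
    return Q

def inner(S, T):
    # <S, T>_{tr_n} = tr_n(S^* T), real for self-adjoint S and T.
    return np.trace(S.conj().T @ T).real / S.shape[0]

def f_sup(Xp, Y):
    # f(X') = sup_U <X', U Y U^*>_{tr_n} for m = 1, evaluated with the classical trace
    # inequality: the sup equals (1/n) sum_i a_i b_i with both spectra sorted the same way.
    a = np.linalg.eigvalsh(Xp)      # eigenvalues in ascending order
    b = np.linalg.eigvalsh(Y)
    return float(a @ b) / len(a)

Y = herm(n)
A, B, V = herm(n), herm(n), unitary(n)

# f dominates every affine function X' -> <X', U Y U^*> in the family whose sup defines it.
best_sampled = max(inner(A, U @ Y @ U.conj().T) for U in (unitary(n) for _ in range(500)))

# The sup is attained by a unitary carrying the eigenbasis of Y onto that of A (same ordering).
wa, Va = np.linalg.eigh(A)
wy, Vy = np.linalg.eigh(Y)
U0 = Va @ Vy.conj().T
print(best_sampled <= f_sup(A, Y), np.isclose(inner(A, U0 @ Y @ U0.conj().T), f_sup(A, Y)))
```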

Unitarily invariant convex functions on \(M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\) satisfy a monotonicity property with respect to the non-commutative conditional expectation from \(M_n(\mathbb {C})\) onto a \(*\)-subalgebra A, which is one motivation for our notion of E-convexity in the tracial \(\mathrm {W}^*\)-setting.

Lemma 1.17

Let A be a unital \(*\)-subalgebra of \(M_n(\mathbb {C})\) (that is, \(1 \in A\)), and let \(E: M_n(\mathbb {C}) \rightarrow A \subseteq M_n(\mathbb {C})\) be the orthogonal projection with respect to the inner product \(\langle S,T\rangle _{{{\,\mathrm{tr}\,}}_n} = {{\,\mathrm{tr}\,}}_n(S^*T)\). Then \(E[ST] = S E[T]\) and \(E[TS] = E[T]S\) and \(E[T^*] = E[T]^*\) for \(T \in M_n(\mathbb {C})\) and \(S \in A\). Moreover, if \(f: M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m \rightarrow {\mathbb {R}}\) is a convex function that is invariant under unitary conjugation, then for \(X = (X_1,\dots ,X_m) \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\), we have

$$\begin{aligned} f(E[X]) \le f(X). \end{aligned}$$

Here \(E[X] = (E[X_1],\dots ,E[X_m])\).

Proof

For a subalgebra \(A \subseteq M_n(\mathbb {C})\), we denote by \(\mathcal {U}(A)\) the group of unitary matrices that are contained in A. We define the commutant

$$\begin{aligned} A' = \{S \in M_n(\mathbb {C}) | [S,T] = 0 \text { for all } T \in A \}. \end{aligned}$$

Since A is a unital \(*\)-subalgebra, we have \(A'' = A\) by von Neumann’s bicommutant theorem [68, Theorem II.3.9].

Let \(\mu \) be the Haar measure on \(\mathcal {U}(A')\), and define \(F:M_n(\mathbb {C}) \rightarrow M_n(\mathbb {C})\) by

$$\begin{aligned} F(X) = \int _{\mathcal {U}(A')} UXU^* \,d\mu (U). \end{aligned}$$

We claim that \(F(X) = E[X]\). First, to show that \(F(X) \in A\), note that for \(V \in \mathcal {U}(A')\), the invariance of the Haar measure gives \(VF(X)V^* = F(X)\), hence \([F(X),V] = 0\). Since \(A'\) is a \(*\)-algebra, it is linearly spanned by its unitaries, and therefore, \([F(X),S] = 0\) for all \(S \in A'\). So \(F(X) \in A'' = A\). Furthermore, for all \(T \in A\), using that every \(U \in \mathcal {U}(A')\) commutes with \(T^*\), we have

$$\begin{aligned} {{\,\mathrm{tr}\,}}_n(T^* F(X))= & {} \int _{\mathcal {U}(A')} {{\,\mathrm{tr}\,}}_n(T^* UXU^*) \,d\mu (U) \\= & {} \int _{\mathcal {U}(A')} {{\,\mathrm{tr}\,}}_n(UT^*XU^*) \,d\mu (U) = {{\,\mathrm{tr}\,}}_n(T^*X). \end{aligned}$$

Thus, F(X) is the orthogonal projection of X onto A, or \(F(X) = E[X]\), as desired.

Similar computations from the definition of F show that F is an A-A-bimodule map and \(F(X^*) = F(X)^*\), and hence these properties also hold for E.

Since \(\mu \) is a probability measure, Jensen’s inequality and the unitary invariance of f imply that

$$\begin{aligned} f(E[X]) \le \int _{\mathcal {U}(A')} f(UXU^*)\,d\mu (U) = f(X). \end{aligned}$$

\(\square \)
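A concrete instance of Lemma 1.17: take A to be the block-diagonal algebra \(M_2(\mathbb {C}) \oplus M_1(\mathbb {C}) \oplus M_3(\mathbb {C}) \subseteq M_6(\mathbb {C})\) (an arbitrary choice). Then \(A'\) consists of the matrices that are scalar on each block, and E keeps the diagonal blocks. Instead of the Haar measure on \(\mathcal {U}(A')\), the sketch averages over the finite subgroup of sign unitaries \(\mathrm {diag}(\pm I_2, \pm I_1, \pm I_3)\). This substitution is ours; it happens to give the same average here, and the Jensen step only needs some probability measure on unitaries. The convex, unitarily invariant test functions \(\sum _j {{\,\mathrm{tr}\,}}_n(X_j^4)\) and \(\lambda _{\max }(X_1 + X_2)\) are illustrative choices, and all helper names are hypothetical (NumPy assumed).

```python
import itertools
import numpy as np

sizes = (2, 1, 3)                      # A = M_2 (+) M_1 (+) M_3, block-diagonal in M_6
n = sum(sizes)
rng = np.random.default_rng(0)

def block_slices():
    starts = np.cumsum((0,) + sizes)
    return [slice(starts[i], starts[i + 1]) for i in range(len(sizes))]

def E_block(T):
    # Orthogonal projection onto A for <S, T> = tr_n(S^* T): keep only the diagonal blocks.
    out = np.zeros_like(T)
    for s in block_slices():
        out[s, s] = T[s, s]
    return out

def E_average(T):
    # Average of D T D^* over the sign unitaries D = diag(+-I_2, +-I_1, +-I_3) in U(A');
    # entries in off-diagonal blocks are multiplied by +-1 and cancel in the average.
    patterns = list(itertools.product((1.0, -1.0), repeat=len(sizes)))
    total = np.zeros_like(T)
    for signs in patterns:
        D = np.diag(np.concatenate([np.full(k, s) for k, s in zip(sizes, signs)]))
        total = total + D @ T @ D           # D is real diagonal, so D^* = D
    return total / len(patterns)

def herm(d):
    G = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (G + G.conj().T) / 2

def f_quartic(X):
    # sum_j tr_n(X_j^4): convex, and invariant under simultaneous unitary conjugation.
    return sum(np.trace(Xj @ Xj @ Xj @ Xj).real for Xj in X) / n

def f_lmax(X):
    # Largest eigenvalue of X_1 + X_2: convex and unitarily invariant as well.
    return np.linalg.eigvalsh(X[0] + X[1])[-1]

X = [herm(n), herm(n)]
EX = [E_block(Xj) for Xj in X]
print(f_quartic(EX) <= f_quartic(X), f_lmax(EX) <= f_lmax(X))
```

Because each f is unitarily invariant and \(E[X]\) is an average of the conjugates \(DXD^*\), Jensen's inequality gives \(f(E[X]) \le f(X)\) exactly as in the proof, with the Haar integral replaced by a finite average.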

2 Background on Tracial \(\mathrm {W}^*\)-algebras

For the sake of readers who are less familiar with tracial \(\mathrm {W}^*\)-algebras, we explain the prerequisites needed for the paper: the definition of a tracial \(\mathrm {W}^*\)-algebra, its interpretation as a non-commutative generalization of probability spaces, inclusions and trace-preserving conditional expectations of tracial \(\mathrm {W}^*\)-algebras, free products with amalgamation, and non-commutative laws.

2.1 Tracial \(\mathrm {W}^*\)-algebras

Historically, von Neumann algebras and \(\mathrm {W}^*\)-algebras were defined differently, but it turns out that these two definitions give the same objects thanks to work of Sakai; see e.g. [64, Theorem 1.16.7]. Here we follow Sakai’s approach that starts with the definition of \(\mathrm {W}^*\)-algebras as \(\mathrm {C}^*\)-algebras which are dual Banach spaces [64]. Other background references on von Neumann algebras include [2, 68,69,70].

Definition 2.1

A unital \(*\)-algebra is a (unital) algebra A over \(\mathbb {C}\) together with a skew-linear involution \(a \mapsto a^*\) such that \((ab)^* = b^*a^*\). If A and B are \(*\)-algebras, then a map \(\rho : A \rightarrow B\) is said to be a \(*\)-homomorphism if it is linear and respects multiplication and the \(*\)-operation.

Definition 2.2

A unital \(\mathrm {C}^*\)-algebra is a \(*\)-algebra A equipped with a norm \(\Vert \cdot \Vert \) such that

  • A is a Banach space with respect to \(\Vert \cdot \Vert \);

  • \(\Vert ab\Vert \le \Vert a\Vert \Vert b\Vert \) for \(a, b \in A\);

  • \(\Vert a^*a\Vert = \Vert a\Vert ^2\) for \(a \in A\).

Definition 2.3

A \(\mathrm {W}^*\)-algebra is a \(\mathrm {C}^*\)-algebra A together with a topology \(\mathscr {T}\), such that A as a Banach space is the dual of some Banach space \(A_*\) and \(\mathscr {T}\) is the weak-\(*\) topology on A.

We remark that \(A_*\) can be uniquely recovered from \((A,\mathscr {T})\) as the subspace of \(A^*\) consisting of linear functionals that are continuous with respect to \(\mathscr {T}\). In fact, it turns out that the predual \(A_*\) of a \(\mathrm {W}^*\)-algebra A is uniquely determined by A alone without reference to its weak-\(*\) topology [64, Corollary 1.13.3].

Definition 2.4

If A is a \(\mathrm {W}^*\)-algebra and \(A_*\) is a predual of A, then a faithful normal trace on A is an element \(\tau \in A_*\) satisfying the following properties:

  • \(\tau (1) = 1\);

  • \(\tau (a^*a) \ge 0\) for \(a \in A\);

  • \(\tau (a^*a) = 0\) if and only if \(a = 0\);

  • \(\tau (ab) = \tau (ba)\) for \(a, b \in A\).

We remark that in general von Neumann algebra theory, the word “trace” is often used to refer to the semi-finite trace on a semi-finite von Neumann algebra, but in this paper “trace” always means “tracial state.”

Definition 2.5

A tracial \(\mathrm {W}^*\)-algebra is a pair \(\mathcal {A}= (A,\tau )\), where A is a \(\mathrm {W}^*\)-algebra and \(\tau \) is a faithful normal trace.

Example 2.6

Let \((\Omega ,P)\) be a probability space. We take \(A = L^\infty (\Omega ,P)\), with the pointwise addition and multiplication operations. The \(*\)-operation is pointwise complex conjugation. The norm is the standard one for \(L^\infty (\Omega ,P)\), and note that \(\Vert fg\Vert \le \Vert f\Vert \Vert g\Vert \) and \(\Vert f^*f\Vert = \Vert f\Vert ^2\). By the Riesz representation theorem, \(L^\infty (\Omega ,P) = L^1(\Omega ,P)^*\), and therefore, we can take \(A_* = L^1(\Omega ,P)\), and then equip \(L^\infty (\Omega ,P)\) with the corresponding weak-\(*\) topology. We define \(\tau \) using the element \(1 \in L^1(\Omega ,P)\), so that \(\tau (f) = \int _{\Omega } f\,dP\). Since \(L^\infty (\Omega ,P)\) is commutative, it is immediate that \(\tau (fg) = \tau (gf)\). The other properties of \(\tau \) are straightforward to check from well-known facts in measure theory. Conversely, it turns out that every commutative tracial \(\mathrm {W}^*\)-algebra is isomorphic to \(L^\infty \) of some probability space [64, Sect. 1.18], [68, Theorem 1.18].

Example 2.7

Let H be an infinite-dimensional Hilbert space, and let \(A = B(H)\) be the algebra of bounded operators on H equipped with the operator norm. Let \(A_*\) be the space of trace class operators. Then A can be canonically identified with the dual of \(A_*\) by the pairing \((a,T) = {{\,\mathrm{Tr}\,}}(aT)\) for \(a \in A\) and \(T \in A_*\). The weak-\(*\) topology on B(H) is also known as the \(\sigma \)-weak operator topology. Thus, B(H) is a \(\mathrm {W}^*\)-algebra. However, it is not a tracial \(\mathrm {W}^*\)-algebra because \({{\,\mathrm{Tr}\,}}\) is not well-defined on all of B(H) and \({{\,\mathrm{Tr}\,}}(1) = \infty \). See for instance [64, Theorem 1.15.3].

Theorem 2.8

(GNS construction for tracial \(\mathrm {W}^*\)-algebras) Let \(\mathcal {A}= (A,\tau )\) be a tracial \(\mathrm {W}^*\)-algebra. Note that \(\langle a,b\rangle _{\mathcal {A}} := \tau (a^*b)\) defines an inner product on A (which is non-degenerate because \(\tau \) is faithful). This can be completed to a Hilbert space, which we denote by \(L^2(\mathcal {A})\). Let us denote the map \(A \rightarrow L^2(\mathcal {A})\) by \(a \mapsto {\widehat{a}}\). Then for each \(a \in A\), there are unique operators \(\pi _\ell (a), \pi _r(a) \in B(L^2(\mathcal {A}))\) such that \(\pi _\ell (a) {\widehat{b}} = {\widehat{ab}}\) and \(\pi _r(a) {\widehat{b}} = {\widehat{ba}}\) for \(b \in A\). Moreover, \(\pi _\ell \) defines a \(*\)-homomorphism \(A \rightarrow B(L^2(\mathcal {A}))\) which is continuous with respect to the weak-\(*\) topologies on A and \(B(L^2(\mathcal {A}))\). Similarly, \(\pi _r\) is a \(*\)-anti-homomorphism (it preserves \(+\) and \(*\) but reverses the order of multiplication) that is weak-\(*\) continuous. Furthermore, since \(\Vert a^*\Vert _{L^2(\mathcal {A})} = \Vert a\Vert _{L^2(\mathcal {A})}\), there is a unique skew-linear isometry \(J: L^2(\mathcal {A}) \rightarrow L^2(\mathcal {A})\) such that \(J({\widehat{a}}) = \widehat{a^*}\). See [51, Sect. IV] and [2, Sect. 7].
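For the matrix algebra \((M_n(\mathbb {C}), {{\,\mathrm{tr}\,}}_n)\) the GNS construction is completely explicit: \(L^2\) is \(M_n(\mathbb {C})\) itself with \(\langle a,b\rangle = {{\,\mathrm{tr}\,}}_n(a^*b)\). After identifying \(M_n(\mathbb {C})\) with \(\mathbb {C}^{n^2}\) by stacking columns, \(\pi _\ell (a)\) and \(\pi _r(a)\) become Kronecker products. The sketch below assumes NumPy, and its helper names are hypothetical. It checks the algebraic claims of Theorem 2.8 in this finite-dimensional case; the weak-\(*\) continuity statements are automatic here and are not tested.

```python
import numpy as np

n = 3
rng = np.random.default_rng(0)

def rand_mat():
    return rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def vec(a):
    # Column-stacking identifies M_n(C) with C^{n^2}; then vec(a @ b @ c) = kron(c.T, a) @ vec(b).
    return a.reshape(-1, order="F")

def inner(x, y):
    # <a, b> = tr_n(a^* b) = (1/n) vec(a)^* vec(b); np.vdot conjugates its first argument.
    return np.vdot(x, y) / n

def pi_l(a):
    # Left multiplication b -> a b.
    return np.kron(np.eye(n), a)

def pi_r(a):
    # Right multiplication b -> b a.
    return np.kron(a.T, np.eye(n))

def J(x):
    # J(vec(a)) = vec(a^*), a conjugate-linear map.
    return vec(x.reshape(n, n, order="F").conj().T)

a, b, c = rand_mat(), rand_mat(), rand_mat()
print(np.allclose(pi_l(a) @ vec(b), vec(a @ b)), np.allclose(pi_r(a) @ vec(b), vec(b @ a)))
```

Since the inner product is a scalar multiple of the standard one on \(\mathbb {C}^{n^2}\), the Hilbert space adjoint of \(\pi _\ell (a)\) is simply its conjugate transpose, which makes the \(*\)-preservation checks straightforward.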

Example 2.9

Let \(A = L^\infty (\Omega ,P)\) and let \(\tau \) be integration against P. Then \(\langle f,g\rangle _{L^2(\mathcal {A})} = \int _{\Omega } \overline{f}g\,dP\). The completion \(L^2(\mathcal {A})\) can be canonically identified with \(L^2(\Omega ,P)\). The map \(\widehat{~}\) is the standard inclusion \(L^\infty (\Omega ,P) \rightarrow L^2(\Omega ,P)\). Since A is commutative, the operators \(\pi _\ell (f)\) and \(\pi _r(f)\) in \(B(L^2(\Omega ,P))\) coincide, and both are the operator of multiplication by f.

Remark 2.10

Our examples indicate that if \(\mathcal {A}= (A,\tau )\) is a tracial \(\mathrm {W}^*\)-algebra, then A is an analog of \(L^\infty (\Omega ,P)\), \(A_*\) is an analog of \(L^1(\Omega ,P)\) and \(L^2(\mathcal {A})\) is an analog of \(L^2(\Omega ,P)\). In fact, there is even a non-commutative analog of measurable functions on \(\Omega \) that are finite almost everywhere; this is known as the algebra \({{\,\mathrm{Aff}\,}}(\mathcal {A})\) of operators affiliated to \(\mathcal {A}\), certain closed unbounded operators on the Hilbert space \(L^2(\mathcal {A})\). The space \(L^2(\mathcal {A})\) can be canonically identified with a subspace of the affiliated operators. Thus, the left and right multiplication operators \(\pi _\ell (a)\) and \(\pi _r(a)\) for \(a \in A\) become instances of multiplying affiliated operators. Moreover, there are subspaces \(L^p(\mathcal {A}) \subseteq {{\,\mathrm{Aff}\,}}(\mathcal {A})\) for \(p \in [1,\infty )\) which share many properties of the classical \(L^p\) spaces. There is also a natural identification of \(A_*\) with \(L^1(\mathcal {A})\). See §A.1 and the references therein for details.

2.2 \(\mathrm {W}^*\)-embeddings, trace-preserving conditional expectations, and \(\mathrm {W}^*\)-isomorphisms

Notation 2.11

If \(\mathcal {A}= (A,\tau )\) is a tracial \(\mathrm {W}^*\)-algebra, we will use the notation \(L^\infty (\mathcal {A})\) for A and \(\tau _{\mathcal {A}}\) for \(\tau \) when it is convenient to avoid naming A and \(\tau \) explicitly. In particular, the norm on A will be denoted \(\Vert \cdot \Vert _{L^\infty (\mathcal {A})}\). Furthermore, we will treat \(L^\infty (\mathcal {A})\) as a subspace of \(L^2(\mathcal {A})\). We will also write ab rather than \(\pi _\ell (a) {\widehat{b}}\) and ba rather than \(\pi _r(a) {\widehat{b}}\) for \(a \in L^\infty (\mathcal {A})\) and \(b \in L^2(\mathcal {A})\). Finally, we write \(a^*\) instead of J(a) for \(a \in L^2(\mathcal {A})\). We denote by \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}\) the real subspace of \(L^2(\mathcal {A})\) consisting of those elements fixed by J.

Definition 2.12

Let \(\mathcal {A}\) and \(\mathcal {B}\) be tracial \(\mathrm {W}^*\)-algebras. A linear map \(\phi : L^\infty (\mathcal {A}) \rightarrow L^\infty (\mathcal {B})\) is said to be trace-preserving if \(\tau _{\mathcal {A}} = \tau _{\mathcal {B}} \circ \phi \).

Lemma 2.13

(See [12, Lemma 1.5.11] and [2, Sect. 9.1]) Let \(\mathcal {A}\) and \(\mathcal {B}\) be tracial \(\mathrm {W}^*\)-algebras. Let \(\phi : L^\infty (\mathcal {A}) \rightarrow L^\infty (\mathcal {B})\) be a trace-preserving unital \(*\)-homomorphism. Then

  1. (1)

    \(\phi \) extends to an isometry \(L^2(\mathcal {A}) \rightarrow L^2(\mathcal {B})\), and in particular \(\phi \) is injective on A.

  2. (2)

    \(\phi \) is a contraction \(L^\infty (\mathcal {A}) \rightarrow L^\infty (\mathcal {B})\).

  3. (3)

    The adjoint map \(E = \phi ^*: L^2(\mathcal {B}) \rightarrow L^2(\mathcal {A})\) restricts to a map \(L^\infty (\mathcal {B}) \rightarrow L^\infty (\mathcal {A})\) that is contractive with respect to the \(L^\infty \) norm.

  4. (4)

    We have \(E[b^*] = E[b]^*\) for \(b \in L^\infty (\mathcal {B})\), and in fact also for \(b \in L^2(\mathcal {B})\).

  5. (5)

    E is a bimodule map over \(L^\infty (\mathcal {A})\), that is, for \(a \in L^\infty (\mathcal {A})\) and \(b \in L^2(\mathcal {B})\), we have \(E(\phi (a)b) = a E(b)\) and \(E(b \phi (a)) = E(b)a\).

  6. (6)

    E is unital (\(E(1) = 1\)) and trace-preserving (\(\tau _{\mathcal {A}} \circ E = \tau _{\mathcal {B}}\)).
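A finite-dimensional instance of Lemma 2.13: the map \(\phi (a) = a \otimes I_r\) from \((M_k(\mathbb {C}),{{\,\mathrm{tr}\,}}_k)\) to \((M_{kr}(\mathbb {C}),{{\,\mathrm{tr}\,}}_{kr})\) is a trace-preserving unital \(*\)-homomorphism, and its adjoint for the normalized inner products is the partial trace over the second tensor factor divided by r. The choice of example and all names are ours. The sketch assumes NumPy and checks properties (1) and (3) to (6) numerically for random inputs.

```python
import numpy as np

k, r = 2, 3                  # embed M_k into M_{kr}; the sizes are arbitrary
rng = np.random.default_rng(0)

def rand_mat(d):
    return rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))

def tr(T):
    # Normalized trace on M_d.
    return np.trace(T) / T.shape[0]

def inner(S, T):
    return tr(S.conj().T @ T)

def phi(a):
    # Trace-preserving unital *-homomorphism a -> a (x) I_r, since tr_{kr}(a (x) I_r) = tr_k(a).
    return np.kron(a, np.eye(r))

def E(b):
    # Adjoint of phi for the normalized inner products: the partial trace over the
    # second tensor factor, divided by r. Row index of kron(a, I_r) is i*r + s.
    b4 = b.reshape(k, r, k, r)
    return np.einsum("isjs->ij", b4) / r

a, b = rand_mat(k), rand_mat(k * r)
print(np.isclose(inner(phi(a), b), inner(a, E(b))))   # E is the adjoint of phi
```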

Definition 2.14

In the situation of the previous lemma, we call \(\phi \) a (tracial \(\mathrm {W}^*\))-embedding \(\mathcal {A}\rightarrow \mathcal {B}\) and E the associated trace-preserving conditional expectation. (Note that both maps are unital by definition and the previous lemma.)

Remark 2.15

It turns out that a trace-preserving \(*\)-homomorphism \(L^\infty (\mathcal {A}) \rightarrow L^\infty (\mathcal {B})\) is automatically continuous with respect to the weak-\(*\) topology, essentially because the weak-\(*\) topology can be recovered from the action of \(L^\infty (\mathcal {A})\) on \(L^2(\mathcal {A})\) by Theorem 2.8; see [24] or [2, Proposition 2.6.4]. For similar reasons, the trace-preserving conditional expectation is also weak-\(*\) continuous.

Example 2.16

Suppose that \(\mathcal {B}= L^\infty (\Omega ,\mathcal {F},P)\) for some probability space \((\Omega ,\mathcal {F},P)\), where \(\mathcal {F}\) is the \(\sigma \)-algebra associated to the measure. Let \(\mathcal {G}\) be a sub-\(\sigma \)-algebra of \(\mathcal {F}\). Then there is an expectation-preserving inclusion \(L^\infty (\Omega ,\mathcal {G},P) \rightarrow L^\infty (\Omega ,\mathcal {F},P)\). This extends to a map on the \(L^2\) spaces, and the adjoint of this map is the conditional expectation \(E: L^2(\Omega ,\mathcal {F},P) \rightarrow L^2(\Omega ,\mathcal {G},P)\) sending X to \(E[X|\mathcal {G}]\). The properties in Lemma 2.13 then reduce to well-known classical properties of conditional expectation. For instance, (3) the conditional expectation is contractive on \(L^\infty \); (4) the conditional expectation respects complex conjugation; (5) if \(X \in L^2(\Omega ,\mathcal {F},P)\) and \(Y \in L^\infty (\Omega ,\mathcal {G},P)\), then \(E[XY | \mathcal {G}] = E[X | \mathcal {G}] Y\); (6) the conditional expectation is expectation-preserving: \(E[E[X|\mathcal {G}]] = E[X]\).

Notation 2.17

If \(\mathcal {A}\) and \(\mathcal {B}\) are tracial \(\mathrm {W}^*\)-algebras, we say that \(\mathcal {A}\subseteq \mathcal {B}\) if \(L^\infty (\mathcal {A}) \subseteq L^\infty (\mathcal {B})\), the addition, product, \(*\)-operation and weak-\(*\) topology for \(L^\infty (\mathcal {A})\) are the restrictions of those from \(L^\infty (\mathcal {B})\), and \(\tau _{\mathcal {A}} = \tau _{\mathcal {B}}|_{L^\infty (\mathcal {A})}\). In this case, we denote the conditional expectation \(\mathcal {B}\rightarrow \mathcal {A}\) by \(E_{\mathcal {A}}\).

As the paper will often deal with m-tuples of self-adjoint elements of \(L^2\), we introduce the following convention to simplify notation.

Notation 2.18

If \(\mathcal {A}\) and \(\mathcal {B}\) are tracial \(\mathrm {W}^*\)-algebras and \(\phi : L^\infty (\mathcal {A}) \rightarrow L^\infty (\mathcal {B})\) is a tracial \(\mathrm {W}^*\)-embedding or a trace-preserving conditional expectation, then we will use the same letter \(\phi \) to denote the extension of the map to the \(L^2\) spaces. Furthermore, if \(X = (X_1,\dots ,X_m) \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), then we will write \(\phi (X) = (\phi (X_1),\dots ,\phi (X_m))\).

Definition 2.19

A tracial \(\mathrm {W}^*\)-embedding \(\phi : \mathcal {A}\rightarrow \mathcal {B}\) is said to be a tracial \(\mathrm {W}^*\)-isomorphism if it is bijective and the inverse map is also a tracial \(\mathrm {W}^*\)-embedding.

For reasons of mathematical logic, the class of tracial \(\mathrm {W}^*\)-algebras is not a set. However, it will be convenient for us in Sect. 3.2 to have a set of isomorphism class representatives of tracial \(\mathrm {W}^*\)-algebras with separable predual.

Lemma 2.20

There exists a set \({\mathbb {W}}\) of tracial \(\mathrm {W}^*\)-algebras, such that

  1. (1)

    the elements of \({\mathbb {W}}\) are pairwise non-isomorphic,

  2. (2)

    for every tracial \(\mathrm {W}^*\)-algebra with separable predual, there is a tracial \(\mathrm {W}^*\)-isomorphism to some element of \({\mathbb {W}}\).

Proof (Sketch of proof)

We saw earlier that if \(\mathcal {A}= (A,\tau )\) is a tracial \(\mathrm {W}^*\)-algebra with separable predual, then there is a \(\mathrm {W}^*\)-embedding \(A \rightarrow B(H_{\mathcal {A}})\) (here by \(\mathrm {W}^*\)-embedding, we mean an injective normal \(*\)-homomorphism in the theory of von Neumann algebras). Also, it is well known (see e.g. [64]) that if A has separable predual, then \(H_{\mathcal {A}} \cong L^2(\mathcal {A})\) is separable and hence isomorphic as a Hilbert space to \(\ell ^2({\mathbb {N}})\). Therefore, A is isomorphic to some \(\mathrm {W}^*\)-subalgebra of \(B(\ell ^2({\mathbb {N}}))\). Let \(S_1\) be the set of \(\mathrm {W}^*\)-subalgebras of \(B(\ell ^2({\mathbb {N}}))\) (which is a subset of the power set of \(B(\ell ^2({\mathbb {N}}))\)). Let \(S_2\) be the set of pairs \(\{(A,\tau ): A \in S_1, \tau : A \rightarrow \mathbb {C}\text { faithful normal trace}\}\). If \((A,\tau ) \in S_2\), then the adjoint of the inclusion map produces a surjective map from the separable space \(B(\ell ^2({\mathbb {N}}))_*\) of trace class operators onto \(A_* \cong L^1(A,\tau )\), and hence \(A_*\) is separable. Thus, \(S_2\) is a set of tracial \(\mathrm {W}^*\)-algebras with separable predual such that every tracial \(\mathrm {W}^*\)-algebra with separable predual is isomorphic to some element of \(S_2\). Finally, observe that tracial \(\mathrm {W}^*\)-isomorphism defines an equivalence relation on \(S_2\), and let \({\mathbb {W}}\) be a set containing exactly one representative of each equivalence class. \(\square \)

2.3 Amalgamated free products

Next, we explain the definition of free independence with amalgamation. This is an analog of conditional independence in classical probability theory. For background see for instance [81] or [12, §4.7].

Definition 2.21

Let \(\mathcal {A}= (A,\tau )\) be a tracial \(\mathrm {W}^*\)-algebra. Let B, \(A_1\), ..., \(A_N\) be \(\mathrm {W}^*\)-subalgebras of A with \(B \subseteq A_j\) for every j. Let \(\mathcal {B}= (B,\tau |_B)\) and let \(E_{\mathcal {B}}: \mathcal {A}\rightarrow \mathcal {B}\) be the trace-preserving conditional expectation. We say that \(A_1\), ..., \(A_N\) are freely independent with amalgamation over \(\mathcal {B}\) if the following condition holds: Whenever \(\ell \in {\mathbb {N}}\) and \(i_1\), ..., \(i_\ell \in \{1,\dots ,N\}\) with \(i_1 \ne i_2\), \(i_2 \ne i_3\), ..., \(i_{\ell -1} \ne i_\ell \) and \(a_j \in A_{i_j}\) with \(E_{\mathcal {B}}[a_j] = 0\) for \(j = 1\), ..., \(\ell \), then \(E_{\mathcal {B}}[a_1 \dots a_\ell ] = 0\).

Proposition 2.22

Let \(\mathcal {B}= (B,\sigma )\) be a tracial \(\mathrm {W}^*\)-algebra. For \(j = 1\), ..., N, let \(\mathcal {A}_j = (A_j,\tau _j)\) be a tracial \(\mathrm {W}^*\)-algebra and let \(\iota _j: \mathcal {B}\rightarrow \mathcal {A}_j\) be a tracial \(\mathrm {W}^*\)-embedding. Then there exist a tracial \(\mathrm {W}^*\)-algebra \(\mathcal {A}= (A,\tau )\) and tracial \(\mathrm {W}^*\)-embeddings \(\iota : \mathcal {B}\rightarrow \mathcal {A}\) and \(\phi _j: \mathcal {A}_j \rightarrow \mathcal {A}\) such that \(\iota = \phi _j \circ \iota _j\) for all j, and such that \(\phi _1(A_1)\), ..., \(\phi _N(A_N)\) are freely independent in \(\mathcal {A}\) with amalgamation over \(\iota (B)\). Moreover, \((\mathcal {A},\tau ,\iota ,\phi _1,\dots ,\phi _N)\) is unique up to a canonical isomorphism; in other words, if \(({\tilde{\mathcal {A}}},{\tilde{\tau }},{\tilde{\iota }},{\tilde{\phi }}_1,\dots ,{\tilde{\phi }}_N)\) is another such tuple, then there is a unique tracial \(\mathrm {W}^*\)-isomorphism \(\pi : \mathcal {A}\rightarrow {\tilde{\mathcal {A}}}\) satisfying \(\pi \circ \iota = {\tilde{\iota }}\) and \(\pi \circ \phi _j = {\tilde{\phi }}_j\) for all j.

Definition 2.23

If \(\mathcal {B}\), \(\mathcal {A}_1\), ..., \(\mathcal {A}_N\), and \(\mathcal {A}\) are as above (with the specified maps \(\iota \), \(\phi _1\), ..., \(\phi _N\)), then we say that \(\mathcal {A}\) is a free product of \(\mathcal {A}_1\), ..., \(\mathcal {A}_N\) with amalgamation over \(\iota _1(\mathcal {B})\), ..., \(\iota _N(\mathcal {B})\).

In the case where \(\mathcal {B}= \mathbb {C}\), we refer to these concepts simply as free independence and free products.
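Free independence over \(\mathbb {C}\) is often first met through random matrices: by Voiculescu's asymptotic freeness theorem, independent GUE matrices become freely independent as the dimension grows. The following Python sketch (using NumPy; the helper names `gue` and `tr`, the seed, and the size are our own choices, and at finite n the matrices are only approximately free) checks two instances of the condition in Definition 2.21 with \(\mathcal {B}= \mathbb {C}\), where the conditional expectation is the normalized trace.

```python
import numpy as np

def gue(n, rng):
    """Sample an n x n GUE matrix, normalized so its spectrum approaches the semicircle on [-2, 2]."""
    a = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    return (a + a.conj().T) / np.sqrt(2 * n)

def tr(mat):
    """Normalized trace tr_n = (1/n) Tr; the imaginary part vanishes for the products used here."""
    return np.trace(mat).real / mat.shape[0]

rng = np.random.default_rng(0)
n = 400
eye = np.eye(n)
x = gue(n, rng)
y = gue(n, rng)
x = x - tr(x) * eye  # center so that tr_n(X) = 0
y = y - tr(y) * eye  # center so that tr_n(Y) = 0

# l = 4 instance: a_1 = a_3 = X and a_2 = a_4 = Y, all centered.
alt = tr(x @ y @ x @ y)
# l = 2 instance with the centered elements X^2 - tr_n(X^2) and Y^2 - tr_n(Y^2).
x2c = x @ x - tr(x @ x) * eye
y2c = y @ y - tr(y @ y) * eye
pair = tr(x2c @ y2c)
print(f"tr(XYXY) = {alt:.4f}, tr((X^2 - c)(Y^2 - d)) = {pair:.4f}")
```

Both quantities would vanish exactly for freely independent variables; here they are only small, with fluctuations of order 1/n.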

2.4 Non-commutative laws and generators

Next, we describe the space of non-commutative laws. A non-commutative law is the analog of a linear functional \(\mathbb {C}[x_1,\dots ,x_m] \rightarrow \mathbb {C}\) given by \(f \mapsto \int f\,d\mu \) for some compactly supported probability measure \(\mu \) on \({\mathbb {R}}^m\). Instead of \(\mathbb {C}[x_1,\dots ,x_m]\), we use the non-commutative polynomial algebra in m variables.

Definition 2.24

(Non-commutative polynomial algebra) We denote by \(\mathbb {C}\langle x_1,\dots ,x_m\rangle \) the universal unital algebra generated by non-commuting variables \(x_1\), ..., \(x_m\). As a vector space, \(\mathbb {C}\langle x_1,\dots ,x_m\rangle \) has a basis consisting of all products \(x_{i_1} \dots x_{i_\ell }\) for \(\ell \ge 0\) and \(i_1\), ..., \(i_\ell \in \{1,\dots ,m\}\), where the empty product (\(\ell = 0\)) is the unit 1. We equip \(\mathbb {C}\langle x_1,\dots ,x_m\rangle \) with the unique conjugate-linear, anti-multiplicative involution \(*\) such that \(x_j^* = x_j\); more explicitly, the \(*\)-operation is defined on monomials by \((x_{i_1} \dots x_{i_\ell })^* = x_{i_\ell } \dots x_{i_1}\) and extended conjugate-linearly.

Definition 2.25

(Non-commutative law) A linear functional \(\lambda : \mathbb {C}\langle x_1,\dots ,x_m\rangle \rightarrow \mathbb {C}\) is said to be exponentially bounded if there exists \(R > 0\) such that \(|\lambda (x_{i_1} \dots x_{i_\ell })| \le R^\ell \) for all \(\ell \in {\mathbb {N}}_0\) and \(i_1\), ..., \(i_\ell \in \{1,\dots ,m\}\), and in this case we say R is an exponential bound for \(\lambda \). A non-commutative law is a linear functional \(\lambda : \mathbb {C}\langle x_1,\dots ,x_m\rangle \rightarrow \mathbb {C}\) that is unital (\(\lambda (1) = 1\)), positive (\(\lambda (p^*p) \ge 0\) for all p), tracial (\(\lambda (pq) = \lambda (qp)\) for all p, q), and exponentially bounded. We denote the space of non-commutative laws by \(\Sigma _m\), and we equip it with the weak-\(*\) topology (that is, the topology of pointwise convergence on \(\mathbb {C}\langle x_1,\dots ,x_m\rangle \)). We denote by \(\Sigma _{m,R}\) the subset of \(\Sigma _m\) comprised of non-commutative laws with exponential bound R.

Observation 2.26

The space \(\Sigma _{m,R}\) is convex, compact, and metrizable.

Observation 2.27

Let A be a unital \(*\)-algebra and \(X = (X_1, \dots , X_m) \in A_{{{\,\mathrm{sa}\,}}}^m\). Then there is a unique unital \(*\)-homomorphism \(\pi _{X}: \mathbb {C}\langle x_1,\dots ,x_m\rangle \rightarrow A\) such that \(\pi _{X}(x_j) = X_j\) for \(j = 1\), ..., m.

Definition 2.28

(Non-commutative law of an m-tuple) Let \(\mathcal {A}\) be a tracial \(\mathrm {W}^*\)-algebra. Let \(X = (X_1,\dots ,X_m) \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\). Then we define \(\lambda _{X}: \mathbb {C}\langle x_1,\dots ,x_m\rangle \rightarrow \mathbb {C}\) by \(\lambda _{X} = \tau \circ \pi _{X}\), where \(\pi _X\) is the map defined in the previous observation.

Notation 2.29

If \(\mathcal {A}\) is a tracial \(\mathrm {W}^*\)-algebra and \(X \in L^\infty (\mathcal {A})^m\), we write

$$\begin{aligned} \Vert X\Vert _{L^\infty (\mathcal {A})^m} := \max (\Vert X_j\Vert _{L^\infty (\mathcal {A})}: j = 1,\dots ,m). \end{aligned}$$

Observation 2.30

If \(\mathcal {A}\) and X are as above, then \(\lambda _{X}\) is a non-commutative law with exponential bound \(\Vert X\Vert _\infty := \Vert X\Vert _{L^\infty (\mathcal {A})^m}\). Conversely, if R is an exponential bound for \(\lambda _{X}\), then

$$\begin{aligned} \Vert X\Vert _{L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \max _j \lim _{n \rightarrow \infty } \tau (X_j^{2n})^{1/2n} \le R. \end{aligned}$$

Hence, \(\Vert X\Vert _\infty \) is the smallest exponential bound for \(\lambda _{X}\) and in particular it is uniquely determined by \(\lambda _{X}\).
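Observation 2.30 can be checked numerically for matrices, where \(\Vert X_j\Vert \) is the largest absolute eigenvalue of \(X_j\). The sketch below (Python with NumPy; the random tuple, its size, the seed, and the helper names are arbitrary illustrative choices) verifies that \(\Vert X\Vert _\infty \) is an exponential bound on all words of length at most 4, and that the moment roots \({{\,\mathrm{tr}\,}}_n(X_j^{2k})^{1/2k}\) increase with k and stay below \(\Vert X_j\Vert \).

```python
import itertools
import numpy as np

def random_hermitian(n, rng):
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (a + a.conj().T) / 2

def word_moment(xs, word):
    """lambda_X(x_{i1} ... x_{il}) = tr_n(X_{i1} ... X_{il}), complex in general."""
    n = xs[0].shape[0]
    prod = np.eye(n, dtype=complex)
    for i in word:
        prod = prod @ xs[i]
    return np.trace(prod) / n

rng = np.random.default_rng(1)
n, m = 6, 2
xs = [random_hermitian(n, rng) for _ in range(m)]
norm = max(np.linalg.norm(x, 2) for x in xs)  # ||X||_inf = max_j operator norm of X_j

# Exponential bound: |lambda_X(word)| <= ||X||_inf^l for every word of length l <= 4.
bound_ok = all(
    abs(word_moment(xs, w)) <= norm ** len(w) * (1 + 1e-9)
    for l in range(5)
    for w in itertools.product(range(m), repeat=l)
)

# Moment roots tr_n(X_j^{2k})^{1/(2k)}: nondecreasing in k and bounded by ||X_j||.
ks = (1, 2, 4, 8, 16, 32)
roots = [
    [(np.trace(np.linalg.matrix_power(x, 2 * k)).real / n) ** (1 / (2 * k)) for k in ks]
    for x in xs
]
print(bound_ok, [r[-1] for r in roots], [np.linalg.norm(x, 2) for x in xs])
```

Since \({{\,\mathrm{tr}\,}}_n\) averages over n eigenvalues, the root at 2k is at least \(n^{-1/2k}\Vert X_j\Vert \), which explains the slow but monotone convergence.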

In the case of a single operator X, we can apply the spectral theorem to show that there is a unique probability measure \(\mu _X\) on \({\mathbb {R}}\) satisfying

$$\begin{aligned} \int _{{\mathbb {R}}} f\,d\mu _X = \tau (f(X)) \text { for } f \in C_0({\mathbb {R}}). \end{aligned}$$

Since X is bounded, \(\mu _X\) is compactly supported, so polynomials can be integrated against it. If p is a polynomial, then \(\lambda _X(p) = \int _{{\mathbb {R}}} p\,d\mu _X\). Thus, \(\lambda _X\) is simply the linear functional on polynomials given by integration against the spectral distribution \(\mu _X\).

We use the notation \(\lambda _{X}\) in particular when \(\mathcal {A}= M_n(\mathbb {C})\). We denote by \({{\,\mathrm{tr}\,}}_n\) the normalized trace \((1/n) {{\,\mathrm{Tr}\,}}\) on \(M_n(\mathbb {C})\); recall that this is the unique tracial state on \(M_n(\mathbb {C})\). Thus, for any \(X \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\), the non-commutative law \(\lambda _{X}\) is unambiguously specified by Definition 2.28. In the \(m = 1\) case, it is given by the empirical spectral distribution of X, that is, the uniform probability measure on the n eigenvalues of X counted with multiplicity. Note that when X is a random m-tuple of matrices, we will use the notation \(\lambda _{X}\) by default to refer to the empirical non-commutative law, that is, the (random) non-commutative law of X with respect to \({{\,\mathrm{tr}\,}}_n\).
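For matrices, the empirical law can be evaluated word by word. In the sketch below (Python with NumPy; `law` and `random_hermitian` are hypothetical helper names, and the sizes and seed are arbitrary), the \(m = 1\) moments of \(\lambda _X\) are compared with the moments of the empirical spectral distribution, and for \(m = 2\) we check that \(\lambda _X\) is unital and tracial, that is, invariant under cyclic rotation of words.

```python
import numpy as np

def random_hermitian(n, rng):
    """Random Hermitian n x n matrix (illustrative choice of distribution)."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (a + a.conj().T) / 2

def law(xs, word):
    """Empirical law of the tuple xs evaluated on a word: tr_n(X_{i1} ... X_{il})."""
    n = xs[0].shape[0]
    prod = np.eye(n, dtype=complex)
    for i in word:
        prod = prod @ xs[i]
    return np.trace(prod) / n

rng = np.random.default_rng(2)
n = 5
x = random_hermitian(n, rng)
eig = np.linalg.eigvalsh(x)

# m = 1: lambda_X(x^k) equals the k-th moment of the empirical spectral distribution.
moments_law = [law([x], (0,) * k).real for k in range(6)]
moments_eig = [np.mean(eig ** k) for k in range(6)]

# m = 2: lambda_X is unital and invariant under cyclic rotations of a word.
xs = [x, random_hermitian(n, rng)]
word = (0, 1, 1, 0, 1)
rotations = [law(xs, word[r:] + word[:r]) for r in range(len(word))]
print(moments_law, rotations[0])
```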

The next proposition shows that any non-commutative law can be realized by a self-adjoint m-tuple in some tracial \(\mathrm {W}^*\)-algebra. This is a version of the Gelfand-Naimark-Segal construction (or GNS construction). A proof can be found in [3, Proposition 5.2.14(d)].

Proposition 2.31

(GNS construction for non-commutative laws) Let \(\lambda \in \Sigma _{m,R}\). Then we may define a semi-inner product on \(\mathbb {C}\langle x_1,\dots ,x_m\rangle \) by

$$\begin{aligned} \langle p,q\rangle _\lambda = \lambda (p^*q). \end{aligned}$$

Let \(H_\lambda \) be the separation-completion of \(\mathbb {C}\langle x_1,\dots ,x_m\rangle \) with respect to this semi-inner product, that is, the completion of \(\mathbb {C}\langle x_1,\dots ,x_m\rangle / \{p: \lambda (p^*p) = 0\}\), and let [p] denote the equivalence class of a polynomial p in \(H_\lambda \).

There is a unique unital \(*\)-homomorphism \(\pi : \mathbb {C}\langle x_1,\dots ,x_m\rangle \rightarrow B(H_\lambda )\) satisfying \(\pi (p)[q] = [pq]\) for p, \(q \in \mathbb {C}\langle x_1,\dots ,x_m\rangle \). Moreover, \(\Vert \pi (x_j)\Vert \le R\).

Let \(X_j = \pi (x_j)\), let \(X = (X_1,\dots ,X_m)\) and let A be the \(\mathrm {W}^*\)-subalgebra of \(B(H_\lambda )\) generated by \(X_1\), ..., \(X_m\). Define \(\tau : A \rightarrow \mathbb {C}\) by \(\tau (Y) = \langle [1], Y[1]\rangle _\lambda \). Then \(\tau \) is a faithful normal tracial state on A, so \(\mathcal {A}= (A,\tau )\) is a tracial \(\mathrm {W}^*\)-algebra, and \(\lambda _X = \lambda \) because \(\tau (\pi (p)) = \langle [1],[p]\rangle _\lambda = \lambda (p)\) for every polynomial p.
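A finite-dimensional shadow of this construction: restricting the semi-inner product \(\langle p,q\rangle _\lambda = \lambda (p^*q)\) to words of length at most d gives a Gram matrix, which must be positive semidefinite. When \(\lambda = \lambda _X\) for \(X \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\), we have \(\langle p,q\rangle _\lambda = {{\,\mathrm{tr}\,}}_n(p(X)^*q(X))\), so \([p] \mapsto p(X)\) embeds \(H_\lambda \) isometrically into \(M_n(\mathbb {C})\) with the normalized Hilbert-Schmidt inner product, and the Gram matrix has rank at most \(n^2\). The Python sketch below (NumPy; the truncation degree d, the seed, and all helper names are illustrative assumptions) checks both facts.

```python
import itertools
import numpy as np

def words(m, d):
    """All words (i1, ..., il) over {0, ..., m-1} with length l <= d."""
    return [w for l in range(d + 1) for w in itertools.product(range(m), repeat=l)]

def evaluate(xs, word):
    """p(X) for the monomial p = x_{i1} ... x_{il}."""
    prod = np.eye(xs[0].shape[0], dtype=complex)
    for i in word:
        prod = prod @ xs[i]
    return prod

rng = np.random.default_rng(3)
n, m, d = 2, 2, 3
xs = []
for _ in range(m):
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    h = (a + a.conj().T) / 2
    xs.append(h / np.linalg.norm(h, 2))  # rescale to operator norm 1

mats = [evaluate(xs, w) for w in words(m, d)]
# Gram matrix G[u, v] = lambda_X(u^* v) = tr_n(u(X)^* v(X)).
gram = np.array([[np.trace(p.conj().T @ q) / n for q in mats] for p in mats])
evals = np.linalg.eigvalsh(gram)
rank = int(np.sum(evals > 1e-8 * evals.max()))
print(len(mats), rank, evals.min())
```

For generic choices the rank equals \(n^2\) once d is large enough, since two generic Hermitian \(2 \times 2\) matrices generate all of \(M_2(\mathbb {C})\); the sketch only relies on the upper bound.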

Definition 2.32

In the situation of the previous proposition, we call \((\mathcal {A},X)\) the GNS realization of \(\lambda \).

The tracial \(\mathrm {W}^*\)-algebra associated to \(\lambda \) is canonical in the sense that any other construction would yield an isomorphic tracial \(\mathrm {W}^*\)-algebra. The following lemma can be deduced from the well-known properties of the GNS representation associated to a faithful trace \(\tau \) on a \(\mathrm {W}^*\)-algebra \(\mathcal {A}\) (which gives the so-called standard form of a tracial \(\mathrm {W}^*\)-algebra).

Lemma 2.33

Let \(\mathcal {A}\) and \(\mathcal {B}\) be tracial \(\mathrm {W}^*\)-algebras. Let \(X \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) and \(Y \in L^\infty (\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\) with \(\lambda _{X} = \lambda _{Y}\). Let \(\mathrm {W}^*(X)\) and \(\mathrm {W}^*(Y)\) be the \(\mathrm {W}^*\)-subalgebras of \(L^\infty (\mathcal {A})\) and \(L^\infty (\mathcal {B})\) generated by X and Y respectively, equipped with the traces \(\tau _{\mathcal {A}}|_{\mathrm {W}^*(X)}\) and \(\tau _{\mathcal {B}}|_{\mathrm {W}^*(Y)}\). Then there is a unique tracial \(\mathrm {W}^*\)-isomorphism \(\rho : \mathrm {W}^*(X) \rightarrow \mathrm {W}^*(Y)\) such that \(\rho (X_j) = Y_j\).

Here is a related lemma about generating sets for a tracial \(\mathrm {W}^*\)-algebra, which relies on the Kaplansky density theorem [68, Theorem II.4.8].

Lemma 2.34

Let \(\mathcal {A}\) be a tracial \(\mathrm {W}^*\)-algebra. Let \(S \subseteq L^\infty (\mathcal {A})\). Let \(\mathrm {W}^*(S)\) be the smallest \(\mathrm {W}^*\)-subalgebra of \(L^\infty (\mathcal {A})\) containing S, which is equal to the weak-\(*\) closure of the unital \(*\)-algebra generated by S. Then every \(Z \in \mathrm {W}^*(S)\) can be approximated in the \(L^2(\mathcal {A})\) norm by a sequence \(Z_n\) in the unital \(*\)-algebra generated by S such that \(\Vert Z_n\Vert _{L^\infty (\mathcal {A})} \le \Vert Z\Vert _{L^\infty (\mathcal {A})}\). Furthermore, if \(\phi : \mathcal {A}\rightarrow \mathcal {B}\) is a \(\mathrm {W}^*\)-embedding, then \(\phi |_{\mathrm {W}^*(S)}\) is uniquely determined by \(\phi |_S\).

In fact, the notion of generators for a \(\mathrm {W}^*\)-algebra extends to elements of \(L^2(\mathcal {A})\). For instance, for a self-adjoint tuple \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), the theory of affiliated operators sketched in Sect. A.1 allows us to apply a bounded Borel function f to \(X_j\) through functional calculus, and \(f(X_j)\) is then an element of \(L^\infty (\mathcal {A})\). Thus, we may define \(\mathrm {W}^*(X)\) as (for instance) the \(\mathrm {W}^*\)-subalgebra generated by \(\arctan (X_1)\), ..., \(\arctan (X_m)\), and then, as one would hope, X turns out to be in \(L^2(\mathrm {W}^*(X))_{{{\,\mathrm{sa}\,}}}^m\). See also [12, pp. 482–483]. We can state a characterization of \(\mathrm {W}^*(X)\) without reference to affiliated operators as follows.

Lemma 2.35

Let \(\mathcal {A}\) be a tracial \(\mathrm {W}^*\)-algebra and \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\). Then there exists a unique smallest \(\mathrm {W}^*\)-subalgebra \(B \subseteq L^\infty (\mathcal {A})\) such that \(X \in B_{{{\,\mathrm{sa}\,}}}^m\). We use the notation \(\mathrm {W}^*(X)\) for B and for \((B,\tau _{\mathcal {A}}|_B)\) as needed.

3 Duality for \(L^2\) Optimal Couplings

Our goal is to prove a version of the Monge–Kantorovich duality for the non-commutative version of the \(L^2\) Wasserstein distance defined by Biane and Voiculescu [11]. In Sect. 3.1 we recall the definitions of optimal couplings that were stated more succinctly in the introduction. We define E-convex functions in Sect. 3.2 and the corresponding Legendre transform in Sect. 3.3. Then we prove the non-commutative Monge–Kantorovich duality in Sect. 3.4, and as an application we prove a decomposition result for optimal couplings in Sect. 3.5.

3.1 Wasserstein distance and optimal couplings

Definition 3.1

(Biane-Voiculescu [11, Sect. 1.1]) Let \(\mu \), \(\nu \in \Sigma _m\) be non-commutative laws. A coupling of \(\mu \) and \(\nu \) is a triple \((\mathcal {A},X,Y)\) where \(\mathcal {A}\) is a tracial \(\mathrm {W}^*\)-algebra and X, \(Y \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) such that \(\lambda _X = \mu \) and \(\lambda _Y = \nu \). For \(\mu \), \(\nu \in \Sigma _m\), the (non-commutative \(L^2\)) Wasserstein distance \(d_W^{(2)}(\mu ,\nu )\) is the infimum of \(\Vert X - Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\) over all couplings \((\mathcal {A},X,Y)\) for \(\mathcal {A}\in {\mathbb {W}}\).

It is shown in [11, Theorem 1.3] that \(d_W^{(2)}\) is a metric on the set \(\Sigma _m\), and for each \(R > 0\), \(\Sigma _{m,R}\) is complete in this metric. However, as shown in Sect. 5.4, the topology generated by \(d_W^{(2)}\) is strictly stronger than the weak-\(*\) topology on \(\Sigma _m\). The notion of optimal couplings corresponding to the Wasserstein distance is as follows.

Definition 3.2

A coupling \((\mathcal {A},X,Y)\) of two non-commutative laws \(\mu \) and \(\nu \) is optimal if \(\Vert X - Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = d_W^{(2)}(\mu ,\nu )\).

Remark 3.3

As remarked in [11], for every \(\mu \), \(\nu \in \Sigma _m\), some optimal coupling exists. To see this, suppose \(R > 0\) is an exponential bound for \(\mu \) and \(\nu \). Note that if \((\mathcal {A},X,Y)\) is a coupling and \(\gamma \) is the joint law of \((X,Y)\), then \(\gamma \in \Sigma _{2m,R}\) by Observation 2.30, and \(\Vert X - Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \left( \sum _{j=1}^m \gamma ((x_j - x_{m+j})^2)\right) ^{1/2}\). The space of joint laws \(\gamma \in \Sigma _{2m,R}\) with marginals \(\mu \) and \(\nu \) is nonempty (it contains, for instance, the joint law of a coupling in which X and Y are freely independent) and closed in \(\Sigma _{2m,R}\), hence compact, and the map \(\gamma \mapsto \left( \sum _{j=1}^m \gamma ((x_j - x_{m+j})^2)\right) ^{1/2}\) is continuous. Thus, this map achieves a minimum at some \(\gamma ^*\), and we obtain an optimal coupling \((\mathcal {A},X,Y)\) by applying the GNS construction to \(\gamma ^*\) (Proposition 2.31).

Just as in classical optimal transport theory, it is convenient to frame \(L^2\) optimal couplings in terms of inner products rather than \(L^2\) norms in order to relate them with Legendre transforms. If \((\mathcal {A},X,Y)\) is a coupling of \(\mu \) and \(\nu \), then

$$\begin{aligned} \Vert X - Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 = \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - 2\langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} + \Vert Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2. \end{aligned}$$

Since \(\Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2\) and \(\Vert Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2\) are uniquely determined by \(\mu \) and \(\nu \), a coupling minimizes \(\Vert X - Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\) if and only if it maximizes the inner product \(\langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\). This motivates the following definition.

Definition 3.4

For \(\mu \), \(\nu \in \Sigma _m\), we denote by \(C(\mu ,\nu )\) the maximal value of \(\langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\) over all couplings \((\mathcal {A},X,Y)\) of \(\mu \) and \(\nu \).

The preceding paragraph shows that

$$\begin{aligned} d_W^{(2)}(\mu ,\nu )^2 = \sum _{j=1}^m \mu (x_j^2) + \sum _{j=1}^m \nu (x_j^2) - 2 C(\mu ,\nu ). \end{aligned}$$
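For \(m = 1\) and laws coming from \(n \times n\) matrices, one can compare couplings of the special form \((M_n(\mathbb {C}), \mathrm {diag}(a), U\,\mathrm {diag}(b)\,U^*)\) with U unitary. By a classical trace inequality for Hermitian matrices (a form of von Neumann's trace inequality), \({{\,\mathrm{tr}\,}}_n(\mathrm {diag}(a)\,U\mathrm {diag}(b)U^*)\) is at most the average of the products of the sorted entries of a and b, with equality for a suitable permutation matrix U. The Python sketch below (NumPy; the spectra, seed, and helper names are illustrative, and it explores only this restricted family of couplings, not all couplings in the definition of \(C(\mu ,\nu )\)) checks this bound on random unitaries and checks the expansion \(\Vert X - Y\Vert ^2 = \Vert X\Vert ^2 - 2\langle X,Y\rangle + \Vert Y\Vert ^2\) for the sorted pairing.

```python
import numpy as np

def haar_unitary(n, rng):
    """Haar-random unitary via QR of a complex Gaussian matrix, with column phases fixed."""
    z = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    q, r = np.linalg.qr(z)
    d = np.diag(r)
    return q * (d / np.abs(d))

def tr(mat):
    return np.trace(mat).real / mat.shape[0]

rng = np.random.default_rng(4)
n = 6
a = rng.standard_normal(n)    # eigenvalues of X: mu is their empirical law
b = rng.exponential(size=n)   # eigenvalues of Y: nu is their empirical law
x = np.diag(a).astype(complex)

# Monotone (sorted) pairing via a permutation: the k-th smallest a meets the k-th smallest b.
b_matched = np.empty(n)
b_matched[np.argsort(a)] = np.sort(b)
y_best = np.diag(b_matched).astype(complex)
best = tr(x @ y_best)

# Other couplings from the same family: Y_U = U diag(b) U^*.
values = []
for _ in range(200):
    u = haar_unitary(n, rng)
    values.append(tr(x @ (u @ np.diag(b) @ u.conj().T)))

d2_best = tr((x - y_best) @ (x - y_best))
print(best, max(values), d2_best)
```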

The goal of this section is to establish the duality result that \(C(\mu ,\nu )\) is the infimum of \(\mu (f) + \nu (g)\) over certain pairs \((f,g)\) of E-convex functions.

3.2 E-convex functions

Fix a set \({\mathbb {W}}\) of isomorphism class representatives for tracial \(\mathrm {W}^*\)-algebras with separable predual, as was given by Lemma 2.20. (Although we are only considering a set of isomorphism class representatives, we make no identifications between different tracial \(\mathrm {W}^*\)-embeddings from a given \(\mathcal {A}\in {\mathbb {W}}\) to a given \(\mathcal {B}\in {\mathbb {W}}\).)

Definition 3.5

Let S be a set. A tracial \(\mathrm {W}^*\)-function with values in S is a tuple \(f = (f^{\mathcal {A}})_{\mathcal {A}\in {\mathbb {W}}}\) of functions \(f^{\mathcal {A}}: L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m \rightarrow S\) such that whenever \(\iota : \mathcal {A}\rightarrow \mathcal {B}\) is a tracial \(\mathrm {W}^*\)-embedding, we have \(f^{\mathcal {A}} = f^{\mathcal {B}} \circ \iota \). (Here the inclusion \(\iota \) is understood to extend to a map \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m \rightarrow L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\) per Notation 2.18.)

Thus, roughly speaking, being a \(\mathrm {W}^*\)-function means that the value of f at \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) does not depend on the ambient algebra. In particular, for bounded operators, \(f^{\mathcal {A}}(X)\) only depends on the non-commutative law of X, because by Lemma 2.33 two bounded tuples with the same law generate tracial \(\mathrm {W}^*\)-algebras that are isomorphic via an isomorphism matching the tuples.
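As an illustration restricted to bounded tuples of matrices, a trace polynomial such as the hypothetical \(f(X) = \mathrm {Re}\, {{\,\mathrm{tr}\,}}(X_1X_2X_1X_2 + X_1^4)\) is unchanged under the tracial embedding \(M_n(\mathbb {C}) \rightarrow M_{nk}(\mathbb {C})\), \(Z \mapsto V(Z \otimes I_k)V^*\) with V unitary, which preserves the normalized trace. The Python sketch below (NumPy; names, sizes, and seed are illustrative) checks this invariance; the case \(k = 1\) covers the unitary conjugations discussed in Remark 3.12.

```python
import numpy as np

def random_hermitian(n, rng):
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (a + a.conj().T) / 2

def random_unitary(n, rng):
    q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
    return q

def f_poly(xs):
    """Hypothetical trace-polynomial function: Re tr_N(X1 X2 X1 X2 + X1^4)."""
    x1, x2 = xs
    size = x1.shape[0]
    return (np.trace(x1 @ x2 @ x1 @ x2 + np.linalg.matrix_power(x1, 4)) / size).real

def embed(xs, k, v):
    """Trace-preserving unital *-embedding M_n -> M_{nk}: Z -> V (Z kron I_k) V^*."""
    return [v @ np.kron(z, np.eye(k)) @ v.conj().T for z in xs]

rng = np.random.default_rng(5)
n, k = 4, 3
xs = [random_hermitian(n, rng), random_hermitian(n, rng)]
value = f_poly(xs)
value_big = f_poly(embed(xs, k, random_unitary(n * k, rng)))
value_conj = f_poly(embed(xs, 1, random_unitary(n, rng)))
print(value, value_big, value_conj)
```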

Although the definition of f only specifies \(f^{\mathcal {A}}\) when \(\mathcal {A}\) is in the set \({\mathbb {W}}\), it will sometimes be convenient to use the notation \(f^{\mathcal {A}}\) for a general tracial \(\mathrm {W}^*\)-algebra \(\mathcal {A}\) with separable predual. Indeed, by our choice of \({\mathbb {W}}\), there exists a tracial \(\mathrm {W}^*\)-isomorphism \(\phi \) from \(\mathcal {A}\) to some \(\mathcal {B}\in {\mathbb {W}}\). We can then set \(f^{\mathcal {A}} = f^{\mathcal {B}} \circ \phi \). This is well-defined, that is, independent of the particular choice of \(\phi \), because \(f^{\mathcal {B}} \circ \psi = f^{\mathcal {B}}\) for every tracial \(\mathrm {W}^*\)-automorphism \(\psi \) of \(\mathcal {B}\); this in turn follows from the definition of \(\mathrm {W}^*\)-functions, since such an automorphism is in particular a tracial \(\mathrm {W}^*\)-embedding of \(\mathcal {B}\) into itself.

Definition 3.6

A tracial \(\mathrm {W}^*\)-function \(f = (f^{\mathcal {A}})_{\mathcal {A}\in {\mathbb {W}}}\) with values in \([-\infty ,+\infty ]\) is said to be E-convex if either it is identically equal to \(-\infty \) or the following conditions hold:

  1. (1)

    For each \(\mathcal {A}\), \(f^{\mathcal {A}}\) is a convex and lower semi-continuous function \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m \rightarrow (-\infty ,+\infty ]\).

  2. (2)

    If \(\iota : \mathcal {A}\rightarrow \mathcal {B}\) is a tracial \(\mathrm {W}^*\)-embedding, and if \(E = \iota ^*: \mathcal {B}\rightarrow \mathcal {A}\) is the corresponding trace-preserving conditional expectation, then

    $$\begin{aligned} f^{\mathcal {A}}(E[X]) \le f^{\mathcal {B}}(X) \end{aligned}$$

    for \(X \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\). (Here E is understood to extend to a map \(L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \rightarrow L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) per Notation 2.18.)

Example 3.7

For \(t \in (0,\infty )\), let \(q_t^{\mathcal {A}}(X) = (1/2t) \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2\). Then \(q_t\) is E-convex. Indeed, it is a tracial \(\mathrm {W}^*\)-function because tracial \(\mathrm {W}^*\)-embeddings are isometric on \(L^2\). It is convex by the Cauchy-Schwarz and arithmetic-geometric mean inequalities, and it is clearly continuous. Finally, it satisfies monotonicity under conditional expectation because conditional expectations are contractive in \(\Vert \cdot \Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\).
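Example 3.7 can be sanity-checked in \(M_n(\mathbb {C})\) using the trace-preserving conditional expectation onto the diagonal subalgebra, which keeps the diagonal of each matrix and discards the off-diagonal entries. The Python sketch below (NumPy; the helper names, the choice of subalgebra, and the parameters are ours) checks monotonicity \(q_t(E[X]) \le q_t(X)\), convexity along a segment, and the self-adjointness of E with respect to \(\langle X,Y\rangle = \sum _j {{\,\mathrm{tr}\,}}_n(X_jY_j)\), which is used repeatedly in the proofs of this section.

```python
import numpy as np

def random_hermitian(n, rng):
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (a + a.conj().T) / 2

def inner(xs, ys):
    """Real inner product <X, Y> = sum_j tr_n(X_j Y_j) on self-adjoint tuples."""
    return sum(np.trace(x @ y).real / x.shape[0] for x, y in zip(xs, ys))

def q(t, xs):
    """q_t(X) = ||X||^2 / (2t)."""
    return inner(xs, xs) / (2 * t)

def cond_exp_diag(xs):
    """Conditional expectation onto diagonal matrices, applied coordinatewise."""
    return [np.diag(np.diag(x)) for x in xs]

rng = np.random.default_rng(6)
n, m, t = 5, 3, 0.7
x0 = [random_hermitian(n, rng) for _ in range(m)]
x1 = [random_hermitian(n, rng) for _ in range(m)]

monotone = q(t, cond_exp_diag(x0)) <= q(t, x0)
s = 0.3
mid = [(1 - s) * a + s * b for a, b in zip(x0, x1)]
convex = q(t, mid) <= (1 - s) * q(t, x0) + s * q(t, x1)
self_adjoint = np.isclose(inner(cond_exp_diag(x0), x1), inner(x0, cond_exp_diag(x1)))
print(monotone, convex, self_adjoint)
```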

We next explain an equivalent characterization of E-convexity using subgradient vectors.

Definition 3.8

If H is a real Hilbert space and \(f: H \rightarrow (-\infty ,\infty ]\) is a function, we say that \(y \in H\) is a subgradient for f at x if

$$\begin{aligned} f(x') \ge f(x) + \langle y, x' - x\rangle \text { for all } x' \in H. \end{aligned}$$

We define the subdifferential \(\eth f(x)\) as the set of subgradient vectors at x.

The following facts are well-known in convex analysis.

Lemma 3.9

Let H be a real Hilbert space and let \(f: H \rightarrow (-\infty ,\infty ]\). At any point x where f(x) is finite, \(\eth f(x)\) is closed and convex. If f is convex, lower semi-continuous, and finite-valued, then \(\eth f(x)\) is nonempty for every \(x \in H\). Conversely, if \(f: H \rightarrow (-\infty ,\infty )\) and \(\eth f(x)\) is nonempty for every \(x \in H\), then f is convex and lower semi-continuous.

Analogously, we will show that E-convex \(\mathrm {W}^*\)-functions are characterized by the existence of a subgradient vector Y to \(f^{\mathcal {A}}\) at X such that \(Y \in L^2(\mathrm {W}^*(X))_{{{\,\mathrm{sa}\,}}}^m\) (where \(\mathrm {W}^*(X)\) is given by Lemma 2.35). In addition, we handle the case where f can take the value \(+\infty \).

Lemma 3.10

Let f be a \(\mathrm {W}^*\)-function taking values in \((-\infty ,\infty )\). Then f is E-convex if and only if for each \(\mathcal {A}\in {\mathbb {W}}\) and \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), there exists \(Y \in L^2(\mathrm {W}^*(X))_{{{\,\mathrm{sa}\,}}}^m\) which is a subgradient vector to \(f^{\mathcal {A}}\) at X. Here \(\mathrm {W}^*(X)\) is given by Lemma 2.35.

Proof

First, suppose that f is E-convex. Fix \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\). Since \(f^{\mathcal {A}}\) is convex, lower semi-continuous, and finite-valued, Lemma 3.9 provides some subgradient vector Z to \(f^{\mathcal {A}}\) at X. Let \(\mathcal {B}= \mathrm {W}^*(X)\), let \(E: \mathcal {A}\rightarrow \mathcal {B}\) be the trace-preserving conditional expectation, and let \(Y = E[Z]\). Then for \(X' \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), we have

$$\begin{aligned} f^{\mathcal {A}}(X')&\ge f^{\mathcal {B}}(E[X']) = f^{\mathcal {A}}(E[X']) \\&\ge f^{\mathcal {A}}(X) + \langle Z, E[X'] - X\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&= f^{\mathcal {A}}(X) + \langle Z, E[X' - X]\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&= f^{\mathcal {A}}(X) + \langle Y, X' - X\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$

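Here we used E-convexity, the fact that f is a tracial \(\mathrm {W}^*\)-function, the identity \(E[X] = X\) (since \(X \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\)), and the fact that E is an orthogonal projection on \(L^2(\mathcal {A})\). Since \(Y = E[Z] \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\), the desired subgradient condition holds.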

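Conversely, suppose this subgradient condition holds. Then each \(f^{\mathcal {A}}\) is the pointwise supremum of the continuous affine functions \(X' \mapsto f^{\mathcal {A}}(X) + \langle Y, X' - X\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\) coming from its subgradient vectors, and hence it is lower semi-continuous. For \(X_0\), \(X_1 \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) and \(t \in (0,1)\), we have

$$\begin{aligned} f^{\mathcal {A}}((1 - t) X_0 + t X_1) \le (1 - t) f^{\mathcal {A}}(X_0) + t f^{\mathcal {A}}(X_1) \end{aligned}$$

because, if Y is a subgradient vector at \(X_t = (1 - t) X_0 + t X_1\), then adding \((1 - t)\) times the inequality \(f^{\mathcal {A}}(X_0) \ge f^{\mathcal {A}}(X_t) + \langle Y, X_0 - X_t\rangle \) to t times the inequality \(f^{\mathcal {A}}(X_1) \ge f^{\mathcal {A}}(X_t) + \langle Y, X_1 - X_t\rangle \) cancels the inner-product terms. To check the monotonicity under conditional expectation, consider a tracial \(\mathrm {W}^*\)-embedding \(\iota : \mathcal {A}\rightarrow \mathcal {B}\) and let \(E: \mathcal {B}\rightarrow \mathcal {A}\) be the corresponding conditional expectation. Let \(X \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\) and let \(X' = E[X]\), viewed as an element of \(L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\) via \(\iota \). By hypothesis, there is a subgradient vector Y to \(f^{\mathcal {B}}\) at the point \(X'\) that is in \(L^2(\mathrm {W}^*(X'))_{{{\,\mathrm{sa}\,}}}^m\), and in particular \(Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\). But then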

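$$\begin{aligned} f^{\mathcal {B}}(X)&\ge f^{\mathcal {B}}(X') + \langle Y, X - X'\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} \\&= f^{\mathcal {B}}(E[X]) + \langle Y, X - E[X]\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} \\&= f^{\mathcal {B}}(E[X]) = f^{\mathcal {A}}(E[X]). \end{aligned}$$

Here the inner product vanishes because \(Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) is orthogonal to \(X - E[X]\), and the last equality holds because f is a tracial \(\mathrm {W}^*\)-function.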

\(\square \)

Remark 3.11

The same argument shows that for a \(\mathrm {W}^*\)-function taking values in \((-\infty ,+\infty ]\), E-convexity is equivalent to the combination of the following three conditions:

  1. (1)

    For each \(\mathcal {A}\in {\mathbb {W}}\) and \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), if \(f^{\mathcal {A}}(X) < \infty \), then there exists \(Y \in L^2(\mathrm {W}^*(X))_{{{\,\mathrm{sa}\,}}}^m\) which is a subgradient vector to \(f^{\mathcal {A}}\) at X.

  2. (2)

    For each \(\mathcal {A}\) and each \(M \in {\mathbb {R}}\), the sublevel set \((f^{\mathcal {A}})^{-1}((-\infty ,M])\) is closed and convex in \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\).

  3. (3)

    If \(\iota : \mathcal {A}\rightarrow \mathcal {B}\) is a tracial \(\mathrm {W}^*\)-embedding and \(E = \iota ^*: \mathcal {B}\rightarrow \mathcal {A}\) is the corresponding conditional expectation, then \(f^{\mathcal {B}}(X) < +\infty \) implies \(f^{\mathcal {A}}(E[X]) < +\infty \).

Remark 3.12

If f is a tracial \(\mathrm {W}^*\)-function, then \(f^{\mathcal {A}}(UXU^*) = f^{\mathcal {A}}(X)\) for every unitary U in \(L^\infty (\mathcal {A})\) and \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\); this is because conjugation by U defines an automorphism of \(\mathcal {A}\) (hence in particular a tracial \(\mathrm {W}^*\)-embedding \(\mathcal {A}\rightarrow \mathcal {A}\)), and f respects tracial \(\mathrm {W}^*\)-embeddings.

If f is E-convex, then this unitary invariance gives rise to a “sum of commutators” condition on subgradient vectors related to Lemma 1.14. More precisely, suppose f is E-convex, \(Y \in \eth f^{\mathcal {A}}(X)\) and U is a unitary in \(L^\infty (\mathcal {A})\). Then

$$\begin{aligned} f^{\mathcal {A}}(X) = f^{\mathcal {A}}(UXU^*) \ge f^{\mathcal {A}}(X) + \langle UXU^* - X, Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$

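As in Lemma 1.14, take \(U = e^{itA}\) for \(A \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}\). The function \(t \mapsto \langle e^{itA}Xe^{-itA} - X, Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\) is nonpositive and vanishes at \(t = 0\), so its derivative at \(t = 0\), namely \(i \sum _{j=1}^m \tau (A[X_j,Y_j])\), must vanish. Since this holds for every \(A \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}\), we obtain \(\sum _{j=1}^m [X_j,Y_j] = 0\).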
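The derivation above only uses unitary invariance and a first-order inequality, so the same identity holds for the gradient of any differentiable, unitarily invariant function. As a numerical sketch (Python with NumPy), take the hypothetical function \(f(X) = {{\,\mathrm{tr}\,}}_n((X_1^2 + X_2^2)^2)\) on \(M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^2\), which we do not claim is E-convex. With \(P = X_1^2 + X_2^2\), its gradient for \(\langle X,Y\rangle = \sum _j {{\,\mathrm{tr}\,}}_n(X_jY_j)\) is \(Y_j = 2(PX_j + X_jP)\), and indeed \(\sum _j [X_j,Y_j] = 2[P,P] = 0\). The code checks the gradient against a central finite difference and then checks the commutator identity.

```python
import numpy as np

def random_hermitian(n, rng):
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (a + a.conj().T) / 2

def tr(mat):
    return np.trace(mat).real / mat.shape[0]

def f(xs):
    """Hypothetical unitarily invariant function f(X) = tr_n((X1^2 + X2^2)^2)."""
    p = xs[0] @ xs[0] + xs[1] @ xs[1]
    return tr(p @ p)

def grad_f(xs):
    """Gradient w.r.t. <X, Y> = sum_j tr_n(X_j Y_j): Y_j = 2 (P X_j + X_j P)."""
    p = xs[0] @ xs[0] + xs[1] @ xs[1]
    return [2 * (p @ x + x @ p) for x in xs]

rng = np.random.default_rng(7)
n = 5
xs = [random_hermitian(n, rng) for _ in range(2)]
hs = [random_hermitian(n, rng) for _ in range(2)]  # direction for the finite difference
ys = grad_f(xs)

eps = 1e-5
plus = [x + eps * h for x, h in zip(xs, hs)]
minus = [x - eps * h for x, h in zip(xs, hs)]
fd = (f(plus) - f(minus)) / (2 * eps)              # numerical directional derivative
analytic = sum(tr(y @ h) for y, h in zip(ys, hs))  # <grad f(X), H>
commutator_sum = sum(x @ y - y @ x for x, y in zip(xs, ys))
print(fd, analytic, np.abs(commutator_sum).max())
```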

The next lemma describes how the subdifferential interacts with conditional expectations.

Lemma 3.13

Let f be an E-convex \(\mathrm {W}^*\)-function. Let \(\mathcal {A}\in {\mathbb {W}}\) and \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\). Let \(\mathcal {B}\) be a tracial \(\mathrm {W}^*\)-subalgebra of \(\mathcal {A}\).

  1. (1)

    If \(f^{\mathcal {B}}(E_{\mathcal {B}}[X]) = f^{\mathcal {A}}(X)\), then

    $$\begin{aligned} \eth f^{\mathcal {B}}(E_{\mathcal {B}}[X]) = L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \cap \eth f^{\mathcal {A}}(X). \end{aligned}$$
  2. (2)

    If \(Y \in \eth f^{\mathcal {A}}(X)\), then \(E_{\mathrm {W}^*(X)}[Y] \in \eth f^{\mathcal {A}}(X)\).

Proof

(1) First, we show that \(\eth f^{\mathcal {B}}(E_{\mathcal {B}}[X]) \subseteq L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \cap \eth f^{\mathcal {A}}(X)\). If \(Y \in \eth f^{\mathcal {B}}(E_{\mathcal {B}}[X])\), then clearly \(Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\). Moreover, for all \(Z \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), we have

$$\begin{aligned} f^{\mathcal {A}}(Z)&\ge f^{\mathcal {B}}(E_{\mathcal {B}}[Z]) \\&\ge f^{\mathcal {B}}(E_{\mathcal {B}}[X]) + \langle Y, E_{\mathcal {B}}[Z] - E_{\mathcal {B}}[X]\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} \\&= f^{\mathcal {A}}(X) + \langle Y, Z - X\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}, \end{aligned}$$

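where we have used the hypothesis \(f^{\mathcal {B}}(E_{\mathcal {B}}[X]) = f^{\mathcal {A}}(X)\), the fact that \(E_{\mathcal {B}}\) is self-adjoint on \(L^2(\mathcal {A})\), and \(E_{\mathcal {B}}[Y] = Y\). Hence, \(Y \in \eth f^{\mathcal {A}}(X)\) as desired.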

Conversely, to show that \(L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \cap \eth f^{\mathcal {A}}(X) \subseteq \eth f^{\mathcal {B}}(E_{\mathcal {B}}[X])\), suppose that \(Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \cap \eth f^{\mathcal {A}}(X)\). Then for \(Z \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\),

$$\begin{aligned} f^{\mathcal {B}}(Z)&= f^{\mathcal {A}}(Z) \\&\ge f^{\mathcal {A}}(X) + \langle Y, Z - X\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&= f^{\mathcal {B}}(E_{\mathcal {B}}[X]) + \langle Y, Z - E_{\mathcal {B}}[X]\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}, \end{aligned}$$

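by the hypothesis \(f^{\mathcal {A}}(X) = f^{\mathcal {B}}(E_{\mathcal {B}}[X])\), because \(E_{\mathcal {B}}\) is a self-adjoint operator on \(L^2(\mathcal {A})\), and because \(Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\). Hence, \(Y \in \eth f^{\mathcal {B}}(E_{\mathcal {B}}[X])\).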

(2) Let \(\mathcal {B}= \mathrm {W}^*(X)\) (where the trace is given by the restriction of \(\tau _{\mathcal {A}}\)). Let \(Z \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\). Then

$$\begin{aligned} f^{\mathcal {B}}(Z)&= f^{\mathcal {A}}(Z) \\&\ge f^{\mathcal {A}}(X) + \langle Y,Z - X\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&= f^{\mathcal {B}}(X) + \langle E_{\mathcal {B}}[Y], Z - X\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$

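Thus, \(E_{\mathcal {B}}[Y] \in \eth f^{\mathcal {B}}(X)\). Since \(E_{\mathcal {B}}[X] = X\) and hence \(f^{\mathcal {B}}(E_{\mathcal {B}}[X]) = f^{\mathcal {A}}(X)\), part (1) applies and yields \(E_{\mathcal {B}}[Y] \in \eth f^{\mathcal {A}}(X)\). \(\square \)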

Lemma 3.14

Let f be an E-convex \(\mathrm {W}^*\)-function. Let \(\mathcal {A}\in {\mathbb {W}}\) and \(X\in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) with \(f^{\mathcal {A}}(X) < +\infty \).

  1. (1)

    There exists a unique \(Y \in \eth f^{\mathcal {A}}(X)\) of minimal \(L^2\)-norm.

  2. (2)

    The Y from (1) satisfies \(Y \in L^2(\mathrm {W}^*(X))_{{{\,\mathrm{sa}\,}}}^m\).

  3. (3)

    Let \(\mathcal {B}= \mathrm {W}^*(Y)\) as described in Lemma 2.35, where Y is as in (1). Then \(f^{\mathcal {B}}(E_{\mathcal {B}}[X]) = f^{\mathcal {A}}(X)\) and \(\mathcal {B}= \mathrm {W}^*(E_{\mathcal {B}}[X])\).

Proof

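(1) Given that \(f^{\mathcal {A}}(X)\) is finite, the subdifferential \(\eth f^{\mathcal {A}}(X)\) is nonempty by Remark 3.11 (1), and it is closed and convex. Hence, it has a unique element of minimal \(L^2\)-norm.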

(2) Let \(\mathcal {C}= \mathrm {W}^*(X)\). Let \(Y' = E_{\mathcal {C}}[Y]\). We claim that \(Y' \in \eth f^{\mathcal {C}}(X)\). Let \(Z \in L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m\). Then

$$\begin{aligned} f^{\mathcal {C}}(Z)&= f^{\mathcal {A}}(Z) \\&\ge f^{\mathcal {A}}(X) + \langle Y,Z - X\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&= f^{\mathcal {C}}(X) + \langle Y', Z - X\rangle _{ L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$

Thus, \(Y' \in \eth f^{\mathcal {C}}(X)\). By Lemma 3.13 (1) (applied with \(\mathcal {C}\) in place of \(\mathcal {B}\), noting that \(E_{\mathcal {C}}[X] = X\)), \(Y' \in \eth f^{\mathcal {A}}(X)\). Since \(\Vert Y'\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \le \Vert Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\) and Y is the unique element of minimal norm in \(\eth f^{\mathcal {A}}(X)\), we conclude that \(E_{\mathcal {C}}[Y] = Y' = Y\), so \(Y \in L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m\).

(3) First, we show that \(f^{\mathcal {B}}(E_{\mathcal {B}}[X]) = f^{\mathcal {A}}(X)\). By E-convexity, \(f^{\mathcal {B}}(E_{\mathcal {B}}[X]) \le f^{\mathcal {A}}(X)\). Conversely,

$$\begin{aligned} f^{\mathcal {B}}(E_{\mathcal {B}}[X]) = f^{\mathcal {A}}(E_{\mathcal {B}}[X]) \ge f^{\mathcal {A}}(X) + \langle Y, E_{\mathcal {B}}[X] - X\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = f^{\mathcal {A}}(X). \end{aligned}$$

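Here the last equality holds because \(Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\) is orthogonal to \(E_{\mathcal {B}}[X] - X\). Thus, by Lemma 3.13 (1), \(Y \in \eth f^{\mathcal {B}}(E_{\mathcal {B}}[X])\). Now letting \(\mathcal {D}= \mathrm {W}^*(E_{\mathcal {B}}[X])\) with the trace \(\tau |_{\mathcal {D}}\), Lemma 3.13 (2) (applied in \(\mathcal {B}\) at the point \(E_{\mathcal {B}}[X]\)) implies that \(E_{\mathcal {D}}[Y] \in \eth f^{\mathcal {B}}(E_{\mathcal {B}}[X])\), hence also \(E_{\mathcal {D}}[Y] \in \eth f^{\mathcal {A}}(X)\) by Lemma 3.13 (1). Because Y is the unique element of minimal norm in \(\eth f^{\mathcal {A}}(X)\) and \(E_{\mathcal {D}}\) does not increase the norm, we have \(E_{\mathcal {D}}[Y] = Y\), and thus \(\mathcal {D}\supseteq \mathrm {W}^*(Y) = \mathcal {B}\) by the characterization of \(\mathrm {W}^*(Y)\) given in Lemma 2.35. Since also \(\mathcal {D}\subseteq \mathcal {B}\), we conclude that \(\mathcal {B}= \mathcal {D}= \mathrm {W}^*(E_{\mathcal {B}}[X])\). \(\square \)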

3.3 Legendre transforms

Definition 3.15

Let f be a tracial \(\mathrm {W}^*\)-function with values in \([-\infty ,+\infty ]\). We define its Legendre transform as the tuple \(\mathcal {L}f = (\mathcal {L}f^{\mathcal {A}})_{\mathcal {A}\in {\mathbb {W}}}\) given by

$$\begin{aligned} \mathcal {L}f^{\mathcal {A}}(X) = \sup \Big \{ \langle \iota (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {B}}(Y):\ & \mathcal {B}\in {\mathbb {W}},\ \iota : \mathcal {A}\rightarrow \mathcal {B}\text { a tracial } \mathrm {W}^*\text {-embedding}, \\ & Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \Big \} \end{aligned}$$

for \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\).

Example 3.16

Consider again \(q_t^{\mathcal {A}}(X) = (1/2t) \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2\). A standard computation with norms and inner products shows that \(\mathcal {L} q_t = q_{1/t}\).
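To spell out this computation: for any tracial \(\mathrm {W}^*\)-embedding \(\iota : \mathcal {A}\rightarrow \mathcal {B}\) and any \(Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\), completing the square and using that \(\iota \) is isometric on \(L^2\) gives

$$\begin{aligned} \langle \iota (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - q_t^{\mathcal {B}}(Y)&= \frac{t}{2}\Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - \frac{1}{2t}\Vert Y - t\iota (X)\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 \\&\le \frac{t}{2}\Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 = q_{1/t}^{\mathcal {A}}(X), \end{aligned}$$

with equality when \(\mathcal {B}= \mathcal {A}\), \(\iota \) is the identity, and \(Y = tX\). Taking the supremum over all such \(\iota \) and Y yields \(\mathcal {L}q_t = q_{1/t}\).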

Proposition 3.17

Let f be a tracial \(\mathrm {W}^*\)-function.

  1. (1)

    The Legendre transform \(\mathcal {L}f\) is an E-convex tracial \(\mathrm {W}^*\)-function.

  2. (2)

    If \(f \le g\), then \(\mathcal {L}f \ge \mathcal {L}g\).

  3. (3)

    We have \(\mathcal {L}^2 f \le f\) with equality if and only if f is E-convex.

  4. (4)

    \(\mathcal {L}^2 f\) is the maximal E-convex function that is less than or equal to f.

Proof

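(1) If f is identically equal to \(+\infty \), then \(\mathcal {L}f\) is identically \(-\infty \), and if f takes the value \(-\infty \) somewhere, then the free product argument below shows that \(\mathcal {L}f\) is identically \(+\infty \); in either case there is nothing to prove. Hence, assume that f takes values in \((-\infty ,+\infty ]\) and attains some finite value at some \(Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\) for some \(\mathcal {B}\in {\mathbb {W}}\).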

For any \(\mathcal {A}\in {\mathbb {W}}\), the free product \(\mathcal {A}* \mathcal {B}\) is isomorphic to some \(\mathcal {C}\in {\mathbb {W}}\). Let \(\iota _1: \mathcal {A}\rightarrow \mathcal {C}\) and \(\iota _2: \mathcal {B}\rightarrow \mathcal {C}\) be the corresponding tracial \(\mathrm {W}^*\)-embeddings. Then

$$\begin{aligned} \mathcal {L}f^{\mathcal {A}}(X) \ge \langle \iota _1(X),\iota _2(Y)\rangle _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {C}}(\iota _2(Y)) > -\infty , \end{aligned}$$

since \(f^{\mathcal {C}}(\iota _2(Y)) = f^{\mathcal {B}}(Y) < +\infty \). Hence, \(\mathcal {L} f\) is never equal to \(-\infty \).

For each \(\mathcal {A}\), the function \(\mathcal {L} f^{\mathcal {A}}\) is a supremum of continuous affine functions of X, and therefore it is convex and lower semi-continuous.

Let \(\iota : \mathcal {A}\rightarrow \mathcal {B}\) be a tracial \(\mathrm {W}^*\)-embedding and let \(E: \mathcal {B}\rightarrow \mathcal {A}\) be the corresponding trace-preserving conditional expectation. Let \(X \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\). Let \({\tilde{\iota }}: \mathcal {A}\rightarrow {\tilde{\mathcal {B}}}\) be an arbitrary tracial \(\mathrm {W}^*\)-embedding with \({\tilde{\mathcal {B}}} \in {\mathbb {W}}\). Let \(\mathcal {M}\in {\mathbb {W}}\) be isomorphic to the amalgamated free product of \(\mathcal {B}\) and \({\tilde{\mathcal {B}}}\) over the subalgebra \(\mathcal {A}\) (or more precisely, over the images \(\iota (\mathcal {A}) \subseteq \mathcal {B}\) and \({\tilde{\iota }}(\mathcal {A}) \subseteq {\tilde{\mathcal {B}}}\)) as in Proposition 2.22. Let \(\rho : \mathcal {B}\rightarrow \mathcal {M}\) and \({\tilde{\rho }}: {\tilde{\mathcal {B}}} \rightarrow \mathcal {M}\) be the corresponding embeddings, so that \(\rho \circ \iota = {\tilde{\rho }} \circ {\tilde{\iota }}\). Then for \(Y \in L^2({\tilde{\mathcal {B}}})_{{{\,\mathrm{sa}\,}}}^m\),

$$\begin{aligned} \mathcal {L}f^{\mathcal {B}}(X)&\ge \langle \rho (X), {\tilde{\rho }}(Y)\rangle _{L^2(\mathcal {M})_{{{\,\mathrm{sa}\,}}}^m}- f^{\mathcal {M}}({\tilde{\rho }}(Y)) \\&= \langle {\tilde{\iota }} \circ E(X), Y\rangle _{L^2({\tilde{\mathcal {B}}})_{{{\,\mathrm{sa}\,}}}^m} - f^{{\tilde{\mathcal {B}}}}(Y), \end{aligned}$$

where we have used free independence with amalgamation to compute the inner product, and we have used the fact that f is a tracial \(\mathrm {W}^*\)-function. Because \({\tilde{\iota }}: \mathcal {A}\rightarrow {\tilde{\mathcal {B}}}\) and Y were arbitrary, we have

$$\begin{aligned} \mathcal {L}f^{\mathcal {B}}(X) \ge \mathcal {L} f^{\mathcal {A}}(E(X)), \end{aligned}$$

which establishes condition (2) in the definition of E-convexity.

It only remains to show that \(\mathcal {L}f\) is a tracial \(\mathrm {W}^*\)-function. Suppose \(\iota : \mathcal {A}\rightarrow \mathcal {B}\) is a tracial \(\mathrm {W}^*\)-inclusion. If \(\iota ': \mathcal {B}\rightarrow \mathcal {C}\) is a tracial \(\mathrm {W}^*\)-inclusion, then so is \(\iota ' \circ \iota \), which implies that

$$\begin{aligned} \mathcal {L} f^{\mathcal {A}}(X) \ge \sup _{\begin{array}{c} \iota ': \mathcal {B}\rightarrow \mathcal {C}\\ Y \in L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m \end{array}} \langle \iota ' \circ \iota (X),Y\rangle _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {C}}(Y) = \mathcal {L} f^{\mathcal {B}}(\iota (X)). \end{aligned}$$

If \(E: \mathcal {B}\rightarrow \mathcal {A}\) is the conditional expectation corresponding to \(\iota \), then by the preceding argument

$$\begin{aligned} \mathcal {L} f^{\mathcal {A}}(X) = \mathcal {L} f^{\mathcal {A}}(E \circ \iota (X)) \le \mathcal {L} f^{\mathcal {B}}(\iota (X)). \end{aligned}$$

Thus, \(\mathcal {L} f^{\mathcal {A}} = \mathcal {L} f^{\mathcal {B}} \circ \iota \), so \(\mathcal {L} f\) is a tracial \(\mathrm {W}^*\)-function.

(2) This is immediate from the definition and the properties of suprema and infima.

(3) By definition of \(\mathcal {L}f\), for every \(\mathcal {A}\in {\mathbb {W}}\) and X, \(Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), we have

$$\begin{aligned} \mathcal {L}f^{\mathcal {A}}(X) \ge \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {A}}(Y), \end{aligned}$$

hence

$$\begin{aligned} \mathcal {L}f^{\mathcal {A}}(X) + f^{\mathcal {A}}(Y) \ge \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$

Hence, given an inclusion \(\iota \) of \(\mathcal {A}\) into \(\mathcal {B}\) and \(Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) and \(X \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\), we have

$$\begin{aligned} f^{\mathcal {A}}(Y) = f^{\mathcal {B}}(\iota (Y)) \ge \langle \iota (Y),X\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - \mathcal {L}f^{\mathcal {B}}(X). \end{aligned}$$

Taking the supremum on the right-hand side, \(f^{\mathcal {A}}(Y) \ge \mathcal {L}^2 f^{\mathcal {A}}(Y)\). Thus, \(f \ge \mathcal {L}^2 f\).

Now suppose that f is E-convex; we must show that \(f = \mathcal {L}^2 f\). If f is identically \(-\infty \) or \(+\infty \), there is nothing to prove. Otherwise, fix \(\mathcal {A}\). Because \(f^{\mathcal {A}}\) is convex and lower semi-continuous, classical results about convex functions (the Fenchel–Moreau theorem) tell us that \(f^{\mathcal {A}}\) can be expressed as the supremum of a family of affine functions \((g_\alpha )_{\alpha \in I}\), where

$$\begin{aligned} g_\alpha (X) = \langle X,Z_\alpha \rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} + c_\alpha \end{aligned}$$

with \(Z_\alpha \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) and \(c_\alpha \in {\mathbb {R}}\). Let \(\iota : \mathcal {A}\rightarrow \mathcal {B}\) be an inclusion and let \(E: \mathcal {B}\rightarrow \mathcal {A}\) be the corresponding conditional expectation. If \(Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\), then by the E-convexity property

$$\begin{aligned} \langle \iota (Z_\alpha ),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {B}}(Y)\le & {} \langle \iota (Z_\alpha ),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {A}}(E(Y)) \\\le & {} \langle Z_\alpha ,E(Y)\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} - \langle Z_\alpha ,E(Y)\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} - c_\alpha = -c_\alpha . \end{aligned}$$

Therefore, \(\mathcal {L} f^{\mathcal {A}}(Z_\alpha ) \le -c_\alpha \), which implies that

$$\begin{aligned} \mathcal {L}^2 f^{\mathcal {A}}(X) \ge \langle X,Z_\alpha \rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} - \mathcal {L} f^{\mathcal {A}}(Z_\alpha ) \ge \langle X,Z_\alpha \rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} + c_\alpha = g_\alpha (X). \end{aligned}$$

Therefore,

$$\begin{aligned} \mathcal {L}^2 f^{\mathcal {A}}(X) \ge \sup _{\alpha \in I} g_\alpha (X) = f^{\mathcal {A}}(X). \end{aligned}$$

So \(\mathcal {L}^2 f = f\) as desired. Conversely, if \(f = \mathcal {L}^2 f\), then f is E-convex because it is the Legendre transform of some function.

(4) We already showed that \(\mathcal {L}^2 f\) is E-convex and \(\mathcal {L}^2 f \le f\). Moreover, suppose g is E-convex and \(g \le f\). Then \(\mathcal {L} g \ge \mathcal {L} f\) and hence \(\mathcal {L}^2 g \le \mathcal {L}^2 f\) by (2). Meanwhile, \(g = \mathcal {L}^2 g\) by (3), and therefore \(g = \mathcal {L}^2 g \le \mathcal {L}^2 f\). \(\square \)
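In the classical setting of a single finite-dimensional space, with no embeddings to range over, parts (3) and (4) reduce to the Fenchel–Moreau theorem and to the fact that the biconjugate is the largest lower semi-continuous convex minorant. The following Python sketch is only an illustration under these simplifying assumptions: functions are sampled on a one-dimensional grid on [−2, 2], and the helper names `legendre` and `biconjugate` are introduced here for illustration. Applying a discrete \(\mathcal {L}\) twice recovers \(x^2/2\) at the grid points (up to rounding), while for the non-convex double well \((x^2-1)^2\) it returns the convex envelope, which vanishes on [−1, 1].

```python
# Classical one-dimensional sketch (an illustrative assumption, not the paper's setting):
# functions are sampled on a finite grid and L is the discrete Legendre transform.

def legendre(values, xs, ys):
    """Discrete Legendre transform: y -> max_i (xs[i] * y - values[i]) for each y in ys."""
    return [max(x * y - v for x, v in zip(xs, values)) for y in ys]

# Grid on [-2, 2] with spacing 0.01; it serves both as points x and as slopes y.
grid = [(i - 200) / 100 for i in range(401)]

def biconjugate(f):
    """Return the samples of f and of its discrete double transform L^2 f on the grid."""
    values = [f(x) for x in grid]
    once = legendre(values, grid, grid)   # L f
    twice = legendre(once, grid, grid)    # L^2 f
    return values, twice

# Convex case: L^2 f agrees with f on the grid up to rounding (Fenchel-Moreau).
quad_vals, quad_bi = biconjugate(lambda x: x * x / 2)
print("max |f - L^2 f| for x^2/2:", max(abs(a - b) for a, b in zip(quad_vals, quad_bi)))

# Non-convex double well: L^2 f <= f is its convex envelope, which vanishes on [-1, 1].
well_vals, well_bi = biconjugate(lambda x: (x * x - 1) ** 2)
print("f(0) =", well_vals[200], " L^2 f(0) =", well_bi[200])
```

The grid is chosen so that every point is also an admissible slope; with a coarser slope grid the discrete biconjugate would only approximate f.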

Remark 3.18

It follows from E-convexity that for every \(\iota : \mathcal {A}\rightarrow \mathcal {B}\) and \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) and \(Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\), we have

$$\begin{aligned} \langle \iota (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {B}}(Y) \le \langle X,E[Y]\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {A}}(E[Y]), \end{aligned}$$

where \(E: \mathcal {B}\rightarrow \mathcal {A}\) is the conditional expectation corresponding to \(\iota \). Therefore,

$$\begin{aligned} \mathcal {L} f^{\mathcal {A}}(X) = \sup _{Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \left( \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {A}}(Y) \right) . \end{aligned}$$

Hence, if f is E-convex, there is no need to consider a larger \(\mathrm {W}^*\)-algebra when computing the Legendre transform, and moreover \(\mathcal {L} f^{\mathcal {A}}\) agrees with the classical Legendre transform of \(f^{\mathcal {A}}\) as a function on the real Hilbert space \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\).

Remark 3.19

In fact, the argument of Proposition 3.17 can be used to prove slightly stronger statements:

  • If \((f^{\mathcal {A}})_{\mathcal {A}\in {\mathbb {W}}}\) is any collection of functions, then \(\mathcal {L}f\) is convex, and for any tracial \(\mathrm {W}^*\)-embedding \(\iota : \mathcal {A}\rightarrow \mathcal {B}\), it satisfies \(\mathcal {L}f^{\mathcal {A}} \ge \mathcal {L}f^{\mathcal {B}} \circ \iota \).

  • If \((f^{\mathcal {A}})_{\mathcal {A}\in {\mathbb {W}}}\) satisfies \(f^{\mathcal {A}} \ge f^{\mathcal {B}} \circ \iota \) for every embedding \(\iota : \mathcal {A}\rightarrow \mathcal {B}\), then \(\mathcal {L}f\) is E-convex.

  • In particular, if \((f^{\mathcal {A}})_{\mathcal {A}\in {\mathbb {W}}}\) is any collection of functions, then \(\mathcal {L}^2 f\) is E-convex.

The next lemma states the relationship between Legendre transforms and subgradients, which is exactly analogous to the behavior of classical Legendre transforms. We will use this lemma many times.

Lemma 3.20

Let f be an E-convex \(\mathrm {W}^*\)-function, let \(\mathcal {A}\in {\mathbb {W}}\) and \(X, Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\). Then the following are equivalent:

  (1) \(f^{\mathcal {A}}(X) + \mathcal {L}f^{\mathcal {A}}(Y) = \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\).

  (2) \(Y \in \eth f^{\mathcal {A}}(X)\).

  (3) \(X \in \eth \mathcal {L}f^{\mathcal {A}}(Y)\).

Proof

(1) \(\implies \) (2). Suppose that \(f^{\mathcal {A}}(X) + \mathcal {L}f^{\mathcal {A}}(Y) = \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\). By definition of \(\mathcal {L}f\), we have for all \(X' \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) that

$$\begin{aligned} \langle X',Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {A}}(X') \le \mathcal {L}f^{\mathcal {A}}(Y) = \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {A}}(X), \end{aligned}$$

hence, \(f^{\mathcal {A}}(X') \ge f^{\mathcal {A}}(X) + \langle X'-X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\), so \(Y \in \eth f^{\mathcal {A}}(X)\).

(2) \(\implies \) (1). Suppose \(Y \in \eth f^{\mathcal {A}}(X)\). Let \(\iota : \mathcal {A}\rightarrow \mathcal {B}\) be a tracial \(\mathrm {W}^*\)-inclusion and \(E: \mathcal {B}\rightarrow \mathcal {A}\) the corresponding conditional expectation. Since \(E[\iota (X)] = X\), Lemma 3.13 (1) tells us that

$$\begin{aligned} \iota (\eth f^{\mathcal {A}}(X)) = \eth f^{\iota (\mathcal {A})}(\iota (X)) = L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m \cap \eth f^{\mathcal {B}}(\iota (X)), \end{aligned}$$

so in particular, \(\iota (Y) \in \eth f^{\mathcal {B}}(\iota (X))\). Hence, for any \(Z \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\), we have

$$\begin{aligned} \langle Z,\iota (Y)\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {B}}(Z)\le & {} \langle \iota (X),\iota (Y)\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {B}}(\iota (X)) \\= & {} \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {A}}(X). \end{aligned}$$

Since \(\iota \), \(\mathcal {B}\), and Z were arbitrary, the supremum defining \(\mathcal {L}f^{\mathcal {A}}(Y)\) is attained at the point X, so that \(f^{\mathcal {A}}(X) + \mathcal {L}f^{\mathcal {A}}(Y) = \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\).

Therefore, we have proved that (1) \(\iff \) (2). Because f is E-convex, we have \(\mathcal {L}(\mathcal {L}f) = f\). Therefore, (1) \(\iff \) (3) follows from (1) \(\iff \) (2) by switching the roles of f and \(\mathcal {L}f\) and the roles of X and Y. \(\square \)
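For a concrete one-dimensional instance of Lemma 3.20, take \(f(x) = |x|\) on \({\mathbb {R}}\) (an illustrative choice, not taken from the text), whose classical Legendre transform is 0 on [−1, 1] and \(+\infty \) elsewhere. The Python sketch below checks, on a small hypothetical set of sample points, that the Fenchel–Young equality (1), the subgradient condition (2), and the dual subgradient condition (3) single out exactly the same pairs (x, y). The sample values are chosen so that the floating-point equality test is exact.

```python
import math

def f(x):
    """f(x) = |x|, a convex lower semi-continuous function on R."""
    return abs(x)

def legendre_f(y):
    """Classical Legendre transform of |x|: 0 on [-1, 1] and +inf outside."""
    return 0.0 if abs(y) <= 1 else math.inf

def y_in_subgradient_of_f(x, y):
    """Condition (2): the subdifferential of |x| is {sign(x)} if x != 0 and [-1, 1] if x == 0."""
    return abs(y) <= 1 if x == 0 else y == math.copysign(1.0, x)

def x_in_subgradient_of_legendre_f(x, y):
    """Condition (3): the subdifferential of the indicator of [-1, 1] at y is its normal cone."""
    if abs(y) < 1:
        return x == 0
    if y == 1:
        return x >= 0
    if y == -1:
        return x <= 0
    return False  # empty subdifferential where the transform is +inf

def fenchel_young_equality(x, y):
    """Condition (1): f(x) + Lf(y) = x * y."""
    return f(x) + legendre_f(y) == x * y

pairs = [(x, y) for x in (-2.0, -0.5, 0.0, 0.5, 2.0)
         for y in (-1.5, -1.0, -0.3, 0.0, 0.3, 1.0, 1.5)]
for x, y in pairs:
    # the three conditions agree on every sample pair
    assert fenchel_young_equality(x, y) == y_in_subgradient_of_f(x, y) == x_in_subgradient_of_legendre_f(x, y)
print("pairs with equality:", [p for p in pairs if fenchel_young_equality(*p)])
```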

3.4 A non-commutative Monge–Kantorovich duality

Definition 3.21

If f is a tracial \(\mathrm {W}^*\)-function and \(\mu \in \Sigma _m\), then we define \(\mu (f) = f^{\mathcal {A}}(X)\), where \(\mathcal {A}\in {\mathbb {W}}\) is (isomorphic to) the GNS representation of \(\mu \) and X is the canonical generating m-tuple.

If f is a tracial \(\mathrm {W}^*\)-function, for every \(\mathcal {A}\) and every \(X \in \mathcal {A}_{{{\,\mathrm{sa}\,}}}^m\) with \(\lambda _{X} = \mu \), we have \(\mu (f) = f^{\mathcal {A}}(X)\). This follows by the definition of tracial \(\mathrm {W}^*\)-function and the fact that \(\mathrm {W}^*(X)\) is isomorphic to the GNS representation of \(\mu \).

Definition 3.22

Let us call a pair (f, g) of tracial \(\mathrm {W}^*\)-functions admissible if they take values in \((-\infty ,\infty ]\) and for every \(\mathcal {A}\in {\mathbb {W}}\),

$$\begin{aligned} f^{\mathcal {A}}(X) + g^{\mathcal {A}}(Y) \ge \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \text { for all } X, Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m. \end{aligned}$$

Proposition 3.23

Let \(\mu \), \(\nu \in \Sigma _m\). The following quantities are equal:

  (1) \(C(\mu ,\nu )\).

  (2) \(\inf \{\mu (f) + \nu (g): (f,g) \text { admissible}\}\).

  (3) \(\inf \{\mu (f) + \nu (\mathcal {L}f): f \text { a tracial } \mathrm {W}^*\text {-function not identically } \infty \}\).

  (4) \(\inf \{\mu (f) + \nu (g): (f,g) \text { admissible and } E\text {-convex} \}\).

  (5) \(\inf \{\mu (f) + \nu (\mathcal {L}f): f ~E\text {-convex not identically } \infty \}\).

Here all the functions under consideration take values in \((-\infty ,\infty ]\).

Proof

(1) \(\le \) (2). Let \((\mathcal {A},X,Y)\) be a coupling of \(\mu \) and \(\nu \), and let (f, g) be an admissible pair. Then

$$\begin{aligned} \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \le f^{\mathcal {A}}(X) + g^{\mathcal {A}}(Y) = \mu (f) + \nu (g). \end{aligned}$$

Taking the supremum over couplings on the left-hand side and the infimum over admissible pairs (f, g) on the right-hand side, we have (1) \(\le \) (2).

(2) \(\le \) (3). It is clear from the definition of \(\mathcal {L}f\) that \(f^{\mathcal {A}}(X) + \mathcal {L}f^{\mathcal {A}}(Y) \ge \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\). Therefore, \((f,\mathcal {L}f)\) is always an admissible pair, and hence (3) is the infimum over a smaller set than (2).

(3) \(\le \) (1). Define

$$\begin{aligned} f^{\mathcal {A}}(X) = {\left\{ \begin{array}{ll} 0, &{} \text {if } X \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m \text { and } \lambda _{X} = \mu , \\ +\infty , &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

Note that f is a tracial \(\mathrm {W}^*\)-function, since tracial \(\mathrm {W}^*\)-embeddings preserve both boundedness and the non-commutative law. Let \(\mathcal {A}\) be the GNS representation of \(\nu \) with the canonical generators Y. Then \(\mathcal {L}f^{\mathcal {A}}(Y)\) is the supremum of \(\langle \iota (Y),X\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}\), where \(\iota : \mathcal {A}\rightarrow \mathcal {B}\) is an inclusion and \(X \in L^\infty (\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\) satisfies \(\lambda _{X} = \mu \). Each such pair \((\iota ,X)\) yields a coupling \((\mathcal {B},X,\iota (Y))\) of \(\mu \) and \(\nu \), and conversely every coupling \((\mathcal {B},X,Y')\) of \(\mu \) and \(\nu \) arises in this way, because \(\mathrm {W}^*(Y')\) is isomorphic to \(\mathcal {A}\) via an isomorphism sending Y to \(Y'\). Therefore, \(\nu (\mathcal {L}f) = \mathcal {L}f^{\mathcal {A}}(Y) = C(\mu ,\nu )\). Moreover, \(\mu (f) = 0\), and hence \(C(\mu ,\nu ) = \mu (f) + \nu (\mathcal {L}f)\), which gives (3) \(\le \) (1).

(2) \(\le \) (4). This is immediate since (4) is the infimum over a smaller set.

(4) \(\le \) (5). Suppose that f is E-convex. Then \((f,\mathcal {L}f)\) is admissible as noted above. Also, \(\mathcal {L}f\) is always E-convex, so (5) is the infimum over a smaller set than (4).

(5) \(\le \) (3). Let f be a tracial \(\mathrm {W}^*\)-function. Then \(\mathcal {L}^2 f \le f\) and \((\mathcal {L}^2 f, \mathcal {L} f)\) is an E-convex admissible pair. Therefore,

$$\begin{aligned} \mu (f) + \nu (\mathcal {L}f) \ge \mu (\mathcal {L}^2f) + \nu (\mathcal {L}f). \end{aligned}$$

Of course, since \(\mathcal {L}(\mathcal {L}^2f) = \mathcal {L}^2 (\mathcal {L}f) = \mathcal {L}f\), the term on the right-hand side participates in the infimum (5). Since the f on the left-hand side was chosen arbitrarily, (3) \(\ge \) (5). \(\square \)
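In the classical commutative case m = 1, with \(\mu \) and \(\nu \) uniform on n real atoms, the equality of (1) and (4) can be checked by hand. By the Birkhoff–von Neumann theorem, the supremum over couplings of two such uniform measures is attained at a permutation, and by the rearrangement inequality it is attained at the monotone (sorted) pairing. A Rockafellar-type construction then gives a convex potential \(\varphi \) on the support of \(\mu \) with the k-th smallest atom of \(\nu \) as a subgradient at the k-th smallest atom of \(\mu \), and the pair \((\varphi ,\mathcal {L}\varphi )\) restricted to the supports is admissible with dual value equal to \(C(\mu ,\nu )\). The Python sketch below verifies this on hypothetical data; it is a classical illustration, not the construction used in the proof above, and the function names are introduced here only for illustration.

```python
from itertools import permutations

def optimal_correlation_bruteforce(xs, ys):
    """C(mu, nu) for uniform atoms: max over permutations s of (1/n) sum_i xs[i] * ys[s(i)]."""
    n = len(xs)
    return max(sum(x * ys[j] for x, j in zip(xs, perm)) for perm in permutations(range(n))) / n

def monotone_dual_pair(xs, ys):
    """Convex potential phi on supp(mu), with y_(k) a subgradient at x_(k),
    and its Legendre transform phi_star evaluated on supp(nu)."""
    x_sorted, y_sorted = sorted(xs), sorted(ys)
    phi = [0.0]
    for k in range(len(x_sorted) - 1):
        phi.append(phi[-1] + y_sorted[k] * (x_sorted[k + 1] - x_sorted[k]))
    phi_star = [max(x * y - p for x, p in zip(x_sorted, phi)) for y in y_sorted]
    return x_sorted, y_sorted, phi, phi_star

xs = [0.3, -1.2, 2.0, 0.7, -0.4]   # hypothetical atoms of mu
ys = [1.1, -0.5, 0.9, -2.0, 0.0]   # hypothetical atoms of nu
n = len(xs)

primal = optimal_correlation_bruteforce(xs, ys)
x_s, y_s, phi, phi_star = monotone_dual_pair(xs, ys)
monotone = sum(x * y for x, y in zip(x_s, y_s)) / n   # sorted (monotone) coupling
dual = (sum(phi) + sum(phi_star)) / n                 # mu(phi) + nu(phi_star)

# Admissibility on the supports: phi(x) + phi_star(y) >= x * y.
assert all(p + q >= x * y - 1e-12 for x, p in zip(x_s, phi) for y, q in zip(y_s, phi_star))
print(primal, monotone, dual)  # the three values coincide up to rounding
```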

Proposition 3.24

Let \((\mathcal {A},X,Y)\) be a coupling of \(\mu , \nu \in \Sigma _m\). The following are equivalent:

  (1) The coupling is optimal.

  (2) There exists an admissible pair (f, g) such that \(\langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = f^{\mathcal {A}}(X) + g^{\mathcal {A}}(Y)\).

  (3) There exists a tracial \(\mathrm {W}^*\)-function f such that \(\langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = f^{\mathcal {A}}(X) + \mathcal {L}f^{\mathcal {A}}(Y)\).

  (4) There exists an admissible, E-convex pair (f, g) such that \(\langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = f^{\mathcal {A}}(X) + g^{\mathcal {A}}(Y)\).

  (5) There exists an E-convex tracial \(\mathrm {W}^*\)-function f such that \(\langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = f^{\mathcal {A}}(X) + \mathcal {L}f^{\mathcal {A}}(Y)\).

  (6) There exists an E-convex tracial \(\mathrm {W}^*\)-function f such that Y is a subgradient vector to \(f^{\mathcal {A}}\) at the point X.

Proof

It is immediate from the previous proposition that each of the conditions (2) – (5) implies (1).

For the converse implication, assume the coupling is optimal. Let

$$\begin{aligned} f^{\mathcal {B}}(Z) = {\left\{ \begin{array}{ll} 0, &{} \text {if } Z \in L^\infty (\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \text { and } \lambda _{Z} = \mu , \\ +\infty , &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

As in the proof of the previous proposition, we have \(\mu (f) + \nu (\mathcal {L}f) = C(\mu ,\nu )\); since the coupling is optimal and \(\lambda _X = \mu \), \(\lambda _Y = \nu \), this is equivalent to \(\langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = f^{\mathcal {A}}(X) + \mathcal {L}f^{\mathcal {A}}(Y)\). Moreover, since \((\mathcal {L}^2f, \mathcal {L}f)\) is admissible and \(\mathcal {L}^2 f \le f\), we have \(C(\mu ,\nu ) \le \mu (\mathcal {L}^2f) + \nu (\mathcal {L}f) \le \mu (f) + \nu (\mathcal {L}f) = C(\mu ,\nu )\), so equality holds throughout. Thus, the pair \((\mathcal {L}^2f, \mathcal {L}f)\) fulfills all of the criteria of (2) – (5).

The equivalence of (5) and (6) follows from Lemma 3.20. \(\square \)

3.5 A decomposition result for optimal couplings

As an initial application of duality, we present the following result that expresses an optimal coupling (X, Y) in terms of another optimal coupling \((X',Y')\) with \(\mathcal {B}= \mathrm {W}^*(X') = \mathrm {W}^*(Y')\).

Theorem 3.25

Let \(\mu \), \(\nu \in \Sigma _{m,R}\), and let \((\mathcal {A},X,Y)\) be an optimal coupling of \(\mu \) and \(\nu \). Then there exists a \(\mathrm {W}^*\)-subalgebra \(\mathcal {B}\subseteq \mathcal {A}\) with the following properties, letting \(X' = E_{\mathcal {B}}[X]\) and \(Y' = E_{\mathcal {B}}[Y]\):

  (1) \(\mathcal {B}= \mathrm {W}^*(X') = \mathrm {W}^*(Y')\).

  (2) \(X - X'\), \(X' - Y'\), and \(Y' - Y\) are pairwise orthogonal.

  (3) \((\mathcal {A},X',Y')\) is an optimal coupling of \(\lambda _{X'}\) and \(\lambda _{Y'}\). Similarly, \((\mathcal {A},X,Y')\) and \((\mathcal {A},X',Y)\) are optimal couplings of the respective laws.

We may choose \(\mathcal {B}\) to be contained in \(\mathrm {W}^*(X)\) (or symmetrically, we may choose it to be contained in \(\mathrm {W}^*(Y)\)).

Furthermore, there exists some optimal coupling \((\mathcal {A},X,Y)\) and a \(\mathcal {B}\) satisfying (1) – (3) with respect to this coupling such that \(\mathrm {W}^*(X,\mathcal {B})\) and \(\mathrm {W}^*(Y,\mathcal {B})\) are freely independent with amalgamation over \(\mathcal {B}\).

Proof

Let

$$\begin{aligned} \mathscr {B} = \{\mathrm {W}^*\text {-subalgebras } \mathcal {B}\subseteq \mathcal {A}: \langle E_{\mathcal {B}}[X], E_{\mathcal {B}}[Y]\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \}, \end{aligned}$$

which is partially ordered by inclusion. We claim that \(\mathscr {B}\) has a minimal element, and we will prove this by a transfinite reverse martingale argument. By Zorn’s lemma, it suffices to show that every chain in \(\mathscr {B}\) has a lower bound. Consider a chain \(\mathscr {C} \subseteq \mathscr {B}\), and let \(\mathcal {C}= \bigcap _{\mathcal {B}\in \mathscr {C}} \mathcal {B}\). We claim that \(\lim _{\mathcal {B}\in \mathscr {C}} E_{\mathcal {B}}[X] = E_{\mathcal {C}}[X]\) in \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\). Let

$$\begin{aligned} \delta = \inf _{\mathcal {B}\in \mathscr {C}} \Vert E_{\mathcal {B}}[X]\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$

Given \(\epsilon > 0\), there exists \(\mathcal {B}_0 \in \mathscr {C}\) such that \(\Vert E_{\mathcal {B}_0}[X]\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 < \delta ^2 + \epsilon ^2\). Then for all \(\mathcal {B}\in \mathscr {C}\) with \(\mathcal {B}\subseteq \mathcal {B}_0\), we have

$$\begin{aligned} \Vert E_{\mathcal {B}}[X] - E_{\mathcal {B}_0}[X]\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2= & {} \Vert E_{\mathcal {B}_0}[X]\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - \Vert E_{\mathcal {B}}[X]\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 \\\le & {} \delta ^2 + \epsilon ^2 - \delta ^2 = \epsilon ^2. \end{aligned}$$

This implies that \(Z = \lim _{\mathcal {B}\in \mathscr {C}} E_{\mathcal {B}}[X]\) exists in \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\). Moreover, \(\Vert Z_j\Vert _{L^\infty (\mathcal {A})} \le \Vert X_j\Vert _{L^\infty (\mathcal {A})}\). Clearly \(Z_j \in \bigcap _{\mathcal {B}\in \mathscr {C}} \mathcal {B}= \mathcal {C}\), and \(\langle Z,W\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \langle X,W\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\) for all \(W \in L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m\). Thus, \(\lim _{\mathcal {B}\in \mathscr {C}} E_{\mathcal {B}}[X] = E_{\mathcal {C}}[X]\) in \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\). By the same token \(\lim _{\mathcal {B}\in \mathscr {C}} E_{\mathcal {B}}[Y] = E_{\mathcal {C}}[Y]\) in \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\). Therefore,

$$\begin{aligned} \langle E_{\mathcal {C}}[X], E_{\mathcal {C}}[Y]\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \lim _{\mathcal {B}\in \mathscr {C}} \langle E_{\mathcal {B}}[X], E_{\mathcal {B}}[Y]\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$

Therefore, \(\mathcal {C}\in \mathscr {B}\) as desired.
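The key identity in the Cauchy argument above is the Pythagorean relation \(\Vert E_{\mathcal {B}}[X] - E_{\mathcal {B}_0}[X]\Vert ^2 = \Vert E_{\mathcal {B}_0}[X]\Vert ^2 - \Vert E_{\mathcal {B}}[X]\Vert ^2\) for \(\mathcal {B}\subseteq \mathcal {B}_0\), which holds because \(E_{\mathcal {B}} = E_{\mathcal {B}} \circ E_{\mathcal {B}_0}\) and \(E_{\mathcal {B}}\) is an orthogonal projection. As a classical sanity check only, the Python sketch below uses the uniform probability on eight points, a decreasing chain of \(\sigma \)-algebras generated by coarser and coarser partitions, and hypothetical values of X. The chain here is a finite sequence, whereas the proof handles arbitrary chains.

```python
def cond_exp(X, blocks):
    """Conditional expectation onto the sigma-algebra generated by a partition of
    {0, ..., n-1} under the uniform probability: average X over each block."""
    out = [0.0] * len(X)
    for block in blocks:
        mean = sum(X[i] for i in block) / len(block)
        for i in block:
            out[i] = mean
    return out

def sq_norm(V):
    """Squared L^2 norm with respect to the uniform probability on {0, ..., n-1}."""
    return sum(v * v for v in V) / len(V)

# Decreasing chain of sigma-algebras (finer to coarser partitions); hypothetical data.
chain = [
    [[i] for i in range(8)],
    [[0, 1], [2, 3], [4, 5], [6, 7]],
    [[0, 1, 2, 3], [4, 5, 6, 7]],
    [list(range(8))],
]
X = [3.0, -1.0, 4.0, 1.0, -5.0, 9.0, 2.0, -6.0]

projections = [cond_exp(X, blocks) for blocks in chain]
norms = [sq_norm(P) for P in projections]
print("squared norms along the chain:", norms)

# For B = chain[b] contained in B0 = chain[a] (b >= a):
# ||E_B X - E_B0 X||^2 == ||E_B0 X||^2 - ||E_B X||^2
for a in range(len(chain)):
    for b in range(a, len(chain)):
        diff = [p - q for p, q in zip(projections[b], projections[a])]
        assert abs(sq_norm(diff) - (norms[a] - norms[b])) < 1e-12
```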

So by Zorn’s lemma, \(\mathscr {B}\) has some minimal element, which we will call \(\mathcal {B}\). Let \(X' = E_{\mathcal {B}}[X]\) and \(Y' = E_{\mathcal {B}}[Y]\). Now \(\mathrm {W}^*(X') \subseteq \mathcal {B}\) and we have

$$\begin{aligned} \langle X', E_{\mathrm {W}^*(X')}[Y']\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \langle X',Y'\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$

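Since \(\mathrm {W}^*(X') \subseteq \mathcal {B}\), we have \(E_{\mathrm {W}^*(X')}[X] = X'\) and \(E_{\mathrm {W}^*(X')}[Y] = E_{\mathrm {W}^*(X')}[Y']\), so the left-hand side equals \(\langle E_{\mathrm {W}^*(X')}[X], E_{\mathrm {W}^*(X')}[Y]\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\), while the right-hand side equals \(\langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\) because \(\mathcal {B}\in \mathscr {B}\). Thus \(\mathrm {W}^*(X') \in \mathscr {B}\), and by minimality of \(\mathcal {B}\), we have \(\mathcal {B}= \mathrm {W}^*(X')\). Similarly, \(\mathcal {B}= \mathrm {W}^*(Y')\). Hence, (1) holds.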

To show that \(\mathcal {B}\) can be chosen inside \(\mathrm {W}^*(X)\), note that

$$\begin{aligned} \langle E_{\mathrm {W}^*(X)}[X], E_{\mathrm {W}^*(X)}[Y]\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \langle X, E_{\mathrm {W}^*(X)}[Y]\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$

Thus, we can apply the same argument with \(\mathscr {B}\) replaced by the elements of \(\mathscr {B}\) contained inside \(\mathrm {W}^*(X)\).

To prove (2), since \(X - X' = X - E_{\mathcal {B}}[X]\) is orthogonal to \(\mathcal {B}\), it is immediate that \(X - X'\) and \(X' - Y'\) are orthogonal. Similarly, \(Y' - Y\) and \(X' - Y'\) are orthogonal. Finally, to show that \(X - X'\) and \(Y' - Y\) are orthogonal, note that

$$\begin{aligned} \langle X - X', Y' - Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}= & {} \langle X, Y'\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} + \langle X',Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \nonumber \\&- \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} - \langle X',Y'\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$
(3.1)

Observe that

$$\begin{aligned} \langle X, Y'\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \langle X,E_{\mathcal {B}}[Y]\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \langle E_{\mathcal {B}}[X],E_{\mathcal {B}}[Y]\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \langle X',Y'\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$

Similarly, \(\langle X',Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \langle X',Y'\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\). Moreover, \(\langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \langle X',Y'\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\) by our choice of \(\mathcal {B}\). Thus, all the terms in (3.1) cancel, and \(X - X'\) and \(Y' - Y\) are orthogonal.
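The proof of (2) uses only that \(E_{\mathcal {B}}\) is an orthogonal projection and that \(\langle E_{\mathcal {B}}[X], E_{\mathcal {B}}[Y]\rangle = \langle X,Y\rangle \). A small classical sketch confirms the pairwise orthogonality and the resulting three-term decomposition of \(\Vert X - Y\Vert ^2\), which reappears in the amalgamated free product computation below. All choices here are illustrative assumptions: the uniform probability on four points, \(E_{\mathcal {B}}\) given by averaging over the blocks {0, 1} and {2, 3}, and hand-picked data X, Y satisfying the condition above.

```python
def cond_exp(V, blocks):
    """Average V over each block: the conditional expectation onto the partition algebra."""
    out = [0.0] * len(V)
    for block in blocks:
        mean = sum(V[i] for i in block) / len(block)
        for i in block:
            out[i] = mean
    return out

def inner(U, V):
    """<U, V> for the uniform probability measure on {0, ..., n-1}."""
    return sum(u * v for u, v in zip(U, V)) / len(U)

def sub(U, V):
    return [u - v for u, v in zip(U, V)]

blocks = [[0, 1], [2, 3]]
X = [1.0, 3.0, 5.0, 5.0]   # hypothetical data
Y = [0.0, 0.0, 3.0, 1.0]   # hypothetical data
Xp, Yp = cond_exp(X, blocks), cond_exp(Y, blocks)   # X' and Y'

# The condition defining the family of subalgebras in the proof.
assert abs(inner(Xp, Yp) - inner(X, Y)) < 1e-12

pieces = [sub(X, Xp), sub(Xp, Yp), sub(Yp, Y)]      # X - X', X' - Y', Y' - Y
for i in range(3):
    for j in range(i + 1, 3):
        assert abs(inner(pieces[i], pieces[j])) < 1e-12  # pairwise orthogonal

D = sub(X, Y)
print(inner(D, D), [inner(P, P) for P in pieces])   # ||X - Y||^2 and the three pieces
```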

To prove (3), by Proposition 3.24, there exists an admissible pair of E-convex \(\mathrm {W}^*\)-functions f and g such that \(f^{\mathcal {A}}(X) + g^{\mathcal {A}}(Y) = \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\). By admissibility, by construction of \(\mathcal {B}\), and by E-convexity (which gives \(f^{\mathcal {A}}(X) \ge f^{\mathcal {A}}(X')\) and \(g^{\mathcal {A}}(Y) \ge g^{\mathcal {A}}(Y')\)),

$$\begin{aligned} f^{\mathcal {A}}(X') + g^{\mathcal {A}}(Y')&\ge \langle X',Y'\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&= \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&= f^{\mathcal {A}}(X) + g^{\mathcal {A}}(Y) \\&\ge f^{\mathcal {A}}(X') + g^{\mathcal {A}}(Y'). \end{aligned}$$

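Hence, all of these inequalities are equalities, so \(f^{\mathcal {A}}(X') + g^{\mathcal {A}}(Y') = \langle X',Y'\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\), and \((\mathcal {A},X',Y')\) is an optimal coupling by Proposition 3.24. By similar reasoning, since \(\langle X',Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \langle X',Y'\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\) and \(f^{\mathcal {A}}(X') \le f^{\mathcal {A}}(X)\), we see that \((\mathcal {A},X',Y)\) is an optimal coupling, and symmetrically \((\mathcal {A},X,Y')\) is an optimal coupling.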

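Let \(\mathcal {A}_1\) be a copy of \(\mathrm {W}^*(X,\mathcal {B})\) and let \(\mathcal {A}_2\) be a copy of \(\mathrm {W}^*(Y,\mathcal {B})\). Let \({\tilde{\mathcal {A}}} = \mathcal {A}_1 *_{\mathcal {B}} \mathcal {A}_2\) be the amalgamated free product (with its canonical trace \({\tilde{\tau }}\)). Let \({\tilde{X}}\), \({\tilde{X}}'\), \({\tilde{Y}}\), and \({\tilde{Y}}'\) be the images of the original variables in \({\tilde{\mathcal {A}}}\). Since \({\tilde{X}} - {\tilde{X}}'\) and \({\tilde{Y}}' - {\tilde{Y}}\) are orthogonal to \(L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\), and are orthogonal to each other by free independence with amalgamation over \(\mathcal {B}\), we obtain, using (2) in the last step,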

$$\begin{aligned} \left\Vert {\tilde{X}} - {\tilde{Y}} \right\Vert _{L^2({\tilde{\mathcal {A}}})_{{{\,\mathrm {sa}\,}}}^m}^2&= \left\Vert {\tilde{X}} - {\tilde{X}}'\right\Vert _{L^2({\tilde{\mathcal {A}}})_{{{\,\mathrm {sa}\,}}}^m}^2 + \left\Vert {\tilde{X}}' - {\tilde{Y}}' \right\Vert _{L^2({\tilde{\mathcal {A}}})_{{{\,\mathrm {sa}\,}}}^m}^2 + \left\Vert {\tilde{Y}}' - {\tilde{Y}} \right\Vert _{L^2({\tilde{\mathcal {A}}})_{{{\,\mathrm {sa}\,}}}^m}^2 \\ {}&= \left\Vert X - X'\right\Vert _{L^2(\mathcal {A})_{{{\,\mathrm {sa}\,}}}^m}^2 + \left\Vert X' - Y' \right\Vert _{L^2(\mathcal {A})_{{{\,\mathrm {sa}\,}}}^m}^2 + \left\Vert Y' - Y \right\Vert _{L^2(\mathcal {A})_{{{\,\mathrm {sa}\,}}}^m}^2 \\ {}&= \Vert X - Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm {sa}\,}}}^m}^2. \end{aligned}$$

Therefore, \(({\tilde{X}}, {\tilde{Y}})\) is also an optimal coupling of \(\mu \) and \(\nu \). The \(\mathrm {W}^*\)-subalgebra \(\mathcal {B}\subseteq {\tilde{\mathcal {A}}}\) also satisfies

$$\begin{aligned} \langle E_{\mathcal {B}}[{\tilde{X}}], E_{\mathcal {B}}[{\tilde{Y}}]\rangle _{{\tilde{\tau }}} = \langle {\tilde{X}}, {\tilde{Y}}\rangle _{{\tilde{\tau }}}, \end{aligned}$$

and satisfies (1). Thus, the same arguments as above show that \(\mathcal {B}\) in \({\tilde{\mathcal {A}}}\) satisfies (2) and (3). \(\square \)

4 The Displacement Interpolation

If \((\mathcal {A},X,Y)\) is an \(L^2\)-optimal coupling of \(\mu \), \(\nu \in \Sigma _m\), then one can consider the displacement interpolation \(X_t = (1 - t)X + tY\) for \(t \in [0,1]\). As shown in Proposition A.22, the corresponding family of laws defines a geodesic in \((\Sigma _m, d_W^{(2)})\). In this section, we study how the displacement interpolation interacts with non-commutative Monge–Kantorovich duality and use this to prove Theorem 1.5.

Motivated by analogous arguments in classical optimal transport theory, we approach the proof as follows (see §4.3 for more detail). By Proposition 3.24, there exists an E-convex function f such that \(\langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = f^{\mathcal {A}}(X) + \mathcal {L} f^{\mathcal {A}}(Y)\), or equivalently \(Y \in \eth f^{\mathcal {A}}(X)\). Letting \(q_t\) be the \(\mathrm {W}^*\)-function \(q_t^{\mathcal {A}}(X) = (1/2t) \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2\), we observe that \(X_t \in \eth f_t^{\mathcal {A}}(X)\) where \(f_t = (1 - t) q_1 + t f\). Hence, \(X \in \eth (\mathcal {L} f_t)^{\mathcal {A}}(X_t)\). In order to show that \(X \in L^2(\mathrm {W}^*(X_t))_{{{\,\mathrm{sa}\,}}}^m\), we want to understand the regularity properties of \(\mathcal {L} f_t\).
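To spell out this observation, let \(t \in (0,1]\) and \(X' \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\). Since \(q_1^{\mathcal {A}}(X') - q_1^{\mathcal {A}}(X) - \langle X'-X,X\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = \frac{1}{2}\Vert X'-X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 \ge 0\) and \(Y \in \eth f^{\mathcal {A}}(X)\), adding the two subgradient inequalities with the nonnegative weights \(1-t\) and t gives

$$\begin{aligned} f_t^{\mathcal {A}}(X')&= (1-t)\, q_1^{\mathcal {A}}(X') + t\, f^{\mathcal {A}}(X') \\&\ge (1-t) \left( q_1^{\mathcal {A}}(X) + \langle X'-X,X\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \right) + t \left( f^{\mathcal {A}}(X) + \langle X'-X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \right) \\&= f_t^{\mathcal {A}}(X) + \langle X'-X,X_t\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}, \end{aligned}$$

so that \(X_t \in \eth f_t^{\mathcal {A}}(X)\).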

It is well known that for a proper, convex, lower semi-continuous function f on a Hilbert space H, the Legendre transform of \(f(x) + (t/2) \Vert x\Vert ^2\) is given by the inf-convolution \(g_t(x) = \inf _{y \in H} [f^*(y) + (1/2t)\Vert x - y\Vert ^2]\), where \(f^*\) is the Legendre transform of f. Furthermore, \(g_t\) has a Lipschitz gradient (with Lipschitz constant 1/t) for every \(t > 0\), and it satisfies the Hamilton–Jacobi equation

$$\begin{aligned} \frac{d}{dt} g_t = - \frac{1}{2} \Vert \nabla g_t\Vert ^2. \end{aligned}$$

This can be checked by hand, or deduced for instance from [6, Sect. 2, Theorem 1]; also relevant to Hamilton–Jacobi equations on Hilbert space are [7, 8, 18, 19, 47, 49].
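A one-dimensional sanity check of these facts, under the illustrative assumption \(H = {\mathbb {R}}\) and \(f^* = |\cdot |\): then \(g_t\) is the Huber function, equal to \(x^2/(2t)\) for \(|x| \le t\) and \(|x| - t/2\) otherwise, whose derivative \(\max (-1, \min (1, x/t))\) is (1/t)-Lipschitz. The Python sketch below compares a brute-force grid minimization with this closed form and checks the Hamilton–Jacobi equation by central finite differences at points away from the kinks \(|x| = t\). The helper names are hypothetical, and the grid only approximates the infimum over H.

```python
def moreau_envelope_bruteforce(h, x, t, ys):
    """g_t(x) = min over grid points y of h(y) + (x - y)^2 / (2 t)."""
    return min(h(y) + (x - y) ** 2 / (2 * t) for y in ys)

def huber(x, t):
    """Closed form of the inf-convolution of |y| with (1/2t) y^2."""
    return x * x / (2 * t) if abs(x) <= t else abs(x) - t / 2

ys = [i / 1000 for i in range(-5000, 5001)]  # y grid on [-5, 5] with spacing 0.001

# 1) Brute-force inf-convolution agrees with the closed form.
for t in (0.5, 1.0):
    for x in (-2.0, -0.3, 0.0, 0.4, 1.7):
        assert abs(moreau_envelope_bruteforce(abs, x, t, ys) - huber(x, t)) < 1e-5

# 2) Hamilton-Jacobi: d/dt g_t = -(1/2) (d/dx g_t)^2, by central differences
#    at points (x, t) with |x| != t, where g_t is smooth.
eps = 1e-5
for x, t in ((0.3, 1.0), (2.0, 1.0), (-1.7, 0.5), (0.2, 0.5)):
    dt = (huber(x, t + eps) - huber(x, t - eps)) / (2 * eps)
    dx = (huber(x + eps, t) - huber(x - eps, t)) / (2 * eps)
    assert abs(dt + 0.5 * dx * dx) < 1e-6
print("inf-convolution and Hamilton-Jacobi checks passed")
```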

In this section, we adapt the theory of inf-convolutions to the setting of tracial \(\mathrm {W}^*\)-functions. In Sect. 4.1, we define inf-convolutions of \(\mathrm {W}^*\)-functions and prove their basic properties. In Sect. 4.2, we describe how inf-convolutions interact with E-convexity and semi-concavity. In Sect. 4.3, we conclude the proof of Theorem 1.5.

We emphasize that the novelty in our work is not in the form of the Hamilton–Jacobi equation but rather in the fact that we study variables from infinite-dimensional non-commutative algebras and want the function to be defined consistently with respect to inclusions of these algebras (that is, to be a tracial \(\mathrm {W}^*\)-function). This means, for instance, that if f and g are tracial \(\mathrm {W}^*\)-functions and \(f \square g\) is their inf-convolution as defined below, then \((f \square g)^{\mathcal {A}}\) need not agree with the inf-convolution of \(f^{\mathcal {A}}\) and \(g^{\mathcal {A}}\) as functions on the Hilbert space \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) (Remark 4.5); however, they do agree if f and g are E-convex (Lemma 4.6). Hence, a notion of viscosity solutions compatible with our theory of inf-convolutions will have to take into account the inclusions of one tracial \(\mathrm {W}^*\)-algebra into another.

4.1 Inf-convolutions

We begin with the definition and basic properties of the inf-convolution.

Definition 4.1

Let f, g be two \(\mathrm {W}^*\)-functions with values in \([-\infty ,\infty ]\). We define the inf-convolution \(f \square g\) by

$$\begin{aligned} (f \square g)^{\mathcal {A}}(X) = \inf \left\{ f^{\mathcal {B}}(\iota (X) - Y) + g^{\mathcal {B}}(Y) | \iota : \mathcal {A}\rightarrow \mathcal {B}\text { embedding, } Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \right\} . \end{aligned}$$

Lemma 4.2

The object \(f \square g\) is a \(\mathrm {W}^*\)-function.

Proof

Let \(\iota : \mathcal {A}\rightarrow \mathcal {B}\) be an inclusion. We first show that

$$\begin{aligned} (f \square g)^{\mathcal {A}}(X) \le (f \square g)^{\mathcal {B}}(\iota (X)). \end{aligned}$$
(4.1)

If \(\iota ': \mathcal {B}\rightarrow \mathcal {C}\) is another inclusion and \(Y \in L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m\) as in the definition of \((f \square g)^{\mathcal {B}}(\iota (X))\), then \(\iota ' \circ \iota : \mathcal {A}\rightarrow \mathcal {C}\) is an inclusion which, together with the same Y, can be used in the definition of \((f \square g)^{\mathcal {A}}(X)\) and gives the same value \(f^{\mathcal {C}}(\iota ' \circ \iota (X) - Y) + g^{\mathcal {C}}(Y)\). This shows (4.1).

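Conversely, suppose that \(\iota ': \mathcal {A}\rightarrow \mathcal {C}\) is an inclusion and \(Y \in L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m\) as in the definition of \((f \square g)^{\mathcal {A}}\). Let \({\tilde{\mathcal {C}}}\) be the free product of \(\mathcal {B}\) and \(\mathcal {C}\) with amalgamation over the images of \(\mathcal {A}\) in the respective algebras, and let \(\rho : \mathcal {B}\rightarrow {\tilde{\mathcal {C}}}\) and \(\rho ': \mathcal {C}\rightarrow {\tilde{\mathcal {C}}}\) be the inclusions, so that \(\rho \circ \iota = \rho ' \circ \iota '\). Since f and g are \(\mathrm {W}^*\)-functions, \(f^{{\tilde{\mathcal {C}}}}(\rho (\iota (X)) - \rho '(Y)) + g^{{\tilde{\mathcal {C}}}}(\rho '(Y)) = f^{\mathcal {C}}(\iota '(X) - Y) + g^{\mathcal {C}}(Y)\). Thus, the image \(\rho '(Y)\) of Y in \({\tilde{\mathcal {C}}}\) participates in the infimum defining \((f \square g)^{\mathcal {B}}(\iota (X))\) with the same value, and hence \((f \square g)^{\mathcal {B}}(\iota (X)) \le (f \square g)^{\mathcal {A}}(X)\). \(\square \)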

Lemma 4.3

The inf-convolution is commutative and associative, that is, if f, g, h are \(\mathrm {W}^*\)-functions, then \(f \square g = g \square f\) and \((f \square g) \square h = f \square (g \square h)\).

Proof

We have

$$\begin{aligned} (f \square g)^{\mathcal {A}}(X) = \inf _{\iota : \mathcal {A}\rightarrow \mathcal {B}} \inf _{Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} [f^{\mathcal {B}}(\iota (X) - Y) + g^{\mathcal {B}}(Y)]. \end{aligned}$$

We substitute \(Z = \iota (X) - Y\) and thus obtain

$$\begin{aligned} \inf _{\iota : \mathcal {A}\rightarrow \mathcal {B}} \inf _{Z \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} [f^{\mathcal {B}}(Z) + g^{\mathcal {B}}(\iota (X) - Z)] = (g \square f)^{\mathcal {A}}(X). \end{aligned}$$

For associativity,

$$\begin{aligned} ((f \square g) \square h)^{\mathcal {A}}(X)&= \inf _{\begin{array}{c} \iota _1: \mathcal {A}\rightarrow \mathcal {B}\\ Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \end{array}} \left( (f \square g)^{\mathcal {B}}(\iota _1(X) - Y) + h^{\mathcal {B}}(Y) \right) \nonumber \\&= \inf _{\begin{array}{c} \iota _1: \mathcal {A}\rightarrow \mathcal {B}\\ Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \end{array}} \inf _{\begin{array}{c} \iota _2: \mathcal {B}\rightarrow \mathcal {C}\\ Z \in L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m \end{array}} \left( f^{\mathcal {C}}(\iota _2(\iota _1(X)) \right. \nonumber \\&\qquad \left. - \iota _2(Y) - Z) + g^{\mathcal {C}}(Z) + h^{\mathcal {C}}(\iota _2(Y)) \right) . \end{aligned}$$
(4.2)

We claim that this is equal to

$$\begin{aligned} \inf _{\begin{array}{c} \iota : \mathcal {A}\rightarrow \mathcal {B}\\ Y, Z \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \end{array}} \left( f^{\mathcal {B}}(\iota (X) - Y - Z) + g^{\mathcal {B}}(Z) + h^{\mathcal {B}}(Y) \right) , \end{aligned}$$
(4.3)

or in other words, in our earlier expression we can without loss of generality impose the condition that \(\mathcal {C}= \mathcal {B}\) and \(\iota _2 = {{\,\mathrm{id}\,}}\). Indeed, requiring Z to come from the smaller algebra \(\mathcal {B}\) rather than \(\mathcal {C}\) can only increase the infimum, and this restriction turns (4.2) into (4.3); hence (4.3) \(\ge \) (4.2). On the other hand, if in (4.2) we allowed Y to come from the larger algebra \(\mathcal {C}\) instead of \(\mathcal {B}\), then the infimum could only decrease, and the resulting expression is an instance of (4.3) with \(\mathcal {C}\) in place of \(\mathcal {B}\) and \(\iota _2 \circ \iota _1\) in place of \(\iota \); hence (4.2) \(\ge \) (4.3). Now the expression (4.3) is symmetric in g and h, and hence

$$\begin{aligned} (f \square g) \square h = (f \square h) \square g. \end{aligned}$$

This relation, together with commutativity, implies the associativity relation since

$$\begin{aligned} (f \square g) \square h = (g \square f) \square h = (g \square h) \square f = f \square (g \square h). \end{aligned}$$

\(\square \)
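As a sanity check on the classical shadow of Lemma 4.3, the following Python sketch uses the discrete (min-plus) inf-convolution of finite integer sequences, \((f \square g)[k] = \min _{i+j=k} (f[i] + g[j])\), in which the reindexing \(j = k - i\) plays the role of the substitution \(Z = \iota (X) - Y\). This is only an illustration under that assumption: it has no analog of the infimum over inclusions \(\iota \), which is the genuinely \(\mathrm {W}^*\) part of the argument. The helper name inf_conv and the random test sequences are hypothetical choices made for the example.

```python
import random

def inf_conv(f, g):
    """Min-plus (discrete inf-) convolution of two finite sequences.

    (f □ g)[k] = min over i + j = k of f[i] + g[j], for 0 <= k <= len(f) + len(g) - 2.
    """
    out = [float("inf")] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            out[i + j] = min(out[i + j], fi + gj)
    return out

random.seed(0)
f = [random.randint(-5, 5) for _ in range(6)]
g = [random.randint(-5, 5) for _ in range(4)]
h = [random.randint(-5, 5) for _ in range(5)]

# Integer data, so both identities can be checked exactly (no rounding).
commutes = inf_conv(f, g) == inf_conv(g, f)
associates = inf_conv(inf_conv(f, g), h) == inf_conv(f, inf_conv(g, h))
print(commutes, associates)  # True True
```

Exact integer arithmetic is used deliberately, so that equality of the two sides is tested exactly rather than up to a tolerance.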

The relationship between inf-convolution and Legendre transform is exactly what one would expect based on the classical case.

Lemma 4.4

Let f and g be \(\mathrm {W}^*\)-functions. Then

$$\begin{aligned} \mathcal {L}(f \square g) = \mathcal {L}f + \mathcal {L}g. \end{aligned}$$

Proof

Observe that

$$\begin{aligned} \mathcal {L}(f \square g)^{\mathcal {A}}(X)&= \sup _{\begin{array}{c} \iota _1: \mathcal {A}\rightarrow \mathcal {B}\\ Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \end{array}} \left( \langle \iota _1(X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - (f \square g)^{\mathcal {B}}(Y) \right) \\&= \!\!\sup _{\begin{array}{c} \iota _1: \mathcal {A}\rightarrow \mathcal {B}\\ Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \end{array}}\!\! \left( \!\! \langle \iota _1(X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - \inf _{\begin{array}{c} \iota _2: \mathcal {B}\rightarrow \mathcal {C}\\ Z \in L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m \end{array} } \left( f^{\mathcal {C}}(\iota _2(Y) - Z) + g^{\mathcal {C}}(Z) \right) \!\!\right) , \end{aligned}$$

where we take the supremum over \(\mathcal {B}\) and \(\mathcal {C}\in {\mathbb {W}}\) and inclusions \(\iota _1: \mathcal {A}\rightarrow \mathcal {B}\) and \(\iota _2: \mathcal {B}\rightarrow \mathcal {C}\) and \(Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\) and \(Z \in L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m\). This can be rewritten as

$$\begin{aligned} \sup _{\begin{array}{c} \iota _1: \mathcal {A}\rightarrow \mathcal {B}\\ Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \end{array}} \sup _{\begin{array}{c} \iota _2: \mathcal {B}\rightarrow \mathcal {C}\\ Z \in L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m \end{array} } \left( \langle \iota _2(\iota _1(X)),\iota _2(Y)\rangle _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {C}}(\iota _2(Y) - Z) - g^{\mathcal {C}}(Z) \right) . \end{aligned}$$

We can assume without loss of generality that \(\mathcal {B}= \mathcal {C}\) and \(\iota _2 = {{\,\mathrm{id}\,}}\). Indeed, allowing Y to range over the larger space \(L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m\) rather than \(L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\) can only increase the supremum, and yields an expression of the same form with \(\mathcal {C}\) in place of \(\mathcal {B}\) and \(\iota _2 = {{\,\mathrm{id}\,}}\); on the other hand, restricting Z to the smaller space \(L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\) instead of \(L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m\) can only decrease the supremum, and also yields an expression of that form. Thus, taking \(\mathcal {B}= \mathcal {C}\) and renaming \(\iota _1\) to \(\iota \), we obtain

$$\begin{aligned} \sup _{\iota : \mathcal {A}\rightarrow \mathcal {B}} \sup _{Y,Z \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} \left( \langle \iota (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {B}}(Y - Z) - g^{\mathcal {B}}(Z) \right) . \end{aligned}$$

Substituting \(Z' = Y - Z\), we have

$$\begin{aligned}&\sup _{\iota : \mathcal {A}\rightarrow \mathcal {B}} \sup _{Z,Z' \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} \left( \langle \iota (X),Z+Z'\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {B}}(Z') - g^{\mathcal {B}}(Z) \right) \nonumber \\&\quad = \sup _{\iota : \mathcal {A}\rightarrow \mathcal {B}} \sup _{Z,Z' \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} \left( \langle \iota (X),Z'\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {B}}(Z') \right. \nonumber \\&\qquad \left. + \langle \iota (X),Z\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - g^{\mathcal {B}}(Z) \right) . \end{aligned}$$
(4.4)

We want to show that this is equal to

$$\begin{aligned}&\mathcal {L}f^{\mathcal {A}}(X) + \mathcal {L}g^{\mathcal {A}}(X) = \sup _{\iota _1: \mathcal {A}\rightarrow \mathcal {B}_1} \sup _{Z' \in L^2(\mathcal {B}_1)_{{{\,\mathrm{sa}\,}}}^m} \left( \langle \iota _1(X),Z'\rangle _{L^2(\mathcal {B}_1)_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {B}_1}(Z') \right) \nonumber \\&\quad + \sup _{\iota _2: \mathcal {A}\rightarrow \mathcal {B}_2} \sup _{Z \in L^2(\mathcal {B}_2)_{{{\,\mathrm{sa}\,}}}^m} \left( \langle \iota _2(X),Z\rangle _{L^2(\mathcal {B}_2)_{{{\,\mathrm{sa}\,}}}^m} - g^{\mathcal {B}_2}(Z) \right) . \end{aligned}$$
(4.5)

The only difference between the two expressions is that the latter allows \(\iota _1: \mathcal {A}\rightarrow \mathcal {B}_1\) and \(\iota _2: \mathcal {A}\rightarrow \mathcal {B}_2\) to be different, while the former takes them to be the same; thus, a priori, (4.4) \(\le \) (4.5). Conversely, given \(\mathcal {B}_1\), \(\mathcal {B}_2\), \(\iota _1\) and \(\iota _2\) in (4.5), let \(\mathcal {B}\) be the free product of \(\mathcal {B}_1\) and \(\mathcal {B}_2\) with amalgamation over the subalgebras \(\iota _1(\mathcal {A})\) in the first factor and \(\iota _2(\mathcal {A})\) in the second factor, so that \(\iota _1\) and \(\iota _2\) induce one and the same inclusion \(\iota : \mathcal {A}\rightarrow \mathcal {B}\). Allowing \(Z'\) and Z to range over \(L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\) rather than \(L^2(\mathcal {B}_1)_{{{\,\mathrm{sa}\,}}}^m\) and \(L^2(\mathcal {B}_2)_{{{\,\mathrm{sa}\,}}}^m\), respectively, can only increase the suprema over \(Z'\) and Z, and the result is bounded by (4.4). Hence (4.5) \(\le \) (4.4), so the two expressions are equal. \(\square \)
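For the trivial algebra \(\mathbb {C}\), Lemma 4.4 reduces to the classical identity \(\mathcal {L}(f \square g) = \mathcal {L}f + \mathcal {L}g\) on the real line. The Python sketch below checks it numerically for the assumed test functions f(x) = |x| and g(x) = x^2/2, whose inf-convolution is the Huber function; for |y| < 1 both sides equal y^2/2, since \(\mathcal {L}f\) vanishes on [-1, 1]. The ternary-search minimizer min_convex and the finite search window [-50, 50] are assumptions of this sketch (valid only for convex objectives whose optimizers lie in the window), not part of the argument above.

```python
def min_convex(phi, lo, hi, iters=100):
    """Approximate the minimum value of a convex function phi on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if phi(m1) < phi(m2):
            hi = m2  # for convex phi, a minimizer lies in [lo, m2]
        else:
            lo = m1  # otherwise a minimizer lies in [m1, hi]
    return phi((lo + hi) / 2)

def f(x):
    return abs(x)

def g(x):
    return 0.5 * x * x

def f_box_g(x):
    # Classical inf-convolution (f □ g)(x) = inf_y f(x - y) + g(y); here the Huber function.
    return min_convex(lambda y: f(x - y) + g(y), -50.0, 50.0)

def legendre(F, y, radius=50.0):
    # Legendre transform (L F)(y) = sup_x x*y - F(x), with x restricted to [-radius, radius].
    return -min_convex(lambda x: F(x) - x * y, -radius, radius)

# For |y| < 1: (L f)(y) = 0 and (L g)(y) = y^2/2, so both sides should equal y^2/2.
for y in (-0.9, -0.3, 0.0, 0.5, 0.8):
    print(y, legendre(f_box_g, y), 0.5 * y * y)
```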

Remark 4.5

Suppose f and g are tracial \(\mathrm {W}^*\)-functions. Let \(f^{\mathcal {A}} \square g^{\mathcal {A}}\) denote the classical inf-convolution of \(f^{\mathcal {A}}\) and \(g^{\mathcal {A}}\) as functions on the Hilbert space \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\). Then \((f \square g)^{\mathcal {A}} \le f^{\mathcal {A}} \square g^{\mathcal {A}}\), since we may take \(\mathcal {B}= \mathcal {A}\) and \(\iota = {{\,\mathrm{id}\,}}\) in the definition of \(f \square g\). However, the following example shows that the two functions do not necessarily agree. Take \(m = 2\), and \(f^{\mathcal {A}}(X_1,X_2) = (1/2) \Vert (X_1,X_2)\Vert _{L^2(\mathcal {A})^2}^2\) and \(g^{\mathcal {A}}(X_1,X_2) = \tau _{\mathcal {A}}([X_1,X_2]^2)\). The formula for g is to be understood in the sense of affiliated operators (see Sect. A); since \(i[X_1,X_2]\) is a self-adjoint affiliated operator, \(-[X_1,X_2]^2\) is positive and hence \(\tau _{\mathcal {A}}([X_1,X_2]^2)\) is well-defined in \([-\infty ,0]\); see Theorem A.3 (4). Then \(g^{\mathbb {C}} = 0\) because \(\mathbb {C}\) is commutative, and hence also \(f^{\mathbb {C}} \square g^{\mathbb {C}} = 0\). On the other hand, let \(\iota : \mathbb {C}\rightarrow M_2(\mathbb {C})\) be the canonical inclusion, and let

$$\begin{aligned} Y_1 = \begin{pmatrix} 0 &{} 1 \\ 1 &{} 0 \end{pmatrix}, \qquad Y_2 = \begin{pmatrix} 0 &{} i \\ -i &{} 0 \end{pmatrix}, \qquad [Y_1,Y_2] = \begin{pmatrix} -2i &{} 0 \\ 0 &{} 2i \end{pmatrix}. \end{aligned}$$

Then for \(x_1, x_2, t \in {\mathbb {R}}\),

$$\begin{aligned} (f \square g)^{\mathbb {C}}(x_1,x_2)&\le \frac{1}{2} \Vert \iota (x_1) - tY_1\Vert _{L^2(M_2(\mathbb {C}))}^2 + \frac{1}{2} \Vert \iota (x_2)\nonumber \\&\qquad - tY_2\Vert _{L^2(M_2(\mathbb {C}))}^2 + t^4 \tau _{M_2(\mathbb {C})}([Y_1,Y_2]^2) \\&= \frac{1}{2} \Vert \iota (x_1) - tY_1\Vert _{L^2(M_2(\mathbb {C}))}^2 + \frac{1}{2} \Vert \iota (x_2) - tY_2\Vert _{L^2(M_2(\mathbb {C}))}^2 - 4t^4. \end{aligned}$$

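Since \(\tau _{M_2(\mathbb {C})}(Y_j) = 0\) and \(Y_j^2 = 1\) for \(j = 1, 2\), the first two terms add up to \((x_1^2 + x_2^2)/2 + t^2\), which is quadratic in t, while the last term is \(-4t^4\). Thus, letting \(t \rightarrow \infty \), we see that \((f \square g)^{\mathbb {C}}(x_1,x_2) = -\infty < 0 = (f^{\mathbb {C}} \square g^{\mathbb {C}})(x_1,x_2)\).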
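The matrix computation in Remark 4.5 can be double-checked numerically. The Python sketch below encodes 2x2 complex matrices as nested lists (a hypothetical minimal representation, chosen only to avoid external dependencies), uses the normalized trace \(\tau _{M_2(\mathbb {C})}\), and evaluates the right-hand side of the bound for \((f \square g)^{\mathbb {C}}(x_1,x_2)\) with the candidate \((tY_1, tY_2)\). It should agree with \((x_1^2 + x_2^2)/2 + t^2 - 4t^4\), which tends to \(-\infty \) as \(t \rightarrow \infty \).

```python
def matmul(a, b):
    """Product of two 2x2 complex matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(2)] for i in range(2)]

def scale(c, a):
    return [[c * a[i][j] for j in range(2)] for i in range(2)]

def adjoint(a):
    return [[complex(a[j][i]).conjugate() for j in range(2)] for i in range(2)]

def tau(a):
    """Normalized trace on M_2(C)."""
    return (a[0][0] + a[1][1]) / 2

def l2_sq(a):
    """Squared L^2 norm ||a||^2 = tau(a^* a)."""
    return tau(matmul(adjoint(a), a)).real

def commutator(a, b):
    return add(matmul(a, b), scale(-1, matmul(b, a)))

I2 = [[1, 0], [0, 1]]
Y1 = [[0, 1], [1, 0]]
Y2 = [[0, 1j], [-1j, 0]]

def bound(x1, x2, t):
    """f(iota(x) - tY) + g(tY), with f = (1/2)||.||^2 and g(X1, X2) = tau([X1, X2]^2)."""
    a = add(scale(x1, I2), scale(-t, Y1))
    b = add(scale(x2, I2), scale(-t, Y2))
    c = commutator(scale(t, Y1), scale(t, Y2))
    return 0.5 * l2_sq(a) + 0.5 * l2_sq(b) + tau(matmul(c, c)).real

for t in (0.0, 1.0, 2.0, 10.0):
    print(t, bound(0.3, -1.2, t))  # should equal (0.3**2 + 1.2**2)/2 + t**2 - 4*t**4
```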

4.2 Inf-convolutions and regularity of E-convex functions

Lemma 4.6

If f and g are E-convex tracial \(\mathrm {W}^*\)-functions with \(f < \infty \), then \(f \square g\) is E-convex. Moreover, for any E-convex f and g, we have

$$\begin{aligned} (f \square g)^{\mathcal {A}}(X) = \inf _{Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \left( f^{\mathcal {A}}(X - Y) + g^{\mathcal {A}}(Y) \right) . \end{aligned}$$
(4.6)

Proof

We prove the second claim first. Clearly,

$$\begin{aligned} (f \square g)^{\mathcal {A}}(X) \le \inf _{Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \left( f^{\mathcal {A}}(X - Y) + g^{\mathcal {A}}(Y) \right) . \end{aligned}$$

For the opposite inequality, suppose that \(\iota : \mathcal {A}\rightarrow \mathcal {B}\) is an embedding and \(Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\). Let \(E: \mathcal {B}\rightarrow \mathcal {A}\) be the conditional expectation. Since \(E[\iota (X) - Y] = X - E[Y]\), the E-convexity of f and g gives

$$\begin{aligned} f^{\mathcal {A}}(X - E[Y]) + g^{\mathcal {A}}(E[Y]) \le f^{\mathcal {B}}(\iota (X) - Y) + g^{\mathcal {B}}(Y), \end{aligned}$$

and hence

$$\begin{aligned} \inf _{Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \left( f^{\mathcal {A}}(X - Y) + g^{\mathcal {A}}(Y) \right) \le (f \square g)^{\mathcal {A}}(X). \end{aligned}$$

Now let us show that \(f \square g\) is E-convex when \(f < \infty \). If g is identically \(\infty \), then \(f \square g\) is identically \(\infty \), so there is nothing to prove. Suppose \(g^{\mathcal {B}}(Y)\) is finite for some \(\mathcal {B}\) and \(Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\). Then \((f \square g)^{\mathcal {A}}(X) < \infty \) everywhere because, letting \(\mathcal {C}\) be the free product of \(\mathcal {A}\) and \(\mathcal {B}\) and letting \(\iota _1: \mathcal {A}\rightarrow \mathcal {C}\) and \(\iota _2: \mathcal {B}\rightarrow \mathcal {C}\) be the corresponding inclusions,

$$\begin{aligned} (f \square g)^{\mathcal {A}}(X) \le f^{\mathcal {C}}(\iota _1(X) - \iota _2(Y)) + g^{\mathcal {C}}(\iota _2(Y)) < \infty . \end{aligned}$$

To prove convexity of \((f \square g)^{\mathcal {A}}\), let \(X_0\), \(X_1 \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), and let \(X_t = (1 - t)X_0 + tX_1\) for \(t \in (0,1)\). If \(Y_0\), \(Y_1 \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) and if \(Y_t = (1 - t)Y_0 + tY_1\), then

$$\begin{aligned} (f \square g)^{\mathcal {A}}(X_t)&\le f^{\mathcal {A}}(X_t - Y_t) + g^{\mathcal {A}}(Y_t) \\&\le (1 - t) f^{\mathcal {A}}(X_0 - Y_0) + t f^{\mathcal {A}}(X_1 - Y_1) + (1 - t) g^{\mathcal {A}}(Y_0) + t g^{\mathcal {A}}(Y_1). \end{aligned}$$

Since \(Y_0\) and \(Y_1\) were arbitrary, we can take the infimum over \(Y_0\) and \(Y_1\) and apply (4.6) to conclude that

$$\begin{aligned} (f \square g)^{\mathcal {A}}(X_t) \le (1 - t) (f \square g)^{\mathcal {A}}(X_0) + t (f \square g)^{\mathcal {A}}(X_1). \end{aligned}$$

This shows that \((f \square g)^{\mathcal {A}}\) is convex. Furthermore, since \(f \square g < \infty \), this relation implies that if \((f \square g)^{\mathcal {A}}\) is \(-\infty \) at one point in \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), then it is \(-\infty \) everywhere. Moreover, if \((f \square g)^{\mathcal {B}}\) is \(-\infty \), then so is \((f \square g)^{\mathcal {A}}\), as we can see by considering the free product of \(\mathcal {A}\) and \(\mathcal {B}\).

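It follows from these facts that \((f \square g)^{\mathcal {A}}\) is lower semi-continuous. If \((f \square g)^{\mathcal {A}}\) is identically \(-\infty \), this is trivial. Otherwise, \((f \square g)^{\mathcal {A}}\) is finite everywhere, and so is \(f^{\mathcal {A}}\) (if \(f^{\mathcal {A}}\) took the value \(-\infty \), then, being convex, lower semi-continuous, and \(< \infty \), it would be identically \(-\infty \), and by (4.6) so would \((f \square g)^{\mathcal {A}}\), since \(g^{\mathcal {A}}\) is finite somewhere by E-convexity of g). A finite, convex, lower semi-continuous function on the Hilbert space \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) is continuous, so by (4.6), \((f \square g)^{\mathcal {A}}\) is an infimum of continuous functions of X and is therefore upper semi-continuous. Finally, a finite convex function that is upper semi-continuous is locally bounded above and hence continuous; in particular, \((f \square g)^{\mathcal {A}}\) is lower semi-continuous.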

Finally, we must show the monotonicity of \(f \square g\) under conditional expectation. Let \(\iota : \mathcal {A}\rightarrow \mathcal {B}\) be an embedding, through which we regard \(\mathcal {A}\) as a subalgebra of \(\mathcal {B}\), and let \(E: \mathcal {B}\rightarrow \mathcal {A}\) be the corresponding conditional expectation. If \(X, Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\), then, by (4.6) and the E-convexity of f and g,

$$\begin{aligned} (f \square g)^{\mathcal {A}}(E[X]) \le f^{\mathcal {A}}(E[X] - E[Y]) + g^{\mathcal {A}}(E[Y]) \le f^{\mathcal {B}}(X - Y) + g^{\mathcal {B}}(Y). \end{aligned}$$

Since Y on the right-hand side was arbitrary, we conclude by (4.6), applied in \(\mathcal {B}\), that \((f \square g)^{\mathcal {A}}(E[X]) \le (f \square g)^{\mathcal {B}}(X)\) as desired. \(\square \)
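The key step in the proof of Lemma 4.6 is that replacing a candidate Y from a larger algebra by its conditional expectation E[Y] cannot increase \(f(X - Y) + g(Y)\). In a commutative toy model this is just Jensen's inequality. The Python sketch below assumes a uniform four-point probability space, the subalgebra \(\mathcal {A}\) of functions constant on the blocks {0, 1} and {2, 3}, and the hypothetical choices f(Z) = E[exp(Z)] and g(Z) = E[|Z - 1|], which are monotone under conditional expectation by Jensen's inequality; it checks the inequality for random Y with X fixed in \(\mathcal {A}\).

```python
import math
import random

BLOCKS = [(0, 1), (2, 3)]  # partition of a 4-point space generating the subalgebra A

def cond_exp(z):
    """Conditional expectation onto A: average z over each block of the partition."""
    out = [0.0] * len(z)
    for block in BLOCKS:
        avg = sum(z[i] for i in block) / len(block)
        for i in block:
            out[i] = avg
    return out

def expect(z):
    return sum(z) / len(z)  # uniform probability on the 4 points

def f(z):
    return expect([math.exp(v) for v in z])  # f(Z) = E[exp(Z)]

def g(z):
    return expect([abs(v - 1.0) for v in z])  # g(Z) = E[|Z - 1|]

random.seed(1)
x = cond_exp([random.uniform(-2.0, 2.0) for _ in range(4)])  # an A-measurable X
checks = []
for _ in range(1000):
    y = [random.uniform(-3.0, 3.0) for _ in range(4)]
    ey = cond_exp(y)
    # Left: the candidate E[Y] in A.  Right: the original candidate Y in the larger algebra.
    lhs = f([a - b for a, b in zip(x, ey)]) + g(ey)
    rhs = f([a - b for a, b in zip(x, y)]) + g(y)
    checks.append(lhs <= rhs + 1e-12)
print(all(checks))  # True
```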

Observation 4.7

For \(t \in (0,\infty )\), let \(q_t^{\mathcal {A}}(X) = (1/2t) \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2\). For \(s, t \in (0,\infty )\), because \(q_s\) and \(q_t\) are E-convex and take finite values, we have

$$\begin{aligned} q_s \square q_t = \mathcal {L}^2(q_s \square q_t) = \mathcal {L}(\mathcal {L} q_s + \mathcal {L} q_t) = \mathcal {L}(q_{1/s} + q_{1/t}) = \mathcal {L}(q_{1/(s+t)}) = q_{s+t}. \end{aligned}$$

Then by associativity of inf-convolution, for any tracial \(\mathrm {W}^*\)-function f, we have

$$\begin{aligned} q_s \square (q_t \square f) = (q_s \square q_t) \square f = q_{s+t} \square f. \end{aligned}$$

Thus, \((q_t \square (\cdot ))_{t > 0}\) defines a semigroup acting on tracial \(\mathrm {W}^*\)-functions. This is the tracial \(\mathrm {W}^*\)-analog of the Hopf-Lax semigroup.
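In the classical one-dimensional case, Observation 4.7 says that \(q_s \square q_t = q_{s+t}\) and that \(f \mapsto q_t \square f\) is the Hopf-Lax semigroup. The Python sketch below checks both facts numerically, with f(x) = |x| as an assumed test function; here \(q_r \square |\cdot |\) is the Huber function, equal to \(x^2/(2r)\) for \(|x| \le r\) and \(|x| - r/2\) otherwise. The ternary-search minimizer and the window [-50, 50] are assumptions of the sketch and are valid only because every objective involved is convex with its optimizer inside the window.

```python
def min_convex(phi, lo=-50.0, hi=50.0, iters=100):
    """Approximate the minimum value of a convex function phi on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if phi(m1) < phi(m2):
            hi = m2
        else:
            lo = m1
    return phi((lo + hi) / 2)

def q(t):
    """q_t(x) = x^2 / (2t) on the real line."""
    return lambda x: x * x / (2 * t)

def inf_conv(F, G):
    """Classical inf-convolution (F □ G)(x) = inf_y F(x - y) + G(y), for convex F and G."""
    return lambda x: min_convex(lambda y: F(x - y) + G(y))

s, t = 0.3, 0.9
for x in (-2.0, 0.5, 1.7):
    print(x, inf_conv(q(s), q(t))(x), q(s + t)(x))  # q_s □ q_t = q_{s+t}

# Semigroup property on f = |.|:  q_s □ (q_t □ f) = q_{s+t} □ f.
f = abs
step = inf_conv(q(s), inf_conv(q(t), f))
direct = inf_conv(q(s + t), f)
for x in (-2.0, 0.5, 1.7):
    print(x, step(x), direct(x))
```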

Definition 4.8

If f is a tracial \(\mathrm {W}^*\)-function, we say that f is convex if \(f^{\mathcal {A}}\) is convex for every \(\mathcal {A}\in {\mathbb {W}}\). We say that f is semi-concave if \(q_t - f\) is convex for some \(t > 0\).

Lemma 4.9

Suppose f and g are tracial \(\mathrm {W}^*\)-functions and \(q_t - f\) is convex. Then \(q_t - f \square g\) is convex.

Proof

Note that

$$\begin{aligned}&(q_t - f \square g)^{\mathcal {A}}(X) = \sup _{\begin{array}{c} \iota : \mathcal {A}\rightarrow \mathcal {B}\\ Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \end{array}} \left( q_t^{\mathcal {B}}(\iota (X)) - f^{\mathcal {B}}(\iota (X) - Y) - g^{\mathcal {B}}(Y) \right) \\&\quad =\! \sup _{\begin{array}{c} \iota : \mathcal {A}\rightarrow \mathcal {B}\\ Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \end{array}} \left( q_t^{\mathcal {B}}(\iota (X) - Y) \!-\! f^{\mathcal {B}}(\iota (X) - Y) \!+\! \frac{1}{t} \langle \iota (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} \!-\! q_t^{\mathcal {B}}(Y) \!-\! g^{\mathcal {B}}(Y) \right) . \end{aligned}$$

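For fixed \(\iota \) and Y, the expression in parentheses is \((q_t - f)^{\mathcal {B}}(\iota (X) - Y)\) plus an affine function of X, so it is convex in X because \(q_t - f\) is convex and \(\iota \) is linear. Hence the right-hand side is the supremum of a family of convex functions of X and therefore is convex. \(\square \)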
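Lemma 4.9 places no convexity assumption on g. The following Python sketch illustrates this classically on the real line with \(f = q_t\) (so that \(q_t - f = 0\)), t = 1, and an assumed non-convex double-well potential g(y) = min((y - 1)^2, (y + 1)^2). The inner infimum is computed by brute force on a grid, which is an approximation; the check is that the discrete second differences of \(q_t - f \square g\) are positive. In this particular example one can also compute \(f \square g\) exactly as min((x - 1)^2, (x + 1)^2)/3, which the test compares against.

```python
T = 1.0
YS = [-4.0 + 0.002 * k for k in range(4001)]  # grid for the inner infimum over y

def q_t(x):
    return x * x / (2 * T)

def g(y):
    """A double-well potential; it is not convex."""
    return min((y - 1.0) ** 2, (y + 1.0) ** 2)

def f_box_g(x):
    """Brute-force classical inf-convolution (q_t □ g)(x) = min_y q_t(x - y) + g(y)."""
    return min(q_t(x - y) + g(y) for y in YS)

xs = [-2.0 + 0.1 * k for k in range(41)]
h = [q_t(x) - f_box_g(x) for x in xs]  # q_t - (f □ g) with f = q_t; Lemma 4.9 says convex
second_diffs = [h[k - 1] - 2 * h[k] + h[k + 1] for k in range(1, len(xs) - 1)]
print(min(second_diffs) > 0)  # True
```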

As a consequence of Lemmas 4.6 and 4.9, if f is E-convex, then \(q_t \square f\) is an E-convex and semi-concave function. The next results give a characterization of such functions as well as some of their regularity properties. These results are quite close to the standard results about convex functions on a Hilbert space, so we do not claim any originality, but nonetheless we include the proofs for the sake of completeness.

Proposition 4.10

Let f be an E-convex \(\mathrm {W}^*\)-function that is not identically \(\infty \) or \(-\infty \). Then the following are equivalent:

  (1) \(f = q_t \square g\) for some E-convex function g.

  (2) \(q_t - f\) is convex.

  (3) \(q_t - f\) is E-convex.

  (4) \(\mathcal {L}f - q_{1/t}\) is convex and lower semi-continuous.

  (5) \(\mathcal {L}f - q_{1/t}\) is E-convex.

Moreover, in this case, \(f < \infty \) everywhere.

Proof

(1) \(\implies \) (2) follows from Lemma 4.9 applied with \(q_t\) in place of f, since \(q_t - q_t = 0\) is convex.

(2) \(\implies \) (3). Because \(q_t - f\) takes finite values everywhere, by Lemma 3.10, it suffices to show that for every \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), there exists some \(Z \in \eth (q_t-f)^{\mathcal {A}}(X) \cap L^2(\mathrm {W}^*(X))_{{{\,\mathrm{sa}\,}}}^m\). Because \(q_t - f\) is convex, it has a subgradient vector Z at X, so that

$$\begin{aligned} q_t^{\mathcal {A}}(X') - f^{\mathcal {A}}(X') - q_t^{\mathcal {A}}(X) + f^{\mathcal {A}}(X) \ge \langle X'-X,Z\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}, \end{aligned}$$

which implies that

$$\begin{aligned}&f^{\mathcal {A}}(X') - f^{\mathcal {A}}(X) \le \langle X-X',Z\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} + \frac{1}{2t} (\Vert X'\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2)\\&\quad = \langle X'-X,(1/t)X - Z\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} + \frac{1}{2t} \Vert X' - X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2. \end{aligned}$$
(4.7)

Because f is E-convex, there exists some \(Y \in \eth f^{\mathcal {A}}(X) \cap L^2(\mathrm {W}^*(X))_{{{\,\mathrm{sa}\,}}}^m\). Of course,

$$\begin{aligned} f^{\mathcal {A}}(X') - f^{\mathcal {A}}(X) \ge \langle X'-X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$
(4.8)

Combining (4.7) and (4.8), we obtain

$$\begin{aligned} \langle X'-X,(1/t)X - Z - Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \ge - \frac{1}{2t} \Vert X' - X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 \end{aligned}$$

for all \(X'\). Now take \(X' = tZ + tY\), so that \(X' - X = -t((1/t)X - Z - Y)\), and obtain

$$\begin{aligned} -t \Vert (1/t)X - Z - Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2&= \langle X'-X,(1/t)X - Z - Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&\ge - \frac{1}{2t} \Vert X' - X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 \\&= -\frac{t}{2} \Vert (1/t)X - Z - Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2, \end{aligned}$$

which implies that \((1/t)X - Z - Y = 0\) because \(t > 0\); hence \(Z = (1/t)X - Y \in L^2(\mathrm {W}^*(X))_{{{\,\mathrm{sa}\,}}}^m\).

(3) \(\implies \) (4). Note that

$$\begin{aligned}&\mathcal {L}f^{\mathcal {A}}(X) - q_{1/t}^{\mathcal {A}}(X)\\&\quad = \sup _{\begin{array}{c} \iota :\mathcal {A}\rightarrow \mathcal {B}\\ Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \end{array}} \left( \langle \iota (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - \frac{t}{2} \Vert \iota (X)\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 - f^{\mathcal {B}}(Y) \right) \\&\quad = \sup _{\begin{array}{c} \iota :\mathcal {A}\rightarrow \mathcal {B}\\ Z \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \end{array}} \left( \langle \iota (X),Z+t \iota (X)\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - \frac{t}{2} \Vert \iota (X)\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 - f^{\mathcal {B}}(Z + t\iota (X)) \right) \\&\quad = \sup _{\begin{array}{c} \iota :\mathcal {A}\rightarrow \mathcal {B}\\ Z \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \end{array}} \left( -\frac{1}{2t} \Vert Z\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 + \frac{1}{2t} \Vert Z + t \iota (X)\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 - f^{\mathcal {B}}(Z + t\iota (X)) \right) . \end{aligned}$$

Because \(q_t - f\) is convex and lower semi-continuous, the right-hand side is the supremum of convex lower semi-continuous functions of X, and therefore is convex and lower semi-continuous.

(4) \(\implies \) (5). Let \(h = \mathcal {L}f\). Since f is not identically \(-\infty \) or \(\infty \), the same is true of h. We assumed in (4) that \(h - q_{1/t}\) is convex and lower semi-continuous. Moreover, if \(E: \mathcal {B}\rightarrow \mathcal {A}\) is a conditional expectation, then \((h - q_{1/t})^{\mathcal {B}}(X) < \infty \) implies \(h^{\mathcal {B}}(X) < \infty \), which implies \(h^{\mathcal {A}}(E[X]) < \infty \) by E-convexity of h, which in turn implies \((h - q_{1/t})^{\mathcal {A}}(E[X]) < \infty \). Thus, it remains to show that \((h - q_{1/t})^{\mathcal {A}}(E[X]) \le (h - q_{1/t})^{\mathcal {B}}(X)\) whenever \((h - q_{1/t})^{\mathcal {B}}(X)\) is finite. As in Lemma 3.10, it suffices to show that for every \(\mathcal {A}\) and \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) with \(h^{\mathcal {A}}(X) < \infty \), the function \((h - q_{1/t})^{\mathcal {A}}\) has some subgradient vector at X lying in \(L^2(\mathrm {W}^*(X),\tau |_{\mathrm {W}^*(X)})_{{{\,\mathrm{sa}\,}}}^m\). By E-convexity of h, there exists some \(Z \in \eth h^{\mathcal {A}}(X) \cap L^2(\mathrm {W}^*(X))_{{{\,\mathrm{sa}\,}}}^m\). We claim that \(Z - tX \in \eth (h-q_{1/t})^{\mathcal {A}}(X)\). To prove this, observe that by convexity of \(h - q_{1/t}\), for \(s \in (0,1)\) and \(X' \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\),

$$\begin{aligned} s (h - q_{1/t})^{\mathcal {A}}(X')&\ge (h - q_{1/t})^{\mathcal {A}}((1-s)X + sX') - (1 - s)(h - q_{1/t})^{\mathcal {A}}(X) \\&\ge h^{\mathcal {A}}(X) + \langle (1 - s)X + sX'-X, Z\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\\&\qquad - q_{1/t}^{\mathcal {A}}((1-s)X + sX') - h^{\mathcal {A}}(X) \\&\qquad + q_{1/t}^{\mathcal {A}}(X) + s (h - q_{1/t})^{\mathcal {A}}(X) \\&= s (h - q_{1/t})^{\mathcal {A}}(X) + s \langle X' - X,Z\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&\qquad + q_{1/t}^{\mathcal {A}}(X) - q_{1/t}^{\mathcal {A}}((1-s)X + sX') \\&= s (h - q_{1/t})^{\mathcal {A}}(X) + s \langle X' - X,Z\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&\qquad + \frac{t}{2} \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - \frac{t}{2} \Vert X + s(X' - X)\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 \\&= s (h - q_{1/t})^{\mathcal {A}}(X) + s \langle X' - X,Z - tX\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&\qquad - \frac{ts^2}{2} \Vert X' - X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2. \end{aligned}$$

Dividing by s and sending \(s \rightarrow 0^+\), we obtain

$$\begin{aligned} (h - q_{1/t})^{\mathcal {A}}(X') \ge (h - q_{1/t})^{\mathcal {A}}(X) + \langle X' - X, Z - tX\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$

Hence, \(Z - tX \in \eth (h - q_{1/t})^{\mathcal {A}}(X)\). Since \(Z - tX \in L^2(\mathrm {W}^*(X))_{{{\,\mathrm{sa}\,}}}^m\), the proof is complete.

(5) \(\implies \) (1). Since \(\mathcal {L}f - q_{1/t}\) is E-convex, we have \(\mathcal {L}f - q_{1/t} = \mathcal {L} g\) for some E-convex function g by Proposition 3.17. Thus, since g and \(q_{1/t}\) are both E-convex, we have

$$\begin{aligned} f = \mathcal {L}^2 f = \mathcal {L}(\mathcal {L}g + q_{1/t}) = \mathcal {L} \mathcal {L}(g \square q_t) = g \square q_t, \end{aligned}$$

where the third equality uses Lemma 4.4, and the last equality holds because \(g \square q_t = q_t \square g\) is E-convex by Lemma 4.6 (note that \(q_t < \infty \)).

Finally, (1) implies that \(f < \infty \) everywhere. Indeed, if \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), and if Y is some point where \(g^{\mathcal {B}}(Y) < \infty \), then let \(\mathcal {C}\) be the free product of \(\mathcal {A}\) and \(\mathcal {B}\) and let \(\iota _1: \mathcal {A}\rightarrow \mathcal {C}\) and \(\iota _2: \mathcal {B}\rightarrow \mathcal {C}\) be the corresponding inclusions. Then \((g \square q_t)^{\mathcal {A}}(X) \le \frac{1}{2t} \Vert \iota _1(X) - \iota _2(Y)\Vert _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m}^2 + g^{\mathcal {C}}(\iota _2(Y)) < \infty \). \(\square \)
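A classical illustration of Proposition 4.10, with the assumed choices t = 0.5 and g(y) = exp(y) on the real line: \(f = q_t \square g\) is then the Moreau envelope of g. The Python sketch below checks condition (2), that \(q_t - f\) has positive second differences on a grid, and conditions (4) and (5), that \(\mathcal {L}f - q_{1/t}\) coincides with the convex function \(\mathcal {L}g(y) = y \log y - y\) for y > 0 (this uses \(\mathcal {L}(q_t \square g) = q_{1/t} + \mathcal {L}g\), the classical case of Lemma 4.4). The ternary-search routine and the search windows are assumptions of the sketch.

```python
import math

T = 0.5

def min_convex(phi, lo, hi, iters=100):
    """Approximate the minimum value of a convex function phi on [lo, hi] by ternary search."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if phi(m1) < phi(m2):
            hi = m2
        else:
            lo = m1
    return phi((lo + hi) / 2)

def g(y):
    return math.exp(y)  # a convex function on the real line

def f(x):
    """Moreau envelope f = q_T □ g, i.e. f(x) = min_y (x - y)^2 / (2T) + exp(y)."""
    return min_convex(lambda y: (x - y) ** 2 / (2 * T) + g(y), -50.0, 20.0)

def legendre_f(y):
    """(L f)(y) = sup_x x*y - f(x), with x restricted to [-20, 20]."""
    return -min_convex(lambda x: f(x) - x * y, -20.0, 20.0)

# Condition (2): q_T - f should be convex; check second differences on a grid.
xs = [-3.0 + 0.1 * k for k in range(61)]
h = [x * x / (2 * T) - f(x) for x in xs]
second_diffs = [h[k - 1] - 2 * h[k] + h[k + 1] for k in range(1, len(h) - 1)]
print(min(second_diffs) > 0)

# Conditions (4)/(5): L f - q_{1/T} should equal L g(y) = y log y - y, a convex function.
for y in (0.5, 1.0, 2.0):
    print(y, legendre_f(y) - T * y * y / 2, y * math.log(y) - y)
```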

Proposition 4.11

Let f be an E-convex \(\mathrm {W}^*\)-function taking values in \({\mathbb {R}}\). Then the following are equivalent:

  (1) \(q_t - f\) is convex.

  (2) If \(\mathcal {A}\in {\mathbb {W}}\) and \(Y \in \eth f^{\mathcal {A}}(X)\) and \(Y' \in \eth f^{\mathcal {A}}(X')\), then \(\Vert Y - Y'\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \le (1/t) \Vert X - X'\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\).

  (3) If \(\mathcal {A}\in {\mathbb {W}}\), then \(\eth f^{\mathcal {A}}(X)\) consists of a single point \(\nabla f^{\mathcal {A}}(X) \in L^2(\mathrm {W}^*(X))_{{{\,\mathrm{sa}\,}}}^m\), and \(\nabla f^{\mathcal {A}}\) defines a (1/t)-Lipschitz function \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m \rightarrow L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\).

  (4) For each \(\mathcal {A}\) and \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) and \(Y \in \eth f^{\mathcal {A}}(X)\), we have

    $$\begin{aligned} \langle X'-X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\le & {} f^{\mathcal {A}}(X') - f^{\mathcal {A}}(X) \nonumber \\\le & {} \langle X'-X, Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} + \frac{1}{2t} \Vert X' - X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 \end{aligned}$$
    (4.9)

    for all \(X' \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\).

Proof

(1) \(\implies \) (2). By the previous Proposition 4.10, \(\mathcal {L}f - q_{1/t}\) is E-convex. Let \(Y \in \eth f^{\mathcal {A}}(X)\) and \(Y' \in \eth f^{\mathcal {A}}(X')\). Then by Lemma 3.20, we have \(X \in \eth \mathcal {L}f^{\mathcal {A}}(Y)\) and \(X' \in \eth \mathcal {L}f^{\mathcal {A}}(Y')\). By the same argument as (4) \(\implies \) (5) in the proof of Proposition 4.10, we have \(Z := X - tY \in \eth (\mathcal {L}f - q_{1/t})^{\mathcal {A}}(Y)\) and \(Z' := X' - tY' \in \eth (\mathcal {L}f - q_{1/t})^{\mathcal {A}}(Y')\). It follows that

$$\begin{aligned} \langle Z', Y - Y'\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \le (\mathcal {L}f - q_{1/t})^{\mathcal {A}}(Y) - (\mathcal {L}f - q_{1/t})^{\mathcal {A}}(Y') \le \langle Z, Y - Y'\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}, \end{aligned}$$

hence

$$\begin{aligned} 0&\le \langle Z' - Z, Y' - Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&= \langle X' - X - t(Y' - Y), Y' - Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&= \langle X' - X,Y' - Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} - t \Vert Y' - Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 \\&\le \Vert X' - X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \Vert Y' - Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} - t \Vert Y' - Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2. \end{aligned}$$

Therefore, \(\Vert Y' - Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \le (1/t) \Vert X - X'\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\) as desired.

(2) \(\implies \) (3). Taking \(X = X'\) in (2) shows that \(\eth f^{\mathcal {A}}(X)\) contains at most one point. By Lemma 3.10, \(\eth f^{\mathcal {A}}(X)\) contains some point in \(L^2(\mathrm {W}^*(X))_{{{\,\mathrm{sa}\,}}}^m\), so it consists of exactly one point \(\nabla f^{\mathcal {A}}(X)\), which lies in \(L^2(\mathrm {W}^*(X))_{{{\,\mathrm{sa}\,}}}^m\). By (2) again, \(X \mapsto \nabla f^{\mathcal {A}}(X)\) is (1/t)-Lipschitz.

(3) \(\implies \) (4). Let \(\mathcal {A}\) and X be given. By (3), there is a unique point \(Y = \nabla f^{\mathcal {A}}(X)\) in \(\eth f^{\mathcal {A}}(X)\). Let \(X' \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\). The lower bound \(\langle X' - X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \le f^{\mathcal {A}}(X') - f^{\mathcal {A}}(X)\) follows immediately from the definition of a subgradient. For the upper bound, for \(u \in [0,1]\) let \(X_u = (1 - u)X + uX'\), so that \(X_0 = X\), \(X_1 = X'\), and \(\Vert X_u - X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = u \Vert X' - X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\), and let \(Y_u = \nabla f^{\mathcal {A}}(X_u)\).

For \(n \in {\mathbb {N}}\), observe that

$$\begin{aligned} f^{\mathcal {A}}(X') - f^{\mathcal {A}}(X)&= \sum _{j=1}^n \left( f^{\mathcal {A}}(X_{j/n}) - f^{\mathcal {A}}(X_{(j-1)/n}) \right) \\&\le \sum _{j=1}^n \langle X_{j/n} - X_{(j-1)/n}, Y_{j/n}\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&\le \sum _{j=1}^n \langle X_{j/n} - X_{(j-1)/n}, Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&\qquad + \sum _{j=1}^n \Vert X_{j/n} - X_{(j-1)/n}\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \Vert Y_{j/n} - Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&\le \langle X' - X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&\qquad + \sum _{j=1}^n \frac{1}{n} \Vert X' - X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \frac{1}{t} \Vert X_{j/n} - X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&\le \langle X' - X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} + \frac{1}{t} \Vert X' - X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 \sum _{j=1}^n \frac{j}{n^2} \\&= \langle X' - X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} + \frac{1}{t} \Vert X' - X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 \frac{n(n+1)}{2n^2}. \end{aligned}$$

Taking \(n \rightarrow \infty \) shows that \(f^{\mathcal {A}}(X') - f^{\mathcal {A}}(X) \le \langle X' - X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} + (1/2t) \Vert X' - X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2\) as desired.

(4) \(\implies \) (1). Let \(\mathcal {A}\in {\mathbb {W}}\). We show that \((q_t - f)^{\mathcal {A}}\) is convex by exhibiting a subgradient vector for every \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\). Let \(Y \in \eth f^{\mathcal {A}}(X)\) and let \(X' \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\). By (4),

$$\begin{aligned} (q_t - f)^{\mathcal {A}}(X') - (q_t - f)^{\mathcal {A}}(X)&\ge \frac{1}{2t} \Vert X'\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - \frac{1}{2t} \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 \\&\qquad - \langle X' - X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} - \frac{1}{2t} \Vert X' - X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 \\&= \langle X' - X, -Y + (1/t)X\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$

Hence, \(-Y + (1/t)X\) is a subgradient vector for \(q_t - f\) at X as desired. \(\square \)
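For a smooth classical example of Proposition 4.11, take the assumed function \(f(x) = t \sum _i \log \cosh (x_i/t)\) on \({\mathbb {R}}^m\), whose gradient is \(\tanh (x/t)\) (coordinatewise) and whose Hessian is at most (1/t) times the identity, so that \(q_t - f\) is convex. The Python sketch below checks items (2) and (4), namely the (1/t)-Lipschitz bound on the gradient and the two-sided estimate (4.9), at random pairs of points, with t = 0.7 and m = 3 chosen arbitrarily.

```python
import math
import random

T = 0.7
M = 3

def f(x):
    """f(x) = T * sum_i log cosh(x_i / T): convex, with Hessian at most (1/T) * identity."""
    return T * sum(math.log(math.cosh(v / T)) for v in x)

def grad_f(x):
    return [math.tanh(v / T) for v in x]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

random.seed(2)
lipschitz_ok = True
bounds_ok = True
for _ in range(2000):
    x = [random.uniform(-4.0, 4.0) for _ in range(M)]
    xp = [random.uniform(-4.0, 4.0) for _ in range(M)]
    d = [a - b for a, b in zip(xp, x)]
    y, yp = grad_f(x), grad_f(xp)
    # Item (2): ||Y - Y'|| <= (1/T) ||X - X'||.
    lipschitz_ok &= norm([a - b for a, b in zip(y, yp)]) <= norm(d) / T + 1e-12
    # Item (4), inequality (4.9): 0 <= f(X') - f(X) - <X' - X, Y> <= ||X' - X||^2 / (2T).
    gap = f(xp) - f(x) - dot(d, y)
    bounds_ok &= -1e-12 <= gap <= dot(d, d) / (2 * T) + 1e-12
print(lipschitz_ok, bounds_ok)  # True True
```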

4.3 Main results on the displacement interpolation

We start out by proving Theorem 1.5 which states that if \((\mathcal {A},X,Y)\) is an \(L^2\) optimal coupling and \(X_t = (1 - t)X + tY\), then \(\mathrm {W}^*(X_t) = \mathrm {W}^*(X,Y)\) for all \(t \in (0,1)\).

Proof of Theorem 1.5

By Proposition 3.24, there exists an E-convex function f such that \(Y \in \partial f^{\mathcal {A}}(X)\). Let \(f_t = (1 - t) q_1 + tf\), where \(q_1^{\mathcal {A}}(X) = (1/2) \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2\). Since \((1 - t)X\) is a subgradient of \((1-t)q_1^{\mathcal {A}}\) at X and tY is a subgradient of \(tf^{\mathcal {A}}\) at X, we have \(X_t \in \eth f_t^{\mathcal {A}}(X)\). By Lemma 3.20, we have \(X \in \eth \mathcal {L} f_t^{\mathcal {A}}(X_t)\). Since \(f_t - q_{1/(1-t)} = f_t - (1 - t)q_1 = tf\) is E-convex, \(q_{1-t} - \mathcal {L}f_t\) is E-convex by Proposition 4.10 (applied to \(\mathcal {L}f_t\) with \(1 - t\) in place of t). Hence, by Proposition 4.11, \(\eth \mathcal {L} f_t^{\mathcal {A}}(X_t)\) consists of a single point, which lies in \(L^2(\mathrm {W}^*(X_t))_{{{\,\mathrm{sa}\,}}}^m\). But we already know that \(X \in \eth \mathcal {L} f_t^{\mathcal {A}}(X_t)\), and therefore \(X \in L^2(\mathrm {W}^*(X_t))_{{{\,\mathrm{sa}\,}}}^m\).

A symmetrical argument shows that \(Y \in L^2(\mathrm {W}^*(X_t))_{{{\,\mathrm{sa}\,}}}^m\). Therefore, \(\mathrm {W}^*(X,Y) \subseteq \mathrm {W}^*(X_t)\). The reverse inclusion \(\mathrm {W}^*(X_t) \subseteq \mathrm {W}^*(X,Y)\) is obvious since \(X_t = (1 - t)X + tY\). \(\square \)
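The classical counterpart of Theorem 1.5 is that, for a convex f and \(Y = \nabla f(X)\), the map \(X \mapsto X_t = (1 - t)X + t\nabla f(X)\) is injective for \(t \in [0,1)\): expanding the square and using the monotonicity \(\langle X - X', \nabla f(X) - \nabla f(X')\rangle \ge 0\) gives \(\Vert X_t - X'_t\Vert \ge (1 - t)\Vert X - X'\Vert \), so X is a (1/(1-t))-Lipschitz function of \(X_t\). This is the commutative shadow of the statement \(X \in L^2(\mathrm {W}^*(X_t))_{{{\,\mathrm{sa}\,}}}^m\). The Python sketch below checks the inequality at random points of \({\mathbb {R}}^2\) for the assumed convex function \(f(x) = (x_1^4 + x_2^4)/4 + (x_1 + x_2)^2/2\).

```python
import random

def grad_f(x):
    """Gradient of the convex function f(x) = (x1^4 + x2^4)/4 + (x1 + x2)^2/2 on R^2."""
    s = x[0] + x[1]
    return [x[0] ** 3 + s, x[1] ** 3 + s]

def interpolate(x, t):
    """Displacement interpolation X_t = (1 - t) X + t Y with Y = grad f(X)."""
    y = grad_f(x)
    return [(1 - t) * a + t * b for a, b in zip(x, y)]

def dist(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

random.seed(3)
ok = True
for _ in range(2000):
    t = random.uniform(0.0, 0.99)
    x = [random.uniform(-2.0, 2.0) for _ in range(2)]
    xp = [random.uniform(-2.0, 2.0) for _ in range(2)]
    # Monotonicity of grad f gives |X_t - X'_t| >= (1 - t)|X - X'|, so X_t determines X.
    ok &= dist(interpolate(x, t), interpolate(xp, t)) >= (1 - t) * dist(x, xp) - 1e-12
print(ok)  # True
```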

It follows from the triangle inequality that \((\mathcal {A},X_s,X_t)\) is an optimal coupling of the laws of \(X_s\) and \(X_t\) (see Proposition A.22). Another way to see this is to construct, from an E-convex function f with \(Y \in \eth f^{\mathcal {A}}(X)\), E-convex functions \(f_{t,s}\) for \(s, t \in [0,1]\) such that \(X_t \in \eth f_{t,s}^{\mathcal {A}}(X_s)\). The next proposition gives an explicit construction of \(f_{t,s}\) from f and describes its properties. The cases relevant to the displacement interpolation are then summarized in Corollary 4.13. All of these results are completely analogous to the classical statements.

Proposition 4.12

Let f be an E-convex function. For \(s, t \in [0,1]\), define \(f_{t,s}\) as follows: For \(s = 0\), set

$$\begin{aligned} f_{t,0} = (1 - t) q_1 + t f; \qquad f_{0,t} = \mathcal {L} f_{t,0}; \end{aligned}$$

if \(s > 0\) and \(s \le t\), set

$$\begin{aligned} f_{t,s}^{\mathcal {A}}(X)= & {} \inf _{Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \left( \frac{t}{2s} \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - \frac{t - s}{s} \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \right. \\&\left. + \frac{(t - s)(1 - s)}{2s} \Vert Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 + (t - s) f^{\mathcal {A}}(Y) \right) ; \end{aligned}$$

if \(s > 0\) and \(s \ge t\), set

$$\begin{aligned} f_{t,s}^{\mathcal {A}}(X)= & {} \sup _{Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \left( \frac{t}{2s} \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - \frac{t - s}{s} \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}\right. \\&\left. + \frac{(t - s)(1 - s)}{2s} \Vert Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 + (t - s) f^{\mathcal {A}}(Y) \right) . \end{aligned}$$

(In particular, \(f_{t,t} = q_1\) for all \(t \in [0,1]\).) Then we have the following:

  1. (1)

    \(f_{t,s}\) is E-convex and \(f_{s,t} = \mathcal {L}f_{t,s}\).

  2. (2)

    If \(s \le t\), then \(f_{t,s} - \frac{1 - t}{1 - s} q_1\) is E-convex for \(s < 1\) and \(\frac{t}{s} q_1 - f_{t,s}\) is E-convex for \(s > 0\).

  3. (3)

    If \(t \le s\), then \(f_{t,s} - \frac{t}{s} q_1\) is E-convex for \(s > 0\) and \(\frac{1 - t}{1 - s} q_1 - f_{t,s}\) is E-convex for \(s < 1\).

  4. (4)

    In particular, if \(s \in (0,1)\) and \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), then \(\eth f_{t,s}^{\mathcal {A}}(X)\) consists of a unique point \(\nabla f_{t,s}^{\mathcal {A}}(X)\) and \(\nabla f_{t,s}^{\mathcal {A}}\) is Lipschitz.

  5. (5)

    Suppose \(0 \le s < t \le 1\). If \(u \in (s,t)\), then

    $$\begin{aligned} f_{u,s} = \frac{t - u}{t - s} q_1 + \frac{u - s}{t - s} f_{t,s} \end{aligned}$$

    and

    $$\begin{aligned} f_{t,u} = \left( \frac{t - s}{u - s} q_1 \right) \square \left( \frac{t - u}{t - s} f_{t,s} \left( \frac{t - s}{t - u} (\cdot ) \right) \right) . \end{aligned}$$
  6. (6)

    Suppose \(0 \le s < t \le 1\) and \(X, Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) with \(Y \in \eth f_{t,s}^{\mathcal {A}}(X)\). For \(u \in [s,t]\), let

    $$\begin{aligned} X_u = \frac{t - u}{t - s} X + \frac{u - s}{t - s} Y. \end{aligned}$$

    Then \(X_u \in \eth f_{u,s}^{\mathcal {A}}(X)\) and \(Y \in \eth f_{t,u}^{\mathcal {A}}(X_u)\).

  7. (7)

    For \(s, t, u \in (0,1)\), we have \(\nabla f_{u,t} \circ \nabla f_{t,s} = \nabla f_{u,s}\).
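As a sanity check on the definition of \(f_{t,s}\), one can specialize to the simplest commutative case. This is our own illustrative assumption, not part of the text: m = 1, the algebra is \(\mathbb {C}\), and \(f(y) = a y^2/2\) with \(a > 0\) (corresponding to the E-convex function \(a q_1\)). Then \(Y = aX\) and \(X_t = c_t X\) with \(c_t = 1 - t + ta\), and optimizing the quadratic in y in the defining inf/sup formulas gives \(f_{t,s}(x) = c_t x^2/(2c_s)\), so that \(\nabla f_{t,s}(X_s) = X_t\), as in Corollary 4.13. The sketch below (helper names are hypothetical) evaluates the formulas in closed form and checks this, together with the composition rule (7), by finite differences.

```python
import math


def c(t, a):
    """Scalar with X_t = c(t) * X when Y = a * X (assumed scalar case)."""
    return 1 - t + t * a


def f_ts(t, s, x, a):
    """f_{t,s}(x) from the defining inf/sup formula, for f(y) = a*y**2/2.

    For s > 0 the objective is quad*y**2 + lin*y + const in y; its infimum
    (s < t, quad > 0) or supremum (s > t, quad < 0) is const - lin**2/(4*quad).
    """
    if s == 0:
        return (1 - t) * x**2 / 2 + t * a * x**2 / 2  # f_{t,0} = (1-t) q_1 + t f
    if s == t:
        return x**2 / 2  # f_{t,t} = q_1
    quad = (t - s) * (1 - s) / (2 * s) + (t - s) * a / 2  # coefficient of y**2
    lin = -(t - s) / s * x                                 # coefficient of y
    const = t / (2 * s) * x**2
    return const - lin**2 / (4 * quad)


def grad(t, s, x, a, h=1e-6):
    """Central finite difference approximating the derivative of f_{t,s} at x."""
    return (f_ts(t, s, x + h, a) - f_ts(t, s, x - h, a)) / (2 * h)


a, x = 3.0, 1.7
pts = (0.0, 0.2, 0.5, 0.8, 1.0)
for t in pts:
    for s in pts:
        # Closed form f_{t,s}(x) = c_t x^2 / (2 c_s), and grad f_{t,s}(X_s) = X_t.
        assert math.isclose(f_ts(t, s, x, a), c(t, a) * x**2 / (2 * c(s, a)), rel_tol=1e-9)
        assert math.isclose(grad(t, s, c(s, a) * x, a), c(t, a) * x, rel_tol=1e-6)

# Part (7): grad f_{u,t} o grad f_{t,s} = grad f_{u,s} for s, t, u in (0, 1).
inner = (0.2, 0.5, 0.8)
for s in inner:
    for t in inner:
        for u in inner:
            composed = grad(u, t, grad(t, s, x, a), a)
            assert math.isclose(composed, grad(u, s, x, a), rel_tol=1e-6)
```

In this quadratic case every \(f_{t,s}\) is again quadratic, so the check only exercises the algebra of the definition, not the E-convexity machinery.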

The next corollary describes the most relevant cases of the proposition for optimal transport; the claims are special cases of (4) and (6) of the proposition.

Corollary 4.13

Let \((\mathcal {A},X,Y)\) be an optimal coupling of \(\mu \), \(\nu \in \Sigma _m\). Let f be an E-convex function such that \(Y \in \eth f(X)\). Let \(f_{t,s}\) be as in Proposition 4.12. Let \(X_t = (1-t)X + tY\) for \(t \in [0,1]\). Then \(X_t \in \eth f_{t,s}(X_s)\) for all \(s,t \in [0,1]\). In particular, if \(s \in (0,1)\), then \(f_{t,s}\) has a Lipschitz gradient and we have \(X_t = \nabla f_{t,s}(X_s)\).

In order to prove Proposition 4.12, we need the following scaling relation for the Legendre transform.

Lemma 4.14

Let f be a tracial \(\mathrm {W}^*\)-function and let \(c > 0\). Then \(\mathcal {L}(cf)^{\mathcal {A}}(X) = c \mathcal {L}f^{\mathcal {A}}(c^{-1}X)\).

Proof

Observe that

$$\begin{aligned} \mathcal {L}(cf)^{\mathcal {A}}(X)&= \sup _{\begin{array}{c} \iota : \mathcal {A}\rightarrow \mathcal {B}\\ Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \end{array}} \left( \langle \iota (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - cf^{\mathcal {B}}(Y) \right) \\&= \sup _{\begin{array}{c} \iota : \mathcal {A}\rightarrow \mathcal {B}\\ Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \end{array}} \left( c\langle \iota (c^{-1}X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - cf^{\mathcal {B}}(Y) \right) \\&= c \mathcal {L}f^{\mathcal {A}}(c^{-1}X). \end{aligned}$$

\(\square \)
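The identity in Lemma 4.14, and the fact \(\mathcal {L}(cq_1) = c^{-1} q_1\) used below, can be checked numerically in a one-dimensional, commutative caricature where the Legendre transform is \(\sup _y (xy - f(y))\). This is our own illustration: the grid, the test function \(y^4/4\), and the helper names are assumptions, and a grid maximum only approximates the supremum. Because both sides are maximized over the same grid, the scaling identity holds on the grid up to rounding.

```python
import math

# Hypothetical grid on [-10, 10] with step 0.01.
GRID = [k / 100 for k in range(-1000, 1001)]


def legendre(f, x, grid=GRID):
    """Grid approximation of the 1-D Legendre transform (Lf)(x) = sup_y (x*y - f(y))."""
    return max(x * y - f(y) for y in grid)


def quartic(y):
    return y**4 / 4


def q1(y):
    return y**2 / 2


c = 2.5
for x in (-1.3, 0.0, 0.7, 2.0):
    # Lemma 4.14: L(c f)(x) = c * (L f)(x / c).
    lhs = legendre(lambda y: c * quartic(y), x)
    rhs = c * legendre(quartic, x / c)
    assert math.isclose(lhs, rhs, rel_tol=1e-9, abs_tol=1e-12)

# L(c q_1) = c^{-1} q_1 with c = 2: the maximizer y = x / c = 0.5 lies on the grid.
assert math.isclose(legendre(lambda y: 2.0 * q1(y), 1.0), q1(1.0) / 2.0)
```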

The bulk of the proof of the proposition is the following lemma, which explains how the functions \(f_{t,s}\) are obtained from f by adding quadratic functions and taking inf-convolutions with them, using the same idea as in the proof of Theorem 1.5.

Lemma 4.15

Consider the setup of Proposition 4.12. If \(0< s < t \le 1\), then

$$\begin{aligned} f_{t,s}&= \frac{1-t}{1-s} q_1 + \frac{t-s}{1-s} \left[ \left( \frac{1}{s} q_1 \right) \square \left( (1 - s) f\left( \frac{1}{1 - s}(\cdot ) \right) \right) \right] \end{aligned}$$
(4.11)

$$\begin{aligned} \mathcal {L}f_{t,s}&= \left( \frac{1 - s}{1 - t} q_1 \right) \square \left[ \frac{t-s}{1-s} [sq_1 + (1 - s) \mathcal {L} f]\left( \frac{1-s}{t-s} (\cdot ) \right) \right] \end{aligned}$$
(4.12)

and

$$\begin{aligned} f_{t,s}&= \left( \frac{t}{s} q_1 \right) \square \left[ \frac{t - s}{t} [(1-t)q_1 + tf]\left( \frac{t}{t - s}(\cdot ) \right) \right] \end{aligned}$$
(4.13)

$$\begin{aligned} \mathcal {L}f_{t,s}&= \frac{s}{t} q_1 + \frac{t - s}{t} \left[ \left( \frac{1}{1-t} q_1\right) \square t \mathcal {L}f \left( \frac{1}{t} (\cdot ) \right) \right] . \end{aligned}$$
(4.14)

Proof

Fix \(\mathcal {A}\in {\mathbb {W}}\) and \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), and evaluate the right-hand side of (4.11) at X to obtain

$$\begin{aligned}&\frac{1-t}{1-s} q_1(X) + \frac{t-s}{1-s} \left[ \left( \frac{1}{s} q_1 \right) \square \left( (1 - s) f\left( \frac{1}{1 - s}(\cdot ) \right) \right) \right] (X) \\&\quad = \frac{1-t}{2(1-s)} \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 + \frac{t-s}{1-s}\\&\qquad \inf _{Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \left[ \frac{1}{2s} \Vert X - Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 + (1 - s) f^{\mathcal {A}}\left( \frac{1}{1-s} Y \right) \right] , \end{aligned}$$

where we have used Lemma 4.6, which shows that it suffices to take the infimum over \(Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) rather than over \(Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\) for larger tracial \(\mathrm {W}^*\)-algebras \(\mathcal {B}\). Next, we substitute \((1 - s)Y\) for Y to obtain

$$\begin{aligned}&\frac{1-t}{2(1-s)} \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 + \frac{t-s}{1-s} \inf _{Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \left[ \frac{1}{2s} \Vert X - (1-s)Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 + (1 - s) f^{\mathcal {A}}(Y) \right] \\&\quad = \inf _{Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \biggl [ \frac{1-t}{2(1-s)} \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 + \frac{t - s}{2s(1-s)} \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - \frac{t - s}{s} \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \\&\qquad + \frac{(t-s)(1-s)}{2s} \Vert Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 + (t - s) f^{\mathcal {A}}(Y) \biggr ]. \end{aligned}$$

Combining the two coefficients in front of \(\Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2\), we arrive at the formula for \(f_{t,s}^{\mathcal {A}}(X)\).
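Spelled out, the combination of coefficients is the following elementary identity, valid for \(0 < s < 1\):

$$\begin{aligned} \frac{1-t}{2(1-s)} + \frac{t-s}{2s(1-s)} = \frac{s(1-t) + (t-s)}{2s(1-s)} = \frac{t(1-s)}{2s(1-s)} = \frac{t}{2s}. \end{aligned}$$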

Equation (4.12) is obtained from (4.11) by applying the Legendre transform, using the fact that \(\mathcal {L} (cq_1) = c^{-1} q_1\) for \(c > 0\), the relation between the Legendre transform and inf-convolution in Lemma 4.4, and the scaling relation of Lemma 4.14.

The proof of (4.13) is similar to the proof of (4.11), and then (4.14) is obtained by taking the Legendre transform. \(\square \)

Proof of Proposition 4.12

(1) It is immediate that \(f_{t,0}\) and \(f_{0,t}\) are E-convex and are Legendre transforms of each other. Also, in the case \(s = t\), we have \(f_{t,s} = q_1\), so there is nothing to prove. For \(0< s < t \le 1\), it follows from Lemma 4.15 that \(f_{t,s}\) is E-convex, because it is obtained from f by scaling, addition of quadratics, and inf-convolution with quadratics.

Next, we show that for \(0< s < t\), we have \(\mathcal {L}f_{t,s} = f_{s,t}\). Starting from (4.13), and using Lemma 4.4 together with \(\mathcal {L}(\frac{t}{s} q_1) = \frac{s}{t} q_1\), we obtain

$$\begin{aligned} \mathcal {L}f_{t,s} = \frac{s}{t} q_1 + \mathcal {L} \left[ \frac{t - s}{t} [(1-t)q_1 + tf]\left( \frac{t}{t - s}(\cdot ) \right) \right] . \end{aligned}$$

Now we evaluate the second term at some \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), where \(\mathcal {A}\in {\mathbb {W}}\). Using Remark 3.18, we may compute the Legendre transform of an E-convex function by taking the Hilbert-space Legendre transform for each \(\mathcal {A}\) (without considering a larger \(\mathrm {W}^*\)-algebra \(\mathcal {B}\)). This yields

$$\begin{aligned} \sup _{Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \left[ \langle X,Y\rangle - \frac{t - s}{t} [(1-t)q_1 + tf]^{\mathcal {A}}\left( \frac{t}{t - s}Y \right) \right] . \end{aligned}$$

We substitute \(\frac{t-s}{t} Y\) for Y to obtain

$$\begin{aligned} \sup _{Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \left[ \frac{t-s}{t} \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} - \frac{(t - s)(1-t)}{2t} \Vert Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - (t - s) f^{\mathcal {A}}(Y) \right] . \end{aligned}$$

Adding back the term \(\frac{s}{t} q_1^{\mathcal {A}}(X)\), we obtain

$$\begin{aligned}&\sup _{Y \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \left[ \frac{s}{2t} \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - \frac{s-t}{t} \langle X,Y\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \right. \\&\qquad \left. + \frac{(s - t)(1-t)}{2t} \Vert Y\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 + (s - t) f^{\mathcal {A}}(Y) \right] , \end{aligned}$$

which is precisely \(f_{s,t}^{\mathcal {A}}(X)\).

Therefore, for \(s> t > 0\), we have \(f_{t,s} = \mathcal {L} f_{s,t}\), so \(f_{t,s}\) is E-convex; moreover, since \(f_{s,t}\) is E-convex, \(\mathcal {L} f_{t,s} = \mathcal {L}\mathcal {L} f_{s,t} = f_{s,t}\). This settles the last remaining case.

(2) Let \(0 \le s \le t \le 1\). If \(s = 0\), then \(f_{t,s} = (1 - t) q_1 + tf\), so \(f_{t,s} - \frac{1-t}{1-s} q_1 = tf\) is E-convex. If \(s \in (0,1)\), then by (4.11), \(f_{t,s}\) is \(\frac{1-t}{1-s} q_1\) plus an E-convex function, hence \(f_{t,s} - \frac{1-t}{1-s} q_1\) is E-convex. If \(s \in (0,1]\), then by (4.13), \(f_{t,s}\) is the inf-convolution of \(\frac{t}{s} q_1\) with an E-convex function and therefore \(\frac{t}{s} q_1 - f_{t,s}\) is E-convex by Proposition 4.10.

(3) Let \(s \ge t\). Then \(f_{t,s} = \mathcal {L}f_{s,t}\). Thus, we can argue as in (2), using (4.12) and (4.14) in place of (4.11) and (4.13).

(4) This follows from (2) and (3) together with Proposition 4.11.

(5) Consider the first relation \(f_{u,s} = \frac{t - u}{t - s} q_1 + \frac{u - s}{t - s} f_{t,s}\). If \(s = 0\), this follows by direct computation from the definitions of \(f_{u,0}\) and \(f_{t,0}\). In the case \(s > 0\), we apply (4.11) to get

$$\begin{aligned} \frac{t - u}{t - s} q_1 + \frac{u - s}{t - s} f_{t,s}&= \frac{t - u}{t - s} q_1 + \frac{u - s}{t - s} \frac{1 - t}{1 - s} q_1\\&\qquad + \frac{u - s}{1 - s} \left[ \left( \frac{1}{s} q_1 \right) \square \left( (1 - s) f\left( \frac{1}{1 - s}(\cdot ) \right) \right) \right] \\&= \frac{1 - u}{1 - s} q_1 + \frac{u - s}{1 - s} \left[ \left( \frac{1}{s} q_1 \right) \square \left( (1 - s) f\left( \frac{1}{1 - s}(\cdot ) \right) \right) \right] \\&= f_{u,s}. \end{aligned}$$

Analogously, using (4.14), we obtain for \(s \in (0,1)\) that

$$\begin{aligned} \mathcal {L} f_{t,u} = \frac{u-s}{t-s} q_1 + \frac{t-u}{t-s} \mathcal {L} f_{t,s}; \end{aligned}$$

in fact, this relation also holds when \(s = 0\), as one sees by evaluating \(\mathcal {L} f_{t,u}\) on the left-hand side with (4.14) and writing \(\mathcal {L} f_{t,0}\) on the right-hand side as \( \mathcal {L}[(1-t)q_1 + tf] = (\frac{1}{1-t} q_1) \square t \mathcal {L}f(\frac{1}{t}(\cdot ))\). Taking Legendre transforms of the previous equation implies that

$$\begin{aligned} f_{t,u} = \left( \frac{t - s}{u - s} q_1 \right) \square \left( \frac{t - u}{t - s} f_{t,s} \left( \frac{t - s}{t - u} (\cdot ) \right) \right) . \end{aligned}$$
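The coefficient bookkeeping in the proof of (5) can be double-checked with exact rational arithmetic. The sketch below is our own check, not part of the source: it only compares the scalar coefficients of \(q_1\) and of the bracketed terms in (4.11) and (4.14) (the bracket in (4.11) depends on s alone, the one in (4.14) on t alone), for all rational triples \(s < u < t\) on a grid; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import combinations


def part5_coefficients_agree(s, u, t):
    """Compare coefficients on both sides of the two relations in the proof of (5)."""
    # f_{u,s} = (t-u)/(t-s) q_1 + (u-s)/(t-s) f_{t,s}, expanded with (4.11):
    q1_ok = (t - u) / (t - s) + (u - s) / (t - s) * (1 - t) / (1 - s) == (1 - u) / (1 - s)
    bracket_ok = (u - s) / (t - s) * (t - s) / (1 - s) == (u - s) / (1 - s)
    # L f_{t,u} = (u-s)/(t-s) q_1 + (t-u)/(t-s) L f_{t,s}, expanded with (4.14):
    dual_q1_ok = (u - s) / (t - s) + (t - u) / (t - s) * s / t == u / t
    dual_bracket_ok = (t - u) / (t - s) * (t - s) / t == (t - u) / t
    return q1_ok and bracket_ok and dual_q1_ok and dual_bracket_ok


grid = [Fraction(k, 12) for k in range(13)]  # 0, 1/12, ..., 1
# combinations() of a sorted list yields increasing triples s < u < t.
assert all(part5_coefficients_agree(s, u, t) for s, u, t in combinations(grid, 3))
```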

(6) Since \(X \in \eth q_1^{\mathcal {A}}(X)\) and \(Y \in \eth f_{t,s}^{\mathcal {A}}(X)\), we have, using (5),

$$\begin{aligned} X_u = \frac{t-u}{t-s} X + \frac{u-s}{t-s} Y \in \eth \left[ \frac{t-u}{t-s} q_1 + \frac{u-s}{t-s} f_{t,s} \right] ^{\mathcal {A}}(X) = \eth f_{u,s}^{\mathcal {A}}(X). \end{aligned}$$

Since \(Y \in \eth f_{t,s}^{\mathcal {A}}(X)\), we have \(X \in \eth (\mathcal {L} f_{t,s})^{\mathcal {A}}(Y)\). Hence, using the relation for \(\mathcal {L} f_{t,u}\) from the proof of (5),

$$\begin{aligned} X_u = \frac{u-s}{t-s} Y + \frac{t-u}{t-s} X \in \eth \left[ \frac{u-s}{t-s} q_1 + \frac{t-u}{t-s} \mathcal {L} f_{t,s}\right] ^{\mathcal {A}}(Y) = \eth (\mathcal {L}f_{t,u})^{\mathcal {A}}(Y). \end{aligned}$$

Thus \(X_u \in \eth (\mathcal {L} f_{t,u})^{\mathcal {A}}(Y)\), and therefore \(Y \in \eth f_{t,u}^{\mathcal {A}}(X_u)\).

(7) In light of (4), for \(s, t \in (0,1)\), the functions \(f_{s,t}\) and \(f_{t,s}\) have Lipschitz gradients. They are Legendre transforms of each other, which implies that \(X \in \eth f_{s,t}(Y)\) if and only if \(Y \in \eth f_{t,s}(X)\). Hence, \(\nabla f_{s,t} = (\nabla f_{t,s})^{-1}\).

Suppose that \(s< u < t\). Let \(Y = \nabla f_{t,s}(X)\), and let \(X_u = \frac{t-u}{t-s}X + \frac{u-s}{t-s} Y\). Then by (6), \(X_u = \nabla f_{u,s}(X)\) and \(Y = \nabla f_{t,u}(X_u)\), hence \(\nabla f_{t,s}(X) = Y = \nabla f_{t,u}(X_u) = \nabla f_{t,u} \circ \nabla f_{u,s}(X)\).

So \(\nabla f_{t,s} = \nabla f_{t,u} \circ \nabla f_{u,s}\). Composing on the right with \(\nabla f_{s,u} = (\nabla f_{u,s})^{-1}\), we obtain \(\nabla f_{t,s} \circ \nabla f_{s,u} = \nabla f_{t,u}\). Taking inverses, we obtain \(\nabla f_{u,s} \circ \nabla f_{s,t} = \nabla f_{u,t}\). In fact, using composition and inverses in this way, we can achieve all permutations of u, s, and t. The only remaining case is when some of s, t, and u are equal, but this follows from the relations \(\nabla f_{t,t} = {{\,\mathrm{id}\,}}\) and \(\nabla f_{s,t} = (\nabla f_{t,s})^{-1}\). \(\square \)

5 Optimal Couplings, Quantum Information Theory, and Operator Algebras

In this section, we give several indications of why non-commutative optimal couplings are significantly more complicated than classical ones, by making connections to other results in operator algebras and quantum information theory. Specifically, using results from [52], we show that there exist \(n \times n\) matrix tuples for which an optimal coupling requires a tracial \(\mathrm {W}^*\)-algebra of arbitrarily large dimension. Then, based on [33] and [42], we conclude that there exist matrix tuples for which the optimal coupling requires a non-Connes-embeddable tracial \(\mathrm {W}^*\)-algebra (that is, it cannot even be approximated by couplings in finite-dimensional algebras). Next, we show that the topology induced by the Wasserstein distance is strictly stronger than the weak-\(*\) topology on \(\Sigma _{m,R}\), and we characterize the points at which the two topologies agree. Finally, based on [58, Theorem 1], we show that \(\Sigma _{m,R}\) with the Wasserstein distance is not separable.

5.1 Completely positive and factorizable maps

We recall some standard definitions in operator algebras; see e.g. [60]. If \(\mathcal {A}\) is a tracial \(\mathrm {W}^*\)-algebra, we denote by \(M_n(\mathcal {A})\) the algebra \(M_n(L^\infty (\mathcal {A})) \cong M_n(\mathbb {C}) \otimes L^\infty (\mathcal {A})\) equipped with the trace \({{\,\mathrm{tr}\,}}_n \otimes \tau _{\mathcal {A}}\) and the weak-\(*\) topology given by the entrywise weak-\(*\) topology on \(L^\infty (\mathcal {A})\); it is a standard fact that \(M_n(\mathcal {A})\) is indeed a tracial \(\mathrm {W}^*\)-algebra. If \(\Phi : \mathcal {A}\rightarrow \mathcal {B}\) is a linear map between tracial \(\mathrm {W}^*\)-algebras, then we define \(\Phi ^{(n)}: M_n(\mathcal {A}) \rightarrow M_n(\mathcal {B})\) as the map obtained from entrywise application of \(\Phi \). If \(\mathcal {A}\) is a tracial \(\mathrm {W}^*\)-algebra and \(a \in L^\infty (\mathcal {A})\), then we say that \(a \ge 0\) if \(a = x^*x\) for some \(x \in L^\infty (\mathcal {A})\); this is equivalent to \(a\) defining a positive operator on \(L^2(\mathcal {A})\) by left multiplication.Footnote 3

Definition 5.1

We say that \(\Phi \) is completely positive if, for every \(n \in {\mathbb {N}}\) and every \(a \in M_n(\mathcal {A})\) with \(a \ge 0\), we have \(\Phi ^{(n)}(a) \ge 0\). For tracial \(\mathrm {W}^*\)-algebras \(\mathcal {A}\) and \(\mathcal {B}\), we denote by \({\text {CP}}(\mathcal {A},\mathcal {B})\) the space of completely positive maps \(\mathcal {A}\rightarrow \mathcal {B}\). We denote by \({\text {UCPT}}(\mathcal {A},\mathcal {B})\) the space of unital completely positive trace-preserving maps. These maps are known in quantum information theory as quantum channels from \(\mathcal {A}\) to \(\mathcal {B}\).

Definition 5.2

(Anantharaman-Delaroche [1]) Let \(\mathcal {A}\) and \(\mathcal {B}\) be tracial \(\mathrm {W}^*\)-algebras. A linear map \(\Phi : \mathcal {A}\rightarrow \mathcal {B}\) is said to be factorizable if there exist tracial \(\mathrm {W}^*\)-inclusions \(\iota _1: \mathcal {A}\rightarrow \mathcal {C}\) and \(\iota _2: \mathcal {B}\rightarrow \mathcal {C}\) such that \(\Phi = \iota _2^* \circ \iota _1\), where \(\iota _2^*: \mathcal {C}\rightarrow \mathcal {B}\) is the conditional expectation adjoint to \(\iota _2\). We also say that \(\Phi \) factorizes through \(\mathcal {C}\) if there exist \(\iota _1\) and \(\iota _2\) as above.

We denote the space of factorizable maps by \({\text {FM}}(\mathcal {A},\mathcal {B})\). We denote by \({\text {FM}}_{{\text {fin}}}(\mathcal {A},\mathcal {B})\) the set of maps that factorize through a finite-dimensional algebra \(\mathcal {C}\).

Proposition 5.3

Let \(\mathcal {A}\), \(\mathcal {B}\), and \(\mathcal {C}\) be tracial \(\mathrm {W}^*\)-algebras.

  1. (1)

    We have \({\text {FM}}(\mathcal {A},\mathcal {B}) \subseteq {\text {UCPT}}(\mathcal {A},\mathcal {B})\).

  2. (2)

    \({\text {UCPT}}(\mathcal {A},\mathcal {B})\), \({\text {FM}}(\mathcal {A},\mathcal {B})\), and \({\text {FM}}_{{\text {fin}}}(\mathcal {A},\mathcal {B})\) are convex sets.

  3. (3)

    \({\text {UCPT}}(\mathcal {A},\mathcal {B})\) and \({\text {FM}}(\mathcal {A},\mathcal {B})\) are closed in the pointwise weak-\(*\) topology.

  4. (4)

    If \(\Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {B})\) and \(\Psi \in {\text {UCPT}}(\mathcal {B},\mathcal {C})\), then \(\Psi \circ \Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {C})\). The same holds with \({\text {UCPT}}\) replaced by \({\text {FM}}\).

This proposition is well-known in operator algebras. For the sake of exposition, let us recall why \({\text {FM}}(\mathcal {A},\mathcal {B}) \subseteq {\text {UCPT}}(\mathcal {A},\mathcal {B})\). Let \(\Phi \in {\text {FM}}(\mathcal {A},\mathcal {B})\), and take a factorization \(\Phi = \iota _2^* \iota _1\) where \(\iota _1: \mathcal {A}\rightarrow \mathcal {C}\) and \(\iota _2: \mathcal {B}\rightarrow \mathcal {C}\) are tracial \(\mathrm {W}^*\)-inclusions. Since \(\iota _1\) and \(\iota _2\) are unital \(*\)-homomorphisms, they are completely positive. Now observe that \(\langle (\iota _2^*)^{(n)}(c),b\rangle _{L^2(M_n(\mathcal {B}))} = \langle c,\iota _2^{(n)}(b)\rangle _{L^2(M_n(\mathcal {C}))} \ge 0\) for \(c \in M_n(\mathcal {C})_+\) and \(b \in M_n(\mathcal {B})_+\); it follows that \((\iota _2^*)^{(n)}(c) \ge 0\) in \(M_n(\mathcal {B})\) for every \(c \in M_n(\mathcal {C})_+\), that is, \(\iota _2^*\) is completely positive. Since \(\iota _2\) is unital, \(\iota _2^*\) is trace-preserving, and since \(\iota _2\) is trace-preserving, \(\iota _2^*\) is unital. Finally, one verifies directly that \({\text {UCPT}}\) is closed under composition; hence, \(\iota _2^* \iota _1 \in {\text {UCPT}}(\mathcal {A},\mathcal {B})\).

To show that factorizable maps are closed under composition in (4), one uses amalgamated free products. For convexity of \({\text {FM}}(\mathcal {A},\mathcal {B})\), see e.g. [12, Lemma 2.3.6].

The next lemma summarizes some well-known facts about completely positive maps.

Lemma 5.4

Let \(\mathcal {A}\) and \(\mathcal {B}\) be tracial \(\mathrm {W}^*\)-algebras, and let \(\Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {B})\).

  1. (1)

    \(\Phi (X^*) = \Phi (X)^*\) for all \(X \in L^\infty (\mathcal {A})\).

  2. (2)

    \(\Phi \) extends to a contractive map \(L^2(\mathcal {A}) \rightarrow L^2(\mathcal {B})\).

  3. (3)

    There exists a unique \(\Phi ^*\! \in \! {\text {UCPT}}(\mathcal {B},\mathcal {A})\) such that \(\langle X,\Phi ^*(Y)\rangle _{L^2(\mathcal {A})} \!=\! \langle \Phi (X),Y\rangle _{L^2(\mathcal {B})}\) for \(X \in L^2(\mathcal {A})\) and \(Y \in L^2(\mathcal {B})\).
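For intuition, here is a small matrix example of a map in \({\text {UCPT}}(M_2(\mathbb {C}),M_2(\mathbb {C}))\), chosen by us for illustration: a mixed-unitary map \(\Phi (A) = \sum _i p_i U_i A U_i^*\), which is completely positive because it is a convex combination of \(*\)-automorphisms. The sketch checks unitality, trace preservation, property (1) of Lemma 5.4, and the adjoint formula \(\Phi ^*(B) = \sum _i p_i U_i^* B U_i\) from property (3), with respect to the normalized trace \({{\,\mathrm{tr}\,}}_2\) and the inner product \(\langle a,b\rangle = {{\,\mathrm{tr}\,}}_2(a^*b)\) (an assumed convention). The weights, unitaries, and helper names are arbitrary.

```python
def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def dagger(a):
    """Conjugate transpose a^*."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


def tr_n(a):
    """Normalized trace (1/n) * sum_i a[i][i]."""
    return sum(a[i][i] for i in range(len(a))) / len(a)


def inner(a, b):
    """Inner product <a, b> = tr_n(a^* b)."""
    return tr_n(matmul(dagger(a), b))


def mixed_unitary(weights, unitaries, adjoint=False):
    """Return A -> sum_i p_i U_i A U_i^*, or its adjoint A -> sum_i p_i U_i^* A U_i."""
    def phi(a):
        n = len(a)
        out = [[0j] * n for _ in range(n)]
        for p, u in zip(weights, unitaries):
            v = dagger(u) if adjoint else u
            term = matmul(matmul(v, a), dagger(v))
            for i in range(n):
                for j in range(n):
                    out[i][j] += p * term[i][j]
        return out
    return phi


def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(len(a)) for j in range(len(a)))


I2 = [[1 + 0j, 0j], [0j, 1 + 0j]]
SWAP = [[0j, 1 + 0j], [1 + 0j, 0j]]
PHASE = [[1 + 0j, 0j], [0j, 1j]]
weights = [0.5, 0.3, 0.2]
phi = mixed_unitary(weights, [I2, SWAP, PHASE])
phi_star = mixed_unitary(weights, [I2, SWAP, PHASE], adjoint=True)

A = [[1 + 2j, 3 - 1j], [0.5j, -2 + 0j]]
B = [[2 + 0j, 1j], [-1j, 0.5 + 0j]]
assert close(phi(I2), I2)                                      # unital
assert abs(tr_n(phi(A)) - tr_n(A)) < 1e-12                     # trace-preserving
assert close(phi(dagger(A)), dagger(phi(A)))                   # Lemma 5.4 (1)
assert abs(inner(A, phi_star(B)) - inner(phi(A), B)) < 1e-12   # Lemma 5.4 (3)
```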

The connection between factorizable maps and non-commutative optimal couplings is as follows.

Observation 5.5

Let \(\mathcal {A}\) and \(\mathcal {B}\) be tracial \(\mathrm {W}^*\)-algebras and let \(X \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) and \(Y \in L^\infty (\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\). Then

$$\begin{aligned} C(\lambda _X,\lambda _Y) = \sup _{\Phi \in {\text {FM}}(\mathcal {A},\mathcal {B})} \langle \Phi (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$

Proof

In fact, we will show that the two sets \(\{\langle X',Y'\rangle _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m}: (\mathcal {C},X',Y') \text { a coupling}\}\) and \(\{\langle \Phi (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}: \Phi \in {\text {FM}}(\mathcal {A},\mathcal {B})\}\) are equal. Suppose that \((\mathcal {C},X',Y')\) is a coupling of \(\lambda _X\) and \(\lambda _Y\). Since \(\lambda _{X'} = \lambda _X\), there is a tracial \(\mathrm {W}^*\)-embedding \(\iota _1: \mathrm {W}^*(X) \rightarrow \mathcal {C}\) sending X to \(X'\). Similarly, there is a tracial \(\mathrm {W}^*\)-embedding \(\iota _2: \mathrm {W}^*(Y) \rightarrow \mathcal {C}\) sending Y to \(Y'\). Let \(\phi _1: \mathrm {W}^*(X) \rightarrow \mathcal {A}\) and \(\phi _2: \mathrm {W}^*(Y) \rightarrow \mathcal {B}\) be the canonical inclusion maps, and let \(\Phi = \phi _2 \iota _2^* \iota _1 \phi _1^*: \mathcal {A}\rightarrow \mathcal {B}\), which is factorizable by Proposition 5.3 (4) because it is a composition of factorizable maps. Moreover,

$$\begin{aligned} \langle \Phi (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} = \langle \iota _1 \phi _1^*(X), \iota _2 \phi _2^*(Y)\rangle _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m} = \langle \iota _1(X),\iota _2(Y)\rangle _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m} = \langle X',Y'\rangle _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$

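Conversely, given \(\Phi \in {\text {FM}}(\mathcal {A},\mathcal {B})\), we may factorize it as \(\iota _2^* \iota _1\) for tracial \(\mathrm {W}^*\)-embeddings \(\iota _1: \mathcal {A}\rightarrow \mathcal {C}\) and \(\iota _2: \mathcal {B}\rightarrow \mathcal {C}\), and let \(X' = \iota _1(X)\) and \(Y' = \iota _2(Y)\) to obtain a coupling \((\mathcal {C},X',Y')\) of \(\lambda _X\) and \(\lambda _Y\), which satisfies \(\langle \Phi (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} = \langle \iota _1(X),\iota _2(Y)\rangle _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m} = \langle X',Y'\rangle _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m}\). \(\square \)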

5.2 Matrix tuples with optimal couplings of large dimension

This connection allows us to address a natural question: Suppose that \(\mu \) and \(\nu \) are non-commutative laws that can be realized by self-adjoint tuples X and Y in a finite-dimensional algebra; then is there a non-commutative optimal coupling \((\mathcal {A},X',Y')\) of \(\mu \) and \(\nu \) such that \(\mathcal {A}\) is finite-dimensional? And do we have some control over the dimension? The classical analog of this question certainly has a positive answer. Indeed, if \(\mu \) and \(\nu \) are finitely supported measures on \({\mathbb {R}}^m\), with supports S and T respectively, then a classical optimal coupling is given by a measure \(\pi \) on the product space \(S \times T\). Hence, there exist random variables X and \(Y \in L^2(S \times T,\pi ;{\mathbb {R}}^m)\) such that \((\mathcal {A},X,Y)\) is an optimal coupling of \(\mu \) and \(\nu \), where \(\mathcal {A}\) is the finite-dimensional algebra \(L^\infty (S \times T,\pi )\) equipped with the trace coming from \(\pi \).
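The classical claim can be illustrated in the simplest case, which we choose for concreteness: m = 1 and \(\mu \), \(\nu \) uniform on n points. Couplings of two uniform measures on n points correspond to doubly stochastic matrices, so by the Birkhoff–von Neumann theorem the linear objective \(\mathbb {E}[XY]\) is maximized at a permutation, and by the rearrangement inequality the monotone (sorted) pairing is optimal. The brute-force sketch below, with hypothetical helper names and arbitrary data, compares the two; the resulting optimal coupling is supported on n points of \(S \times T\).

```python
import math
from itertools import permutations


def best_permutation_value(xs, ys):
    """max over permutations sigma of (1/n) * sum_i xs[i] * ys[sigma(i)] (brute force)."""
    n = len(xs)
    return max(sum(x * ys[j] for x, j in zip(xs, perm)) / n for perm in permutations(range(n)))


def sorted_pairing_value(xs, ys):
    """Value of the monotone coupling pairing the k-th smallest x with the k-th smallest y."""
    return sum(x * y for x, y in zip(sorted(xs), sorted(ys))) / len(xs)


xs = [0.3, -1.2, 2.5, 0.0, 1.1]
ys = [1.1, -0.4, 0.9, -2.0, 0.2]
assert math.isclose(best_permutation_value(xs, ys), sorted_pairing_value(xs, ys))
```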

Our first negative result in the non-commutative setting shows that, even in situations when an optimal coupling can occur in a finite-dimensional algebra, there can be no control over its dimension. This is a consequence of the following result of Musat and Rørdam [52].

Theorem 5.6

[Musat-Rørdam [52, Theorem 4.1]] If \(n \ge 11\), then \({\text {FM}}_{{{\,\mathrm{fin}\,}}}(M_n(\mathbb {C}),M_n(\mathbb {C}))\) is not closed, hence there exist factorizable maps \(M_n(\mathbb {C}) \rightarrow M_n(\mathbb {C})\) that do not factor through any finite-dimensional algebra.

In order to translate this result into a statement about non-commutative optimal couplings, we use the following lemma, which is an application of the hyperplane separation theorem, vector space duality, and adjointness of tensor and hom functors.

Lemma 5.7

Let \(L_{{\mathbb {R}}}(M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}},M_m(\mathbb {C})_{{{\,\mathrm{sa}\,}}})\) denote the space of real-linear transformations \(M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}} \rightarrow M_m(\mathbb {C})_{{{\,\mathrm{sa}\,}}}\). Let \(K \subseteq L_{{\mathbb {R}}}(M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}},M_m(\mathbb {C})_{{{\,\mathrm{sa}\,}}})\) be a closed convex set, and let \(\Phi \not \in K\). Then there exist \(k \le \min (n^2,m^2)\), \(X \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^{k}\), and \(Y \in M_m(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^{k}\) such that

$$\begin{aligned} \langle \Phi (X),Y\rangle _{L^2(M_m(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^k} > \sup _{\Psi \in K} \langle \Psi (X),Y\rangle _{L^2(M_m(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^k}. \end{aligned}$$

Proof

Recall that there is a linear isomorphism

$$\begin{aligned} T: L_{{\mathbb {R}}}(M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}},M_m(\mathbb {C})_{{{\,\mathrm{sa}\,}}}) \rightarrow L_{{\mathbb {R}}}(M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}} \otimes M_m(\mathbb {C})_{{{\,\mathrm{sa}\,}}},{\mathbb {R}}) = (M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}} \otimes M_m(\mathbb {C})_{{{\,\mathrm{sa}\,}}})^* \end{aligned}$$

that sends \(\Psi \in L_{{\mathbb {R}}}(M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}},M_m(\mathbb {C})_{{{\,\mathrm{sa}\,}}})\) to the map

$$\begin{aligned} \psi : M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}} \otimes M_m(\mathbb {C})_{{{\,\mathrm{sa}\,}}} \rightarrow {\mathbb {R}}: A \otimes B \mapsto \langle \Psi (A),B\rangle _{L^2(M_m(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}}. \end{aligned}$$

Of course, \(M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}} \otimes M_m(\mathbb {C})_{{{\,\mathrm{sa}\,}}}\) is finite-dimensional, so its double dual is canonically isomorphic to the space itself. Since T is a linear isomorphism of finite-dimensional spaces, \(T(K)\) is a closed convex set that does not contain \(T(\Phi )\). Applying the hyperplane separation theorem in the dual space \((M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}} \otimes M_m(\mathbb {C})_{{{\,\mathrm{sa}\,}}})^*\) and identifying the separating functional with an element of the double dual, we conclude that there exists some \(v \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}} \otimes M_m(\mathbb {C})_{{{\,\mathrm{sa}\,}}}\) such that

$$\begin{aligned} T(\Phi )(v) > \sup _{\Psi \in K} T(\Psi )(v). \end{aligned}$$

Let us write v as a sum of simple tensors \(v = \sum _{j=1}^k X_j \otimes Y_j\) with k as small as possible; this minimal k is called the tensor rank of v. We claim that the tensor rank is at most \(\min (n^2,m^2)\). The reason is that for finite-dimensional real inner-product spaces V and W, we can identify \(V \otimes W\) with \(L_{{\mathbb {R}}}(V,W)\) and then apply the singular value decomposition to the matrix in \(L_{{\mathbb {R}}}(V,W)\) corresponding to a given tensor v. Since a matrix in \(L_{{\mathbb {R}}}(V,W)\) has rank at most \(\min (\dim V, \dim W)\), it follows that the tensor rank of v is at most \(\min (\dim V, \dim W)\). In particular, taking \(V = M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}\) and \(W = M_m(\mathbb {C})_{{{\,\mathrm{sa}\,}}}\), which have real dimensions \(n^2\) and \(m^2\), we see that our vector \(v \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}} \otimes M_m(\mathbb {C})_{{{\,\mathrm{sa}\,}}}\) has tensor rank at most \(\min (n^2,m^2)\).
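The tensor-rank bound can be made concrete: after choosing bases, a tensor in \(V \otimes W\) is a \(p \times q\) coefficient matrix with \(p = \dim V\) and \(q = \dim W\), and any factorization of a rank-r matrix into r outer products writes the tensor as a sum of r simple tensors. Instead of the singular value decomposition, the sketch below uses a different but equally valid technique, an exact rank factorization \(M = CR\) obtained by row reduction over the rationals, to avoid floating-point issues; the helper names and the example matrix are hypothetical.

```python
from fractions import Fraction


def rank_decomposition(m):
    """Write m (a list of rows) as a sum of rank(m) outer products col (x) row.

    Uses the rank factorization m = C R, where R is the nonzero part of the
    reduced row echelon form of m and C consists of the pivot columns of m.
    """
    rows = [[Fraction(v) for v in row] for row in m]
    p, q = len(rows), len(rows[0])
    rref = [row[:] for row in rows]
    pivots, lead = [], 0
    for col_idx in range(q):
        if lead == p:
            break
        pivot_row = next((i for i in range(lead, p) if rref[i][col_idx] != 0), None)
        if pivot_row is None:
            continue
        rref[lead], rref[pivot_row] = rref[pivot_row], rref[lead]
        pivot_value = rref[lead][col_idx]
        rref[lead] = [v / pivot_value for v in rref[lead]]
        for i in range(p):
            if i != lead and rref[i][col_idx] != 0:
                factor = rref[i][col_idx]
                rref[i] = [x - factor * y for x, y in zip(rref[i], rref[lead])]
        pivots.append(col_idx)
        lead += 1
    # Column j of m equals sum_k rref[k][j] * (k-th pivot column of m).
    return [([rows[i][c] for i in range(p)], rref[k]) for k, c in enumerate(pivots)]


def reconstruct(terms, p, q):
    """Sum of the outer products col[i] * row[j] over all terms."""
    out = [[Fraction(0)] * q for _ in range(p)]
    for col, row in terms:
        for i in range(p):
            for j in range(q):
                out[i][j] += col[i] * row[j]
    return out


m = [[1, 2, 3], [2, 4, 6], [1, 0, 1], [0, 2, 2]]  # rank 2
terms = rank_decomposition(m)
assert len(terms) == 2
assert len(terms) <= min(len(m), len(m[0]))
assert reconstruct(terms, 4, 3) == m
```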

Let \(X = (X_1,\dots ,X_k)\) and \(Y = (Y_1,\dots ,Y_k)\). Then for \(\Psi \in L_{{\mathbb {R}}}(M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}},M_m(\mathbb {C})_{{{\,\mathrm{sa}\,}}})\), we have

$$\begin{aligned} T(\Psi )(v) = \sum _{j=1}^k \langle \Psi (X_j),Y_j\rangle _{L^2(M_m(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}} = \langle \Psi (X),Y\rangle _{L^2(M_m(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^k}. \end{aligned}$$

Thus, by our choice of v, the tuples X and Y satisfy the desired properties. \(\square \)

Corollary 5.8

If \(n \ge 11\) and \(d \in {\mathbb {N}}\), then there exist \(X, Y \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^{n^2}\) such that for every optimal coupling \((\mathcal {A},X',Y')\) of \(\lambda _X\) and \(\lambda _Y\), \(\mathcal {A}\) must have dimension at least d. In particular, if d is sufficiently large, then

$$\begin{aligned} C(\lambda _X,\lambda _Y) > \sup _{U \in \mathcal {U}(M_n(\mathbb {C}))} \langle UXU^*,Y\rangle _{L^2(M_n(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^{n^2}}. \end{aligned}$$

Proof

Let \({\text {FM}}_d(M_n(\mathbb {C}),M_n(\mathbb {C}))\) denote the set of \({\text {UCPT}}\) maps \(M_n(\mathbb {C}) \rightarrow M_n(\mathbb {C})\) that factorize through a tracial \(\mathrm {W}^*\)-algebra \(\mathcal {A}= (A,\tau )\) of dimension at most d. As a consequence of the Artin–Wedderburn theorem, every such \(*\)-algebra A is a direct sum of at most d matrix algebras, each of size at most \(d^{1/2}\); see e.g. [25]. Moreover, the trace \(\tau _{\mathcal {A}}\) is a convex combination of the traces on the components. From these facts, it is not hard to see that \({\text {FM}}_d(M_n(\mathbb {C}),M_n(\mathbb {C}))\) is compact.
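The finiteness behind the compactness claim can be seen concretely: a \(*\)-algebra \(M_{k_1}(\mathbb {C}) \oplus \dots \oplus M_{k_r}(\mathbb {C})\) has dimension \(k_1^2 + \dots + k_r^2\), so only finitely many block shapes have dimension at most d. The sketch below, with a hypothetical function name, enumerates them; each shape has at most d blocks, each of size at most \(d^{1/2}\), as stated in the text.

```python
from math import isqrt


def block_shapes(d):
    """All non-increasing tuples (k_1, ..., k_r) with k_1**2 + ... + k_r**2 <= d,
    i.e. the block sizes of *-algebras M_{k_1}(C) + ... + M_{k_r}(C) of dimension <= d."""
    shapes = []

    def extend(prefix, remaining, cap):
        if prefix:
            shapes.append(tuple(prefix))
        # Next block size: at most the previous one and at most sqrt(remaining).
        for k in range(min(cap, isqrt(remaining)), 0, -1):
            extend(prefix + [k], remaining - k * k, k)

    extend([], d, isqrt(d))
    return shapes


print(block_shapes(5))
# prints [(2,), (2, 1), (1,), (1, 1), (1, 1, 1), (1, 1, 1, 1), (1, 1, 1, 1, 1)]
```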

By Theorem 5.6, there exists \(\Phi \in \overline{{\text {FM}}_{{{\,\mathrm{fin}\,}}}(M_n(\mathbb {C}),M_n(\mathbb {C}))}\) that does not factor through a finite-dimensional algebra, and hence \(\Phi \in \overline{{\text {FM}}_{{{\,\mathrm{fin}\,}}}(M_n(\mathbb {C}),M_n(\mathbb {C}))} \setminus {\text {FM}}_d(M_n(\mathbb {C}),M_n(\mathbb {C}))\).

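We also remark that a completely positive map \(\Phi : M_n(\mathbb {C}) \rightarrow M_n(\mathbb {C})\) satisfies \(\Phi (A^*) = \Phi (A)^*\); therefore it restricts to a real-linear transformation \(M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}} \rightarrow M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}\), and \(\Phi \) is uniquely determined by this restriction. Since Lemma 5.7 requires a convex set, let K denote the convex hull of \({\text {FM}}_d(M_n(\mathbb {C}),M_n(\mathbb {C}))\). By Carathéodory’s theorem, the convex hull of a compact subset of a finite-dimensional space is compact, so K is closed; moreover, \(K \subseteq {\text {FM}}_{{{\,\mathrm{fin}\,}}}(M_n(\mathbb {C}),M_n(\mathbb {C}))\) because the latter set is convex by Proposition 5.3 (2), and hence \(\Phi \not \in K\). Thus, we can apply Lemma 5.7 to K; since the supremum of a linear functional over K equals its supremum over \({\text {FM}}_d(M_n(\mathbb {C}),M_n(\mathbb {C}))\), we conclude that there exist \(k \le n^2\) and \(X, Y \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^{k}\) such that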

$$\begin{aligned} \langle \Phi (X),Y\rangle _{L^2(M_n(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^k} > \sup _{\Psi \in {\text {FM}}_{d}(M_n(\mathbb {C}),M_n(\mathbb {C}))} \langle \Psi (X),Y\rangle _{L^2(M_n(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^k}. \end{aligned}$$

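Without loss of generality, we may take \(k = n^2\), because we can always pad both tuples with zero entries without changing the value of the inner product of \(\Psi (X)\) and Y. Since \(\Phi \) lies in the closure of \({\text {FM}}_{{{\,\mathrm{fin}\,}}}(M_n(\mathbb {C}),M_n(\mathbb {C}))\), it is factorizable by Proposition 5.3 (3), so Observation 5.5 gives \(C(\lambda _X,\lambda _Y) \ge \langle \Phi (X),Y\rangle _{L^2(M_n(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^{n^2}}\). Hence, by the proof of Observation 5.5, no coupling arising from a map \(\Psi \in {\text {FM}}_{d}(M_n(\mathbb {C}),M_n(\mathbb {C}))\) can be optimal. \(\square \)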

5.3 Optimal couplings and the Connes embedding problem

The situation is even wilder than this. Based on the work of [42] on Tsirelson’s problem and the Connes embedding problem, as well as work of [33], we can conclude that for some n, there exist X, \(Y \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^{n^2}\) such that a non-commutative optimal coupling of \(\lambda _X\) and \(\lambda _Y\) cannot even be approximated by couplings in finite-dimensional tracial \(*\)-algebras. We begin with some background on the Connes embedding problem, starting with ultraproducts of tracial \(\mathrm {W}^*\)-algebras, a tool for turning approximate embeddings into genuine embeddings; see [12, Appendix A], [13, Sect. 2], and [2, Sect. 5.4].

Let \(\beta {\mathbb {N}}\) denote the Stone–Čech compactification of the natural numbers; it is a compact space containing \({\mathbb {N}}\) as an open subset,Footnote 4 and it satisfies the universal property that every function from \({\mathbb {N}}\) into a compact Hausdorff space K extends uniquely to a continuous function \(\beta {\mathbb {N}}\rightarrow K\). In particular, if \((x_n)_{n \in {\mathbb {N}}}\) is a bounded sequence of real or complex numbers, then \(\lim _{n \rightarrow \mathcal {U}} x_n\) exists for every \(\mathcal {U} \in \beta {\mathbb {N}}\). The Stone–Čech compactification \(\beta {\mathbb {N}}\) is characterized up to a canonical homeomorphism by its universal property. One construction of \(\beta {\mathbb {N}}\) is by way of ultrafilters, which is why we have used the letter \(\mathcal {U}\) for elements of \(\beta {\mathbb {N}}\). In this framework, the elements of \(\beta {\mathbb {N}}\setminus {\mathbb {N}}\) are known as non-principal ultrafilters, and the limit \(\lim _{n \rightarrow \mathcal {U}} x_n\) is also called an ultralimit.

Ultraproducts of tracial von Neumann algebras are defined as follows. Let \((\mathcal {A}_n)_{n \in {\mathbb {N}}}\), with \(\mathcal {A}_n = (A_n,\tau _n)\), be a sequence of tracial \(\mathrm {W}^*\)-algebras. Let \(\prod _{n \in {\mathbb {N}}} A_n\) be the set of sequences \((a_n)_{n \in {\mathbb {N}}}\) such that \(\sup _n \Vert a_n\Vert _{L^\infty (\mathcal {A}_n)} < \infty \), which is a \(*\)-algebra under entrywise operations. Let

$$\begin{aligned} I_{\mathcal {U}} = \left\{ (a_n)_{n \in {\mathbb {N}}} \in \prod _{n \in {\mathbb {N}}} A_n: \lim _{n \rightarrow {\mathcal {U}}} \Vert a_n\Vert _{L^2(\mathcal {A}_n)} = 0 \right\} . \end{aligned}$$

Using the non-commutative Hölder inequality for \(L^2\) and \(L^\infty \), one can show that \(I_{\mathcal {U}}\) is a \(*\)-ideal in \(\prod _{n \in {\mathbb {N}}} A_n\), and therefore \(\prod _{n \in {\mathbb {N}}} A_n / I_{\mathcal {U}}\) is a \(*\)-algebra. We denote by \([a_n]_{n \in {\mathbb {N}}}\) the equivalence class in \(\prod _{n \in {\mathbb {N}}} A_n / I_{\mathcal {U}}\) of a sequence \((a_n)_{n \in {\mathbb {N}}} \in \prod _{n \in {\mathbb {N}}} A_n\). Furthermore, we define a trace on \(\prod _{n \in {\mathbb {N}}} A_n / I_\mathcal {U}\) as follows: if \((a_n)_{n \in {\mathbb {N}}} \in \prod _{n \in {\mathbb {N}}} A_n\), then \((\tau _n(a_n))_{n \in {\mathbb {N}}}\) is a bounded sequence in \(\mathbb {C}\), and therefore \(\lim _{n \rightarrow \mathcal {U}} \tau _n(a_n)\) exists. Since \(|\tau _n(a_n)| \le \Vert a_n\Vert _{L^2(\mathcal {A}_n)}\) by the Cauchy–Schwarz inequality, we have \(\lim _{n \rightarrow \mathcal {U}} \tau _n(a_n) = 0\) whenever \((a_n)_{n \in {\mathbb {N}}} \in I_\mathcal {U}\). Therefore, there is a well-defined map

$$\begin{aligned} \tau _{\mathcal {U}}: \prod _{n \in {\mathbb {N}}} A_n / I_{\mathcal {U}} \rightarrow \mathbb {C}\end{aligned}$$

given by \(\tau _{\mathcal {U}}([a_n]_{n \in {\mathbb {N}}}) = \lim _{n \rightarrow \mathcal {U}} \tau _n(a_n)\). It turns out the pair \((\prod _{n \in {\mathbb {N}}} A_n / I_{\mathcal {U}},\tau _{\mathcal {U}})\) is already a tracial \(\mathrm {W}^*\)-algebra; see [2, Proposition 5.4.1]. The proof is based on the fact that a tracial \(\mathrm {C}^*\)-algebra is a \(\mathrm {W}^*\)-algebra if and only if the operator-norm unit ball is complete in the \(L^2\) norm [2, Proposition 2.6.4]. See also [12, Appendix A].
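As a finite-dimensional sanity check of the estimates used above, the following sketch takes a single algebra \(M_k(\mathbb {C})\) with the normalized trace (an assumption made for illustration; the ultraproduct itself cannot be computed) and checks numerically that \(|\tau (b)| \le \Vert b\Vert _{L^2}\), the Hölder-type bounds \(\Vert ab\Vert _{L^2} \le \Vert a\Vert _{L^\infty }\Vert b\Vert _{L^2}\) and \(\Vert ba\Vert _{L^2} \le \Vert b\Vert _{L^2}\Vert a\Vert _{L^\infty }\), and \(\Vert b^*\Vert _{L^2} = \Vert b\Vert _{L^2}\). Applied entrywise to sequences, these are the facts that make \(I_{\mathcal {U}}\) a \(*\)-ideal and \(\tau _{\mathcal {U}}\) well defined. The helper names `tau`, `norm2`, and `norm_inf` are our own.

```python
import numpy as np

def tau(a):
    """Normalized trace on M_k(C): tau(a) = Tr(a) / k, so tau(1) = 1."""
    return np.trace(a) / a.shape[0]

def norm2(a):
    """L^2 norm ||a||_2 = tau(a^* a)^(1/2)."""
    return np.sqrt(tau(a.conj().T @ a).real)

def norm_inf(a):
    """Operator norm ||a||_inf, i.e. the largest singular value of a."""
    return np.linalg.norm(a, 2)

rng = np.random.default_rng(0)
k = 5
a = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
b = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))

# Cauchy-Schwarz against the unit (||1||_2 = 1): this makes tau_U well defined.
print(abs(tau(b)) <= norm2(b))
# Hölder-type bounds: multiplying an L^2-small element by a bounded one
# on either side keeps it L^2-small, so I_U is a two-sided ideal.
print(norm2(a @ b) <= norm_inf(a) * norm2(b))
print(norm2(b @ a) <= norm2(b) * norm_inf(a))
# The trace property gives ||b^*||_2 = ||b||_2, so I_U is closed under *.
print(np.isclose(norm2(b.conj().T), norm2(b)))
```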

We call the tracial \(\mathrm {W}^*\)-algebra \((\prod _{n \in {\mathbb {N}}} A_n / I_{\mathcal {U}},\tau _{\mathcal {U}})\) the ultraproduct of \((\mathcal {A}_n)_{n \in {\mathbb {N}}}\) with respect to \(\mathcal {U}\) and we denote it by

$$\begin{aligned} \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n := \left( \prod _{n \in {\mathbb {N}}} A_n / I_{\mathcal {U}},\tau _{\mathcal {U}} \right) . \end{aligned}$$

The inspiration for this notation is that the ultraproduct is defined using a combination of Cartesian products and ultralimits (of the \(L^2\)-norm and the trace); in contrast to Cartesian products, the ultraproduct only sees the asymptotic behavior of a sequence as \(n \rightarrow \mathcal {U}\).

Definition 5.9

A tracial \(\mathrm {W}^*\)-algebra \(\mathcal {A}\) is Connes-embeddable if there exist finite-dimensional tracial \(*\)-algebras \(\mathcal {A}_n\) for \(n \in {\mathbb {N}}\), an ultrafilter \(\mathcal {U} \in \beta {\mathbb {N}}\setminus {\mathbb {N}}\), and a tracial \(\mathrm {W}^*\)-embedding \(\phi : \mathcal {A}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\). The Connes embedding problem is the question of whether every tracial \(\mathrm {W}^*\)-algebra with separable predual is Connes-embeddable.

Embeddings into ultraproducts are closely related to convergence of non-commutative laws in \(\Sigma _{m,R}\).

Lemma 5.10

Let \(\mathcal {U} \in \beta {\mathbb {N}}\setminus {\mathbb {N}}\), let \((\mathcal {A}_n)_{n \in {\mathbb {N}}}\) be a sequence of tracial \(\mathrm {W}^*\)-algebras, and let \(\mathcal {A}\) be another tracial \(\mathrm {W}^*\)-algebra. Let \(X \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) with \(\Vert X\Vert _{L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \le R\) and suppose that X generates \(\mathcal {A}\) as a \(\mathrm {W}^*\)-algebra. Let \(X_n \in L^\infty (\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m\) with \(\Vert X_n\Vert _{L^\infty (\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m} \le R\). Then the following are equivalent:

  1. (1)

    \(\lim _{n \rightarrow \mathcal {U}} \lambda _{X_n} = \lambda _X\) with respect to the weak-\(*\) topology on \(\Sigma _{m,R}\).

  2. (2)

    There exists a tracial \(\mathrm {W}^*\)-embedding \(\phi : \mathcal {A}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) such that \(\phi (X) = [X_n]_{n \in {\mathbb {N}}}\).

Proof

(1) \(\implies \) (2). Let \(Y = [X_n]_{n\in {\mathbb {N}}} \in L^\infty (\prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m\). Let \(\tau _\mathcal {U}\) be the trace on the ultraproduct. Then for every \(p \in \mathbb {C}\langle x_1,\dots ,x_m\rangle \), we have

$$\begin{aligned} \lambda _Y(p) = \tau _\mathcal {U}(p(Y)) = \lim _{n \rightarrow \mathcal {U}} \tau _n(p(X_n)) = \lim _{n \rightarrow \mathcal {U}} \lambda _{X_n}(p) = \lambda _X(p). \end{aligned}$$

Because \(\lambda _Y = \lambda _X\), Lemma 2.33 implies that there is a \(\mathrm {W}^*\)-embedding \(\mathcal {A}= \mathrm {W}^*(X) \rightarrow \mathrm {W}^*(Y) \hookrightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\).

(2) \(\implies \) (1). Let \(\phi : \mathcal {A}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) as above be a tracial \(\mathrm {W}^*\)-embedding with \(\phi (X) = [X_n]_{n \in {\mathbb {N}}}\). Using the fact that \(\phi \) preserves addition and multiplication as well as the definition of the trace \(\tau _\mathcal {U}\) on the ultraproduct,

$$\begin{aligned} \lambda _X(p) = \tau (p(X)) = \tau _\mathcal {U}(\phi (p(X))) = \tau _\mathcal {U}(p(\phi (X))) = \lim _{n \rightarrow \mathcal {U}} \tau _n(p(X_n)) = \lim _{n \rightarrow \mathcal {U}} \lambda _{X_n}(p). \end{aligned}$$

Therefore, \(\lim _{n \rightarrow \mathcal {U}} \lambda _{X_n} = \lambda _X\) in the weak-\(*\) topology, as desired. \(\square \)
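Both directions of Lemma 5.10 come down to comparing the numbers \(\lambda _X(p) = \tau (p(X))\). To make these concrete, the sketch below computes them for a matrix tuple. It is illustrative only: it assumes \(\mathcal {A} = M_k(\mathbb {C})\) with the normalized trace and encodes a monomial \(x_{i_1}\cdots x_{i_\ell }\) as a tuple of 0-based indices; `law` and `moments` are hypothetical helpers. By linearity, weak-\(*\) convergence of laws amounts to convergence of every monomial moment.

```python
import itertools
import numpy as np

def tau(a):
    """Normalized trace on M_k(C)."""
    return np.trace(a) / a.shape[0]

def law(X, word):
    """lambda_X evaluated on the monomial x_{w[0]} x_{w[1]} ... (0-based indices)."""
    prod = np.eye(X[0].shape[0], dtype=complex)
    for i in word:
        prod = prod @ X[i]
    return tau(prod)

def moments(X, max_len):
    """All mixed moments of the tuple X for words of length <= max_len."""
    m = len(X)
    return {w: law(X, w)
            for length in range(max_len + 1)
            for w in itertools.product(range(m), repeat=length)}

# A self-adjoint pair in M_2(C): the Pauli matrices sigma_z and sigma_x.
X = (np.array([[1, 0], [0, -1]], dtype=complex),
     np.array([[0, 1], [1, 0]], dtype=complex))
mu = moments(X, 4)
print(mu[(0, 1, 0, 1)])  # tau((x1 x2)^2), where (x1 x2)^2 = -1 for this pair
```

Because \(\tau \) is a trace, every moment is invariant under cyclic rotation of the word, which is one quick consistency check on such tables.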

Definition 5.11

Let \(\Sigma _{m,R}^{{{\,\mathrm{fin}\,}}}\) be the set of non-commutative laws \(\mu \) in \(\Sigma _{m,R}\) such that \(\mu = \lambda _X\) for some \(X \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) where \(\mathcal {A}\) is a finite-dimensional tracial \(*\)-algebra.

The following statement is almost a corollary of Lemma 5.10.

Lemma 5.12

Let \(\mathcal {A}\) be a tracial \(\mathrm {W}^*\)-algebra generated by \(X \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) with \(\Vert X\Vert _{L^\infty (\mathcal {A})^m} \le R\). Then \(\mathcal {A}\) is Connes-embeddable if and only if \(\lambda _X\) is in the weak-\(*\) closure of \(\Sigma _{m,R}^{{{\,\mathrm{fin}\,}}}\) in \(\Sigma _{m,R}\).

Proof

If \(\lambda _X\) is in the closure of \(\Sigma _{m,R}^{{{\,\mathrm{fin}\,}}}\), then Lemma 5.10 implies that \(\mathcal {A}\) is Connes-embeddable. Conversely, suppose that \(\mathcal {A}\) is Connes-embeddable and \(\iota : \mathcal {A}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) is an embedding into some ultraproduct of finite-dimensional tracial \(*\)-algebras. Let \(X_n = (X_n^{(1)},\dots ,X_n^{(m)}) \in L^\infty (\mathcal {A}_n)^m\) be such that \([X_n]_{n \in {\mathbb {N}}} = \iota (X)\), and let us also write \(X = (X^{(1)},\dots ,X^{(m)})\). By replacing \(X_{n}^{(j)}\) with \((X_n^{(j)} + (X_n^{(j)})^*)/2\), we can assume without loss of generality that \(X_n^{(j)}\) is self-adjoint. Since \((X_n)_{n \in {\mathbb {N}}}\) represents an element of the ultraproduct, \(M := \sup _{n \in {\mathbb {N}}} \Vert X_n\Vert _{L^\infty (\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m} < \infty \).

Although M may be larger than R, we can fix this issue through a standard argument with functional calculus. Let \(f: [-M,M] \rightarrow [-R,R]\) be given by \(f(t) = {{\,\mathrm{sgn}\,}}(t) \min (|t|,R)\). By the Weierstrass approximation theorem, there exists a sequence of polynomials \((f_k)_{k \in {\mathbb {N}}}\) converging uniformly to f on \([-M,M]\). Since \(\iota \) is a \(*\)-homomorphism, \(f_k(\iota (X^{(j)})) = [f_k(X_n^{(j)})]_{n \in {\mathbb {N}}}\). By the spectral mapping theorem, for each j, k, and n,

$$\begin{aligned} \Vert f_k(X_n^{(j)}) - f(X_n^{(j)})\Vert _{L^\infty (\mathcal {A}_n)} \le \sup _{t \in [-M,M]} |f_k(t) - f(t)|, \end{aligned}$$

and the same estimate holds for \(f_k(\iota (X^{(j)})) - f(\iota (X^{(j)}))\). Taking \(k \rightarrow \infty \), we obtain \(f(\iota (X^{(j)})) = [f(X_n^{(j)})]_{n \in {\mathbb {N}}}\) for each j. Since \(\Vert X^{(j)}\Vert _{L^\infty (\mathcal {A})} \le R\), we have \(f(\iota (X^{(j)})) = \iota (X^{(j)})\). Let \(Y_n = (f(X_n^{(1)}),\dots ,f(X_n^{(m)}))\). Then \(\Vert Y_n\Vert _{L^\infty (\mathcal {A}_n)^m} \le R\) and \(\iota (X) = [Y_n]_{n \in {\mathbb {N}}}\), hence \(\lambda _X\) is in the closure of \(\Sigma _{m,R}^{{{\,\mathrm{fin}\,}}}\) by Lemma 5.10. \(\square \)
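The cutoff step in this proof can be mimicked for a single self-adjoint matrix, where continuous functional calculus is computed from an eigendecomposition. In the sketch below (the helper names `apply_function` and `truncation`, the matrix size, and the seed are our own choices), applying \(f(t) = {{\,\mathrm{sgn}\,}}(t) \min (|t|,R)\) to a Hermitian matrix of operator norm 3 yields a Hermitian matrix of norm \(R = 1\), while a matrix whose norm is already at most R is left unchanged, mirroring the identity \(f(\iota (X^{(j)})) = \iota (X^{(j)})\).

```python
import numpy as np

def apply_function(a, f):
    """Functional calculus for Hermitian a: f(a) = V diag(f(eigenvalues)) V^*."""
    eigs, V = np.linalg.eigh(a)
    # V * f(eigs) scales column j of V by f(eigs[j]), i.e. V @ diag(f(eigs)).
    return (V * f(eigs)) @ V.conj().T

def truncation(R):
    """The cutoff t -> sgn(t) * min(|t|, R)."""
    return lambda t: np.sign(t) * np.minimum(np.abs(t), R)

rng = np.random.default_rng(1)
g = rng.standard_normal((6, 6)) + 1j * rng.standard_normal((6, 6))
a = (g + g.conj().T) / 2
a *= 3.0 / np.linalg.norm(a, 2)   # rescale so that ||a||_op = 3 > R
R = 1.0

fa = apply_function(a, truncation(R))
print(np.linalg.norm(fa, 2))      # the cutoff brings the operator norm down to R
b = a / 3.0                       # ||b||_op = 1 <= R
print(np.allclose(apply_function(b, truncation(R)), b))  # f acts trivially on b
```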

Decades of work have uncovered many problems in operator algebras and quantum information theory that are equivalent to the Connes embedding problem; for a survey, see e.g. [13, 59]. In particular, building on the established connections with quantum information theory, Haagerup and Musat showed the following result.

Theorem 5.13

(Haagerup-Musat [33, Theorems 3.6 and 3.7]) A factorizable map \(\Phi : M_n(\mathbb {C}) \rightarrow M_n(\mathbb {C})\) admits a factorization through a Connes-embeddable algebra if and only if it is in the closure of \({\text {FM}}_{{{\,\mathrm{fin}\,}}}(M_n(\mathbb {C}),M_n(\mathbb {C}))\). Moreover, the Connes embedding problem has a positive answer if and only if

$$\begin{aligned} {\text {FM}}(M_n(\mathbb {C}),M_n(\mathbb {C})) = \overline{{\text {FM}}_{{{\,\mathrm{fin}\,}}}(M_n(\mathbb {C}),M_n(\mathbb {C}))} \text { for all } n \in {\mathbb {N}}. \end{aligned}$$

A negative answer to the Connes embedding problem was announced in [42]. This implies the following corollary.

Corollary 5.14

There exist \(n \in {\mathbb {N}}\) and \(X, Y \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^{n^2}\) such that

$$\begin{aligned} C(\lambda _X,\lambda _Y) &= \sup _{\Phi \in {\text {FM}}(M_n(\mathbb {C}),M_n(\mathbb {C}))} \langle \Phi (X),Y\rangle _{L^2(M_n(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^{n^2}} \\ &> \sup _{\Phi \in {\text {FM}}_{{{\,\mathrm{fin}\,}}}(M_n(\mathbb {C}),M_n(\mathbb {C}))} \langle \Phi (X),Y\rangle _{L^2(M_n(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^{n^2}}. \end{aligned}$$

Moreover, a non-commutative optimal coupling of \(\lambda _X\) and \(\lambda _Y\) does not exist in any Connes-embeddable tracial \(\mathrm {W}^*\)-algebra.

Proof

Let \(K = \overline{{\text {FM}}_{{\text {fin}}}(M_n(\mathbb {C}),M_n(\mathbb {C}))}\), which is compact and convex. Because the Connes embedding problem has a negative answer [42], there exists \(\Phi \in {\text {FM}}(M_n(\mathbb {C}),M_n(\mathbb {C})) \setminus K\). By Lemma 5.7, there exist X, \(Y \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^{n^2}\) such that

$$\begin{aligned} \langle \Phi (X),Y\rangle _{L^2(M_n(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^{n^2}} > \sup _{\Psi \in K} \langle \Psi (X),Y\rangle _{L^2(M_n(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^{n^2}}. \end{aligned}$$

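Since \(\Phi \) is factorizable and \({\text {FM}}_{{\text {fin}}}(M_n(\mathbb {C}),M_n(\mathbb {C})) \subseteq K\), this yields the strict inequality in the statement. Moreover, by Theorem 5.13, if \(\Psi \) factors through a Connes-embeddable algebra, then \(\Psi \in K\), so \(\langle \Psi (X),Y\rangle _{L^2(M_n(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^{n^2}} < \langle \Phi (X),Y\rangle _{L^2(M_n(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^{n^2}}\), and hence \(\Psi \) cannot be optimal. Thus, by the proof of Observation 5.5, a coupling of \(\lambda _X\) and \(\lambda _Y\) in a Connes-embeddable algebra cannot be optimal. \(\square \)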
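For a concrete feel for the quantities compared in Corollary 5.14, the sketch below evaluates the pairing \(\langle \Phi (X),Y\rangle \) for one map in \({\text {FM}}_{{{\,\mathrm{fin}\,}}}(M_n(\mathbb {C}),M_n(\mathbb {C}))\). We assume the form \(\Phi (x) = (\mathrm {id} \otimes \tau _d)(u^*(x \otimes 1)u)\) for a unitary \(u \in M_n(\mathbb {C}) \otimes M_d(\mathbb {C})\), which mirrors Haagerup and Musat's description of factorizable maps with a finite-dimensional auxiliary algebra, and we take \(\langle A,B\rangle = \sum _j \tau (A_j B_j)\) with the normalized trace. All helper names are hypothetical. A random u only gives a lower bound for the supremum over \({\text {FM}}_{{{\,\mathrm{fin}\,}}}\); this is not a method for computing \(C(\lambda _X,\lambda _Y)\).

```python
import numpy as np

def tau(a):
    """Normalized trace."""
    return np.trace(a) / a.shape[0]

def partial_tau(y, n, d):
    """Normalized partial trace id ⊗ tau_d: M_n ⊗ M_d -> M_n (np.kron ordering)."""
    return np.trace(y.reshape(n, d, n, d), axis1=1, axis2=3) / d

def factorizable_map(u, n, d):
    """Phi(x) = (id ⊗ tau_d)(u^* (x ⊗ 1) u) for a unitary u in M_n ⊗ M_d."""
    def phi(x):
        return partial_tau(u.conj().T @ np.kron(x, np.eye(d)) @ u, n, d)
    return phi

def pairing(phi, X, Y):
    """<Phi(X), Y> = sum_j tau(Phi(X_j) Y_j) for self-adjoint tuples (a real number)."""
    return sum(tau(phi(x) @ y).real for x, y in zip(X, Y))

def random_unitary(k, rng):
    q, _ = np.linalg.qr(rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k)))
    return q

def random_selfadjoint(k, rng):
    g = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
    return (g + g.conj().T) / 2

rng = np.random.default_rng(2)
n, d = 2, 3
phi = factorizable_map(random_unitary(n * d, rng), n, d)
X = [random_selfadjoint(n, rng) for _ in range(n * n)]
Y = [random_selfadjoint(n, rng) for _ in range(n * n)]
print(pairing(phi, X, Y))
```

Taking \(u = v \otimes 1\) recovers the automorphism \(x \mapsto v^* x v\), so unitary conjugations are the simplest members of this family.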

Remark 5.15

Although Corollary 5.14 is much stronger than Corollary 5.8 as stated, the two are based on different phenomena. Corollary 5.14 relies on the existence of factorizable maps \(M_n(\mathbb {C}) \rightarrow M_n(\mathbb {C})\) that cannot be approximated by elements of \({\text {FM}}_{{{\,\mathrm{fin}\,}}}(M_n(\mathbb {C}),M_n(\mathbb {C}))\) (no explicit examples of which are known yet). Meanwhile, Corollary 5.8 relies on the existence of factorizable maps that can be approximated by elements of \({\text {FM}}_{{{\,\mathrm{fin}\,}}}(M_n(\mathbb {C}),M_n(\mathbb {C}))\) but do not belong to \({\text {FM}}_{{{\,\mathrm{fin}\,}}}(M_n(\mathbb {C}),M_n(\mathbb {C}))\) (explicit examples of which were given in [52]). Thus, the proof of Corollary 5.8 shows that for \(n \ge 11\) and \(d \in {\mathbb {N}}\), there exist tuples X and Y from \(M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^{n^2}\) such that

$$\begin{aligned} \sup _{\Phi \in \overline{{\text {FM}}_{{{\,\mathrm{fin}\,}}}(M_n(\mathbb {C}),M_n(\mathbb {C}))}} \langle \Phi (X),Y\rangle _{L^2(M_n(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^{n^2}} > \sup _{\Psi \in {\text {FM}}_d(M_n(\mathbb {C}),M_n(\mathbb {C}))} \langle \Psi (X),Y\rangle _{L^2(M_n(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^{n^2}}. \end{aligned}$$

Hence, a coupling on an algebra \(\mathcal {A}\) of dimension at most d cannot even be optimal among couplings in Connes-embeddable algebras.

5.4 The Wasserstein and weak-\(*\) topologies

From the outset, we equipped \(\Sigma _{m,R}\) with the weak-\(*\) topology as a subset of the algebraic dual of \(\mathbb {C}\langle x_1,\dots ,x_m\rangle \). Meanwhile, because \(d_W^{(2)}\) defines a metric on \(\Sigma _{m,R}\), it induces another topology, which we will call the Wasserstein topology. We will show that the Wasserstein topology is strictly stronger than the weak-\(*\) topology. This is in contrast to classical probability theory, where the weak-\(*\) topology on the space of probability measures on \([-R,R]^m\) is metrized by the \(L^2\)-Wasserstein distance.

Our first step is to prove an ultraproduct characterization of Wasserstein convergence analogous to Lemma 5.10.

Lemma 5.16

Let \(\mathcal {U} \in \beta {\mathbb {N}}\setminus {\mathbb {N}}\), let \((\mathcal {A}_n)_{n \in {\mathbb {N}}}\) be a sequence of tracial \(\mathrm {W}^*\)-algebras, and let \(\mathcal {A}\) be another tracial \(\mathrm {W}^*\)-algebra. Let \(X \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) with \(\Vert X\Vert _{L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \le R\) and suppose that X generates \(\mathcal {A}\). Let \(X_n \in L^\infty (\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m\) with \(\Vert X_n\Vert _{L^\infty (\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m} \le R\). Then the following are equivalent:

  1. (1)

    \(\lim _{n \rightarrow \mathcal {U}} \lambda _{X_n} = \lambda _X\) with respect to the Wasserstein distance.

  2. (2)

    There exists a tracial \(\mathrm {W}^*\)-embedding \(\phi : \mathcal {A}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) and a factorizable map \(\Phi _n \in {\text {FM}}(\mathcal {A},\mathcal {A}_n)\) (for each \(n \in {\mathbb {N}}\)) such that

    $$\begin{aligned} \phi (X) = [X_n]_{n \in {\mathbb {N}}}, \quad \phi (Z) = [\Phi _n(Z)]_{n \in {\mathbb {N}}} \text { for all } Z \in L^\infty (\mathcal {A}). \end{aligned}$$

Proof

(1) \(\implies \) (2). The limit \(\lim _{n \rightarrow \mathcal {U}} \lambda _{X_n} = \lambda _X\) in Wasserstein distance means that there exist tracial \(\mathrm {W}^*\)-algebras \(\mathcal {B}_n\) and tracial \(\mathrm {W}^*\)-embeddings \(\pi _n: \mathrm {W}^*(X_n) \rightarrow \mathcal {B}_n\) and \(\rho _n: \mathcal {A}\rightarrow \mathcal {B}_n\) such that \(\Vert \pi _n(X_n) - \rho _n(X)\Vert _{L^2(\mathcal {B}_n)_{{{\,\mathrm{sa}\,}}}^m} \rightarrow 0\) as \(n \rightarrow \mathcal {U}\). Let \(\mathcal {C}_n\) be the free product of \(\mathcal {A}_n\) and \(\mathcal {B}_n\) with amalgamation over \(\mathrm {W}^*(X_n)\), and let \({\tilde{\pi }}_n: \mathcal {A}_n \rightarrow \mathcal {C}_n\) and \({\tilde{\rho }}_n: \mathcal {A}\rightarrow \mathcal {C}_n\) be the corresponding tracial \(\mathrm {W}^*\)-embeddings. It is straightforward to check that these induce tracial \(\mathrm {W}^*\)-embeddings

$$\begin{aligned} {\tilde{\pi }}: \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n \rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {C}_n, \qquad {\tilde{\rho }}: \mathcal {A}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {C}_n \end{aligned}$$

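given by \({\tilde{\pi }}([Z_n]_{n \in {\mathbb {N}}}) = [{\tilde{\pi }}_n(Z_n)]_{n \in {\mathbb {N}}}\) and \({\tilde{\rho }}(Z) = [{\tilde{\rho }}_n(Z)]_{n \in {\mathbb {N}}}\). In particular, \({\tilde{\pi }}([X_n]_{n \in {\mathbb {N}}}) = [{\tilde{\pi }}_n(X_n)]_{n \in {\mathbb {N}}} = [{\tilde{\rho }}_n(X)]_{n \in {\mathbb {N}}} = {\tilde{\rho }}(X)\), where the middle equality holds because \(\Vert {\tilde{\pi }}_n(X_n) - {\tilde{\rho }}_n(X)\Vert _{L^2(\mathcal {C}_n)_{{{\,\mathrm{sa}\,}}}^m} = \Vert \pi _n(X_n) - \rho _n(X)\Vert _{L^2(\mathcal {B}_n)_{{{\,\mathrm{sa}\,}}}^m} \rightarrow 0\) as \(n \rightarrow \mathcal {U}\). Since X generates \(\mathcal {A}\), the image \({\tilde{\rho }}(\mathcal {A}) = \mathrm {W}^*({\tilde{\rho }}(X))\) is contained in the \(\mathrm {W}^*\)-subalgebra \({\tilde{\pi }}(\prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n)\). Hence \(\phi := {\tilde{\pi }}^{-1} \circ {\tilde{\rho }}: \mathcal {A}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) is a tracial \(\mathrm {W}^*\)-embedding with \(\phi (X) = [X_n]_{n \in {\mathbb {N}}}\) and \({\tilde{\pi }}(\phi (Z)) = {\tilde{\rho }}(Z)\) for all \(Z \in L^\infty (\mathcal {A})\).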

Let \({\tilde{\pi }}_n^*\) and \({\tilde{\pi }}^*\) be the trace-preserving conditional expectations adjoint to \({\tilde{\pi }}_n\) and \({\tilde{\pi }}\). We claim that for \(Y = [Y_n]_{n \in {\mathbb {N}}} \in \prod _{n \rightarrow \mathcal {U}} \mathcal {C}_n\), we have

$$\begin{aligned} {\tilde{\pi }}^*(Y) = [{\tilde{\pi }}_n^*(Y_n)]_{n \in {\mathbb {N}}}. \end{aligned}$$

Let \({\tilde{\mathcal {A}}} = \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) and \({\tilde{\mathcal {C}}} = \prod _{n \rightarrow \mathcal {U}} \mathcal {C}_n\). Since each \({\tilde{\pi }}_n^*\) is contractive for both the operator norm and the \(L^2\)-norm, \([{\tilde{\pi }}_n^*(Y_n)]_{n \in {\mathbb {N}}}\) is a well-defined element of \({\tilde{\mathcal {A}}}\). Moreover, for every \(Z = [Z_n]_{n \in {\mathbb {N}}} \in {\tilde{\mathcal {A}}}\), we have

$$\begin{aligned} \langle Y,{\tilde{\pi }}(Z)\rangle _{L^2({\tilde{\mathcal {C}}})} &= \lim _{n \rightarrow \mathcal {U}} \langle Y_n, {\tilde{\pi }}_n(Z_n)\rangle _{L^2(\mathcal {C}_n)} = \lim _{n \rightarrow \mathcal {U}} \langle {\tilde{\pi }}_n^*(Y_n),Z_n\rangle _{L^2(\mathcal {A}_n)}\\ &= \langle [{\tilde{\pi }}_n^*(Y_n)]_{n \in {\mathbb {N}}}, Z\rangle _{L^2({\tilde{\mathcal {A}}})}. \end{aligned}$$

Thus, \({\tilde{\pi }}^*(Y) = [{\tilde{\pi }}_n^*(Y_n)]_{n \in {\mathbb {N}}}\), as desired. As noted above, for every \(Z \in \mathcal {A}\), we have \({\tilde{\pi }}(\phi (Z)) = {\tilde{\rho }}(Z)\) and hence \(\phi (Z) = {\tilde{\pi }}^* {\tilde{\pi }} \phi (Z) = {\tilde{\pi }}^* {\tilde{\rho }}(Z)\). This implies that

$$\begin{aligned}{}[{\tilde{\pi }}_n^* {\tilde{\rho }}_n(Z)]_{n \in {\mathbb {N}}} = {\tilde{\pi }}^* {\tilde{\rho }}(Z) = \phi (Z). \end{aligned}$$

Therefore, \(\Phi _n := {\tilde{\pi }}_n^* {\tilde{\rho }}_n\) is a factorizable map fulfilling condition (2).

(2) \(\implies \) (1). Let \(\phi \) and \(\Phi _n\) be as in (2). Then \([X_n]_{n \in {\mathbb {N}}} = \phi (X) = [\Phi _n(X)]_{n \in {\mathbb {N}}}\) belongs to \(\prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\). Letting \(E_n\) be the trace-preserving conditional expectation \(\mathcal {A}_n \rightarrow \mathrm {W}^*(X_n)\), the map \(E_n \circ \Phi _n: \mathrm {W}^*(X) \rightarrow \mathrm {W}^*(X_n)\) is factorizable by Proposition 5.3 (4), hence by Observation 5.5,

$$\begin{aligned} C(\lambda _{X_n},\lambda _X) \ge \langle E_n \circ \Phi _n(X),X_n\rangle _{L^2(\mathrm {W}^*(X_n))_{{{\,\mathrm{sa}\,}}}^m} = \langle \Phi _n(X),X_n\rangle _{L^2(\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$

Therefore,

$$\begin{aligned} \lim _{n \rightarrow \mathcal {U}} d_W^{(2)}(\lambda _{X_n},\lambda _X)^2&= \lim _{n \rightarrow \mathcal {U}} \left( \Vert X_n\Vert _{L^2(\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m}^2 + \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - 2C(\lambda _{X_n},\lambda _X) \right) \\&\le \lim _{n \rightarrow \mathcal {U}} \left( \Vert X_n\Vert _{L^2(\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m}^2 + \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - 2\langle \Phi _n(X),X_n\rangle _{L^2(\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m} \right) \\&= \Vert \phi (X)\Vert _{L^2(\prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m}^2 + \Vert \phi (X)\Vert _{L^2(\prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m}^2 \\&\qquad - 2 \langle \phi (X),\phi (X)\rangle _{L^2(\prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m} \\&= 0. \end{aligned}$$

Hence, \(\lim _{n \rightarrow \mathcal {U}} \lambda _{X_n} = \lambda _X\) in Wasserstein distance. \(\square \)
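The computation above rests on the expansion \(\Vert X - Y\Vert ^2 = \Vert X\Vert ^2 + \Vert Y\Vert ^2 - 2\langle X,Y\rangle \) inside a coupling. In the simplest matrix case (m = 1, both variables realized in \(M_k(\mathbb {C})\) with the normalized trace, an assumption made only for illustration), every unitary U yields a coupling \((M_k(\mathbb {C}), A, UBU^*)\) of \(\lambda _A\) and \(\lambda _B\), so \(\Vert A - UBU^*\Vert _{L^2}\) is an upper bound for \(d_W^{(2)}(\lambda _A,\lambda _B)\). The sketch checks the expansion and, via the Hoffman–Wielandt inequality, that among such unitary couplings the cost is smallest when the eigenvalues are matched in increasing order. Helper names are our own.

```python
import numpy as np

def tau(a):
    return np.trace(a) / a.shape[0]

def sq_norm2(a):
    """Squared L^2 norm tau(a^* a)."""
    return tau(a.conj().T @ a).real

def inner(a, b):
    """Real inner product Re tau(a^* b)."""
    return tau(a.conj().T @ b).real

def random_selfadjoint(k, rng):
    g = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
    return (g + g.conj().T) / 2

def random_unitary(k, rng):
    q, _ = np.linalg.qr(rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k)))
    return q

rng = np.random.default_rng(4)
k = 5
A, B = random_selfadjoint(k, rng), random_selfadjoint(k, rng)

# Cost of the coupling (M_k, A, U B U^*) and its expansion.
U = random_unitary(k, rng)
B_U = U @ B @ U.conj().T
cost = sq_norm2(A - B_U)
print(np.isclose(cost, sq_norm2(A) + sq_norm2(B) - 2 * inner(A, B_U)))

# Hoffman-Wielandt: align eigenvectors so that sorted eigenvalues are matched.
la, Va = np.linalg.eigh(A)   # eigenvalues in increasing order
lb, Vb = np.linalg.eigh(B)
sorted_cost = np.mean((la - lb) ** 2)
U_opt = Va @ Vb.conj().T
print(np.isclose(sq_norm2(A - U_opt @ B @ U_opt.conj().T), sorted_cost))
print(cost >= sorted_cost)
```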

The next corollary was observed in [11, Proposition 1.4(b)], and can be proved in several ways (see for instance [35, Lemma 2.10, Corollary 2.11] for another method), but we will deduce it as a consequence of the ultraproduct characterizations for weak-\(*\) and Wasserstein convergence.

Corollary 5.17

The Wasserstein topology on \(\Sigma _{m,R}\) refines the weak-\(*\) topology.

Proof

Fix \(\mathcal {U} \in \beta {\mathbb {N}}\setminus {\mathbb {N}}\). Using the Urysohn subsequence principle, it suffices to show that if \(\mu _n, \mu \in \Sigma _{m,R}\) and \(\lim _{n \rightarrow \mathcal {U}} \mu _n = \mu \) in the Wasserstein distance, then \(\lim _{n \rightarrow \mathcal {U}} \mu _n = \mu \) in the weak-\(*\) topology. Letting \((\mathcal {A}_n,X_n)\) and \((\mathcal {A},X)\) be the GNS realizations of \(\mu _n\) and \(\mu \), Lemma 5.16 implies that there is a tracial \(\mathrm {W}^*\)-embedding \(\phi : \mathcal {A}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) with \(\phi (X) = [X_n]_{n \in {\mathbb {N}}}\). By Lemma 5.10, this implies that \(\lim _{n \rightarrow \mathcal {U}} \mu _n = \mu \) in the weak-\(*\) topology. \(\square \)

The next observation is closely related.

Lemma 5.18

The metric \(d_W^{(2)}\) is weak-\(*\) lower semi-continuous on \(\Sigma _{m,R} \times \Sigma _{m,R}\).

Proof

Fix \(\mathcal {U} \in \beta {\mathbb {N}}\setminus {\mathbb {N}}\). Again using the Urysohn subsequence principle, it suffices to show that for every pair of sequences \((\mu _n)_{n \in {\mathbb {N}}}\) and \((\nu _n)_{n \in {\mathbb {N}}}\) in \(\Sigma _{m,R}\), letting \(\mu = \lim _{n \rightarrow \mathcal {U}} \mu _n\) and \(\nu = \lim _{n \rightarrow \mathcal {U}} \nu _n\), we have \(d_W^{(2)}(\mu ,\nu ) \le \lim _{n \rightarrow \mathcal {U}} d_W^{(2)}(\mu _n,\nu _n)\). Let \((\mathcal {A}_n,X_n,Y_n)\) be an optimal coupling of \(\mu _n\) and \(\nu _n\). Let \((\mathcal {B},X)\) and \((\mathcal {C},Y)\) be the GNS realizations of \(\mu \) and \(\nu \). By Lemma 5.10, there exist tracial \(\mathrm {W}^*\)-embeddings \(\phi : \mathcal {B}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) and \(\psi : \mathcal {C}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) such that \(\phi (X) = [X_n]_{n \in {\mathbb {N}}}\) and \(\psi (Y) = [Y_n]_{n \in {\mathbb {N}}}\). Then

$$\begin{aligned} d_W^{(2)}(\mu ,\nu ) &\le \Vert \phi (X) - \psi (Y)\Vert _{L^2(\prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m} \\ &= \lim _{n \rightarrow \mathcal {U}} \Vert X_n - Y_n\Vert _{L^2(\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m} = \lim _{n \rightarrow \mathcal {U}} d_W^{(2)}(\mu _n,\nu _n). \end{aligned}$$

\(\square \)

We will use Lemmas 5.10 and 5.16 (together with the functional calculus argument from Lemma 5.12) to characterize when the Wasserstein and weak-\(*\) topologies agree at a point of \(\Sigma _{m,R}\) in terms of a certain stability property. To fix terminology, if S is a set and \(\mathscr {T}_1\) and \(\mathscr {T}_2\) are two topologies on S, we say that \(\mathscr {T}_1\) and \(\mathscr {T}_2\) agree at \(x \in S\) if every \(\mathscr {T}_1\)-neighborhood of x contains a \(\mathscr {T}_2\)-neighborhood of x and vice versa. If the topologies are metrizable, this is equivalent to saying that a sequence \(x_n\) converges to x with respect to \(\mathscr {T}_1\) if and only if it converges to x with respect to \(\mathscr {T}_2\). Furthermore, if \(\mathcal {U}\) is a given non-principal ultrafilter on \({\mathbb {N}}\), then agreement of the two topologies at x is equivalent to the claim that, for every sequence \((x_n)_{n \in {\mathbb {N}}}\) in S, \(\lim _{n \rightarrow \mathcal {U}} x_n = x\) with respect to \(\mathscr {T}_1\) if and only if \(\lim _{n \rightarrow \mathcal {U}} x_n = x\) with respect to \(\mathscr {T}_2\).

Definition 5.19

(\({\text {FM}}\)-lifting) Let \(\mathcal {A}\) be a tracial \(\mathrm {W}^*\)-algebra with separable predual, and let \(\mathcal {U}\) be a free ultrafilter on \({\mathbb {N}}\). If \(\mathcal {A}_n\) is a sequence of tracial \(\mathrm {W}^*\)-algebras and \(\phi : \mathcal {A}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) is a tracial \(\mathrm {W}^*\)-embedding, then an \({\text {FM}}\)-lifting of \(\phi \) is a sequence \((\Phi _n)_{n \in {\mathbb {N}}}\), where \(\Phi _n \in {\text {FM}}(\mathcal {A},\mathcal {A}_n)\), such that \(\phi (Z) = [\Phi _n(Z)]_{n \in {\mathbb {N}}}\) for all \(Z \in L^\infty (\mathcal {A})\).

Note that the sequence \(\Phi _n\) in Lemma 5.16 (2) is an \({\text {FM}}\)-lifting of \(\phi \). In other words, Lemma 5.16 describes convergence in Wasserstein distance in terms of ultraproduct embeddings that have \({\text {FM}}\)-liftings.

Definition 5.20

(\({\text {FM}}\)-stability) We say that \(\mathcal {A}\) is \({\text {FM}}\)-stable if every tracial \(\mathrm {W}^*\)-embedding \(\phi : \mathcal {A}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) into the ultraproduct of any sequence of tracial \(\mathrm {W}^*\)-algebras \(\mathcal {A}_n\) has an \({\text {FM}}\)-lifting.

Our notion of \({\text {FM}}\)-stability is analogous and closely related to the notions of tracial stability and \({\text {UCP}}\)-stability studied in [5, 34]. Analogously to [5, Remark 2.2], the definition of \({\text {FM}}\)-stability can be restated as an approximation property without reference to ultraproducts. In particular, the definition is independent of the choice of non-principal ultrafilter \(\mathcal {U}\) (so it amounts to the same thing whether we require the lifting condition for a particular non-principal ultrafilter or for all non-principal ultrafilters).

Proposition 5.21

Let \(\mu \in \Sigma _{m,R}\) and let \((\mathcal {A},X)\) be the GNS realization of \(\mu \) as in Proposition 2.31. Then the following are equivalent:

  1. (1)

    The weak-\(*\) and Wasserstein topologies on \(\Sigma _{m,R}\) agree at \(\mu \).

  2. (2)

    \(\mathcal {A}\) is \({\text {FM}}\)-stable.

Proof

(1) \(\implies \) (2). Let \(\mathcal {U} \in \beta {\mathbb {N}}\setminus {\mathbb {N}}\). Assume that the weak-\(*\) and Wasserstein topologies agree at \(\mu \). Let \((\mathcal {A}_n)_{n \in {\mathbb {N}}}\) be a sequence of tracial \(\mathrm {W}^*\)-algebras, and let \(\phi : \mathcal {A}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) be a tracial \(\mathrm {W}^*\)-embedding. Express \(\phi (X)\) as \([X_n]_{n \in {\mathbb {N}}}\) where \(X_n \in L^\infty (\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m\) and \(\sup _n \Vert X_n\Vert _{L^\infty (\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m} < \infty \). Arguing with functional calculus as in Lemma 5.12, we can arrange that \(\Vert X_n\Vert _{L^\infty (\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m} \le R\). By Lemma 5.10, we have \(\lambda _{X_n} \rightarrow \lambda _X\) in the weak-\(*\) topology on \(\Sigma _{m,R}\) as \(n \rightarrow \mathcal {U}\). Hence, by hypothesis, \(\lambda _{X_n} \rightarrow \lambda _X\) in the Wasserstein distance as \(n \rightarrow \mathcal {U}\). By Lemma 5.16, this implies that \(\phi \) has an \({\text {FM}}\)-lifting.

(2) \(\implies \) (1). Conversely, suppose that \(\mathcal {A}\) is \({\text {FM}}\)-stable, and fix \(\mathcal {U} \in \beta {\mathbb {N}}\setminus {\mathbb {N}}\). To show that the weak-\(*\) and Wasserstein topologies on \(\Sigma _{m,R}\) agree at \(\mu \), using the Urysohn subsequence principle, it suffices to show that if \((\mu _n)_{n \in {\mathbb {N}}}\) is a sequence such that \(\mu _n \rightarrow \mu \) weak-\(*\) as \(n \rightarrow \mathcal {U}\), then \(d_W^{(2)}(\mu _n,\mu ) \rightarrow 0\) as \(n \rightarrow \mathcal {U}\). Let \((\mathcal {A}_n,X_n)\) be the GNS realization of \(\mu _n\). By Lemma 5.10, the tuple \([X_n]_{n \in {\mathbb {N}}}\) in \(\prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) has the same law as X, and therefore there exists a tracial \(\mathrm {W}^*\)-embedding \(\phi : \mathcal {A}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) with \(\phi (X) = [X_n]_{n \in {\mathbb {N}}}\). By \({\text {FM}}\)-stability of \(\mathcal {A}\), there exist factorizable maps \(\Phi _n: \mathcal {A}\rightarrow \mathcal {A}_n\) such that \(\phi (Z) = [\Phi _n(Z)]_{n \in {\mathbb {N}}}\) for all \(Z \in L^\infty (\mathcal {A})\). Hence, by Lemma 5.16, \(\lim _{n \rightarrow \mathcal {U}} \mu _n = \mu \) in Wasserstein distance. \(\square \)

Next, we will show, using the work of Connes [17], that if the weak-\(*\) and Wasserstein topologies agree at \(\mu \) and the corresponding tracial \(\mathrm {W}^*\)-algebra \(\mathcal {A}\) is Connes-embeddable, then \(\mathcal {A}\) is in fact approximately finite-dimensional. We recall the following theorem of Connes [17], which shows that approximate finite-dimensionality is equivalent to semi-discreteness for tracial \(\mathrm {W}^*\)-algebras (and these are also equivalent, famously, to the two other conditions of injectivity and amenability); related proofs can also be found in [61, 70, Sect. XIV], [12, Sect. 6.2, 6.3, 9.3], [2, Sect. 11].

Theorem 5.22

(Connes [17]) Let \(\mathcal {A}= (A,\tau )\) be a tracial \(\mathrm {W}^*\)-algebra with separable predual. The following are equivalent:

  1. (1)

    \(\mathcal {A}\) is approximately finite-dimensional (AFD), that is, there exists a sequence \((A_k)_{k\in {\mathbb {N}}}\) of finite-dimensional subalgebras with \(A_k \subseteq A_{k+1}\) such that \(\bigcup _{k \in {\mathbb {N}}} A_k\) is dense in A with respect to \(\Vert \cdot \Vert _{L^2(\mathcal {A})}\).

  2. (2)

    \(\mathcal {A}\) is semi-discrete, that is, there exist nets \((\Phi _\alpha )_{\alpha \in I}\) and \((\Psi _\alpha )_{\alpha \in I}\) of completely positive maps \(\Phi _\alpha : \mathcal {A}\rightarrow M_{n(\alpha )}(\mathbb {C})\) and \(\Psi _{\alpha }: M_{n(\alpha )}(\mathbb {C}) \rightarrow \mathcal {A}\) such that \(\Psi _\alpha \circ \Phi _\alpha (Z) \rightarrow Z\) in the weak-\(*\) topology for every \(Z \in L^\infty (\mathcal {A})\).
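The standard first example of an AFD algebra is the hyperfinite \(\mathrm {II}_1\)-factor, built from the tower \(M_2(\mathbb {C}) \subseteq M_4(\mathbb {C}) \subseteq M_8(\mathbb {C}) \subseteq \cdots \) with inclusions \(a \mapsto a \otimes 1_2\). The union carries a single trace because each inclusion is a unital, trace-preserving \(*\)-homomorphism for the normalized traces. The sketch below (our own helper names, random test matrices) checks these properties numerically three levels up the tower; it does not construct the \(L^2\)-completion.

```python
import numpy as np

def tau(a):
    """Normalized trace."""
    return np.trace(a) / a.shape[0]

def step(a):
    """Unital inclusion M_{2^k}(C) -> M_{2^(k+1)}(C), a -> a ⊗ 1_2."""
    return np.kron(a, np.eye(2))

def embed_into_level(a, levels):
    """Push a matrix `levels` steps up the tower."""
    for _ in range(levels):
        a = step(a)
    return a

rng = np.random.default_rng(5)
a = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
b = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
A, B = embed_into_level(a, 3), embed_into_level(b, 3)   # 16 x 16 matrices

print(np.isclose(tau(A), tau(a)))                                  # trace-preserving
print(np.allclose(A @ B, embed_into_level(a @ b, 3)))              # multiplicative
print(np.allclose(A.conj().T, embed_into_level(a.conj().T, 3)))    # *-preserving
```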

We recall a few more well-known results about AFD algebras. Recall that a \(\mathrm {II}_1\)-factor is an infinite-dimensional tracial von Neumann algebra with trivial center.

Lemma 5.23

 

  1. (1)

    Let \(\mathcal {A}\) be an AFD tracial \(\mathrm {W}^*\)-algebra, let \((\mathcal {B}_n)_{n \in {\mathbb {N}}}\) be \(\mathrm {II}_1\)-factors, and let \(\mathcal {U}\) be a free ultrafilter on \({\mathbb {N}}\). If \(\phi \) and \(\psi \) are two embeddings of \(\mathcal {A}\) into \(\prod _{n \rightarrow \mathcal {U}} \mathcal {B}_n\), then there exists a unitary \(U \in \prod _{n \rightarrow \mathcal {U}} \mathcal {B}_n\) such that \(U \phi (Z) U^* = \psi (Z)\) for \(Z \in L^\infty (\mathcal {A})\). See [5, 45].

  2. (2)

    If \((\mathcal {B}_n)_{n \in {\mathbb {N}}}\) are \(\mathrm {II}_1\)-factors and U is a unitary in \(\prod _{n \rightarrow \mathcal {U}} \mathcal {B}_n\), then there exist unitaries \(U_n \in L^\infty (\mathcal {B}_n)\) such that \(U = [U_n]_{n \in {\mathbb {N}}}\).Footnote 5

Corollary 5.24

Let \(\mathcal {A}\) be an AFD tracial \(\mathrm {W}^*\)-algebra. Then \(\mathcal {A}\) is \({\text {FM}}\)-stable.

Proof

If \(\mathcal {A}= \mathbb {C}\), then the conclusion is immediate, so assume that \(\mathcal {A}\ne \mathbb {C}\). Let \(\phi : \mathcal {A}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) be a tracial \(\mathrm {W}^*\)-embedding. For each n, let \(\mathcal {B}_n\) be the tracial free product \(\mathcal {A}* \mathcal {A}_n * L^\infty [0,1]\) (where \(L^\infty [0,1]\) has the trace coming from Lebesgue measure). Then \(\mathcal {B}_n\) is a \(\mathrm {II}_1\)-factor by [71, Theorem 3.7], since \(\mathcal {A}\ne \mathbb {C}\) and \(L^\infty [0,1]\) is diffuse. For each n, there is a tracial \(\mathrm {W}^*\)-embedding \(\iota _n: \mathcal {A}_n \rightarrow \mathcal {B}_n\). Let \(\iota \) be the induced map

$$\begin{aligned} \iota : \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n \rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {B}_n. \end{aligned}$$

By construction, there is also a tracial \(\mathrm {W}^*\)-embedding \(\psi _n: \mathcal {A}\rightarrow \mathcal {B}_n\) for each n, and this sequence produces a tracial \(\mathrm {W}^*\)-embedding \(\psi : \mathcal {A}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {B}_n\). By Lemma 5.23 (1), there exists a unitary \(U \in \prod _{n \rightarrow \mathcal {U}} \mathcal {B}_n\) such that \(U (\iota \circ \phi (Z)) U^* = \psi (Z)\) for \(Z \in L^\infty (\mathcal {A})\), and by Lemma 5.23 (2), we can write \(U = [U_n]_{n \in {\mathbb {N}}}\) with unitaries \(U_n \in L^\infty (\mathcal {B}_n)\).

Let \(\Phi _n: \mathcal {A}\rightarrow \mathcal {A}_n\) be given by \(\Phi _n(Z) = \iota _n^*[U_n^* \psi _n(Z) U_n]\). As observed in the proof of Proposition 5.21, ultraproducts respect conditional expectations and therefore for \(Z \in \mathcal {A}\), we have

$$\begin{aligned}{}[\Phi _n(Z)]_{n \in {\mathbb {N}}}= & {} [\iota _n^*[U_n^* \psi _n(Z) U_n]]_{n \in {\mathbb {N}}} = \iota ^* [U_n^* \psi _n(Z) U_n]_{n \in {\mathbb {N}}}\\= & {} \iota ^*(U^* \psi (Z) U) = \iota ^* \iota \phi (Z) = \phi (Z). \end{aligned}$$

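Each \(\Phi _n\) is factorizable, since \(\Phi _n = \iota _n^* \circ \psi _n'\), where \(\psi _n'(Z) = U_n^* \psi _n(Z) U_n\) defines a tracial \(\mathrm {W}^*\)-embedding of \(\mathcal {A}\) into \(\mathcal {B}_n\). Thus, \((\Phi _n)_{n \in {\mathbb {N}}}\) is the desired lifting of \(\phi \) to a sequence of factorizable maps. \(\square \)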

Remark 5.25

In fact, [5, Theorem 2.6] implies the converse of Corollary 5.24: If \(\mathcal {A}\) is Connes-embeddable and \({\text {FM}}\)-stable, then \(\mathcal {A}\) is AFD. The same statement is implied by the next proposition provided that \(\mathcal {A}\) is finitely generated.

Proposition 5.26

Let \(\mu \) be in the weak-\(*\) closure of \(\Sigma _{m,R}^{{{\,\mathrm{fin}\,}}}\), and let \((\mathcal {A},X)\) be the GNS realization of \(\mu \). The following are equivalent:

  1. (1)

    \(\mathcal {A}\) is approximately finite-dimensional.

  2. (2)

    \(\mathcal {A}\) is \({\text {FM}}\)-stable.

  3. (3)

    The weak-\(*\) and Wasserstein topologies agree at \(\mu \).

  4. (4)

    \(\mu \) is in the Wasserstein closure of \(\Sigma _{m,R}^{{{\,\mathrm{fin}\,}}}\).

Proof

(1) \(\implies \) (2) by Corollary 5.24.

(2) \(\implies \) (3) by Proposition 5.21.

(3) \(\implies \) (4). Since the two topologies agree at \(\mu \), and \(\mu \) is in the weak-\(*\) closure of \(\Sigma _{m,R}^{{{\,\mathrm{fin}\,}}}\), it follows that \(\mu \) is in the Wasserstein closure of \(\Sigma _{m,R}^{{{\,\mathrm{fin}\,}}}\).

(4) \(\implies \) (1). Assume that (4) holds; we will show that \(\mathcal {A}\) is semi-discrete, hence approximately finite-dimensional by Connes’ theorem. Fix a free ultrafilter \(\mathcal {U}\) on \({\mathbb {N}}\). Let \(\mu _n\) be a sequence in \(\Sigma _{m,R}^{{{\,\mathrm{fin}\,}}}\) such that \(\lim _{n \rightarrow \mathcal {U}} d_W^{(2)}(\mu _n,\mu ) = 0\). Let \((\mathcal {A}_n,X_n,Y_n)\) be an optimal coupling of \(\mu \) and \(\mu _n\). Since \(\mathrm {W}^*(X_n) \cong \mathrm {W}^*(X) = \mathcal {A}\), we can assume without loss of generality that \(\mathcal {A}\subseteq \mathcal {A}_n\) and \(X_n = X\). Let \(\Phi _n: \mathcal {A}= \mathrm {W}^*(X) \rightarrow \mathrm {W}^*(Y_n)\) be the associated factorizable map. Since \(\mathrm {W}^*(Y_n)\) is finite-dimensional, it suffices to show that \(\Phi _n^* \Phi _n(Z) \rightarrow Z\) in \(L^2(\mathcal {A})\) as \(n \rightarrow \mathcal {U}\) for every \(Z \in \mathcal {A}\); this implies semi-discreteness of \(\mathcal {A}\) and finishes the argument.

The convergence of \(\Phi _n^* \Phi _n(Z)\) follows by an argument similar to that of Proposition 5.21. Let \(\mathcal {B}_n\) be the free product of two copies of \(\mathcal {A}_n\) with amalgamation over \(\mathrm {W}^*(Y_n)\), and let \(\pi _n\) and \(\rho _n\) be the two inclusions of \(\mathcal {A}\) into \(\mathcal {B}_n\) through the first and second copies of \(\mathcal {A}_n\), respectively. Then \(\Phi _n^* \Phi _n = \pi _n^* \rho _n\). Now \(\pi _n\) and \(\rho _n\) induce maps

$$\begin{aligned} \pi , \rho : \mathcal {A}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {B}_n. \end{aligned}$$

Moreover, since the two copies of \(Y_n\) in \(\mathcal {B}_n\) coincide, the triangle inequality gives \(\Vert \pi _n(X) - \rho _n(X)\Vert _{L^2(\mathcal {B}_n)_{{{\,\mathrm{sa}\,}}}^m} \le 2 \Vert X - Y_n\Vert _{L^2(\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m} \rightarrow 0\). Therefore, \(\pi (X) = \rho (X)\), and since X generates \(\mathcal {A}\), \(\pi = \rho \) on all of \(L^\infty (\mathcal {A})\). This implies that \(\pi ^* \rho (Z) = \pi ^* \pi (Z) = Z\) for \(Z \in L^\infty (\mathcal {A})\), hence \(\lim _{n \rightarrow \mathcal {U}} \Vert \pi _n^*\rho _n(Z) - Z\Vert _{L^2(\mathcal {A})} = 0\). \(\square \)

Corollary 5.27

For \(m > 1\) and \(R > 0\), \(\Sigma _{m,R}\) is not compact with respect to the Wasserstein topology.

Proof

The identity map from \(\Sigma _{m,R}\) with the Wasserstein topology to \(\Sigma _{m,R}\) with the weak-\(*\) topology is a continuous bijection. If the domain were compact, then it would be a homeomorphism. The previous proposition would then imply that every \(\mu \in \Sigma _{m,R}\) that generates a Connes-embeddable tracial \(\mathrm {W}^*\)-algebra would in fact generate an AFD tracial \(\mathrm {W}^*\)-algebra. However, there are many finitely generated and Connes-embeddable tracial \(\mathrm {W}^*\)-algebras that are not AFD. \(\square \)

Another consequence of Proposition 5.26 is the following: Let \(\Sigma _{m,R}^{{{\,\mathrm{app}\,}}}\) be the weak-\(*\) closure of \(\Sigma _{m,R}^{{{\,\mathrm{fin}\,}}}\); then the laws that generate AFD tracial \(\mathrm {W}^*\)-algebras are weak-\(*\) generic in \(\Sigma _{m,R}^{{{\,\mathrm{app}\,}}}\), in the sense of the Baire category theorem. This may seem surprising at first because there are many Connes-embeddable tracial \(\mathrm {W}^*\)-algebras that are not AFD. However, a closely related model-theoretic statement has already been proved in [85, Theorem 5.1], namely that \(\mathcal {R}\) is the enforceable model of its universal theory.

Corollary 5.28

The set of laws \(\mu \) that generate an AFD tracial \(\mathrm {W}^*\)-algebra is a dense \(G_\delta \) set in \(\Sigma _{m,R}^{{{\,\mathrm{app}\,}}}\) with respect to the weak-\(*\) topology.

Proof

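Let \(\mathcal {S}\) be the set of such laws. Since \(\mathcal {S}\) contains \(\Sigma _{m,R}^{{{\,\mathrm{fin}\,}}}\), which is weak-\(*\) dense in \(\Sigma _{m,R}^{{{\,\mathrm{app}\,}}}\) by definition, \(\mathcal {S}\) is weak-\(*\) dense in \(\Sigma _{m,R}^{{{\,\mathrm{app}\,}}}\). By Proposition 5.26, \(\mathcal {S}\) is exactly the Wasserstein closure of \(\Sigma _{m,R}^{{{\,\mathrm{fin}\,}}}\); in particular, \(\mathcal {S}\) is closed with respect to the Wasserstein distance. It follows that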

$$\begin{aligned} \mathcal {S} = \bigcap _{k\in {\mathbb {N}}} \mathcal {V}_k, \text { where } \mathcal {V}_k := \left\{ \mu \in \Sigma _{m,R}^{{{\,\mathrm{app}\,}}}: d_W^{(2)}(\mu ,\nu ) < \frac{1}{k} \text { for some } \nu \in \mathcal {S} \right\} . \end{aligned}$$

For each k and each \(\nu \in \mathcal {S}\), because the weak-\(*\) and Wasserstein topologies agree at \(\nu \), there exists a weak-\(*\) open set \(\mathcal {U}_{k,\nu } \subseteq \Sigma _{m,R}^{{{\,\mathrm{app}\,}}}\) such that \(\nu \in \mathcal {U}_{k,\nu } \subseteq \mathcal {V}_k\). Let

$$\begin{aligned} \mathcal {U}_k = \bigcup _{\nu \in \mathcal {S}} \mathcal {U}_{k,\nu }. \end{aligned}$$

Then \(\mathcal {S} \subseteq \mathcal {U}_k \subseteq \mathcal {V}_k\) and \(\mathcal {U}_k\) is weak-\(*\) open. It follows that \(\mathcal {S} = \bigcap _{k \in {\mathbb {N}}} \mathcal {U}_k\) is a \(G_\delta \) set in \(\Sigma _{m,R}^{{{\,\mathrm{app}\,}}}\). \(\square \)

5.5 Non-separability of the Wasserstein space

We have just shown that \(\Sigma _{m,R}\) with the Wasserstein distance is not compact for \(m > 1\); in fact, using the results of Gromov [30], Olshanskii [55], and Ozawa [58], we will show that it is not even separable. We first recall some terminology about groups and their associated \(\mathrm {W}^*\)-algebras.

Let \(\Gamma \) be a group and let \(\ell ^2(\Gamma )\) be the Hilbert space of square-summable functions on \(\Gamma \). Let \(u: \Gamma \rightarrow B(\ell ^2(\Gamma ))\) be the left regular representation given by \(u(g) \delta _h = \delta _{gh}\), where \(\delta _g \in \ell ^2(\Gamma )\) is the function which is 1 at g and zero elsewhere. The \(\mathrm {W}^*\)-subalgebra \(L(\Gamma )\) of \(B(\ell ^2(\Gamma ))\) generated by the unitary operators u(g) for \(g \in \Gamma \) is called the group von Neumann algebra of \(\Gamma \). The map \(\tau : L(\Gamma ) \rightarrow \mathbb {C}\) given by \(T \mapsto \langle \delta _e, T \delta _e\rangle \) is a faithful normal trace on \(L(\Gamma )\), so that \((L(\Gamma ),\tau )\) is a tracial \(\mathrm {W}^*\)-algebra.
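For a finite group, the construction can be checked by hand. The following Python/NumPy sketch is an illustration only: it takes \(\Gamma = S_3\) (our choice, not from the text), builds the matrices u(g) of the left regular representation, and checks that \(\tau (u(g))\) equals 1 for \(g = e\) and 0 otherwise, that u is a homomorphism, and that \(\tau \) is tracial on the span of the u(g). All helper names are hypothetical.

```python
import itertools

import numpy as np

# Illustrative choice: Gamma = S_3, elements stored as permutation tuples
# g = (g(0), g(1), g(2)).
ELEMENTS = list(itertools.permutations(range(3)))
INDEX = {g: i for i, g in enumerate(ELEMENTS)}
IDENTITY = tuple(range(3))


def compose(g, h):
    """Group product gh, acting by (gh)(x) = g(h(x))."""
    return tuple(g[h[x]] for x in range(3))


def left_regular(g):
    """Matrix of u(g) in the basis (delta_h): u(g) delta_h = delta_{gh}."""
    n = len(ELEMENTS)
    u = np.zeros((n, n))
    for h in ELEMENTS:
        u[INDEX[compose(g, h)], INDEX[h]] = 1.0
    return u


def tau(t):
    """The vector state <delta_e, T delta_e>, which is the trace on L(Gamma)."""
    e = INDEX[IDENTITY]
    return t[e, e]


# tau(u(g)) is 1 for g = e and 0 otherwise.
print({g: tau(left_regular(g)) for g in ELEMENTS})

# Traciality on two random elements of L(Gamma) = span{u(g)}.
rng = np.random.default_rng(0)
s = sum(c * left_regular(g) for c, g in zip(rng.standard_normal(6), ELEMENTS))
t = sum(c * left_regular(g) for c, g in zip(rng.standard_normal(6), ELEMENTS))
print(np.isclose(tau(s @ t), tau(t @ s)))
```

For a finite group, \(L(\Gamma )\) is just the span of the u(g), and on it \(\tau \) agrees with the normalized matrix trace, because u(g) fixes no basis vector when \(g \ne e\).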

Definition 5.29

A discrete group \(\Gamma \) is said to have property (T) if there exist generators \(g_1\), ..., \(g_m\) and an increasing function \(f: [0,\infty ) \rightarrow [0,\infty )\) with \(\lim _{\epsilon \rightarrow 0^+} f(\epsilon ) = 0\) with the following property: for every \(\epsilon > 0\), every unitary representation \(\pi \) of \(\Gamma \) on a Hilbert space H, and every unit vector \(\xi \in H\), if \(\max _{j \in [m]} \Vert \pi (g_j) \xi - \xi \Vert < \epsilon \), then there exists \(\eta \in H\) such that \(\pi (g) \eta = \eta \) for all \(g \in \Gamma \) and \(\Vert \eta - \xi \Vert < f(\epsilon )\).

Theorem 5.30

(Gromov [30], Olshanskii [55], and Ozawa [58, Theorem 1]) There exists a group \(\Gamma \) with property (T) that admits an uncountable family \(\{\Gamma _\alpha \}_{\alpha \in I}\) of quotient groups that are simple and pairwise non-isomorphic. (In fact, such a family of quotient groups exists for every group \(\Gamma \) that is hyperbolic, torsion-free, and non-cyclic.)

The next lemma will allow us to translate this result into a statement about the space of non-commutative laws. While the space of non-commutative laws is defined in terms of self-adjoint generators, it is natural in the group setting to consider unitary rather than self-adjoint generators of a tracial \(\mathrm {W}^*\)-algebra. However, this issue is easily resolved by taking real and imaginary parts of operators. More precisely, if a is an operator in a tracial \(\mathrm {W}^*\)-algebra \(\mathcal {A}\), let \({{\,\mathrm{Re}\,}}(a) = (a + a^*)/2\) and \({{\,\mathrm{Im}\,}}(a) = (a - a^*)/2i\). Then \({{\,\mathrm{Re}\,}}(a)\) and \({{\,\mathrm{Im}\,}}(a)\) are self-adjoint and \(a = {{\,\mathrm{Re}\,}}(a) + i {{\,\mathrm{Im}\,}}(a)\) and \(\Vert a\Vert _{L^2(\mathcal {A})}^2 = \Vert {{\,\mathrm{Re}\,}}(a)\Vert _{L^2(\mathcal {A})}^2 + \Vert {{\,\mathrm{Im}\,}}(a)\Vert _{L^2(\mathcal {A})}^2\).
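These identities can be checked numerically in \(M_n(\mathbb {C})\) with its normalized trace; the sketch below uses illustrative helper names. The cross terms cancel only because the trace is tracial: \(a^*a = {{\,\mathrm{Re}\,}}(a)^2 + {{\,\mathrm{Im}\,}}(a)^2 + i[{{\,\mathrm{Re}\,}}(a),{{\,\mathrm{Im}\,}}(a)]\), and a commutator has trace zero.

```python
import numpy as np


def tr_normalized(x):
    """Normalized trace tr_n(x) = Tr(x) / n on M_n(C)."""
    return np.trace(x) / x.shape[0]


def l2_norm(x):
    """||x||_2 = tr_n(x^* x)^{1/2}."""
    return float(np.sqrt(tr_normalized(x.conj().T @ x).real))


def re_part(x):
    """Re(x) = (x + x^*) / 2."""
    return (x + x.conj().T) / 2


def im_part(x):
    """Im(x) = (x - x^*) / (2i)."""
    return (x - x.conj().T) / 2j


rng = np.random.default_rng(0)
a = rng.standard_normal((5, 5)) + 1j * rng.standard_normal((5, 5))
re_a, im_a = re_part(a), im_part(a)

print(np.allclose(re_a, re_a.conj().T), np.allclose(im_a, im_a.conj().T))  # both self-adjoint
print(np.allclose(a, re_a + 1j * im_a))                                      # a = Re(a) + i Im(a)
print(np.isclose(l2_norm(a) ** 2, l2_norm(re_a) ** 2 + l2_norm(im_a) ** 2))  # Pythagoras in L^2
```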

Lemma 5.31

Let \(\Gamma \) be a group with property (T), and let \(g_1\), ..., \(g_m \in \Gamma \) and \(f: [0,\infty ) \rightarrow [0,\infty )\) be as in Definition 5.29. Let \(q_1: \Gamma \rightarrow \Gamma _1\) and \(q_2: \Gamma \rightarrow \Gamma _2\) be quotient group homomorphisms. For \(j = 1,2\), let \(\pi _j: \Gamma \rightarrow L(\Gamma _j)\) be the quotient map \(q_j\) composed with the left regular representation of \(\Gamma _j\) and let

$$\begin{aligned} X_j = \bigl ({{\,\mathrm{Re}\,}}(\pi _j(g_1)), {{\,\mathrm{Im}\,}}(\pi _j(g_1)), \dots , {{\,\mathrm{Re}\,}}(\pi _j(g_m)), {{\,\mathrm{Im}\,}}(\pi _j(g_m))\bigr ) \in L(\Gamma _j)_{{{\,\mathrm{sa}\,}}}^{2m}. \end{aligned}$$

If \(f(d_W^{(2)}(\lambda _{X_1},\lambda _{X_2})) < 1/2\), then \(\ker (q_1) = \ker (q_2)\), and hence \(\Gamma _1 \cong \Gamma _2\).

Proof

Let \(\mathcal {A}\) be a tracial \(\mathrm {W}^*\)-algebra and let \(\iota _j: L(\Gamma _j) \rightarrow \mathcal {A}\) be tracial \(\mathrm {W}^*\)-embeddings such that \(\Vert \iota _1(X_1) - \iota _2(X_2)\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^{2m}} = d_W^{(2)}(\lambda _{X_1},\lambda _{X_2})\). Note that for \(j = 1\), ..., m,

$$\begin{aligned} \Vert \iota _1(\pi _1(g_j)) - \iota _2(\pi _2(g_j))\Vert _{L^2(\mathcal {A})}^2&= \Vert \iota _1({{\,\mathrm{Re}\,}}(\pi _1(g_j))) - \iota _2({{\,\mathrm{Re}\,}}(\pi _2(g_j)))\Vert _{L^2(\mathcal {A})}^2\\&\qquad + \Vert \iota _1({{\,\mathrm{Im}\,}}(\pi _1(g_j))) - \iota _2({{\,\mathrm{Im}\,}}(\pi _2(g_j)))\Vert _{L^2(\mathcal {A})}^2 \\&\le \Vert \iota _1(X_1) - \iota _2(X_2)\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^{2m}}^2. \end{aligned}$$

Let \(\pi : \Gamma \rightarrow B(L^2(\mathcal {A}))\) be the map given by \(\pi (g) \xi = \iota _1(\pi _1(g)) \xi \iota _2(\pi _2(g^{-1}))\) for \(\xi \in L^2(\mathcal {A})\); note that this is a unitary representation. The vector \({\widehat{1}}\) in \(L^2(\mathcal {A})\) satisfies

$$\begin{aligned} \Vert \pi (g_j) {\widehat{1}} - {\widehat{1}}\Vert _{L^2(\mathcal {A})}&= \Vert \iota _1(\pi _1(g_j)) {\widehat{1}} - {\widehat{1}}\iota _2(\pi _2(g_j))\Vert _{L^2(\mathcal {A})} \\&= \Vert \iota _1(\pi _1(g_j)) - \iota _2(\pi _2(g_j))\Vert _{L^2(\mathcal {A})} \\&\le d_W^{(2)}(\lambda _{X_1},\lambda _{X_2}). \end{aligned}$$

Hence, by property (T), there exists some \(\eta \in L^2(\mathcal {A})\) such that \(\Vert {\widehat{1}} - \eta \Vert _{L^2(\mathcal {A})} \le f(d_W^{(2)}(\lambda _{X_1},\lambda _{X_2}))\) and \(\pi (g) \eta = \eta \) for all \(g \in \Gamma \). The latter condition implies that \(\iota _1(\pi _1(g)) \eta = \eta \iota _2(\pi _2(g))\) for all \(g \in \Gamma \). Therefore, using the triangle inequality and the fact that \(\iota _j(\pi _j(g))\) is unitary,

$$\begin{aligned} \Vert \iota _1(\pi _1(g)){\widehat{1}} - {\widehat{1}} \iota _2(\pi _2(g))\Vert _{L^2(\mathcal {A})} \le 2 \Vert {\widehat{1}}- \eta \Vert _{L^2(\mathcal {A})} \le 2 f(d_W^{(2)}(\lambda _{X_1},\lambda _{X_2})) < 1. \end{aligned}$$

Hence,

$$\begin{aligned} |\tau _{\mathcal {A}}(\iota _1(\pi _1(g))) - \tau _{\mathcal {A}}(\iota _2(\pi _2(g)))| \le \Vert \iota _1(\pi _1(g)) - \iota _2(\pi _2(g))\Vert _{L^2(\mathcal {A})} < 1. \end{aligned}$$

Now observe that

$$\begin{aligned} \tau _{\mathcal {A}}(\iota _j(\pi _j(g))) = \tau _{L(\Gamma _j)}(\pi _j(g)) = \delta _{\pi _j(g) = 1} = \delta _{g \in \ker (q_j)}. \end{aligned}$$

Since \(\delta _{g \in \ker (q_j)}\) is either zero or one and \(|\delta _{g \in \ker (q_1)} - \delta _{g \in \ker (q_2)}| < 1\), we have \(\ker (q_1) = \ker (q_2)\). \(\square \)
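The elementary estimates used in the proof of Lemma 5.31 can be tested on random unitary matrices with the normalized trace. In this sketch (names hypothetical), u and v play the roles of \(\iota _1(\pi _1(g))\) and \(\iota _2(\pi _2(g))\), the identity matrix plays the role of \({\widehat{1}}\), and \(\eta \) is an arbitrary perturbation of the identity. We check the bimodule norm identity, the bound \(|\tau (u) - \tau (v)| \le \Vert u - v\Vert _2\), and the triangle-inequality estimate before imposing \(u\eta = \eta v\).

```python
import numpy as np


def tr_normalized(x):
    return np.trace(x) / x.shape[0]


def l2_norm(x):
    return float(np.sqrt(tr_normalized(x.conj().T @ x).real))


def random_unitary(n, rng):
    """Unitary Q factor of a complex Gaussian matrix."""
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, _ = np.linalg.qr(z)
    return q


rng = np.random.default_rng(1)
n = 4
u, v = random_unitary(n, rng), random_unitary(n, rng)
one = np.eye(n)  # the vector 1-hat in L^2(M_n), viewed as a matrix

# ||pi(g) 1-hat - 1-hat||_2 with pi(g) xi = u xi v^*, compared with ||u - v||_2.
norm_identity = np.isclose(l2_norm(u @ one @ v.conj().T - one), l2_norm(u - v))

# |tr(u) - tr(v)| = |<1-hat, (u - v) 1-hat>| <= ||u - v||_2 by Cauchy-Schwarz.
trace_bound = abs(tr_normalized(u) - tr_normalized(v)) <= l2_norm(u - v) + 1e-12

# ||u - v||_2 <= 2 ||1 - eta||_2 + ||u eta - eta v||_2 for any eta;
# in the proof u eta = eta v, which leaves ||u - v||_2 <= 2 ||1 - eta||_2.
eta = one + 0.1 * (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
triangle_bound = l2_norm(u - v) <= 2 * l2_norm(one - eta) + l2_norm(u @ eta - eta @ v) + 1e-12

print(norm_identity, trace_bound, triangle_bound)
```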

We can now prove Theorem 1.8. We first show that \(\Sigma _{m,1}\) is not separable with respect to \(d_W^{(2)}\) for sufficiently large m, by a method similar to [58, Proof of Theorem 2], and then reduce to the case of two variables.

Proof (Proof of Theorem 1.8)

First, we show that \(\Sigma _{2m,1}\) is not separable for some m. Let \(\Gamma \) be a property (T) group with an uncountable family \((\Gamma _\alpha )_{\alpha \in I}\) of pairwise non-isomorphic quotients, as provided by Theorem 5.30. Let \(\pi _\alpha : \Gamma \rightarrow L(\Gamma _\alpha )\) be the quotient map composed with the left regular representation. Let \(g_1\), ..., \(g_m\) and \(f: [0,\infty ) \rightarrow [0,\infty )\) witness property (T), and let \(\epsilon > 0\) be small enough that \(f(\epsilon ) < 1/2\). Let

$$\begin{aligned} X_\alpha = ({{\,\mathrm{Re}\,}}(\pi _\alpha (g_1)), {{\,\mathrm{Im}\,}}(\pi _\alpha (g_1)), \dots , {{\,\mathrm{Re}\,}}(\pi _\alpha (g_m)), {{\,\mathrm{Im}\,}}(\pi _\alpha (g_m))). \end{aligned}$$

For \(\alpha \ne \beta \) in I, since \(\Gamma _\alpha \) and \(\Gamma _\beta \) are not isomorphic, Lemma 5.31 implies that \(f(d_W^{(2)}(\lambda _{X_\alpha },\lambda _{X_\beta })) \ge 1/2\); since f is increasing and \(f(\epsilon ) < 1/2\), it follows that \(d_W^{(2)}(\lambda _{X_\alpha },\lambda _{X_\beta }) \ge \epsilon \). Hence, \(\{\lambda _{X_\alpha }\}_{\alpha \in I}\) is an uncountable \(\epsilon \)-separated set in \(\Sigma _{2m,1}\) with respect to the Wasserstein distance.

To prove that \(\Sigma _{m,R}\) is not separable for general \(m > 1\) and \(R > 0\), we first observe that rescaling the non-commutative random variables by \(R'/R\) gives a bijection between \(\Sigma _{m,R}\) and \(\Sigma _{m,R'}\) that multiplies the Wasserstein distance by \(R'/R\). Hence, for each m, if we prove non-separability for one value of R, then it holds for all values of R. Furthermore, the map \(\Sigma _{m,R} \rightarrow \Sigma _{m+1,R}\) sending the law of \((X_1,\dots ,X_m)\) to the law of \((X_1,\dots ,X_m,0)\) is isometric with respect to the Wasserstein distance, as is straightforward to check. Hence, if \(\Sigma _{m,R}\) is not separable, then \(\Sigma _{m',R}\) is not separable for \(m' \ge m\). Therefore, to prove the theorem, it suffices to show that \(\Sigma _{2,R}\) is not separable for some value of R.

We already know that for some m, \(\Sigma _{m,1}\) is not separable. Hence, for some \(\epsilon > 0\), there is an uncountable family \((\mu _\alpha )_{\alpha \in I}\) of laws in \(\Sigma _{m,1}\) that is \(\epsilon \)-separated with respect to the Wasserstein distance. Let \((\mathcal {A}_\alpha , X_\alpha )\) be the GNS realization of \(\mu _\alpha \), where \(X_\alpha = (X_{\alpha ,1},\dots ,X_{\alpha ,m})\). Consider the tracial \(\mathrm {W}^*\)-algebra \(M_m(\mathcal {A}_\alpha )\) with the trace \(\tau _\alpha \otimes {{\,\mathrm{tr}\,}}_m\), and let \(Y_\alpha \in M_m(\mathcal {A}_\alpha )_{{{\,\mathrm{sa}\,}}}\) be the diagonal matrix with entries \(X_{\alpha ,1} + 4\), \(X_{\alpha ,2} + 8\), ..., \(X_{\alpha ,m} + 4m\). Let \(U_\alpha \in M_m(\mathbb {C}) \subseteq M_m(\mathcal {A}_\alpha )\) be the matrix of an m-cycle permutation. By functional calculus (taking the principal branch of the logarithm on the spectrum of \(U_\alpha \), which consists of the mth roots of unity), \(U_\alpha \) can be expressed as \(e^{iZ_\alpha }\) for some self-adjoint \(Z_\alpha \in M_m(\mathbb {C}) \subseteq M_m(\mathcal {A}_\alpha )\) with \(\Vert Z_\alpha \Vert _{L^\infty (M_m(\mathbb {C}))} \le \pi \). Since \(Z_\alpha \) has finite spectrum, interpolating \(t \mapsto e^{it}\) on \({{\,\mathrm{Spec}\,}}(Z_\alpha )\) yields a polynomial p such that \(U_\alpha = p(Z_\alpha )\); moreover, since \(U_\alpha \) is the inclusion into \(M_m(\mathcal {A}_\alpha )\) of an element of \(M_m(\mathbb {C})\) that is independent of \(\alpha \), both \(Z_\alpha \) and p can be chosen independently of \(\alpha \). We claim that \(d_W^{(2)}(\lambda _{Y_\alpha ,Z_\alpha }, \lambda _{Y_\beta ,Z_\beta }) \ge (1/K) d_W^{(2)}(\mu _\alpha ,\mu _\beta )\) for some \(K > 0\), which will imply that \(\Sigma _{2,4m+1}\) is not separable and thus prove the theorem.
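The choice of \(Z_\alpha \) and p can be made concrete for the scalar matrix \(U \in M_m(\mathbb {C})\). The Python/NumPy sketch below (helper names hypothetical) diagonalizes the m-cycle in the Fourier basis, takes principal arguments of its eigenvalues to obtain a self-adjoint Z with \(e^{iZ} = U\) and \(\Vert Z\Vert \le \pi \), and finds p by interpolating \(t \mapsto e^{it}\) on the spectrum of Z through a Vandermonde system. It also checks that \(\overline{p}(Z) = p(Z)^* = U^{-1}\).

```python
import numpy as np


def cycle_matrix(m):
    """Permutation matrix U of the m-cycle e_j -> e_{j+1 mod m} (0-based)."""
    u = np.zeros((m, m))
    for j in range(m):
        u[(j + 1) % m, j] = 1.0
    return u


def cycle_log(m):
    """Self-adjoint Z with exp(iZ) = U and ||Z|| <= pi, built in the Fourier basis.

    Column k of v is (omega^(-jk))_j / sqrt(m), an eigenvector of U with
    eigenvalue omega^k; theta holds the principal arguments in [-pi, pi].
    """
    k = np.arange(m)
    omega = np.exp(2j * np.pi / m)
    v = omega ** (-np.outer(k, k)) / np.sqrt(m)
    theta = np.angle(omega ** k)
    z = v @ np.diag(theta) @ v.conj().T
    return z, theta


def expi_hermitian(z):
    """exp(iZ) for Hermitian Z via its spectral decomposition."""
    w, q = np.linalg.eigh(z)
    return q @ np.diag(np.exp(1j * w)) @ q.conj().T


def interpolating_polynomial(theta):
    """Coefficients c (lowest degree first) with sum_n c_n t^n = exp(i t) on theta."""
    return np.linalg.solve(np.vander(theta, increasing=True), np.exp(1j * theta))


def poly_of_matrix(coeffs, z):
    """Evaluate sum_n coeffs[n] z^n by Horner's rule."""
    result = np.zeros(z.shape, dtype=complex)
    for c in coeffs[::-1]:
        result = result @ z + c * np.eye(z.shape[0])
    return result


for m in range(2, 7):
    u = cycle_matrix(m)
    z, theta = cycle_log(m)
    p = interpolating_polynomial(theta)
    print(m,
          np.linalg.norm(z, 2) <= np.pi + 1e-9,           # ||Z|| <= pi
          np.allclose(expi_hermitian(z), u),              # exp(iZ) = U
          np.allclose(poly_of_matrix(p, z), u),           # p(Z) = U
          np.allclose(poly_of_matrix(p.conj(), z), u.T))  # p-bar(Z) = U^{-1}
```

For m = 2 the cycle has eigenvalue \(-1\), so any self-adjoint logarithm has norm at least \(\pi \); the bound \(\pi \) is therefore the natural one for this construction.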

To accomplish this, we will express \(X_{\alpha ,j} \otimes I_m\) in \(M_m(\mathcal {A}_\alpha )\) as a function of \(Y_\alpha \) and \(Z_\alpha \) (in an explicit way that allows us to estimate Wasserstein distances), using a well-known matrix amplification trick. We first recall a foundational result: the weak-\(*\) topology of a \(\mathrm {W}^*\)-algebra can be recovered from any faithful normal representation on a Hilbert space; see e.g. [64, Corollary 1.13.3, Proposition 1.16.2, Theorem 1.16.7]. In particular, \(\mathcal {A}_\alpha \) can be faithfully represented on \(H = L^2(\mathcal {A}_\alpha )\) and \(M_m(\mathcal {A}_\alpha ) = \mathcal {A}_\alpha \otimes M_m(\mathbb {C})\) can be faithfully represented on the Hilbert space \(H \otimes \mathbb {C}^m = H^{\oplus m}\). Moreover, all the facts about spectral theory and functional calculus on B(H) and \(B(H^{\oplus m})\) can be applied to the operators from \(\mathcal {A}_\alpha \) and \(M_m(\mathcal {A}_\alpha )\). In particular,

$$\begin{aligned} {{\,\mathrm{Spec}\,}}(Y_\alpha ) = \bigcup _{j=1}^m ({{\,\mathrm{Spec}\,}}(X_{\alpha ,j}) + 4j) \subseteq \bigcup _{j=1}^m [4j-1,4j+1]. \end{aligned}$$

Let \(\gamma _j\) be the positively oriented rectangular contour in \(\mathbb {C}\) bounding the rectangle \([4j-2,4j+2] \times [-1,1]\), so that \(\gamma _j\) is separated from \({{\,\mathrm{Spec}\,}}(Y_\alpha )\) by a distance of 1. For \(k = 1\), ..., m, the spectrum of \(X_{\alpha ,k} + 4k\) is contained in \([4k-1,4k+1]\), which lies inside \(\gamma _j\) if \(k = j\) and outside \(\gamma _j\) otherwise. Hence, using the Cauchy integral formula and functional calculus,

$$\begin{aligned} \frac{1}{2\pi i} \int _{\gamma _j} (z - 4j)(z - X_{\alpha ,k} - 4k)^{-1}\,dz = \delta _{j,k} X_{\alpha ,k}. \end{aligned}$$

Since \(Y_\alpha \) is diagonal with diagonal entries \(X_{\alpha ,k} + 4k\), it follows that

$$\begin{aligned} \frac{1}{2\pi i} \int _{\gamma _j} (z - 4j)(z - Y_\alpha )^{-1}\,dz = X_{\alpha ,j} \otimes e_{j,j}, \end{aligned}$$

where \(e_{j,j}\) is the jth diagonal matrix unit in \(M_m(\mathbb {C})\). In particular, \(X_{\alpha ,j} \otimes e_{j,j} \in \mathrm {W}^*(Y_\alpha )\) and thus \(X_{\alpha ,j} \otimes e_{k,\ell } = U_\alpha ^{k-j}(X_{\alpha ,j} \otimes e_{j,j}) U_\alpha ^{j-\ell } \in \mathrm {W}^*(Y_\alpha ,Z_\alpha )\) for every \(k, \ell = 1\), ..., m; this implies that \(Y_\alpha \) and \(Z_\alpha \) generate \(M_m(\mathcal {A}_\alpha )\). Moreover,

$$\begin{aligned} X_{\alpha ,j} \otimes I_m &= \sum _{k=1}^m \frac{1}{2\pi i} \int _{\gamma _j} U_\alpha ^k (z - 4j)(z - Y_\alpha )^{-1}U_\alpha ^{-k}\,dz \\ &= \sum _{k=1}^m \frac{1}{2\pi i} \int _{\gamma _j} p(Z_\alpha )^k (z - 4j)(z - Y_\alpha )^{-1} \overline{p}(Z_\alpha )^k \,dz. \end{aligned} \tag{4.12}$$

Here \(\overline{p}\) denotes the polynomial whose coefficients are the complex conjugates of those of p, so that \(\overline{p}(Z_\alpha ) = p(Z_\alpha )^* = U_\alpha ^{-1}\) because \(Z_\alpha \) is self-adjoint.

Let \(\alpha \ne \beta \). Then an optimal coupling of \(\lambda _{Y_\alpha ,Z_\alpha }\) and \(\lambda _{Y_\beta ,Z_\beta }\) on the tracial \(\mathrm {W}^*\)-algebra \(\mathcal {B}\) produces two tracial \(\mathrm {W}^*\)-embeddings \(\iota _\alpha : M_m(\mathcal {A}_\alpha ) \rightarrow \mathcal {B}\) and \(\iota _\beta : M_m(\mathcal {A}_\beta ) \rightarrow \mathcal {B}\). Because the Cauchy integral representation (4.12) can be expressed as a Riemann integral, we have

$$\begin{aligned} \iota _\alpha (X_{\alpha ,j} \otimes I_m) = \sum _{k=1}^m \frac{1}{2\pi i} \int _{\gamma _j} p(\iota _\alpha (Z_\alpha ))^k (z - 4j)(z - \iota _\alpha (Y_\alpha ))^{-1} \overline{p}(\iota _\alpha (Z_\alpha ))^k \,dz, \end{aligned}$$

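and the same holds for \(\beta \). For z on \(\gamma _j\), both resolvents below have operator norm at most 1, because z lies at distance at least 1 from \({{\,\mathrm{Spec}\,}}(\iota _\alpha (Y_\alpha )) = {{\,\mathrm{Spec}\,}}(Y_\alpha )\) and from \({{\,\mathrm{Spec}\,}}(\iota _\beta (Y_\beta )) = {{\,\mathrm{Spec}\,}}(Y_\beta )\). Hence, using the resolvent identity and non-commutative Hölder’s inequality,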

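$$\begin{aligned}&\Vert (z - \iota _\alpha (Y_\alpha ))^{-1} - (z -\iota _\beta (Y_\beta ))^{-1}\Vert _{L^2(\mathcal {B})} \\&\quad = \Vert (z - \iota _\alpha (Y_\alpha ))^{-1} (\iota _\alpha (Y_\alpha ) - \iota _\beta (Y_\beta )) (z - \iota _\beta (Y_\beta ))^{-1}\Vert _{L^2(\mathcal {B})} \\&\quad \le \Vert (z - \iota _\alpha (Y_\alpha ))^{-1}\Vert _{L^\infty (\mathcal {B})} \Vert \iota _\alpha (Y_\alpha ) - \iota _\beta (Y_\beta )\Vert _{L^2(\mathcal {B})} \Vert (z - \iota _\beta (Y_\beta ))^{-1}\Vert _{L^\infty (\mathcal {B})} \\&\quad \le \Vert \iota _\alpha (Y_\alpha ) - \iota _\beta (Y_\beta )\Vert _{L^2(\mathcal {B})}. \end{aligned}$$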
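As a finite-dimensional sanity check of this estimate (not part of the original argument), the following Python/NumPy sketch replaces \(\iota _\alpha (Y_\alpha )\) and \(\iota _\beta (Y_\beta )\) by random Hermitian matrices with spectrum in \([3,5]\) (the case j = 1), samples a few points z on \(\gamma _1\), and verifies the resolvent identity and the resulting normalized Hilbert–Schmidt bound. The helper names and sample points are illustrative assumptions.

```python
import numpy as np


def l2_norm(x):
    """||x||_2 = tr_n(x^* x)^{1/2} for the normalized trace tr_n."""
    return float(np.sqrt(np.trace(x.conj().T @ x).real / x.shape[0]))


def random_selfadjoint(n, rng, radius):
    """Random Hermitian n x n matrix rescaled to operator norm `radius`."""
    g = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    h = (g + g.conj().T) / 2
    return radius * h / np.linalg.norm(h, 2)


rng = np.random.default_rng(2)
n = 6
eye = np.eye(n)
# Stand-ins for iota_alpha(Y_alpha), iota_beta(Y_beta) with j = 1: spectra in [3, 5].
ya = 4.0 * eye + random_selfadjoint(n, rng, 1.0)
yb = 4.0 * eye + random_selfadjoint(n, rng, 1.0)
# Sample points on gamma_1, the boundary of [2, 6] x [-1, 1].
zs = [2.0 + 0.3j, 6.0 - 0.7j, 4.0 + 1.0j, 3.5 - 1.0j]

identity_checks, bound_checks = [], []
for z in zs:
    ra = np.linalg.inv(z * eye - ya)
    rb = np.linalg.inv(z * eye - yb)
    # Resolvent identity: R_a - R_b = R_a (ya - yb) R_b.
    identity_checks.append(np.allclose(ra - rb, ra @ (ya - yb) @ rb))
    # Hoelder with ||R_a||, ||R_b|| <= 1 gives ||R_a - R_b||_2 <= ||ya - yb||_2.
    bound_checks.append(l2_norm(ra - rb) <= l2_norm(ya - yb) + 1e-12)

print(all(identity_checks), all(bound_checks))
```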

Furthermore, one checks easily that \(\Vert p(\iota _\alpha (Z_\alpha )) - p(\iota _\beta (Z_\beta ))\Vert _{L^2(\mathcal {B})} \le C_p \Vert \iota _\alpha (Z_\alpha ) - \iota _\beta (Z_\beta )\Vert _{L^2(\mathcal {B})}\) for some constant \(C_p\) (since \(\Vert Z_\alpha \Vert _{L^\infty (M_m(\mathcal {A}_\alpha ))}\) is bounded by a universal constant). By estimating the difference between \(p(\iota _\alpha (Z_\alpha ))^k (z - 4j)(z - \iota _\alpha (Y_\alpha ))^{-1} \overline{p}(\iota _\alpha (Z_\alpha ))^k\) and \(p(\iota _\beta (Z_\beta ))^k (z - 4j)(z - \iota _\beta (Y_\beta ))^{-1} \overline{p}(\iota _\beta (Z_\beta ))^k\) and applying the triangle inequality for integrals, we obtain for some constant \(C_p'\) that

$$\begin{aligned}&\Vert \iota _\alpha (X_{\alpha ,j} \otimes I_m) - \iota _\beta (X_{\beta ,j} \otimes I_m)\Vert _{L^2(\mathcal {B})} \\&\qquad \le C_p' \left( \Vert \iota _\alpha (Y_\alpha ) - \iota _\beta (Y_\beta )\Vert _{L^2(\mathcal {B})}^2 + \Vert \iota _\alpha (Z_\alpha ) - \iota _\beta (Z_\beta )\Vert _{L^2(\mathcal {B})}^2 \right) ^{1/2}. \end{aligned}$$

Since \((X_{\alpha ,1} \otimes I_m, \dots , X_{\alpha ,m} \otimes I_m)\) has the same non-commutative law as \(X_\alpha \), we obtain

$$\begin{aligned} \epsilon \le d_W^{(2)}(\lambda _{X_\alpha },\lambda _{X_\beta }) \le m^{1/2} C_p' d_W^{(2)}(\lambda _{(Y_\alpha ,Z_\alpha )},\lambda _{(Y_\beta ,Z_\beta )}). \end{aligned}$$

Hence, \(\{\lambda _{(Y_\alpha ,Z_\alpha )}\}_{\alpha \in I}\) is \(\epsilon / (m^{1/2} C_p')\)-separated in \(\Sigma _{2,4m+1}\), as desired. \(\square \)
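The matrix amplification trick in the proof above can be tested numerically in finite dimensions, with the \(X_{\alpha ,j}\) replaced by small Hermitian matrices of norm less than 1 (a simplifying assumption; in the proof they are arbitrary self-adjoint operators in \(\mathcal {A}_\alpha \) of norm at most 1). The Python/NumPy sketch below approximates \(\frac{1}{2\pi i}\int _{\gamma _j}(z-4j)(z-Y)^{-1}\,dz\) by Gauss–Legendre quadrature on each side of the rectangle, compares it with \(X_j \otimes e_{j,j}\), and then checks that conjugating by powers of the cycle U and summing gives \(X_j \otimes I_m\). In the block-matrix picture used here, \(X \otimes e_{j,k}\) corresponds to `np.kron(e_jk, X)`; the helper names are hypothetical.

```python
import numpy as np


def random_selfadjoint(d, rng, radius):
    """Random Hermitian d x d matrix rescaled to operator norm `radius`."""
    g = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    h = (g + g.conj().T) / 2
    return radius * h / np.linalg.norm(h, 2)


def matrix_unit(m, j, k):
    """Matrix unit e_{j,k} in M_m(C), with 1-based indices as in the text."""
    e = np.zeros((m, m))
    e[j - 1, k - 1] = 1.0
    return e


def rectangle_integral(f, left, right, bottom, top, nodes=64):
    """Counterclockwise contour integral of f over the boundary of
    [left, right] x [bottom, top], with Gauss-Legendre quadrature on each side."""
    x, w = np.polynomial.legendre.leggauss(nodes)
    s, ws = (x + 1) / 2, w / 2                      # rescale nodes/weights to [0, 1]
    corners = [complex(left, bottom), complex(right, bottom),
               complex(right, top), complex(left, top), complex(left, bottom)]
    total = 0
    for z0, z1 in zip(corners[:-1], corners[1:]):
        for sk, wk in zip(s, ws):
            total = total + wk * (z1 - z0) * f(z0 + sk * (z1 - z0))
    return total


rng = np.random.default_rng(3)
m, d = 3, 2                                         # m diagonal blocks of size d x d
xs = {j: random_selfadjoint(d, rng, 0.9) for j in range(1, m + 1)}
# Y = diag(X_1 + 4, ..., X_m + 4m); np.kron(e_jj, X) places X in block (j, j).
y = sum(np.kron(matrix_unit(m, j, j), xs[j] + 4 * j * np.eye(d)) for j in xs)
cycle = sum(matrix_unit(m, j % m + 1, j) for j in range(1, m + 1))  # e_j -> e_{j+1}
u = np.kron(cycle, np.eye(d))
eye = np.eye(m * d)

contour_ok, amplification_ok = {}, {}
for j in xs:
    def resolvent_term(z, j=j):
        return (z - 4 * j) * np.linalg.inv(z * eye - y)
    block = rectangle_integral(resolvent_term, 4 * j - 2, 4 * j + 2, -1, 1) / (2 * np.pi * 1j)
    contour_ok[j] = np.allclose(block, np.kron(matrix_unit(m, j, j), xs[j]), atol=1e-9)
    # sum_k U^k (X_j (x) e_jj) U^{-k} = X_j (x) I_m, i.e. np.kron(I_m, X_j).
    amplified = sum(np.linalg.matrix_power(u, k) @ block @ np.linalg.matrix_power(u, k).T
                    for k in range(1, m + 1))
    amplification_ok[j] = np.allclose(amplified, np.kron(np.eye(m), xs[j]), atol=1e-9)

print(contour_ok, amplification_ok)
```

Gauss–Legendre quadrature is used because the integrand is analytic in a neighborhood of each side (the spectrum stays at distance 1 from \(\gamma _j\)), so a modest number of nodes already gives near machine precision.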

We remark that a similar non-separability result in the context of model theory for operator algebras was shown in [4, Proposition 4.2.9]. In the model-theoretic context, one often encounters triples \((\Omega ,\mathscr {T},d)\) where \((\Omega ,\mathscr {T})\) is a topological space and d is a metric on \(\Omega \) that is lower semi-continuous with respect to \(\mathscr {T}\) and generates a topology that is at least as strong as \(\mathscr {T}\); such a triple \((\Omega ,\mathscr {T},d)\) is called a topometric space [9]. In particular, \(\Sigma _{m,R}\) with the weak-\(*\) topology and Wasserstein distance is a topometric space by Corollary 5.17 and Lemma 5.18. It was shown in [9, Proposition 3.20] that if \((\Omega ,\mathscr {T},d)\) is a topometric space and \((\Omega ,\mathscr {T})\) is second countable and locally compact, then the density character of \((\Omega ,d)\) is either countable or greater than or equal to the continuum. Since \(\Sigma _{m,R}\) with the weak-\(*\) topology is compact and metrizable, it is in particular second countable and locally compact, so this result applies; it also has cardinality at most the continuum. Hence, as a corollary of Theorem 1.8, the density character of \((\Sigma _{m,R},d_W^{(2)})\) is exactly the continuum.

6 Further Remarks

6.1 Non-commutative optimal couplings and random matrix theory

One of the motivations for our paper was the following question.

Question 6.1

Suppose that \(X^{(N)}\), \(Y^{(N)}\) are random m-tuples of self-adjoint \(N \times N\) matrices with probability distributions \(\mu ^{(N)}\) and \(\nu ^{(N)}\), respectively. Let \(\mu \), \(\nu \in \Sigma _{m,R}\). Suppose that almost surely

$$\begin{aligned}&\limsup _{N \rightarrow \infty } \Vert X^{(N)}\Vert _{L^\infty (M_N(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^m}< R, \qquad \limsup _{N \rightarrow \infty } \Vert Y^{(N)}\Vert _{L^\infty (M_N(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^m} < R, \\&\qquad \lim _{N \rightarrow \infty } \lambda _{X^{(N)}} = \mu , \qquad \qquad \qquad \qquad \lim _{N \rightarrow \infty } \lambda _{Y^{(N)}} = \nu . \end{aligned}$$

Does the classical \(L^2\)-Wasserstein distance of \(\mu ^{(N)}\) and \(\nu ^{(N)}\) (as probability measures on \(M_N(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\)) converge to the non-commutative \(L^2\)-Wasserstein distance of \(\mu \) and \(\nu \)?

The results of [23, 31] combined with [41] give a positive answer when \(\mu ^{(N)}\) is a random matrix model with density proportional to \(e^{-N^2V^{(N)}}\) where \(V^{(N)}: M_N(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m \rightarrow {\mathbb {R}}\) is a sufficiently regular convex function such as the trace of a non-commutative polynomial, and where \(\nu ^{(N)}\) has density proportional to \(e^{-N^2 \Vert X\Vert _{L^2(M_N(\mathbb {C}))_{{{\,\mathrm{sa}\,}}}^m}^2/2}\) (Gaussian). The convexity of \(V^{(N)}\) is crucial for all these arguments. By contrast, the present work shows that Question 6.1 can have a negative answer due to the obstruction of Connes-embeddability.

Proposition 6.2

Let \(X, Y \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\) be matrix tuples such that an optimal coupling of \(\lambda _X\) and \(\lambda _Y\) requires a non-Connes-embeddable tracial \(\mathrm {W}^*\)-algebra as in Corollary 5.14. Suppose \(X^{(N)}\) and \(Y^{(N)}\) are random (or even deterministic) elements of \(M_N(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\) that converge in non-commutative law to X and Y. Then the classical Wasserstein distance of the probability distributions of \(X^{(N)}\) and \(Y^{(N)}\) on \(M_N(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\) (with the \(L^2\) norm associated to the normalized trace \({{\,\mathrm{tr}\,}}_N\)) does not converge to \(d_W^{(2)}(\lambda _X,\lambda _Y)\).

Before proving the proposition, we make some preliminary observations. Let \(\Sigma _{m,R}^{{{\,\mathrm{app}\,}}}\) denote the space of Connes-embeddable non-commutative laws in \(\Sigma _{m,R}\). Let \(d_{W,{{\,\mathrm{app}\,}}}^{(2)}\) be the non-commutative Wasserstein distance on \(\Sigma _{m,R}^{{{\,\mathrm{app}\,}}}\) defined using only couplings in Connes-embeddable tracial \(\mathrm {W}^*\)-algebras. Since \(\Sigma _{m,R}^{{{\,\mathrm{app}\,}}}\) is the weak-\(*\) closure of \(\Sigma _{m,R}^{{{\,\mathrm{fin}\,}}}\), it is weak-\(*\) compact, which implies the existence of optimal Connes-embeddable couplings. Moreover, the same reasoning as in Lemma 5.18 shows that \(d_{W,{{\,\mathrm{app}\,}}}^{(2)}\) is weak-\(*\) lower semi-continuous. Of course, Corollary 5.14 shows that \(d_{W,{{\,\mathrm{app}\,}}}^{(2)}\) can be strictly greater than \(d_W^{(2)}\) (however, we do not know whether these two metrics generate the same topology on \(\Sigma _{m,R}^{{{\,\mathrm{app}\,}}}\)).

Proof

Suppose that \(X^{(N)}\) and \(Y^{(N)}\) are random variables on a diffuse probability space \((\Omega ,P)\). Let \(\mu ^{(N)}\) and \(\nu ^{(N)}\) be the classical probability distributions of \(X^{(N)}\) and \(Y^{(N)}\) as random variables with values in the vector space \(M_N(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\) equipped with the inner product associated to \({{\,\mathrm{tr}\,}}_N\). Let \({\widehat{\mu }}^{(N)}\) and \({\widehat{\nu }}^{(N)}\) be the non-commutative laws of \(X^{(N)}\) and \(Y^{(N)}\) as elements of the tracial \(\mathrm {W}^*\)-algebra \(L^\infty (\Omega ,P;M_N(\mathbb {C}))\) with the trace given by \({\mathbb {E}} \circ {{\,\mathrm{tr}\,}}_N\). A classical coupling of the probability distributions \(\mu ^{(N)}\) and \(\nu ^{(N)}\) on the probability space \((\Omega ,P)\) can be interpreted as a non-commutative coupling on the tracial \(\mathrm {W}^*\)-algebra \( (L^\infty (\Omega ,P;M_N(\mathbb {C})), {\mathbb {E}}\circ {{\,\mathrm{tr}\,}}_N)\), which is Connes-embeddable. Therefore,

$$\begin{aligned} \liminf _{N \rightarrow \infty } d_W(\mu ^{(N)},\nu ^{(N)}) \ge \liminf _{N \rightarrow \infty } d_{W,{{\,\mathrm{app}\,}}}^{(2)}({\widehat{\mu }}^{(N)},{\widehat{\nu }}^{(N)}) \ge d_{W,{{\,\mathrm{app}\,}}}^{(2)}(\lambda _X,\lambda _Y) > d_{W}^{(2)}(\lambda _X,\lambda _Y). \end{aligned}$$

Here the first inequality holds because every classical coupling is a Connes-embeddable non-commutative coupling, the second follows from the weak-\(*\) lower semi-continuity of \(d_{W,{{\,\mathrm{app}\,}}}^{(2)}\) together with the convergence of \({\widehat{\mu }}^{(N)}\) and \({\widehat{\nu }}^{(N)}\) to \(\lambda _X\) and \(\lambda _Y\), and the last is strict because optimal Connes-embeddable couplings exist, whereas by the choice of X and Y no optimal coupling of \(\lambda _X\) and \(\lambda _Y\) lives in a Connes-embeddable tracial \(\mathrm {W}^*\)-algebra. \(\square \)

This problem cannot be removed using free probabilistic regularity conditions (conditions such as finite free entropy, finite free Fisher information and so forth; see the introduction of [16] for context).

Proposition 6.3

Again, let \(X, Y \in M_n(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^m\) be as in Corollary 5.14. Let S be a free semicircular m-tuple freely independent of X and Y. Then \(X + t^{1/2}S\) and \(Y + t^{1/2}S\) have finite free microstate entropy (defined in [77]) and finite free Fisher information (defined in [79]). However, \(d_{W,{{\,\mathrm{app}\,}}}^{(2)}(\lambda _{X+t^{1/2}S},\lambda _{Y+t^{1/2}S}) > d_W^{(2)}(\lambda _{X+t^{1/2}S},\lambda _{Y+t^{1/2}S})\) for sufficiently small \(t > 0\). Hence, as in Proposition 6.2, there do not exist random matrix approximations for \(\lambda _{X+t^{1/2}S}\) and \(\lambda _{Y+t^{1/2}S}\) whose classical Wasserstein distance converges to \(d_W^{(2)}(\lambda _{X+t^{1/2}S},\lambda _{Y+t^{1/2}S})\).

Proof

By [80, Theorem 3.9], \(X + t^{1/2}S\) and \(Y + t^{1/2}S\) have finite free microstate entropy, and by [79, Corollary 6.14], they have finite free Fisher information. The free product of \(M_n(\mathbb {C})\) and \(\mathrm {W}^*(S)\) is Connes-embeddable by [80, Proposition 3.3]. Hence,

$$\begin{aligned} d_W^{(2)}(\lambda _X,\lambda _{X+t^{1/2}S}) \le d_{W,{{\,\mathrm{app}\,}}}^{(2)}(\lambda _X,\lambda _{X+t^{1/2}S}) \le (mt)^{1/2}, \end{aligned}$$

and the same holds with X replaced by Y. Thus, using the triangle inequality, \(d_{W}^{(2)}(\lambda _{X+t^{1/2}S},\lambda _{Y+t^{1/2}S}) < d_{W,{{\,\mathrm{app}\,}}}^{(2)}(\lambda _{X+t^{1/2}S},\lambda _{Y+t^{1/2}S})\) for sufficiently small \(t > 0\), since this holds at \(t = 0\). The same argument as in Proposition 6.2 rules out the possibility of the classical Wasserstein distance for random matrix models converging to \(d_W^{(2)}(\lambda _{X+t^{1/2}S},\lambda _{Y+t^{1/2}S})\). \(\square \)

Thus, at the very least, Question 6.1 needs to be reformulated using the Connes-embeddable version of the non-commutative Wasserstein distance. Even with such a modification, our results illustrate why this question is so difficult.Footnote 6 Indeed, in light of §5.4, random matrix models cannot converge in Wasserstein distance to the limiting non-commutative law unless that limiting law produces an approximately finite-dimensional tracial \(\mathrm {W}^*\)-algebra. However, “good behavior” in random matrix theory and free probability often entails generating a tracial \(\mathrm {W}^*\)-algebra that is “similar to” a free group von Neumann algebra, which is far from being approximately finite-dimensional (see e.g. [70, §XIV.3]). The random matrix question suggests a more general question in the framework of non-commutative optimal couplings.

Question 6.4

Suppose that \(\mu _n\), \(\nu _n \in \Sigma _{m,R}\) and \(\mu _n \rightarrow \mu \) and \(\nu _n \rightarrow \nu \) weak-\(*\). Under what conditions does \(d_W^{(2)}(\mu _n,\nu _n) \rightarrow d_W^{(2)}(\mu ,\nu )\)?

Monge–Kantorovich duality provides one avenue to attack this question. Indeed, suppose that \((f_n,g_n)\) are admissible pairs of E-convex functions minimizing \(\mu _n(f_n) + \nu _n(g_n)\), and that \((f,g)\) is an admissible pair minimizing \(\mu (f) + \nu (g)\). To give a positive answer to Question 6.4, it suffices to show that \(\mu _n(f_n) + \nu _n(g_n) \rightarrow \mu (f) + \nu (g)\). Suppose that we somehow show that \(f_n \rightarrow f\) and \(g_n \rightarrow g\) uniformly on each operator norm ball, so that \(\mu _n(f_n) - \mu _n(f) \rightarrow 0\) and \(\nu _n(g_n) - \nu _n(g) \rightarrow 0\).

Then it remains to show that \(\mu _n(f) \rightarrow \mu (f)\) and \(\nu _n(g) \rightarrow \nu (g)\). If f and g take finite values everywhere, then for each \(\mathcal {A}\in {\mathbb {W}}\), \(f^{\mathcal {A}}\) and \(g^{\mathcal {A}}\) will define continuous functions on \(L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), and in particular, \(\lambda \mapsto \lambda (f)\) and \(\lambda \mapsto \lambda (g)\) are continuous with respect to Wasserstein distance. However, we only assumed weak-\(*\) convergence of \(\mu _n \rightarrow \mu \) and \(\nu _n \rightarrow \nu \). Thus, in order to obtain the convergence of the Wasserstein distance, we would want the stronger condition that f and g are continuous with respect to convergence in law, that is, \(\lambda \mapsto \lambda (f)\) and \(\lambda \mapsto \lambda (g)\) are weak-\(*\) continuous on \(\Sigma _{m,R}\) for each \(R > 0\).

The examples of Monge–Kantorovich duality in [41, Lemma 9.10, Remark 9.11] use functions that are continuous with respect to the weak-\(*\) topology on \(\Sigma _{m,R}\). However, we doubt that the optimizers \((f,g)\) in the Monge–Kantorovich duality can always be chosen to be weak-\(*\) continuous. Nonetheless, it is worth investigating in future research how E-convex functions and Legendre transforms behave with respect to convergence in law.

6.2 Bimodule couplings and \({\text {UCPT}}\)-convex functions

Another operator-algebraic analog of the idea of coupling arises from bimodules over von Neumann algebras, which play an important role in many areas of the theory of von Neumann algebras. For further background, see [12, Appendix F] and [2, §13].

Definition 6.5

If A and B are \(\mathrm {W}^*\)-algebras, then a Hilbert A-B-bimodule is a Hilbert space H with an A-B-bimodule structure, such that the associated maps \(A \rightarrow B(H)\) and \(B \rightarrow B(H)\) are weak-\(*\) continuous. Given tracial \(\mathrm {W}^*\)-algebras \(\mathcal {A}= (A,\tau )\) and \(\mathcal {B}= (B,\sigma )\) and an A-B-bimodule H, we say that a vector \(\xi \in H\) is bitracial if \(\langle \xi , a\xi \rangle = \tau (a)\) for \(a \in A\) and \(\langle \xi , \xi b\rangle = \sigma (b)\) for \(b \in B\).

For example, suppose that there are tracial \(\mathrm {W}^*\)-embeddings \(\iota _1: \mathcal {A}\rightarrow \mathcal {C}\) and \(\iota _2: \mathcal {B}\rightarrow \mathcal {C}\). Then \(L^2(\mathcal {C})\) is a Hilbert \(L^\infty (\mathcal {A})\)-\(L^\infty (\mathcal {B})\)-bimodule and \(\xi = {\widehat{1}}\in L^2(\mathcal {C})\) is a bitracial vector. Thus, bimodules with bitracial vectors are a generalization of pairs of tracial \(\mathrm {W}^*\)-embeddings. In the case of a pair of embeddings \(\iota _1\) and \(\iota _2\), there is an associated factorizable map \(\iota _2^* \iota _1: \mathcal {A}\rightarrow \mathcal {B}\). In a similar way, general \(L^\infty (\mathcal {A})\)-\(L^\infty (\mathcal {B})\)-bimodules with bitracial vectors correspond to general \({\text {UCPT}}\)-maps.

Lemma 6.6

(See [2, Sect. 13.1.2]) Let \(\mathcal {A}\), \(\mathcal {B}\) be tracial \(\mathrm {W}^*\)-algebras. If H is a Hilbert \(L^\infty (\mathcal {A})\)-\(L^\infty (\mathcal {B})\)-bimodule and \(\xi \in H\) is a bitracial vector, then there exists a unique \(\Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {B})\) such that \(\langle \xi ,a\xi b\rangle = \tau _{\mathcal {B}}(\Phi (a)b)\) for all \(a \in L^\infty (\mathcal {A})\) and \(b \in L^\infty (\mathcal {B})\). Conversely, for every \(\Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {B})\), there exists a Hilbert \(L^\infty (\mathcal {A})\)-\(L^\infty (\mathcal {B})\)-bimodule H and a bitracial vector \(\xi \) satisfying \(\langle \xi ,a\xi b\rangle = \tau _{\mathcal {B}}(\Phi (a)b)\). If we further demand that H is generated by \(\xi \) as a Hilbert \(L^\infty (\mathcal {A})\)-\(L^\infty (\mathcal {B})\)-bimodule, then the pair \((H,\xi )\) is unique up to isomorphism.
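For matrix algebras, the converse half of Lemma 6.6 can be made explicit. The following Python (NumPy) sketch assumes, for illustration only, that \(\mathcal A = \mathcal B = M_n(\mathbb C)\) with the normalized trace and that \(\Phi\) is given in Kraus form \(\Phi(a) = \sum_k V_k^* a V_k\) with \(\sum_k V_k^* V_k = \sum_k V_k V_k^* = 1\) (here a mixed-unitary map). It takes H to be a direct sum of copies of \(L^2(M_n(\mathbb C))\), one per Kraus operator, with \(M_n(\mathbb C)\) acting by left and right multiplication, and \(\xi = (V_k)_k\); this H need not be generated by \(\xi\). The helper names are hypothetical.

```python
import numpy as np


def ntr(a):
    """Normalized trace tau(a) = Tr(a)/n on M_n(C)."""
    return np.trace(a) / a.shape[0]


def random_unitary(n, rng):
    """A unitary matrix from the QR decomposition of a complex Gaussian matrix."""
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, _ = np.linalg.qr(z)
    return q


def phi(a, V):
    """UCPT map in Kraus form Phi(a) = sum_k V_k^* a V_k."""
    return sum(Vk.conj().T @ a @ Vk for Vk in V)


# Hypothetical bimodule H = M_n (+) ... (+) M_n, one summand per Kraus operator.
def inner(h, g):
    """<h, g>_H = sum_k tau(h_k^* g_k), conjugate-linear in h."""
    return sum(ntr(hk.conj().T @ gk) for hk, gk in zip(h, g))


def left(a, h):
    """Left action of a in L^infty(A) on H."""
    return [a @ hk for hk in h]


def right(h, b):
    """Right action of b in L^infty(B) on H."""
    return [hk @ b for hk in h]


rng = np.random.default_rng(0)
n = 3
weights = [0.3, 0.7]
# Mixed-unitary map: V_k = sqrt(p_k) u_k, so sum V_k^* V_k = sum V_k V_k^* = 1.
V = [np.sqrt(p) * random_unitary(n, rng) for p in weights]
xi = V  # candidate bitracial vector

a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
b = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

err_left = abs(inner(xi, left(a, xi)) - ntr(a))    # <xi, a xi> = tau(a)
err_right = abs(inner(xi, right(xi, b)) - ntr(b))  # <xi, xi b> = tau(b)
err_phi = abs(inner(xi, right(left(a, xi), b)) - ntr(phi(a, V) @ b))
print(err_left, err_right, err_phi)  # all at the level of rounding error
```

The identity \(\sum_k V_k V_k^* = 1\) is what makes \(\xi\) tracial for the left action, while \(\sum_k V_k^* V_k = 1\) handles the right action, mirroring the trace-preserving and unital conditions on \(\Phi\).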

The bimodules and their associated \({\text {UCPT}}\)-maps lead to an alternative notion of couplings for non-commutative random variables.

Definition 6.7

Let \(\mu \) and \(\nu \in \Sigma _m\) be non-commutative laws, and let \((\mathcal {A},X)\) and \((\mathcal {B},Y)\) be the GNS realizations of \(\mu \) and \(\nu \) respectively. A bimodule coupling of \(\mu \) and \(\nu \) is a Hilbert \(\mathcal {A}\)-\(\mathcal {B}\)-bimodule H together with a bitracial vector \(\xi \). We define \(C_{{{\,\mathrm{bim}\,}}}(\mu ,\nu )\) to be the supremum of \(\sum _{j=1}^m \langle \xi , X_j\xi Y_j\rangle \) over all bimodule couplings of \(\mu \) and \(\nu \), or equivalently,

$$\begin{aligned} C_{{{\,\mathrm{bim}\,}}}(\mu ,\nu ) = \sup _{\Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {B})} \langle \Phi (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$

We then define \(d_{{{\,\mathrm{bim}\,}}}(\mu ,\nu )\) by

$$\begin{aligned} d_{{{\,\mathrm{bim}\,}}}(\mu ,\nu )^2 = \sum _{j=1}^m \mu (x_j^2) + \sum _{j=1}^m \nu (x_j^2) - 2 C_{{{\,\mathrm{bim}\,}}}(\mu ,\nu ). \end{aligned}$$

We remark that \(d_{{{\,\mathrm{bim}\,}}}(\mu ,\nu )\) is the infimum of \(\Vert X\xi - \xi Y\Vert \) in \(H^m\) over all Hilbert \(L^\infty (\mathcal {A})\)-\(L^\infty (\mathcal {B})\)-bimodules H with bitracial vectors \(\xi \) (this follows from (6.2) below). Moreover, the existence of optimal bimodule couplings can be deduced from the compactness of \({\text {UCPT}}(\mathcal {A},\mathcal {B})\) in the pointwise weak-\(*\) topology. The properties of \(C_{{{\,\mathrm{bim}\,}}}\) and \(d_{{{\,\mathrm{bim}\,}}}\) are quite similar to those of C and \(d_W^{(2)}\), with factorizable maps replaced by general \({\text {UCPT}}\) maps, but we will see in Corollary 6.12 that they do not agree in general. First, however, for completeness, we give proofs of some of the basic properties with the aid of the following lemma.

Lemma 6.8

Let \(\mathcal {A}\) and \(\mathcal {B}\) be tracial \(\mathrm {W}^*\)-algebras and \(\Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {B})\). Let \(X \in L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) and \(Y \in L^\infty (\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\) with \(\Vert X\Vert _{L^\infty (\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} \le R\) and \(\Vert Y\Vert _{L^\infty (\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} \le R\). Then for \(i_1\), ..., \(i_\ell \in \{1,\dots ,m\}\), we have

$$\begin{aligned}&\Vert \Phi (X_{i_1} \dots X_{i_\ell }) - Y_{i_1} \dots Y_{i_\ell }\Vert _{L^2(\mathcal {B})} \nonumber \\&\quad \le \ell R^{\ell - 1} \left( \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - 2 \langle \Phi (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} + \Vert Y\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 \right) ^{1/2}. \end{aligned}$$
(6.1)

Proof

Let H be a Hilbert \(L^\infty (\mathcal {A})\)-\(L^\infty (\mathcal {B})\)-bimodule with a bitracial vector \(\xi \) such that \(\langle \Phi (Z),W\rangle _{L^2(\mathcal {B})} = \langle Z \xi , \xi W\rangle _H\) for all \(Z \in L^\infty (\mathcal {A})\) and \(W \in L^\infty (\mathcal {B})\); such a pair exists by Lemma 6.6. Since \(\Phi \) is a contraction with respect to the \(L^2\) norms, direct computation shows that

$$\begin{aligned} \Vert \Phi (Z) - W\Vert _{L^2(\mathcal {B})}^2&= \Vert \Phi (Z)\Vert _{L^2(\mathcal {B})}^2 - 2 {{\,\mathrm{Re}\,}}\langle \Phi (Z),W\rangle _{L^2(\mathcal {B})} + \Vert W\Vert _{L^2(\mathcal {B})}^2 \nonumber \\&\le \Vert Z\Vert _{L^2(\mathcal {A})}^2 - 2 {{\,\mathrm{Re}\,}}\langle \Phi (Z),W\rangle _{L^2(\mathcal {B})} + \Vert W\Vert _{L^2(\mathcal {B})}^2 \nonumber \\&= \Vert Z \xi - \xi W\Vert ^2. \end{aligned}$$
(6.2)

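Applying (6.2) with \(Z = X_{i_1} \dots X_{i_\ell }\) and \(W = Y_{i_1} \dots Y_{i_\ell }\), and then telescoping, we obtain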

$$\begin{aligned}&\Vert \Phi (X_{i_1} \dots X_{i_\ell }) - Y_{i_1} \dots Y_{i_\ell }\Vert _{L^2(\mathcal {B})}\\&\quad \le \Vert X_{i_1} \dots X_{i_\ell } \xi - \xi Y_{i_1} \dots Y_{i_\ell }\Vert \\&\quad \le \sum _{k=1}^\ell \Vert X_{i_1} \dots X_{i_k} \xi Y_{i_{k+1}} \dots Y_{i_\ell } - X_{i_1} \dots X_{i_{k-1}} \xi Y_{i_k} \dots Y_{i_\ell }\Vert \\&\quad \le \sum _{k=1}^\ell \Vert X_{i_1} \dots X_{i_{k-1}}\Vert _{L^\infty (\mathcal {A})} \Vert X_{i_k}\xi - \xi Y_{i_k}\Vert \Vert Y_{i_{k+1}} \dots Y_{i_\ell }\Vert _{L^\infty (\mathcal {B})} \\&\quad \le \ell R^{\ell -1} \Vert X\xi - \xi Y\Vert \\&\quad = \ell R^{\ell -1} \left( \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - 2 \langle \Phi (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} + \Vert Y\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 \right) ^{1/2}. \end{aligned}$$

\(\square \)
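Inequality (6.1) and identity (6.2) can be spot-checked numerically in a matrix setting. The Python sketch below assumes \(\mathcal A = \mathcal B = M_4(\mathbb C)\) with the normalized trace, a mixed-unitary map \(\Phi(a) = \sum_k V_k^* a V_k\), and the bimodule \(H = \bigoplus_k L^2(M_4(\mathbb C))\) with \(\xi = (V_k)_k\), which realizes \(\Phi\) in the sense of Lemma 6.6; the random data and helper names are illustrative only. It compares \(\Vert X\Vert^2 - 2\langle \Phi(X),Y\rangle + \Vert Y\Vert^2\) with \(\Vert X\xi - \xi Y\Vert^2\) and records the largest ratio of the two sides of (6.1) over all words of length at most 3.

```python
import itertools
from functools import reduce

import numpy as np


def ntr(a):
    """Normalized trace on M_n(C)."""
    return np.trace(a) / a.shape[0]


def l2(a):
    """L^2 norm ||a||_2 = tau(a^* a)^{1/2}."""
    return np.sqrt(ntr(a.conj().T @ a).real)


def random_unitary(n, rng):
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return np.linalg.qr(z)[0]


def random_selfadjoint(n, R, rng):
    """Random self-adjoint matrix rescaled to have operator norm exactly R."""
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    h = (z + z.conj().T) / 2
    return R * h / np.linalg.norm(h, 2)


def phi(a, V):
    """Phi(a) = sum_k V_k^* a V_k."""
    return sum(Vk.conj().T @ a @ Vk for Vk in V)


rng = np.random.default_rng(1)
n, m, R = 4, 2, 1.0
V = [np.sqrt(0.5) * random_unitary(n, rng) for _ in range(2)]
X = [random_selfadjoint(n, R, rng) for _ in range(m)]
Y = [random_selfadjoint(n, R, rng) for _ in range(m)]

# ||X||^2 - 2 <Phi(X), Y> + ||Y||^2 (X, Y self-adjoint, so the pairing is real).
cost = sum(l2(Xj) ** 2 - 2 * ntr(phi(Xj, V) @ Yj).real + l2(Yj) ** 2
           for Xj, Yj in zip(X, Y))
# The same quantity as ||X xi - xi Y||^2 in H, with xi = (V_k)_k.
cost_h = sum(l2(Xj @ Vk - Vk @ Yj) ** 2 for Xj, Yj in zip(X, Y) for Vk in V)

worst_ratio = 0.0  # max of LHS / RHS of (6.1) over the words tested
for ell in range(1, 4):
    for word in itertools.product(range(m), repeat=ell):
        Xw = reduce(np.matmul, [X[i] for i in word])
        Yw = reduce(np.matmul, [Y[i] for i in word])
        lhs = l2(phi(Xw, V) - Yw)
        rhs = ell * R ** (ell - 1) * np.sqrt(max(cost, 0.0))
        worst_ratio = max(worst_ratio, lhs / rhs)

print(abs(cost - cost_h), worst_ratio)  # rounding error, and a ratio <= 1
```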

Proposition 6.9

\((\Sigma _{m,R},d_{{{\,\mathrm{bim}\,}}})\) is a complete metric space. If \(\lambda \), \(\mu \in \Sigma _{m,R}\), then

$$\begin{aligned} |\lambda (x_{i_1} \dots x_{i_\ell }) - \mu (x_{i_1} \dots x_{i_\ell })| \le \ell R^{\ell - 1} d_{{{\,\mathrm{bim}\,}}}(\lambda ,\mu ) \le \ell R^{\ell - 1} d_W^{(2)}(\lambda ,\mu ), \end{aligned}$$
(6.3)

and in particular, the topology generated by \(d_{{{\,\mathrm{bim}\,}}}\) refines the weak-\(*\) topology, and the topology generated by \(d_W^{(2)}\) refines the topology generated by \(d_{{{\,\mathrm{bim}\,}}}\). Moreover, \(d_{{{\,\mathrm{bim}\,}}}\) is lower semi-continuous on \(\Sigma _{m,R} \times \Sigma _{m,R}\) with respect to the weak-\(*\) topology.

Proof

In the following, let \(\lambda \), \(\mu \), and \(\nu \in \Sigma _{m,R}\), and let \((\mathcal {A},X)\), \((\mathcal {B},Y)\), and \((\mathcal {C},Z)\) be their respective GNS realizations.

First, we prove (6.3). If \(\Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {B})\), then using (6.1),

$$\begin{aligned}&|\lambda (x_{i_1} \dots x_{i_\ell }) - \mu (x_{i_1} \dots x_{i_\ell })| \\&\quad = |\tau _{\mathcal {B}}(\Phi (X_{i_1} \dots X_{i_\ell }) - Y_{i_1} \dots Y_{i_\ell })| \\&\quad \le \ell R^{\ell -1} \left( \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - 2 \langle \Phi (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} + \Vert Y\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 \right) ^{1/2}. \end{aligned}$$

Taking the infimum over \(\Phi \), we obtain the first inequality of (6.3). The second inequality follows because \(d_{{{\,\mathrm{bim}\,}}}(\lambda ,\mu ) \le d_W^{(2)}(\lambda ,\mu )\) since \({\text {FM}}(\mathcal {A},\mathcal {B}) \subseteq {\text {UCPT}}(\mathcal {A},\mathcal {B})\).

Next, we show that \(d_{{{\,\mathrm{bim}\,}}}\) is a metric on \(\Sigma _{m,R}\) (postponing the proof of completeness to the end). Clearly, \(d_{{{\,\mathrm{bim}\,}}}(\lambda ,\mu ) \ge 0\). If \(d_{{{\,\mathrm{bim}\,}}}(\lambda ,\mu ) = 0\), then by (6.3), we have \(\lambda = \mu \). Because every \({\text {UCPT}}\) map has a \({\text {UCPT}}\) adjoint, we have

$$\begin{aligned} C_{{{\,\mathrm{bim}\,}}}(\lambda ,\mu )= & {} \sup _{\Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {B})} \langle \Phi (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} \\= & {} \sup _{\Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {B})} \langle X, \Phi ^*(Y)\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} = C_{{{\,\mathrm{bim}\,}}}(\mu ,\lambda ), \end{aligned}$$

and hence \(d_{{{\,\mathrm{bim}\,}}}(\lambda ,\mu ) = d_{{{\,\mathrm{bim}\,}}}(\mu ,\lambda )\). To prove the triangle inequality, we use the fact that \({\text {UCPT}}\) maps are closed under composition.Footnote 7 Let \(\Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {B})\) and \(\Psi \in {\text {UCPT}}(\mathcal {B},\mathcal {C})\) be \({\text {UCPT}}\) maps corresponding to optimal bimodule couplings between \(\lambda \) and \(\mu \) and between \(\mu \) and \(\nu \) respectively, so that

$$\begin{aligned} d_{{{\,\mathrm{bim}\,}}}(\lambda ,\mu )^2&= \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - 2 \langle \Phi (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} + \Vert Y\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 \\&= \left( \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - \Vert \Phi (X)\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 \right) + \Vert \Phi (X) - Y\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 \\&\ge \Vert \Phi (X) - Y\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 \end{aligned}$$

and

$$\begin{aligned} d_{{{\,\mathrm{bim}\,}}}(\nu ,\mu )^2&= \Vert Z\Vert _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m}^2 - 2 \langle \Psi ^*(Z),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} + \Vert Y\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 \\&= \left( \Vert Z\Vert _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m}^2 - \Vert \Psi ^*(Z)\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 \right) + \Vert \Psi ^*(Z) - Y\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 \\&\ge \Vert \Psi ^*(Z) - Y\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2. \end{aligned}$$

Then

$$\begin{aligned} d_{{{\,\mathrm{bim}\,}}}(\lambda ,\nu )^2&\le \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - 2 \langle \Psi \circ \Phi (X),Z\rangle _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m} + \Vert Z\Vert _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m}^2 \\&= \left( \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - \Vert \Phi (X)\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 \right) + \Vert \Phi (X) - \Psi ^*(Z)\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 \\&\quad + \left( \Vert Z\Vert _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m}^2 - \Vert \Psi ^*(Z)\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 \right) \\&\le \left( \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - \Vert \Phi (X)\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 \right) + \Vert \Phi (X) - Y\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 \\&\quad + 2\Vert \Phi (X) - Y\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} \Vert \Psi ^*(Z) - Y\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} \\&\quad + \Vert \Psi ^*(Z) - Y\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 + \left( \Vert Z\Vert _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m}^2 - \Vert \Psi ^*(Z)\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 \right) \\&\le d_{{{\,\mathrm{bim}\,}}}(\lambda ,\mu )^2 + 2 d_{{{\,\mathrm{bim}\,}}}(\lambda ,\mu ) d_{{{\,\mathrm{bim}\,}}}(\mu ,\nu ) + d_{{{\,\mathrm{bim}\,}}}(\mu ,\nu )^2. \end{aligned}$$

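Taking square roots gives \(d_{{{\,\mathrm{bim}\,}}}(\lambda ,\nu ) \le d_{{{\,\mathrm{bim}\,}}}(\lambda ,\mu ) + d_{{{\,\mathrm{bim}\,}}}(\mu ,\nu )\), which is the triangle inequality. It follows from (6.3) that the \(d_{{{\,\mathrm{bim}\,}}}\)-topology refines the weak-\(*\) topology, and the Wasserstein topology refines the \(d_{{{\,\mathrm{bim}\,}}}\)-topology.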

Next, we show that \(d_{{{\,\mathrm{bim}\,}}}\) is lower semi-continuous with respect to the weak-\(*\) topology. Fix a non-principal ultrafilter \(\mathcal {U}\) on \({\mathbb {N}}\), and suppose that \((\lambda _n)_{n \in {\mathbb {N}}}\) and \((\mu _n)_{n \in {\mathbb {N}}}\) are sequences in \(\Sigma _{m,R}\) and \((\mathcal {A}_n,X_n)\) and \((\mathcal {B}_n,Y_n)\) are their respective GNS realizations. Let \(\lambda = \lim _{n \rightarrow \mathcal {U}} \lambda _n\) and \(\mu = \lim _{n \rightarrow \mathcal {U}} \mu _n\). Let \(\mathcal {A}= \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) and \(\mathcal {B}= \prod _{n \rightarrow \mathcal {U}} \mathcal {B}_n\). Let \(X = [X_n]_{n \in {\mathbb {N}}} \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) and \(Y = [Y_n]_{n \in {\mathbb {N}}} \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\). By Lemma 5.10, X and Y have non-commutative laws \(\lambda \) and \(\mu \) respectively. Let \(\Phi _n \in {\text {UCPT}}(\mathcal {A}_n,\mathcal {B}_n)\) be such that \(C_{{{\,\mathrm{bim}\,}}}(\lambda _n,\mu _n) = \langle \Phi _n(X_n),Y_n\rangle _{L^2(\mathcal {B}_n)_{{{\,\mathrm{sa}\,}}}^m}\). If \((Z_n)_{n \in {\mathbb {N}}}\) and \((Z_n')_{n \in {\mathbb {N}}}\) are sequences in \(\prod _{n \in {\mathbb {N}}} \mathcal {A}_n\) and if \(\lim _{n \rightarrow \mathcal {U}} \Vert Z_n - Z_n'\Vert _{L^2(\mathcal {A}_n)} = 0\), then \(\lim _{n \rightarrow \mathcal {U}} \Vert \Phi _n(Z_n) - \Phi _n(Z_n')\Vert _{L^2(\mathcal {B}_n)} = 0\) because each \(\Phi _n\) is a contraction with respect to the \(L^2\) norms on \(\mathcal {A}_n\) and \(\mathcal {B}_n\). Therefore, the equivalence class \([\Phi _n(Z_n)]_{n \in {\mathbb {N}}}\) in \(\mathcal {B}\) only depends on the equivalence class \([Z_n]_{n \in {\mathbb {N}}}\) in \(\mathcal {A}\), so that the sequence \((\Phi _n)_{n \in {\mathbb {N}}}\) produces a well-defined map \(\Phi : \mathcal {A}\rightarrow \mathcal {B}\). It is straightforward to check that \(\Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {B})\).
Let \(\Phi ': \mathrm {W}^*(X) \rightarrow \mathrm {W}^*(Y)\) be the composition of the inclusion \(\mathrm {W}^*(X) \rightarrow \mathcal {A}\), the map \(\Phi : \mathcal {A}\rightarrow \mathcal {B}\), and the trace-preserving conditional expectation \(\mathcal {B}\rightarrow \mathrm {W}^*(Y)\). Then

$$\begin{aligned} C_{{{\,\mathrm{bim}\,}}}(\lambda ,\mu )\ge & {} \langle \Phi '(X),Y\rangle _{L^2(\mathrm {W}^*(Y))_{{{\,\mathrm{sa}\,}}}^m} = \langle \Phi (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} \\= & {} \lim _{n \rightarrow \mathcal {U}} \langle \Phi _n(X_n),Y_n\rangle _{L^2(\mathcal {B}_n)} = \lim _{n \rightarrow \mathcal {U}} C_{{{\,\mathrm{bim}\,}}}(\lambda _n,\mu _n). \end{aligned}$$

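Since \(\Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 = \sum _{j=1}^m \lambda (x_j^2) = \lim _{n \rightarrow \mathcal {U}} \sum _{j=1}^m \lambda _n(x_j^2)\), and similarly \(\Vert Y\Vert _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}^2 = \lim _{n \rightarrow \mathcal {U}} \sum _{j=1}^m \mu _n(x_j^2)\), this implies that \(d_{{{\,\mathrm{bim}\,}}}(\lambda ,\mu ) \le \lim _{n \rightarrow \mathcal {U}} d_{{{\,\mathrm{bim}\,}}}(\lambda _n, \mu _n)\). Since the sequences and the ultrafilter \(\mathcal {U}\) were arbitrary, \(d_{{{\,\mathrm{bim}\,}}}\) is weak-\(*\) lower semi-continuous as desired.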

Finally, we show that \((\Sigma _{m,R},d_{{{\,\mathrm{bim}\,}}})\) is complete. Let \((\lambda _n)_{n \in {\mathbb {N}}}\) be a Cauchy sequence with respect to \(d_{{{\,\mathrm{bim}\,}}}\). Using (6.3), for each \(i_1\), ..., \(i_\ell \in \{1,\dots ,m\}\), the sequence \((\lambda _n(x_{i_1} \dots x_{i_\ell }))_{n \in {\mathbb {N}}}\) is Cauchy and hence converges in \(\mathbb {C}\) to some limit \(\lambda (x_{i_1} \dots x_{i_\ell })\). Extend \(\lambda \) linearly to a map \(\mathbb {C}\langle x_1,\dots ,x_m\rangle \rightarrow \mathbb {C}\); it is then straightforward to check that \(\lambda \in \Sigma _{m,R}\) using Definition 2.25, and \(\lambda _k \rightarrow \lambda \) weak-\(*\) by construction. Then because \(d_{{{\,\mathrm{bim}\,}}}\) is weak-\(*\) lower semi-continuous,

$$\begin{aligned} d_{{{\,\mathrm{bim}\,}}}(\lambda _n,\lambda ) \le \liminf _{k \rightarrow \infty } d_{{{\,\mathrm{bim}\,}}}(\lambda _n, \lambda _k) \le \sup _{k \ge n} d_{{{\,\mathrm{bim}\,}}}(\lambda _n, \lambda _k). \end{aligned}$$

The right-hand side goes to zero as \(n \rightarrow \infty \) because \((\lambda _n)_{n \in {\mathbb {N}}}\) was assumed to be Cauchy in \(d_{{{\,\mathrm{bim}\,}}}\). This shows that \(\lambda _n \rightarrow \lambda \) in \(d_{{{\,\mathrm{bim}\,}}}\) as desired. \(\square \)
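The symmetry and triangle-inequality steps above rely on two closure properties of UCPT maps: every \(\Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {B})\) has an adjoint \(\Phi ^* \in {\text {UCPT}}(\mathcal {B},\mathcal {A})\), and compositions of UCPT maps are UCPT. For maps on \(M_n(\mathbb C)\) in Kraus form \(\Phi(a) = \sum_k V_k^* a V_k\), both are explicit: \(\Phi^*(b) = \sum_k V_k b V_k^*\), and \(\Psi \circ \Phi\) has Kraus operators \(V_k W_l\) when \(\Psi(b) = \sum_l W_l^* b W_l\). The following Python sketch, which assumes mixed-unitary maps and the normalized trace purely for illustration, checks these formulas numerically; the helper names are hypothetical.

```python
import numpy as np


def ntr(a):
    """Normalized trace on M_n(C)."""
    return np.trace(a) / a.shape[0]


def random_unitary(n, rng):
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return np.linalg.qr(z)[0]


def random_selfadjoint(n, rng):
    z = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return (z + z.conj().T) / 2


def phi(a, V):
    """Phi(a) = sum_k V_k^* a V_k."""
    return sum(Vk.conj().T @ a @ Vk for Vk in V)


def adjoint_kraus(V):
    """Kraus operators of Phi^*, since Phi^*(b) = sum_k V_k b V_k^*."""
    return [Vk.conj().T for Vk in V]


def compose_kraus(V, W):
    """Kraus operators (V_k W_l) of Psi o Phi, where Phi ~ V and Psi ~ W."""
    return [Vk @ Wl for Vk in V for Wl in W]


rng = np.random.default_rng(2)
n = 3
V = [np.sqrt(p) * random_unitary(n, rng) for p in (0.2, 0.8)]
W = [np.sqrt(p) * random_unitary(n, rng) for p in (0.6, 0.4)]
x, y = random_selfadjoint(n, rng), random_selfadjoint(n, rng)
I = np.eye(n)

# <Phi(x), y> = <x, Phi^*(y)> in L^2 for the normalized trace.
err_adjoint = abs(ntr(phi(x, V) @ y) - ntr(x @ phi(y, adjoint_kraus(V))))
# Phi^* is again unital and trace-preserving.
adjoint_unital = np.allclose(phi(I, adjoint_kraus(V)), I)
adjoint_tp = np.isclose(ntr(phi(y, adjoint_kraus(V))), ntr(y))
# Psi(Phi(x)) agrees with the map built from the composed Kraus operators.
err_compose = np.abs(phi(phi(x, V), W) - phi(x, compose_kraus(V, W))).max()
print(err_adjoint, adjoint_unital, adjoint_tp, err_compose)
```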

We saw in the preceding proof that \(C_{{{\,\mathrm{bim}\,}}}(\mu ,\nu ) \ge C(\mu ,\nu )\), since \({\text {FM}}(\mathcal {A},\mathcal {B}) \subseteq {\text {UCPT}}(\mathcal {A},\mathcal {B})\). In the commutative setting, we have equality by an argument similar to that of [11, Theorem 1.5]. (For further discussion of bimodules over commutative tracial \(\mathrm {W}^*\)-algebras, see [2, Example 13.1.2].)

Lemma 6.10

Let \(\mu \) and \(\nu \in \Sigma _{m,R}\) be non-commutative laws that can be realized by elements of commutative tracial \(\mathrm {W}^*\)-algebras. Then \(C_{{{\,\mathrm{bim}\,}}}(\mu ,\nu ) = C(\mu ,\nu )\), and there exists an optimal coupling in a commutative tracial \(\mathrm {W}^*\)-algebra.

Proof

Let \((\mathcal {A},X)\) and \((\mathcal {B},Y)\) be the GNS realizations of \(\mu \) and \(\nu \). Consider an optimal bimodule coupling given by a Hilbert \(\mathcal {A}\)-\(\mathcal {B}\)-bimodule H and a bitracial vector \(\xi \in H\). Let \(X_j' \in B(H)\) be the operator of left multiplication by \(X_j\), and let \(Y_j' \in B(H)\) be the operator of right multiplication by \(Y_j\). Let M be the \(\mathrm {W}^*\)-subalgebra of B(H) generated by \(X' = (X_1',\dots ,X_m')\) and \(Y' = (Y_1',\dots ,Y_m')\). Since left and right multiplications commute, \(X_i'\) and \(Y_j'\) commute, and since \(\mathcal {A}\) and \(\mathcal {B}\) are commutative, \(X_i'\) and \(X_j'\) commute and \(Y_i'\) and \(Y_j'\) commute; hence M is commutative. Let \(\tau : M \rightarrow \mathbb {C}\) be the map \(T \mapsto \langle \xi , T \xi \rangle \). Since M is commutative, \(\tau \) is a trace (it is a state and satisfies \(\tau (ab) = \tau (ba)\)). We have not shown that it is normal or faithful, but nonetheless, the map \(\gamma = \lambda _{(X',Y')}: \mathbb {C}\langle x_1,\dots ,x_{2m}\rangle \rightarrow \mathbb {C}\) given by \(p \mapsto \tau (p(X',Y'))\) is still an element of \(\Sigma _{2m,R}\) according to Definition 2.25. Moreover, since \(\xi \) was a bitracial vector for \(\mathcal {A}\) and \(\mathcal {B}\), we have \(\tau (p(X')) = \tau _{\mathcal {A}}(p(X)) = \mu (p)\) and \(\tau (p(Y')) = \tau _{\mathcal {B}}(p(Y)) = \nu (p)\). Therefore, \(\gamma \) has the marginals \(\mu \) and \(\nu \). If \((\mathcal {C},({\widehat{X}},{\widehat{Y}}))\) is the GNS realization of \(\gamma \), then \(\mathcal {C}\) is commutative because for any non-commutative polynomials p and q in 2m variables,

$$\begin{aligned} \Vert (pq - qp)({\widehat{X}},{\widehat{Y}})\Vert _{L^2(\mathcal {C})}^2= & {} \gamma [(pq - qp)^*(pq - qp)] \\= & {} \tau ((pq - qp)^*(pq-qp)(X',Y')) = 0, \end{aligned}$$

and non-commutative polynomials in \({\widehat{X}}\) and \({\widehat{Y}}\) are dense in \(L^2(\mathcal {C})\) (by Lemma 2.34). Moreover,

$$\begin{aligned} \langle {\widehat{X}},{\widehat{Y}}\rangle _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m} = \sum _{j=1}^m \gamma (x_j x_{m+j}) = \sum _{j=1}^m \tau (X_j' Y_j') = \sum _{j=1}^m \langle \xi , X_j \xi Y_j\rangle . \end{aligned}$$

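Hence, \((\mathcal {C},{\widehat{X}},{\widehat{Y}})\) is a coupling of \(\mu \) and \(\nu \) in a commutative tracial \(\mathrm {W}^*\)-algebra with \(\langle {\widehat{X}},{\widehat{Y}}\rangle _{L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m} = C_{{{\,\mathrm{bim}\,}}}(\mu ,\nu )\). Thus \(C(\mu ,\nu ) \ge C_{{{\,\mathrm{bim}\,}}}(\mu ,\nu )\); combined with the reverse inequality, this gives \(C(\mu ,\nu ) = C_{{{\,\mathrm{bim}\,}}}(\mu ,\nu )\), and this commutative coupling is optimal. \(\square \)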
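In the simplest commutative case, Lemma 6.10 reduces to a familiar finite-dimensional statement. Assume, for illustration, that \(\mu\) and \(\nu\) are uniform distributions on n distinct points \(p_1,\dots,p_n\) and \(q_1,\dots,q_n\) of \(\mathbb R^m\). Then both GNS algebras are \(\mathbb C^n\) with the uniform trace, a map \(\Phi(e_i) = \sum_j P_{ji} f_j\) between the minimal projections is UCPT exactly when the \(n \times n\) matrix P is doubly stochastic, and \(\langle \Phi(X),Y\rangle = \frac{1}{n}\sum_{i,j} P_{ji}\langle p_i,q_j\rangle\). Since this is linear in P, the Birkhoff-von Neumann theorem shows that the supremum is attained at a permutation matrix, that is, at a deterministic classical coupling. The Python sketch below compares a brute-force maximum over permutations with random doubly stochastic matrices produced by Sinkhorn scaling; the helper names and random data are hypothetical.

```python
import itertools

import numpy as np


def bimodule_value(P, p, q):
    """(1/n) sum_{i,j} P[j, i] <p_i, q_j>: the pairing <Phi(X), Y> for the
    UCPT map Phi(e_i) = sum_j P[j, i] f_j given by a doubly stochastic P."""
    n = len(p)
    G = q @ p.T  # G[j, i] = <q_j, p_i>
    return float(np.sum(P * G)) / n


def best_permutation_value(p, q):
    """max over permutations s of (1/n) sum_i <p_i, q_s(i)>, by brute force."""
    n = len(p)
    return max(float(sum(p[i] @ q[s[i]] for i in range(n))) / n
               for s in itertools.permutations(range(n)))


def sinkhorn(M, iters=500):
    """Alternately normalize rows and columns of a positive matrix so that it
    becomes (numerically) doubly stochastic."""
    P = M.copy()
    for _ in range(iters):
        P /= P.sum(axis=1, keepdims=True)
        P /= P.sum(axis=0, keepdims=True)
    return P


rng = np.random.default_rng(3)
n, m = 5, 2
p = rng.standard_normal((n, m))  # atoms p_1, ..., p_n of mu
q = rng.standard_normal((n, m))  # atoms q_1, ..., q_n of nu

c_perm = best_permutation_value(p, q)
c_random = max(bimodule_value(sinkhorn(rng.random((n, n)) + 0.1), p, q)
               for _ in range(200))
print(c_perm, c_random)  # c_random should not exceed c_perm
```

Brute force over permutations is only practical for small n; for larger n an assignment-problem solver would be the natural replacement.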

For general non-commutative laws, the inequality \(C(\mu ,\nu ) \le C_{{{\,\mathrm{bim}\,}}}(\mu ,\nu )\) can be strict, even for non-commutative laws of matrix tuples. We can deduce this from another result of Haagerup and Musat that \({\text {FM}}(M_n(\mathbb {C}),M_n(\mathbb {C}))\) is in general strictly smaller than \({\text {UCPT}}(M_n(\mathbb {C}),M_n(\mathbb {C}))\), and in particular there is an explicit non-factorizable \({\text {UCPT}}\) map on \(M_3(\mathbb {C})\).

Theorem 6.11

(Haagerup-Musat [32, Example 3.1], [33, Theorems 5.2 and 5.6]) For \(n > 1\), let \(W_n^-: M_n(\mathbb {C}) \rightarrow M_n(\mathbb {C})\) be the Holevo-Werner channel \(W_n^-(x) = \frac{1}{n-1}({{\,\mathrm{Tr}\,}}_n(x)1 - x^t)\). Then \(W_n^-\) is a \({\text {UCPT}}\) map, and it is factorizable if and only if \(n \ne 3\).
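The UCPT part of Theorem 6.11 is elementary and can be checked numerically; the factorizability statement is much deeper and cannot be tested this way. The following Python (NumPy) sketch reads \({{\,\mathrm{Tr}\,}}_n\) as the unnormalized trace on \(M_n(\mathbb C)\) (the reading under which \(W_n^-\) is unital) and uses Choi's theorem: a linear map on \(M_n(\mathbb C)\) is completely positive if and only if its Choi matrix \(\sum_{i,j} E_{ij} \otimes W_n^-(E_{ij})\) is positive semidefinite. Here that matrix equals \((1 - F)/(n-1)\), where F is the flip on \(\mathbb C^n \otimes \mathbb C^n\), so its eigenvalues are 0 and \(2/(n-1)\). The function names are hypothetical.

```python
import numpy as np


def werner_holevo_minus(x):
    """W_n^-(x) = (Tr_n(x) 1 - x^t) / (n - 1), with Tr_n the unnormalized trace."""
    n = x.shape[0]
    return (np.trace(x) * np.eye(n) - x.T) / (n - 1)


def matrix_units(n):
    """Yield the matrix units E_ij of M_n(C)."""
    for i in range(n):
        for j in range(n):
            E = np.zeros((n, n), dtype=complex)
            E[i, j] = 1.0
            yield E


def choi_matrix(channel, n):
    """Choi matrix sum_{i,j} E_ij (x) channel(E_ij)."""
    return sum(np.kron(E, channel(E)) for E in matrix_units(n))


def check_ucpt(channel, n, tol=1e-12):
    """Return (unital, trace-preserving, completely positive) for a linear map on M_n(C)."""
    I = np.eye(n)
    unital = bool(np.allclose(channel(I), I))
    # Trace preservation is linear, so it suffices to test the matrix units.
    trace_preserving = all(bool(np.isclose(np.trace(channel(E)), np.trace(E)))
                           for E in matrix_units(n))
    # Choi's theorem: completely positive iff the Choi matrix is positive semidefinite.
    completely_positive = bool(np.linalg.eigvalsh(choi_matrix(channel, n)).min() >= -tol)
    return unital, trace_preserving, completely_positive


for n in (2, 3, 4):
    print(n, check_ucpt(werner_holevo_minus, n))
```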

Combining the non-factorizability of \(W_3^-\) with Lemma 5.7, as in the proof of Corollary 5.14, we deduce the following corollary.

Corollary 6.12

There exist \(X, Y \in M_3(\mathbb {C})_{{{\,\mathrm{sa}\,}}}^{9}\) such that \(C_{{{\,\mathrm{bim}\,}}}(\lambda _X,\lambda _Y) > C(\lambda _X,\lambda _Y)\).

This shows that the metrics \(d_{{{\,\mathrm{bim}\,}}}\) and \(d_W^{(2)}\) are distinct. It is unclear to us whether \(d_{{{\,\mathrm{bim}\,}}}\) and \(d_W^{(2)}\) generate the same topology. However, the results of §5.4 about the Wasserstein distance adapt to the \({\text {UCPT}}\) setting without much difficulty. For instance, we have the following analog of Lemma 5.16.

Lemma 6.13

Let \(\mathcal {U}\) be a non-principal ultrafilter on \({\mathbb {N}}\), and let \((\mu _n)_{n \in {\mathbb {N}}}\) and \(\mu \) be non-commutative laws in \(\Sigma _{m,R}\). Let \((\mathcal {A},X)\) be the GNS realization of \(\mu \). For each n, let \(\mathcal {A}_n\) be a tracial \(\mathrm {W}^*\)-algebra and \(X_n \in L^\infty (\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m\) such that \(\lambda _{X_n} = \mu _n\). Then the following are equivalent:

  1. (1)

    \(\lim _{n \rightarrow \mathcal {U}} d_{{{\,\mathrm{bim}\,}}}(\mu _n,\mu ) = 0\).

  2. (2)

    There exists a tracial \(\mathrm {W}^*\)-embedding \(\phi : \mathcal {A}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) and there exists \(\Phi _n \in {\text {UCPT}}(\mathcal {A},\mathcal {A}_n)\) such that

    $$\begin{aligned} \phi (X) = [X_n]_{n \in {\mathbb {N}}}, \qquad \phi (Z) = [\Phi _n(Z)]_{n \in {\mathbb {N}}} \text { for all } Z \in L^\infty (\mathcal {A}). \end{aligned}$$

Proof

(1) \(\implies \) (2). By (6.3), \(\mu _n \rightarrow \mu \) weak-\(*\) as \(n \rightarrow \mathcal {U}\), so by Lemma 5.10, there is a tracial \(\mathrm {W}^*\)-embedding \(\phi : \mathcal {A}\rightarrow \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) with \(\phi (X) = [X_n]_{n \in {\mathbb {N}}}\). Let \(\Phi _n \in {\text {UCPT}}(\mathcal {A},\mathcal {A}_n)\) be such that \(\langle \Phi _n(X),X_n\rangle _{L^2(\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m} = C_{{{\,\mathrm{bim}\,}}}(\mu _n,\mu )\). As in the proof of Proposition 6.9, there exists \(\Phi \in {\text {UCPT}}(\mathcal {A}, \prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n)\) such that

$$\begin{aligned} \Phi (Z) = [\Phi _n(Z)]_{n \in {\mathbb {N}}} \text { for all } Z \in L^\infty (\mathcal {A}). \end{aligned}$$

It remains to show that \(\Phi = \phi \). Let \(X_n = (X_n^{(1)},\dots ,X_n^{(m)})\) and \(X = (X^{(1)},\dots ,X^{(m)})\). Using (6.1), for every \(i_1\), ..., \(i_\ell \in \{1,\dots ,m\}\), we have

$$\begin{aligned}&\Vert \Phi _n(X^{(i_1)} \dots X^{(i_\ell )}) - X_n^{(i_1)} \dots X_n^{(i_\ell )}\Vert _{L^2(\mathcal {A}_n)} \\&\quad \le \ell R^{\ell -1} \left( \Vert X\Vert _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m}^2 - 2 \langle \Phi _n(X),X_n\rangle _{L^2(\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m} + \Vert X_n\Vert _{L^2(\mathcal {A}_n)_{{{\,\mathrm{sa}\,}}}^m}^2 \right) ^{1/2} \\&\quad = \ell R^{\ell - 1} d_{{{\,\mathrm{bim}\,}}}(\mu _n,\mu ). \end{aligned}$$

Taking \(n \rightarrow \mathcal {U}\), we obtain

$$\begin{aligned} \Vert \Phi (X^{(i_1)} \dots X^{(i_\ell )}) - \phi (X^{(i_1)} \dots X^{(i_\ell )})\Vert _{L^2(\prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n)} \le \lim _{n \rightarrow \mathcal {U}} \ell R^{\ell -1} d_{{{\,\mathrm{bim}\,}}}(\mu _n,\mu ) = 0. \end{aligned}$$

Hence, by linearity, \(\Phi (p(X)) = \phi (p(X))\) for every non-commutative polynomial p. Since non-commutative polynomials in X are dense in \(L^2(\mathcal {A})\) and \(\Phi \) and \(\phi \) are both contractions with respect to the \(L^2\) norm, we have \(\Phi = \phi \).

(2) \(\implies \) (1). The proof is the same as in Lemma 5.16, so we leave the details to the reader. \(\square \)

In a completely analogous way to Proposition 5.21, one can deduce that the weak-\(*\) and \(d_{{{\,\mathrm{bim}\,}}}\) topologies agree at some point \(\mu \in \Sigma _{m,R}\) if and only if the corresponding tracial \(\mathrm {W}^*\)-algebra \(\mathcal {A}\) obtained from the GNS construction is \({\text {UCPT}}\)-stable, meaning that every tracial \(\mathrm {W}^*\)-algebra embedding from \(\mathcal {A}\) into some ultraproduct \(\prod _{n \rightarrow \mathcal {U}} \mathcal {A}_n\) of tracial \(\mathrm {W}^*\)-algebras lifts to a sequence \((\Phi _n)_{n \in {\mathbb {N}}}\) where \(\Phi _n \in {\text {UCPT}}(\mathcal {A},\mathcal {A}_n)\). Furthermore, if \(\mathcal {A}\) is Connes-embeddable, then these two conditions are also equivalent to \(\mathcal {A}\) being approximately finite-dimensional; the proof is essentially the same as that of [5, Theorem 2.6] or that of Proposition 5.26. However, it is unknown how \({\text {FM}}\)-stability and \({\text {UCPT}}\)-stability are related in the non-Connes-embeddable setting.

To circle back to Monge–Kantorovich duality, given the relationship of optimal couplings with factorizable maps on the one hand and E-convex functions on the other hand, one might wonder whether there is an alternative version of the theory of convex functions and Legendre transforms that is based on \({\text {UCPT}}\) maps rather than factorizable maps. Indeed, this is possible, and we will sketch here some of the basic properties and the parts of the proof that are different from the E-convex case.

Definition 6.14

A tracial \(\mathrm {W}^*\)-function f with values in \([-\infty ,\infty ]\) is \({\text {UCPT}}\)-convex if either f is identically \(-\infty \), or else for every \(\mathcal {A}\), \(f^{\mathcal {A}}\) is a convex and lower semi-continuous function with values in \((-\infty ,\infty ]\), and we have \(f^{\mathcal {B}}(\Phi (X)) \le f^{\mathcal {A}}(X)\) for every \(\mathcal {A}\), \(\mathcal {B}\in {\mathbb {W}}\) and \(\Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {B})\) and \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\).

Definition 6.15

The \({\text {UCPT}}\)-Legendre transform of a tracial \(\mathrm {W}^*\)-function f is the tracial \(\mathrm {W}^*\)-function \(\mathcal {K}f\) given by

$$\begin{aligned} (\mathcal {K}f)^{\mathcal {A}}(X) = \sup _{\begin{array}{c} \mathcal {B}\in {\mathbb {W}} \\ \Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {B}) \\ Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m \end{array}} \langle \Phi (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {B}}(Y). \end{aligned}$$

We have the following analog of Proposition 3.17.

Proposition 6.16

Let f and g be tracial \(\mathrm {W}^*\)-functions.

  1. (1)

    \(\mathcal {K}f\) is \({\text {UCPT}}\)-convex.

  2. (2)

    If \(f \le g\), then \(\mathcal {K}f \ge \mathcal {K} g\).

  3. (3)

    We have \(\mathcal {K}^2 f \le f\) with equality if and only if f is \({\text {UCPT}}\)-convex.

  4. (4)

    \(\mathcal {K}^2 f\) is the maximal \({\text {UCPT}}\)-convex function that is less than or equal to f.

The proof is essentially the same as that of Proposition 3.17, modulo the necessary changes to work with \({\text {UCPT}}\) maps rather than tracial \(\mathrm {W}^*\)-embeddings and conditional expectations. For instance, to show monotonicity of \(\mathcal {K}f\) under \({\text {UCPT}}\) maps, suppose that \(\Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {B})\) and \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\). If \(\Psi \in {\text {UCPT}}(\mathcal {B},\mathcal {C})\), then \(\Psi \circ \Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {C})\). Therefore,

$$\begin{aligned} \mathcal {K} f^{\mathcal {A}}(X) \ge \sup _{\begin{array}{c} \mathcal {C}\in {\mathbb {W}} \\ \Psi \in {\text {UCPT}}(\mathcal {B},\mathcal {C}) \\ Y \in L^2(\mathcal {C})_{{{\,\mathrm{sa}\,}}}^m \end{array}} \left( \langle \Psi \circ \Phi (X),Y\rangle - f^{\mathcal {C}}(Y) \right) = \mathcal {K} f^{\mathcal {B}}(\Phi (X)). \end{aligned}$$

The relationship between the \({\text {UCPT}}\) Legendre transform and the E-convex Legendre transform is as follows (compare the relationship between the E-convex Legendre transform and the Hilbert-space Legendre transform).

Corollary 6.17

Let f be a tracial \(\mathrm {W}^*\)-function.

  1. (1)

    If f is \({\text {UCPT}}\)-convex, then f is E-convex.

  2. (2)

    \(\mathcal {K}f \ge \mathcal {L} f\).

  3. (3)

    \(\mathcal {K}^2 f \le \mathcal {L}^2 f\).

  4. (4)

    If f is \({\text {UCPT}}\)-convex, then \(\mathcal {K}f = \mathcal {L}f\).

Proof

(1) and (2) are immediate from the definitions, since every tracial \(\mathrm {W}^*\)-embedding, as well as every trace-preserving conditional expectation, is a \({\text {UCPT}}\) map.

(3) Observe that \(\mathcal {K}^2 f\) is \({\text {UCPT}}\)-convex by Proposition 6.16 (1), hence E-convex by (1), and that \(\mathcal {K}^2 f \le f\) by Proposition 6.16 (3). Therefore, Proposition 3.17 (4) implies that \(\mathcal {K}^2 f \le \mathcal {L}^2 f\).

(4) We already know that \(\mathcal {L}f \le \mathcal {K}f\). For the reverse inequality, the idea is already contained in the proof of Proposition 6.16 (3). Let \(\mathcal {A}, \mathcal {B}\in {\mathbb {W}}\), \(\Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {B})\), \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\), and \(Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\). Since \(\Phi ^* \in {\text {UCPT}}(\mathcal {B},\mathcal {A})\) and f is \({\text {UCPT}}\)-convex, we have \(f^{\mathcal {A}}(\Phi ^*(Y)) \le f^{\mathcal {B}}(Y)\), and therefore

$$\begin{aligned} \langle \Phi (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {B}}(Y) \le \langle X,\Phi ^*(Y)\rangle _{L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m} - f^{\mathcal {A}}(\Phi ^*(Y)) \le \mathcal {L} f^{\mathcal {A}}(X). \end{aligned}$$

Taking the supremum over \(\mathcal {B}\), \(\Phi \), and Y, we obtain \(\mathcal {K}f \le \mathcal {L}f\). \(\square \)

The \({\text {UCPT}}\)-analog of Monge–Kantorovich duality is as follows.

Definition 6.18

A pair of tracial \(\mathrm {W}^*\)-functions \((f,g)\) with values in \((-\infty ,\infty ]\) is said to be \({\text {UCPT}}\)-admissible if for every \(\mathcal {A}\), \(\mathcal {B}\in {\mathbb {W}}\) and \(X \in L^2(\mathcal {A})_{{{\,\mathrm{sa}\,}}}^m\) and \(Y \in L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m\) and \(\Phi \in {\text {UCPT}}(\mathcal {A},\mathcal {B})\), we have

$$\begin{aligned} f^{\mathcal {A}}(X) + g^{\mathcal {B}}(Y) \ge \langle \Phi (X),Y\rangle _{L^2(\mathcal {B})_{{{\,\mathrm{sa}\,}}}^m}. \end{aligned}$$

Proposition 6.19

\(C_{{{\,\mathrm{bim}\,}}}(\mu ,\nu )\) is equal to the infimum of \(\mu (f) + \nu (g)\) over all \({\text {UCPT}}\)-admissible pairs of tracial \(\mathrm {W}^*\)-functions, as well as the infimum of \(\mu (f) + \nu (g)\) over all \({\text {UCPT}}\)-admissible pairs of \({\text {UCPT}}\)-convex functions.

The proof is the same as that of Proposition 3.23; similarly, there is a \({\text {UCPT}}\) analog of Proposition 3.24. However, although there is an analog of Monge–Kantorovich duality, there are many questions about bimodule couplings for which the answer is not immediately clear:

  • Is there a bimodule analog of the displacement interpolation?

  • Is there a bimodule analog of the \(L^p\) Wasserstein distance for \(p \ne 2\)?

  • Is there a useful subgradient characterization of \({\text {UCPT}}\)-convexity analogous to Lemma 3.10?

  • Do \(d_{{{\,\mathrm{bim}\,}}}\) and \(d_W^{(2)}\) generate the same topology on \(\Sigma _{m,R}\)?