Gibbs Manifolds

Gibbs manifolds are images of affine spaces of symmetric matrices under the exponential map. They arise in applications such as optimization, statistics and quantum physics, where they extend the ubiquitous role of toric geometry. The Gibbs variety is the zero locus of all polynomials that vanish on the Gibbs manifold. We compute these polynomials and show that the Gibbs variety is low-dimensional. Our theory is applied to a wide range of scenarios, including matrix pencils and quantum optimal transport.


Introduction
Toric varieties provide the geometric foundations for many successes in the mathematical sciences. In statistics they appear as discrete exponential families [23, p. 2], and their ideals reveal Markov bases for sampling from conditional distributions [5]. In optimization, they furnish nonnegativity certificates [7] and they govern the entropic regularization of linear programming [21]. Notable sightings in phylogenetics, stochastic analysis, Gaussian inference and chemical reaction networks gave us the slogan that the world is toric [16, Section 8.3].
In all of these applications, the key player is the positive part of the toric variety. That real manifold is identified with a convex polytope by the moment map [16, Theorem 8.24]. The fibers of the underlying linear map are polytopes of complementary dimension, and each fiber intersects the toric variety in a unique point, the Birch point. This is the unique maximizer of the entropy over the polytope [19, Theorem 1.10]. In statistical physics and computer science [25], the Birch point is known as the Gibbs distribution. The name Gibbs refers to the maximum entropy state in a quantum system, and this is also the reason behind our title.
This paper initiates a non-commutative extension of applied toric geometry. In that extension, points in R^n are replaced by real symmetric n × n matrices, and linear programming is replaced by semidefinite programming. There is a moment map which takes the cone of positive semidefinite matrices onto a spectrahedral shadow, and whose fibers are spectrahedra of complementary dimension. The Gibbs manifold plays the role of the positive toric variety. Each spectrahedron intersects the Gibbs manifold in a unique point, the Gibbs point. Just like in the toric case, we study these objects algebraically by passing to the Zariski closure of our positive manifold. The resulting analogues of toric varieties are called Gibbs varieties.
We illustrate these concepts for a linear space of symmetric 3 × 3 matrices, namely

L = span_R(A_1, A_2, A_3), where A_k = id_3 + E_ij + E_ji and (i, j) runs over (1, 2), (1, 3), (2, 3).   (1)

The Gibbs manifold GM(L) is obtained by applying the exponential function to each matrix in L. Since the matrix logarithm is the inverse to the matrix exponential, it is a 3-dimensional manifold, contained in the 6-dimensional cone int(S^3_+) of positive definite 3 × 3 matrices. Consider the quotient map from the matrix space S^3 ≃ R^6 onto S^3/L^⊥ ≃ R^3. This takes a positive semidefinite matrix X = [x_ij] to its inner products with the matrices in a basis of L:

π : S^3_+ → R^3 : X ↦ (trace(X) + 2x_12, trace(X) + 2x_13, trace(X) + 2x_23).
Precisely this map appeared in the statistical study of Gaussian models in [22, Example 1.1].
The fibers π^{-1}(b) are three-dimensional spectrahedra, and these serve as feasible regions in optimization, both for semidefinite programming and for maximum likelihood estimation. We here consider yet another convex optimization problem over the spectrahedron π^{-1}(b), namely maximizing the von Neumann entropy h(X) = trace(X − X · log(X)). This problem has a unique local and global maximum, attained at the intersection π^{-1}(b) ∩ GM(L); see Theorem 5.1. This Gibbs point is the maximizer of the entropy over the spectrahedron. Therefore, the Gibbs manifold GM(L) is the set of Gibbs points in all fibers π^{-1}(b), as b ranges over R^3.
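These maps are easy to explore numerically. The following Python sketch (the paper's own code is in Julia) uses a basis of L reconstructed from the formula for π, namely A_k = id_3 + E_ij + E_ji; this basis is an assumption on our part, since the original display defining L is not reproduced above.

```python
import numpy as np
from scipy.linalg import expm

# Basis of L reconstructed from the formula for pi: the inner product
# <A_k, X> = trace(X) + 2*x_ij forces A_k = id_3 + E_ij + E_ji.
# (This basis is an assumption; the display defining L is not shown above.)
def basis():
    A = []
    for (i, j) in [(0, 1), (0, 2), (1, 2)]:
        M = np.eye(3)
        M[i, j] = M[j, i] = 1.0
        A.append(M)
    return A

def pi(X):
    t = np.trace(X)
    return np.array([t + 2 * X[0, 1], t + 2 * X[0, 2], t + 2 * X[1, 2]])

# a point on the Gibbs manifold GM(L): the exponential of a matrix in L
A1, A2, A3 = basis()
X = expm(0.3 * A1 - 0.5 * A2 + 0.2 * A3)

assert np.all(np.linalg.eigvalsh(X) > 0)     # X is positive definite
assert np.allclose(pi(X), [np.trace(A @ X) for A in basis()])
```

The final check confirms that π(X) agrees with the inner products ⟨A_k, X⟩ for this choice of basis.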
As promised, the study of Gibbs manifolds and Gibbs varieties is a non-commutative extension of applied toric geometry. Indeed, every toric variety is a Gibbs variety arising from diagonal matrices. For instance, the toric surface {x ∈ R^3 : x_1x_3 = x_2^2} is realized as GV(L′) = {X ∈ S^3 : x_11x_33 − x_22^2 = x_12 = x_13 = x_23 = 0} for the diagonal matrix pencil L′ = {diag(2y_1, y_1 + y_2, 2y_2) : y_1, y_2 ∈ R}. However, even for diagonal matrices, the dimension of the Gibbs variety can exceed that of the Gibbs manifold. To see this, replace the matrix entry 2y_1 by √2 · y_1 in the definition of L′. This explains why transcendental number theory will make an appearance in our study.
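The toric relation and its failure under an irrational perturbation can be seen numerically. The diagonal pencil diag(2y_1, y_1 + y_2, 2y_2) used below is an illustrative choice consistent with the entry 2y_1 mentioned in the text, not necessarily the paper's displayed pencil.

```python
import numpy as np

# Illustrative diagonal pencil consistent with the text (it contains the
# entry 2*y1); this is an assumption, not the paper's displayed pencil.
def expL(y1, y2):
    return np.exp(np.array([2 * y1, y1 + y2, 2 * y2]))

# the eigenvalue relation 2*y1 + 2*y2 = 2*(y1 + y2) exponentiates to the
# binomial x1*x3 = x2^2
rng = np.random.default_rng(0)
for _ in range(5):
    x1, x2, x3 = expL(*rng.normal(size=2))
    assert abs(x1 * x3 - x2 ** 2) < 1e-9 * max(1.0, x1 * x3)

# replacing 2*y1 by sqrt(2)*y1 destroys all Q-linear relations among the
# eigenvalues, and the binomial fails
x1, x2, x3 = np.exp(np.array([np.sqrt(2) * 0.7, 0.7 + 0.3, 2 * 0.3]))
assert abs(x1 * x3 - x2 ** 2) > 1e-6
```

The second check illustrates why the Zariski closure grows after the irrational substitution: no binomial relation survives among the exponentiated eigenvalues.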
Our presentation in this paper is organized as follows. Section 2 gives a more thorough introduction to Gibbs manifolds and Gibbs varieties. Theorem 2.4 states that the dimension of the Gibbs variety is usually quite small. The proof of this result is presented in Section 3.
In that section we present algorithms for computing the prime ideal of the Gibbs variety. This is an implicitization problem, where the parametrization uses transcendental functions. We compare exact symbolic methods for solving that problem with a numerical approach. A key ingredient is the Galois group for the eigenvalues of a linear space of symmetric matrices. We implemented our algorithms in Julia, making use of the computer algebra package Oscar.jl [18]. Our code and data are available at https://mathrepo.mis.mpg.de/GibbsManifolds.
In Section 4 we study the Gibbs varieties given by two-dimensional spaces of symmetric matrices. This rests on the classical Segre–Kronecker classification of matrix pencils [6].
In Section 5 we turn to the application that led us to start this project, namely entropic regularization in convex optimization. That section develops the natural generalization of the geometric results in [21] from linear programming to semidefinite programming. We conclude in Section 6 with a study of quantum optimal transport [3]. This is the semidefinite programming analogue of the classical optimal transport problem [21, Section 3]. We show that its Gibbs manifold is the positive part of a Segre variety in matrix space.

From manifolds to varieties
We write S^n for the space of symmetric n × n matrices. This is a real vector space of dimension (n+1 choose 2). The subset of positive semidefinite matrices is denoted S^n_+. This is a full-dimensional closed semialgebraic convex cone in S^n, known as the PSD cone. The PSD cone is self-dual with respect to the trace inner product, given by ⟨X, Y⟩ := trace(XY) for X, Y ∈ S^n.
The matrix exponential function is defined by the usual power series, which converges for all real and complex n × n matrices. It maps symmetric matrices to positive definite symmetric matrices. The zero matrix 0_n is mapped to the identity matrix id_n. We write exp : S^n → int(S^n_+) for this map. It is invertible, with the inverse given by the familiar series for the logarithm, log(X) = Σ_{k≥1} (−1)^{k+1} (X − id_n)^k / k. We next introduce the geometric objects studied in this article. We fix any matrix A_0 ∈ S^n and d linearly independent matrices A_1, A_2, ..., A_d, also in S^n. We write L for the affine subspace A_0 + span_R(A_1, ..., A_d) of the vector space S^n ≃ R^{(n+1 choose 2)}. Thus, L is an affine space of symmetric matrices (ASSM) of dimension d. If A_0 = 0, then L is a linear space of symmetric matrices (LSSM). We are interested in the image of L under the exponential map, the Gibbs manifold GM(L) := exp(L). This is a d-dimensional manifold inside int(S^n_+), with the identification given by the exponential map and the logarithm map.
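The inverse relationship between exp and log on symmetric matrices can be verified numerically. The sketch below uses SciPy's expm and logm; it is an illustration, not the paper's Julia code.

```python
import numpy as np
from scipy.linalg import expm, logm

rng = np.random.default_rng(5)
G = rng.normal(size=(3, 3))
A = (G + G.T) / 2                      # a random symmetric matrix

X = expm(A)                            # a positive definite symmetric matrix
assert np.all(np.linalg.eigvalsh(X) > 0)
assert np.allclose(X, X.T)
assert np.allclose(logm(X).real, A)    # log inverts exp on symmetric matrices
```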
In notable special cases (e.g. that in Section 6), the Gibbs manifold is semi-algebraic, namely it is the intersection of an algebraic variety with the PSD cone. However, this fails in general, as seen in the Introduction. It is still interesting to ask which polynomial relations hold between the entries of any matrix in GM(L). This motivates the following definition.
Its Gibbs manifold GM(L) is a surface in S^4 ≃ R^10. The Gibbs variety GV(L) has dimension five and degree three. It consists of all symmetric matrices X = (x_ij) whose entries satisfy the relations displayed in (3). This follows from the general result on matrix pencils in Theorem 4.4. ⋄

The following dimension bounds constitute our main result on Gibbs varieties.
Theorem 2.4. Let L ⊂ S^n be an ASSM of dimension d. The dimension of the Gibbs variety GV(L) is at most n + d. If L is an LSSM, then dim GV(L) ≤ n + d − 1.

These bounds are attained in many cases, including Example 2.3. Our proof of Theorem 2.4 appears in Section 3, in the context of algorithms for computing the ideal of GV(L).
While it might be difficult to find all polynomials that vanish on the Gibbs manifold, finding linear relations is sometimes easier. Such relations are useful for semidefinite optimization; see Remark 5.2. This brings us to the final geometric object studied in this paper.
Definition 2.5. The Gibbs plane GP(L) is the smallest affine space containing GV(L).
Example 2.6. The Gibbs plane of the LSSM L from Example 2.3 is the 7-dimensional linear space in C^10 that is defined by the three linear relations listed in the first row of (3). ⋄

We claimed in the Introduction that this article offers a generalization of toric varieties. In what follows, we make that claim precise, by discussing the case when L is a commuting family. This means that the symmetric matrices A_0, A_1, ..., A_d commute pairwise, i.e. A_iA_j = A_jA_i for all i, j. We now assume that this holds. Then the ASSM L can be diagonalized, i.e. there is an orthogonal matrix V such that Λ_i = V^⊤ A_i V is a diagonal matrix, for all i. The vector λ_i ∈ R^n of diagonal entries in Λ_i = diag(λ_i) contains the eigenvalues of A_i.
The matrix exponential of any element in L can be computed as follows:

exp(A_0 + y_1A_1 + ⋯ + y_dA_d) = V · exp(Λ_0 + y_1Λ_1 + ⋯ + y_dΛ_d) · V^⊤.

Let D denote this ASSM of diagonal matrices, i.e. D = Λ_0 + span_R(Λ_1, ..., Λ_d).
Then the linear change of coordinates given by V identifies the respective Gibbs manifolds:

GM(L) = V · GM(D) · V^⊤.

The same statement holds for the Gibbs varieties and the Gibbs planes:

GV(L) = V · GV(D) · V^⊤ and GP(L) = V · GP(D) · V^⊤.   (6)

The dimensions of these objects are determined by arithmetic properties of the eigenvalues.
Recall that Λ_i = diag(λ_i), where λ_i is a vector in R^n. Let Λ denote the linear subspace of R^n that is spanned by the d vectors λ_1, ..., λ_d. We have D = λ_0 + Λ, and therefore exp(D) = exp(λ_0) ⋆ exp(Λ). Here ⋆ denotes coordinate-wise multiplication in R^n. Let Λ_Q be the smallest vector subspace of R^n which contains Λ and is spanned by elements of Q^n. Its dimension is d_Q = dim Λ_Q ≥ d, and we may fix rational vectors that form a basis of Λ_Q. Then, inside an n-dimensional linear space defined by the diagonality condition, we have

GV(D) = the closure of exp(λ_0) ⋆ exp(Λ_Q).   (7)

This is a toric variety of dimension d_Q. Just like in [21, Section 2], the closure is taken in C^n. The Gibbs manifold GM(D) is a d-dimensional subset of the real points in GV(D) for which z has strictly positive coordinates. We summarize our discussion in the following theorem.
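The diagonalization argument above can be checked numerically. The sketch below uses an illustrative commuting pair (powers of a fixed symmetric matrix B), not a pencil taken from the text.

```python
import numpy as np
from scipy.linalg import expm

# An illustrative commuting pair: A1 and A2 are polynomials in the same
# symmetric matrix B, hence they commute and share eigenvectors.
B = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
A1, A2 = B, B @ B
assert np.allclose(A1 @ A2, A2 @ A1)

_, V = np.linalg.eigh(B)               # V diagonalizes both A1 and A2
l1 = np.diag(V.T @ A1 @ V)             # eigenvalue vector lambda_1
l2 = np.diag(V.T @ A2 @ V)             # eigenvalue vector lambda_2

y1, y2 = 0.4, -0.7
lhs = expm(y1 * A1 + y2 * A2)
rhs = V @ np.diag(np.exp(y1 * l1 + y2 * l2)) @ V.T
assert np.allclose(lhs, rhs)           # exp(L) = V * exp(D) * V^T, pointwise
```

The last check is exactly the identification GM(L) = V · GM(D) · V^⊤ at a single point.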
Theorem 2.7. Let L be an affine space of pairwise commuting symmetric matrices. Then the Gibbs variety GV(L) is a toric variety of dimension d_Q, given explicitly by (6) and (7).
For an illustration, consider the seemingly simple case d = 1 and A_0 = 0. Here, GM(L) is the curve formed by all powers of exp(A_1), and GV(L) is a toric variety of generally higher dimension. This scenario is reminiscent of that studied by Galuppi and Stanojkovski in [8].
Example 2.8. Let n = 3 and consider the LSSM L spanned by a commuting pair of symmetric matrices A_1 and A_2. Then GV(D) is the toric surface {q_11^4 = q_22q_33} in GP(D) = {Q ∈ S^3 : q_12 = q_13 = q_23 = 0}. We transform that surface into the original coordinates via (6). The computation reveals GV(L) = {X ∈ GP(L) : x_23^4 − 4x ⋯ }. ⋄

This concludes our discussion of the toric Gibbs varieties that arise from pairwise commuting matrices. In the next section we turn to the general case, which requires new ideas.

Implicitization of Gibbs varieties
Implicitization is the computational problem of finding implicit equations for an object that comes in the form of a parametrization. When the parametrizing functions are rational functions, these equations are polynomials and can be found using resultants or Gröbner bases [16, Section 4.2]. A different approach rests on polynomial interpolation and numerical nonlinear algebra. This section studies the implicitization problem for Gibbs varieties. The difficulty arises from the fact that Gibbs manifolds are transcendental, since their parametrizations involve the exponential function. We start out by presenting our proof of Theorem 2.4.
As in Section 2, we fix an ASSM L = A_0 + span_R(A_1, ..., A_d) and write A(y) = A_0 + y_1A_1 + ⋯ + y_dA_d. We shall parametrize the Gibbs manifold GM(L) in terms of the coordinates y_1, ..., y_d on L. This uses the following formula.
Theorem 3.1 (Sylvester [24]). Let f : D → R be an analytic function on an open set D ⊂ R and M ∈ R^{n×n} a matrix that has n distinct eigenvalues λ_1, ..., λ_n in D. Then

f(M) = Σ_{i=1}^n f(λ_i) · Π_{j≠i} (M − λ_j · id_n)/(λ_i − λ_j).

We note that the product on the right hand side takes place in the commutative ring R[M]. Let P_L(λ; y) denote the characteristic polynomial of the matrix A(y). Its zeros λ are algebraic functions of the coordinates y = (y_1, ..., y_d) on L.
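Sylvester's formula is straightforward to implement for symmetric matrices with distinct eigenvalues. The following sketch (illustrative; the paper's code is in Julia) compares it with the built-in matrix exponential.

```python
import numpy as np
from scipy.linalg import expm

def sylvester(f, M):
    """Evaluate f(M) by Sylvester's formula; assumes the symmetric
    matrix M has distinct eigenvalues."""
    lam = np.linalg.eigvalsh(M)
    n = M.shape[0]
    F = np.zeros_like(M, dtype=float)
    for i in range(n):
        P = np.eye(n)
        for j in range(n):
            if j != i:
                # Lagrange factor (M - lam_j * id) / (lam_i - lam_j)
                P = P @ (M - lam[j] * np.eye(n)) / (lam[i] - lam[j])
        F += f(lam[i]) * P
    return F

M = np.array([[1., 2., 0.], [2., -1., 1.], [0., 1., 3.]])  # distinct eigenvalues
assert np.allclose(sylvester(lambda t: t, M), M)  # exact for f(t) = t
assert np.allclose(sylvester(np.exp, M), expm(M))
```

The first assertion reflects that the formula is Lagrange interpolation in the commutative ring R[M], exact for polynomials of degree less than n.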
We first assume that L has distinct eigenvalues, i.e. there is a Zariski open subset U ⊂ R^d such that P_L(λ; y*) has n distinct real roots λ for all y* ∈ U. Sylvester's formula writes the entries of exp(A(y)) as rational functions of y, λ_i(y) and e^{λ_i(y)} for y ∈ U. These functions are multisymmetric in the pairs (λ_i, e^{λ_i}). They evaluate to convergent power series on R^d.
Let V be the subvariety of U × R^n that is defined by the equations

σ_t(λ) = c_t(y) for t = 1, ..., n,   (8)

where σ_t(λ) is the t-th elementary symmetric polynomial evaluated at (λ_1, ..., λ_n) and c_t(y) is the corresponding coefficient of the characteristic polynomial, i.e. P_L(λ; y) = λ^n − c_1(y)λ^{n−1} + ⋯ + (−1)^n c_n(y). We have dim V = d. Define a map φ : V × R^n → S^n, using coordinates z_1, ..., z_n on R^n, as follows:

φ(y, λ, z) = Σ_{i=1}^n z_i · Π_{j≠i} (A(y) − λ_j · id_n)/(λ_i − λ_j).   (9)

The closure of the image φ(V × R^n) of this map is a variety. It contains the Gibbs variety: setting z_i = e^{λ_i} parametrizes a dense subset of the Gibbs manifold, by Theorem 3.1.
The Gibbs variety of the LSSM RL spanned by the ASSM L also lies in the closure of φ(V × R^n), because exp(y_0 · A(y)) = φ(y_0 · y, y_0 · λ, e^{y_0·λ}) for any y ∈ U and y_0 ∈ R\{0}. We thus have

dim GV(L) ≤ dim GV(RL) ≤ dim(V × R^n) = d + n.

Finally, suppose that L is an LSSM, i.e. A_0 = 0. Then L is the linear span of an ASSM of dimension d − 1 in S^n. The second inequality therefore gives dim GV(L) ≤ d + n − 1.
We finally consider the case when L has m < n distinct eigenvalues. Since symmetric matrices are diagonalizable, Sylvester's formula can easily be adapted to this case: it suffices to sum over the distinct eigenvalues of M, and to adjust the parametrization (9) accordingly. That is, we replace n by m. See [12, Chapter 6.1, Problem 14] for details.

Remark 3.2. If the points exp(λ(y)) = (e^{λ_1(y)}, ..., e^{λ_n(y)}), y ∈ U, lie in a lower-dimensional subvariety W ⊂ R^n, then the proof of Theorem 2.4 gives the better bound dim GV(L) ≤ d + dim W. We saw this in Example 2.8. In general, no such subvariety W exists, i.e. one expects W = R^n. This is an issue of Galois theory, to be discussed at the end of this section.
For ease of exposition, we work only with LSSMs in the rest of this section. That is, we set A_0 = 0. We comment on the generalization to ASSMs in Remark 3.7. Our discussion and the proof of Theorem 2.4 suggest Algorithm 1, for computing the ideal of the Gibbs variety of an LSSM L. That ideal lives in a polynomial ring R[X] whose variables are the entries of a symmetric n × n matrix.

Algorithm 1: Implicitization of the Gibbs variety of an LSSM L, defined over Q
Input: Linearly independent matrices A_1, ..., A_d ∈ S^n with rational entries
Output: Polynomials that define GV(L)
1: P_L(λ; y) ← the characteristic polynomial of A(y)
   Require: L has distinct eigenvalues
2: E′_1 ← the equations (8) in the variables y, λ
3: E_1 ← generators of an associated prime of E′_1
4: E_2 ← {the entries of φ(y, λ, z) − X}, with X = (x_ij) a symmetric matrix of variables
5: E_2, D ← clear denominators in E_2 and record the least common denominator D
6: if the roots λ_1, ..., λ_n of P_L(λ; y) are Q-linearly dependent then
7:     E_3 ← the toric relations (11) among z_1, ..., z_n
8: else
9:     E_3 ← ∅
10: I ← the ideal generated by E_1 ∪ E_2 ∪ E_3 in R[y, λ, z, X]
11: I ← the saturation (I : D^∞)
12: J ← elimination ideal obtained by eliminating y, λ, z from I
13: return a set of generators of J

The algorithm builds three subsets E_1, E_2, E_3 of the larger polynomial ring R[y, λ, z, X]. After the saturation (step 11), the auxiliary variables y, λ, z are eliminated. The equations E′_1 come from (8). They constrain (y, λ) to lie in V. The set E_1 generates an associated prime of E′_1 (step 3); see the discussion preceding Theorem 3.6. The equations E_2 come from the parametrization (9). Note that, if L has m < n distinct eigenvalues, this formula can be adjusted as in the end of the proof of Theorem 2.4, and the requirement after step 1 can be dropped. Later in the algorithm, one replaces n with m. It is necessary to clear denominators in order to obtain polynomials (step 5). The saturation by the LCD D avoids spurious components arising from this step. Finally, E_3 accounts for toric relations between the z_i arising from Q-linear relations among the λ_i. If no such relations exist, then Theorem 3.5 ensures that the assignment E_3 ← ∅ in step 9 is correct.
Steps 6 and 7 in Algorithm 1 require a detailed discussion. Further below we shall explain the Q-linear independence of eigenvalues, how to check this, and how to compute E_3. Ignoring this for now, one can also run Algorithm 1 with E_3 = ∅. Then step 13 still returns polynomials that vanish on the Gibbs variety GV(L), but these may cut out a larger variety.
Example 3.3. The Gibbs variety GV(L) for the LSSM L in (1) has the parametrization given by Sylvester's formula (9). Our Julia code for Algorithm 1 easily finds the cubic polynomial defining GV(L). ⋄

In spite of such successes, symbolic implicitization is limited to small n and d. Numerical computations can help, in some cases, to find equations for more challenging Gibbs varieties.
Example 3.4. We consider the LSSM of 4 × 4 Hankel matrices with upper left entry zero. Algorithm 1 failed to compute its Gibbs variety. We proceed using numerics as follows. Fix a degree D > 0 and let N = (9+D choose D) be the number of monomials of degree D in the 10 coordinates x_11, ..., x_44 on S^4. We create M ≥ N samples on GM(L) by plugging in random values for the six parameters y_i and applying the matrix exponential. Finding all vanishing equations of degree D on these samples amounts to computing the kernel of an M × N Vandermonde matrix. If this matrix has full rank, then there are no relations of degree D. We implemented this procedure in Julia. In our example, Theorem 2.4 suggests that GV(L) is a hypersurface, and this is indeed the case. Its defining equation has degree D = 6. We found it using M = 5205 ≥ N = 5005 samples. Our denary sextic has 853 terms with integer coefficients; its leading terms are x_11^3 x_22 x_24 x_34 − x ⋯. Its Newton polytope has the f-vector (456, 5538, 21560, 41172, 44707, 29088, 11236, 2370, 211). Note that the package Oscar.jl conveniently allows one to perform symbolic and numerical implicitization and polyhedral computations in the same programming environment. We emphasize that our numerical Julia code is set up to find exact integer coefficients. For this, we first normalize the numerical approximation of the coefficient vector by setting its first (numerically) nonzero entry to one. Then we rationalize the coefficients using the built-in command rationalize in Julia, with error tolerance tol = 1e-7. Correctness of the result is proved by checking that the resulting polynomial vanishes on the parametrization. ⋄

We now turn to Q-linear relations among eigenvalues of L. Our arithmetic discussion begins with a version of [1, (SP)], which is well-known in transcendental number theory:

Theorem 3.5 (Ax–Schanuel). If the eigenvalues λ_1, ..., λ_n of the LSSM L are Q-linearly independent, then e^{λ_1}, ..., e^{λ_n} are algebraically independent over the field C(y_1, ..., y_d).
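The interpolation step in Example 3.4 can be sketched in a few lines. The toy sketch below uses an LSSM of 2 × 2 trace-zero symmetric matrices rather than the Hankel family, so that a single relation, x_11x_22 − x_12^2 = 1, should be recovered (the eigenvalues satisfy λ_1 + λ_2 = 0, so det(exp) = 1); all names here are illustrative.

```python
import numpy as np
from itertools import combinations_with_replacement
from scipy.linalg import expm

# Toy LSSM of 2x2 trace-zero symmetric matrices (not the Hankel family).
A1 = np.array([[1., 0.], [0., -1.]])
A2 = np.array([[0., 1.], [1., 0.]])

D = 2
# coordinates (1, x11, x12, x22); padding with the constant coordinate
# makes the monomial list cover all degrees <= D
monos = list(combinations_with_replacement(range(4), D))

rng = np.random.default_rng(1)
rows = []
for _ in range(3 * len(monos)):        # M >= N samples on GM(L)
    X = expm(rng.normal(scale=0.5) * A1 + rng.normal(scale=0.5) * A2)
    x = [1.0, X[0, 0], X[0, 1], X[1, 1]]
    rows.append([np.prod([x[k] for k in m]) for m in monos])
V = np.array(rows)

# kernel vectors of the Vandermonde matrix are the degree <= D relations;
# here the only one is the binomial x11*x22 - x12^2 = 1
s = np.linalg.svd(V, compute_uv=False)
n_relations = int(np.sum(s < 1e-10 * s[0]))
assert n_relations == 1
```

The one-dimensional kernel matches the fact that the ideal of this Gibbs variety contains, in degree ≤ 2, only multiples of the determinant relation.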
On the other hand, suppose that the eigenvalues of L satisfy some non-trivial linear relation over Q. We can then find nonnegative integers α_i and β_j, not all zero, such that

α_1λ_1 + ⋯ + α_nλ_n = β_1λ_1 + ⋯ + β_nλ_n.   (10)

This implies that the exponentials of the eigenvalues satisfy the toric relations

z_1^{α_1} ⋯ z_n^{α_n} = z_1^{β_1} ⋯ z_n^{β_n}, where z_i = e^{λ_i}.   (11)

The linear relations (10) can be found from the ideal E′_1 in step 2, which specifies that the λ_i are the eigenvalues of A(y). This ideal is radical if we assume that L has distinct eigenvalues. We compute the prime decomposition of the ideal over Q. All prime components are equivalent under permuting the λ_i, so we replace E′_1 by any of these prime ideals in step 3. We compute (10) as the linear forms in that prime ideal. Using (11), we compute the toric ideal E_3 in step 7, which is also prime. This ideal defines a toric variety W′, whose S_n-orbit is the variety W in Remark 3.2. We arrive at the following result.

Theorem 3.6. Let L ⊂ S^n be an LSSM with distinct eigenvalues. The Gibbs variety GV(L) is irreducible and unirational, and the ideal J found in Algorithm 1 is its prime ideal.
Proof. Sylvester's formula yields a rational parametrization ψ of GV(L) with parameters y_1, ..., y_d, z_1, ..., z_n. The parameters λ_i in (9) can be omitted: the entries in the image are multisymmetric in (λ_i, z_i), so that they can be expressed in terms of elementary symmetric polynomials of the λ_i [2, Theorem 1]. The point (z_1, ..., z_n) lies on the toric variety W′ defined above. The domain C^d × W′ of ψ is an irreducible variety, and it is also rational. The image of ψ is the Gibbs variety GV(L), which is therefore unirational and irreducible. The ideals given by E_1 and E_2 in Algorithm 1 are prime, after saturation, and elimination in step 12 preserves primality. Hence the output J in step 13 is the desired prime ideal.
We define the Galois group G_L of an LSSM L to be the Galois group of the characteristic polynomial P_L(λ; y) over the field Q(y_1, ..., y_d). Note that G_L is the subgroup of the symmetric group S_n whose elements are permutations that fix each associated prime of E′_1. Hence the index of the Galois group G_L in S_n is the number of associated primes. In particular, the Galois group equals S_n if and only if the ideal E′_1 formed in step 2 is prime. The existence of linear relations (10) depends on the Galois group G_L. If the Galois group is small, then the primes of E′_1 are large, and more likely to contain linear forms. There is a substantial literature in number theory on this topic. See [9, 10] and the references therein. For instance, by Kitaoka [14, Proposition 2], there are no linear relations if n is prime, or if n ≥ 6 and the Galois group is S_n or A_n. If this holds, then E_3 = ∅ in step 9 of Algorithm 1.
The computation of Galois groups is a well-studied topic in symbolic computation and number theory. Especially promising are methods based on numerical algebraic geometry (e.g. in [11]). These fit well with the approach to implicitization in Example 3.4. For a future theoretical project, it would be very interesting to classify LSSMs by their Galois groups.
Remark 3.7. We briefly comment on how to adjust Algorithm 1 to compute the Gibbs variety of an ASSM L with A_0 ≠ 0. In this case, algebraic relations between e^{λ_1}, ..., e^{λ_n} come from Q-linear relations between the eigenvalues of L, but this time modulo C: an affine relation

α_1λ_1 + ⋯ + α_nλ_n = β_1λ_1 + ⋯ + β_nλ_n + γ.

Here γ is a Q-linear combination of eigenvalues of A_0. Theorem 3.6 holds for ASSMs as well, provided that these Q-linear relations modulo C can be computed in practice. This can usually not be done over Q. We leave this algorithmic challenge for future research.

Pencils of quadrics
In this section we study the Gibbs variety GV(L), where L ⊂ S^n is a pencil of quadrics, i.e. an LSSM of dimension d = 2. We follow the exposition in [6], where pencils L are classified by Segre symbols. The Segre symbol σ = σ(L) is a multiset of partitions that sum up to n. It is computed as follows: Pick a basis {A_1, A_2} of L, where A_2 is invertible, and find the Jordan canonical form of A_1A_2^{-1}. Each eigenvalue determines a partition, according to the sizes of the corresponding Jordan blocks. The multiset of these partitions is the Segre symbol σ.
Proof. Block-diagonal matrices are exponentiated block-wise. The entries outside the diagonal blocks are zero. The statement follows from det(exp(X_i(y))) = exp(trace(X_i(y))).
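The two facts used in this proof, block-wise exponentiation and det(exp(X)) = exp(trace(X)), are easy to confirm numerically (illustrative sketch):

```python
import numpy as np
from scipy.linalg import expm, block_diag

X1 = np.array([[0.5, 1.0], [1.0, -0.2]])
X2 = np.array([[0.1, 0.3, 0.0], [0.3, 0.2, 0.1], [0.0, 0.1, -0.4]])
X = block_diag(X1, X2)

E = expm(X)
# block-diagonal matrices exponentiate block-wise ...
assert np.allclose(E, block_diag(expm(X1), expm(X2)))
# ... and det(exp(X_i)) = exp(trace(X_i)) on each block
assert np.isclose(np.linalg.det(expm(X1)), np.exp(np.trace(X1)))
assert np.isclose(np.linalg.det(expm(X2)), np.exp(np.trace(X2)))
```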
Proposition 4.3 holds for the canonical pencil L_σ of any Segre symbol σ. First of all, for all indices (i, j) outside the diagonal blocks, we have x_ij = 0 on the Gibbs plane GP(L_σ). Next, one has equations for the exponential of a single block, like those in Theorem 4.4 below. Finally, there are equations that link the blocks corresponding to entries σ_ij of the same partition σ_i. Some of these come from trace equalities between blocks of L_σ, and this is the scope of Proposition 4.3. In particular, blocks ij and ik for which σ_ij ≡ σ_ik mod 2 exponentiate to X_ij ∈ S^{σ_ij}_+ and X_ik ∈ S^{σ_ik}_+ with equal determinant. We saw this in (12). In all examples we computed, the three classes of equations above determine the Gibbs variety.
We now derive the equations that hold for the exponential of a single block. To this end, we fix σ = [n] with α_1 = 0. The canonical LSSM L_[n] consists of symmetric matrices that we denote by Y.

Theorem 4.4. The linear equations (13) hold on the Gibbs variety GV(L_[n]), and so do the 2 × 2-minors (14) of a 2 × (n − 1) matrix of coordinates. If the Galois group G_{L_[n]} is the symmetric group S_n, then the prime ideal of GV(L_[n]) is generated by (13) and (14), and we have dim GP(L_[n]) = 2n − 1 and dim GV(L_[n]) = n + 1.
Remark 4.5. We conjecture that G_{L_[n]} = S_n. This was verified computationally for many values of n, but we currently do not have a proof that works for all n. This gap underscores the need, pointed out at the end of Section 3, for a study of the Galois groups of LSSMs.
Proof. We claim that the linear equations (13) hold for every non-negative integer power of Y. This implies that they hold for exp(Y). We will show this by induction. The equations clearly hold for Y^0 = id_n. Suppose they hold for Y^{k−1}. The following identity holds for 2 ≤ i < j, and it shows that exp(Y) satisfies the equations (13):

(2) X = y_1A_1 + y_2A_2 and B have the same eigenvectors, and (3) λ_1, ..., λ_n are the eigenvalues of X.
Condition (1) follows from Theorem 3.1 for f = exp. It implies that there are only finitely many possibilities for the z-coordinates of the point p in the fiber: up to permutations, they are the eigenvalues of B. Condition (3) follows from (y_1, y_2, λ_1, ..., λ_n) ∈ V. It says that the λ-coordinates are determined, up to permutation, by y_1, y_2. Therefore, it suffices to show that the matrices in L whose eigenvectors are those of B form a one-dimensional subvariety. Symmetric matrices have common eigenvectors if and only if they commute. Define S = {X ∈ L : XB = BX}. This is a linear subspace of pairwise commuting matrices. Note that S contains a nonzero matrix X, since there is a point in φ^{-1}(B) whose y-coordinates define a nonzero matrix in L. Therefore dim S ≥ 1. Since A_1A_2 ≠ A_2A_1, we also have dim S ≤ 1. Hence dim S = dim φ^{-1}(B) = 1, and the upper bound dim W + 1 for the dimension of GV(L), which is given by Remark 3.2, is attained in our situation.

Convex optimization
In this section we show how Gibbs manifolds arise from entropic regularization in optimization (cf. [21]). We fix an arbitrary linear map π : S^n → R^d. This can be written in the form

π(X) = (⟨A_1, X⟩, ⟨A_2, X⟩, ..., ⟨A_d, X⟩).

Here the A_i ∈ S^n, and ⟨A_i, X⟩ := trace(A_iX). The image π(S^n_+) of the PSD cone S^n_+ under our linear map π is a spectrahedral shadow. It is a full-dimensional semialgebraic convex cone in R^d. Interestingly, π(S^n_+) can fail to be closed, as explained in [13]. Semidefinite programming (SDP) is the following convex optimization problem:

minimize ⟨C, X⟩ subject to π(X) = b and X ∈ S^n_+.   (15)

See e.g. [16, Chapter 12]. The instance (15) is specified by the cost matrix C ∈ S^n and the right hand side vector b ∈ R^d. The feasible region S^n_+ ∩ π^{-1}(b) is a spectrahedron. The SDP problem (15) is feasible, i.e. the spectrahedron is non-empty, if and only if b is in π(S^n_+). Consider the LSSM L = span_R(A_1, ..., A_d). We usually assume that L contains a positive definite matrix. This hypothesis ensures that each spectrahedron π^{-1}(b) is compact.
As a natural extension of [21, eqn (2)], we now define the entropic regularization of SDP:

minimize ⟨C, X⟩ − ε · h(X) subject to π(X) = b and X ∈ S^n_+.   (16)

Here ε > 0 is a small parameter, and h denotes the von Neumann entropy, here defined as h(X) = trace(X − X · log(X)). We note that h is invariant under the action of the orthogonal group on S^n_+. This implies h(X) = Σ_{i=1}^n (λ_i − λ_i log(λ_i)), where λ_1, ..., λ_n are the eigenvalues of X. Hence the von Neumann entropy h is the matrix version of the entropy function on R^n_+ used in [21].
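The two expressions for the von Neumann entropy, via the matrix logarithm and via the eigenvalues, can be compared directly (illustrative sketch):

```python
import numpy as np
from scipy.linalg import logm

def h_matrix(X):
    # h(X) = trace(X - X * log(X)) for positive definite X
    return float(np.trace(X - X @ logm(X)).real)

def h_spectral(X):
    # the same quantity as a symmetric function of the eigenvalues
    lam = np.linalg.eigvalsh(X)
    return float(np.sum(lam - lam * np.log(lam)))

rng = np.random.default_rng(2)
G = rng.normal(size=(4, 4))
X = G @ G.T + 4 * np.eye(4)            # a positive definite matrix
assert np.isclose(h_matrix(X), h_spectral(X))
```

The agreement reflects the orthogonal invariance of h stated above.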
Our next result makes the role of Gibbs manifolds in semidefinite programming explicit. The following ASSM is obtained by incorporating ε and the cost matrix C into the LSSM:

L_ε := −(1/ε) · C + L.

Here we allow the case ε = ∞, where the dependency on C disappears and the ASSM is simply the LSSM, i.e. L_∞ = L. The following theorem is the main result in this section.
Theorem 5.1. For b ∈ π(S^n_+), the intersection of π^{-1}(b) with the Gibbs manifold GM(L_ε) consists of a single point X*_ε. This point is the optimal solution to the regularized SDP (16). For ε = ∞, it is the unique maximizer of von Neumann entropy on the spectrahedron π^{-1}(b).
The importance of this result for semidefinite programming lies in taking the limit as ε tends to zero. This limit lim_{ε→0} X*_ε exists and is an optimal solution to (15). The optimal solution is unique for generic C. Entropic regularization is about approximating that limit.
Remark 5.2. Theorem 5.1 implies that adding the condition X ∈ GV(L_ε) to (16) leaves the optimizer unchanged. Hence, if we know equations for the Gibbs variety, we may shrink the feasible region by adding polynomial constraints. Most practical are the affine-linear equations: imposing X ∈ GP(L_ε) allows us to solve (16) on a spectrahedron of lower dimension.
To prove Theorem 5.1, we derive two key properties of the von Neumann entropy:

Proposition 5.3. (a) The function h is strictly concave on the PSD cone S^n_+, and (b) the gradient of h is the negative matrix logarithm: ∇(h)(X) = −log(X).
Proof. For (a), we use a classical result by Davis [4]. The function h is invariant in the sense that its value h(X) depends only on the eigenvalues of X. In fact, it is a symmetric function of the n eigenvalues λ_1, λ_2, ..., λ_n. This function equals h(λ_1, λ_2, ..., λ_n) = Σ_{i=1}^n (λ_i − λ_i log(λ_i)), and this is a concave function R^n_+ → R. The assertion hence follows from the theorem in [4].

For (b) we prove a more general result. For convenience, we change variables Y = X − id_n, so that f(Y) = h(Y + id_n) is analytic at Y = 0. Fix any function f : R → R that is analytic in a neighborhood of the origin. Then Y ↦ trace(f(Y)) is a well-defined real-valued analytic function of n × n matrices Y = (y_ij) that are close to zero. The gradient of this function is the n × n matrix whose entries are the partial derivatives ∂trace(f(Y))/∂y_ij. We claim that

∇trace(f(Y)) = f′(Y)^⊤.   (17)

Both sides are linear in f, and f is analytic, so it suffices to prove this for monomials, i.e. to show that ∇trace(Y^k) = k · (Y^⊤)^{k−1} for all k ≥ 1. Note that trace(Y^k) is a homogeneous polynomial of degree k in the matrix entries y_ij: it is the sum over all products y_{i_1i_2} y_{i_2i_3} ⋯ y_{i_ki_1}, whose index sequences represent closed walks of length k in the complete graph on n nodes. When taking the derivative ∂/∂y_ij of that sum, we obtain k times the sum over all walks of length k − 1 that start at node j and end at node i. Here each walk occurs with the factor k because y_ij can be inserted in k different ways to create one of the closed walks above. This polynomial of degree k − 1 is the entry of the matrix power Y^{k−1} in row j and column i, so it is the entry of its transpose (Y^⊤)^{k−1} in row i and column j. To prove the proposition, we now apply (17) to the function f(y) = (y + 1) − (y + 1) · log(y + 1). Its derivative is f′(y) = −log(y + 1), so (17) yields ∇(h)(X) = −log(X)^⊤ = −log(X).
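The gradient formula (17) can be spot-checked by finite differences, here for f = exp, where f′(Y)^⊤ = exp(Y)^⊤ (illustrative sketch):

```python
import numpy as np
from scipy.linalg import expm

# finite-difference check of grad trace(f(Y)) = f'(Y)^T, for f = exp
rng = np.random.default_rng(3)
Y = rng.normal(size=(3, 3))
eps = 1e-6
grad = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        E = np.zeros((3, 3))
        E[i, j] = eps
        # central difference in the (i, j) coordinate direction
        grad[i, j] = (np.trace(expm(Y + E)) - np.trace(expm(Y - E))) / (2 * eps)

# for f = exp we have f' = exp, so the gradient is exp(Y) transposed
assert np.allclose(grad, expm(Y).T, atol=1e-5)
```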
If L = D consists of diagonal matrices, then the Gibbs manifold GM(D) is a discrete exponential family [23, §6.2], and π(GM(D)) is the associated convex polytope. This uses the moment map from toric geometry [16, Theorem 8.24]. In particular, if the linear space D is defined over Q, then the polytope is rational and the Zariski closure of GM(D) is the toric variety of that polytope. If the space D is not defined over Q, then GM(D) is an analytic toric manifold, whose Zariski closure is the larger toric variety GV(D) = GM(D_Q) seen in (7).
The key step to proving Theorem 5.1 is a non-abelian version of the toric moment map.
Theorem 5.4. The restriction of the linear map π : S^n_+ → R^d to the Gibbs manifold GM(L) defines a bijection between GM(L) and the open spectrahedral shadow int(π(S^n_+)) in R^d.
Proof. Fix an arbitrary positive definite matrix X ∈ int(S^n_+) and set b = π(X). We must show that the spectrahedron π^{-1}(b) contains precisely one point that lies in GM(L).
Consider the restriction of the von Neumann entropy h to the spectrahedron π^{-1}(b). This restriction is strictly concave on the convex body π^{-1}(b) by Proposition 5.3. Therefore h attains a unique maximum X* in the relative interior of π^{-1}(b). The first order condition at this maximum tells us that ∇(h)(X*) = −log(X*) lies in L, which is the span of the gradients of the constraints ⟨A_i, X⟩ = b_i. Hence, the optimal matrix X* lies in the Gibbs manifold GM(L) = {X ∈ S^n_+ : log(X) ∈ L}.
The assignment b ↦ X* = X*(b) is well defined and continuous on the interior of the cone π(S^n_+). We have shown that it is a section of the linear map π, which means π(X*(b)) = b. We conclude that π defines a homeomorphism between GM(L) and int(π(S^n_+)).

Proof of Theorem 5.1. For any fixed ε > 0, any minimizer X* = X*_ε of the regularized problem (16) in the spectrahedron π^{-1}(b) must satisfy C + ε·log(X*) ∈ L. This is equivalent to X* ∈ GM(L_ε), and it follows from the first-order optimality conditions. By the same convexity argument as in the proof of Theorem 5.4, the objective function in (16) has only one critical point in the spectrahedron π^{-1}(b). This implies π^{-1}(b) ∩ GM(L_ε) = {X*_ε}.

We can now turn the discussion around and offer a definition of Gibbs manifolds and Gibbs varieties purely in terms of convex optimization. Fix any LSSM L of dimension d in S^n. This defines a canonical linear map π : S^n_+ → S^n/L^⊥ ≃ R^d. Each fiber π^{-1}(b) is a spectrahedron. If it is non-empty, then the entropy h(X) has a unique maximizer X*(b) in π^{-1}(b). The Gibbs manifold GM(L) is the set of these entropy maximizers X*(b) for b ∈ R^d. The Gibbs variety GV(L) is defined by all polynomial constraints satisfied by these X*(b).
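The optimization-based description can be made concrete. The sketch below works on a hypothetical LSSM (numpy/scipy assumed; fsolve stands in for a proper convex solver): since log X*(b) must lie in L, the Gibbs point has the form exp(λ_1 A_1 + λ_2 A_2), and we solve π(X*) = b for λ.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import fsolve

# A hypothetical LSSM L in S^3 spanned by two symmetric matrices.
A1 = np.diag([1.0, 1.0, 0.0])
A2 = np.array([[0.0, 1.0, 0.0],
               [1.0, 0.0, 1.0],
               [0.0, 1.0, 0.0]])

def pi(X):
    """The linear map X -> (<A1, X>, <A2, X>) with the trace inner product."""
    return np.array([np.tensordot(A1, X), np.tensordot(A2, X)])

# Solve pi(exp(l1*A1 + l2*A2)) = b for (l1, l2).
b = np.array([2.0, 1.0])
lam = fsolve(lambda l: pi(expm(l[0] * A1 + l[1] * A2)) - b, x0=np.zeros(2))
X_star = expm(lam[0] * A1 + lam[1] * A2)

assert np.allclose(pi(X_star), b, atol=1e-6)   # X* lies in the fiber over b
assert np.all(np.linalg.eigvalsh(X_star) > 0)  # X* is positive definite
```

The two assertions verify exactly the two defining properties of the Gibbs point: it lies in the spectrahedron π^{-1}(b), and its logarithm lies in L by construction.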
This extends naturally to any ASSM A_0 + L. We now maximize the concave function h(X) + ⟨A_0, X⟩ over the spectrahedra π^{-1}(b). The Gibbs manifold GM(A_0 + L) collects all maximizers, and the Gibbs variety GV(A_0 + L) is defined by their polynomial constraints.
Example 5.5. Let L denote the space of all Hankel matrices [y_{i+j−1}]_{1≤i,j≤n} in S^n. This LSSM has dimension d = 2n − 1. The linear map π : S^n_+ → R^d takes any positive definite matrix X to a nonnegative polynomial b = b(t) of degree 2n − 2 in one variable t. We have b(t) = (1, t, …, t^{n−1}) X (1, t, …, t^{n−1})^⊤, so the matrix X gives a sum-of-squares (SOS) representation of b(t). The fiber π^{-1}(b) is the Gram spectrahedron [20] of the polynomial b. The entropy maximizer X*(b) in the Gram spectrahedron is a favorite SOS representation of b. The Gibbs manifold GM(L) gathers the favorite SOS representations for all nonnegative polynomials b. The Gibbs variety GV(L), which has dimension ≤ 3n − 2, is the tightest outer approximation of GM(L) that is definable by polynomials in the matrix entries.
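For the Hankel LSSM, the map π simply sums the antidiagonals of X: the coefficient of t^k in b(t) is Σ_{i+j=k} x_{ij} (indices from 0). A minimal sketch for n = 3, with an illustrative positive definite X (numpy assumed, not from the text):

```python
import numpy as np

# Illustrative instance for n = 3: X is positive definite, and
# b(t) = (1, t, t^2) X (1, t, t^2)^T is the image polynomial.
rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
X = M @ M.T + np.eye(3)

# Coefficient of t^k in b(t) is the k-th antidiagonal sum of X.
coeffs = [sum(X[i, k - i] for i in range(3) if 0 <= k - i < 3) for k in range(5)]

# Check against direct evaluation of the quadratic form at a sample point.
t = 0.7
v = np.array([1.0, t, t * t])
assert np.isclose(v @ X @ v, sum(c * t**k for k, c in enumerate(coeffs)))
```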
In Example 3.4 we saw a variant of L, namely the sub-LSSM where the upper left entry of the Hankel matrix is fixed to be zero. If C = −E_{11} is the corresponding negated matrix unit, then (15) is the problem of minimizing b(t) over t ∈ R. See [16, Section 12.3] for a first introduction to polynomial optimization via SOS representations. It would be interesting to explore the potential of the entropic regularization (16) for polynomial optimization. ⋄

One of the topics of [21] was a scaling algorithm for solving the optimization problem (16) for linear programming (LP), i.e. the case when A_1, …, A_d are diagonal matrices. This algorithm extends the Darroch–Ratcliff algorithm for Iterative Proportional Fitting in statistics. Combining this with a method for driving ε to zero leads to a numerical algorithm for large-scale LP problems, such as the optimal transport problems in [21, Section 3].
We are hopeful that the scaling algorithm can be extended to the problem (16) in full generality. By combining this with a method for driving ε to zero, one obtains a numerical framework for solving SDP problems such as quantum optimal transport in Section 6.
One important geometric object for SDP is the limiting Gibbs manifold lim_{ε→0} GM(L_ε). This is the set of optimal solutions, as b ranges over R^d. In the case of LP, with C generic, it is the simplicial complex which forms the regular triangulation given by C. This reveals the combinatorial essence of entropic regularization of LP, as explained in [21, Theorem 7]. From the perspective of positive geometry, it would be worthwhile to study lim_{ε→0} GM(L_ε) for SDP. This set is semialgebraic, and it defines a nonlinear subdivision of the spectrahedral shadow π(S^n_+). If we vary the cost matrix C, the theory of fiber bodies in [15] becomes relevant.

Quantum optimal transport
In this section we examine a semidefinite programming analogue of the classical optimal transport problem, known as quantum optimal transport (QOT). We follow the presentation by Cole, Eckstein, Friedland, and Życzkowski in [3]. Our notation for the dimensions is as in [21, Section 3.1]. We consider the space S^{d_1 d_2} of real symmetric matrices X of size d_1 d_2 × d_1 d_2, whose rows and columns are indexed by [d_1] × [d_2]. Thus we write X = (x_{ijkl}), where (i, j) and (k, l) run over [d_1] × [d_2], and symmetry means x_{ijkl} = x_{klij}. Each such matrix is mapped to its pair of partial traces by the linear map µ(X) = (Y, Z), where the d_1 × d_1 matrix Y = (y_{ik}) satisfies y_{ik} = Σ_{j=1}^{d_2} x_{ijkj}, and the d_2 × d_2 matrix Z = (z_{jl}) satisfies z_{jl} = Σ_{i=1}^{d_1} x_{ijil}. If X is positive semidefinite then so are its partial traces Y and Z.
Hence our marginalization map restricts to a linear projection of closed convex cones, denoted µ : S^{d_1 d_2}_+ → S^{d_1}_+ × S^{d_2}_+. Diagonal matrices in S^{d_1 d_2}_+ can be identified with rectangular matrices of format d_1 × d_2 whose entries are nonnegative. The map µ takes such a rectangular matrix to its row sums and column sums. Hence the restriction of µ to diagonal matrices in S^{d_1 d_2}_+ is precisely the linear map that defines classical optimal transport in the discrete setting of [21, Section 3.1].
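The partial traces and the properties just stated are easy to verify numerically. This sketch (numpy assumed; a random positive semidefinite X for illustration) follows the index convention x_{ijkl} above:

```python
import numpy as np

# Random PSD matrix X of size d1*d2, rows and columns indexed by [d1] x [d2].
d1, d2 = 2, 3
rng = np.random.default_rng(2)
M = rng.standard_normal((d1 * d2, d1 * d2))
X = M @ M.T
T = X.reshape(d1, d2, d1, d2)            # T[i, j, k, l] = x_{ij,kl}

Y = np.einsum('ijkj->ik', T)             # y_ik = sum_j x_{ij,kj}
Z = np.einsum('ijil->jl', T)             # z_jl = sum_i x_{ij,il}

assert np.all(np.linalg.eigvalsh(Y) >= -1e-9)   # Y is positive semidefinite
assert np.all(np.linalg.eigvalsh(Z) >= -1e-9)   # Z is positive semidefinite
assert np.isclose(np.trace(Y), np.trace(Z))     # both equal trace(X)
```

The last assertion is the compatibility constraint trace(Y) = trace(Z) that characterizes the image of µ.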
The quantum optimal transport problem (QOT) is the task of minimizing a linear function X ↦ ⟨C, X⟩ over a transportation spectrahedron µ^{-1}(Y, Z). This is an SDP. Our main theorem in this section states that its Gibbs manifold is semialgebraic. The image of the marginalization map µ generalizes the polytope ∆_{d_1−1} × ∆_{d_2−1}, and the fibers of µ are quantum versions of transportation polytopes. These shapes are now nonlinear.

Lemma 6.2. The image of the map µ is the convex cone {(Y, Z) ∈ S^{d_1}_+ × S^{d_2}_+ : trace(Y) = trace(Z)}, of dimension \binom{d_1+1}{2} + \binom{d_2+1}{2} − 1. For any point (Y, Z) in the relative interior of this cone, the transportation spectrahedron µ^{-1}(Y, Z) is a compact convex body of dimension \binom{d_1 d_2+1}{2} − \binom{d_1+1}{2} − \binom{d_2+1}{2} + 1.

Proof. If Y ∈ S^{d_1}_+ and Z ∈ S^{d_2}_+ satisfy t = trace(Y) = trace(Z), then (1/t)·Y ⊗ Z is a positive semidefinite matrix in the fiber µ^{-1}(Y, Z). This shows that the image is as claimed on the right hand side of (20). That image is a spectrahedral cone of the stated dimension, and subtracting this from dim S^{d_1 d_2} = \binom{d_1 d_2+1}{2} yields the dimension of the interior fibers.

Example 6.3. Let d_1 = d_2 = 2. The fibers of the map µ are the 5-dimensional transportation spectrahedra µ^{-1}(Y, Z). To illustrate the QOT problem, we fix the margins Y, Z and the cost matrix C shown in (22), and we minimize ⟨C, X⟩ subject to µ(X) = (Y, Z). The optimal solution X* was derived from the KKT equations in [17, Theorem 3]. We conclude that the algebraic degree of QOT for d_1 = d_2 = 2 is equal to 12. This is smaller than the algebraic degree of semidefinite programming, which is 42; that is the entry for m = 5 and n = 4 in [17, Table 2].
This drop arises because QOT is a very special SDP. The LSSM for our QOT problem is L = {A ⊗ id_2 + id_2 ⊗ B : A, B ∈ S^2}, the case d_1 = d_2 = 2 of (24). This defines our 5-dimensional Gibbs manifold GM(L) in the 10-dimensional cone S^4_+. Theorem 6.1 states that it equals the positive part of the Gibbs variety, i.e. GM(L) = GV(L) ∩ S^4_+. We compute the entropy maximizer inside the 5-dimensional transportation spectrahedron µ^{-1}(Y, Z) for the marginal matrices Y and Z in (22). Notably, its entries are rational. ⋄

Towards the proof of Theorem 6.1, note that the (i, j) entry of trace(Z)·Y is obtained as ⟨(1/2)(E_{ij} + E_{ji}) ⊗ id_{d_2}, Y ⊗ Z⟩, where E_{ij} is the (i, j)-th matrix unit. A similar observation holds for the entries of trace(Y)·Z. This means that µ(X) is computed by evaluating trace((A ⊗ id_{d_2})X) and trace((id_{d_1} ⊗ B)X), where A ranges over a basis of S^{d_1} and B ranges over a basis of S^{d_2}. Therefore the LSSM for QOT is the space L in (24), and GM(L) is the intersection of a variety with S^{d_1 d_2}_+. This variety must be the Gibbs variety GV(L). More precisely, GV(L) consists of all tensor products Y ⊗ Z where Y and Z are complex symmetric. This is the cone over the Segre variety, the projective variety in P^{\binom{d_1 d_2+1}{2}−1} whose points are the tensor products X = Y ⊗ Z.

We have the following immediate consequence of the proof of Theorem 6.1.

Corollary 6.4. The Gibbs point for QOT is given by (Y ⊗ Z)/trace(Y), with Y, Z the given margins.

In particular, the entropy maximizers for rational margins have rational entries. This explains the matrix at the end of Example 6.3. At this point, it pays off to revisit Section 3 and to study its thread for the LSSM in (24). More generally, one can fix a graph G and let the marginalization record the partial trace for every clique in G. It would be interesting to study the Gibbs manifolds and the Gibbs varieties for these models. One may ask whether they agree for all graphs G that are decomposable. By Theorem 6.1, this holds for QOT, where G is the graph with two nodes and no edges.
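The formula (Y ⊗ Z)/trace(Y) for the Gibbs point can be checked directly: it marginalizes back to the given margins. A sketch with random positive definite margins (numpy assumed; Z is rescaled to enforce the constraint trace(Y) = trace(Z)):

```python
import numpy as np

rng = np.random.default_rng(3)
def rand_pd(d):
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)

d1, d2 = 2, 2
Y = rand_pd(d1)
Z = rand_pd(d2)
Z *= np.trace(Y) / np.trace(Z)           # enforce trace(Y) = trace(Z)

X_star = np.kron(Y, Z) / np.trace(Y)     # the claimed Gibbs point

# Its partial traces recover the prescribed margins.
T = X_star.reshape(d1, d2, d1, d2)
assert np.allclose(np.einsum('ijkj->ik', T), Y)
assert np.allclose(np.einsum('ijil->jl', T), Z)
```

The computation works because the partial traces of Y ⊗ Z are trace(Z)·Y and trace(Y)·Z, and dividing by the common trace restores the margins.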

Definition 2.2. The Gibbs variety GV(L) of L is the Zariski closure of GM(L) in C^{\binom{n+1}{2}}.

Theorem 6.1. The Gibbs manifold GM(L) for QOT is a semialgebraic subset of S^{d_1 d_2}_+. It consists of all symmetric matrices Y ⊗ Z, where Y ∈ S^{d_1}_+ and Z ∈ S^{d_2}_+. The Gibbs variety GV(L) ⊂ S^{d_1 d_2} is linearly isomorphic to the cone over the Segre variety P^{\binom{d_1+1}{2}−1} × P^{\binom{d_2+1}{2}−1}.

Proof. By linear extension, the equation (21) serves as a definition of the marginalization map µ on S^{d_1 d_2}. We observe the following for the trace inner product on S^{d_1 d_2}:

  trace((A ⊗ id_{d_2})(Y ⊗ Z)) = trace(Z)·trace(AY)  for all A ∈ S^{d_1}, and
  trace((id_{d_1} ⊗ B)(Y ⊗ Z)) = trace(Y)·trace(BZ)  for all B ∈ S^{d_2}.

The LSSM for QOT is therefore

  L = { A ⊗ id_{d_2} + id_{d_1} ⊗ B : A ∈ S^{d_1} and B ∈ S^{d_2} }.   (24)

Now, the key step in the proof consists of the following formula for the matrix logarithm:

  log(Y ⊗ Z) = log(Y) ⊗ id_{d_2} + id_{d_1} ⊗ log(Z).

This holds for positive definite matrices Y and Z, and it is verified by diagonalizing these matrices. By setting Y = exp(A) and Z = exp(B), we conclude that the Gibbs manifold GM(L) consists of all tensor products Y ⊗ Z where Y ∈ S^{d_1}_+ and Z ∈ S^{d_2}_+. We have shown that GM(L) is the intersection of a variety with S^{d_1 d_2}_+.
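The logarithm formula in this proof is easy to confirm numerically (numpy and scipy assumed; random positive definite Y and Z for illustration):

```python
import numpy as np
from scipy.linalg import logm

rng = np.random.default_rng(4)
def rand_pd(d):
    M = rng.standard_normal((d, d))
    return M @ M.T + d * np.eye(d)   # positive definite by construction

d1, d2 = 2, 3
Y, Z = rand_pd(d1), rand_pd(d2)

# log(Y (x) Z) = log(Y) (x) id + id (x) log(Z) for positive definite Y, Z.
lhs = logm(np.kron(Y, Z))
rhs = np.kron(logm(Y), np.eye(d2)) + np.kron(np.eye(d1), logm(Z))
assert np.allclose(lhs, rhs)
```

The identity holds because the eigenvalues of Y ⊗ Z are the products λμ of eigenvalues of Y and Z, and log(λμ) = log λ + log μ.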