Logarithmically Sparse Symmetric Matrices

A positive definite matrix is called logarithmically sparse if its matrix logarithm has many zero entries. Such matrices play a significant role in high-dimensional statistics and semidefinite optimization. In this paper, logarithmically sparse matrices are studied from the point of view of computational algebraic geometry: we present a formula for the dimension of the Zariski closure of a set of matrices with a given logarithmic sparsity pattern, give a degree bound for this variety and develop implicitization algorithms for finding its defining equations. We illustrate our approach with numerous examples.


Introduction
Logarithmically sparse symmetric matrices are positive definite matrices for which the matrix logarithm is sparse. Such matrices arise in high-dimensional statistics [2], where structural assumptions about covariance matrices are necessary for giving consistent estimators, and sparsity assumptions are natural to make. Moreover, once the sparsity pattern is fixed, the corresponding set of logarithmically sparse matrices forms a Gibbs manifold [9]. As we recall in Section 2, this is a manifold obtained by applying the matrix exponential to a linear space of symmetric matrices (LSSM), here defined by the sparsity pattern. Gibbs manifolds play an important role in convex optimization [9, Section 5].
From the point of view of practical computations, it might be challenging to tell exactly whether a given matrix satisfies a given logarithmic sparsity pattern. Checking whether a given polynomial equation holds on the matrix is often much easier. This motivates studying Zariski closures of families of logarithmically sparse matrices, i.e. common zero sets of polynomials that vanish on such families. Such Zariski closures are examples of Gibbs varieties.
In this paper we study Gibbs varieties that arise as Zariski closures of sets of logarithmically sparse symmetric matrices. We explain how those can be encoded by graphs, give a formula for their dimension and show that in practice it can be computed using simple linear algebra. We present a numerical and a symbolic algorithm for finding their defining equations. We also investigate how graph colourings can affect the corresponding Gibbs variety. In addition, we prove some general results about Gibbs varieties. In particular, we give an upper bound for the degree of a Gibbs variety in the case when the eigenvalues of the corresponding LSSM are $\mathbb{Q}$-linearly independent and show that Gibbs varieties of permutation invariant LSSMs inherit a certain kind of symmetry.
This paper is organized as follows. In Section 2, we define Gibbs manifolds and Gibbs varieties, the geometric objects needed for our research, present a formula for the dimension and an upper bound for the degree of Gibbs varieties, study symmetries of their defining equations and suggest a numerical implicitization algorithm. In Section 3, we give a formal definition of logarithmic sparsity, explain how it can be encoded by graphs and discuss the special properties of Gibbs varieties defined by logarithmic sparsity. In Section 4, we study families of logarithmically sparse matrices that arise from trees. In Section 5, we study coloured logarithmic sparsity conditions. Section 6 features a symbolic implicitization algorithm for Gibbs varieties defined by logarithmic sparsity. Finally, Section 7 contains a discussion on the practical relevance of logarithmic sparsity in statistics and optimization.

Gibbs Manifolds and Gibbs Varieties
Let $S^n$ denote the space of $n \times n$ symmetric matrices. This is a real vector space of dimension $\binom{n+1}{2}$. The cone of positive semidefinite $n \times n$ matrices will be denoted by $S^n_+$. The matrix exponential function is defined by the usual power series, which converges for all real and complex $n \times n$ matrices. It maps symmetric matrices to positive definite symmetric matrices. The zero matrix $0_n$ is mapped to the identity matrix $\mathrm{id}_n$. We write
$$\exp : S^n \to \operatorname{int}(S^n_+), \quad X \mapsto \sum_{i=0}^{\infty} \frac{1}{i!} X^i.$$
This map is invertible, with the inverse being the matrix logarithm function, given by the series
$$\log : \operatorname{int}(S^n_+) \to S^n, \quad Y \mapsto \sum_{j=1}^{\infty} \frac{(-1)^{j-1}}{j} (Y - \mathrm{id}_n)^j.$$
We next introduce the geometric objects that will play a crucial role in this article. We fix $d$ linearly independent matrices $A_1, A_2, \dots, A_d$ in $S^n$. We write $L$ for $\operatorname{span}_{\mathbb{R}}(A_1, \dots, A_d)$, a linear subspace of the vector space $S^n \simeq \mathbb{R}^{\binom{n+1}{2}}$. Thus, $L$ is a linear space of symmetric matrices (LSSM). We are interested in the image of $L$ under the exponential map, the Gibbs manifold
$$\mathrm{GM}(L) = \exp(L) = \{\exp(A) \mid A \in L\}.$$
This is indeed a $d$-dimensional manifold inside the convex cone $S^n_+$. It is diffeomorphic to $L \simeq \mathbb{R}^d$, with the diffeomorphism given by the exponential map and the logarithm map.
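The correspondence between $L$ and its image under exp can be illustrated numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `sym_expm` and `spd_logm` and the two spanning matrices `A1`, `A2` are hypothetical choices made for illustration. For symmetric matrices the exponential and the logarithm are evaluated through an orthogonal eigendecomposition rather than through the power series.

```python
import numpy as np

def sym_expm(A):
    """Matrix exponential of a real symmetric matrix via A = V diag(w) V^T."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

def spd_logm(X):
    """Matrix logarithm of a symmetric positive definite matrix."""
    w, V = np.linalg.eigh(X)
    return (V * np.log(w)) @ V.T

# A hypothetical 2-dimensional LSSM inside S^3, spanned by A1 and A2.
A1 = np.array([[1.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]])
A2 = np.array([[0.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 1.0, 2.0]])

y = np.array([0.3, -0.7])          # coordinates of a point of L
A = y[0] * A1 + y[1] * A2          # element of L
X = sym_expm(A)                    # corresponding point of the Gibbs manifold

# The logarithm recovers A, so the coordinates y can be read off again.
print(np.allclose(spd_logm(X), A))
```

The eigendecomposition route is chosen here only because it is exact for symmetric input up to rounding; it is not claimed to be the method used in the paper.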
In some special cases, the Gibbs manifold is semi-algebraic, namely it is the intersection of an algebraic variety with the PSD cone. However, this fails in general. It is still interesting to ask which polynomial relations hold between the entries of any matrix in $\mathrm{GM}(L)$. This motivates the following definition.
Definition 2.2. The Gibbs variety $\mathrm{GV}(L)$ of $L$ is the Zariski closure of $\mathrm{GM}(L)$ in $\mathbb{C}^{\binom{n+1}{2}}$.
Any LSSM can be written in the form $L = \{y_1 A_1 + \dots + y_d A_d \mid y_i \in \mathbb{R}\}$ and therefore can be identified with a matrix with entries in $\mathbb{R}(y_1, \dots, y_d)$. The eigenvalues of this matrix are elements of the algebraic closure $\overline{\mathbb{R}(y_1, \dots, y_d)}$ and will be referred to as the eigenvalues of the corresponding LSSM. It is known that $\mathrm{GV}(L)$ is irreducible and unirational under the assumption that the eigenvalues of $L$ are $\mathbb{Q}$-linearly independent and $L$ is defined over $\mathbb{Q}$ [9, Theorem 3.6].
In this section we extend the results of [9] that apply to any LSSM L. We start with studying the symmetries of the defining equations of GV(L).
We consider the tuple of variables $x = \{x_{ij} \mid 1 \le i \le j \le n\}$. An element $\sigma$ of the symmetric group $S_n$ acts on the polynomial ring $\mathbb{R}[x]$ by sending $x_{ij}$ to $x_{\sigma(i)\sigma(j)}$ for $1 \le i \le j \le n$ (we identify the variables $x_{ij}$ and $x_{ji}$). We will also consider the action of $S_n$ on $S^n$ by simultaneously permuting rows and columns of a matrix.
Proposition 2.3. Let $L$ be an LSSM of $n \times n$ matrices that is invariant under the action of $\sigma \in S_n$. Then the ideal $I(\mathrm{GV}(L))$ of the corresponding Gibbs variety is also invariant under the action of $\sigma$.
Proof. To prove the Proposition, it suffices to show that if $B \in L$ is obtained from $A \in L$ by simultaneously permuting rows and columns, then $\exp(B)$ is obtained from $\exp(A)$ in the same way. Since $\exp(B)$ is a formal power series in $B$, it suffices to show that $B^k$ is obtained from $A^k$ by simultaneously permuting rows and columns for any non-negative integer $k$. The latter fact immediately follows from the matrix multiplication formula.
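The key step of the proof, that the exponential commutes with simultaneous row and column permutations, can be checked numerically. This is a small sketch assuming NumPy; the helper names, the random symmetric matrix and the permutation `sigma` are arbitrary illustrative choices.

```python
import numpy as np

def sym_expm(A):
    """Matrix exponential of a real symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

def permute(A, sigma):
    """Simultaneously permute rows and columns: entry (i, j) moves to (sigma[i], sigma[j])."""
    n = len(sigma)
    P = np.zeros((n, n))
    P[sigma, np.arange(n)] = 1.0   # permutation matrix with P e_i = e_{sigma[i]}
    return P @ A @ P.T

rng = np.random.default_rng(0)
G = rng.standard_normal((4, 4))
A = (G + G.T) / 2                  # random symmetric matrix
sigma = [2, 0, 3, 1]               # a permutation of {0, 1, 2, 3}

lhs = sym_expm(permute(A, sigma))  # exponentiate the permuted matrix
rhs = permute(sym_expm(A), sigma)  # permute the exponential
print(np.allclose(lhs, rhs))
```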
The action of $\sigma$ on $\mathbb{C}[x_{11}, x_{12}, x_{13}, x_{22}, x_{23}, x_{33}]$ sends $p$ to $-p$ and therefore does not change the ideal. ⋄
Definition 2.5. Let $A$ be an $n \times n$ matrix, and $L$ be an LSSM of $n \times n$ matrices. The centralizer $C(A)$ of $A$ is the set of all matrices that commute with $A$. The $L$-centralizer $C_L(A)$ of $A$ is the intersection $C(A) \cap L$.
The following is an extension of [9, Theorem 2.4].
Theorem 2.6. Let $L$ be an LSSM of $n \times n$ matrices of dimension $d$. Let $k$ be the dimension of the $L$-centralizer of a generic element in $L$ and $m$ the dimension of the $\mathbb{Q}$-linear space spanned by the eigenvalues of $L$. Then $\dim \mathrm{GV}(L) = m + d - k$.
Proof. It follows from the proof of [9, Theorem 4.6] that the dimension of a generic fiber of the map $\varphi$ that parametrizes the Gibbs variety is equal to the dimension of the centralizer of a generic element in this fiber, i.e. to $k$. The domain of $\varphi$ is irreducible and has dimension $m + d$. Thus, by the fiber dimension theorem [4, Exercise II.
Note that when $m = k$, we have $\dim \mathrm{GV}(L) = d$ and therefore the Gibbs manifold is the positive part of the Gibbs variety, i.e. $\mathrm{GM}(L) = \mathrm{GV}(L) \cap S^n_+$. In particular, in this case the Gibbs manifold is a semialgebraic set. This is the case, for instance, for the LSSM of all diagonal matrices [9, Theorem 2.7].
We now give a degree bound for the Gibbs variety of an LSSM $L$. In what follows, $V(I)$ denotes the variety in $\mathbb{C}^{\binom{n+1}{2}}$ defined by the ideal $I \subseteq \mathbb{C}[x]$. As we will see below, the bound from Proposition 2.7 is usually pessimistic.
Once the degree of the Gibbs variety is known, one can use numerical techniques to find its defining equations. In general, this makes it possible to compute ideals of Gibbs varieties that are infeasible for symbolic algorithms.
We now present Algorithm 1 for finding the equations of the Gibbs variety numerically. We write $P$ for the ideal generated by
Unfortunately, the degree upper bound in Proposition 2.7 restricts the practical applicability of this algorithm to $n \le 3$. However, if the Gibbs variety is a hypersurface, then the algorithm can terminate immediately after finding a single algebraic equation. The degree of this equation is usually much lower than the degree bound in Proposition 2.7 (for instance, the Gibbs variety in Example 2.4 is defined by a cubic, while the bound from Proposition 2.7 is equal to $3^{12}$) and therefore the defining equation can be found with this algorithm for larger $n$.
Although Algorithm 1 uses floating point computations, for LSSMs defined over $\mathbb{Q}$ it can be adapted to give exact equations. This can be done using built-in commands in computer algebra systems, e.g. rationalize in Julia. Correctness of the rationalization procedure can be checked by plugging a parametrization of the Gibbs variety into the resulting equations.

Algorithm 1 Numerical implicitization of Gibbs varieties of known degree
Input: An LSSM $L$ given as an $\mathbb{R}$-span of $d$ linearly independent matrices $A_1, \dots, A_d$; the degree $k$ of $\mathrm{GV}(L)$.
Output: A set of equations that define $\mathrm{GV}(L)$ set-theoretically.

Logarithmic sparsity patterns
Every set $S \subseteq \{(i, j) \mid 1 \le i < j \le n\}$ defines a sparsity pattern on symmetric matrices in the following way.
Definition 3.1. We say that $A = (a_{ij}) \in S^n$ satisfies the sparsity condition given by $S$ if $a_{ij} = 0$ for all $(i, j) \in S$. The set of all symmetric matrices satisfying the sparsity condition given by $S$ forms an LSSM with a basis $\{E_{ii} \mid 1 \le i \le n\} \cup \{E_{ij} + E_{ji} \mid 1 \le i < j \le n,\ (i, j) \notin S\}$, where $E_{ij}$ is a matrix unit, i.e. a matrix with only one non-zero entry, which is equal to 1, at the position $(i, j)$.
We will denote this LSSM by $L_S$. Sparsity patterns can be encoded by graphs, which allows one to study them from a combinatorial point of view. Namely, to any simple undirected graph $G$ on $n$ nodes we associate a set $S_G \subseteq \{(i, j) \mid 1 \le i < j \le n\}$ as follows: $(i, j) \in S_G$ if and only if there is no edge between the nodes $i$ and $j$ in $G$. In this case we will also denote the corresponding LSSM by $L_G$. Note that if $G$ has $n$ nodes and $e$ edges, then $\dim L_G = n + e$.
We are interested in an algebraic description of the set of matrices that satisfy a logarithmic sparsity pattern given by $G$. This set of matrices is precisely the Gibbs manifold of $L_G$. Since disconnected graphs correspond to LSSMs with block-diagonal structure and block-diagonal matrices are exponentiated block-wise, we will only consider the case of connected $G$.
LSSMs given by graphs are nice in the sense that finding the dimension of their Gibbs varieties can be reduced to a simple linear algebra procedure of computing matrix centralizers. This is justified by the following result.
Proposition 3.4. Let $L_G$ be an LSSM given by a simple connected graph $G$ on $n$ nodes. Then its eigenvalues are $\mathbb{Q}$-linearly independent.
Proof. By specializing the variables $y_{ij}$ to zero for $i \neq j$ and the variables $y_{ii}$ to $n$ $\mathbb{Q}$-linearly independent algebraic numbers, we obtain a diagonal element of $L_G$ whose eigenvalues are linearly independent over $\mathbb{Q}$. This immediately implies $\mathbb{Q}$-linear independence of the eigenvalues of $L_G$.
We now address the question of computing the $L_G$-centralizer of a generic element $A \in L_G$. One way to do this is by straightforwardly solving the system of $n^2$ equations $XA - AX = 0$ in the variables $x_{ij}$ over the field $\mathbb{Q}(a_{ij})$, where $x_{ij}$ are the entries of $X \in L_G$ and $a_{ij}$ are the entries of $A$. However, there is a way to give a more explicit description of the $L_G$-centralizer.
Note that by Proposition 3.4 the eigenvalues of $L_G$ are $\mathbb{Q}$-linearly independent. In particular, this implies that the eigenvalues of $A \in L_G$ are generically distinct and that $A$ is generically non-derogatory ([6, Definition 1.4.4]). Therefore, by [7, Theorem 4.4.17, Corollary 4.4.18], we have $C(A) = \operatorname{span}_{\mathbb{R}}(\mathrm{id}_n, A, \dots, A^{n-1})$, where $\mathrm{id}_n$ is the $n \times n$ identity matrix. Hence, finding $C_L(A)$ reduces to intersecting $\operatorname{span}_{\mathbb{R}}(\mathrm{id}_n, A, \dots, A^{n-1})$ with $L_G$. Such an intersection can be found by solving a system of linear equations
$$p_0\,\mathrm{id}_n + p_1 A + \dots + p_{n-1} A^{n-1} = C$$
in the variables $p_0, \dots, p_{n-1}, c_{ij}$, where $C = (c_{ij})$ denotes a general element of $L_G$. Since $\mathrm{id}_n$ and $A$ are both in $L_G$, the intersection is at least two-dimensional and we arrive at the following proposition.
We conjecture that $\dim \mathrm{GV}(L_G) = \min\left(2n + e - 2, \binom{n+1}{2}\right)$. When $2n + e - 2 \le \binom{n+1}{2}$, the conjecture is equivalent to the statement that $\{A^2, \dots, A^{n-1}\} \cup \{E_{ij} \mid (i, j) \in E(G)\} \cup \{E_{ii} \mid i = 1, \dots, n\}$ is a linearly independent set. Here $E(G)$ denotes the set of edges of $G$. This conjecture is true when $G$ is a tree, as seen in the next section.
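The linear algebra behind the dimension count can be sketched numerically. The snippet below assumes NumPy and computes the dimension $k$ of the $L_G$-centralizer of a random (hence, with probability one, generic) element of $L_G$ by solving $XA - AX = 0$ for $X \in L_G$. It then evaluates $m + d - k$ with $m = n$ (Proposition 3.4) and $d = n + e$, which is how Theorem 2.6 is applied in the proof of Theorem 4.1. The function names and the 0-based edge lists are illustrative assumptions.

```python
import numpy as np

def graph_basis(n, edges):
    """Basis of L_G: E_ii for every node, E_ij + E_ji for every edge (0-based)."""
    basis = []
    for i in range(n):
        B = np.zeros((n, n))
        B[i, i] = 1.0
        basis.append(B)
    for i, j in edges:
        B = np.zeros((n, n))
        B[i, j] = B[j, i] = 1.0
        basis.append(B)
    return basis

def centralizer_dim(n, edges, seed=0):
    """Dimension of {X in L_G : XA = AX} for a random A in L_G."""
    rng = np.random.default_rng(seed)
    basis = graph_basis(n, edges)
    A = sum(rng.standard_normal() * B for B in basis)
    # Each column is the vectorised commutator BA - AB of one basis element.
    M = np.column_stack([(B @ A - A @ B).ravel() for B in basis])
    return len(basis) - np.linalg.matrix_rank(M)

def gibbs_dim(n, edges):
    """m + d - k with m = n and d = n + e."""
    d = n + len(edges)
    return n + d - centralizer_dim(n, edges)

path4 = [(0, 1), (1, 2), (2, 3)]   # the 4-chain
star4 = [(0, 1), (0, 2), (0, 3)]   # the star on 4 nodes
print(centralizer_dim(4, path4), gibbs_dim(4, path4), gibbs_dim(4, star4))
```

Because a single random sample is used, the result is a numerical indication of the generic dimension, not a proof.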
We end this section by characterizing Gibbs varieties for LSSMs that correspond to simple connected graphs on $n \le 4$ vertices. For $n \le 3$ we always have $\dim \mathrm{GV}(L_G) = \binom{n+1}{2}$ and therefore $\mathrm{GV}(L_G)$ is the entire ambient space $\mathbb{C}^{\binom{n+1}{2}}$. For $n = 4$ there are 6 non-isomorphic simple connected graphs, 2 of which are trees. If $G$ is not a tree, we once again have $\dim \mathrm{GV}(L_G) = 10 = \binom{n+1}{2}$ and $\mathrm{GV}(L_G) = \mathbb{C}^{\binom{n+1}{2}}$. If $G$ is a tree, then $\mathrm{GV}(L_G)$ is a hypersurface. We discuss the defining equations of these 2 hypersurfaces in the next section.

Sparsity Patterns Given by Trees
Trees are an important class of graphs that give rise to LSSMs with the smallest possible dimension for a given number of nodes. It is remarkable that for such LSSMs the dimension of the Gibbs variety only depends on the number of nodes in the graph (or, equivalently, the size of the matrices), and the dependence is linear.
Proof of Theorem 4.1. By Proposition 3.4 the dimension of the $\mathbb{Q}$-linear space spanned by the eigenvalues of $L_G$ is equal to $n$. The dimension of $L_G$ is equal to $2n - 1$, since $G$ is a tree and therefore has $n - 1$ edges. It remains to compute the dimension of the $L_G$-centralizer of a generic element in $L_G$. Suppose $A \in L_G$. We are looking for solutions of the equation $AY - YA = 0$, $Y \in L_G$. This is a system of homogeneous linear equations in the unknowns $y_{ij}$. We have
$$(AY - YA)_{ik} = \sum_{j=1}^{n} \left(a_{ij} y_{jk} - y_{ij} a_{jk}\right).$$
The unknown $y_{ij}$ is not identically zero if and only if $i = j$ or $(i, j)$ is an edge of $G$. The same is generically true for $a_{ij}$. Thus, $(AY - YA)_{ik}$ is not identically zero if and only if there exists $j$ such that $(i, j)$ and $(j, k)$ are edges of $G$ or if $(i, k)$ is itself an edge of $G$. In terms of the graph $G$, this means that $(AY - YA)_{ik}$ is not identically zero if and only if there is a path of edge length at most 2 from $i$ to $k$. Since $G$ is a tree, there is at most one such path. Therefore, if $i$ and $k$ are connected by a path of edge length 2 via the node $j$, the corresponding entry of $AY - YA$ is equal to $a_{ij} y_{jk} - a_{jk} y_{ij}$. It is equal to zero if $y_{ij} = (a_{ij}/a_{jk})\, y_{jk}$ (note that $a_{jk}$ is generically non-zero). Since $G$ is connected, we conclude that all the $y_{ij}$ with $i \neq j$ are proportional. If $i$ and $k$ are connected by an edge, the corresponding entry of $AY - YA$ is equal to $-y_{ii} a_{ik} + y_{kk} a_{ik} + (a_{ii} - a_{kk}) y_{ik}$. If it is equal to zero, then $y_{kk}$ is a linear combination of $y_{ii}$ and $y_{ik}$. We conclude that, since $G$ is connected and all the $y_{ik}$ with $i \neq k$ are proportional, all the $y_{ii}$ can be expressed as linear combinations of $y_{11}$ and just one $y_{jk}$ with $j \neq k$. Therefore, the centralizer, which is the solution space of the considered linear system, is at most 2-dimensional. Since it contains $\mathrm{id}_n$ and $A$, it is exactly two-dimensional. The statement of the Theorem now follows from Theorem 2.6 for $m = n$, $d = 2n - 1$ and $k = 2$.
For the 4-chain, the graph on the left, the Gibbs variety is defined by a single homogeneous equation of degree 6 that has 96 terms. For the graph on the right the defining equation is also homogeneous of degree 6. It has 60 terms. These two equations were found using Algorithm 1. ⋄

Logarithmic sparsity from coloured graphs
Sparse LSSMs defined by coloured graphs appear in the study of coloured Gaussian graphical models in algebraic statistics [5], [10]. In this section we study the properties of Gibbs varieties of such LSSMs. Consider the graph $G$ and suppose its vertices are labeled by $p$ colours and edges are labeled by $q$ colours. The corresponding LSSM $L$ is cut out by the following three sets of equations.
1. $x_{ij} = 0$ if $(i, j)$ is not an edge of $G$.
2. $x_{ii} = x_{jj}$ if the vertices $i$ and $j$ have the same colour.
3. $x_{ij} = x_{kl}$ if $(i, j)$ and $(k, l)$ are edges of $G$ that have the same colour.
It is immediately clear that $\dim L = p + q$. We will denote coloured graphs by $\mathcal{G}$ and the corresponding LSSMs by $L_{\mathcal{G}}$. The corresponding uncoloured graph will be denoted by $G$, as usual. Note that since $L_{\mathcal{G}} \subseteq L_G$, the inclusion of the Gibbs varieties also holds: $\mathrm{GV}(L_{\mathcal{G}}) \subseteq \mathrm{GV}(L_G)$. Since the identity matrix is in $L_{\mathcal{G}}$ for any $\mathcal{G}$, the dimension bound from Proposition 3.5 holds for coloured graphs as well.
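The three sets of equations translate directly into a basis with one matrix per vertex colour and one per edge colour, which makes the count $\dim L = p + q$ explicit. The sketch below assumes NumPy; the function name, the 0-based indexing and the colour labels are illustrative choices rather than notation from the paper.

```python
import numpy as np

def coloured_basis(n, vertex_colour, edge_colour):
    """Basis of the LSSM of a coloured graph.

    vertex_colour: list of length n with one colour label per vertex.
    edge_colour:   dict {(i, j): colour} over the edges, 0-based with i < j.
    """
    basis = []
    for c in sorted(set(vertex_colour)):
        B = np.zeros((n, n))
        for i in range(n):
            if vertex_colour[i] == c:
                B[i, i] = 1.0      # equal diagonal entries within a colour class
        basis.append(B)
    for c in sorted(set(edge_colour.values())):
        B = np.zeros((n, n))
        for (i, j), col in edge_colour.items():
            if col == c:
                B[i, j] = B[j, i] = 1.0   # equal entries on same-coloured edges
        basis.append(B)
    return basis

# 3-chain with vertices 1 and 3 sharing a colour and two edge colours: p = 2, q = 2.
basis = coloured_basis(3, ["a", "b", "a"], {(0, 1): "r", (1, 2): "s"})
rank = np.linalg.matrix_rank(np.array([B.ravel() for B in basis]))
print(len(basis), rank)
```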
Definition 5.1. We say that $X \in S^n_+$ satisfies the coloured sparsity pattern given by $\mathcal{G}$ if $\log X \in L_{\mathcal{G}}$.
Proposition 5.2. Let $\mathcal{G}$ be a coloured graph on $n$ nodes in which vertices are labeled by $p$ colours and edges are labeled by $q$ colours. Then $\dim \mathrm{GV}(L_{\mathcal{G}}) \le n + p + q - 2$.
Note that if $\mathcal{G}$ is a coloured graph, the eigenvalues of $L_{\mathcal{G}}$ are not necessarily $\mathbb{Q}$-linearly independent. Therefore, the upper bound from Proposition 5.2 is not always attained.
The eigenvalues of this LSSM are $\mathbb{Q}$-linearly dependent: they satisfy the equation $2\lambda_1 = \lambda_2 + \lambda_3$. We have $\dim \mathrm{GV}(L) = 3 < n + p + q - 2 = 3 + 1 + 2 - 2 = 4$. Note that in this case $\dim \mathrm{GV}(L_{\mathcal{G}}) = \dim \mathrm{GM}(L_{\mathcal{G}})$ and the Gibbs manifold of $L_{\mathcal{G}}$, i.e. the set of matrices that satisfy the coloured logarithmic sparsity condition given by $\mathcal{G}$, is the positive part of its Gibbs variety, i.e. $\mathrm{GM}(L_{\mathcal{G}}) = \mathrm{GV}(L_{\mathcal{G}}) \cap S^n_+$. This means that the set of matrices with the coloured logarithmic sparsity pattern given by this graph can be described algebraically. ⋄
In order to illustrate how different colourings of the same graph affect the Gibbs variety, we conclude this section by analysing coloured graphs for which the underlying graph is the 3-chain. This is done using [9, Algorithm 1].

1. For the corresponding LSSM, $\dim \mathrm{GV}(L_{\mathcal{G}}) = 6$ and there are no polynomial equations that hold on the Gibbs variety.

For the corresponding LSSM, $\dim \mathrm{GV}(L_{\mathcal{G}}) = 5$ and the Gibbs variety is a cubic hypersurface. Its prime ideal is generated by the polynomial $-x_{11} x_{12} x_{23} + x_{12}^2 x_{13} + x_{12} x_{23} x_{33} - x_{13} x_{23}^2$.

6. For the corresponding LSSM, $\dim \mathrm{GV}(L_{\mathcal{G}}) = 4$ and the Gibbs variety is an affine subspace with the prime ideal generated by $x_{12} - x_{23}$ and $x_{11} - x_{33}$.

7. For the corresponding LSSM, $\dim \mathrm{GV}(L_{\mathcal{G}}) = 3$. The prime ideal of the Gibbs variety is generated by 7 polynomials: x

From Analytic to Algebraic Equations
Since the logarithm is an analytic function on $\mathbb{R}_{>0}$, the set of matrices satisfying the logarithmic sparsity pattern given by a graph $G$ can be defined via formal power series equations. One way to write these equations in a compact form is by using Sylvester's formula.
Theorem 6.1 (Sylvester [11]). Let $f : D \to \mathbb{R}$ be an analytic function on an open set $D \subset \mathbb{R}$ and $M \in \mathbb{R}^{n \times n}$ a matrix that has $n$ distinct eigenvalues $\lambda_1, \dots, \lambda_n$ in $D$. Then
$$f(M) = \sum_{i=1}^{n} f(\lambda_i) \prod_{j \neq i} \frac{1}{\lambda_i - \lambda_j} \left(M - \lambda_j\, \mathrm{id}_n\right).$$
We note that the product on the right hand side takes place in the commutative ring $\mathbb{R}[M]$.
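Sylvester's formula can be tried out directly in floating point arithmetic. The sketch below assumes NumPy and a symmetric input with distinct eigenvalues; `sylvester_apply` is a hypothetical helper name. It applies the formula with $f = \log$ to $X = \exp(A)$ for one particular $A$ with the sparsity pattern of the 3-chain, and the (1, 3) entry of the result vanishes up to rounding, as the logarithmic sparsity condition requires.

```python
import numpy as np

def sym_expm(A):
    """Matrix exponential of a real symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

def sylvester_apply(f, M):
    """Evaluate f(M) by Sylvester's formula.

    Assumes M is symmetric with pairwise distinct eigenvalues in the domain of f.
    """
    lam = np.linalg.eigvalsh(M)
    n = M.shape[0]
    I = np.eye(n)
    out = np.zeros((n, n))
    for i in range(n):
        term = I.copy()
        for j in range(n):
            if j != i:
                # One factor (M - lam_j id) / (lam_i - lam_j) of the product.
                term = term @ (M - lam[j] * I) / (lam[i] - lam[j])
        out += f(lam[i]) * term
    return out

# An element of L_G for the 3-chain 1 - 2 - 3: the (1, 3) entry is zero.
A = np.array([[0.5, 1.0, 0.0],
              [1.0, -0.2, 0.7],
              [0.0, 0.7, 0.1]])
X = sym_expm(A)
L = sylvester_apply(np.log, X)
print(np.allclose(L, A), abs(L[0, 2]) < 1e-8)
```

The symbolic procedure in the text treats $\lambda_i$ and $\log \lambda_i$ as variables to be eliminated; this numerical version only checks the formula at one point.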
By setting $f$ to be the logarithm function, we obtain a parametrization of $\log X$ with rational functions in the entries $x_{ij}$ of $X$, the eigenvalues $\lambda_i$ of $X$ and their logarithms $\log \lambda_i$. The logarithmic sparsity condition induced on $X$ requires that some components of this parametrization are zero and therefore gives a system of polynomial equations in $x_{ij}$, $\lambda_i$ and $\log \lambda_i$. By eliminating the variables $\lambda_i$ and $\log \lambda_i$ from this system while taking into account the polynomial relations between $\lambda_i$ and $x_{ij}$ given by the coefficients of the characteristic polynomial, we obtain a set of defining equations of $\mathrm{GV}(L_G)$. This procedure is described by Algorithm 2.
Theorem 6.2. Algorithm 2 is correct. The ideal $J$ computed in step (S9) is the prime ideal of $\mathrm{GV}(L_G)$.
Proof. Since the eigenvalues of $L_G$ are $\mathbb{Q}$-linearly independent, the ideal generated by $E_2$ is prime. Moreover, there is no $\mathbb{C}$-algebraic relation between the eigenvalues of $X$ and their logarithms that holds for any positive definite $X$ (this is a consequence of the Ax-Schanuel theorem [1, (SP)]). These two facts ensure that all the algebraic relations between $X$, $\lambda$ and $\log \lambda$ are accounted for, and that the algorithm is thus correct. The ideal generated by $E_1$ and $E_2$ is therefore also prime, after saturation, and elimination in step (S9) preserves primality.
Note that the primality of $J$ means that $\mathrm{GV}(L_G)$ is irreducible, as stated in [9, Theorem 3.6].
The advantage of this algorithm compared to [9, Algorithm 1] is that it uses a smaller polynomial ring and fewer variables are eliminated.
Here we allow the case $\epsilon = \infty$, where the dependency on $C$ disappears and the ASSM is simply the LSSM, i.e. $L_\infty = L$. Note that Gibbs manifolds of ASSMs are defined analogously to the case of LSSMs.
Theorem 7.1. For $b \in \pi(S^n_+)$, the intersection of $\pi^{-1}(b)$ with the Gibbs manifold $\mathrm{GM}(L_\epsilon)$ consists of a single point $X^*_\epsilon$. This point is the optimal solution to the regularized SDP. For $\epsilon = \infty$, it is the unique maximizer of von Neumann entropy on the spectrahedron $\pi^{-1}(b)$.
Sets of matrices satisfying a fixed logarithmic sparsity pattern are Gibbs manifolds that correspond to a particular class of SDP constraints. If the sparsity pattern is given by a graph $G$, the spectrahedron consists of PSD matrices for which some of the entries are fixed. More precisely, the entry $x_{ij}$ is fixed if and only if $i = j$ or $(i, j)$ is an edge of $G$. If in addition the graph is coloured, then one adds the constraints $x_{ii} = x_{jj}$ if the nodes $i$ and $j$ have the same colour and $x_{ij} = x_{kl}$ if the edges $(i, j)$ and $(k, l)$ have the same colour.
where the entries $b_{ij}$ are fixed and the entries $x_{ij}$ are arbitrary such that the matrix is PSD. ⋄

Proposition 2.7. Let $L$ be an LSSM of $n \times n$ matrices with $\mathbb{Q}$-linearly independent eigenvalues. Then $\deg \mathrm{GV}(L) \le n^{\binom{n+1}{2} + 2n}$.
Proof. By [9, Algorithm 1], the prime ideal $J$ of $\mathrm{GV}(L)$ is obtained by elimination from the ideal $I$ generated by polynomials of degree at most $n$. Therefore, $\deg \mathrm{GV}(L) = \deg V(J) \le \deg V(I)$. The variety $V(I)$ lives in the affine space of dimension $\binom{n+1}{2} + 2n + d$, where $d = \dim L$. Note that $\dim V(I) \ge \dim L$ and thus $\operatorname{codim} V(I) \le \binom{n+1}{2} + 2n$. Therefore, by Bézout's theorem, we have $\deg V(I) \le n^{\binom{n+1}{2} + 2n}$, which proves the Proposition.

(S3) For $l = 1$ to $k$ do:
(a) Pick $M > \binom{N+l-1}{l}$ random samples in $L$.
(b) Let $E$ be the set of matrix exponentials of the $M$ picked samples.
(c) Construct a Vandermonde matrix $A$ by evaluating all monomials of degree $l$ on the elements of $E$.
(d) Let $I_l$ be the basis of $\ker A$.
(e) $I := I \cup I_l$.
(S4) Return a set of generators of $I$.
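Steps (a) to (d) of the loop can be sketched for a single degree $l$ as follows. This is a hedged illustration and not the paper's implementation. It assumes NumPy, reads $N$ as the number $\binom{n+1}{2}$ of coordinates $x_{ij}$, and uses an SVD with a relative tolerance `tol` (an arbitrary choice) to approximate $\ker A$. The test LSSM is the 3-chain in which vertices 1 and 3 share a colour and the two edges have different colours. This choice is our assumption, made because the cubic quoted in Section 5 vanishes on its Gibbs manifold.

```python
import itertools
import numpy as np

def sym_expm(A):
    """Matrix exponential of a real symmetric matrix via eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(w)) @ V.T

def vanishing_forms(basis, degree, rng, tol=1e-10):
    """Numerical kernel of the degree-l Vandermonde matrix on sampled points of exp(L)."""
    n = basis[0].shape[0]
    iu = np.triu_indices(n)                  # coordinate order x11, x12, ..., xnn
    N = len(iu[0])                           # assumed meaning of N: n(n+1)/2
    monos = list(itertools.combinations_with_replacement(range(N), degree))
    M = 2 * len(monos)                       # (a) more samples than monomials
    rows = []
    for _ in range(M):
        y = rng.uniform(-1.0, 1.0, size=len(basis))
        X = sym_expm(sum(c * B for c, B in zip(y, basis)))   # (b)
        x = X[iu]
        x = x / np.linalg.norm(x)            # rescaling a row does not change the kernel
        rows.append([np.prod(x[list(m)]) for m in monos])    # (c)
    _, s, Vt = np.linalg.svd(np.array(rows))
    return monos, Vt[s < tol * s[0]]         # (d) rows spanning the numerical kernel

# Assumed LSSM: diag(y1, y2, y1) plus y3 on edge (1,2) and y4 on edge (2,3).
basis = [np.diag([1.0, 0.0, 1.0]), np.diag([0.0, 1.0, 0.0]),
         np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]),
         np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 1.0], [0.0, 1.0, 0.0]])]
rng = np.random.default_rng(1)
monos, kernel = vanishing_forms(basis, 3, rng)
for m, c in zip(monos, kernel[-1]):
    if abs(c) > 1e-6:
        print(m, round(c / np.abs(kernel[-1]).max(), 6))
```

Each printed tuple lists the 0-based positions of the variables of a monomial in the order x11, x12, x13, x22, x23, x33. Turning the floating point coefficients into exact rationals is a separate step that is not shown here.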

Proposition 3.5. Let $G$ be a simple connected graph on $n$ nodes with $e$ edges. Then $\dim \mathrm{GV}(L_G) \le 2n + e - 2$.

Theorem 4.1. Let $L_G$ be an LSSM given by a tree $G$ on $n$ nodes. Then $\dim \mathrm{GV}(L_G) = 3n - 3$.

Example 4.2. For $n = 4$ there are exactly two non-isomorphic trees, shown below. By Theorem 4.1, the dimension of their Gibbs varieties is equal to 9. Therefore, these Gibbs varieties are hypersurfaces in $\mathbb{C}^{\binom{n+1}{2}} = \mathbb{C}^{10}$.