Bootstraps to Strings: Solving Random Matrix Models with Positivity

A new approach to solving random matrix models directly in the large $N$ limit is developed. First, a set of numerical values for some low-pt correlation functions is guessed. The large $N$ loop equations are then used to generate values of higher-pt correlation functions based on this guess. Then one tests whether these higher-pt functions are consistent with positivity requirements, e.g., $\langle \text{tr }M^{2k} \rangle \ge 0$. If not, the guessed values are systematically ruled out. In this way, one can constrain the correlation functions of random matrices to a tiny subregion which contains (and perhaps converges to) the true solution. This approach is tested on single and multi-matrix models and handily reproduces known solutions. It also produces strong results for multi-matrix models which are not believed to be solvable. A tantalizing possibility is that this method could be used to search for new critical points, or string worldsheet theories.


Introduction
From ancient days, sages appreciated that certain large N theories simplify dramatically. For matrix theories in the 't Hooft limit, one needs to sum only a tiny subset of all possible Feynman diagrams, the planar ones [1,2]. However, with a few notable exceptions, this simplification is not enough: even in the 't Hooft limit, most matrix theories are impossible to solve. This is true even for zero dimensional statistical ensembles of a small number of matrices. With the exception of the single matrix model [3,4,5,6], almost all models remain unsolved.
In this paper, we propose a method to solve multi-matrix models in the strict large N limit. For our purposes, a 2-matrix model is defined by an integral of the form
$$Z = \int dA\, dB\; e^{-N \operatorname{Tr} V(A,B)},$$
where the integration measure is the uniform measure over Hermitian matrices, and V is a polynomial, the coefficients of which will be considered as couplings. Here, "solving" a matrix model means computing the correlation functions $\langle \operatorname{tr} O \rangle$, where O is some arbitrary polynomial in the matrices. Unlike in the finite N case, even this step is highly non-trivial in most cases. A complementary tool for studying such systems is numerics, e.g., Monte Carlo simulation. A downside of this approach is that the large N limit is numerically difficult; one needs to simulate ever more degrees of freedom, even though the underlying physics is simplifying. A more conceptual problem is that some matrix ensembles are ill-defined at finite N but are well-defined at infinite N. Far from a mere technicality, such situations are usually what is required to compare with continuum quantum gravity calculations (see [10] for a review). In this paper, we will instead develop a method that works well at infinite N and strong (or weak) coupling. The method involves both analytics and numerics and deploys a philosophy similar to the conformal bootstrap (see [11,12] for an approach to lattice field theory that is morally quite similar). The analytical part involves writing down the loop equations and finding simple positivity relations on matrix correlators. Then one numerically searches over the space of possible values of the correlation functions which are consistent with the analytics. The resulting bounds are rigorous, even if they were obtained with numerics.
A word about notation: we normalize the little trace and the big Trace by $\operatorname{tr} 1 = \frac{1}{N}\operatorname{Tr} 1 = 1$ and will often denote single matrix correlators $t_k = \langle \operatorname{tr} A^k \rangle$. This paper is organized as follows. In Section 2, we review the equations of motion for (multi-)matrix models, known as the "loop equations." In Section 3, we discuss positivity constraints on correlation functions $\langle \operatorname{tr} O(A,B) \rangle$. As we will emphasize, an arbitrary list of numbers will in general not be a consistent set of correlation functions. In Section 4, we illustrate how the method works by reproducing known exact solutions and move on in Section 5 to solve a model which is not known to be integrable. We discuss some open questions in Section 6.

The Loop Equations
The loop equations of a random matrix theory are nothing but the Schwinger-Dyson equations. They are derived by integrating a total derivative. For a single matrix model,

$$0 = \int dM \sum_{i,j} \frac{\partial}{\partial M_{ij}} \left[ (M^k)_{ij}\, e^{-N \operatorname{Tr} V(M)} \right]. \tag{2.3}$$

The derivative can act on either the $M^k$ factor or the $e^{-S}$ factor, giving

$$\sum_{l=0}^{k-1} \langle \operatorname{tr} M^l \rangle \langle \operatorname{tr} M^{k-1-l} \rangle = \langle \operatorname{tr} M^k V'(M) \rangle. \tag{2.4}$$

In writing the double trace term as a product of single traces, we have used large N factorization. To understand these equations, it is helpful to consider their diagrammatic interpretation. Let us set $V(M) = \frac{1}{2}M^2 + \frac{g}{4}M^4$ for simplicity. The loop equations state the following: consider the computation of a k-pt function. At large N, this is a sum of planar diagrams with k external lines (we will not need 't Hooft's double line notation). Choose one of the external lines and follow it into the diagram. There are two possibilities. If this edge never encounters a vertex, it must become another external line; in this case, it divides the planar diagram into two parts. Otherwise, the line must end in a vertex. So we get a relationship between lower-pt correlation functions and higher-pt ones.
The diagrammatic interpretation of the loop equations is particularly useful when we consider more complicated multi-matrix models. One can easily read off the form of the loop equations without any calculation. If we consider the trace of a monomial of degree D in the matrices, we can follow each external line into the blob, and hence derive D different loop equations. (Some of these may be redundant due to the cyclicity of the trace or other symmetries.) Alternatively, we could view each loop equation as expressing the invariance of the matrix integral under an infinitesimal change of variables, e.g. $M \to M + \epsilon M^k$ in the single matrix case.
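As a concrete illustration, the single-matrix recursion can be generated mechanically. The sketch below is our own; it assumes the quartic potential $V = \frac{1}{2}M^2 + \frac{g}{4}M^4$ used later in the text, sets the odd moments to zero by the $M \to -M$ symmetry, and expresses each even moment $t_k$ as a polynomial in $t_2$ and $1/g$ via the loop equation $\sum_{l=0}^{k-1} t_l t_{k-1-l} = t_{k+1} + g\, t_{k+3}$:

```python
import sympy as sp

# Loop-equation recursion for V(M) = M^2/2 + (g/4) M^4:
#   sum_{l=0}^{k-1} t_l t_{k-1-l} = t_{k+1} + g t_{k+3},
# solved for t_{k+3}.  Odd moments vanish by the M -> -M symmetry.
t2, g = sp.symbols('t2 g')
t = {0: sp.Integer(1), 1: sp.Integer(0), 2: t2, 3: sp.Integer(0)}

for k in range(1, 8):  # generates t_4 through t_10
    double_trace = sum(t[l] * t[k - 1 - l] for l in range(k))
    t[k + 3] = sp.expand((double_trace - t[k + 1]) / g)

for k in (4, 6, 8):
    print(f"t_{k} =", t[k])
```

The k = 1 equation reproduces $t_4 = (1 - t_2)/g$, and every higher even moment is likewise fixed once $t_2$ is chosen; this is the sense in which the search space of the quartic model is one-dimensional.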

The search space
As we have seen, the loop equations relate higher-pt functions to lower-pt functions. A natural question is: what is the minimum number of correlators s * that we need to know in order to determine the rest? For reasons that will soon be apparent, we call a minimal subspace of correlators S the "search space." If the values of the correlators in the search space are known, the rest of the correlators follow. By definition, dim S = s * . While the value of s * is unique, there may be many possible choices for S. One should not confuse s * with the number of valid solutions to the loop equations. In general, the number of valid solutions is less than s * because of positivity constraints; see Appendix A for a precise analysis of the 1-matrix case. The upshot is that for a potential of degree d, s * = d − 2. This is the same as the number of formal solutions to the loop equations, i.e., solutions where the eigenvalue density does not have to be positive. An immediate question for the multi-matrix model is how to estimate s * . Since the number of correlation functions of fixed degree grows exponentially for even a 2-matrix model, it may not seem obvious that s * should even be finite. In fact, we will argue that if we know all correlation functions of degree at most k * , then the loop equations determine the rest. Suppose we consider a general matrix model with m matrices, and a potential which is a polynomial of degree D in the m matrices. A crude estimate of k * can be obtained as follows. There are roughly m k /k correlation functions of degree k. Each correlator gives approximately k loop equations. So the number of loop equations for correlators of degree k is ∼ m k . These loop equations will produce correlators of degree k + D. So we expect that when k gets large enough so that m k ∼ m k+D /(k + D), we will have enough equations to determine the rest of the correlators. This occurs at k * ∼ m D .
In practice, this crude estimate is pessimistic. For the simple case m = 2, D = 3, direct calculation gives s * ≤ 5, e.g. knowing the 5 traces $\langle \operatorname{tr} A\rangle, \langle \operatorname{tr} B\rangle, \langle \operatorname{tr} A^2\rangle, \langle \operatorname{tr} AB\rangle, \langle \operatorname{tr} B^2\rangle$ is enough to determine the rest of the correlation functions. If the model has A ↔ B symmetry, knowing just the 3 correlation functions $\langle \operatorname{tr} A\rangle, \langle \operatorname{tr} A^2\rangle, \langle \operatorname{tr} AB\rangle$ is enough. Additional symmetries will typically both simplify the loop equations and reduce the value of s * .
To check this, it is rather straightforward to write a simple program that will generate all the loop equations up to a fixed degree. The main subtlety is to ensure that only inequivalent traces are introduced at each new degree. First list all correlation functions at a given degree. These consist of all words in the two letters A and B, modulo the constraints
$$\operatorname{tr} W_1 W_2 = \operatorname{tr} W_2 W_1, \qquad \operatorname{tr} W = \operatorname{tr} g(W). \tag{2.6}$$
Here g ∈ G is some global symmetry of the model, which in this case is just the $\mathbb{Z}_2$ symmetry g(A) = B, g(B) = A. More formally, if we have m matrices $\mathcal{A} = \{A_1, A_2, \cdots, A_m\}$, then we are interested in the equivalence classes of words of length k in the $A_i$ under the action of the dihedral group $D_k$ (cyclic rotations and reversal) and of G; the number of such equivalence classes $\mathcal{O}_{k,m}$ is the number of observables at degree k. In models with low-degree polynomial potentials, there is often the additional transpose symmetry $A_i \to A_i^T$, which reverses the order of the letters inside a trace; in this case, all correlation functions are real. If G = {1}, the inequivalent traces are then in one-to-one correspondence with so-called bracelets in the combinatorics literature. The number of bracelets of length k, for k even, is
$$B_m(k) = \frac{1}{2k} \sum_{d \mid k} \varphi(d)\, m^{k/d} + \frac{1}{4}(m+1)\, m^{k/2},$$
where $\varphi$ is the Euler totient function, and the sum is over all divisors of k. The main point of this formula is that a precise counting is somewhat complicated, which makes a precise estimate of k * difficult, but the large k behavior is simply $B \sim \frac{1}{k} m^k$. An obvious question for future work is whether there is a simple criterion for calculating s * for a polynomial interaction in the matrices.
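The bracelet count is easy to check by brute force. A minimal sketch (the function names are ours): `bracelets_formula` implements the even-k counting formula above, while `bracelets_brute` counts equivalence classes directly by canonicalizing each word over all rotations and reversals:

```python
from math import gcd
from itertools import product

def bracelets_formula(k, m):
    """Number of bracelets of even length k in m letters:
    (1/2k) sum_{d|k} phi(d) m^(k/d) + (1/4)(m+1) m^(k/2),
    computed with exact integer arithmetic over a common denominator 4k."""
    phi = lambda n: sum(1 for i in range(1, n + 1) if gcd(i, n) == 1)
    necklace_sum = sum(phi(d) * m ** (k // d) for d in range(1, k + 1) if k % d == 0)
    return (2 * necklace_sum + k * (m + 1) * m ** (k // 2)) // (4 * k)

def bracelets_brute(k, m):
    """Count words of length k in m letters modulo cyclic rotation and reversal."""
    seen = set()
    for w in product(range(m), repeat=k):
        rots = [w[i:] + w[:i] for i in range(k)]
        rev = tuple(reversed(w))
        rots += [rev[i:] + rev[:i] for i in range(k)]
        seen.add(min(rots))  # canonical representative of the equivalence class
    return len(seen)

for k, m in [(2, 2), (4, 2), (6, 2), (4, 3)]:
    print(k, m, bracelets_formula(k, m), bracelets_brute(k, m))
```

For example, there are 6 binary bracelets of length 4 (AAAA, AAAB, AABB, ABAB, ABBB, BBBB), matching the 6 inequivalent degree-4 traces of a 2-matrix model with trivial G and a transpose symmetry.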

Positivity Constraints
Suppose one consults an oracle and receives a list of numbers which are purportedly the single-trace correlation functions of a matrix model. Here we ask the question: what consistency conditions does this list have to satisfy? In this section, we address this question. For the rest of this paper, we will restrict ourselves to the strict large N limit. An incomplete discussion of positivity for 1/N corrections is relegated to Appendix C.

Positivity for one-matrix ensembles
We will consider the positivity constraints that can be derived from
$$\langle \operatorname{tr} \varphi^\dagger \varphi \rangle \ge 0. \tag{3.9}$$
Here $\varphi$ is an arbitrary superposition of matrices; for the 1-matrix model, $\varphi = \sum_k \alpha_k M^k$. This condition is equivalent to the following statement: if we consider the matrix of correlators $\mathcal{M}_{jk} = \langle \operatorname{tr} M^{j+k} \rangle$, all of its eigenvalues must be non-negative, $\mathcal{M} \succeq 0$. In practice, we cannot enforce all of the constraints that follow from this condition, and we must choose a set of weaker constraints. Denoting the single trace correlators by $t_k = \langle \operatorname{tr} M^k \rangle$, we may impose
$$t_{2k} \ge 0. \tag{3.10}$$
These weaker constraints are linear in the single-trace correlators. We may also derive non-linear constraints from (3.9). For example, we can enforce positivity of a $2 \times 2$ submatrix of $\mathcal{M}$:
$$\begin{pmatrix} t_{2j} & t_{j+k} \\ t_{j+k} & t_{2k} \end{pmatrix} \succeq 0. \tag{3.11}$$
The eigenvalues of this submatrix are $2\lambda_{j,k} = t_{2j} + t_{2k} \pm \sqrt{(t_{2j} - t_{2k})^2 + 4 t_{j+k}^2}$, which gives
$$t_{2j}\, t_{2k} \ge t_{j+k}^2. \tag{3.12}$$
Notice that this inequality follows from just $\det \mathcal{M}_{jk} \ge 0$. For j = 0 this inequality has a simple interpretation: if we consider drawing eigenvalues randomly from the eigenvalue distribution ρ(λ) of M, the inequality just says that the variance of the random variable $\lambda^k$ is non-negative. Furthermore, the j = 0 constraint implies (3.10). In writing (3.11), we take all $t_k$ to be real, since M is Hermitian. The constraints (3.10) and (3.12) came from considering $1 \times 1$ and $2 \times 2$ submatrices, respectively. In general, if we enforce positivity of $d \times d$ submatrices of $\mathcal{M}$, we will get polynomial constraints of degree d in the $t_k$ variables. These constraints will include statements such as "all even moments of the random variable $X_k = \lambda^k - t_k$ are non-negative." These are essentially the inequalities that result from positivity of the eigenvalue distribution ρ(λ) of M.
Note that we can find the boundary of the allowed region by finding the roots of the determinants of various submatrices, $\det \mathcal{M}_{d \times d} = 0$. One can check that a matrix is positive definite by checking that the determinants of all upper-left (leading principal) submatrices are positive; on the boundary, where the matrix is only semi-definite, all principal minors, not just the leading ones, must be non-negative.
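Numerically, both checks are a few lines. A sketch (our own illustrative code; the `shift` parameter and function names are our choices), using as a test case the semicircle moments $t_{2n} = C_n$ (Catalan numbers), which solve the Gaussian model in our normalization:

```python
import numpy as np

def psd_cholesky(M, shift=1e-10):
    """A symmetric matrix is positive semi-definite (up to rounding) iff
    Cholesky factorization succeeds on M + shift*I."""
    try:
        np.linalg.cholesky(M + shift * np.eye(len(M)))
        return True
    except np.linalg.LinAlgError:
        return False

def leading_minors(M):
    """Determinants of the upper-left submatrices; all positive <=> positive definite."""
    return [np.linalg.det(M[:d, :d]) for d in range(1, len(M) + 1)]

# Hankel matrix of semicircle moments t = (1, 0, 1, 0, 2, 0, 5): M_jk = t_{j+k}.
t = [1, 0, 1, 0, 2, 0, 5]
H = np.array([[t[j + k] for k in range(4)] for j in range(4)], dtype=float)
print(psd_cholesky(H), leading_minors(H))

# Spoil the fourth moment: t_4 = 0.5 violates t_4 >= t_2^2, so positivity fails.
t_bad = [1, 0, 1, 0, 0.5, 0, 5]
H_bad = np.array([[t_bad[j + k] for k in range(4)] for j in range(4)], dtype=float)
print(psd_cholesky(H_bad))
```

The spoiled sequence fails precisely because of the j = 0, k = 2 instance of (3.12): the principal minor built from $t_0, t_2, t_4$ is negative.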

Relation to the Hamburger moment problem
Large N positivity in the single matrix model is closely related to positivity requirements on the moments of a real random variable. This is the subject of the Hamburger moment problem, which we now review. Given a sequence of real numbers $T = \{t_0 = 1, t_1, t_2, \cdots\}$, Hamburger asked for necessary and sufficient conditions on T such that $t_k = \int_{-\infty}^{\infty} \rho(x)\, x^k\, dx$ for some positive measure ρ. The solution is that such a distribution always exists if the matrix $\mathcal{M}_{jk} = t_{j+k}$ is positive semi-definite. A word of caution: one should not confuse the probability distribution entering in the Hamburger problem with the measure over the random matrices. The probability distribution relevant for the Hamburger problem is the eigenvalue density. At infinite N, the eigenvalue density is a deterministic variable (it does not fluctuate); the random variable whose moments we are computing is an eigenvalue chosen at random from the large matrix.
There are various generalizations of this problem. The truncated moment problem asks for necessary and sufficient conditions when only a subset of T is given [13]. This is relevant for the practical problem at hand, where we only compute a subset of correlators. In the multi-matrix case, the analog of t k are traces of arbitrary "words" modulo cyclicity. So positivity in the multi-matrix case can be viewed as a non-commutative, multivariable generalization of the moment problem, see [14].

Multi-matrix models and the general algorithm
For multi-matrix models, the space of correlators is exponentially bigger; we need to consider not just powers of M but "words", e.g. $\varphi = A + B + AB + \cdots + AB^2ABAB + \cdots$. Note that a generic off-diagonal element of $\mathcal{M}$, such as $\langle \operatorname{tr} A^2B^2AB \rangle$, can be complex, but $\mathcal{M}$ will always be Hermitian. If the model has a transpose symmetry, then $\mathcal{M}$ will be real and symmetric.
Let us now spell out the matrix bootstrap in generality. We start with the large N loop equations, which (after large N factorization) are a set of quadratic equations in the single-trace correlators; there are infinitely many such equations. We set aside a small subspace of correlators S, which we call the search space. We choose this space so that once the correlators in S are fixed, the loop equations determine the rest. For each point in S, we compute as many correlators as possible, assemble them into the inner product matrix $\mathcal{M}$, and check whether $\mathcal{M} \succeq 0$. (Even without the loop equations, there may be some positivity requirements on S; for example, if S is the space $(t_1, t_2)$ we should impose $t_2 \ge t_1^2$.) The region of S where this constraint is satisfied is our "allowed region." Note that for some choices of S, the loop equations may not uniquely determine the other correlators; for example, there may be a branch cut leading to multiple solutions. In such cases, the allowed region of S consists of points where at least one solution has a positive $\mathcal{M}$.
In practice, it is of course impossible to compute infinitely many correlators. If one considers a large number of correlators, it may also be difficult to repeatedly compute eigenvalues and check for positivity. One can imagine a variety of approaches, where only some of the correlators are computed, or only a subset of the constraints are checked. How to achieve the best performance with limited computational resources is of course an important engineering problem.
In general, one hopes that the allowed region converges to the exact solution as one increases the number of constraints. Given a finite subregion, one estimate of the solution (assuming it is unique) is to maximize over S the smallest eigenvalue of $\mathcal{M}$. For some applications, one is less interested in finding the allowed region; instead one simply wishes to know whether the allowed region is empty or not. This tests whether the model is self-consistent. In such a case, one can use, e.g., gradient ascent on the smallest eigenvalue of $\mathcal{M}$ and stop once the eigenvalue becomes positive.

Single Hermitian matrix
For simplicity, we start with the single Hermitian matrix model
$$Z = \int dM\, e^{-N \operatorname{Tr}\left(\frac{1}{2}M^2 + \frac{g}{4}M^4\right)}.$$
We will treat the cases g > 0 and g < 0 separately to emphasize some of the special features of this model, starting here with g > 0. For the convenience of the reader, the exact solution of this model with our chosen conventions is reviewed in Appendix A. We will take the search space S to be the single parameter $t_2 \ge 0$ and set all odd correlation functions to zero. We follow the general approach outlined above to derive constraints. Starting from some value of $t_2$, we use the loop equations to compute all correlation functions up to some power 2d. We assemble these correlators into the inner product matrix $\mathcal{M}_{d \times d}$ and find the region where all its eigenvalues are positive. Note that for g > 1/12 we are outside the radius of convergence of perturbation theory, so the bootstrap approach clearly does much better than a naive perturbative calculation of the same order ∼ 2d.
From figure 1, it is clear that the bootstrap approach converges rapidly to the exact solution. Furthermore, once bounds on $t_2$ are known, one can easily calculate bounds on any correlator $t_k$ by using the loop equations; for example, $t_4 = (1 - t_2)/g$. It is interesting to "look under the hood" of the approach. One can do this by looking at the constraints coming from, e.g., single correlators. It follows from the loop equations that a correlator $t_{2k}$ will be a polynomial in $t_2$. The polynomial will typically have many zeros on the real axis, increasing in number with the degree of the polynomial. The location and number of zeros will depend on g. This is displayed in figure 2. If one considers the constraints from multiple correlators, the overall allowed region will be the intersection of the individual regions. In general, if we plot the constraints coming from positivity of even correlators up to a certain fixed degree, we will always get a larger region than if we were to use the full positivity of $\mathcal{M}$. In practice, the difference is substantial: the region allowed by positivity of the full inner product matrix $\mathcal{M}$ converges much faster.
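The whole procedure for the quartic model fits in a few lines. A minimal sketch (our own implementation, not the code used for the figures; `allowed` tests positivity of the $(d+1) \times (d+1)$ inner product matrix built from moments up to degree 2d):

```python
import numpy as np

def moments(t2, g, kmax):
    """Even moments of V = M^2/2 + (g/4)M^4 from the loop equations:
    t_{k+3} = (sum_{l<k} t_l t_{k-1-l} - t_{k+1}) / g, odd moments zero."""
    t = {0: 1.0, 1: 0.0, 2: t2, 3: 0.0}
    for k in range(1, kmax - 2):
        s = sum(t[l] * t[k - 1 - l] for l in range(k))
        t[k + 3] = (s - t[k + 1]) / g
    return [t[k] for k in range(kmax + 1)]

def allowed(t2, g, d, tol=1e-7):
    """Is the inner product matrix M_jk = t_{j+k} positive semi-definite?"""
    t = moments(t2, g, 2 * d)
    H = np.array([[t[j + k] for k in range(d + 1)] for j in range(d + 1)])
    return np.linalg.eigvalsh(H)[0] >= -tol

g, d = 1.0, 6
window = [x for x in np.arange(0.0, 1.0, 2e-4) if allowed(x, g, d, tol=1e-4)]
if window:
    print(f"allowed window at degree {2*d}: [{min(window):.4f}, {max(window):.4f}]")
else:
    print("no allowed points at this resolution")
```

For g = 1 the scan typically narrows down to a small window around the exact planar value $t_2 \approx 0.5162$ (as obtained from the standard one-cut solution), while values of $t_2$ far from the solution are excluded by low-order minors such as $t_6 \ge 0$ and $t_4 \ge t_2^2$.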
One might find it surprising that the bootstrap method converges at all. Why is positivity so strong that only the correct solution is allowed, as opposed to, e.g., some finite island? Actually, convergence of this method would be naturally explained if the exact solution has a null vector, e.g., if we can find some polynomial $\varphi$ such that $\langle \operatorname{tr} \varphi^\dagger \varphi \rangle = 0$. If such a vector existed, then a small perturbation of the matrix $\mathcal{M}$ could easily violate $\mathcal{M} \succeq 0$. Geometrically, the constraint $\mathcal{M} \succeq 0$ means that the allowed region is a cone; a (nearly) null vector would mean that the exact solution lies (nearly) on the boundary of the cone.
This criterion might seem exotic, but in fact it is satisfied by all models where at least one of the matrices in the ensemble has an eigenvalue distribution which has support on finite interval(s). Then if we are allowed to consider polynomials in λ with large degree, we can approximate a function which is zero on the support of the eigenvalue distribution but non-zero elsewhere. Furthermore, as we increase the degree of the polynomial, we expect to be able to better and better approximate such a function. This would then naturally explain the convergence of the method.
We can test this explanation by simply plotting some eigenvectors of $\mathcal{M}$ in the exact solution which have small eigenvalues. This is done in figure 3. Notice that this explanation also predicts (correctly) that convergence will be much improved when we use positivity of the full matrix $\mathcal{M}$ as opposed to just positivity of even correlators $t_{2k}$, since the tightest constraints come not from monomials but from the special polynomials which nearly vanish on the support of ρ(λ). In the single matrix model, the eigenvalue density is supported on some finite interval(s). However, it seems reasonable that the method will perform well even when this condition is not exactly true. For example, if the eigenvalue distribution of some matrix M decays faster than any power of λ outside some interval, there should be many high degree polynomials in M which have nearly zero norm. In a multi-matrix model, M could be any composite matrix, built out of powers of the matrices that are integrated over. If the action S is a polynomial in $A_1, \cdots, A_k$ that is bounded from below, it seems likely that this condition will hold. For some special potentials, it might be possible to argue that the eigenvalues of a matrix continue to have bounded support; for example, consider a potential like $V = \frac{1}{2}A^2 + \frac{1}{2}B^2 + g(A+B)^4$. For g = 0 the eigenvalues of A are distributed like a semi-circle and hence bounded. As we turn on g > 0, the interaction should provide an additional confining force for A, so we expect the eigenvalues of A to remain bounded.

Unbounded potentials and the tip of the peninsula
An interesting test of our method is to go to negative values of the coupling −1/12 < g < 0. One motivation is that the limit g → −1/12 is the physically interesting regime to make contact with string theory, since the typical number of interaction vertices (interpreted as the area of the planar diagram) is becoming large.
Another motivation is that in this regime, the matrix ensemble is not well-defined at finite N since the potential is unbounded from below, so a Monte Carlo simulation would be problematic (or at least subtle). This subtlety does not arise in the bootstrap approach, which works directly at large N. Nevertheless, we see interesting behavior in the constraints for negative values of g. Applying the general method discussed above, the allowed region converges rapidly to the exact solution: using correlators up to degree 20, the width of the allowed region is 1% in $t_2$ in the negative coupling region.
To get a clear picture of what is going on, it is again instructive to consider constraints coming from positivity of single even-degree correlators. This is shown in figure 4. An interesting feature is that the constrained region looks like a "peninsula." Beyond the tip of the peninsula, no value of $t_2$ is allowed. This means that such models are inconsistent with positivity. In fact, if we look at the exact solution, beyond the critical value g = −1/12 the correlators do not exist; formally, the value of $t_2$ becomes complex. Note that the tip of the peninsula occurs where there is a double zero: if we are computing the constraints from $\mathcal{M}$, the tip occurs when the smallest eigenvalue $m_1$ of $\mathcal{M}$ satisfies $m_1(t_2) = \partial m_1 / \partial t_2 = 0$. One might wonder why the exact solution in figure 4 is quite close to the bottom of the peninsula. For the exact solution to approach the lower bound, we must be able to neglect $t_k$ for large k in the loop equations. This in turn is equivalent to neglecting high powers of g, which is equivalent to truncating perturbation theory at a finite order. Since we are within the radius of convergence of perturbation theory, this is not too surprising.
A somewhat different perspective on the g < 0 computation is the following. One might forget about how we derived the loop equations, and simply view them as a set of rules for computing correlation functions of matrices. These rules are seemingly well-defined for any value of g. However, not all possible rules for computing correlation functions are sensible, as some will lead to violations of the positivity requirement (3.9). Here we have demonstrated that the rules are not sensible beyond a critical value of g * . This is similar to the CFT bootstrap philosophy: not all possible sets of dimensions and OPE coefficients are sensible.
An interesting question is the behavior of the free energy near the critical point. The critical exponent γ, defined by $F_{\text{sing}} \sim (g_c - g)^{2-\gamma}$, can be compared with continuum methods (a string worldsheet calculation). To compute the free energy $F(g) = \frac{1}{N^2} \log Z$, one can integrate $t_4$:
$$\partial_g F = -\frac{1}{4}\, t_4. \tag{4.14}$$
In this way, we can (in principle) extract the critical exponent γ from the bootstrap. Note that γ is defined by the leading non-integer exponent in $(g_c - g)$. So it is convenient to compute derivatives of F(g) and look for the smallest power p such that $\partial_g^{p+1} F \propto \partial_g^p t_4$ diverges. Derivatives of F are related to connected components of multi-trace correlators; in practice, it may be easier to compute these than to estimate derivatives of $t_k$. A direct computation of the connected components involves going to higher order in 1/N, see Appendix C.
In general, the critical surface in the space of the couplings where the model ceases to be well-defined is the first step in identifying the infrared/continuum theory. In the single matrix model, and for very special multi-matrix models, one can completely characterize the continuum theory as a 2D minimal model coupled to Liouville gravity. By considering more general multi-matrix models, one might hope to extract more general string worldsheet theories.

Other single-matrix models
In addition to the quartic model, we also considered the cubic model, $V(M) = \frac{1}{2}M^2 + \frac{g}{3}M^3$. No modification of the above method is needed to solve this model, except that the search space is the correlator $t_1$. Unlike the quartic model, the cubic model does not have a strong coupling region, since the model only makes sense on a finite interval in parameter space. [Figure caption: In green, the 1-parameter family of exact solutions; these correspond to 2-cut solutions, and the boundary of the green curve corresponds to a completely asymmetric 1-cut solution. The green curve is very close to (but not exactly coincident with) the boundary of the allowed region. By going to higher degree d = 35, we find convergence to the green line to an accuracy of ∼ 0.01%. The a priori constraint $t_2 \ge t_1^2$ is also indicated.]
We have so far considered a case where there is a unique classical solution in the large N limit. In general, however, there could be a family of solutions. For a single matrix model, if the potential V has multiple minima, there are solutions where a fraction of the eigenvalues sit near each minimum. (There is no tunneling of the eigenvalues at large N.) We will consider the simplest possibility, where $V(M) = -\frac{1}{2}M^2 + \frac{g}{4}M^4$. We can read off the loop equations from (2.3), or we can derive them diagrammatically using the fact that the free propagator comes with a negative sign.
In this model, adjusting the filling fraction is equivalent to turning on $\langle \operatorname{tr} M \rangle = t_1$. So our search space is the two-dimensional space $(t_1, t_2)$, subject to $t_2 \ge t_1^2$. A nonzero expectation value $t_1$ means that we are considering a case where the $\mathbb{Z}_2$ symmetry M → −M is spontaneously broken. So we expect the method to converge to a curve in the space of correlators, as opposed to a point. This is illustrated in figure 6. By increasing the degree of the correlators considered, we find good numerical evidence of convergence to a one-dimensional curve.
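The same machinery extends directly to the two-parameter search space. A sketch (implementation ours, assuming the conventions above): for $V = -\frac{1}{2}M^2 + \frac{g}{4}M^4$ the loop equations read $\sum_{l=0}^{k-1} t_l t_{k-1-l} = -t_{k+1} + g\, t_{k+3}$ for all $k \ge 0$, with the odd moments kept since $t_1$ need not vanish:

```python
import numpy as np

def dw_moments(t1, t2, g, kmax):
    """Moments of the double-well V = -M^2/2 + (g/4)M^4 from the loop equations:
    t_{k+3} = (sum_{l<k} t_l t_{k-1-l} + t_{k+1}) / g  for k = 0, 1, 2, ...
    The search space is (t1, t2); e.g. k = 0 gives t_3 = t1/g."""
    t = {0: 1.0, 1: t1, 2: t2}
    for k in range(kmax - 2):
        s = sum(t[l] * t[k - 1 - l] for l in range(k))
        t[k + 3] = (s + t[k + 1]) / g
    return [t[k] for k in range(kmax + 1)]

def dw_allowed(t1, t2, g, d, tol=1e-6):
    """Positivity of the inner product matrix M_jk = t_{j+k}."""
    t = dw_moments(t1, t2, g, 2 * d)
    H = np.array([[t[j + k] for k in range(d + 1)] for j in range(d + 1)])
    return np.linalg.eigvalsh(H)[0] >= -tol

# The Z2 symmetry M -> -M flips the sign of all odd moments, which conjugates
# the inner product matrix by diag(+1, -1, +1, ...); hence the allowed region
# in the (t1, t2) plane is exactly symmetric under t1 -> -t1.
print(dw_allowed(0.3, 0.8, 1.0, 5), dw_allowed(-0.3, 0.8, 1.0, 5))
```

Scanning this test over a grid in $(t_1, t_2)$ and recording where it succeeds reproduces the kind of two-dimensional allowed region shown in figure 6.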
One could also consider matrix integrals with a potential involving double trace or higher interactions, with the coupling constants appropriately scaled with N so as to maintain the 't Hooft planar limit. We expect the bootstrap approach to work well in this case as well.

Bootstrapping multi-matrix models
We now graduate from single matrices to multi-matrices. In the single matrix case, all polynomial potentials are solvable; this is not true for the vast majority of multi-matrix potentials. A notable exception is the Ising matrix model and its variants, which we discuss in Appendix D. In the Ising model, the only interaction between matrices is the quadratic interaction $\operatorname{tr} AB$. The next simplest case is to add a cubic interaction between the two matrices. This model has the same $\mathbb{Z}_2 \times \mathbb{Z}_2$ symmetry as the Ising model: one $\mathbb{Z}_2$ is generated by A ↔ B; the other is generated by $A \to A^T$, $B \to B^T$. The latter guarantees that all correlators $\langle \operatorname{tr} A^j B^k A^l B^m \cdots \rangle$ are real.

To our knowledge, the above model is not solvable for a generic potential W. For a 2-matrix model, an interaction of the form $A^n B^m$ can be reduced to an integral over 2 eigenvalue densities using the Itzykson-Zuber formula [15], but this trick does not extend to an interaction of the form $h(A^n B^m + B^m A^n)$. So this model provides a proof-of-concept that the bootstrap gives highly non-trivial constraints on a model that would be difficult or impossible to analyze analytically. An exception is when W is a cubic polynomial, in which case the model is solvable. We discuss this in Appendix D.1; we found it was nonetheless a useful exercise to solve the cubic potential using the bootstrap approach, since the dimension of the search space is quite small, dim S = 2.

To demonstrate that the method works on a model that is not believed to be solvable, we will take the above interaction with $W(A) = \frac{1}{4}A^4$ and h = 1. This model has a higher dimensional search space; for correlators up to degree 10, we found that dim S = 8. To obtain constraints, we use the Matlab package fminsdp, which tests for matrix positivity by attempting a Cholesky decomposition of the inner product matrix $\mathcal{M} = LL^\dagger$, where L is a lower triangular matrix with positive real entries on the diagonal.
One can find the allowed region in the search space by minimizing a loss function L subject to the positive semi-definite constraint on $\mathcal{M}$. To derive constraints on a subspace of the search space, e.g., $(t_1, t_2)$, one can choose $L = (\cos\theta)\, t_1 + (\sin\theta)\, t_2$ and sweep through θ. The resulting output will be a curve in the 8-dimensional search space, parameterized by θ; projecting it onto the $(t_1, t_2)$ plane gives the allowed region.
In practice, fminsdp works well if a feasible initial point is provided. One can obtain such a point by simply performing gradient ascent on the minimum eigenvalue of M and stopping once the eigenvalue is positive. We plot the resulting bounds in figure 8. One can estimate the numerical uncertainty of the contours by varying the initial starting point and comparing the resulting contours; we found stability to within a few percent. We organized the inner product matrix by increasing degree of the correlators, in lexicographic order. By considering larger and larger matrices, we found some mild evidence of convergence to a point in correlator space; see figure 8, where constraints up to k = 35 are shown. By increasing the size of the constraint matrix to k = 45, we checked that the points on the boundary are ruled out. There are clearly many more questions that could be asked about this particular model. For example, can we characterize the critical points of this model for a more general potential W(A) and arbitrary values of h? We hope to return to these questions in future work.
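The feasibility preprocessing step can be sketched in a few lines on a toy 2×2 inner product matrix (the step size, numerical-gradient scheme, and tolerances are arbitrary choices of ours, not the paper's):

```python
import numpy as np

def min_eig(t):
    """Smallest eigenvalue of a toy 2x2 inner product matrix
    [[1, t1], [t1, t2]], a stand-in for the bootstrap matrix M."""
    return np.linalg.eigvalsh(np.array([[1.0, t[0]], [t[0], t[1]]]))[0]

def find_feasible(t, step=0.05, eps=1e-6, tol=1e-3, max_iter=1000):
    """Gradient ascent on the minimum eigenvalue of M; stop as soon as
    the eigenvalue is safely positive.  Gradients are taken numerically."""
    t = np.asarray(t, dtype=float)
    for _ in range(max_iter):
        if min_eig(t) >= tol:
            return t          # feasible starting point found
        grad = np.array([(min_eig(t + eps * e) - min_eig(t - eps * e)) / (2 * eps)
                         for e in np.eye(len(t))])
        t = t + step * grad
    raise RuntimeError("no feasible point found")

t0 = find_feasible([0.9, 0.2])    # infeasible start: 0.2 < 0.9^2
assert min_eig(t0) > 0
```

The returned point is then handed to the constrained optimizer as its starting guess.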

Discussion
In this paper, we proposed a new method to solve multi-matrix models. We showed that the method works extremely well on integrable models and explained why it works in this case. We focused on simple models in part to gain confidence that the bootstrap method works, since the exact solutions of these models are known. We also applied the method to a 2-matrix model that is not known to be solvable and extracted relatively tight constraints.
One feature of the matrix bootstrap is that it is computationally efficient. Most of the computations in this paper could be done with only a few seconds of CPU time. The exception is the 2-matrix model considered in section 5, where a higher dimensional optimization problem had to be solved. In all of our plots, we saw evidence of convergence just by considering relatively small inner product matrices with just a few dozen rows and columns. No doubt with more computational resources (and more efficient algorithms), we could obtain extremely strong constraints from higher correlators; the main point is that even with extremely modest computational resources one can derive constraints that would be hard to obtain by any other method.
Besides the obvious goal of solving more multi-matrix models involving higher degree polynomials and/or more matrices, there are a variety of future directions that should be addressed:

1. As we have already discussed, it would be exciting to find new critical points using this approach. In the past, matrix descriptions of "minimal string theories" were found, e.g., minimal model CFTs coupled to Liouville gravity. It would be interesting to find alternative matrix descriptions with the same continuum description, or even better, completely new continuum models. The usual conformal bootstrap can be used to (re)-discover CFTs; perhaps the matrix bootstrap can be used to find (in general non-unitary) 2D CFTs coupled to gravity!

2. Can this method be used to solve models with fermions? Some supersymmetric matrix models are of interest, for example, the IKKT model [16]. If the theory is quadratic in the fermions, one can integrate them out and obtain a determinant. If the determinant is not a positive function of the bosonic fields, we cannot run the same arguments based on positive measures. In a similar vein, we could ask whether this method could be extended to solve models with complex couplings.
3. We have focused on zero-dimensional matrix models. One could wonder whether this method could be used to solve higher-dimensional models. If we go up just one dimension (Euclidean matrix quantum mechanics), we could ask whether, e.g., the c = 1 matrix model could be solved by this method (see [17] for a review). If we discretize Euclidean time, we are left with a model that is a matrix chain. We believe that such a model could likely be solved by our method; perhaps it is possible to modify the approach so that it works directly in the continuum.
4. Does this method work for other large N theories, e.g., tensor models [18]?

5. In appendix C we discuss the possibility of computing 1/N corrections using a similar approach. Does this work in practice? Is it possible to do better, e.g., to go to the double scaling limit?
A dream would be that some sort of bootstrap method could be used to numerically solve any matrix theory, such as large N QCD, in the 't Hooft limit [11]. This seems like a far-fetched dream, but one worth having nonetheless.
Acknowledgments

I thank Maldacena for many stimulating conversations on matrices and beyond and for suggestions on the manuscript. I am supported in part by an NDSEG fellowship.

A Review of the single matrix model
Here we review the solution of the loop equations in the 1-matrix case. (See [19] for a modern review). The strategy is to exchange the infinite number of real variables that appear in the loop equations for an unknown function.
It is useful to define the resolvent R and a closely related function P:
$$R(x) = \left\langle \frac{1}{N}\,\text{tr}\,\frac{1}{x - M} \right\rangle, \qquad P(x) = \left\langle \frac{1}{N}\,\text{tr}\,\frac{V'(x) - V'(M)}{x - M} \right\rangle.$$
P must be an analytic function (in fact a polynomial), since the potential poles at the eigenvalues of M are cancelled by the zeros in the numerator. Furthermore, at large x, $P \sim V'(x)/x$, so if V is a degree d polynomial, P must have degree d − 2. In terms of these functions, the loop equation (A.17) becomes a simple quadratic equation,
$$R(x)^2 = V'(x)\,R(x) - P(x),$$
the solution of which is
$$R(x) = \tfrac{1}{2}\left( V'(x) - \sqrt{V'(x)^2 - 4P(x)} \right).$$
We may write the term in the square root as $(V')^2 - 4P = M^2 \sigma$, where σ only has simple roots and $M^2$ collects the roots with even multiplicity (here M(x) is a polynomial, not to be confused with the matrix M).
So far, we have found a solution to the loop equations involving a polynomial of degree d − 2, i.e., d − 1 unknown coefficients. We have one more condition coming from R(x) ∼ 1/x at large x, so we are left with d − 2 unknowns, which agrees with our estimate of the dimensionality of the search space. Now the branch cut of the resolvent famously determines the eigenvalue density ρ(λ), since
$$\rho(\lambda) = -\frac{1}{\pi} \lim_{\epsilon \to 0^+} \text{Im}\, R(\lambda + i\epsilon).$$

A.1 Single cut
In the simplest case, we assume that all the eigenvalues live on a single interval $(a_1, a_2)$. This motivates the following ansatz for R:
$$R(\lambda) = \tfrac{1}{2} V'(\lambda) + P(\lambda)\sqrt{(\lambda - a_1)(\lambda - a_2)},$$
where P is some polynomial. Now since ρ integrates to 1, we must have $R(\lambda) = \frac{1}{\lambda} + \cdots$ in a large λ expansion. This seemingly trivial condition determines the function P.
Let us consider the case $V = \frac{1}{2}\lambda^2 + \frac{g}{4}\lambda^4$, following [4]. We have
$$R = \tfrac{1}{2}\left(\lambda + g\lambda^3\right) + P(\lambda)\sqrt{\lambda^2 - a^2},$$
where we assume that the eigenvalue distribution is symmetric, $a_2 = -a_1 = a$. If P contained terms of higher order than quadratic in λ, there would be no way for R to have the right asymptotics. So P must be quadratic in λ, and in fact there are enough conditions to determine
$$P(\lambda) = -\frac{g}{2}\lambda^2 - \frac{g a^2}{4} - \frac{1}{2},$$
while the location of the branch cut satisfies $a^2(4 + 3 g a^2) = 16$. Now the discontinuity in R across the branch cut gives
$$\rho(\lambda) = \frac{1}{\pi}\left( \frac{1}{2} + \frac{g a^2}{4} + \frac{g}{2}\lambda^2 \right)\sqrt{a^2 - \lambda^2}.$$
Note that there are two solutions, $a^2 = \frac{2}{3g}\left(-1 \pm \sqrt{1 + 12g}\right)$. For g > 0, we must choose the + sign so that $a^2$ is positive. For g < 0, both solutions can have $a^2 > 0$; however, requiring a positive eigenvalue density forces the + sign. Notice that the exact solution has a critical point at $g_* = -1/12$: beyond this point $a^2$ becomes complex.
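The closed-form solution is easy to check numerically. A short Python sketch, assuming the endpoint equation $a^2(4 + 3ga^2) = 16$ and the density obtained from the discontinuity of the resolvent, verifies that the density is normalized and non-negative:

```python
import numpy as np
from scipy.integrate import quad

def endpoint_sq(g):
    """Solve a^2 (4 + 3 g a^2) = 16 on the branch that connects
    continuously to the Gaussian value a^2 = 4 as g -> 0."""
    return (2.0 / (3.0 * g)) * (np.sqrt(1.0 + 12.0 * g) - 1.0)

def rho(lam, g, a2):
    """Eigenvalue density from the discontinuity of the resolvent:
    rho = (1/pi) (1/2 + g a^2/4 + g lam^2/2) sqrt(a^2 - lam^2)."""
    return (0.5 + 0.25 * g * a2 + 0.5 * g * lam**2) \
        * np.sqrt(np.maximum(a2 - lam**2, 0.0)) / np.pi

g = 1.0
a2 = endpoint_sq(g)
a = np.sqrt(a2)

norm, _ = quad(lambda x: rho(x, g, a2), -a, a)
assert abs(norm - 1.0) < 1e-6                 # density is normalized
assert all(rho(x, g, a2) >= 0 for x in np.linspace(-a, a, 101))
```

Setting g = 0 recovers the Wigner semicircle with $a^2 = 4$, which is a useful consistency check on the conventions.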

A.2 Multi-cut solutions
Now we consider the inverted potential $V = -\frac{1}{2}\lambda^2 + \frac{g}{4}\lambda^4$. We first search for a single-cut solution, taking the same ansatz as before with P quadratic and determined by the large-λ asymptotics. We also get 2 conditions on $a_1$ and $a_2$:
$$(a_1 + a_2)\left(5 g a_1^2 - 2 g a_1 a_2 + 5 g a_2^2 - 8\right) = 0, \qquad (a_1 - a_2)^2\left[3g\left(5 a_1^2 + 6 a_1 a_2 + 5 a_2^2\right) - 16\right] = 256.$$
There is a positive value $g_* = 1/15$ at which symmetry is restored, $a_1 = -a_2$. Below this value of g there is a 2-cut solution with support on $(a_1, a_2)$ and $(-a_2, -a_1)$. The additional undetermined parameter in this phase can be interpreted as the filling fraction f between the two minima. Here it is important to exclude the solutions where the eigenvalue density is not positive. We can also see that for g > 1/4, there is a single-cut symmetric solution.
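As a sanity check on the endpoint conditions (transcribed into the sketch below), one can verify numerically that the symmetric configuration $a_1 = -a_2$ with $a^2 = \frac{2}{3g}\left(1 + \sqrt{1 + 12g}\right)$ solves them exactly:

```python
import numpy as np

def conditions(a1, a2, g):
    """Endpoint conditions for the one-cut ansatz with the inverted
    potential V = -lambda^2/2 + g lambda^4/4:
      c1 = (a1+a2)(5g a1^2 - 2g a1 a2 + 5g a2^2 - 8)
      c2 = (a1-a2)^2 [3g(5a1^2 + 6a1a2 + 5a2^2) - 16] - 256
    Both must vanish on a solution."""
    c1 = (a1 + a2) * (5*g*a1**2 - 2*g*a1*a2 + 5*g*a2**2 - 8)
    c2 = (a1 - a2)**2 * (3*g*(5*a1**2 + 6*a1*a2 + 5*a2**2) - 16) - 256
    return c1, c2

# Symmetric cut: a1 = -a, a2 = +a with a^2 = (2/3g)(1 + sqrt(1+12g)).
g = 1.0
a = np.sqrt((2.0 / (3.0 * g)) * (1.0 + np.sqrt(1.0 + 12.0 * g)))
c1, c2 = conditions(-a, a, g)
assert abs(c1) < 1e-9 and abs(c2) < 1e-9
```

In the symmetric limit the second condition reduces to $a^2(3 g a^2 - 4) = 16$, the inverted-potential analogue of the endpoint equation found in A.1.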
We also search for a 2-cut solution. This means we take the ansatz
$$R(\lambda) = \tfrac{1}{2}V'(\lambda) + P(\lambda)\sqrt{(\lambda - a_1)(\lambda - a_2)(\lambda - a_3)(\lambda - a_4)},$$
where P(λ) is a degree 1 polynomial. The constraint R(λ) ∼ 1/λ then determines P(λ) uniquely in terms of the $a_i$. Furthermore, we get 3 constraints on the $a_i$, which means we have a 1-parameter family of solutions. The simplest solution is the symmetric case, where $a_4 = -a_1$, $a_3 = -a_2$. This was analyzed in [20]. The solution exists when g < 1/4 and has very simple closed-form expressions for the correlators. For the general 2-cut case, it is simple to numerically solve for the allowed endpoints $a_i$ using the constraints coming from R ∼ 1/λ in (A.27). This gives the green curve displayed in figure 6. As we move along the curve, we are changing the filling fractions of the left and right cuts. The endpoints of this curve are given by the 1-cut asymmetric solution.
We could also consider 3-cut solutions. This would involve putting eigenvalues right at the maximum of the potential. To see why this fails, think of the eigenvalue density as a charge density and V(x) as an electrostatic potential: the maximum of the potential is only an equilibrium for negative charges, so a solution with a cut near the maximum would require a negative eigenvalue density, which is forbidden. This agrees with the results of the bootstrap, which converge to a 1-parameter family of solutions.
To summarize, for g > 1/4 there is a single-cut symmetric solution. For 1/4 > g > 1/15 there is a 2-cut solution. For g < 1/15 there is a one-parameter family of solutions that interpolates between an asymmetric single-cut solution and a 2-cut symmetric solution.

B The bootstrap approach for computing determinants or vectors
For most of the paper, we discussed simple single-trace operators. A more complicated observable is a function of the determinant. Here we outline a strategy for computing these in the bootstrap approach. The idea is to rewrite the inverse determinant as a Gaussian integral over an auxiliary complex vector v,
$$\frac{1}{\det(z - M)} = \int dv\, d\bar{v}\; e^{-\bar{v}(z - M)v},$$
where we have absorbed some irrelevant factors of 2π into the measure. By adding n vectors with different masses set by $z_i$, we could compute $\left\langle \prod_{i=1}^n \det(z_i - M)^{-1} \right\rangle$. In general, the vectors in a matrix theory will be related to open strings: in the notation of 't Hooft, matrices are represented by double lines whereas vectors are single lines, and can thus be interpreted as boundaries of the planar diagrams (for a review, see [21]). In the Liouville context, determinants in the matrix model are related to FZZT boundaries. Now note the identity
$$\partial_z \log \det(z - M) = \text{tr}\, \frac{1}{z - M}.$$
This means that if we can compute correlation functions of v as a function of the parameter z, we can in principle reconstruct the partition function and therefore extract expectation values of determinants. Of course, other correlation functions involving vectors may be interesting in their own right; for example, in large N QCD, such correlation functions probe properties of the dual string.
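The identity underlying this reconstruction, $\partial_z \log\det(z - M) = \text{tr}(z - M)^{-1}$, can be checked numerically. The sketch below uses a random Hermitian matrix and a finite-difference derivative (the matrix size, seed, and z are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
X = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
M = (X + X.conj().T) / 2                 # random Hermitian matrix

z, h = 20.0, 1e-6                        # z chosen well outside the spectrum
logdet = lambda z: np.linalg.slogdet(z * np.eye(N) - M)[1]

# d/dz log det(z - M) = tr (z - M)^{-1}
lhs = (logdet(z + h) - logdet(z - h)) / (2 * h)
rhs = np.trace(np.linalg.inv(z * np.eye(N) - M)).real
assert abs(lhs - rhs) < 1e-6
```

Since the right-hand side is N times the resolvent, knowing determinant correlators as a function of z is equivalent to knowing (generalized) resolvents, which is the sense in which vector correlators let one reconstruct the partition function.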

C The bootstrap approach for 1/N corrections
In this section, we sketch how the bootstrap approach could be extended to include 1/N corrections. We will keep the discussion fairly theoretical; a practical discussion of how to implement the constraints and their effectiveness is left to future work. At large N, we defined a matrix M whose elements were single-trace expectation values. We will now adjust the notation slightly so that the entries of M are the operators tr O(A, B) without expectation values. In other words, M will now denote a matrix of random variables instead of a matrix of expectation values. With this notation, the large N constraints considered previously should be denoted $\langle M \rangle \succeq 0$. When we consider 1/N corrections, we should enforce that $M \succeq 0$ not just on expectation, but including fluctuations: the probability that M has any negative eigenvalues should be exactly zero. One way of stating this in terms of correlation functions of M is that the generalized resolvent $\left\langle \text{tr} \prod_{i=1}^n (x_i - M)^{-1} \right\rangle$ is analytic in the region $x_i < 0$, for all n. Such expressions might look familiar from single-matrix integrals, but one should expend mental effort to keep them separate: note that tr here means a trace over correlator space, not the usual trace over N × N matrices. A somewhat more practical statement is that the determinants of the upper-left sub-matrices of M are all positive random variables. Now we can ask the following question: given a sequence $m_k$, k ∈ {1, 2, · · · }, what are the necessary and sufficient conditions for the $m_k$ to be the moments of some positive random variable? This is known as the Stieltjes moment problem; we will now outline its solution.
Define the two Hankel matrices
$$(H_n)_{ij} = m_{i+j}, \qquad (H'_n)_{ij} = m_{i+j+1}, \qquad 0 \le i, j \le n,$$
and demand that $H_n \succeq 0$ and $H'_n \succeq 0$ for all n. Necessity of these conditions follows from the fact that if Y is any positive random variable and P is an arbitrary polynomial (with complex coefficients), then
$$\left\langle |P(Y)|^2 \right\rangle \ge 0, \qquad \left\langle Y\, |P(Y)|^2 \right\rangle \ge 0.$$
Proving that these conditions are not just necessary but sufficient (i.e., that any sequence $m_k$ which satisfies these constraints corresponds to some measure on the positive reals) is more subtle; we refer the reader to [22] for an exposition.
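A concrete check of the Stieltjes conditions in Python, using the moments $m_k = k!$ of the exponential distribution (our choice of example) as a known-positive random variable:

```python
import numpy as np
from math import factorial

# Moments m_k = k! of the exponential distribution, a positive random variable.
m = [factorial(k) for k in range(9)]

n = 4
H  = np.array([[m[i + j]     for j in range(n)] for i in range(n)])  # <|P(Y)|^2>
Hp = np.array([[m[i + j + 1] for j in range(n)] for i in range(n)])  # <Y |P(Y)|^2>

# Stieltjes conditions: both Hankel matrices are positive semi-definite.
assert np.linalg.eigvalsh(H).min()  > -1e-9
assert np.linalg.eigvalsh(Hp).min() > -1e-9
```

Flipping the sign of a single odd moment breaks the second Hankel condition, which is how a non-positive "random variable" would be detected.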
In our problem, we actually have a list of positive random variables $d_n$, so we are interested in the multi-variate generalization of this problem. In addition, we will in general not be able to compute all the moments of the $d_n$, but only a finite list. This is known as the truncated moment problem; see [13].
We have so far focused on just the inequalities which follow from positivity of M. However, there are also many inequalities that follow from the usual moment inequalities for a collection of complex random variables. If we have a list of complex random variables $z_1, \cdots, z_n$, then
$$\left\langle \overline{P(z_1, \cdots, z_n)}\, P(z_1, \cdots, z_n) \right\rangle \ge 0 \qquad \text{(C.35)}$$
for any polynomial P. In this case, each $z_i$ could be any single-trace operator, e.g., some matrix element of M. Now if we think of the monomials as basis elements of some vector space, this becomes positivity of an even bigger inner product matrix M. For the single matrix model, we saw that at large N, the basic requirement was that the average eigenvalue density was positive. Including 1/N corrections, however, means that we are enforcing positivity of the eigenvalue distribution not just on average but including fluctuations ρ + δρ: even off-shell, the eigenvalue distribution must always be positive. For multi-matrix models, we know of no such simple condition.
At leading order in 1/N, multi-trace correlators factorize, which meant that we only needed to consider loop equations of the form (2.5). At higher orders, we need to consider more loop equations to determine the values of multi-trace correlators. In the 1-matrix model, these can be derived from Ward identities with multi-trace insertions,
$$0 = \int dM\, \sum_{i,j} \frac{\partial}{\partial M_{ij}}\left[ (M^k)_{ij}\; \text{tr}\, O_1 \cdots \text{tr}\, O_n\; e^{-N\, \text{tr}\, V(M)} \right].$$
The corresponding generalization to multi-matrices is obvious; we just consider multi-trace insertions $\text{tr}\, O_1\, \text{tr}\, O_2 \cdots \text{tr}\, O_n$.
In the single-cut solutions of the 1-matrix model, the large N eigenvalue density ρ(λ) uniquely determines all 1/N corrections by topological recursion, so the above discussion is moot. However, for multi-matrix models, we do not know if this is the case. Even in the 1-matrix model, when we consider multi-cut solutions, we must impose additional requirements on the 1/N corrections in order to determine them uniquely; see section 4.3 of [19] for details. Said differently, does the dimension of the search space increase when we include 1/N corrections, and if so, by how much?

D Ising model on a random planar lattice
Here we outline the bootstrap approach to the Ising model on a random lattice. This is a 2-matrix model whose only interaction between the matrices is the quadratic interaction tr AB. For this model, we will assume $Z_2$ symmetry and adopt the notation $t_n = \langle \text{tr}\, A^n \rangle$, $t_{n,m} = \langle \text{tr}\, A^n B^m \rangle$ (with traces normalized by N), together with the boundary conditions $t_0 = 1$, $t_{0,1} = t_1$, and $t_{0,2} = t_2$. By substituting the third of the loop equations into the second, we can solve for all $t_n$, $t_{n,1}$, $t_{n,2}$ in terms of just $t_1$ and $t_{1,1}$. A maximal subset of positivity constraints can be obtained by considering vectors of the form $\varphi \in \text{span}(A^n, B A^m)$ for all integers n, m ≥ 0; the entries of the resulting Gram matrix M then only depend on $t_k$, $t_{k,1}$, $t_{k,2}$. In practice, we found that enforcing positivity just on the single-matrix correlators $t_k$ is sufficient to see convergence to the exact solution; see figure 9. This is expected from our discussion of null vectors, together with the fact that the eigenvalue distribution of a single matrix still has compact support.
For simplicity, we have focused on the Ising model with zero magnetic field, where there is a $Z_2$ symmetry. A simple exercise left to the reader is to generalize the above discussion to the Ising model in a magnetic field, in which case $g_A = g e^{h}$ and $g_B = g e^{-h}$. Another generalization is to a chain of matrices with couplings of the form $A_t A_{t+1}$; this can be thought of as a step towards solving matrix quantum mechanics, where Euclidean time is discrete and the coupling between matrices is a discretization of the kinetic term. (In figure 9, positivity is enforced on the single-matrix correlators $t_k$ for inner product matrices of sizes k = 8, 12, 16.)

D.1 Relation to the cubic interaction
The matrix model defined by (5.16) with $W(A) = \frac{1}{3} g A^3$ is related by field redefinitions to the Ising model in a magnetic field [24]. As we have seen, the loop equations of the Ising model can be solved efficiently. Nevertheless, it is instructive to solve the model without taking advantage of these simplifications, which are somewhat obscured unless one performs the requisite field redefinitions.
We somewhat arbitrarily chose to explore the model on the line g = −h; we checked that studying a nearby line, e.g., g = −0.9h, does not qualitatively change our conclusions. In figure 10, we show the allowed region in parameter space for g = 3/20. For this model, we let the search space S be parameterized by $\langle \text{tr}\, A \rangle$ and $\langle \text{tr}\, A^2 B \rangle$; we could choose essentially any two low-degree correlators and obtain similar results.
We study convergence as we increase the number and type of constraints. For example, we considered the constraints coming from positivity of just a single matrix. We also considered positivity constraints from "slightly mixed" correlators up to a fixed degree; these are correlators of the form $\text{tr}\, A^{k-1} B$, with a single B insertion. The advantage of such correlators is that their number grows quite slowly with the degree, so that only a rather small matrix M needs to be diagonalized. Of course, one still needs to solve the loop equations for correlators which do not enter the matrix, so we cannot get rid of the exponential complexity of the problem. Including constraints from all possible correlators of degree ≤ 10 reduces the area of the allowed region by a factor of ∼ 4 in this example.
We searched for a critical point on the g = −h line. Using all the constraints up to degree 10, we were able to bound the critical point: $g_* = -h_* < 0.185$. This implies that there is also a critical point at $g = -h_* > -0.185$. In figure 11, we show convergence as we approach this critical point. In this figure, we do not vary the constraints that are being checked; instead we change the coupling. This figure should be viewed as the higher dimensional analog of the "peninsula" depicted in figure 4. In general, we can use this method to constrain the allowed region in the 4-dimensional space parameterized by $\langle \text{tr}\, A \rangle$, $\langle \text{tr}\, A^2 B \rangle$, g, h. (Figure 10 caption: on the left, we show constraints coming only from positivity of a single matrix; on the right, we include constraints from some mixed correlators involving both A and B (see text for details). Imposing the full positivity constraints from all possible multi-matrix correlators of degree ≤ 10 gives the tiny black region on the right.)

E Mathematica code for generating loop equations
For the convenience of the reader, a Mathematica code is provided in the source directory of this paper that may be used to generate the large N loop equations for single-trace correlation functions. The code may be easily modified to derive the loop equations of any multi-matrix model with arbitrary polynomial interactions. In Mathematica notation, a string {1, 2, 1, 1, 2} corresponds to a string of matrices ABAAB. We use the notation $e_{\{1,2,1,1,2\}} = \text{tr}\, ABAAB$.

Figure 11: Here we show convergence as we approach (and pass) the critical surface in parameter space. In this plot, the number of constraints K is held fixed, but the coupling is varied. Beyond some value of the coupling $g_e(K) = -h_e(K)$, the allowed region is empty. This gives an upper bound on the critical value of the coupling, $g_* < g_e(K)$. As we increase the number of constraints K, we expect that $g_e(K)$ will converge to $g_*$. For visual purposes, we display constraints from single-matrix correlators of degree up to 8; this gives us the bound $g_* < 0.196$. Using more multi-matrix correlators up to degree 10, we find the bound $g_* < 0.185$.
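For readers who prefer Python, here is a hedged sketch of the same bookkeeping in the simplest (single-matrix) case; the Mathematica code handles general words of several matrices, which this toy does not attempt. The function and variable names are ours.

```python
from functools import lru_cache

def loop_equation(k, vprime):
    """Schematic large N loop equation obtained by inserting (M^k)_{ij}
    into the single-matrix integral:
        sum_j c_j t_{k+j} = sum_{l=0}^{k-1} t_l t_{k-1-l},
    where vprime = {j: c_j} encodes V'(x) = sum_j c_j x^j and
    t_n = <(1/N) tr M^n>.  Returns (lhs moments, rhs factorized pairs)."""
    lhs = {k + j: c for j, c in vprime.items()}
    rhs = [(l, k - 1 - l) for l in range(k)]
    return lhs, rhs

# Gaussian cross-check: for V = M^2/2 (so V'(x) = x) the loop equations
# close recursively and give the Catalan numbers t_{2n} = 1, 1, 2, 5, 14, ...
@lru_cache(maxsize=None)
def t(n):
    if n == 0:
        return 1.0
    if n % 2 == 1:
        return 0.0          # odd moments vanish by symmetry
    lhs, rhs = loop_equation(n - 1, {1: 1.0})
    # lhs is {n: 1.0}, so t_n equals the factorized right-hand side
    return sum(t(l) * t(m) for l, m in rhs)

assert [t(2 * n) for n in range(5)] == [1.0, 1.0, 2.0, 5.0, 14.0]
```

The recursion reproduces the planar moments of the Gaussian matrix model, which is the standard sanity check for any loop-equation generator.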