The Universe from a Single Particle II

We continue to explore, in the context of a toy model, the hypothesis that the interacting universe we see around us could result from single particle (undergraduate) quantum mechanics via a novel spontaneous symmetry breaking (SSB) acting at the level of probability distributions on Hamiltonians (rather than on states as is familiar from both Ginzburg-Landau superconductivity and the Higgs mechanism). In an earlier paper [7] we saw qubit structure emerge spontaneously on $\mathbb{C}^4$ and $\mathbb{C}^8$, and in this work we see $\mathbb{C}^6$ spontaneously decomposing as $\mathbb{C}^2\otimes \mathbb{C}^3$ and very curiously $\mathbb{C}^5$ (and $\mathbb{C}^7$) splitting off one (one or three) directions and then factoring. This evidence provides additional support for the broad hypothesis: Nature will seek out tensor decompositions where none are present. We consider how this finding may form a basis for the origins of interaction and ask if it can be related to established foundational discussions such as string theory.

undifferentiated Gaussian Unitary Ensemble (GUE) breaks to a quite different probability distribution on Hermitian operators which emphasizes interaction between dof (degrees of freedom) corresponding to relatively low spins, e.g. qubits. In this story, unlike inflation [9], the total dimension of the Hilbert space is conserved. In spin language, the breaking is from the physics of a single particle of extremely large spin to a system of interacting particles with small spins.
Using the GUE, we obtain one toy model for this Hamiltonian of the universe, drawn from the Gaussian ensemble $\frac{1}{Z} e^{-\mathrm{const.}\,\operatorname{tr}(H^2)}$, where $\langle A, B\rangle := -\operatorname{tr}(iA * iB)$ is proportional to the ad-invariant Killing form on the Lie algebra su(n), where $n = \dim(\text{Hilbert space})$.
Other toy models are obtained by choosing a functional $f : \{\text{metrics } g_{ij} \text{ on the space } \mathrm{Her}_0(n) \text{ of traceless } n \times n \text{ Hermitian matrices}\} \to \mathbb{R}$, which we think of as a pseudo-energy, and replacing $\operatorname{tr}(H^2)$ with $g_{ij} v^i v^j$ in the formula above, for $g_{ij}$ a local minimum of $f$. Such models are Boltzmann in nature, with the probability of the metric $g_{ij}$ being proportional to $e^{-\Delta/k_b T}$, where $\Delta = f(g_{ij})$ and $T$ is a pseudo-temperature. The choice of a local minimum for $f$ breaks the ad-symmetry of the Killing form.
An alternative possibility, which is numerically more difficult and has not been attempted, is to keep the choice of metric in superposition and exploit interference effects to concentrate that choice near the local minima of $f$. Thus one could associate with $f$ and a real expansion parameter $k$ a normalized metric:

In this paper and [7] we have chosen the simplest device: first, a local minimum of $f$, called $g_{ij}$, usually quite different from the usual $L^2$-metric $-\operatorname{tr}(iA * iB)$, determines a probability distribution on $\mathrm{Her}_0(n)$,

$$P_\theta(H) = \frac{1}{Z} e^{-\langle H, H\rangle_{g_{ij}}}, \qquad \langle H, K\rangle_{g_{ij}} := g_{ij} H^i * iK^j,$$

where $i, j$ run over an $L^2$-orthonormal basis of $\mathrm{Her}_0(n)$.$^3$ This probability distribution $P_\theta(H)$ is then the source of the Hamiltonian $H_0$ of "our universe". The structure of $H_0$ is conditioned by $g_{ij}$, in particular by the geometry of the principal axes of $g_{ij}$ relative to the $L^2$-metric. We study these axes, which, of course, should be operators, in search of a qubit or tensor structure.

Unlike [7], we have not restricted $n$ to be a power of 2 and present extensive numerical studies for $n = 4, 5, 6, 7$, and $8$. In [7] the central definition was kaq, that a metric "knows about qubits"; it is reviewed below. For $n = 6$ we find metrics which instead know about an isomorphism $\mathbb{C}^6 \cong \mathbb{C}^2 \otimes \mathbb{C}^3$, necessitating an expansion of that concept. Even more remarkably, when $n = 5$ and $7$, both primes, we see local minima where the dimension has been "rounded down" to a composite, effectively writing $5 = 2 \times 2 + 1$ and $7 = 2 \times 3 + 1$ (for one local minimum) and $7 = 2 \times 2 + 3$ (for another). We will discuss these cases in full detail.
Our choice of functionals $f$ is essentially the same as in [7]; we considered $f$ = Ricci scalar curvature, and also $f$ = real or imaginary parts of perturbed Gaussian integrals built from the symmetric 2-tensor $g_{ij}$ and the structure constants (a 3-tensor) $c^k_{ij}$ for SU(n). These functionals are also reviewed. Scalar curvature (SC), unlike our other functionals, has no perturbation parameter, so detection of local minima is harder. In this study, no new SC local minima were found beyond those in [7]. But the other functionals yielded many.

$^3$ The space of traceless $n \times n$ Hermitian matrices.
There is a historical irony which if not addressed might confuse the reader. We are using the GUE as a model for a random single particle Hamiltonian; the real and imaginary parts of g ij , i > j, as well as g ii are all i.i.d. Gaussian variables. Thus all transitions are equally likely. However, the GUE was introduced by Wigner [11] to model a totality of interactions among many particles which were too complicated to deconstruct, except on the basis of symmetry. Indeed, while statistical spectral properties, such as the semi-circle law, of random interacting Hamiltonians may quickly approach those of the GUE, the more delicate entropic measures which we will introduce shortly distinguish the GUE from interacting ensembles. So for us, the GUE represents random non-interacting Hamiltonians whereas other metrics we find as the kaq-local-minima are manifestly interacting.
Most of our numerical data on SSB comes from Hilbert spaces of dimension $4 \le n \le 8$, so caution is in order as we extrapolate to the Hilbert space of the early universe which, if finite, might reasonably be taken$^4$ to have dimension $n \approx 2^{2^{100}}$, or larger. In our initial study [7] of $n = 4, 8$ we saw a strong pattern of breaking to qubits. This leads us to speculate that if $n$ is not a power of 2 we would see instead a breaking into qunit tensor factors corresponding to the prime factors of $n$. This line of thought leads to the intriguing possibility that the statistics of prime factors of large random integers would have some residual signature in physical law.$^5$ Indeed, our "discovery" that $6 = 2 \times 3$, discussed below, validates this thought; call it the "prime factor scenario." However, the further discovery that $5 = 2 \times 2 + 1$ (and that $7 = 2 \times 3 + 1 = 2 \times 2 + 3$) suggests a different, equally interesting, scenario. To be concrete, when $n = 5$ we find that there is a fixed decomposition $\mathbb{C}^5 \cong \mathbb{C}^2 \otimes \mathbb{C}^2 \oplus \mathbb{C}^1$ so that each of the 24 principal axes of a locally minimal $g_{ij}$ assumes the form of a tensor product$^6$ on the first $4 \times 4$ block, with apparently random behavior outside that block. That and similar behavior for $n = 7$ suggest that symmetry breaking can be to "almost-kaq" structures in which a few dimensions simply go into a repêchage which is "primordial" in the sense that it does not seem to participate in many-body physics, the physics of the observed world. This scenario might also have implications at low energy, perhaps a tragic one, as overlap of one's wave function with such extraordinary states would not seem salubrious.
As we discuss in the body of this paper, all integers $n$ are well-approximated$^7$ from below by an integer $n_0$ containing only two primes, say two and three, in its prime factorization: $n - n_0 \le \mathrm{const.}\,\log(n)$. So in a Hilbert space $\mathcal{H}$ of dimension about $2^{2^{100}}$ we might expect all but a tiny fraction, about $2^{100}$, of these dimensions to be organized into conventional interacting physics with a repêchage of about $2^{100}$ dimensions, still of the primordial noninteracting character: $\mathcal{H} = \mathcal{H}_{\mathrm{int}} \oplus \mathcal{H}_{\mathrm{rep}}$. While the wave function $\psi(t)$ of the universe evolves unitarily on the entire Hilbert space, deviations from unitarity on its projection to $\mathcal{H}_{\mathrm{int}}$ will typically be undetectably small. However, at some time $t$, the projection of $\psi(t)$ to $\mathcal{H}_{\mathrm{rep}}$, or at least portions of interest to us, could become large, abridging the familiar physical laws. We call this the leaky universe scenario.

$^4$ Derived from [3], larger black holes are estimated to have $\approx 2^{90}$ dof.
$^5$ The Golomb-Dickman constant $\approx 0.624$ is the expected fraction, on a log scale, of the largest prime factor, i.e. one would expect the largest prime factor of a thousand-digit number to have about 624 digits. Might this constant constrain string theory?
$^6$ To high numerical precision.
$^7$ We thank Noam Elkies for this observation. A number is $k$-smooth if it contains only the first $k$ primes in its factorization. A plausible guess for approximation of $n$ by a $k$-smooth number $n_0$ would be $n - n_0 \le \mathrm{const.}\,\log(n)^{-(k-1)}$.
Although our findings are empirical, in retrospect they are not entirely a surprise. It is well observed in many contexts that extremizing natural functionals leads to solutions exhibiting striking internal structures and symmetries. For example, in sphere-packing in dimensions 8 and 24, it has been shown that the unique extremal packings ($E_8$ and Leech, respectively) are, to an extent, independent of the particular choice of energy functional [6]. So the fact that the SSB we observe creates exquisitely precise tensor structures as it destroys full rotational symmetry is not without precedent.

Remark 1.1. As in [7], our inspiration for examining operator-level SSB was to see if the Brown-Susskind "penalty metrics" [5,4] (where norms decrease exponentially with body number) appeared. Our first chance to look for this is at $n = 8$. Indeed we find a local minimum metric with a modest $-0.145$ correlation between norm and a measure of body number we call b. Further work, perhaps at $n = 16$, will be required to determine the significance of this observation. As explained in Section 3.1.3, $n = 16$ is far beyond present methods.
Our paper is organized as follows:
• Section 2: Reviews the notion of kaq and related concepts.
• Section 3: Reviews our choice of functionals and the design of our numerical experiments.
• Section 4: Summarizes the totality of our numerical results, including those which previously appeared in [7].

Review of kaq
Our fundamental object of interest is the metric $g_{ij}$, normalized so that $\det(g_{ij}) = 1$, on the linear space $\mathrm{Her}_0(n)$ of traceless Hermitian $n \times n$ matrices. Multiplying by $i$ identifies $\mathrm{Her}_0(n)$ with su(n), the traceless skew-Hermitian $n \times n$ matrices, the Lie algebra of SU(n). Thus $g_{ij}$ (actually $-(g_{ij}\, iA^i, iB^j) := \langle iA, iB\rangle$) becomes a metric on su(n) and a left-invariant metric on SU(n); this, for example, is in play when we refer to the Ricci scalar curvature as a functional on $\mathrm{Her}_0(n)$. In Section 3 we review all the functionals $f$ considered on $\mathrm{Her}_0(n)$. Numerically, each $f$ provides us with an output metric $g_{ij}$ that is a local minimum of $f$. The question we ask about $g_{ij}$ is whether or not it is "adapted" to some decomposition of $\mathbb{C}^n$ into a tensor product of qubits or, more generally, qunits. We call such adapted metrics kaq for "knows about qubits" or, more generally, "knows about qunits." Below we do some dimension counting (and make additional arguments) to show that kaq metrics constitute a subvariety of roughly the square root of the ambient dimension. Startlingly, kaq metrics show up quite regularly and with high numerical precision at local minima for a variety of functionals $f$. Here is the definition.
First, it is easily proven by induction that if $n = p_1 \cdots p_l$ is a prime factorization then $\mathrm{Her}(n) = \mathrm{Her}(p_1) \otimes \mathrm{Her}(p_2) \otimes \cdots \otimes \mathrm{Her}(p_l)$. Note we have temporarily dropped the traceless condition, and will use the natural inclusion $\mathrm{Her}_0(n) \subseteq \mathrm{Her}(n)$ below to rectify this.

Definition 2.1. A qunit structure on $\mathbb{C}^n$ is an equivalence class of $*$-isomorphisms $J : \mathbb{C}^{p_1} \otimes \cdots \otimes \mathbb{C}^{p_l} \to \mathbb{C}^n$, where two are equivalent if related by the left action on the factors by $U(p_1) \times \cdots \times U(p_l)$. Thus qunit structures are parameterized by $U(p_1) \times \cdots \times U(p_l) \backslash U(n)$. Note that $J$ induces an isomorphism $j : \mathrm{Her}(p_1) \otimes \cdots \otimes \mathrm{Her}(p_l) \to \mathrm{Her}(n)$.

Definition 2.2. A metric $g_{ij}$ on $\mathrm{Her}(n)$ is kaq iff it is not ad-invariant, yet there is an isomorphism $j$ (induced from $J$ above) so that $g_{ij}$ possesses a complete set of $n^2 - 1$ principal axes $\{H_k\}_{1 \le k \le n^2 - 1}$ with

$$H_k = j(H_{1,k} \otimes \cdots \otimes H_{l,k}).$$

In other words, $\mathbb{C}^n$ admits a tensor structure so that the principal axes of $g_{ij}$ (= eigenvectors of $g^j_i$, where the Killing form is used to raise the index) all have compatible tensor structures. Note that $H_k \in \mathrm{Her}_0(n)$, but $H_{s,k} \in \mathrm{Her}(p_s)$.
To establish how rare kaq metrics are, consider the following rough dimension count when $n = 2^N$, $N$ large. To specify $J$, and hence $j$, $4^N - 1$ parameters are required. To specify each $H_{s,k}$ requires 4 parameters for a total of $4N$, but since scalars pass through the tensor factors, $4N$ becomes $3N + 1$ for each value of $k$, of which there are $4^N - 1$. This makes a total of $(4^N - 1) + (3N + 1)(4^N - 1) = (3N + 2)(4^N - 1)$ parameters to determine a kaq metric (not normalizing so that $\det(g_{ij}) = 1$), whereas the space of metrics $g_{ij}$ on su($2^N$) has dimension $\frac{4^N(4^N - 1)}{2}$ (again without the determinant normalized). Up to log factors, the kaq metrics are asymptotically of square root dimension. Of course, since our numerics are for small $N$, we should also investigate $N = 2$, where we find equality between the two counts: $8 \cdot 15 = 8 \cdot 15 = 120$. Clearly we overcounted the degrees of freedom in kaq metrics by treating the principal axes independently. To show that even when $N = 2$ kaq is a proper subvariety, we estimate its local dimension around a generic, normalized metric $g_{ij}$ which is diagonal in the so-called Pauli-word basis $PB_n$ defined below.
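The two counts in the paragraph above can be tabulated directly. The following is a minimal sketch of that arithmetic only; the function names are illustrative and do not come from the paper.

```python
def kaq_param_count(N: int) -> int:
    """Rough parameter count for a kaq metric on su(2^N): (3N + 2)(4^N - 1)."""
    d = 4**N - 1                    # number of principal axes = dim su(2^N)
    params_J = d                    # parameters needed to specify J (hence j)
    params_axes = (3 * N + 1) * d   # 3N + 1 parameters for each principal axis
    return params_J + params_axes


def metric_dim(N: int) -> int:
    """Dimension of the space of unnormalized metrics on su(2^N): 4^N (4^N - 1) / 2."""
    d = 4**N - 1
    return d * (d + 1) // 2         # symmetric d x d matrices


# N = 2 gives 120 for both counts, as stated in the text.
for N in (2, 3, 5, 10):
    print(N, kaq_param_count(N), metric_dim(N))
```

For growing $N$ the first count grows like $N \cdot 4^N$ while the second grows like $16^N / 2$, which is the "square root up to log factors" statement.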

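Definition 2.3 ([7, page 3]). We use the Hermitian Pauli operators

$$X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.$$

A Pauli word $w$ is a tensor product of the operators $I, X, Y, Z$, such as $I \otimes X \otimes Z \otimes I \otimes I$ for $n = 5$. All Pauli words for any $n$ give the Pauli-word basis called $PB_n$. The weight of $w$ is the number of non-$I$ letters, which is two in the given example.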
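As an illustration of Definition 2.3, the sketch below enumerates Pauli words and their weights with NumPy. It assumes that the all-identity word is excluded so that the words span the traceless matrices; the helper names are hypothetical.

```python
import itertools

import numpy as np

PAULI = {
    "I": np.eye(2, dtype=complex),
    "X": np.array([[0, 1], [1, 0]], dtype=complex),
    "Y": np.array([[0, -1j], [1j, 0]], dtype=complex),
    "Z": np.array([[1, 0], [0, -1]], dtype=complex),
}


def pauli_word(word: str) -> np.ndarray:
    """Matrix of a Pauli word such as 'IXZII' (tensor product of its letters)."""
    out = np.eye(1, dtype=complex)
    for letter in word:
        out = np.kron(out, PAULI[letter])
    return out


def weight(word: str) -> int:
    """Number of non-identity letters in the word."""
    return sum(letter != "I" for letter in word)


def pauli_word_basis(num_letters: int) -> list:
    """All words of the given length except the all-identity word (assumption)."""
    words = ("".join(w) for w in itertools.product("IXYZ", repeat=num_letters))
    return [w for w in words if weight(w) > 0]


print(len(pauli_word_basis(2)), weight("IXZII"))
```

With two letters this yields 15 traceless words, matching the dimension of su(4).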
We suspect that 21 is actually the maximum kaq strata dimension in $\mathrm{Her}_0(4)$.

Loss Functions
First, we define the loss functions whose local minima give us the metrics, and then the loss functions that check their kaqness.

Loss functions to find metrics.
We review the perturbed Gaussian integral (inspired by [2]) used to define the functionals in [7]. Let

We recall the definition of the structure constants $c^k_{ij}$ of the Lie algebra,

(2) $[y_i, y_j] = c^k_{ij} y_k$,

and $c_{ijk} = c^{k'}_{ij} g_{k'k}$. The real and imaginary parts of $F_k$ will be of interest:

Eq. (1) is the most natural nontrivial perturbed Gaussian integral from the tensors $g$ and $c$. If instead in Eq. (1) we wrote the more obvious integration over $\mathbb{R}^{4^n - 1}$, replacing $G$ with $g$ and $x$ with $y$ ([7, discussion around Eq. (9)]), the skew-symmetry $c_{[ij]k} = 0$ would kill the cubic term, leaving the Gaussian integral unperturbed. Another functional is derived from the Euclidean version of the above, where $-i$ in the exponent is replaced by $-1$.

The expansion of the perturbative series of Eq. (1) yields a series with the $m$-th term

This can be computed up to third order as in [7] (see also [2, Equation 1.7 onwards]). This gives a summation of $m = 2, 4$, and $6$ vertex trivalent tensor networks, where vertices are labelled by $c$ and edges by $g$ or $g^{-1}$.

It is not hard to see that $\operatorname{Im}(F_k) = f_{k,2}$ corresponds to $m \equiv 2 \pmod 4$, while $m \equiv 0 \pmod 4$ gives $\operatorname{Re}(F_k) = f_{k,1}$. The Euclidean version has alternating signs $\pm 1$ depending on $m \equiv 0, 2 \pmod 4$. So the numerical experiments are based on $m = 2, 6$ and on $m = 2, 4$ for the Euclidean version.
Computing the above involves a contraction of a trivalent network without any loops, with $m$ vertices $c_{ijk}$ and edges $g^{ii'}$, $g^{jj'}$, $g^{kk'}$. Furthermore, Eq. (2) implies that vertices can be labelled by $c^k_{ij}$ instead of $c_{ijk}$, while edges are labelled by $g^{ii'}$, $g^{jj'}$ and $g_{kk'}$ instead of $g^{kk'}$. We refer to [7] for the full list of the tensor diagrams. In Fig. 1, we borrow an example from the reference for each $m = 2, 4, 6$.

Figure 1. Theta, tincan and prism diagrams. All diagrams are trivalent networks without any loop, and vertices are the structure constants $c^k_{ij}$. Each vertex has indices $i, j, k$ which are paired with their counterparts in another vertex. This pairing is done using $g$ along edges of type $k$ (colored red) and $g^{-1}$ for types $i$ and $j$. Tincan and prism are given with some sample labeling. Red lines are labelled by $g$ and black lines by $g^{-1}$.
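To make the diagram contraction concrete, here is a minimal NumPy sketch of the simplest ($m = 2$, Theta) network for su(2): two vertices $c^k_{ij}$ joined by $g^{-1}$ along the $i$ and $j$ edges and by $g$ along the $k$ edge, as described in the caption of Fig. 1. The overall normalization and sign used in the paper's functionals are not reproduced, and the basis $y_a = -\tfrac{i}{2}\sigma_a$ and the function names are assumptions made for illustration.

```python
import numpy as np


def structure_constants(basis):
    """c[i, j, k] = c^k_{ij} with [y_i, y_j] = c^k_{ij} y_k (trace-orthogonal basis)."""
    m = len(basis)
    c = np.zeros((m, m, m))
    for i in range(m):
        for j in range(m):
            comm = basis[i] @ basis[j] - basis[j] @ basis[i]
            for k in range(m):
                num = np.trace(basis[k].conj().T @ comm)
                den = np.trace(basis[k].conj().T @ basis[k])
                c[i, j, k] = (num / den).real
    return c


def theta(c, g):
    """Theta diagram: c^k_{ij} c^{k'}_{i'j'} g^{ii'} g^{jj'} g_{kk'}."""
    g_inv = np.linalg.inv(g)
    return np.einsum("ijk,abc,ia,jb,kc->", c, c, g_inv, g_inv, g)


# Skew-Hermitian basis of su(2); its structure constants are the Levi-Civita symbol.
sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]
c = structure_constants([-0.5j * s for s in sigma])

print(theta(c, np.eye(3)))                   # value at the identity metric
print(theta(c, np.diag([2.0, 0.5, 1.0])))    # value at a diagonal metric with det = 1
```

The larger tincan and prism networks follow the same pattern with more vertices; for them the order of pairwise contractions matters for memory, which this sketch does not address.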
Up to third order, we obtain the following functionals:

Remark 3.1. Numerical and theoretical evidence shows that each diagram is convex or concave with critical point at $g = \mathrm{Id}$ (see [7, Appendix C]). Hence a signed sum of these diagrams gives local minima around a local maximum at $g = \mathrm{Id}$.

To fix the volume with $\det(g) = 1$, we found it to be numerically more stable to take a Lagrangian approach instead of normalizing by $\det(g) = 1$ [7]. Hence we added the term $(\det(g) - 1)^2$ with a high enough coefficient to the loss function, giving us the final formulae:

where $r_1 \ge 1$, $r_2 \gg 1$. The solutions found through gradient descent were experimentally checked to be local minima, and these generally have highly degenerate eigenspaces.

There are many ways to initialize the gradient descent [7, Section 2.2]. Here, we choose the method GenPerturbId for initialization of $g$: the gradient descent starts at a random metric given by a Gaussian perturbation of the identity metric. If we were to restrict ourselves to only diagonal metrics, kaqness would have been trivial as the basis is the Pauli-word basis ([7, Theorem 2.1]).

3.1.2. Basis. For $n$ a power of 2, we use the Pauli-word basis (as in [7]), and for $n = 5, 6, 7$, we use the generalized Gell-Mann basis [1], which is a well-known trace-orthonormal basis of su(n).
For n = 4, 8, the Pauli-word basis was chosen because of the favorable properties of its structure constants and the implications on the gradient descent (studied in [7, Appendix A & B]). For example, the gradient flow starting at a diagonal metric g (DiagPerturbId [7]) will remain in the space of diagonal metrics, thus always giving kaq local minima. As a result, this basis has been shown to provide a better source of kaq examples (see also Section 4.3). In this work, we investigate if the gradient flow starting at a nondiagonal metric g (GenPerturbId instead of DiagPerturbId) still delivers kaq examples.
We do not have a Pauli-word basis for $n = 5, 6, 7$. However, the generalized Gell-Mann basis coincides with the Pauli basis for su(2) and with the Gell-Mann matrices for su(3) (acting on qutrits, as the basis for Gell-Mann's quark model), and can be seen as the generalization of both, acting on qunits.
Remark 3.2. In the particular case of $n = 6$, we also consider another basis that can potentially lead to more kaq solutions: $\{(\text{Gell-Mann basis of } u(2)) \otimes (\text{Gell-Mann basis of } u(3))\} - \{\mathrm{Id} \otimes \mathrm{Id}\}$. We call this basis the tensor basis for su(6).

Remark 3.3. We emphasize that the search for (almost-)kaq metrics is basis independent in principle, but for numerical reasons it is important to search using different well-known bases, especially those which already have a tensor factorization, such as the Pauli-word basis or the tensor basis for su(6). As noted before, some of these bases may put the search in a better position to find (almost-)kaq examples.
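The generalized Gell-Mann basis and the tensor basis of Remark 3.2 can be built explicitly. The sketch below uses the common normalization $\operatorname{tr}(\lambda_a \lambda_b) = 2\delta_{ab}$, which is an assumption (the paper's normalization may differ by a constant), and extends the basis of su(n) to u(n) by a rescaled identity; the function names are illustrative.

```python
import numpy as np


def gell_mann_basis(n, include_identity=False):
    """Generalized Gell-Mann matrices of size n, with tr(a b) = 2 * delta_ab."""
    basis = []
    for j in range(n):
        for k in range(j + 1, n):
            sym = np.zeros((n, n), dtype=complex)
            sym[j, k] = sym[k, j] = 1            # symmetric off-diagonal element
            asym = np.zeros((n, n), dtype=complex)
            asym[j, k], asym[k, j] = -1j, 1j     # antisymmetric off-diagonal element
            basis += [sym, asym]
    for l in range(1, n):
        diag = np.zeros((n, n), dtype=complex)
        diag[:l, :l] = np.eye(l)
        diag[l, l] = -l
        basis.append(np.sqrt(2 / (l * (l + 1))) * diag)   # traceless diagonal element
    if include_identity:
        basis.insert(0, np.sqrt(2 / n) * np.eye(n, dtype=complex))
    return basis


def tensor_basis_su6():
    """{basis of u(2)} x {basis of u(3)} with Id x Id removed: 4 * 9 - 1 = 35 elements."""
    u2 = gell_mann_basis(2, include_identity=True)
    u3 = gell_mann_basis(3, include_identity=True)
    products = [np.kron(a, b) for a in u2 for b in u3]
    return products[1:]    # the first product is (a multiple of) Id x Id


print(len(gell_mann_basis(5)), len(gell_mann_basis(7)), len(tensor_basis_su6()))
```

Every element of the tensor basis is a product operator by construction, which is why a metric diagonal in this basis is automatically adapted to $\mathbb{C}^2 \otimes \mathbb{C}^3$.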

3.1.3. Why simulate only up to 3 qubits or $n \le 8$? Current computers are usually capable of simulating quantum computations involving many more than just 3 qubits. However, we need to evaluate large, 6-vertex trivalent diagrams. Contraction of these diagrams has to be done in a way that minimizes the memory footprint; e.g. the last step of the contraction is always a diagram like Theta, but potentially with more edges. Each edge represents $n^2 - 1$ indices. Some contractions would end with two vertices with 5 edges left. This makes the total dimension of these $(n^2 - 1)^5$. Assuming 4 qubits, this means $n = 16$ and so $255^5 = 1{,}078{,}203{,}909{,}375$ parameters.
For our computations we have to use GPU acceleration as otherwise the runtime of convergence of the gradient descent would be prohibitively long. 4 billion parameters take about 1 GB of VRAM on the GPU meaning at least ∼ 270 GB just for storing the parameters of each vertex. Unlike deep learning computations which can be parallelized across multiple GPUs, our computations require at least each node to be on a single GPU and a single Tesla V100 has only 32 GB of VRAM, far smaller than 270. We should emphasize that unlike many quantum computation settings, where matrices are sparse, we do not have this advantage to help us with storage. Even though at the beginning phase, the structure constants are sparse, once a contraction by a general metric g is done, the sparsity is no longer present.
It should also be noted that the requirement for the gradient descent is even larger: Adam gradient descent involves two other quantities (momentum and acceleration) attached to each parameter, making the total VRAM requirement $270 \times 3$ GB for each vertex. This is a substantial underestimate, and the true requirement may be a multiple of that; for example, experiments have shown that su($n = 9$), with $(n^2 - 1)^5 \sim 3.2 \times 10^9$, requires more than 16 GB of VRAM.
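The size estimates in this subsection are simple arithmetic and can be reproduced as below. The sketch takes the text's rule of thumb (about 4 billion parameters per GB) at face value as the default of the hypothetical parameter `params_per_gb`; a different storage format changes the estimate proportionally, and the function names are illustrative.

```python
def contraction_size(n, open_edges=5):
    """Number of entries of a tensor with `open_edges` indices, each of size n^2 - 1."""
    return (n * n - 1) ** open_edges


def vram_gb(num_params, params_per_gb=4e9, copies=1):
    """VRAM estimate in GB; `copies` = 3 models an optimizer keeping two extra
    quantities per parameter, as described for Adam in the text."""
    return copies * num_params / params_per_gb


for n in (8, 9, 16):
    size = contraction_size(n)
    print(n, size, round(vram_gb(size), 2), round(vram_gb(size, copies=3), 2))
```

Because the count scales like $n^{10}$, doubling $n$ from 8 to 16 multiplies the storage by roughly a thousand, which is the obstruction described above.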
Remark 3.4. Another fundamental issue we face beyond $n = 8$ is that of numerical stability. As $n$ grows, we must also increase $k$ to make it possible for the computer to detect local minima around the identity. Otherwise, the value of the functionals $F_{24}$, $F_{26}$ at the identity would be too close to the local minima around it for the computer to be able to distinguish them. On the other hand, increasing $k$ makes the value of the functionals too large for the computer to give consistent results, even when using double-precision floating-point.

3.1.4. Ricci scalar curvature. For all unimodular Lie algebras, the Ricci scalar curvature is defined as [8]:

For su($2^n$), this simplifies to

See [7, Figure 7] for the diagrammatic formulation. As mentioned in [7], to find critical points, $\|\nabla R\|$ must be minimized. This gradient can be computed explicitly using Eq. (10). Gradient descent on this problem does not work and evolutionary algorithms must be used [7]. However, we report that no new local minima were found even after a more comprehensive evolutionary search.

Loss functions for checking (almost-)kaqness.
Once solutions in the previous section are found, we would like to check if they are (almost-)kaq. Using entropy, one can define a loss function which evaluates to zero if and only if the solution found is kaq. The question is: what are the parameters of our loss function?
Let $g$ be a solution for any of the above loss functions, and let the eigenbasis of $g$ be $\{iH_1, \ldots, iH_{n^2-1}\}$, where all are normalized in $l^2$-norm. There are two sources for the parameters. The first set of parameters describes the conjugation of the eigenbasis by some $U \in U(n)$, which is what describes the function $j$ in Definition 2.2. However, the choice of basis for each eigenspace is not unique; specifically, every degenerate eigenspace of dimension $d$ can afford an independent change of basis. Thus every such eigenspace gives additional parameters describing an orthogonal matrix $V \in O(d)$. So the number of parameters is determined by $n$ together with $(d_1, \ldots, d_t)$, the degeneracy pattern of $g$. We use $\theta$ to denote all these parameters.
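One plausible way to count the parameters $\theta$ described above is to add the dimension of $U(n)$, which is $n^2$, to the dimensions $d_i(d_i - 1)/2$ of the groups $O(d_i)$, one per eigenspace. The sketch below implements that count as an assumption; the paper's own formula may differ, for example by quotienting out an overall phase.

```python
def kaq_loss_param_count(n, pattern):
    """dim U(n) + sum_i dim O(d_i) for a degeneracy pattern (d_1, ..., d_t) of su(n)."""
    if sum(pattern) != n * n - 1:
        raise ValueError("degeneracy pattern must sum to n^2 - 1")
    unitary_params = n * n                                       # dim U(n)
    orthogonal_params = sum(d * (d - 1) // 2 for d in pattern)   # dim O(d) for each eigenspace
    return unitary_params + orthogonal_params


print(kaq_loss_param_count(4, (1, 4, 8, 2)))
print(kaq_loss_param_count(5, (1,) * 24))
```

The second call illustrates the non-degenerate case: when every $d_i = 1$ only the $n^2$ parameters of $U$ remain, so higher degeneracy gives the search more freedom.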
3.2.1. Computing entropies. After the above two transformations, by abuse of notation, let the new orthonormal eigenbasis be $\{iH_1, \ldots, iH_{n^2-1}\}$. Then, viewing each matrix $H_j$ as an $n^2 \times 1$ vector $v_j$, and given $n = \prod_{i=1}^{l} p_i$, we compute the entropy $s_{ij}(g, \theta)$ for $1 \le i \le l$, $1 \le j \le n^2 - 1$.

This is the entropy for the decomposition $v_j = v_{j,i,1} \otimes v_{j,i,2}$, where $v_{j,i,1} \in \mathbb{C}^{p_i} \otimes (\mathbb{C}^{p_i})^*$ and $v_{j,i,2} \in \mathbb{C}^{n/p_i} \otimes (\mathbb{C}^{n/p_i})^*$ for $1 \le i \le l$. We compute this entropy by taking the Schmidt decomposition

$$v_j = \sum_l \alpha_{jil}\, w_{j,i,l,1} \otimes w_{j,i,l,2},$$

where $w_{j,i,l,1} \in \mathbb{C}^{p_i} \otimes (\mathbb{C}^{p_i})^*$ and $w_{j,i,l,2} \in \mathbb{C}^{n/p_i} \otimes (\mathbb{C}^{n/p_i})^*$, and so

$$s_{ij}(g, \theta) = \sum_l -|\alpha_{jil}|^2 \log(|\alpha_{jil}|^2).$$

The Schmidt decomposition provides the bonus of also having the best candidate $v_{j,i,1} \otimes v_{j,i,2}$ for the tensor decomposition of $H_j$, i.e. $v_{j,i,1} = w_{j,i,o,1}$, $v_{j,i,2} = w_{j,i,o,2}$ for $o = \operatorname{argmax}_l |\alpha_{jil}|$.
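The entropy computation above can be sketched with a reshape followed by a singular value decomposition. The code treats an operator $H$ on $\mathbb{C}^p \otimes \mathbb{C}^q$ in the Kronecker-product index convention of NumPy (an assumption about the ordering of tensor factors) and normalizes $H$ in $l^2$-norm; the function names are illustrative.

```python
import numpy as np


def operator_schmidt(H, p, q):
    """Schmidt data of H on C^p (x) C^q: H = sum_l s[l] * kron(A_l, B_l)."""
    T = H.reshape(p, q, p, q)                               # indices (a, b, a', b')
    M = T.transpose(0, 2, 1, 3).reshape(p * p, q * q)       # rows (a, a'), columns (b, b')
    U, s, Vh = np.linalg.svd(M, full_matrices=False)
    return U, s, Vh


def schmidt_entropy(H, p, q):
    """Entropy sum_l -|alpha_l|^2 log |alpha_l|^2 of the l2-normalized operator."""
    H = H / np.linalg.norm(H)
    _, s, _ = operator_schmidt(H, p, q)
    prob = s**2
    prob = prob[prob > 1e-15]                               # drop numerically zero terms
    return float(-(prob * np.log(prob)).sum())


def best_product(H, p, q):
    """Leading Schmidt term: the best candidate product A (x) B for H."""
    U, s, Vh = operator_schmidt(H, p, q)
    return s[0], U[:, 0].reshape(p, p), Vh[0].reshape(q, q)


X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
print(schmidt_entropy(np.kron(X, Z), 2, 2))                  # a product operator
print(schmidt_entropy(np.kron(X, X) + np.kron(Z, Z), 2, 2))  # two equally weighted terms
```

An exact product operator has a single nonzero Schmidt coefficient and hence zero entropy, while two equally weighted orthogonal terms give $\log 2 \approx 0.69$.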
Remark 3.6. With the exception of $n = 8$, which has three primes in its prime decomposition, all other examples of (almost-)kaq have only two. Therefore, we sometimes use $s_j$ as $s_{1j} = s_{2j}$.

3.2.3. Almost-kaq loss function. It is clear how the above loss function can be altered for cases like $n = 5$ ($n = 7$), when one might be interested in a decomposition of the form $\mathbb{C}^2 \otimes \mathbb{C}^2 \oplus \mathbb{C}$ ($\mathbb{C}^2 \otimes \mathbb{C}^3 \oplus \mathbb{C}$). This can be done by computing the entropy for the $4 \times 4$ ($6 \times 6$) upper-left block of each $H_j$, while also adding the norm squared of the 8 (12) entries outside the blocks to the loss function to encourage a block diagonal decomposition.
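A minimal sketch of the per-axis contribution to such an almost-kaq loss is given below: the Schmidt entropy of the upper-left block plus the norm squared of the entries outside the diagonal blocks. How the two terms are weighted, and whether the block is renormalized before the entropy is taken, are assumptions here, and the function names are hypothetical.

```python
import numpy as np


def block_entropy(block, p, q):
    """Operator-Schmidt entropy of a (pq x pq) block with respect to C^p (x) C^q."""
    norm = np.linalg.norm(block)
    if norm == 0:
        return 0.0
    M = (block / norm).reshape(p, q, p, q).transpose(0, 2, 1, 3).reshape(p * p, q * q)
    prob = np.linalg.svd(M, compute_uv=False) ** 2
    prob = prob[prob > 1e-15]
    return float(-(prob * np.log(prob)).sum())


def almost_kaq_loss(H, p, q):
    """Entropy of the upper-left pq x pq block plus squared norm of off-block entries.

    Returns the loss and the number of off-block entries that were penalized.
    """
    m = p * q
    H = H / np.linalg.norm(H)
    off = np.concatenate([H[:m, m:].ravel(), H[m:, :m].ravel()])
    leak = float(np.sum(np.abs(off) ** 2))
    return block_entropy(H[:m, :m], p, q) + leak, off.size


X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
H = np.zeros((5, 5), dtype=complex)
H[:4, :4] = np.kron(X, Z)      # tensor structure on the 4 x 4 block
H[4, 4] = 0.3                  # remaining diagonal entry, not penalized
print(almost_kaq_loss(H, 2, 2))
```

For $n = 5$ with $p = q = 2$ the function penalizes 8 entries, and for $n = 7$ with $p = 2$, $q = 3$ it penalizes 12, in line with the counts quoted above.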
3.2.4. Gradient descent details. We had to use the less sophisticated SGD (Stochastic Gradient Descent) algorithm with learning rate 1e-3 and momentum 0.9 for the gradient descent (other hyperparameters were set as their default in PyTorch). The alternative (Adam) was found to have issues, likely due to the fact that PyTorch has recently been updated to include gradient descent on real-valued functionals with complex parameters.
Remark 3.7. We observed that kaqness was either strongly indicated, with $\max_{i,j} s_{ij} \sim 10^{-3}$, or absent, with mostly $\max_{i,j} s_{ij} > 0.5$. Thus roughly three orders of magnitude typically separate our positive from our negative findings of kaqness.
Remark 3.8 (Tolerance margin). To distinguish between different eigenvalues, we used a tolerance margin of 0.02 (before changing $g^{-1}$ back to $g$). Hence, in our searches, we gathered the eigenvalues that were the same up to 0.02 as corresponding to the same eigenspace. This choice was made by observing that sometimes solutions with very close spectrum had slight differences (order of 1e-2) in their eigenvalues. Note that a higher tolerance margin increases the number of parameters $\theta$ in $L_{kaq}(g, \theta)$, as it increases the degeneracy dimensions, thereby potentially increasing the chances of getting kaq solutions.

We shall first list the values of $k$, $r_1$, $r_2$ chosen for $n = 5, 6, 7$ in our experiments. As mentioned in Section 3.1.1, the gradient descent initialization in all cases is GenPerturbId. For $n = 4, 8$ we refer to [7, Tables 1-2 (GenPerturbId)]. We note that for $n = 6$, as mentioned in Remark 3.2, we have two different bases: the usual Gell-Mann basis, and the tensor basis coming from $u(2) \otimes u(3)$. The second table in each of Table 1 and Table 2 is for the tensor basis.

Algebra                  k           r_1     r_2
su(6) (tensor basis)     300, 600    10^2    10^5
su(6)                    300, 600    10^2    10^5
su(7)                    400, 800    10^4    10^4

Table 2. The scaling factors for $L_{26}$.
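The tolerance-based grouping of eigenvalues from Remark 3.8 can be sketched as follows. The rule used here, merging sorted eigenvalues whose consecutive gap is at most the tolerance, is one reasonable reading of "the same up to 0.02" and is an assumption, as is the function name.

```python
import numpy as np


def degeneracy_pattern(eigenvalues, tol=0.02):
    """Group sorted eigenvalues whose consecutive gaps are <= tol; return group sizes."""
    vals = np.sort(np.asarray(eigenvalues, dtype=float))
    if vals.size == 0:
        return ()
    pattern = [1]
    for prev, cur in zip(vals[:-1], vals[1:]):
        if cur - prev <= tol:
            pattern[-1] += 1      # same eigenspace within tolerance
        else:
            pattern.append(1)     # start a new eigenspace
    return tuple(pattern)


spectrum = [1.0, 1.005, 1.5, 1.51, 1.515, 3.0]
print(degeneracy_pattern(spectrum, tol=0.02))
print(degeneracy_pattern(spectrum, tol=0.001))
```

Raising the tolerance can only merge neighbouring groups, which is why a pattern may appear with fewer, larger eigenspaces when the margin is increased.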

Degeneracy patterns and (almost-)kaqness.
We list the local minima found in our search by their degeneracy patterns and their (almost-)kaqness, as we did in [7]. For each of the 8 tables listed above, we ran the simulation for 15 different random seeds. For n = 4, 8, there are also around 15 simulations for each configuration. We shall also list again the patterns found in [7] for n = 4, 8, this time with their kaqness specified. In doing so, we note that some of the patterns borrowed from [7, Section 3.1] for n = 4, 8 do not reappear exactly as they were; for example, (1,3,1,8,2) reappears as (1,4,8,2) in Table 3. This discrepancy is due to taking a different (higher) tolerance margin for declaring "degeneracy" (Remark 3.8).

Remarks on the presentation of the results.
(1) Within the descriptions and captions, we use $(d_i)$ for the degeneracy pattern (Definition 3.1); thus $d_i$ refers to the dimension of an eigenspace.
(2) In all tables, we mention the number of different patterns.
(3) In some of the tables, we have to give some explanation of the solutions and their (almost-)kaqness.
(4) In some cases many different patterns are found, in which case we mention the best example(s), e.g. ones that are kaq with $\max_{ij} s_{ij} \sim 10^{-3}$ or the closest example to almost-kaq in terms of the value of the loss function.
(5) In some other cases, like in Table 3, only a few patterns are found and we list them as "$(d_1, \ldots, d_t) : x/y$", meaning $x$ solutions out of the $y$ solutions with pattern $(d_1, \ldots, d_t)$ are (almost-)kaq. Furthermore, as mentioned before in Remark 3.7, we have $\max_{ij} s_{ij} \sim 10^{-3}$ in such cases.
(6) Some tables (like Tables 6-7) only show solutions for a single (higher) value of $k$. In all such cases, the lower value gave solutions very close to the identity (see Remark 3.4), so we decided not to include them.

In the next section, we will draw some conclusions on the results. We list the results below.

Table 3. Kaqness for $L_{24}$ on su(4).

k = 200: 15 patterns. Best pattern (3, 2, 6, 2, 2, 2, 5, 1, 1) has $\max_j s_j = 0.344$ and $\operatorname{mean}(s_j) = 0.06$, indicating a fairly precise tensor structure. The average of the norm squared of entries outside the blocks is $\sim 0.046$. Other patterns with maximum entropy of 0.49 and 0.69 are present as well.
k = 400: 12 patterns. Best pattern (1, 1, 1, 10, 2, 2, 2, 1, 2, 2) appears twice with $\max_j s_j = 0.064$ and $\operatorname{mean}(s_j) = 0.011$, also a fairly precise tensor structure. The average of the norm squared of entries outside the blocks is $\sim 0.044$. Other patterns with maximum entropy of 0.075 and 0.69 are present as well.

Table 4. Almost-kaqness for $L_{24}$ on su(5). See Remark 3.6 and Section 3.2.3 for how to compute the entropy $s_j(g, \theta)$ for $\mathbb{C}^2 \otimes \mathbb{C}^2 \oplus \mathbb{C}$. Note that the average norm squared of a random entry from an $l^2$-normalized $5 \times 5$ matrix is 0.04 (to be compared with the values above).
Kaq decomposition to $\mathbb{C}^2 \otimes \mathbb{C}^3$ with tensor basis (Remark 3.2):
k = 300: 15 patterns with very small $d_i$'s; none were kaq, as the minimum $\max_j s_j$ among all solutions was 0.9.
k = 600: 15 patterns, one without any degeneracy. None were kaq, as the minimum $\max_j s_j$ among all solutions was $> 1$.

Kaq decomposition to $\mathbb{C}^2 \otimes \mathbb{C}^3$ with Gell-Mann basis (columns k = 300 and k = 600; only one entry is recoverable):
13 patterns with overall better degeneracy than above, but none were kaq, as the minimum $\max_j s_j$ was 0.69.

Table 5. Kaqness for $L_{24}$ on su(6). We note how it becomes harder to find solutions with higher degeneracy dimensions $d_i$, hence decreasing the dof of $L_{kaq}$ to find kaqness.
15 different patterns with small $d_i$'s and no pattern close to being almost-kaq.
Same as above.

Table 11. Almost-kaqness for $L_{26}$ on su(7). Note that there are 24 entries outside the matrix blocks for the $\mathbb{C}^2 \otimes \mathbb{C}^2 \oplus \mathbb{C}^3$ decomposition, and 12 for $\mathbb{C}^2 \otimes \mathbb{C}^3 \oplus \mathbb{C}$.

Remarks on the results.
We make a few general remarks regarding the results.
(1) The diversity of the degeneracy patterns is lower for $L_{26}$ than for $L_{24}$.
(2) (Almost-)kaqness is more common in $L_{26}$ solutions than in $L_{24}$ solutions.
(3) As $k$ increases, the $d_i$ become smaller, which should make (almost-)kaqness generally harder to find, as there are fewer parameters for $L_{kaq}$ to play with to reach the value of 0 (see Section 3.2 for the parameter count). This issue can be seen e.g. in Table 11 for $k = 800$.
(4) The $d_i$ are also smaller for $n$ not a power of two. In some cases most or all $d_i = 1$, especially for $L_{24}$. This decreases the dof of $L_{kaq}$ to find (almost-)kaqness.
(5) There are also more degeneracy patterns found for both $L_{24}$ and $L_{26}$ when $n$ is not a power of two, so many that we did not list all patterns found in some cases.
(6) With regard to almost-kaqness, we see that in most cases the entries outside the blocks have norm close to that of a random entry from a Hermitian matrix of the same size. Therefore, even though the upper-left block is very close to a tensor form, the entries outside of the blocks have not been completely suppressed to zero.
(7) The basis used for $n = 4, 8$ has been the Pauli-word basis [7]. Changing this basis to the Gell-Mann basis did not give any new pattern or kaqness result.

4.4. Lie subalgebras among (almost/partial-)kaq solutions.
Some local minima are associated with Lie subalgebras. This was observed in [7], where we defined a solution to belong to sub$_n$ when a Lie subalgebra can be found that corresponds to one (or a combination of some) of its eigenspaces. It is important to disambiguate generic kaq minima from those associated with Lie subalgebras, as the latter may represent a distinct symmetry breaking process. Here, we show in Figures 2-6 which of our almost/partial-kaq solutions are also sub$_n$. We were previously [7] able to find kaq solutions which are not sub using diagonal initialization of the gradient flow (DiagPerturbId); however, as the figures show, there are also many such examples for $n = 4, 6, 8$ using the GenPerturbId method.
In each figure, solutions of L 24 (left) and L 26 (right) are put into a Veen diagram. Some solutions appear twice because they were instances of local minima with the same degeneracy pattern where one was sub n and the other was not.
For each solution $g$, we first computed its eigenbasis $\{iH_1, \dots, iH_{n^2-1}\}$ with eigenspaces $su(n) = \bigoplus_{i=1}^{t} E_i$. Then, for each proper vector subspace $V = \bigoplus_{i \in I} E_i$ with $\dim V > 1$, formed by a subset $I$ of the eigenspaces of $g$ (there are $2^t - 2 - |\{i \mid \dim E_i = 1\}|$ such subspaces), we computed the distance from the bracket of every pair of basis elements of $V$ to $V$ itself. If all distances were less than $10^{-2}$ for some $V$, $g$ was confirmed as an instance of sub. Otherwise, if no combination yielded a Lie subalgebra, $g$ was classified outside of the sub diagram. The same approach was taken in [7, Figures 9-10].
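The closure test described above can be sketched as follows. This is a minimal illustration rather than the code used for the figures: the function names (`bracket`, `distance_to_span`, `is_lie_subalgebra`, `candidate_index_sets`) are hypothetical, the distance is assumed to be the Frobenius distance to the real span of the basis, and the basis elements are assumed to be normalized so that the $10^{-2}$ tolerance is meaningful.

```python
import itertools
import numpy as np

def bracket(a, b):
    """Matrix commutator [a, b]."""
    return a @ b - b @ a

def _as_real_vector(m):
    """Flatten a complex matrix into a real vector (real parts, then imaginary parts)."""
    return np.concatenate([m.real.ravel(), m.imag.ravel()])

def distance_to_span(x, basis):
    """Frobenius distance from x to the real span of the matrices in `basis`
    (assumed notion of distance; computed by least squares)."""
    B = np.stack([_as_real_vector(b) for b in basis], axis=1)
    target = _as_real_vector(x)
    coeffs, *_ = np.linalg.lstsq(B, target, rcond=None)
    return float(np.linalg.norm(B @ coeffs - target))

def is_lie_subalgebra(basis, tol=1e-2):
    """True if the bracket of every pair of basis elements stays within tol of the span."""
    return all(distance_to_span(bracket(a, b), basis) < tol
               for a, b in itertools.combinations(basis, 2))

def candidate_index_sets(dims):
    """Index sets I of eigenspaces giving a proper subspace V with dim V > 1.
    `dims` lists the eigenspace dimensions (dim E_1, ..., dim E_t)."""
    t = len(dims)
    for r in range(1, t):  # r < t keeps V proper
        for subset in itertools.combinations(range(t), r):
            if sum(dims[i] for i in subset) > 1:
                yield subset

# Toy check inside su(4): su(2) acting on the first qubit.
I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
su2_on_first_qubit = [1j * np.kron(p, I2) / 2 for p in (X, Y, Z)]

print(is_lie_subalgebra(su2_on_first_qubit))        # all three generators: closed
print(is_lie_subalgebra(su2_on_first_qubit[:2]))    # two generators: bracket leaves the span
print(len(list(candidate_index_sets([3, 1, 2]))))   # number of candidate subspaces V
```

For eigenspace dimensions (3, 1, 2) the generator yields $2^3 - 2 - 1 = 5$ candidates, in agreement with the count given in the text.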
Figure caption (fragment, opening lost in extraction): (2)). The same is true for (1, 1, 2, 4, 2, 1, 2, 2), where the second 1D along with the next 2D form an su(2). The 10D eigenspace in (10, 5) also gives a Lie subalgebra (likely isomorphic to sp(2), as observed in [7]).

Figure caption (fragment): (6): There are more non-sub patterns than sub. Using the tensor basis, we obtain only non-sub solutions such as (10, 3, 1, 16, 5); however, the same pattern obtained using the Gell-Mann basis gives only sub solutions (by the combination of the 10D + 3D + 1D or 10D + 3D eigenspaces), which is why this pattern appears twice.

Figure caption (fragment): On the left ($L_{24}$), we have one degeneracy pattern which is partial-kaq and all its instances are sub. On the right ($L_{26}$), the degeneracy pattern (10, 15, 1, 32, 5) for $k = 500$ has 11 kaq instances and 17 partial-kaq instances (which include the previous 11), all of which are non-sub. However, the same pattern for $k = 1000$, with only partial-kaq results, is sub (by the combination of the 10D + 15D eigenspaces), along with all the other partial-kaq solutions for $k = 1000$.

We compute the "b-score" or "body-number-score" of the kaq solutions found for su(8) (see the motivation in Remark 1.1). Notice that such solutions were only found in the $L_{26}$ results in Table 12: 11 for $k = 500$ and one for $k = 1000$. The goal here is to look for the structure of the Brown-Susskind penalty metrics emerging. In these metrics the norm of a $g$-principal direction decays exponentially with its weight or body-number. We see, if anything, only a weak signal at $n = 8$.

The b-score for any decomposition $H_j = A_j \otimes B_j \otimes C_j$ is defined as $(1 - \det(A_j))(1 - \det(B_j))(1 - \det(C_j))$ (note that $\|A_j\|_2 = \|B_j\|_2 = \|C_j\|_2 = 1$). We compute the b-score of every one of the 63 eigenstates of $g$ for the 12 solutions mentioned previously. We then compute the correlation of the b-score of $H_j$ with the eigenvalue of $g$ for $H_j$. The best result, a correlation of −0.145, was for one of the 11 instances of the pattern (10, 15, 1, 32, 5) with $k = 500$. In Fig. 7, we show the scatter plot of the b-score of $H_j$ against its eigenvalue. As the reader can see, the downward trend is only barely perceptible; a tendency toward Brown-Susskind geometries is not yet confirmed. Indeed, other local-minimum metrics on su(8) show similarly weak trends but with the opposite sign.
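The b-score and its correlation with the eigenvalues of $g$ can be illustrated with a short sketch. It assumes that $\|\cdot\|_2$ denotes the Frobenius norm, under which an identity factor contributes $1 - \det = 1/2$ and a Pauli factor contributes $3/2$, so the score grows with the body-number. The function name `b_score`, the three tensor words and the eigenvalues below are hypothetical placeholders chosen for illustration; they are not data from Table 12 or Fig. 7.

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def b_score(a, b, c):
    """(1 - det A)(1 - det B)(1 - det C) for Hermitian 2x2 factors,
    each rescaled to unit Frobenius norm (normalization is an assumption)."""
    score = 1.0
    for m in (a, b, c):
        m = m / np.linalg.norm(m)             # Frobenius norm for 2-D arrays
        score *= 1.0 - np.linalg.det(m).real  # det of a Hermitian matrix is real
    return score

# Hypothetical principal directions of body-number 1, 2 and 3 ...
words = [(I2, I2, X), (I2, X, Z), (X, Y, Z)]
# ... paired with made-up eigenvalues of g (illustration only).
g_eigenvalues = np.array([1.3, 1.0, 0.2])

scores = np.array([b_score(*w) for w in words])
corr = np.corrcoef(scores, g_eigenvalues)[0, 1]  # Pearson correlation
print(scores, corr)
```

With these placeholder values the correlation is negative by construction; the sketch only shows how the statistic is formed, not the strength of the signal reported in the text.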

Figure 7. Pattern (10, 15, 1, 32, 5) with $k = 500$ and eigenvalues found by the gradient descent (before re-inverting $g$). The x-axis is the eigenvalue, which takes the values (1.32, 1.28, 1.236, 1.01, 0.24), and the y-axis is the b-score. We see a modest negative (−0.145) correlation.

Summary and Outlook
Through a numerical study of SSB from the GUE on su(n), n = 4, 5, 6, 7, and 8, to kaq and almost-kaq metrics (and their associated probability distributions), we find a convincing pattern across two classes of natural functionals. Kaq local minima are common for n composite and almost-kaq local minima are common for n prime. It appears that nature "likes" to organize large Hilbert spaces into tensor products of smaller ones, leaving a few ("repêchage") dimensions to the side as necessary. This finding opens the door for number theory, being chiefly the study of primes, to enter the foundations of physics in a new way. In string theory [10], arithmetic structures on Riemannian surfaces provide a long-standing connection to number theory. The "prime factor" and "leaky universe" scenarios, described in Section 1, can now be added to this.
More generally, it should be incumbent on any foundational discussion such as ours to attempt contact with string theory. One way, as mentioned, could be through a common number-theoretic context. Another is through geometry. We thank Greg Moore for the observation that, at least for the bosonic version, the Leech lattice could be the key to picking out the microscopic dimension of space-time. It is a natural goal, once interacting dof have appeared (as they now have), to see if they naturally organize themselves into a lattice geometry, perhaps even Leech-like (as opposed to e.g. a complete graph). This is utterly beyond naive quantum simulation, but could perhaps be approached with the help of an effective model, in analogy with Crick, Watson and Franklin studying DNA with a ball-and-stick model.
With more powerful computational resources and better techniques (Section 3.1.3), it might be possible to study SSB at n = 16 to see if we can confirm the "hint" of penalty metric structure that we discussed at n = 8 in the anti-correlation of the b-score with the norm of the principal axes (Section 4.5).
We thank Adam Brown for a suggestion we hope to follow. Rather than looking only for the "initial Hamiltonian," he suggests one should look for a triple: $(H_0, \psi_0, \mathrm{entropy}(t))$ = (Hamiltonian, initial state, and the behavior of entropy growth on subsystems). There appears to be something magical in how the universe's $H_0$ and $\psi_0$ conspire to allow subsystem entropies to gradually and uniformly increase over billions of years. Our universe is decidedly not a "Boltzmann brain"; look under any rock and you will see entropy on the increase. It seems that we can adapt the discussion of functionals $f : \{\text{metrics } g_{ij}\} \to \mathbb{R}$ to $f : \{g_{ij}\} \times \{\text{initial } \psi_0\} \to \mathbb{R}$ by treating $\psi_0$ as a source in the Feynman diagram. Then a local minimum $(g_{ij}, \psi_0)$ gives rise to a probability distribution $\propto e^{-g_{ij} H_0^i H_0^j}$ from which we draw $H_0$ to obtain the pair $(H_0, \psi_0)$. At least in the case of su(4) it appears quite realistic to study entropy growth $S(t)$ w.r.t. any kaq decomposition, 4 = 2 × 2. We would look for interesting transients in the behavior of $S(t)$, which might be manifest before its quasi-periodic nature dominates.
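As a rough indication of how such a study might be set up for su(4), the sketch below draws $H_0$ from a Gaussian weight $e^{-g_{ij} H_0^i H_0^j}$ and tracks the entanglement entropy of the first qubit. Several things here are assumptions made for illustration: $g$ is taken to be diagonal in the normalized Pauli-word basis, the eigenvalues `g_eigs` are arbitrary placeholders rather than a computed local minimum, $\psi_0$ is the product state $|00\rangle$, and all function names are hypothetical.

```python
import numpy as np

def pauli_word_basis():
    """The 15 traceless two-qubit Pauli words, scaled to unit Frobenius norm."""
    paulis = [
        np.eye(2, dtype=complex),
        np.array([[0, 1], [1, 0]], dtype=complex),
        np.array([[0, -1j], [1j, 0]], dtype=complex),
        np.array([[1, 0], [0, -1]], dtype=complex),
    ]
    words = [np.kron(a, b) for a in paulis for b in paulis][1:]  # drop I (x) I
    return [w / 2 for w in words]

def sample_hamiltonian(basis, g_eigs, rng):
    """Draw H = sum_k x_k H_k with density proportional to exp(-sum_k g_k x_k^2),
    i.e. x_k is Gaussian with variance 1 / (2 g_k)."""
    x = rng.normal(scale=1.0 / np.sqrt(2.0 * np.asarray(g_eigs)))
    return sum(xk * hk for xk, hk in zip(x, basis))

def entanglement_entropy(psi):
    """Von Neumann entropy of the first qubit of a two-qubit pure state."""
    schmidt = np.linalg.svd(psi.reshape(2, 2), compute_uv=False)
    p = schmidt ** 2
    p = p[p > 1e-12]                      # discard zero Schmidt weights
    return float(-np.sum(p * np.log(p)))

def entropy_curve(h, psi0, times):
    """S(t) for psi(t) = exp(-i h t) psi0, via the eigendecomposition of h."""
    evals, evecs = np.linalg.eigh(h)
    c = evecs.conj().T @ psi0
    return [entanglement_entropy(evecs @ (np.exp(-1j * evals * t) * c))
            for t in times]

rng = np.random.default_rng(0)
basis = pauli_word_basis()
g_eigs = np.linspace(0.5, 2.0, 15)        # placeholder eigenvalues of g
h = sample_hamiltonian(basis, g_eigs, rng)
psi0 = np.array([1, 0, 0, 0], dtype=complex)   # product state |00>
times = np.linspace(0.0, 10.0, 50)
entropies = entropy_curve(h, psi0, times)
```

Setting all entries of `g_eigs` equal would correspond to the unbroken (Killing-form) case, which could serve as a baseline when looking for transients in $S(t)$.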