SO(8) Supergravity and the Magic of Machine Learning

Using de Wit-Nicolai $D=4\;\mathcal{N}=8\;SO(8)$ supergravity as an example, we show how modern Machine Learning software libraries such as Google's TensorFlow can be employed to greatly simplify the analysis of high-dimensional scalar sectors of some M-Theory compactifications. We provide detailed information on the location, symmetries, and particle spectra and charges of 192 critical points on the scalar manifold of SO(8) supergravity, including one newly discovered $\mathcal{N}=1$ vacuum with $SO(3)$ residual symmetry, one new potentially stabilizable non-supersymmetric solution, and examples of "Galois conjugate pairs" of solutions, i.e. solution-pairs that share the same gauge group embedding into~$SO(8)$ and minimal polynomials for the cosmological constant. Where feasible, we give analytic expressions for solution coordinates and cosmological constants. As the authors' aspiration is to present the discussion in a form that is accessible to both the Machine Learning and String Theory communities and allows adapting our methods to the study of other models, we provide an introductory overview of the relevant Physics as well as Machine Learning concepts. This includes short pedagogical code examples. In particular, we show how to formulate a requirement for residual Supersymmetry as a Machine Learning loss function and effectively guide the numerical search towards supersymmetric critical points. Numerical investigations suggest that there are no further supersymmetric vacua beyond this newly discovered fifth solution.


Introduction
Google's primary open source library for Machine Learning, TensorFlow [2], has many potential uses beyond Machine Learning. In this article, we want to show how it is also an excellent tool to address one specific technically challenging M-Theory research problem: finding static field configurations of dimensionally reduced models with known but structurally complicated potentials, such as SO(8) supergravity [3] [4]. The underlying computational methods can be readily generalized to other models, including for example maximal five-dimensional supergravity [5].
For the very impatient reader, there is an open-sourced Google Colab at [6] that runs an efficient search for vacuum candidates of SO(8) supergravity and can be used interactively from within a web browser, alongside additional Python code for analyzing numerical solutions at [7].
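The core idea behind that search can be sketched in a few lines of plain Python. The following toy version is our own illustration, not code from [6] or [7]: it uses central-difference numerical derivatives and a simple separable two-dimensional stand-in potential, whereas the actual Colab uses TensorFlow's automatic differentiation on the 70-dimensional SO(8) scalar potential. The structure of the search is the same: define a non-negative "stationarity" loss that vanishes exactly at critical points of the potential, then minimize it by gradient descent from random starting points.

```python
import math
import random

def potential(v):
    # Toy stand-in potential with several critical points; the real search
    # evaluates the SO(8) supergravity scalar potential instead.
    x, y = v
    return math.cos(x) + math.cos(y)

def grad(f, v, eps=1e-6):
    # Central-difference gradient; TensorFlow would provide this exactly
    # (and much faster) via automatic differentiation.
    g = []
    for i in range(len(v)):
        vp, vm = list(v), list(v)
        vp[i] += eps
        vm[i] -= eps
        g.append((f(vp) - f(vm)) / (2 * eps))
    return g

def stationarity(v):
    # The loss function: squared length of the potential's gradient.
    # It is non-negative and vanishes exactly at critical points,
    # i.e. at vacuum candidates.
    return sum(gi * gi for gi in grad(potential, v))

def find_critical_point(steps=2000, lr=0.1):
    # Gradient descent on the stationarity loss from a random start.
    v = [random.uniform(-3, 3), random.uniform(-3, 3)]
    for _ in range(steps):
        g = grad(stationarity, v)
        v = [vi - lr * gi for vi, gi in zip(v, g)]
    return v

random.seed(0)
v = find_critical_point()
assert stationarity(v) < 1e-8  # v is (numerically) a critical point
```

Minimizing the stationarity loss, rather than the potential itself, is what makes the search find saddle points and maxima as well as minima - all of which are legitimate vacuum candidates.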
Naturally, as this article is of interest to both the global Strings community and Machine Learning researchers, and wants to encourage more dialog and exchange between them, it seems appropriate to start with brief introductions to both these topics that are accessible without specialist background. The authors hope that experts in the corresponding fields, rather than being bored, may find in these explanations some bits and pieces that are useful for giving presentations to general audiences. Also, given the sheer amount of data that these new methods can produce so easily, there are opportunities for many related undergraduate projects, and in that context, too, a "big picture" overview that does not assume much specialist knowledge may be useful.

On M-Theory
"M-Theory", or "The Theory Formerly Known as Strings" [8] is a so far only partially explored and understood unifying framework for studying (some) field theoretic models of quantum gravity. The five known (very likely) consistent ten-dimensional Superstring theories (including compactifications to lower dimensions), as well as 11-dimensional supergravity are understood to be different limits of M-Theory dynamics [9].
Despite the remarkable success of the Standard Model (SM) of Elementary Particle Physics [10], which quantitatively describes the properties and interactions of matter and force particles so well that the LHC at the time of this writing has not come up with clear evidence of "new" (beyond-the-SM) physics, there are a number of unsolved problems, for example the non-observation of the particles that constitute Dark Matter [11], or explaining why the neutron's electric dipole moment is too small to be measurable [12]. Perhaps the most puzzling such problem of theoretical physics currently is explaining the observed positive - but from a quantum field theory perspective extremely small - vacuum energy density [13] of the universe. M-Theory currently struggles to give an answer to how this could arise naturally.

The need for Quantum Gravity
One painful shortcoming of the SM is that it is a quantum theory of only three of the four fundamental forces, namely the Electromagnetic, Weak, and Strong Force, but not Gravity. In the SM, the nongravitational forces are described by vector gauge boson exchange, whereas gravity cannot be quantized in such a way, as the conserved current that the graviton couples to is not a vectorial quantity like the conserved electrical current (which describes transport of an undirected quantity, electrical charge, in spacetime) that the photon couples to. The conserved current of gravity is the energy-momentum current, i.e. an object that describes transport of a directed(!) quantity in spacetime. Still, as gravity also is a gauge theory, the notion of field strength as (some form of) local curvature provides a useful general framework for describing models of force and matter particles.
The Heisenberg Uncertainty Principle (HUP) requires that gravity also must be governed by quantum theory - for if it were not, one could (for example) come up with a thought experiment that couples a gravitational oscillation to a quantum oscillator and breaks the HUP. This then would lead to even worse consistency problems, such as violation of causality. In general, internal consistency considerations somewhat like this one provide very strong constraints on the construction of relativistic quantum theories. More specifically, it can (and indeed often does) happen that a proposal for a quantum field theory superficially appears viable, but closer investigation reveals that higher order quantum effects violate some fundamentally important property 1 - typically some symmetry - hence invalidating the construction. If, for example, such a "symmetry violation by (quantum) anomalies" [14] [15] [16] happened for one of the gauge symmetries of a model which are essential to describe massless force-carrier particles, we would end up with a broken "theory" that might for example predict negative(!) probabilities for some quantum processes.
In this sense, a general key question in quantum field theory is which constructions can be made to work and which cannot, and understanding this as well as all the relevant mechanisms is important independently of any questions about whether a particular model could describe our universe. Sometimes, one finds that a collection of problems formerly considered unrelated have a common origin and a single deep insight leads to solving them all in one go, i.e. any attempt to address them in isolation would have been doomed to fail. Concerning the problem of quantizing gravity, it might be that this can be tackled in isolation, i.e. there could be a "mostly internally consistent" theory of quantum gravity alone, along the lines of Quantum Electrodynamics being a mostly internally consistent quantum theory of photons and electrons [17]. It could also be that, given that the natural energy scale for quantum gravity phenomena is the Planck energy, $\sqrt{\hbar c^5/G_N} \sim 10^{19}$ GeV, there will be no way to disentangle quantum gravity from the dynamics of the other particles and their interactions at much lower energies. This is indeed the more plausible scenario, as in Quantum Field Theory, different quantum fields as a general rule only completely detach if there is some good reason for that. Also, one notes that the energy scale at which Grand Unification of the SM forces is expected to happen is not too far below the Planck scale.

"Renormalization"
Concerning the problem of finding a mostly self-contained quantum theory of gravity, an immediate problem is that the force particle, the Graviton, is a massless helicity-2 particle 2 . This goes hand in hand with the graviton coupling to the conserved (four-dimensional) current of (four-dimensional) momentum, known as the stress-energy tensor 3 . For this reason, gravity affects all particles in the same way. Also, the "coupling constant" that describes the strength of gravity - Newton's Constant $G_N$ - hence needs to be a dimensionful number which, using the fundamental constants $\hbar$ and $c$, can be cast into the form of $\{\text{number}\} \cdot \{\text{length}\}^2$.
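Explicitly, combining Newton's constant with $\hbar$ and $c$ gives the square of the Planck length,
$$\frac{G_N \hbar}{c^3} = \ell_{\mathrm{P}}^2 \approx \left(1.6 \times 10^{-35}\,\mathrm{m}\right)^2,$$
so the dimensionless expansion parameter for graviton scattering at energy $E$ is $G_N E^2/(\hbar c^5) = (E/E_{\mathrm{Planck}})^2$, which only becomes of order one near the Planck scale.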
This immediately tells us that any direct approach to quantization, e.g. along the lines of Quantum Electrodynamics (QED), is bound to fail - unless some miraculous cancellation of terms happens, which even then one would very much like to understand. The reason is that in QED, one builds the theory from rather artificial conceptual entities, such as an electron without its electromagnetic field around it (i.e. a non-gauge-invariant quantum state), and then tries to compute quantum mechanical probabilities by taking increasingly complex photon/electron interaction processes into account. The individual contributions are expressed as integrals over all possible intermediate-particle quantum states that typically have divergences coming from regions where the separation between intermediate particle creation and annihilation goes to zero, but this is essentially an artifact of building on a highly nonphysical concept of a "naked" electron. There nevertheless is a well-defined procedure that produces meaningful predictions: using a fixed set of experiments at one energy to determine the parameters of the model (which will misbehave when naively taking all inner-particle integrals to sum over all energies), and then computing probabilities for another experiment (such as: at a different energy) [18] [19] 4 . This works because one can meaningfully decompose processes by summing over all quantum amplitudes of processes that "have a given set of terminals", i.e. where the incoming and outgoing particles are known. In QED, where the coupling constant $\alpha_{\mathrm{QED}} \approx 1/137$ is dimensionless, complicated scattering processes with (say) one incoming and one outgoing electron and one incoming photon look like corrections to the basic electron-photon scattering process that can be absorbed into a re-definition of the coupling strength.
In quantum gravity, graviton-graviton scattering processes with the same set of incoming and outgoing gravitons that differ by the number of intermediate scattering processes involve different powers of the coupling constant, which here is dimensionful, so in order to sum such quantum amplitudes, one would need coefficients of different (length) dimension, (at least) one per power of $G_N$ 5 .
This would mean that we would need an infinite number of physical parameters to describe the dynamics. While such a scenario is perhaps conceivable, one would (in principle) have to perform an infinite number of measurements to determine an infinite number of parameters before such a "theory" could make predictions. Of course, physicists are prejudiced in that "physics works", i.e. we can make predictions about the world from models that require only finitely many experiments to fix all parameters (this might be wrong). This is then the problem of "renormalizability" that afflicts quantum gravity. Still, there indeed are some ideas that lead to such "miraculous" cancellations which may render quantum gravity tameable.
One such idea that is somewhat easy to understand, and which we will come back to, is that processes in quantum field theory in which a fermion 6 occurs (such as in the form of an intermediate electron/positron pair that gets created and destroyed as part of a larger process) must carry a factor −1 (per fermion loop) in their quantum amplitude, while corresponding processes with an intermediate boson 7 do not have such a factor. So, if there were a quantum gravity theory in which bosons and fermions are paired up, perhaps in a way that only becomes visible at high energies (equivalently: short distances), it could be hoped that the formally infinite short-distance contributions from intermediate particles can always be matched up in a way so that they cancel. In such a scenario, it would then be pointless to look for a "good" theory of only quantum gravity and not other particle physics, since only the combination of Gravity (bosons) and Matter (fermions), and also other bosons (somewhat like the photon, gluons, and Higgs boson), would lead to a cancellation of pathologies. This essentially is the key belief about M-Theory 8 : quantum gravity likely cannot be separated from a quantum theory of all other particles - even if the main reason for the belief in its renormalizability is rather different 9 . It certainly is an appealing idea that basic mathematical consistency requirements might determine the particle content, or, to say it with Einstein, find a negative answer to the question "whether god had any choice in the creation of the world".
2 "Helicity" gives the number of phase oscillations that a massless particle's circular-polarized quantum state undergoes under a 180°-rotation around the direction of flight. Rotating a polarized gravitational wave by 90 degrees around the direction of flight gives us the same polarization; for a polarized light ray, this only happens after a 180-degree rotation.
3 The spatial-spatial part of this is known in Engineering and also Fluid Dynamics as the "Stress Tensor". The temporal-temporal part is energy density, and the "mixed" part is energy flux, or, equivalently, momentum density.
4 From a more algebraic perspective, the problem is that the expression that implements the HUP, i.e. the non-vanishing commutator between the position and momentum operator $[X, P] = i\hbar$, for a quantum field theory involves a failure-to-commute localized at a spacetime event, schematically $[\Phi_X(\vec{x}), \Phi_P(\vec{y})] \sim i\hbar\,\delta(\vec{x} - \vec{y})$. So, as the commutator of field operators is a distribution, the field operators themselves have to be distributional quantities, and, considering quantum field theoretic interactions happening at spacetime events, it is just as impossible to give a mathematically well-defined meaning to products of distributional field operators as it is to make sense of the square of a Dirac delta distribution $\delta^2(x)$. Renormalization is about aligning the ambiguity in the definition of short-distance physics with experiment.
5 If, of course, there were some such constant $K$ so that a factor $G_N^m$ always goes with a factor of $K^m$, this would mean that there were a way to reformulate gravity with a dimensionless coupling constant $G_N K$. This situation is rather similar to the story of Neutron decay before the advent of the Electroweak model [20] [21] [22]: The historical quantitative description of Neutron decay by E. Fermi [23] involved a four-fermion interaction term, also with a "bad" dimensionful coupling constant, schematically of the form $G_F \Psi^*_{p^+} \Psi^*_{e^-} \Psi^*_{\nu_e} \Psi_n$. The further one moves away in particle energy from the energy scale of $G_F \sim 1.17 \cdot 10^{-5}\,\mathrm{GeV}^{-2}$, the stronger this term with a "bad" dimensionful coupling constant gets suppressed, so that at energies that just suffice to create one or two electrons from the vacuum, processes mediated by this term have far longer half-lives than, for example, similar quantum electrodynamics processes. The time scale for the decay of an isolated Neutron is $\sim 10^3\,$s, while the time scale for the decay of an electron-positron pair is $\sim 10^{-9}\,$s. While this is comparing apples to oranges, the basic observation of the "Weak" force being very weak far below the electroweak energy scale indeed holds. In the "proper" renormalizable theory of electroweak interactions, the four-fermion contact term is the low energy limit of two interactions of each one electroweak gauge boson and a pair of fermions, with dimensionless coupling constant. The Weak force is weak precisely because we observe it far below its own energy scale and integrating out high energy quantum processes makes it increasingly irrelevant [24].
6 A matter particle, such as an electron or a quark.
7 A force particle, such as a gluon, or a Higgs(-like) particle.
8 Historically, the boson-fermion cancellation mentioned here provided a good reason to think that boson-fermion symmetry, i.e. Supersymmetry, is a step in the right direction towards quantum gravity, and supersymmetric quantum field theories - in particular with Extended Supersymmetry - do have remarkably simplified renormalization behavior. For example, the scalar potential for SO(8) supergravity studied here will not receive any quantum corrections. Concerning the desire to make gravity renormalizable, it so turns out that the basic implementation of this idea, "minimal supergravity", still runs into problems due to higher-order quantum corrections, cf. e.g. [25]. Nevertheless, even as renormalizability is resolved in a different way in Superstring/M-Theory, Supersymmetry is a crucial ingredient, and the general belief is that, ultimately, every supersymmetric field theory can be identified as some limiting case of M-Theory. In four dimensions, supergravity with the maximal, rather than minimal, amount of supersymmetry, which is also studied in the current article, has turned out to be much better behaved w.r.t. renormalizability than first expected, already without considering Planck-scale energy M-Theory corrections. It might even be fully renormalizable [26] [27] [28], and if this is indeed so, it would be very interesting to fully understand the mechanics of infinity-cancellations still at work even after pruning everything but a small number of fields from M-Theory. Still, the modern interpretation considers the observation that this very special model can be embedded consistently into an 11-dimensional model in which seven directions of space are compactified as more than a mere mathematical curiosity, and attributes physical significance to 11-dimensional dynamics.
9 For a "big picture" overview of how the renormalizability problem is solved in String/M-Theory by doing away

Unification
The unification of Quantum Electrodynamics with Quantum Chromodynamics (QCD) and the Weak Force into the SM is highly successful from a theoretical perspective, with both QCD and Electroweak theory individually being afflicted by problems that cancel in the SM [14] [30]. A key property is that in the SM, all forces are described in a uniform way by vector gauge bosons. Since spacetime symmetries (rotations and boosts) affect all vector gauge bosons in the same way, one can consider superposition quantum states between them 10 . Indeed, the SM Photon emerges as a specific quantum superposition of a Weak Force particle and another "hidden" force's particle, termed $U(1)_Y$, in a way that is governed by properties of the Brout-Englert-Higgs Boson. This also sets - for example - the relative strengths of the Electromagnetic and Weak forces. It is quite plausible that at even (much!) higher energies (but somewhat below the quantum gravity energy scale), the same mechanism is also at play in the form of "Grand Unification" [31] [32], merging the Strong Force with the Electroweak Force. The main new discoveries described in the current article are also about this "Higgs effect" [33] [34], leading to some particular sets of particles and interactions in "low energy" physics, so in terms of basic mechanisms this is rather similar to what is now well established SM physics. Our setting, however, is that of a model that might actually describe Planck-scale physics well, so if this construction ever turned out to somehow actually be related to the SM (see e.g. [35] [36] [37] for some speculation in this direction), interesting Physics that is not yet well understood would need to happen in the gap of ∼ 17 orders of magnitude between the Higgs boson energy scale and the quantum gravity energy scale 11 . In particular, there are discrepancies concerning gauge groups, the cosmological constant, and particle chirality properties.
More likely, this work describes a collection of (stable as well as unstable background-)solutions to the M-Theory field equations in a very different corner than the one that describes the Standard Model. Given our still limited understanding of M-Theory, it nevertheless seems useful to get a better idea about relevant properties, mechanisms, and phenomena by investigating solutions that are accessible with current technology. In the past, studying solutions to the 4-dimensional field theory equations and especially their embedding into M-Theory has taught us some very useful lessons about compactification mechanisms and about 11-dimensional dynamics. One in particular notes that SO(8) supergravity also describes the physics of a stack of M2-branes [38].

Kaluza-Klein Supergravity
The idea of extra spacetime dimensions entered quantum gravity research in two ways. String Theory, which originally was proposed to model hadron (baryon, meson) physics [39] [40] [41], was found to make two remarkable predictions: the existence of a massless helicity-2 boson, which meant that it had to contain gravity, as the only current such a boson can couple to is the energy-momentum current; and a violation of the symmetries that are essential for the construction of the theory by quantum processes, which only disappears (for a theory with matter particles) in a world with one time and nine spatial directions [42] [43].
9 (continued) with the notion of field interactions happening at (point-like) space-time events, see e.g. [29].
Independently, almost a century ago and thus long before the advent of String Theory, it was observed by Kaluza and Klein [44] [45] [46] that it is possible to unify the two fundamental forces known back then - Electromagnetism and Gravity - by starting from the most straightforward five-dimensional generalization of General Relativity and considering a background geometry in which the extra direction is rolled up on a circle. Decomposing all fields in terms of their Fourier modes on the compactification circle, the five-dimensional kinetic term splits into a four-dimensional kinetic term plus a mass-like term in the four-dimensional theory that originates from the spatial dependency on the circle-coordinate. Taking the radius of the compactification circle to zero, the mass of the higher Fourier modes goes to infinity, and so these states decouple from the theory. The components of the five-dimensional metric $g_5$ decompose into a four-dimensional metric $g_4$, plus the four-dimensional electromagnetic "vector gauge potential" $A$, plus an additional ("Higgs field like") scalar field $\phi$ whose role was not understood in the days of Kaluza and Klein. Curiously, at about the same time, Dirac's theory of the electron [47] also asked for the introduction of a then novel mathematical entity, a complex four-dimensional spinor, for which it is straightforward to define not only the action of four-dimensional but also five-dimensional rotations.
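In formulas (a textbook illustration for a five-dimensional massless scalar; the metric components follow the same pattern), with the extra coordinate $y \simeq y + 2\pi R$ on a circle of radius $R$ and mostly-plus signature, the mode expansion reads
$$\phi(x^\mu, y) = \sum_{n \in \mathbb{Z}} \phi_n(x^\mu)\, e^{i n y / R}, \qquad \partial_M \partial^M \phi = 0 \;\Rightarrow\; \partial_\mu \partial^\mu \phi_n - \frac{n^2}{R^2}\, \phi_n = 0,$$
so each Fourier mode $\phi_n$ behaves as a four-dimensional field of mass $|n|/R$ (in units $\hbar = c = 1$), and for $R \to 0$ all modes with $n \neq 0$ become infinitely heavy and decouple.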
While the observation that four-dimensional physics can be understood in the context of dynamics in higher dimensions was certainly interesting, the question remained how to find guidance on what higher dimensional dynamics to start from. A major step forward happened in 1979, when Cremmer, Julia, and Scherk set out to construct the four-dimensional field theory with the maximum possible amount of boson-fermion symmetry [48] [49] [50]. Due to the complicated structure of the (Higgs-like) scalar boson interactions, this was achieved by first realizing that such a theory should be obtainable via compactification of a higher dimensional ancestor theory, that there can only be one graviton with helicity ±2 and no interacting massless particles with higher helicity (due to nonexistence of a suitable source current), and that the highest dimension in which such a symmetry can exist is eleven (since otherwise dimensional reduction to four dimensions would give rise to too many supersymmetry generators, requiring helicities beyond +2 that cannot be consistently coupled). After the 11-dimensional model was constructed in [48], the construction of the maximally symmetric four-dimensional model in which all matter and force particles are unified succeeded in [49] via Kaluza-Klein reduction on a 7-dimensional torus.
The "auxiliary" 11-dimensional field theory, originally introduced as a mathematical trick, turned out to be very interesting in itself. For example, it so turns out that symmetry constraints completely determine its structure, and there is no way to adjust its parameters or field content. It almost certainly describes the low energy limit of an as yet unknown 11-dimensional theory of supersymmetric membranes (and perhaps other dynamical degrees of freedom) which, upon dimensional reduction on a circle also wrapped by one direction of the membrane, produces 10-dimensional Superstring Theory [9]. The unknown (perhaps) 11-dimensional theory of (likely) supermembranes has been given the provisional name "M-Theory".
As explained, a key ingredient in the effort to unify force and matter quantum fields is boson-fermion symmetry, or "Supersymmetry", as it is commonly called. This is presently a somewhat esoteric topic outside of quantum field theory and some branches of differential topology, plus perhaps the theory of stochastic dynamical systems [51].
Very broadly speaking, symmetry is one of the most important fundamental concepts in quantum field theory. Symmetries give rise to conservation laws [52] 12 . For example, energy and momentum conservation are a consequence of fundamental equations being invariant under a shift of the coordinate and time origin. As already mentioned, in relativistic quantum field theories, "gauge" symmetries solve problems such as the correct relativistic scalar product for photon polarizations superficially allowing negative-probability quantum states for photons with polarization pointing in time-direction. For every photon wavelength and direction of flight, we have to eliminate one such problematic polarization direction from our "unphysical" description with negative probabilities, and it so turns out this is doable with a "one degree of freedom per wave-vector" (hence, infinite-dimensional) "gauge" symmetry.
For an interacting relativistic field theory, mathematical constraints strongly limit the opportunities to implement a consistent gauge theory. There are basically three options for the quanta of the gauge field strength: (1) Vector gauge bosons with an underlying abelian (giving us electromagnetic "photons") or non-abelian (giving us Strong Force "gluons") symmetry group. These are the forces of the Standard Model. (2) Helicity-2 "tensor" gauge bosons (giving us General Relativity's "gravitons"). Here, as explained, it is hard to formulate a quantum theory that works. (3) Helicity-3/2 (traceless) "Vector-Spinor" gauge bosons [53]. These can only be implemented if the theory also has helicity-2 gravitons [54] [55].
The corresponding symmetry transformations "rotate" bosonic (force particle) quantum states into fermionic (matter particle) quantum states and vice versa and have been named "supersymmetry transformations" [56]. Already from this "big picture" description, one can learn some interesting facts: (a) In Feynman's path integral formalism, bosonic fields are implemented in terms of "normal" numbers while fermionic fields are implemented in terms of anti-commuting numbers 13 . So, a symmetry of this type must have an anti-commuting(!) number as its "rotation angle". (b) The kinetic term of bosonic propagating degrees of freedom involves two derivatives (giving rise to equations of motion of the form $\partial_\mu \partial^\mu \phi(x,t) = 0$), while the kinetic term of fermionic degrees of freedom involves one derivative (giving rise to Dirac-equation-type equations of motion of the form $i\Gamma^\mu \partial_\mu \psi = 0$). So, for the four-dimensional spacetime volume integral $\int d^4x$ of a Lagrangian kinetic term to be a dimensionless number, a bosonic field (with kinetic term $\sim \partial\phi\,\partial\phi$) must have units of $\{\text{length}\}^{-1}$, while a fermionic field (with kinetic term $\sim \psi^*(\Gamma \cdot \partial)\psi$) must have units of $\{\text{length}\}^{-3/2}$. So, a symmetry transform $Q$ that "rotates bosons into fermions" must itself add a factor $\{\text{length}\}^{-1/2}$. From a sequence of two such transformations, we then can build a symmetry transformation that adds a factor $\{\text{length}\}^{-1}$, just like a derivative, which is the generator of shift symmetries 14 . As symmetries give rise to conservation laws, and one cannot have extra conservation laws in an interacting field theory that come from additional spacetime symmetries (this is the Coleman-Mandula theorem [57]), we indeed must find the derivative/shift symmetry/momentum operator in the product of two such "fermionic rotations" 15 .
Hence, (c) such a "fermionic symmetry" actually rotates bosonic fields into fermionic fields, and fermionic fields into derivatives (spatial displacements) of bosonic fields, and so in a sense is a clever way to "take the square root of a shift". As a generalized space-time symmetry, it must act on all particles, and hence there must be a precise correspondence between bosonic and fermionic degrees of freedom (i.e. every boson must have a fermionic "superpartner" particle and vice versa). Finally, (d) in General Relativity, there is no such thing as a "shift by a constant vector in space" (since spacetime is not flat), so we rather have to consider locally varying shifts, i.e. diffeomorphisms. These are an infinite-dimensional symmetry, and indeed, the corresponding gauge field for the "supersymmetry square-roots of locally varying shifts" is the helicity-3/2 "gravitino". This also sheds light on why one cannot have helicity-3/2 states in a theory without also having helicity-2 gravitons.
12 the energy $E = q \cdot \Phi_E$, and there were some physical process that "destroys" charges, then measuring the energy required/released when destroying an electron would actually allow us to define an "absolute ground" potential, e.g. by postulating that the energy requirement for destroying an electron at $\Phi_E = 0$ is zero.
13 This realizes the Pauli Exclusion Principle in a rather intuitive way: if two electrons scatter from states A, B into states C, D, the quantum mechanical amplitudes for processes related by a cross-over carry an extra minus sign, which we get in the anticommuting-numbers formalism from swapping the order of the variables describing electrons. The overall amplitude $M_{AB\to CD} - M_{AB\to DC}$ hence becomes zero for $C = D$, i.e. it is not possible for two electrons to scatter into (hence: occupy) the same quantum state.
14 Note that for an analytic function $f$, we have $f(x+a) = f(x) + a f'(x) + \frac{1}{2} a^2 f''(x) + \ldots = (\exp(a\,\partial_x) f)(x)$.
15 The situation is more involved in models with more than the minimal amount of supersymmetry.
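The dimensional bookkeeping of points (b) and (c) is summarized by the minimal four-dimensional supersymmetry algebra in standard two-component notation, quoted here as a textbook fact for orientation:
$$[\phi] = \{\text{length}\}^{-1}, \quad [\psi] = \{\text{length}\}^{-3/2}, \quad [Q] = \{\text{length}\}^{-1/2}, \qquad \{Q_\alpha, \bar{Q}_{\dot\beta}\} = 2\, \sigma^\mu_{\alpha\dot\beta}\, P_\mu,$$
so the anticommutator of two fermionic symmetry generators indeed carries $\{\text{length}\}^{-1}$ and closes into the translation (momentum) operator $P_\mu$, "the square of the square root of a shift".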

Supergravity in eleven and four dimensions
A supersymmetry transformation changes the helicity of a particle by 1/2 and so - in a theory of gravity - it must connect the helicity-2 graviton with a helicity-3/2 fermionic particle, the "gravitino". It is possible to construct models with more than the minimal amount of supersymmetry that then fuse more helicity states [58] [59] [60], allowing one to start with a helicity-2 graviton, apply a supersymmetry transformation to step down to the helicity-3/2 gravitino, and use another independent supersymmetry transformation 16 to further step down to a helicity-1 photon-like particle that couples with gravitational strength (a "graviphoton" [61]). A maximally supersymmetric theory in four dimensions, as obtainable through dimensional reduction of 11-dimensional supergravity, has $\mathcal{N} = 8$ independent supersymmetries that connect all quantum states from the helicity +2 graviton down to the oppositely-polarized helicity −2 graviton, with (according to simple combinatorics) $\binom{8}{k}$ particles of helicity $2 - k/2$, so in total one graviton, eight gravitini, 28 photon- or gluon-like force carriers, 56 spin-1/2 fermions, and 70 Higgs-Boson-like scalars. It was the discovery of this $\mathcal{N} = 8$ supergravity, which manages to unify all particles and interactions starting from only a symmetry principle as input, that made S. Hawking suggest in his 1981 inaugural lecture [1] that within perhaps 20 years, we would know the "Theory of Everything".
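The multiplet combinatorics just quoted are easy to verify in a couple of lines (a standalone Python sanity check of the binomial counting, not part of the analysis code of [6] or [7]):

```python
from math import comb

# N=8 multiplet: helicities run from +2 down to -2 in half-steps,
# with binomial(8, k) states at helicity 2 - k/2, for k = 0..8.
multiplet = {2 - k / 2: comb(8, k) for k in range(9)}

assert multiplet[2.0] == 1    # one graviton
assert multiplet[1.5] == 8    # eight gravitini
assert multiplet[1.0] == 28   # photon-/gluon-like force carriers
assert multiplet[0.5] == 56   # spin-1/2 fermions
assert multiplet[0.0] == 70   # Higgs-Boson-like scalars
# Total: 128 bosonic plus 128 fermionic states.
assert sum(multiplet.values()) == 256
```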
A peculiar feature of this construction is that it interprets the 70 Higgs fields as parametrizing a very special 70-dimensional coset manifold. Just as the sphere can be regarded as the manifold of 3-dimensional rotations (read: orthogonal bases) modulo another rotation (around the outward-pointing direction), i.e. $S^2 = SO(3)/SO(2)$, and the hyperbolic plane 17 can be regarded as the coset space $SO(2,1)/SO(2) = SL(2)/SO(2)$, the relevant scalar manifold of N = 8 Supergravity is 18 $E_{7(7)}/SU(8)$, where $SU(8)$ is the group of unitary complex 8 × 8 matrices with unit determinant and the maximal compact subgroup of the 133-dimensional non-compact exceptional Lie group $E_{7(7)}$, which we describe in appendix A.
In Cremmer and Julia's original construction [49], which compactified 11-dimensional Supergravity on a 7-dimensional torus, there are 28 photon-like gauge boson particles. Soon after, it was realized that one can also obtain a four-dimensional theory with the maximal amount of supersymmetry by dimensionally reducing on the "round" 7-dimensional sphere 19 instead [65]. Here, one ends up with the 28 vector gauge bosons belonging to the non-abelian gauge symmetry SO(8), which may superficially be thought of as some sort of more complicated Quantum Chromodynamics (but with very differently behaving "quarks" and "gluons", and perhaps without confinement due to a vanishing β-function). Clearly, given that such non-abelian gauge symmetries do play an important role in the SM, this looks like a major step in the right direction, but unfortunately, the group SO(8) is too small to embed the $SU(3)_{\rm QCD} \times SU(2)_{\rm Weak} \times U(1)_Y$ gauge group of the SM into it. Also, the experimentally observed left-right asymmetry of the SM ("chirality") cannot be obtained by compactifying non-chiral 11-dimensional supergravity on a manifold [66].
The early literature on this topic contemplated scenarios in which the observed particles would emerge as composite, being made of more fundamental "preons", somewhat along the lines of how QCD at lower energies gives rise to baryon and meson bound states. Given that there are perhaps 17 orders of magnitude in energy between the quantum gravity scale and the Higgs boson energy scale, this may not be entirely implausible. Still, considering in particular the problems associated e.g. with chiral fermions, it is nowadays generally regarded as more promising to investigate scenarios in which the SM's gauge symmetry directly emerges from some large higher-dimensional symmetry. Going from 11-dimensional M-Theory to Superstring Theory first, and then down to four dimensions, there are by now multiple options to directly obtain a large "Grand Unification" gauge symmetry into which the SM gauge symmetry can be embedded.
The main focus of the current article, however, is not to provide more insights into how experimental particle physics might be related to M-Theory. Rather, we want to allow deeper investigations into the structure of M-Theory by both expanding our knowledge on what possible background solutions to its field equations can look like, and also by providing tools that allow one to come to grips with some of the technical complications that arise in particular when working with high-dimensional scalar manifolds. In the past, we have time and again seen the study of models of quantum gravity produce highly surprising and useful insights even if they were not at all focused on the four-dimensional world we inhabit. Most notably, there was the realization in 1997 [67] that the partition function (or generating functional) of a Conformal Field Theory (CFT) can be the same as the partition function of a supersymmetric theory of gravity with negative cosmological constant (hence in "anti de Sitter (AdS) space") in a different spacetime dimension (i.e. with an extra spatial direction), the so-called AdS/CFT correspondence [68] [69] [67]. One example of a rather surprising further development of this idea is the insight that Quantum Field Theory may provide a lower bound for the ratio of shear viscosity to entropy density of any liquid [70]. Another modern development that could not have been anticipated is the application of this idea of gauge/gravity duality to study superconductivity [71] [72] [73]. To give another example, "holographic duality" has been employed to map solution-generating symmetries of the Einstein equations [74] [75] to solution-generating symmetries for the Navier-Stokes equation [76] [77].
Concerning specifically AdS$_4$/CFT$_3$ duality, the holographic dual of the SO(8) supergravity studied here is (the $k = 1$ case of) three-dimensional ABJM theory [78], which describes the dynamics of M2-branes. This was used e.g. in [79] to construct new supersymmetric AdS$_4$ black holes and provide an explanation for their Bekenstein-Hawking entropy, exploiting the relation between mass-deformed ABJM theory with N = 2 supersymmetry and the AdS$_4$ vacuum with N = 2 $SU(3) \times U(1)$ symmetry, which here is called S0779422.

On Machine Learning
Artificial Intelligence (AI) is a broad field concerned with crafting algorithms for solving problems that require some form of human-like intelligence. To avoid any misconceptions, we clarify that the main concern of AI is not finding ways to allow algorithms to perform introspective reasoning on par with or exceeding human ability. Indeed, as famously noted by Alan Turing, the question of whether machines can think is ill-posed [80].
The earliest forms of AI consisted of explicit, manually crafted rules. Famously, Deep Blue - the first algorithm that defeated a human world champion, Garry Kasparov, at chess, in 1997 - operated using a set of fixed rules designed by its human creators, helping it determine the value of any possible board configuration [81]. Running on a dedicated machine with a massively parallel hardware system that allowed it to evaluate 2 to 2.5 million chess board configurations per second, Deep Blue would perform a vast search (simulation of outcomes) before each move and pick the one that would lead to the most advantageous result. However, this achievement was the result of high human expertise applied to a very narrow field with explicit rules. As scientists began to tackle complex problems requiring real-world data, such as recognizing objects or faces within pictures, it became clear that manually developing rules for solving such problems was very challenging or even humanly impossible.
Machine Learning (ML) introduced a new perspective on creating artificially intelligent algorithms. This field was pioneered by Arthur Samuel, who demonstrated that a computer program can learn to play the game of checkers better than the person who programmed it [82]. Instead of operating with pre-adjusted rules and fixed numeric values, the algorithm changes parts of itself in order to solve the problem. In other words, given a function of an input space that represents the problem data and an output space that represents the problem solution, the challenge becomes to learn the parameters of this function in such a way that its results (the output, or the solution of the algorithm) are as close as possible to the correct solution. Usually, the learned parameters are of numeric form. The field of ML is thus primarily concerned with the pragmatic problem of finding and efficiently refining functions that usually have a large number of adjustable ("learnable") parameters, with the purpose of solving challenging problems that often involve real-world data. ML methods are suitable whenever one faces a problem that is difficult to put into words or fixed rules.
In this sense, ML and physics can be regarded as intellectual antipodes: Physics tries to understand fundamental processes and important mechanisms underlying the functioning of a system, whereas ML tries to solve a particular problem as well as possible while eschewing the need to fully understand it. In fact, the implicit knowledge obtained by an ML algorithm by solving a problem is often difficult to analyze. Understanding how certain highly-successful ML algorithms manage to solve highly difficult problems, and visualizing various parts of the learned function in order to produce an intuitive understanding of the problem and the solution space, is an active field of research [83].
Example problems that have, sometimes surprisingly so, turned out to be amenable to ML approaches include text [84] or object [85] [86] recognition in images, mapping pictures to textual descriptions of their content [87], machine translation of natural language [88], scoring possible moves in the game of Go [89] and Starcraft [90], and many more. Increasingly, we also see ML methods being applied to problems that do not strictly follow this pattern, such as synthesis of realistic-looking portraits [91].
Concerning direct applications of ML to theoretical physics, it can and indeed has happened in the past that ML demonstrated an ability to predict a system's behavior well beyond what our current thinking would have considered possible, indicating the existence of extra structure that our current models cannot capture well. For example, [12] demonstrated a clever set-up that allows ML to accurately predict the behavior of a chaotic system over eight Lyapunov times.
One particularly successful family of ML algorithms is that of Artificial Neural Networks (ANNs). ANNs are loosely inspired by biological brains, which are made of billions of interconnected neurons working together to optimally control the behaviour of intelligent organisms. A simple model of a neuron, called a Perceptron, was proposed by Frank Rosenblatt in 1958 [92], but the idea of networks composed of multiple layers of Perceptrons only started becoming popular in the ML community in the 1980s [93]. ANNs consist of artificial "neurons", non-linear circuit elements that are interconnected through directed artificial "synapses" that transmit signals with different efficacies, which act like "weights" in directed graphs. The connectivity architecture of such a network is usually layered, the intuition being that each layer builds up more abstract concepts than the previous one. A fully connected feedforward network includes connections between every node of a layer and every node of the next layer, but other variations also exist, such as recurrent [94] or convolutional [95] layers.
Such "layered" ANNs are popular as they are known to be universal approximators [96] and have been found to work well for many problems, but it is by no means true that ML is tied to this particular class of architectures. As long as there is a way to model a problem in terms of a function that differentiably depends on many parameters, and parameter-tuning can substantially improve performance, ML techniques are applicable. Here we use ANNs as an example of a successful ML model in order to introduce ideas that are in general applicable to any such model.
In an ANN, a neuron is represented by a non-linear function acting on a weighted sum of the outputs of neurons feeding into it.
In the simplest case, the architecture of a network that seeks to approximate a function $f : \mathbb{R}^m \to \mathbb{R}^n$ includes an input layer containing m neurons, an output layer containing n neurons, and any number of hidden layers in between. To solve one instance of a problem using an ANN, the inputs (m numeric values, such as the pixel-wise brightness values of an image) are projected onto the m neurons of the first layer of the network, which then activates the whole network layer by layer, as explained above. The values of the neurons in the final layer then provide the output of the network for that particular set of inputs (such as the individual probabilities that the image contains each of n predefined shapes). The trainable parameters of an ANN are the synaptic weights, whose values determine the output of the network in response to any input.
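As a concrete illustration of this layer-by-layer activation, here is a minimal sketch of a two-layer fully connected network in plain NumPy (the layer sizes, random weights, and use of the ReLU non-linearity discussed below are our own illustrative choices):

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)  # ReLU(x) = (x + |x|)/2

def forward(x, params):
    """Activate a fully connected network layer by layer.

    x: batch of inputs, shape (batch, m).
    params: list of (weights, biases) pairs, one per layer.
    """
    for i, (W, b) in enumerate(params):
        x = x @ W + b              # weighted sum of inputs to each neuron
        if i < len(params) - 1:
            x = relu(x)            # non-linearity on all hidden layers
    return x

rng = np.random.default_rng(0)
m, hidden, n = 4, 16, 3            # input, hidden, output layer sizes
params = [(rng.normal(size=(m, hidden)), np.zeros(hidden)),
          (rng.normal(size=(hidden, n)), np.zeros(n))]
y = forward(rng.normal(size=(8, m)), params)   # batch of 8 examples
print(y.shape)                     # (8, 3): n outputs per example
```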
Deep learning, which has recently achieved resounding success in solving difficult real-world problems like the ones mentioned above, refers to ANNs with a large number of stacked layers especially designed to apply specialized operations to the input. It was not trivial to observe that such networks can work at all - until Hinton's seminal 2006 article [97], which sparked the deep learning revolution, common thinking was that networks with more than two layers were essentially impossible to train, and other ML approaches, such as kernelized support vector machines [98], would generally perform better than ANNs. Later progress uncovered a number of general misconceptions and useful tricks on how to train ANNs, for example the superior performance of the "Rectified Linear Unit" (ReLU) activation function [99] in comparison to the classical sigmoid non-linear activation function used in earlier research. Moreover, the development of better and specialized hardware offering more computing power and massively parallelized operations has considerably facilitated the training process of deep networks. In particular, the computation of activations and error backpropagation from one network layer to another can be expressed as matrix multiplication, which can then easily be optimized by specialized hardware allowing massive parallelization. Overall, the combined theoretical developments and availability of specialized computing resources have led to outstanding results of deep learning on many real-world problems involving vast amounts of unstructured data.
What type of problems is ML applicable to? Depending on the amount and type of available data, there are three main paradigms for training an ML model: supervised, unsupervised and reinforcement learning. Supervised learning refers to situations where the expected result is known in advance for the available data; for example, given a large set of images of people, the name of the person appearing in each image is also given. This type of learning is often used for classification ("given m labels, pick the correct one") and regression ("predict a value in a continuous domain"), but can often be adapted for other types of problems, for example in assessing the value of each possible next action in a game [89]. Supervised learning with ANNs is currently the most widely-used and successful approach to ML. In contrast, unsupervised learning occurs when no labels exist for the given data; in this case, the aim is to group the data in such a way that items similar to each other belong to the same group, or are close to each other in the output domain [100]. Examples where this approach is useful are fetching web pages, songs or videos similar to the one that an internet user is currently viewing or listening to. Nonetheless, such problems can also be expressed as supervised problems [101]. Finally, reinforcement learning is applied when no exact labels exist, but there is some knowledge on whether a proposed output is good or bad. This is applicable in particular to automated game playing, where the algorithm acts as an agent that chooses to perform a sequence of actions with the aim of winning the game; by playing multiple games, it slowly learns to pick better actions based on whether the past games resulted in wins or losses. Reinforcement learning can also be combined with supervised learning: given a very large number of possible actions, a supervised deep ANN can approximate the value of each possible action [102].
So how does learning in an ML model actually work? A key idea is that learning involves the minimization of a loss (or error) function. This function is designed such that, when applied to any given output, it provides a numerical measure of how far off this output is from an expected answer. In supervised learning, this can be thought of as a distance between the actual output and the target (desired) output of the algorithm. If the value of the loss function is smaller, the error is smaller, and the algorithm is closer to the desired output. Thus, instead of deeming an output as either correct or incorrect, the loss function provides a graded measure of "wrongness". The output of the network can thus often be interpreted as a probability. Crucially, if the loss function is differentiable with respect to the algorithm parameters (for example, in the case of an ANN, the network weights), the gradient of the loss function can be used to point towards the direction of the minimum of this function. The gradient of the loss function (which usually is estimated on a random selection of examples, see below) can be used to iteratively tune parameters in order to improve the performance of predictions. For many problems, there are natural choices of loss functions. For classification problems with n possible labels (e.g. "which digit does the image show" with n = 10), the predicted probabilities can be regarded as dual to chemical potentials, represented as (linearly) accumulated evidence $E_j$ for or against a particular classification label, i.e. $p_j = \exp(-E_j)/\sum_{i=1}^{n} \exp(-E_i)$ [103]. However, any type of loss function can be employed as long as it indicates the correct solution to the given problem, is differentiable with respect to the learnable parameters, and has a reasonable shape - for example, not too many local minima.
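The evidence-to-probability mapping above (a "softmax"), together with the cross-entropy loss commonly paired with it, can be sketched in a few lines of NumPy (the stabilizing shift by the minimal $E_i$ is our own addition and does not change the result):

```python
import numpy as np

def softmax_from_evidence(E):
    """p_j = exp(-E_j) / sum_i exp(-E_i), with E shifted for numerical stability."""
    w = np.exp(-(E - E.min()))    # shifting all E_i by a constant cancels out
    return w / w.sum()

E = np.array([1.0, 2.0, 3.0])     # accumulated evidence against each label
p = softmax_from_evidence(E)      # lower "energy" means higher probability
loss = -np.log(p[0])              # cross-entropy loss if label 0 is correct
print(p, loss)
```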
One of the surprising insights of the Deep Learning revolution was that a simple activation function with discontinuous derivative that reduces to the identity in the activation region and to the null function outside that region, i.e. $\mathrm{ReLU}(x) := (x + |x|)/2$, allows training deeply nested transformations to extract high-level information such as whether there is a face in an image. In some situations, finding good loss functions to represent important aspects of a problem is less straightforward, and may need some experimenting. In this work, for example, we show how the desire to have unbroken vacuum supersymmetry can be represented concisely through an ML loss function.
A key idea that made ANN-based learning possible is that, given a computer program that computes an $\mathbb{R}^n \to \mathbb{R}$ function $f$, it is possible to automatically transform it into another program that computes the $n$-component gradient $\nabla f$ at any given point with relatively small computational effort that is independent of $n$. The general theory of reverse-mode automatic differentiation of algorithms [104] can be summarized as performing a gradient computation $\nabla f(x_0)$ by first running the computation of $f(x_0)$ in such a way that all intermediate quantities, and also all branching decisions, are remembered, and in addition reserving one extra floating point accumulator-value word $\bar q_j$ per intermediate quantity $q_j$ that gets initialized to zero. At the end of the computation, these accumulators are filled in reverse order: the sensitivity of the output to each intermediate quantity is propagated back through the remembered computation, so that in the end the accumulators belonging to the inputs hold the components of the gradient. Gradient descent then iteratively updates the parameters in the direction of decreasing loss; in local minima of the loss function, parameters might become stuck. The efficiency of this basic training procedure can often be greatly improved by not using the complete set of examples to compute the gradient of the loss function, but instead by calculating the gradient on smaller randomized subsets ("batches") and applying updates to the parameters more often. This is called "stochastic gradient descent".
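The accumulator scheme just described can be illustrated with a toy tape-based implementation (a minimal sketch in plain Python, not TensorFlow's actual machinery; only addition and multiplication are supported):

```python
class Var:
    """A value plus an adjoint accumulator (the extra word per quantity)."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs (parent Var, local derivative)
        self.adjoint = 0.0       # initialized to zero

    def __add__(self, other):
        return Var(self.value + other.value, [(self, 1.0), (other, 1.0)])

    def __mul__(self, other):
        return Var(self.value * other.value,
                   [(self, other.value), (other, self.value)])

def grad(output, inputs):
    """Reverse sweep: push the output's sensitivity back through the tape."""
    # Order the tape so each quantity is processed only after everything
    # downstream of it (reverse topological order).
    order, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)
    visit(output)
    output.adjoint = 1.0
    for q in reversed(order):
        for parent, local_derivative in q.parents:
            parent.adjoint += q.adjoint * local_derivative
    return [x.adjoint for x in inputs]

# f(x, y) = x*y + x  =>  df/dx = y + 1, df/dy = x
x, y = Var(3.0), Var(4.0)
f = x * y + x
print(grad(f, [x, y]))   # prints [5.0, 3.0]
```

Note that the cost of the reverse sweep is proportional to the length of the recorded tape, not to the number of inputs, which is what makes gradients of many-parameter loss functions cheap.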
During the training procedure, the value of the loss function itself never actually needs to be computed: all that is used are its gradients with respect to the trainable model parameters 20 . Nevertheless, the value of the loss function is typically monitored for feedback during the training process: if everything goes well, it should keep decreasing as more updates are applied to the parameters, and eventually converge to a minimum.

Tensors in Machine Learning
To give a rough mental picture of what the training process might look like at the level of number-crunching, we give an example problem where the goal is to predict whether or not a person appears in a given image. We assume that a labeled dataset on the order of a million images is available for training. An image can be represented as an $i \times j$ (row and column pixel coordinates) array $X$ with $c$ color channels (often $c = 3$, with values for red, green and blue). The label for each image is the number 1 if there is a person in the image, and the number 0 if not. One would typically start by grouping example images into sufficiently large batches to get reasonable estimates for loss function gradients with respect to model parameters, perhaps $b = 1024$ images per batch. A batch of training images would then be naturally represented as a $b$-element array of pairs $(X_b, Y_b)$. It has become fashionable to call these higher-rank arrays "tensors" in ML terminology, which indeed is a useful notion for expressing the ANN operations in terms of tensor products and index-splitting operations. However, symmetry groups to this date play a rather minor role in ML (with notable exceptions such as [109]), and if they actually do, one often talks about "equivariant neural networks" to discriminate these from networks with less structure. For a problem such as recognizing whether a picture contains a person, which evidently has symmetry, the common approach is to factor out translational symmetry by effectively imposing constraints on network parameters relevant for detecting the target entity, or elements of it, at different locations in the image. This is usually done by a "convolutional layer" that computes convolutions $C[b, i, j, k] = \sum_{\eta,\xi,c} X[b, i + \eta, j + \xi, c]\, S[k, \eta, \xi, c]$ of the example images with a collection $S[\cdot,\cdot,\cdot,\cdot]$ of small stencils represented as an array of trainable parameters.
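Spelled out in plain NumPy (with illustrative shapes of our own choosing; real frameworks provide heavily optimized versions of this operation), such a convolution reads:

```python
import numpy as np

def conv(X, S):
    """C[b,i,j,k] = sum over eta, xi, c of X[b,i+eta,j+xi,c] * S[k,eta,xi,c]."""
    b, H, W, _ = X.shape
    k, h, w, _ = S.shape
    C = np.zeros((b, H - h + 1, W - w + 1, k))
    for eta in range(h):
        for xi in range(w):
            # Each stencil offset contributes a batched matrix product
            # over the color-channel index c.
            C += np.einsum('bijc,kc->bijk',
                           X[:, eta:eta + C.shape[1], xi:xi + C.shape[2], :],
                           S[:, eta, xi, :])
    return C

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 8, 8, 3))   # batch of 2 RGB images, 8x8 pixels
S = rng.normal(size=(4, 3, 3, 3))   # 4 trainable 3x3 stencils
print(conv(X, S).shape)             # (2, 6, 6, 4)
```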
The stencil parameters then get adjusted in the training phase such that they are optimally useful for coming up with a good probability prediction for the image to show a person in any location. Each such stencil will consist of lines that describe typical features associated with a person in the image, such as noses, eyes or ears. Intuitively, one could imagine one of the stencils getting tuned by training to have large inner product with "the average shape of all noses", so getting specialized to a nose-detector. Each such stencil would, in turn, be made of lower-level stencils, such as lines with particular orientations, which, in the right combination, form salient shapes. Subsequent processing layers in the network would then collect and combine different such evidence and in the end produce a Bayesian prediction roughly along the lines of: "We are highly confident to have seen a nose in the picture, and we also have moderate confidence to have seen an eye somewhere, so, with high likelihood, the image shows a person". Krizhevsky's seminal paper [85] explains in detail one such convolutional ANN architecture for image processing. Recent work on feature visualization in ANNs has spectacularly uncovered collections of shapes and patterns that hidden layers in a network learn to recognize [83] -an often fascinating visual treat.
The above example illustrates why numerical higher-rank arrays are so prevalent in modern ML. As hinted earlier in this section, one very common primitive "tensor" operation in such a setting is batched matrix multiplication. For example, linear conversion of a set of example images from RGB color space to some implicit color space (that can be trained to be optimally useful for solving the problem codified by the loss function) can be expressed as $\tilde X[b, i, j, k] = \sum_c X[b, i, j, c]\, M[c, k]$, with trainable parameters in the matrix $M$.
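In einsum notation, such a batched color-space conversion is a one-liner; the matrix name M follows the text, while the shapes and random data are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 32, 32, 3))  # batch of 1024 RGB images
M = rng.normal(size=(3, 3))             # trainable 3x3 color-mixing matrix

# For every pixel of every image, mix the c = 3 input channels into
# k = 3 "implicit" channels: Y[b,i,j,k] = sum_c X[b,i,j,c] * M[c,k].
Y = np.einsum('bijc,ck->bijk', X, M)
print(Y.shape)   # (1024, 32, 32, 3)
```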
Nowadays, setting up and training an ML model, such as a multilayer ANN, is a relatively easy task. A number of modern open-source ML frameworks, such as TensorFlow [2], are freely available. These provide many standard building blocks for ML models that have been validated in practice. For example, one can easily plug a face recognition module, "Long Short-Term Memory" (LSTM) units for sequence-to-sequence learning, or a generic trainable one-out-of-many classifier into a more complex network. The main work in the design stage is not explicitly programming the model, but rather deciding how to combine existing blocks into a suitable model architecture, such as the number of hidden layers, their connectivity patterns, and the activation functions that link units belonging to different layers. In addition, an eminently useful feature of such frameworks is automatic differentiation, which makes use of the fact that most functions employed in ML can be expressed as compositions of simpler functions, whose derivatives are easy to compute. By creating loss and activation functions out of the simple, differentiable pre-defined functions, one eschews the need to explicitly compute any of the derivatives in order to minimize the loss function, as the software will do it automatically. The framework also provides optimization algorithms for minimizing loss functions. Moreover, it also provides lower-level hardware abstraction, allowing one to run a given computation on multicore CPU architectures, Graphics Processing Units (GPUs), or, since recently, also specialized "Tensor Processing Units" [110] - hardware architectures specially designed for ML numerics, allowing extremely efficient operations like batched matrix multiplication. While the learning curve might initially be steep for a framework like TensorFlow, mastering it makes it very easy to solve many difficult optimization problems of any kind - including problems that arise from physics.
To conclude, ML has recently been greatly successful at solving difficult, real-world problems involving large amounts of high-dimensional, unstructured data. The explanations presented here somewhat reflect the popularity and dominance of ANNs, but the basic ideas can be applied to any other ML model. While efforts towards improving the design and even the human understanding of ML algorithms are still ongoing, enough progress has been made that such algorithms can be easily designed using existing tools and documentation. In particular, in this work, we demonstrate how TensorFlow, a popular and openly-available ML framework, can be of great use to aid discoveries in physics.

D = 4 SO(8) supergravity and its scalar sector
Let us briefly review some salient features of supergravity in four and eleven dimensions before we look into finding equilibrium solutions to the equations of motion.
Four-dimensional supersymmetry can at most unify all particle states from the helicity +2 down to the helicity −2 graviton. As there are eight helicity-1/2 steps between these helicities, we can have at most eight times the minimal amount of supersymmetry, and as each of these eight supersymmetry transformations comes as a real (Majorana) four-dimensional spinor, we are looking at a theory with 8 · 4 = 32 supersymmetry components 21 . The highest spacetime dimension in which we can have a real 32-component spinor is D = 11 (or perhaps D = 12 if we accepted a second direction of time [111]). A supersymmetric theory of gravity in D = 11 dimensions will have $(D - 2)(D - 1)/2 - 1 = 44$ transversal graviton polarization states (described by a symmetric traceless 9 × 9 matrix), plus 128 gravitino degrees of freedom 22 . The mismatch in the number of bosonic and fermionic degrees of freedom is compensated by a gauge field with 84 degrees of freedom, describing a higher-dimensional cousin of the photon whose polarization is given not by a 1-dimensional vector, but by a 3-dimensional volume(-form) embedded into 9-dimensional transversal space 23 , $A_{MNP}$, with associated (4-form) field strength $F_{MNPQ}$. With the "polarization" being a 3-dimensional object, this (abelian) gauge field cannot be sourced by charged particles (the 1-dimensional photon polarization couples to the 1-dimensional worldline of an electron), but by some membrane-like extended object that lives in eleven dimensions. This is now understood to be the M2-brane [112]. It is amazing to see how, starting from one of the three possible gauge principles, the vector-spinor, in its very own preferred (maximal) dimension, one automatically obtains a theory that unifies all three of the possible gauge principles, and which furthermore turns out to be completely fixed, i.e. to not permit any free parameters.
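The counting of transversal degrees of freedom can be checked with a few lines of arithmetic (plain Python; variable names are ours):

```python
from math import comb

D = 11
transversal = D - 2                # dimension of the transversal space: 9

# Symmetric traceless 9x9 matrix: graviton polarization states.
graviton = transversal * (transversal + 1) // 2 - 1
# Antisymmetric 3-form A_MNP in 9 transversal dimensions.
three_form = comb(transversal, 3)

print(graviton)                # 44
print(three_form)              # 84
print(graviton + three_form)   # 128 bosonic = 128 gravitino dof
```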
The Lagrangian 24 of 11-dimensional Supergravity reads, schematically, [113] [114]
$\mathcal{L} \sim \sqrt{-g}\,\big(R - \tfrac{1}{48}\,F_{MNPQ}F^{MNPQ}\big) + (F \wedge F \wedge A\ \text{Chern-Simons term}) + (\text{fermionic terms}).$
A Freund-Rubin flux $F_{\mu\nu\rho\sigma} \propto \epsilon_{\mu\nu\rho\sigma}$, i.e. flux aligned with the submanifold of four-dimensional spacetime, will look isotropic from the four-dimensional perspective. Kaluza-Klein compactification to four spacetime dimensions on a 7-sphere that is the surface of an 8-ball gives the Lagrangian of the de Wit-Nicolai model [65] [3]. For our investigations, we are mostly concerned with the scalar sector of this "SO(8) supergravity". Naturally, polarized fields in 11 dimensions give rise to different types of fields in four dimensions, depending on how the 11-dimensional polarization is oriented with respect to the split into a seven-dimensional compact manifold and four-dimensional space-time, just like in original Kaluza-Klein theory, where the five-dimensional metric gives rise to the four-dimensional metric (gravitons), vector potential (photons), and scalar field (Higgs boson). Maximal supersymmetry fixes the particle content completely, and so Cremmer and Julia's construction of ungauged four-dimensional maximal supergravity, which compactifies on a 7-torus and drops higher Kaluza-Klein modes, must give rise to the same particle content as compactification on the surface of an 8-ball (retaining only massless modes). The rather nontrivial input here is that both constructions actually do lead to maximally supersymmetric models. In Cremmer and Julia's construction, one gets 35 Higgs fields from the 11-dimensional "$A_{MNP}$-photons" for which the 3-dimensional polarization is parallel to the direction of the 7-dimensional compactification manifold. Since reversing the handedness of three-dimensional space can be expressed as an 11-dimensional rotation that also reverses the handedness of the 7-dimensional compactification manifold, which is experienced by a 3-form field as a sign reversal, these $7 \cdot 6 \cdot 5/3! = 35$ scalar fields are pseudo-scalars, i.e. odd under a parity transformation.
Correspondingly, we get 7 · 8/2 = 28 scalars from those polarization states of the 11-dimensional graviton $g_{MN}$ that are parallel to the compactification manifold. However, we also get seven four-dimensional 2-form potentials $A_{\mu\nu P}$ for which only one of the three 11-dimensional $A_{MNP}$ polarization directions is parallel to the compactification manifold. These give rise to four-dimensional 3-form field strengths $\sim F_{\mu\nu\rho}$, which can be dualized to 1-form field strengths $G_\lambda \sim g_{\lambda\sigma}\,\epsilon^{\sigma\mu\nu\rho}F_{\mu\nu\rho}$, which in turn come from scalar potentials, $G_\lambda = \partial_\lambda A_G$. So, dualization [116] [117] of these 2-forms produces another seven scalar fields which, like the 28 from the graviton, are parity-even, so (proper) scalars. One finds that these indeed combine into one irreducible representation of eight-dimensional rotations, so, in this compactification, we have 35 scalars ($35_+$) as well as 35 pseudoscalars ($35_-$) from the (lowest-energy Kaluza-Klein ("Fourier") modes of the) 11-dimensional degrees of freedom. One indeed finds that these 35 + 35 scalar fields can be understood as parametrizing the coset space $E_{7(7)}/(SU(8)/\mathbb{Z}_2)$.
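That these scalars add up as stated - and that both the parity-even and parity-odd sets have the size of a symmetric traceless 8 × 8 matrix, anticipating the SO(8) language used below - can again be checked directly (plain Python):

```python
from math import comb

graviton_scalars = 7 * 8 // 2        # internal metric components: 28
dualized_scalars = 7                 # from the seven 2-form potentials
pseudoscalars = comb(7, 3)           # internal A_MNP polarizations: 35

assert graviton_scalars + dualized_scalars == 35     # the parity-even 35
assert pseudoscalars == 35                           # the parity-odd 35
assert 8 * 9 // 2 - 1 == 35          # symmetric traceless 8x8 matrix
print(graviton_scalars + dualized_scalars + pseudoscalars)  # 70 scalars
```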
Subtly, despite ungauged maximal supergravity and SO(8) supergravity having equivalent particle spectra, and the latter also having a smooth limit in which the gauge coupling constant is taken to zero 25 , one cannot straightforwardly identify the 70 scalars of one construction with the 70 scalars of the other [26]. Rather, when compactifying on a 7-sphere, one has to work out fluctuations around a compactification background geometry, as explained e.g. in [118], [119], [120]. In the latter case, there is a straightforward way for the rotational symmetry of the 8-ball to act on these fluctuations, so all four-dimensional fields should form SO(8) irreducible representations. In the former case, the seven two-forms which one gets from $A_{MNP}$ clearly do not form an irreducible representation of SO(8), so the symmetry enlargement is an emergent phenomenon.
In general, determining the low-energy field content of Kaluza-Klein type compactifications of M-theory will require carefully analyzing the spectra of generalized Laplace operators which act not on scalar- but on tensor-valued fields (see e.g. [114], esp. chapters 4, 5, 9), whose eigenfunctions can be thought of as generalized spherical harmonics that live on the compactification manifold rather than the surface of a 3-dimensional ball (as the ordinary spherical harmonics do). On other compactification manifolds, the low-energy particle content of the theory may be rather different; it may even contain more Higgs-like fields than this SO(8) supergravity, as in the construction discussed in [121], which in total has 67 + 67 = 134.
The algebra so(8) of eight-dimensional rotations is very special in that it admits an $S_3$ group of outer algebra automorphisms which permute the roles of the three different real eight-dimensional irreducible representations: the vectors, spinors, and co-spinors. Due to this phenomenon of "triality", we have a choice in how to attach the "vector", "spinor", and "co-spinor" labels to the different eight-dimensional representations. While it is physically reasonable and common in the literature to associate the $35_+$ with the symmetric traceless matrices over the so(8) vectors $35_v$ (given that they contain the graviton polarization states), we deviate from this convention in the present work and instead associate the scalars with the symmetric traceless matrices over the spinors, $35_+ \equiv 35_s$, while associating (now in alignment with the literature) the pseudoscalars with the symmetric traceless matrices over the co-spinors, $35_- \equiv 35_c$. The advantage of this approach is that it aligns the defining 8-representation of the maximal compact subalgebra su(8) of the $e_{7(7)}$ algebra with the so(8) vector representation, as well as the $35_{s,c}$ with the self-dual/anti-self-dual four-forms of su(8), i.e. we can use the geometric so(8) invariants $\gamma^{ijkl}_{\alpha\beta}$, $\gamma^{ijkl}_{\dot\alpha\dot\beta}$ to translate between (anti-)self-dual and symmetric-traceless-matrix language. We heavily rely on this property to give simple expressions for the locations of all critical points.
While it would be tempting to give the full general $\mathcal{N}=8$ Lagrangian in the general unifying form presented in [122], which uses the gauge group embedding tensor framework to also include some alternative constructions in which the gauge group is non-compact, such as a different real form $SO(p, 8-p)$ of $SO(8)$, or a contraction thereof [124] [123], or a "dyonic" variant [125], it is more straightforward for this work to instead refer to the "classical" de Wit-Nicolai Lagrangian in order to explain the physical role of some key objects for which this work provides extensive data.
The Lagrangian of SO(8) supergravity is given in [65]. In this Lagrangian, $O^{\pm\,KL}_{\mu\nu}$ is a bilinear function of the fermionic fields $\psi$ and $\chi$, $S^{IJ,KL}$ is a function of the Higgs fields, $u_{ij}{}^{IJ}$ and $v_{ijKL}$ are pieces of the $E_7$ "vielbein" in the $56\times 56$ representation that describes a point on the Higgs scalar manifold, while the $A^{ijk}_\mu$ are Higgs-scalar kinetic velocities. For details cf. [65], [126], [122].
This Lagrangian is a consistent truncation of 11-dimensional supergravity [120], i.e. the Kaluza-Klein modes retained here do not source higher modes, and so any solution of the four-dimensional field equations can be uplifted to an exact (non-linear) solution of the equations of motion of 11-dimensional supergravity. This is a "miraculous" property of the $S^7$ compactification for which the $F\wedge F\wedge A$ term in the Lagrangian plays an essential role. The gauge coupling constant $g$ here is proportional to the inverse radius of the compactification manifold $S^7$ which, in Kaluza-Klein supergravity, is not determined.

The scalar potential
In this work, we are mainly concerned with the $\propto g$ and $\propto g^2$ terms in the Lagrangian. At order $g^1$, we see Yukawa couplings that provide the (naive) gravitino and spin-1/2 fermion mass terms via their coupling to the Higgs-like scalars, $\sim g A_1\,\bar\psi\sigma\psi$, $\sim g A_3\,\bar\chi\sigma\chi$, and $\sim A_2\,\bar\psi\gamma\chi$. Here, the "spin-1/2 fermion mass matrix" $A_3{}^{ijk}{}_{lmn}$ is given in terms of the gravitino-fermion Yukawa matrix $A_2$. At order $g^2$, we have the scalar potential

$V = -g^2\left(\tfrac{3}{4}\,\big|A_1{}^{ij}\big|^2 - \tfrac{1}{24}\,\big|A_{2\,i}{}^{jkl}\big|^2\right).$

Since we are restricting ourselves in this work to the single case of the compact gauge group SO(8) of the original de Wit-Nicolai model [3], we can ignore a number of subtle aspects of electric/magnetic duality in four-dimensional supergravity that become relevant when trying to generalize our investigations to other gaugings in four dimensions; for details see [126], [127], [125]. The problem at hand then consists of finding critical points of the scalar potential $V(\phi_0, \ldots, \phi_{69})$, parametrized by 70 scalar coefficients of non-compact generators of the $e_{7(7)}$ algebra. In detail, the computation of the potential uses the notational conventions of [128], apart from index-counting always starting at 0 in this work, in order to make the correspondence between tensor arithmetic and the numerical code published alongside it even more straightforward.
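As a quick sanity check on conventions, the potential in terms of the fermion mass tensors fits in a few lines. The following NumPy sketch assumes $A_1$ and $A_2$ have already been computed from the vielbein (we do not reproduce the full $E_{7(7)}$ machinery here); the normalization is the standard $V=-g^2(\tfrac34|A_1|^2-\tfrac1{24}|A_2|^2)$:

```python
import numpy as np

def potential(A1, A2, g=1.0):
    """Scalar potential V = -g^2 (3/4 |A1|^2 - 1/24 |A2|^2).

    A1: complex (8, 8) gravitino mass tensor;
    A2: complex (8, 8, 8, 8) gravitino-fermion Yukawa tensor.
    """
    return -g**2 * (0.75 * np.sum(np.abs(A1)**2)
                    - np.sum(np.abs(A2)**2) / 24.0)

# At the origin of the scalar manifold, A1 is the identity and A2 = 0,
# which reproduces the maximally supersymmetric SO(8) point, V = -6 g^2.
print(potential(np.eye(8), np.zeros((8, 8, 8, 8))))  # -> -6.0
```

The origin check is a useful unit test whenever conventions for the $A$-tensors are changed.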
Here, we are using the auxiliary function $Z$ to translate integer indices for the adjoint representation of so(8) to ordered pairs of indices in the defining representation, with index-counting starting at zero: $Z(i\cdot 8 + j - (i+1)(i+2)/2) = (i, j)$, i.e. $Z(0) = (0, 1)$, $Z(1) = (0, 2)$, ..., $Z(27) = (6, 7)$. (2.6) The "input data" are the 70 $\phi_n$ coefficients of non-compact $e_{7(7)}$ generators $g^{(n)}$. Even though in this work we only use the non-compact and so(8) generators of $e_{7(7)}$, we give a complete construction of all 133 of the $56\times 56$ generator matrices in appendix A, mostly to ensure that all subsequent investigations into alternative gaugings can use the same definitions.
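The index translation (2.6) can be spelled out directly in code; this small brute-force helper (a sketch, not taken from the published code) is handy for cross-checking adjoint-index conventions:

```python
def Z(a):
    """Map an adjoint so(8) index a in 0..27 to the ordered pair (i, j),
    0 <= i < j < 8, inverting a = i*8 + j - (i+1)*(i+2)//2."""
    for i in range(8):
        for j in range(i + 1, 8):
            if i * 8 + j - (i + 1) * (i + 2) // 2 == a:
                return (i, j)
    raise ValueError('adjoint index out of range: %r' % a)

print(Z(0), Z(1), Z(27))  # -> (0, 1) (0, 2) (6, 7)
```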

Equilibria of the equations of motion
When looking for viable 11-dimensional field configurations of supergravity that correspond to vacua of a four-dimensional theory, one asks for solutions to the dynamical equations of motion in which, from the four-dimensional perspective, all directional quantities are zero (since a "vacuum" should not have a preferred spatial direction). So, we can set all four-dimensional gauge boson field strengths to zero, i.e. we are here not interested in "electrovacuum" [129] type solutions. Also, in this analysis, we set all fermionic (spin-1/2 matter and spin-3/2 gravitino) fields to zero; we do not consider fermion condensates here. This leaves us with the need to pick a ground state on the 70-dimensional manifold parameterized by the Higgs-boson-like scalars of the theory. Conceptually, one would want to look for minima of the scalar potential, but the actual story is slightly more involved here [130].

Vacuum stability
While the equations of motion for the scalar fields (and fields coupling to them) require the gradient of the potential to vanish in a vacuum configuration, it turns out that viable vacuum states correspond not just to minima, but also to some saddle points (and even a maximum at the origin!) of the potential. This is due to the value of the scalar potential playing the role of a cosmological constant in these models. So, for a negative cosmological constant, our vacuum will have the geometry of a space of constant negative curvature, an Anti-de Sitter (AdS) space. When studying stability with respect to small localized scalar field perturbations of finite total energy, one has to take into account that the spatial variation of such a perturbation cannot be made arbitrarily small in an AdS background geometry. So, if a localized perturbation of a spatially constant background scalar field at a saddle point (or maximum) can decrease potential energy, the spatial gradient will lead to an increase in kinetic energy that cannot be made arbitrarily small. One finds that, overall, one can have (perturbative) stability even at a non-minimum critical point (i.e. $\nabla V(\phi_0) = 0$) as long as there is no direction $\delta\phi$ for which the 2nd derivative of the scalar potential (in a parametrization that gives a "conventionally normalized" kinetic term $\mathcal{L}_{\rm kin} = \frac{1}{2}(\partial\,\delta\phi)^2$) is smaller than a threshold known as the Breitenlohner-Freedman (BF) bound [131]:

$m^2 \ge \tfrac{3}{4}\,V(\phi_0) \qquad (2.7)$

[120]. Loosely speaking, "masslessness" does not correspond to zero eigenvalues of the mass matrix in the curved AdS background. For a representation-theoretic perspective and explanation, cf. [127].
In fact, it turns out that for standard SO(8) supergravity (and many other Kaluza-Klein models), the potential does not seem to have any minima at all, but there are saddle points that give rise to AdS backgrounds in which this bound is satisfied. In particular, any background geometry with some residual supersymmetry will be stable and not violate this bound. To date, there is only a single known critical point of the scalar potential of SO(8) supergravity that corresponds to a stable non-supersymmetric AdS background [132] [133] [130]. While even this detailed investigation, which presents many more critical points, did not manage to reveal any other stable non-supersymmetric solutions, and there are good reasons to believe that they are indeed rare [134], there are indications that the method used here to search for solutions tends to (unfortunately) somewhat avoid parameter-space regions that do correspond to stable critical points. This is, after all, how the new $\mathcal{N} = 1$ SO(3) vacuum escaped discovery in earlier investigations. So, the authors consider it possible (but unlikely) that there still are other such solutions that hide very well.

Finding Solutions
Historically, the most powerful approach to find critical points of supergravity potentials before a more effective strategy was presented in [135] was to introduce "Euler angle style" coordinate parameterizations of interesting submanifolds of the scalar manifold that have been selected according to group-theoretical considerations in such a way that critical points on the submanifold also will be critical points on the full manifold. While a full coordinate parameterization of $E_{7(7)}/(SU(8)/\mathbb{Z}_2)$ is easily seen to be well outside computational reach, it is indeed feasible to consider the subgroup SU(3) of SO(8) in an $SU(3)\subset SU(4)\subset U(4)\subset SO(8)$ embedding and parameterize the six-dimensional manifold of SU(3)-invariant scalars. When Taylor expanding the full 70-dimensional potential around a point that is a critical point on such a subgroup-invariant submanifold, the linear term has to vanish, as the gradient then also decomposes into irreducible representations of the selected subgroup, but cannot carry any contributions that are not invariant under the chosen subgroup (since each term in the Taylor expansion is). This strategy was used in [136] to find all 27 critical points with residual symmetry at least SU(3), and led to the general belief that going substantially beyond this analysis by picking a smaller subgroup of SO(8) would be possible in principle, but technically very much infeasible, with perhaps only a few possible exceptions. This is due to the combinatorial explosion in algebraic complexity of explicit forms of coordinate-parametrized potentials as the number of coordinates increases. Now that we know many critical points that have very little or even no continuous unbroken gauge symmetry at all, hindsight tells us that insisting on a fully analytic approach to solving a "discovery"-type problem limited our view.
While an analytic approach easily becomes extremely complicated, all that complexity is eliminated by instead working with numerically evaluated quantities, and focusing on the use of backpropagation rather than analytic expressions in order to obtain gradients. Once one has good numerical data, one can start looking for corresponding exact expressions.
Critical points of the scalar potential correspond to (true or false) vacuum solutions, i.e. field configurations for which all directed quantities vanish, and the scalar fields do not experience any acceleration. While false vacua are unstable with respect to some small localized fluctuations that violate the BF bound, and the vast majority of critical points of SO(8) supergravity are indeed observed to be of this type, they are nevertheless interesting to study. In the past, we have learned much from such solutions. For example, the study of the SO(7) critical point S0698771 in [137] revealed the need to generalize the Freund-Rubin ansatz to include a warp factor, while some of the new solutions from [128] have been useful to identify and resolve subtleties in the uplifting from four to eleven dimensions in [120]. For some of the new solutions presented here, a deeper investigation into the nature of accidental (i.e. unrelated to any obvious symmetry) degeneracies in the mass spectra would seem appropriate.
So, while the value of using the AdS/CFT correspondence to study e.g. condensed matter phenomena is doubtful if the AdS side, when embedded into M-Theory, has unstable modes (which may even be invisible in the truncation, as is the case for the SU(4) solution), one would nevertheless want to at least come to a deeper understanding of the 11-dimensional origin(s) of the instability(-ies), perhaps even looking for ways of stabilization, cf. e.g. [138].
The scalars transform as a (reducible, nontrivial) representation of the gauge group SO(8), and a critical point with nonzero vacuum expectation values for the scalars will hence break the gauge symmetry to some subgroup of SO(8) via the Higgs effect. As the scalar potential has an overall SO(8) rotational symmetry, a shift in the scalar fields obtained by applying a small SO(8) rotation that actually moves the critical point on the scalar manifold, i.e. some generator of the SO(8) symmetry that is broken by the particular choice of the solution on its SO(8) orbit, corresponds to a flat direction in the potential. In the particle spectrum, these shifts would hence correspond to massless scalar ("Goldstone") particles, which however for a broken local (gauge) symmetry get absorbed ("eaten") by the gauge field to form the extra ("longitudinal") spin-1 polarization state that a massive vector boson has over a massless helicity-1 vector boson. Likewise, massless fermions get absorbed by the gravitinos to produce missing gravitino polarization states through the super-Higgs effect.

TensorFlow to the rescue
While we cannot use the supergravity potential directly as an ML loss function (since we are looking for saddle points, not minima), it is possible to derive an expression that conceptually can serve as the length-squared of the gradient, $\mathcal{S} := |\nabla V(\phi)|^2$, which can be used as a loss function and is reasonably easy to compute, cf. (2.21) in [139], here labeled (2.8). The $Q^{ijkl}_+$ appearing there is the (self-dual) change of the value of the potential under an infinitesimal variation of the vielbein when multiplying with an infinitesimal $E_7$ element from the left, i.e. we are not considering $\delta V = V(\mathcal{V}(\phi + \delta\phi)) - V(\mathcal{V}(\phi))$, which would be the gradient of the potential with respect to the Higgs fields $\phi$, but use the $E_7$ structure of the potential and rather consider the change of the potential with respect to a small $E_7$ rotation applied to the vielbein matrix $\mathcal{V}$ from the left. As the 70 parameters of this variation transform as self-dual 4-forms under the SU(8) subgroup of $E_{7(7)}$, the self-dual part of the tensor $Q^{ijkl}$ that multiplies this variation to give the change of the potential has to vanish at a critical point. (This is also the variation one has to perform to get second derivatives at a critical point that correspond to actual particle masses, i.e. where the normalization of the kinetic term is the conventional one.) Since we want to compute the tensors $A_1$, $A_2$ anyway as part of the search procedure, e.g. to add a supersymmetry-encouraging term to the loss function as discussed later, this is straightforward to implement. A slightly less efficient strategy would be to ask TensorFlow for the length-squared of the gradient, which would (in "classic" TensorFlow graph mode) perform a backpropagating transformation on the computational graph, costing roughly twice the memory and six times the computation time.
From the ML perspective, minimizing the "stationarity violation" $\mathcal{S}$ of the potential then is a problem of just tuning 70 "learnable" parameters so that the stationarity condition is satisfied. While it is indeed possible to use the rotational SO(8) symmetry of the potential to further reduce this 70-dimensional optimization problem to a 70 − 28 = 42-dimensional one, performing the search in the full 70 dimensions instead seems to make sense, as it is not very clear what a "good" random distribution to sample starting points from would be. Furthermore, even if one chooses to use SO(8) symmetry to (say) diagonalize the 35 pseudo-scalars, it may well happen that a critical point discovered in this way is more easily understood in a presentation that diagonalizes the 35 scalars. So, one should anyway always be able to diagonalize any solution in either of these two representations.
Performing numerical optimization in some 70-dimensional space looks like an unusually easy ML problem. Yet, there are some peculiarities:
• We are not interested in one minimum of the loss function, but (ultimately) want to know all inequivalent ones.
• The idea of "stochastic gradient descent" does not make sense in this setting: there is a well-defined gradient, but there are no "examples to perform well on", and therefore also no human-provided labels to tune towards.
• The loss function takes a highly uncommon form. In particular, its computation involves exponentiating a complex matrix (in a differentiable way).
• We are actually interested in high numerical accuracy in our numerically tuned "training parameters".
Our problem, then, is to:
• numerically find solutions to the $\mathcal{S} = 0$ stationarity condition (2.8),
• canonicalize them to a form with few parameters, and obtain highly accurate numerical data, and
• extract information about physical properties (such as particle charges and masses) as well as (if possible) analytic expressions for the location of the solution.
Ideally, one would like the last step to at the very least produce sufficiently accurate numerical data to leave little doubt about the actual existence of a critical point, even if its location and properties are only approximately known. In the authors' view, seeing that the stationarity condition is satisfied numerically to better than $10^{-100}$ (as was achievable for most of the new solutions) is rather convincing.
Of the above steps, the first "discovery" step, when attempted without an efficient computational framework that can do automated backpropagation, would require manually rewriting Ricci-calculus code for the forward computation into corresponding gradient code. While transforming tensor code like this is certainly doable by hand (as has been demonstrated with [128] and especially [135], which was published including hand-backpropagated code), it requires both effort and practice, and it certainly would be useful if this mechanical transformation were automated, especially when computations involve steps such as matrix exponentiation. Also, debugging hand-written gradient backpropagation code is often tedious, but at least straightforward, since one can always check the claimed sensitivities in the backward pass by ad-hoc injecting a change into the associated quantity in the forward pass and observing the actual sensitivity.
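The debugging recipe just described, injecting a change in the forward pass and comparing with the claimed backward-pass sensitivity, looks like this in a toy NumPy setting (the function and its hand-written gradient are purely illustrative, not taken from the supergravity code):

```python
import numpy as np

def forward(M):
    # toy "tensor code": the scalar s = tr(M M^T)
    return np.einsum('ij,ij->', M, M)

def backward(M):
    # hand-backpropagated gradient: ds/dM_ij = 2 M_ij
    return 2.0 * M

M = np.random.RandomState(0).normal(size=(3, 3))
eps = 1e-6
dM = np.zeros_like(M)
dM[1, 2] = eps  # ad-hoc injected change in the forward pass
observed = (forward(M + dM) - forward(M)) / eps
claimed = backward(M)[1, 2]
assert abs(observed - claimed) < 1e-4  # backward pass is consistent
```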
Here, TensorFlow can help in these ways:
• We only need to write code for the computation of the loss function. All code that then computes the gradient efficiently is generated automatically.
• It becomes almost trivial to do exploration that requires computing gradients for scalar(!) quantities that are themselves defined in terms of gradients.
• Tensor arithmetic can be executed on hardware that has been optimized to perform well on such tasks, in particular GPUs.
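As a toy illustration of the gradients-of-gradients point, the following sketch minimizes the stationarity violation $\mathcal{S}=|\nabla V|^2$ of a two-parameter stand-in potential; it uses the present-day tf.GradientTape API rather than the graph-mode interface that was current when this work was done:

```python
import tensorflow as tf

phi = tf.Variable([0.8, -0.5])  # stand-in for the 70 scalars

def potential(p):
    return p[0]**2 - p[1]**2  # toy potential with a saddle at the origin

for _ in range(200):
    with tf.GradientTape() as outer:
        with tf.GradientTape() as inner:
            V = potential(phi)
        grad_V = inner.gradient(V, phi)
        # a scalar loss that is itself defined in terms of a gradient:
        S = tf.reduce_sum(grad_V**2)  # stationarity violation |grad V|^2
    phi.assign_sub(0.1 * outer.gradient(S, phi))

# gradient descent on S converges to the saddle point of V at the origin
print(phi.numpy())
```

Nesting tapes like this is all that is needed; no gradient code is written by hand.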
While TensorFlow also allows executing code on specialized Machine Learning hardware, such as Google's Tensor Processing Units (TPUs) [110], this is at present not an interesting option for this research here, since ML applications generally can work with much lower numerical accuracy than what is needed in this work, and so there is not a strong economic incentive towards high numerical precision for TPUs. Similarly, while quantum field theoretic problems often involve somewhat sparse tensors (in particular due to sparsity of Gamma matrices), the general trend in ML seems to be away from designs that rely on sparsely populated tensors, and so trying to exploit sparseness to improve computational efficiency when solving field theory problems like the ones studied here with TensorFlow may often not be worthwhile.
The second point above is interesting. As is known from the general theory of reverse-mode automatic differentiation of algorithms [104], it is always possible to compute the gradient of a scalar function that is described by an algorithm in a way that needs no more than some small constant k times the effort for evaluating the original function, independent of the number of components of the gradient! In practice, k somewhat depends on e.g. cache performance, and one typically finds k ∼ 5, but never k ≥ 10.

Simplifying basic analysis
For this work, masses of the scalars had to be determined in order to check whether any modes violate the BF bound (2.7). Still, no code had to be written to implement the mass matrix formula, eq. (2.25) from [139]. Rather, scalar masses were computed directly by left-multiplying the vielbein matrix with an exponentiated $e_7$ generator Taylor-expanded to 2nd order only, and then using TensorFlow's tf.hessians() function to obtain the mass matrix. This performs 70 gradient computations, each no more than six times as expensive as one evaluation of the potential starting from the unperturbed vielbein, rather than $\sim 70^2$ evaluations of the potential. In this sense, this work provides an independent confirmation of the correctness of the mass matrix formula, given that masses match values from the literature for critical points known earlier.
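tf.hessians() belongs to the TF1 graph-mode API used here; the same Hessian-by-autodiff idea can be sketched with present-day nested tapes on a toy stand-in function (names and the function itself are illustrative):

```python
import tensorflow as tf

x = tf.Variable([1.0, 2.0])

def f(v):
    return v[0]**2 + 3.0 * v[0] * v[1]  # toy stand-in for the potential

with tf.GradientTape() as outer:
    with tf.GradientTape() as inner:
        y = f(x)
    g = inner.gradient(y, x)      # one backward pass for the gradient
hessian = outer.jacobian(g, x)    # one backward pass per gradient component
print(hessian.numpy())  # -> [[2. 3.] [3. 0.]]
```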
As the potential is exactly known, our gradients are not noisy estimates (as they usually are in ML), and it makes sense to employ an optimization method that can utilize this, i.e. conjugate-gradient optimization or BFGS optimization [141], which both try to use subsequent gradient evaluations to estimate the 2nd-order structure of the objective function. One convenient way to use TensorFlow as a "gradient machine" for various such higher-order optimization methods is provided by the tf.contrib.opt.ScipyOptimizerInterface() helper function. One must be aware, however, that for degenerate minima of the objective function, these optimization methods are not expected to always perform well very close to the minimum, and given the rather special structure of the problem at hand, we may well encounter such degenerate minima.
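The helper essentially hands TensorFlow-computed values and gradients to scipy.optimize; the same pattern can be shown in a self-contained toy (exact analytic gradients stand in for the autodiff ones):

```python
import numpy as np
from scipy.optimize import minimize

def V(x):
    return x[0]**2 - x[1]**2  # toy potential with a saddle at the origin

def stationarity(x):
    g = np.array([2.0 * x[0], -2.0 * x[1]])  # exact gradient: no noise
    return g @ g                             # |grad V|^2

def stationarity_grad(x):
    return np.array([8.0 * x[0], 8.0 * x[1]])

res = minimize(stationarity, x0=[0.7, -0.3],
               jac=stationarity_grad, method='BFGS')
print(res.x)  # close to the critical point of V at the origin
```

With noise-free gradients, BFGS converges to the critical point in a handful of iterations.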
Starting over and over again at randomly chosen locations on the 70-dimensional scalar manifold produces different critical points. For this work, the authors solved about 390 000 numerical minimization problems, each producing a critical point, which afterwards were de-duplicated. Two solutions were considered equivalent if both the cosmological constant as well as the eigenvalue spectrum of the $A_{1\,IJ}A_1{}^{JK}$ tensor were compatible to within the estimated numerical accuracy of a solution candidate. There are some cases of critical points with very similar cosmological constants, but no degeneracies arise at the finesse provided by the Snnnnnnn naming scheme that is used in this work for solutions.
Given location information for a solution-candidate that is good to more than about five decimal digits, the discovery problem can be considered solved, and one then has to deal with the subsequent problem of finding a highly accurate -ideally, analytic -form. In some cases, one finds that the geometry of a critical point is rather special, making it hard for a higher order optimizer to produce an accurate location. In such situations, it typically helps to run basic gradient descent (still with hardware floating point accuracy) as a post-processing step, which also can be done very efficiently with TensorFlow.
Using this approach, different critical points of the scalar potential get re-discovered many times over. One finds that the relative sizes of the "basins of attraction" for different solutions are very different. While details do of course somewhat depend on the probability distribution used to generate starting points, one observes (for example) that the likelihood to end up at critical point S1400056 is about 100× higher than the likelihood to end up at the S1400000 vacuum. Indeed, some of the solutions presented here were seen only once. This makes it rather likely that just increasing the effort by another factor of 10 would produce further solutions. Figuratively speaking, we suffer from some vacua strongly vacuuming in (pardon the pun) a large region of search space.

Loss function design
Given this situation, one naturally would like to have alternative approaches to investigate the structure of the scalar manifold. One idea, inspired by Morse Theory [142], is to look not for the minima of the scalar function that measures stationarity violation, but for its saddle points, and then determine how following the gradient when starting from small perturbations along special unstable directions (such as the principal axes of the Hessian) carries one into different critical points of the potential. As it is plausible that a critical point with a small basin of attraction when minimizing stationarity violation may actually be reachable by walking down from a saddle that has a large basin of attraction in the search for such saddles, this change of perspective may offer a way to improve the efficiency of the search for overlooked critical points.
It turns out that implementing this idea in the most naive way is very easy with TensorFlow, requiring only very little coding while (due to backpropagation) still offering very good numerical performance. In order to give an impression of how little effort this indeed is, we show example code in appendix C. One notes that the corresponding calculations involve third derivatives of the potential (the stationarity condition is a function of the gradient, and in order to determine its saddle points, we look at its gradient-squared as well as its 2nd derivative, the Hessian of the stationarity condition). Still, as long as these derivatives get combined into intermediate scalar quantities (such as the length-squared of a gradient), the basic insight of reverse-mode automatic differentiation holds, i.e. an extra derivative only multiplies the computational effort by a factor of about six (while retaining high numerical accuracy).
While naively following the gradient disrespects the underlying symmetry of our 70-dimensional space (see footnote 30), this may actually help rather than harm the search, with an eye on the intended purpose, by breaking up degeneracies in principal axes. With this "naive" saddle point approach, one observes that minimization is much more likely to run into a saddle than into a minimum of the stationarity condition. Inspection makes it plausible that knowing the height of the saddle as well as the value of the potential to three digits after the decimal point suffices to (mostly) deduplicate saddles, and with this, one can produce a "subway map" of how one can cross from one critical point to another via some saddle. Irrespective of whether one uses the "physically correct" geometry on the scalar manifold or not, a (mostly) complete map is too complex to be fully visualized. Figure 1 provides a glimpse of what a tiny part of the graph looks like.
Generating 600 (non-unique) near-origin saddle points and then analyzing their 12 000 unstable principal axes did indeed confirm that some critical points which are hard to find by minimizing the stationarity condition are easier to obtain by this saddle point method. In particular, the odds of hitting the non-supersymmetric stable point rise from about 1 : 20 000 to about 1 : 600. This (limited) analysis did, however, not produce any new critical points in the near-origin region where the search was performed. In the authors' opinion, observing that a somewhat independent method only reproduces the solutions found with a straightforward random search, but fails to discover new ones, suggests that the list presented here is likely the near-complete answer to the question of what the critical points of SO(8) supergravity are, at least in the near-origin region. That is, the authors expect the long list to likely still miss a few cases, perhaps even rather interesting ones (see footnote 31), but not to list only a small selection of critical points that happen to be strongly attractive in a random search.
Likewise, TensorFlow makes it very simple to tweak loss functions in order to search for points on the scalar manifold with specific desired properties. Clearly, one would like to know whether the current work now gives a complete list of the supersymmetric vacua of SO(8) supergravity. While the methods employed here are insufficient to stringently prove this, it is very easy to tune the search to strongly favor supersymmetric critical points. A straightforward way to do this is to replace the length-squared-of-the-gradient loss function $L_0 = |\nabla V|^2$ with a loss function that includes another term which is zero for supersymmetric solutions only. The obvious idea here is that, for a supersymmetric solution, there needs to be a massless gravitino, i.e. some vector $\eta_K$ such that $A_1{}^{IK}\eta_K = w\,\eta^I$ with $6\,g^2 w^2 = -V$. Due to the SO(8) symmetry of the potential, we do not have to compute an eigenvector in a differentiable way here, but can simply fix $\eta_K = \delta_{K0}$ without loss of generality. Using not $L_0$ but $L_0 + \lambda L_s$ with a BFGS optimizer is indeed observed to be extremely effective for finding supersymmetric solutions. Taking $\lambda \sim 10$, and starting from a randomly picked 70-dimensional vector (with coordinates drawn from a normal distribution, with a uniformly distributed length multiplier), one observes that numerical optimization occasionally does get caught in a new local minimum with $L_0 > 0$ (i.e. not a critical point), but otherwise manages to find each of the known supersymmetric vacua multiple times within less than an hour of computing time on moderately recent hardware. This approach also unearths one additional supersymmetric vacuum (which also is found many times over) that has $\mathcal{N} = 1$ supersymmetry and breaks SO(8) to SO(3). This is the solution named S1384096; see section 3 and the appendices for its properties. Running this search for a day on a single computer produced 7150 supersymmetric solutions, with each of the now five solutions (from the maximally supersymmetric SO(8) point down to the new SO(3) vacuum) being discovered many times over, the lowest count being 318 for S0600000.

Footnote 30: The gradient is an element of the cotangent space, and, with our parametrization of the $e_7$ algebra, already at the origin, taking a step in the corresponding coordinate direction is conceptually wrong, as it needs to be mapped back to an element of the tangent space with the inverse scalar product of the non-orthogonal basis used here.

Footnote 31: Soon after the first release of a preprint of this article, follow-up work [143] as a by-product indeed gave early evidence for the existence of two further unstable critical points not listed here, S2096313 with SO(3) × U(1) symmetry, and S2443607 with SO(3) symmetry. These solutions will be discussed in an upcoming article.

Figure 1: A tiny part of the "watershed map" of critical points of the stationarity condition. Minima of the stationarity condition (at value zero) are (true and false) vacua of the potential, and are labeled using the naming scheme introduced in [144]. Saddles are also labeled by cosmological constant, e.g. the saddle at $V/g^2 \approx -7.605$ corresponds to W07605. Edge labels indicate how many gradient-parallel (in the naive geometry) paths along a principal axis run into a particular critical point.
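In code, a supersymmetry-encouraging term of this kind could look as follows (a NumPy sketch; the function name is ours, and the normalization $6g^2w^2 = -V$ for the massless-gravitino eigenvalue is the one assumed above):

```python
import numpy as np

def susy_loss(A1, V, g=1.0):
    """Nonnegative; zero iff eta = (1, 0, ..., 0) is an eigenvector of
    A1 whose eigenvalue w satisfies 6 g^2 w^2 = -V (massless gravitino)."""
    col = A1[:, 0]                           # A1 applied to eta = e_0
    off_eigen = np.sum(np.abs(col[1:])**2)   # eta must be an eigenvector
    w = col[0]
    mismatch = (6.0 * g**2 * np.abs(w)**2 + V)**2
    return off_eigen + mismatch

# The maximally supersymmetric SO(8) point: A1 = identity, V = -6 g^2.
print(susy_loss(np.eye(8), -6.0))  # -> 0.0
```

In the actual search, this term is added to the stationarity violation with a weight $\lambda$, so that the combined loss can only vanish at a supersymmetric critical point.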
Is it also possible to directly encode BF-stability as an ML loss function, and hence directly search for stable vacua in a similar way? In principle, this can be implemented e.g. by adding to the stationarity-violation loss $L_0$ another non-negative contribution $L_{BF}$ that can only be zero if the scalar mass matrix $S$ with all eigenvalues shifted up by the BF bound is positive semidefinite, i.e. (with masses measured in units for which the BF bound is $-9/4$)

  $L_{BF} = \sum_{ij}\big(S_{ij} + \tfrac{9}{4}\,\delta_{ij} - (\Lambda\Lambda^\top)_{ij}\big)^2,$
where one introduces a lower triangular matrix $\Lambda$ of $70 \cdot 71/2 = 2485$ trainable parameters that will, when the loss is minimized, try to (Cholesky-)factorize the shifted mass matrix. Unfortunately, the cost associated with this large increase in the number of training parameters means that loss minimization becomes (in comparison) painfully slow. One also notes that the mass matrix in general unfortunately is not $35_+ \oplus 35_-$ block-diagonal. We anticipate that this technique might become useful for problems with smaller scalar sectors, such as perhaps maximal gauged $D = 5$ supergravity, but not for SO(8) supergravity in $D = 4$. Still, there are other minor (fixable) annoyances with the basic form of this loss contribution, such as bad behavior in the $V > 0$ region, and the new term driving the search too quickly towards the origin.
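A sketch of this Cholesky-style loss contribution, at toy matrix sizes (the real case has a 70 × 70 mass matrix and 2485 trainable parameters):

```python
import numpy as np
from scipy.optimize import minimize

def bf_loss(params, S, shift=9.0 / 4.0):
    """L_BF as above: zero iff S + shift*I equals Lam @ Lam.T for some
    lower-triangular Lam, i.e. iff the shifted matrix is positive
    semidefinite (so BF-stability <=> the loss can reach zero)."""
    n = S.shape[0]
    Lam = np.zeros((n, n))
    Lam[np.tril_indices(n)] = params
    return np.sum((S + shift * np.eye(n) - Lam @ Lam.T) ** 2)

def min_bf_loss(S):
    """Minimize over the n(n+1)/2 trainable Cholesky parameters."""
    n = S.shape[0]
    x0 = np.ones(n * (n + 1) // 2)
    return minimize(bf_loss, x0, args=(S,), method="BFGS").fun

# A toy "mass matrix" respecting the shifted bound can be factorized
# (loss reaches ~0); one violating it cannot (loss stays bounded away from 0).
loss_stable = min_bf_loss(np.diag([1.0, -2.0]))
loss_unstable = min_bf_loss(np.diag([1.0, -3.0]))
```

Since the residual on the violating eigenvalue can never be cancelled by a positive-semidefinite $\Lambda\Lambda^\top$, the minimum of the loss directly signals a BF violation.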
There are many more ways in which being able to effortlessly engineer loss functions might help. For example, it might be feasible to multiply the stationarity-violation loss with an extra factor that increases as the search approaches known strong attractors, effectively reducing the size of their basins of attraction. One obvious way in which this could be realized would be to add factors of the form

  $\prod_n f\Big(\sum_p \big(t_p - c_{n,p}\big)^2\Big),$

where $t_p$ denotes the fingerprint value computed from the $p$-th power of the gravitino mass matrix (the sum over $p$ running over the first few powers that we use to "fingerprint" solutions), $c_{n,p}$ is the corresponding known fingerprint-value for the $n$-th known too-attractive solution that should be penalized in the search, and $f$ is some function with $f(x) \approx 1$ away from $0$ and $f(x) \gg 1$ near $0$. Some experimenting will be needed to find an approach that does not create many new nonzero minima of the loss function.
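A possible shape for such a reweighting, with entirely ad-hoc choices for $f$ and the fingerprint (the text deliberately leaves both open):

```python
import numpy as np

def repulsion_factor(x, height=100.0, width=1e-2):
    """f(x) ~ 1 away from 0 and f(x) >> 1 near 0; height and width are
    ad-hoc choices that would need to be tuned experimentally."""
    return 1.0 + height * np.exp(-x / width)

def reweighted_loss(l0, fingerprint, known_fingerprints):
    """Multiply the stationarity violation l0 by one factor per known
    too-attractive solution. `fingerprint` holds the current values t_p
    (e.g. traces of powers of the gravitino mass matrix), and
    `known_fingerprints` the stored c_{n,p} of solutions to penalize."""
    factor = 1.0
    for c in known_fingerprints:
        d2 = sum((t - cn) ** 2 for t, cn in zip(fingerprint, c))
        factor *= repulsion_factor(d2)
    return l0 * factor
```

Near a known attractor the loss is inflated by roughly the `height` factor, while far away it is essentially unchanged.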

Canonicalization
For any critical point obtained by a numerical search, the SO(8) symmetry of the scalar potential allows us to freely pick an arbitrary point on its SO(8) orbit as an equivalent presentation. Naturally, one would want to use a form that allows describing the solution with a minimal number of parameters. This is not only desirable for typographic compactness, but also establishes the connection with simple exact analytic descriptions of these critical points. Setting an additional coordinate on $E_{7(7)}/(SU(8)/\mathbb{Z}_2)$ to zero corresponds to imposing an extra algebraic constraint on the solution, and using sufficiently many such constraints to eliminate all freedom to rotate a solution produces a 56-bein matrix with only algebraic entries, since the defining properties of the 56-bein, i.e. belonging to $E_{7(7)}/(SU(8)/\mathbb{Z}_2)$, can also be expressed through algebraic constraints. Specifically, the 56-bein respects the symplectic invariant of Sp(56) as well as Cartan's quartic invariant of $E_{7(7)}$ (e.g. (B.4b) in [50]), which, in the basis $56 \to (28, 28)$: $(x^{ij}, y_{kl})$, contains in particular the quartic contractions

  $\epsilon^{ijklmnpq}\, y_{ij}\, y_{kl}\, y_{mn}\, y_{pq} + \epsilon_{ijklmnpq}\, x^{ij}\, x^{kl}\, x^{mn}\, x^{pq}.$  (2.14)

At the Lie algebra level (i.e. prior to exponentiation), this then means that there are exact analytic expressions for the coordinate-parameters describing a given solution, typically of the form {algebraic number} × log{algebraic number}. Given that the actual 56-bein entries may well be determined by rather complicated intersections of many algebraic varieties, actually finding algebraic forms may well be computationally out of reach in some cases (i.e. one may well imagine encountering zeros of irreducible polynomials of degrees well beyond 1000). Still, for some of the new solutions described here, the authors were able (with reasonable computational effort) to determine analytic expressions from ultrahigh-precision numerics alone. Each such expression is correct with overwhelming likelihood.
This procedure starts with first obtaining ultrahigh-precision (hundreds to thousands of correct digits) numerical data for quantities that are known to be algebraic (i.e. vielbein entries and derived quantities, such as the cosmological constant). Unfortunately for us, as extremely high numerical accuracy is generally not very relevant for ML, TensorFlow does not support tensor arithmetic with higher numerical precision than what common hardware can provide, i.e. IEEE-754 double precision floating point. In that sense, from the perspective of M-Theory research, TensorFlow is perhaps best thought of as a "discovery machine" and not a "precision machine", carrying over terminology from accelerator physics (e.g. [145], [146]).
In principle, it would be feasible to run the "discovery" computation itself with adjustable accuracy, computing e.g. the algebraic entries of V to hundreds of decimal digits. This technique has partly been employed in [128], based on highly performant compiled Common Lisp code in conjunction with an adapter library that allows a common generic limited-precision numerical optimizer to work in an ultrahigh-precision setting, but all this generally involves carefully performing the code transformations needed for sensitivity backpropagation by hand. This technique, which produces algebraic expressions in a fully automated way, should hence not (yet) be considered easy to apply and widely accessible. It therefore makes sense to aim for a clearer separation of the "discovery" and "precision" steps.

Parameter-reducing heuristics
An approximate location of a solution obtained by the "discovery" step, once suitably rotated to be coordinate-aligned to the largest possible extent (using the procedure described further on), gives us an idea about which coordinates on the scalar manifold can be set to zero, and which others can likely be set to identical values (or simple rational multiples of one another). One finds that most solutions are very non-generic and allow very many such simplifying linear identities. In terms of automated processing of many solution-candidates, this then requires code that tries some basic heuristics (and automatically abandons them when they turn out to not actually hold at high precision). The basic process is to go through all coordinates and check whether an observed coordinate is close to another one seen earlier (or a simple rational multiple thereof, or zero) within some tolerance limit $\tau$ such as $10^{-3}$, $10^{-4}$, $10^{-5}$, etc. If so, the observed coincidence is assumed to hold, and is codified in a linear model-parameters-to-solution-coordinates matrix map. If the attempt to improve accuracy based on such a model map runs into a dead end, the process is restarted with a less permissive tolerance limit $\tau$. This "automated heuristic modeling" step typically reduces a 70-dimensional optimization problem to a much more manageable problem in 2-20 or so parameters, for which obtaining high-accuracy data is very often feasible even without having fast gradient computation available. The most important techniques here are using a multidimensional Newton solver (as provided by the "mpmath" [147] package) and high-precision Nelder-Mead optimization, which is feasible for up to about 14 parameters. If both these techniques fail, it is sometimes useful to use basic fixed-scaling (TensorFlow-based) gradient descent with hardware numerics to turn a solution that is good to eight digits into one that is good to at least twelve.
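The coincidence-detection heuristic described above can be sketched as follows (function name, tolerance handling, and the bound on denominators are our own illustrative choices):

```python
from fractions import Fraction

def guess_linear_model(coords, tol=1e-4, max_den=6):
    """Match each observed coordinate to zero, an earlier free parameter,
    or a simple rational multiple of one; unmatched coordinates become
    new free parameters. Returns (parameter values, per-coordinate model),
    where each model entry is (parameter index or None, rational factor)."""
    params, model = [], []
    for x in coords:
        if abs(x) < tol:                      # coordinate is (numerically) zero
            model.append((None, Fraction(0)))
            continue
        match = None
        for i, p in enumerate(params):
            # Nearest simple rational approximation of the observed ratio:
            r = Fraction(x / p).limit_denominator(max_den)
            if r != 0 and abs(x - float(r) * p) < tol:
                match = (i, r)                # coincidence assumed to hold
                break
        if match is None:                     # genuinely new free parameter
            params.append(x)
            match = (len(params) - 1, Fraction(1))
        model.append(match)
    return params, model

params, model = guess_linear_model([0.70710678, -0.35355339, 1e-6, 0.2])
```

If a later high-precision refinement contradicts one of the guessed identities, the whole model is discarded and the heuristic restarted with a smaller tolerance, exactly as described above.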
One finds that basic gradient descent with some simple heuristic to make learning rates adaptive indeed seems to work better for this problem than any of the more advanced minimization methods that are currently popular in ML applications, e.g. Adam, RMSProp, AdaGrad, or FTRL. In some situations, the parameter-reducing heuristic produces a problematic canonical form, and one has to start over with canonicalization after applying a random SO(8) rotation to the solution.
This "distillation" step produces a high degree of evidence for the existence of a particular critical point (the length of the potential's gradient having been shown numerically to reach values typically below $10^{-20}$), as well as a first highly accurate location described by only a few numbers, and also information on whether the solution is sufficiently well-behaved (i.e. non-degenerate) for the multidimensional Newton method to allow quick determination of coordinates and physical properties such as the cosmological constant to an accuracy of hundreds of digits.
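The multidimensional Newton polishing step, using the "mpmath" [147] package mentioned above, can be illustrated on a toy system standing in for the reduced stationarity equations of a low-dimensional model (the equations and seed below are purely illustrative):

```python
from mpmath import mp, mpf, findroot

mp.dps = 60  # work with 60 significant decimal digits

# Polish a machine-precision seed to (almost) full working precision;
# in the actual application, the system would be the stationarity
# equations of the few-parameter model of a critical point.
equations = [lambda x, y: x**2 + y**2 - 4,
             lambda x, y: x * y - 1]
root = findroot(equations, (mpf("1.93"), mpf("0.52")))
```

Since Newton's method converges quadratically near a non-degenerate root, raising `mp.dps` to several hundred and re-polishing quickly yields the hundreds of digits needed for the later "algebraization" step.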

Coordinate-aligning rotations
As a point on the scalar manifold can be described by giving two symmetric traceless matrices, one carrying the $35_s$ and one carrying the $35_c$ representation of so(8), and we can always use an SO(8) rotation to diagonalize one of these, the effective dimension of the scalar manifold relevant for finding critical points is reduced to 70 − 28 = 42. Still, even if one exploited this symmetry from the start and only looked for critical points for which one of $35_s$, $35_c$ is diagonal, this does not eliminate the need to numerically canonicalize a solution, as any degeneracy in the entries of the diagonal matrix would leave some residual rotational symmetry that can be used to reduce the number of non-zero entries in the other matrix. For example, if the parameters in the $35_s$ can be brought into the form diag(3A, 3A, −A, −A, −A, −A, −A, −A), this still leaves a residual symmetry of SO(2) × SO(6) which must be fixed by imposing algebraic constraints on the $35_c$ in order to make the entries of the 56-bein matrix algebraic.
As it hence is difficult, in a numerics-based search, to avoid the need for a "canonicalization" step that eliminates residual rotational freedom, we may just as well fish for solutions in the full 70-dimensional parameter space. While our particular choice of $\mathfrak{e}_7$ generators leaves us with a non-diagonal scalar product on the 70-dimensional manifold of scalars, we nevertheless start the search with a 70-vector picked at random from a distribution that is isotropic with respect to the coordinate basis, not the restricted $\mathfrak{e}_7$ Killing form. This choice is apparently "good enough" to find many new solutions.
For elements of a 35-dimensional irreducible representation of SO(8), it is easy to numerically find a rotation $G_d$ that diagonalizes the corresponding symmetric traceless matrix: the orthonormal eigenbasis serves this purpose if we multiply the last eigenvector by ±1 in order to ensure a positive determinant. Knowing how to diagonalize, say, the $35_s$, how do we then find the corresponding action on the $35_c$ (and vice versa)? Logarithmizing the group element to obtain the algebra element is out of the question, but we can employ a higher-dimensional generalization of Davenport chained rotations [149] [150] to write $G_d$ as a product of a sequence of up to 28 rotations in coordinate-aligned planes,

  $G_d = R_{67}(\alpha_{67}) \cdots R_{57}(\alpha_{57}) \cdots R_{56}(\alpha_{56}) \cdots R_{02}(\alpha_{02}) \cdot R_{01}(\alpha_{01}),$  (2.15)

each of which, when moved to the left side, cancels another off-diagonal entry of $G_d$ without destroying earlier such reductions. Using this presentation, we can then proceed to logarithmize each factor

  $R_{\gamma\delta}(\alpha_{\gamma\delta}) = \exp(r_{\gamma\delta} \cdot \alpha_{\gamma\delta})$  (2.16)

and find the corresponding Lie algebra action on the $8_c$ representation by employing the invariant $\gamma^{\alpha\beta}{}_{\gamma\delta}$ tensor. Lifting this to an action on $35_c \subset (8_c \times 8_c)$ and exponentiating, we can readily determine the action of $G_d$ on the $35_c$. An alternative approach would be to run numerical minimization starting from Lie algebra elements that get exponentiated to obtain group actions on the $35_s$ and $35_c$, with an objective function that punishes off-diagonal elements of the matrix carrying the $35_s$ representation.
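The chained-rotation reduction (2.15) is essentially a Givens-style elimination. A minimal sketch (our own illustrative implementation) that recovers the planar angles of a given $G \in SO(n)$:

```python
import numpy as np

def planar_rotation_angles(G):
    """Reduce a special orthogonal matrix G to the identity by left-
    multiplying planar (Givens-type) rotations; the collected angles
    give a chained-rotation presentation of G."""
    n = G.shape[0]
    M = G.copy()
    angles = []
    for j in range(n - 1):            # column to clean up
        for i in range(j + 1, n):     # off-diagonal entry M[i, j] to cancel
            a = np.arctan2(M[i, j], M[j, j])
            R = np.eye(n)
            R[j, j] = R[i, i] = np.cos(a)
            R[j, i], R[i, j] = np.sin(a), -np.sin(a)
            M = R @ M                 # moving R to the left cancels M[i, j]
            angles.append((i, j, a))
    return angles, M                  # M is now the identity, up to rounding

# Example: decompose a random SO(8) element into 28 planar rotations.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))
if np.linalg.det(Q) < 0:
    Q[:, -1] *= -1                    # flip one eigenvector to get det = +1
angles, residue = planar_rotation_angles(Q)
```

Each factor is a rotation in a single coordinate plane, so it can be individually logarithmized as in (2.16) and its action lifted to other representations.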
Once the $35_s$ has been diagonalized by any such method that also gives us the effect of the employed rotation on the $35_c$, we can proceed to determine the residual subalgebra of so(8) that keeps the diagonalized $35_s$ unchanged. We can employ this subalgebra to reduce the number of off-diagonal entries of the matrix carrying the $35_c$ representation, but in general not completely. Also, this residual symmetry group will typically be rather small, such as SO(3) × SO(2). It hence makes sense to consider this step as a somewhat low-dimensional numerical optimization problem. In general, problems that maximize the number of zero entries in a matrix often are hard, but here, a common sparseness-encouraging ML technique works reasonably well: we use the $L_1$-norm of the off-diagonal matrix entries as a loss function.
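A toy version of this $L_1$ off-diagonal loss, at so(3) size for brevity (the real problem optimizes over the residual subalgebra acting on the $35_c$; the matrix, parametrization, and choice of Powell's method are our own illustrative choices):

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def so3(params):
    """Antisymmetric 3x3 matrix (an so(3) element) from three parameters."""
    A = np.zeros((3, 3))
    A[0, 1], A[0, 2], A[1, 2] = params
    return A - A.T

def offdiag_l1(params, M):
    """L1-norm of the off-diagonal entries of R M R^T with R = exp(so3)."""
    R = expm(so3(params))
    C = R @ M @ R.T
    return np.sum(np.abs(C)) - np.sum(np.abs(np.diag(C)))

# A symmetric matrix that is diagonal in a slightly rotated (hidden) basis:
a0 = np.array([0.10, -0.05, 0.08])
R0 = expm(so3(a0))
M = R0 @ np.diag([1.0, 2.0, 3.0]) @ R0.T

# Powell copes reasonably well with the non-smooth L1 objective:
res = minimize(offdiag_l1, np.zeros(3), args=(M,), method="Powell")
```

The exact minimizer here is the inverse of the hidden rotation, at which the loss vanishes identically; minimizing the $L_1$-norm (rather than $L_2$) encourages the optimizer to drive individual entries exactly to zero.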
After this sparseness-encouraging rotation, which will in general have set many more than 28 coefficients to zero, we proceed by making an automated guess for the form of the symmetric matrices carrying the $35_s$ and $35_c$ representations, as described above.

"Algebraization"
Once an ultra-accurate numerical value for a known-to-be-algebraic parameter has been found, one can use an integer relation algorithm such as PSLQ [151] to find a polynomial of which this value is a zero and that seems plausible, i.e. the total information content of its coefficients is much smaller than the information content of the known digits of the parameter. This works well for up to a few thousand decimal digits. More specifically, one scans for a set of integer coefficients $c_j$, $j \in \{0, 1, \ldots, N-1\}$, $|c_j| < 10^d$, such that for a number $\xi$ known to $D$ (perhaps $D \sim 300$) digits after the point, with $D > N \cdot d + 30$, we have $|\sum_j c_j \xi^j| < 10^{-(Nd+10)}$. If such a polynomial is found, we can easily determine its actual zero to $D$ digits after the point, and check whether this polynomial correctly predicts many further digits of $\xi$ that were known but not used to find the $c_j$. Naturally, if a single candidate polynomial that was found using 300 known good digits of precision manages to predict the next 50 digits, we would expect this to happen purely by chance with likelihood $\sim 1 : 10^{50}$. So, by using a large enough reservoir of extra precision, we can make the likelihood of accidentally accepting an incorrect algebraic expression fantastically small. While this still does not constitute a strict mathematical proof, it would be rather unreasonable to disbelieve the result. Of course, it will in many cases then be possible to independently establish the validity of a claimed exact expression, but this will generally require ingenuity and effort beyond what can easily be automated to process scores of solutions.
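The mpmath package provides a PSLQ implementation; the following sketch recovers a minimal polynomial from a numerical value alone, using a classical example in place of an actual cosmological constant:

```python
from mpmath import mp, mpf, sqrt, pslq

mp.dps = 60  # pretend xi is only known to 60 digits

# Recover the minimal polynomial of xi = sqrt(2) + sqrt(3) from its
# numerical value; the true answer is x^4 - 10 x^2 + 1.
xi = sqrt(2) + sqrt(3)
coeffs = pslq([xi**j for j in range(5)], maxcoeff=10**6)
residual = sum(c * xi**j for j, c in enumerate(coeffs))
```

In the actual workflow one would use only part of the known digits to find the coefficients and then check the polynomial's root against the held-back digits, as described above.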

Tweaks to the basic procedure
In general, it makes sense to use moderately refined numerical data (e.g. known to be good to 14 digits of accuracy) and then repeat the entire procedure that starts with finding a coordinate-aligning so(8) rotation from cleaned-up numerical data, as limitations on the numerical precision of the input data from the ML library may have caused the (imperfect) heuristic that suggests a low-dimensional model to miss some reduction opportunities. In particular, starting from an already partially parameter-reduced model will lead to rather short products of Davenport chained rotations, which are numerically much better behaved than those seen in the initial reduction step.
Once ultra-accurate and nicely coordinate-aligned numerical data are available, inverse symbolic computation methods can automatically search for exact expressions for physically relevant properties such as the cosmological constant, coordinates, particle masses and charges (and hence also residual supersymmetry), and others.

Extracting the physics
Unbroken continuous gauge symmetries can be determined numerically by mostly straightforward methods, and Lie group theory then allows an automatic classification. In a first step, one determines the space $\mathfrak{h}$ of so(8) generators that leave a given solution invariant and splits it into orthogonal pieces w.r.t. the Cartan-Killing metric, $\mathfrak{h} = [\mathfrak{h}, \mathfrak{h}] \oplus \mathfrak{h}_\perp$. This separates off the generators of the $U(1)^N$ part of the residual symmetry. For all new solutions described here, the dimension of the semisimple part $[\mathfrak{h}, \mathfrak{h}] =: \tilde{\mathfrak{h}}$ is 0, 3, or 6, so the corresponding non-abelian symmetry algebra can only be $k \cdot so(3)$, $k \in \{0, 1, 2\}$, but perhaps embedded into so(8) in different ways. The authors' automated analysis handles this case by looking for a maximal set of $p$ orthogonal and commuting $\tilde{\mathfrak{h}}$-elements $\tilde{\mathfrak{h}}_{c,j}$, $j \in \{0, \ldots, p-1\}$ (which, thanks to so(8) being the algebra of a compact Lie group, are simultaneously diagonalizable in the adjoint representation), then splitting $\tilde{\mathfrak{h}}$ into subspaces that are simultaneous eigenspaces for these $p$ generators, labeled by $p$-dimensional vectors of eigenvalues. After defining a subspace of positive roots and identifying simple roots, taking the commutators of raising and lowering operators associated with simple roots produces a basis for the Cartan subalgebra that is useful for numerically identifying angular momentum spectra, which for $k \cdot so(3)$ are straightforward to map to the irreducible representation content. For $U(1)^N$ generators, the $N = 2$ case will in general require finding a rotation that coordinate-aligns the charge lattice. In any case, we scale every U(1) generator $u$ such that $\exp(2\pi u)$ is the identity on all particle states, while no $\lambda u$ with $0 < \lambda < 1$ also has this property. For solutions whose gauge group has a single U(1), we use superscripts to indicate the electric charge of particle states, while for solutions with U(1) × U(1) abelian gauge symmetry, we use super- and subscripts for the two different types of charges.
Having split the unbroken symmetry in this way, we can proceed to present branching rules for the $8_{v,s,c}$ as well as the 28, and decompose particle mass-eigenstates for the spin-1/2 fermions, spin-3/2 gravitinos, and spin-0 scalars in terms of irreducible representations of the residual gauge algebra. For the scalars, we first try to split mass eigenspaces into orthogonal eigenspaces of the parity operator $P$ that is $+I$ on the $35_s$ and $-I$ on the $35_c$. However, the analysis presented here only splits off the pure $P = 1$ and $P = -1$ subspaces and subsequently merges their joint orthogonal complement into one subspace. Mass eigenstates are then determined on these three subspaces, and marked with a superscript of (s), (c), or (m) for the "mixed" case, respectively. This step is numerically slightly problematic, as it turns out that, for some critical points, we find eigenstates whose $P$-eigenvalue deviates from ±1 by a small amount $< 10^{-3}$. Hence, the auto-identification of some mass eigenstate as a pure scalar or pure pseudoscalar state may be somewhat unreliable. It is nevertheless reassuring that the decomposition of weights into irreducible representations is observed to always succeed, which it would not if some $P$-eigenvectors were wrongly assigned to subspaces. All particle masses are known to an accuracy of more than eight digits, but it may happen that different particle masses look the same when truncated to three digits after the point for presentation. This explains why some solutions list the same mass as belonging to more than one mass-subspace. This happens e.g. for S1039624, where there is a two-dimensional subspace of mass eigenstates with electric charges ±4 and another one-dimensional mass eigenstate with a mass too close to be discriminated at presentation accuracy. For a solution to be perturbatively stable, no scalar mass-eigenstate may have $m^2/m_0^2$ violating the BF bound of $-9/4$.
Unstable mass-eigenstates that violate this bound are marked with an asterisk ( * ) in the tables.
As none of the new solutions has a residual gauge group that contains a simple group of rank ≥ 2, and so apparently all such solutions have already been identified and studied in detail in the 1980s [136], there is no compelling reason to automate the assignment of quantum numbers for such larger symmetry groups.
Residual supersymmetry can be identified [152] numerically by using Singular Value Decomposition (SVD) to look for solutions $\eta_K$ of the massless-gravitino condition $A_1^{KL}\,\eta_L = \sqrt{-V/(6g^2)}\,\eta_K$. The corresponding gravitino states are marked with an asterisk in the tables.
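A minimal sketch of this SVD-based counting (the phase conventions and index structure of the actual $A_1$ tensor are glossed over here):

```python
import numpy as np

def residual_susy_count(A1, V, g=1.0, tol=1e-8):
    """Count solutions eta of A1 @ eta = sqrt(-V / (6 g^2)) * eta as the
    number of near-zero singular values of the shifted matrix; each such
    null direction corresponds to one unbroken supersymmetry."""
    m = np.sqrt(-V / (6.0 * g**2))
    s = np.linalg.svd(A1 - m * np.eye(A1.shape[0]), compute_uv=False)
    return int(np.sum(s < tol))
```

At the maximally supersymmetric SO(8) point, where $A_1$ is proportional to the identity and $V/g^2 = -6$, all eight supersymmetries are recovered; a generic non-supersymmetric point yields a count of zero.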

A guide to the new solutions
Detailed data on all the solutions obtained in a large TensorFlow-based cluster search are presented in appendix E. The structure of the location data presented in these tables is explained in section 2.4. Given the sheer amount of data (masses and charges for 26k particles!), it makes sense to include a lookup table that lists the most important properties; this is presented in appendix D. Let us in this section highlight some specific examples with interesting properties. S1384096 is a new stable vacuum with $\mathcal{N} = 1$ supersymmetry. This is clearly the most remarkable new discovery. As the 10-parameter form presented here resisted all attempts to increase numerical accuracy by employing a multidimensional Newton solver, a computationally rather expensive Nelder-Mead optimization had to be performed to extract 500+ digits of numerical accuracy. With this, PSLQ-based analysis (using 400 digits of accuracy to predict 120 further digits) was able to determine the cosmological constant algebraically, as a root of a rather remarkable polynomial, and likewise the gravitino masses as roots of algebraic expressions. The gauge group SO(3) is embedded in a triality-invariant way as the diagonal subgroup of an SO(3) × SO(3) ⊂ SO(8). There are ten SO(3)-invariant scalars, which all have different masses apart from one pair with $m = 0$. A detailed study of the analytic properties of this new solution is in progress [143]. Remarkably, S1384135, which has an only minimally lower cosmological constant, has a gravitino state that almost satisfies the Killing spinor equation (no other gravitino in the long list comes this close) and also has its gauge group (here, U(1)) embedded in a triality-invariant way. The mass spectra of these two solutions are very similar.
One notes that all the cosmological-constant polynomials that have been identified have a rather special form which, given their 2-3-5-factorizations, somehow seems suggestive of an underlying $E_8$ structure. S1600000 and S1800000 extend the known list of solutions with rational (even integer) cosmological constant by two new entries, the others being S0600000 with $\mathcal{N} = 8$ SO(8) symmetry, S0800000 with SU(4) symmetry, S1200000 with $\mathcal{N} = 1$ U(1) × U(1) symmetry, and the only known stable non-supersymmetric point S1400000, which has SO(3) × SO(3) gauge symmetry (listed as SO(4) in the summary table). Despite both S1600000 and S1800000 having no residual gauge symmetry, they have rather remarkable and currently unexplained degeneracies, in particular in the fermion mass spectra. Such "accidental degeneracies" also occur for some other critical points, such as S1046018 or S1176725.
The S0847213 solution is the only "modern" critical point with residual gauge symmetry for which the 8 v,s,c branching does not have any residual triality symmetry. Also, it is the only case with a 3-component gauge group.
The S1039230 solution has been first described in [153]. In total, there are three known critical points with residual gauge symmetry SO(4). The gauge group embedding has V ↔ S triality symmetry for this solution, while it has V ↔ C triality symmetry for the S0880733 solution and S ↔ C symmetry for the (BF-stable!) S1400000 solution.
S2099419 and S2099422 are a pair of critical points with extremely similar but nevertheless different particle properties and also cosmological constant. There are a few other examples for critical points with very small difference in the cosmological constant and similar properties, such as the pair S2511744-S2512364, or S3254262-S3254576.
For almost every solution, the number of scalar modes with m 2 /m 2 0 = 0 matches the number of broken so(8) generators, as it has to according to Goldstone's theorem. Two solutions have extra massless (w.r.t. "naive mass", not AdS-massless) modes that survive being eaten by gauge bosons: S0800000 (i.e. the SU (4) critical point) and S1200000 (i.e. the N = 1 U (1) × U (1) vacuum).
Solutions S1200000, S2279257 and S2279859 are listed in the summary table with symmetry U (1) 1 , since in these cases, the charge of all U (1)-charged particles is the same in magnitude. Still, these particle-states are listed as having charges 1 ++ etc. in the tables for typographic reasons, due to the 8 s,c representations branching to 4 × 1 + + 4 × 1 − , but the spectrum not having any particles transforming under these representations.
While this expanded list did not uncover any other stable non-supersymmetric critical points beyond the known SO(3) × SO(3) solution S1400000, we observe another instance of the phenomenon that instabilities can become invisible when truncating a solution to the scalar submanifold that is invariant under the unbroken gauge group. This was first observed and discussed in detail for S0800000 in [138]. There, the unstable scalars come as a single 20-dimensional irreducible representation of SU(4), and the question was raised whether one could project out these unstable modes with an orbifold construction, which would have to use a non-abelian discrete subgroup of SU(4) other than the Weyl group (since in these cases, the 20′ decomposes with singlets). This phenomenon also happens for the SO(7) critical points S0668740 and S0698771, where the unstable modes transform as a 27, as well as for S1424025, where the unstable modes form a single 5 of SO(3). Clearly, the analysis of all viable discrete subgroups of the gauge group under which this irreducible representation branches without singlets is greatly simplified by going from SU(4) to SO(3), where there is an obvious candidate symmetry (namely the icosahedral group) under which 5 → 5.
Remarkably, S2416856 has no residual gauge symmetry, but six unstable scalar modes with 1+2+3 mass-degeneracy. While trying to find a way to stabilize this solution seems hopeless here, it would be interesting to understand what symmetry is responsible for the accidental mass degeneracies in the unstable modes.
The solutions S1195898 and S2503105 feature a peculiar SO(3) gauge symmetry which is embedded in a triality-invariant way, with branching 8 v,s,c → 3 + 5. In both cases, there are only two SO(3)-invariant scalars, but unfortunately, one of them is unstable. This is also the only instability of these solutions. Remarkably, the minimal polynomial for the cosmological constant is identical for these two solutions, so this seems to be an example of the algebraic equations for a vanishing gradient allowing a pair of real solutions that are Galois conjugates. It is noteworthy that even in the large numerical search performed here, S2503105 was a chance hit that only showed up once. Had it been missed, it is conceivable that it ultimately might have been found in a detailed study of S1195898 as an algebraically equivalent solution. So, looking for Galois conjugates seems to be one new method to fill possible gaps in the list of solutions.
The same Galois-doubling phenomenon occurs also for the pair S1039624-S1402217, which both also have identically embedded gauge group. One might speculate that Galois conjugates may well occur more often, in particular with solutions without residual symmetry and a cosmological constant that is not known algebraically. Perhaps then, it may be possible to extract algebraic numbers that are easier to identify from pairs of numerically known cosmological constants of critical points with identically embedded gauge group.
A rather unique property of the S4168086-solution will be the topic of a short follow-up article.

Conclusions and outlook
Some problems in the world are very amenable to ML-based approaches, others not so much. As we are, during the current ML revolution, working out which is which, we sometimes encounter pleasant surprises. Being able to present an elegant way to address a fundamental need in quantum gravity research, namely the need to analyze potentials on complicated high-dimensional scalar manifolds, certainly is one of these.
More work remains to be done that analyzes other relevant cases using the methods presented here, in particular the scalar sector of maximal D = 5 gauged supergravity [5], i.e. the AdS side of the best studied case of the AdS/CFT correspondence, as well as CSO(p, q, r)-gaugings of four-dimensional maximal supergravity, plus their dyonic variants [125], and other gaugings discussed in [122], as well as gauged supergravities in three and two dimensions.
Also, while the present article is a major step forward towards a proven-exhaustive classification of the critical points of SO(8) supergravity, this challenging problem is still out of reach. It is quite likely that the rather peculiar form of the minimal polynomials of the cosmological constant that could be obtained for about three dozen critical points (including all solutions that can be described with up to six parameters) is a major clue which the authors currently do not know how to utilize. Nevertheless, having an algebraic expression for the cosmological constant typically means that it is also feasible to find algebraic expressions for the entries of the 56-bein matrix, and hence quite a bit of the work required to uplift a critical point to a solution of the equations of motion of 11-dimensional supergravity can be automated. Not surprisingly, giving exact expressions for coordinates (which are not algebraic, but typically {algebraic} · log {algebraic}) is quite a bit harder (in fact, with the techniques employed here, this could only be achieved for a few coordinates), but might actually be unnecessary to answer most questions as long as the vielbein entries are known exactly and one can show that the vielbein indeed is an element of $E_{7(7)}$. Clearly, it would be very interesting to know, in particular for all the supersymmetric vacua, what 11-dimensional geometries these solutions correspond to, and also what the CFT renormalization group flows on M2-branes between the corresponding fixed points look like.
For all solutions presented here, numerical data on their location are available in the source package of the arXiv.org preprint of this article. In many cases, the authors have been able to numerically reduce the violation of the stationarity condition to much less than $10^{-1000}$, and for each claimed critical point, it would be unreasonable to doubt its existence. Unfortunately, going from machine accuracy as obtained with TensorFlow to ultrahigh accuracy is a somewhat messy process, as the heuristics to guess a low-dimensional model as well as the attempt to use a multidimensional Newton solver do occasionally fail (for quite a few independent reasons) and require manual intervention. Due to limited time and a rather large number of cases to analyze, the quality of the numerical data is rather uneven, typically providing 100+ good digits, but sometimes as few as 16. With a bit of numerical experimenting, it typically is possible to obtain 1000+ good digits for any given solution within about two days of work.
Quite a few of the solutions feature (occasionally large) accidental degeneracies in the spectrum that should be understood and may point to exploitable symmetry properties. It may well be that the accidental U(1) ⊂ SO(8) "background round $S^7$ diffeomorphism" symmetry of the S1400000 solution discussed in [133] is a first example of such an extra symmetry. Also, it is interesting to observe that, out of all the solutions newly discovered by employing numerical methods, there is a single one for which the residual symmetry group is not embedded in a way that has triality symmetry. Clearly, triality seems to play a rather important role, and so re-phrasing the problem in octonionic language might reveal additional structure.
In this work, we have only scratched the surface on cleverly engineering loss functions in order to extract additional information from the scalar potential. We expect that, with some ingenuity, much more is possible here, perhaps even allowing an efficient direct scan for BF-stable critical points.
Given that even this long list will still have some gaps, what are the most promising approaches to fill them? One strategy is still to parametrize and study interesting submanifolds, such as the manifold of scalars that are invariant under the SO(3) gauge group of the new supersymmetric vacuum. Also, assuming that there indeed are perfect algebraic "Galois twin" solutions (i.e. solutions for which not only the cosmological constant, but also all particle masses are related by root flipping), it may be possible to discover new solutions by obtaining algebraic expressions for some at present only numerically known solutions and then checking whether root flipping produces new real solutions. A further very promising approach may be to study the extremal structure of the "dyonic" variants described in [125] and to follow the fate of critical points as the new $\omega$-parameter is varied. In [153], this method found the third SO(4) critical point S1039230 before it was independently rediscovered here, and it may well reveal further solutions.
Historically, the potential of SO(8) supergravity was first written down in [3]. About forty years later, we now have what is likely the almost-complete list of critical points, and very likely the complete list of supersymmetric vacua of this theory. Some of the ideas that enabled this analysis are about as old as the original problem itself, but were largely unknown in the String Theory community, and had to be identified as useful and brought together as part of one of the authors' personal journey; it therefore seems likely that other technically hard problems in String Theory would also benefit from more exchange with other fields of research. We hope that, by aiming to keep the introduction of this article accessible to readers with a technical background who are not experts in field theory, we might attract the attention of experts who can contribute missing puzzle pieces that we are not even aware of yet, perhaps even enabling a completeness proof for the list of solutions (once the last gaps are filled).

A $E_7$ Conventions
Spin(8) triality provides some intuitive insight into the question why the exceptional Lie group $E_7$ exists. Let us start from the observation that one can, for example, understand the algebra of rotations in nine spatial dimensions, so(9), as an extension of the algebra of rotations in eight spatial dimensions, so(8), by eight extra elements $r_{k8}$ that, when scaled and exponentiated, rotate each of the coordinate axes of eight-dimensional space against the ninth axis. These extra elements themselves transform under so(8) as eight-dimensional vectors, while giving rise to other so(8) rotations in their commutators, e.g. $[r_{08}, r_{18}] = -2\,r_{01}$, so that we get an overall structure of $[g, g] \sim g$. Likewise, one can obtain the 63-dimensional algebra sl(8) from the 28-dimensional algebra so(8) by adding 35 generators that transform as symmetric traceless matrices under so(8).
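The closure of these commutators is easy to check numerically. Below is a minimal sketch (not from the paper), using generators normalized to have entries $\pm 1$; note that with this unit normalization the commutator $[r_{08}, r_{18}]$ closes into $-r_{01}$, with the overall numerical factor on the right-hand side depending on the chosen generator normalization:

```python
import numpy as np

# Rotation generator in 9 dimensions with unit normalization:
# (r_jk)^m_p = delta^j_m delta^k_p - delta^j_p delta^k_m,
# i.e. entry +1 at (j, k) and -1 at (k, j).
def rot(j, k, dim=9):
    r = np.zeros((dim, dim))
    r[j, k], r[k, j] = 1.0, -1.0
    return r

# The commutator of two "extra" generators (rotating axes 0 and 1
# against the ninth axis) closes back into an so(8) rotation:
comm = rot(0, 8) @ rot(1, 8) - rot(1, 8) @ rot(0, 8)
assert np.allclose(comm, -rot(0, 1))
```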
One easily sees how, for the 8-dimensional vector representation of sl(8), the generators can be expressed in a basis of symmetric traceless and antisymmetric matrices, where the latter form the subalgebra so(8) and the former are the generators that extend this to sl(8). One notes that one also gets a 63-dimensional real algebra (over complex matrices) that closes if one multiplies each of the extra 35 generators by $i$: the commutator of two antisymmetric generators remains antisymmetric, the commutator of an antisymmetric generator with an imaginary symmetric traceless one remains imaginary symmetric traceless, and the commutator of two imaginary symmetric traceless generators acquires an extra minus sign relative to sl(8), $[iS, iS'] = -[S, S'] \in so(8)$. The extra minus sign here also shows up in the signature of the quadratic invariant that can be formed from the generators, the Killing form $G_{mn} := \mathrm{Tr}(g_m g_n)$. Applying this "Weyl unitarity trick" to sl(8) produces the Lie algebra of the compact Lie group SU(8), i.e. su(8). Now, the so(8) (or "spin(8)") algebra is special in that there is a non-abelian $S_3$ symmetry that acts on its irreducible representations. This symmetry permutes the roles of the three inequivalent eight-dimensional irreducible representations, the "vectors", "spinors", and "co-spinors". Let us consider extending so(8) in a triality-symmetric way: not only with symmetric traceless matrices over the eight-dimensional vectors $v$, but simultaneously with symmetric traceless matrices over the eight-dimensional spinors $s$ and co-spinors $c$, giving an algebra of dimension $28 + 3 \cdot 35 = 133$. One of these three sets of 35 generators, taken together with so(8) (and a factor of $i$), forms su(8), while the other two each extend so(8) to sl(8). This then gives the noncompact real form $e_{7(7)}$ that shows up in four-dimensional N = 8 supergravity. So, in a sense, the $e_7$ algebra can be thought of as a generalized algebra of rotations with "siamese triplet" structure, three organisms conjoined at the so(8) heart, but functioning as one whole body. In order to make this work self-contained, we spell out the conventions underlying our construction of $e_{7(7)}$ in detail. These match [128], apart from index counting always being 0-based in the present article.
We start from the so(8) "Pauli matrices" $\gamma^i_{\alpha\dot\beta}$ in the conventions of Green, Schwarz, and Witten [41], but with all indices shifted down by 1, as it makes much more sense in a computational setting to consistently start index counting at zero. The nonzero entries of $\gamma^i_{\alpha\dot\beta}$ are all $\pm 1$, and we list them in the compact form $i\alpha\dot\beta\,\pm$, so e.g. $357-$ translates to $\gamma^{i=3}_{\alpha=5,\dot\beta=7} = -1$, etc. The Spin(8) traceless symmetric matrices over the spinors and over the co-spinors are "bosonic" objects, i.e. they cannot discriminate between a 360-degree rotation and the identity, so they must be expressible in terms of SO(8) representations alone. It so turns out that the $35_s$ and $35_c$ are equivalent to the self-dual, respectively anti-self-dual, four-forms of SO(8), and the corresponding Spin(8)-invariant dictionaries are the tensors $\gamma^{ijkl}_{\alpha\beta}$ and $\gamma^{ijkl}_{\dot\alpha\dot\beta}$. We choose the basis for $e_{7(7)}$ such that the last 28 generators (elements #105 to #132) form the so(8) subalgebra, elements #0 to #34 transform as symmetric traceless matrices over the spinors ($35_s$), elements #35 to #69 as symmetric traceless matrices over the co-spinors ($35_c$), and elements #70 to #104 as symmetric traceless matrices over the vectors ($35_v$). Thus, the last 63 elements form the subalgebra su(8), and the first 70 elements span the 70 non-compact directions of the coset manifold of supergravity scalars. An $E_7$ adjoint index $A$ accordingly splits into the four ranges $\underline{35_s}$, $\underline{35_c}$, $\underline{35_v}$, $\underline{28}$ (with the underline representing a single index that can be associated with a pair of so(8) indices). We furthermore choose the basis for so(8) in such a way that element $105 + n$, when acting on the vector representation of so(8), is represented as the rotation matrix $(r_{[jk]})^m{}_p = \delta^j{}_m \delta^k{}_p - \delta^j{}_p \delta^k{}_m$, with $0 \le j < k \le 7$ and $n = j \cdot 8 + k - (j+1)(j+2)/2$. For the three 35-dimensional symmetric traceless irreducible so(8) representations, we use the convention that the first 7 basis elements correspond to the diagonal matrices diag(1, −1, 0, . . . , 0), diag(0, 1, −1, 0, . . . , 0), . . . , diag(0, . . . , 0, 1, −1) (in that order), while element $7 + n$ corresponds to the matrix $(S_{(jk)})^m{}_p = \delta^j{}_m \delta^k{}_p + \delta^j{}_p \delta^k{}_m$, again with $0 \le j < k \le 7$ and $n = j \cdot 8 + k - (j+1)(j+2)/2$, and with a likewise lexicographical order for the corresponding non-diagonal parts of $S_{(\alpha\beta)}$ and $S_{(\dot\alpha\dot\beta)}$. With these conventions, the $e_7$ symmetric bilinear form obtained from the fundamental representation, $g_{AB} = T_{A\,C}{}^D\, T_{B\,D}{}^C$, is almost diagonal, with entries +96 for $A = B < 70$, entries −96 for $A = B \ge 70$, entries −48 for the non-orthogonal pairs of generators corresponding to the diagonal parts of the symmetric traceless matrices over the spinors and co-spinors (i.e. $g_{A=0,B=1}$, $g_{A=3,B=2}$, $g_{A=35,B=36}$, etc.), and entries +48 ($g_{A=70,B=71}$, etc.) for the diagonal part of the $35_v$ representation.
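The lexicographic index formula for the pairs $(j, k)$ can be checked by direct enumeration; a small illustrative sketch (not from the paper):

```python
# Check that n = j*8 + k - (j+1)(j+2)/2 enumerates the index pairs
# 0 <= j < k <= 7 in lexicographic order.
pairs = [(j, k) for j in range(8) for k in range(j + 1, 8)]
assert len(pairs) == 28  # matches the 28 generators of so(8)
for n, (j, k) in enumerate(pairs):
    assert n == j * 8 + k - (j + 1) * (j + 2) // 2
```

For example, the pair $(0, 1)$ maps to $n = 0$ and $(6, 7)$ maps to $n = 27$, as expected for 28 antisymmetric index pairs.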
In order to define the 56-bein, we need explicit generators for the pseudo-real 56-dimensional fundamental representation of $e_7$. In the expressions below, the Einstein summation convention does not apply to "technical" auxiliary indices that are set in typewriter font and do not belong to irreducible representations.
Given $T^{(E_7)}{}_{A\,B}{}^C$, the $e_7$ generator matrices $g$ used to define the scalar potential are $(g_{(n)}){}^C{}_B = T^{(E_7)}{}_{A=n\,B}{}^C$ for $n = 0, \ldots, 69$. These $56 \times 56$ matrices $g_{(n)}$ look as follows (with $Z(n)$ given by (2.6)):

B The Octonions and the spin(8) invariant $\gamma^i_{\alpha\dot\beta}$
This section provides a simple and fully self-contained instructive example of using TensorFlow to numerically solve tensorial algebraic constraints with very little mental effort. The basic techniques are essentially the same as the ones used for the main part of this work. Also, this section provides a detailed answer to the question of how two common conventions for the Octonions and spin(8) gamma matrices are related.
The Lie group Spin(8) has three inequivalent irreducible eight-dimensional representations: the vectors $8_v$, for which we use indices $i, j, k, \ldots$; the spinors $8_s$, indexed with $\alpha, \beta, \ldots$; and the co-spinors $8_c$, indexed with $\dot\alpha, \dot\beta, \ldots$. The spin(8) invariant $\gamma^i_{\alpha\dot\beta}$ provides a unique way to map a spinor and a co-spinor to a vector, $\phi(s, c)^i = \gamma^i_{\alpha\dot\beta}\, s^\alpha c^{\dot\beta}$. This means that we can use $\gamma^i_{\alpha\dot\beta}$ to define an invertible 8-dimensional product, that is, an 8-dimensional real division algebra. Now, in any explicit form of the (8,8,8) tensor $\gamma^i_{\alpha\dot\beta}$, such as the one given in [41], eq. (5.B.3), which has no reason to know about the division algebra interpretation, some arbitrary choice has been made for the vector space bases of the $8_v$, $8_s$, $8_c$ representations. Without loss of generality, we can identify the chosen basis of $8_v$ with the basis of the "output" vector space of the octonionic product as it is defined in [154], with the octonionic imaginary units $e_1 \ldots e_7$ satisfying: $e_j e_j = -1$; $e_j e_k = -e_k e_j$ for $j \neq k$; $e_1 \cdot e_2 = e_4$; and $e_i \cdot e_j = e_k \Rightarrow e_{i+1} \cdot e_{j+1} = e_{k+1}$ as well as $e_{2i} \cdot e_{2j} = e_{2k}$ (indices mod 7).
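These rules determine the full multiplication table. A short illustrative sketch (assuming imaginary-unit indices represented in the range 1..7) generates all oriented triples from $e_1 e_2 = e_4$ via the index-cycling rule and then verifies the index-doubling rule:

```python
# Generate the seven oriented octonion triples from e1*e2 = e4 using the
# cycling rule e_i e_j = e_k  =>  e_{i+1} e_{j+1} = e_{k+1} (mod 7,
# with indices represented in 1..7), then check the doubling rule
# e_{2i} e_{2j} = e_{2k}.
def cyc(x, s=1):
    return (x - 1 + s) % 7 + 1

lines = [tuple(cyc(x, s) for x in (1, 2, 4)) for s in range(7)]

# Close each line under cyclic rotation: e_i e_j = e_k also
# implies e_j e_k = e_i.
oriented = {(t[r], t[(r + 1) % 3], t[(r + 2) % 3])
            for t in lines for r in range(3)}

def product(i, j):
    """Return (sign, k) such that e_i * e_j = sign * e_k, for i != j."""
    for (a, b, c) in oriented:
        if (a, b) == (i, j):
            return (1, c)
        if (a, b) == (j, i):
            return (-1, c)
    raise ValueError("indices must be distinct and in 1..7")

double = lambda x: (2 * x - 1) % 7 + 1
# The doubling rule holds for every oriented triple:
assert all((double(a), double(b), double(c)) in oriented
           for (a, b, c) in oriented)
assert product(1, 2) == (1, 4)   # e1*e2 = +e4
assert product(2, 1) == (-1, 4)  # anticommutativity
```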
Then, as we want to retain orthonormality (but not necessarily handedness), we can try to find one O(8) basis rotation each for the spinor and co-spinor representations that brings the $\gamma^i_{\alpha\dot\beta}$ of [41] to the form given by the octonionic structure constants. Rather than parametrizing the orthogonal group directly, we design our optimization problem such that we in principle allow arbitrary elements of GL(8), but introduce a term in our objective function that punishes deviations from orthonormality.
The important point about the piece of code shown below is that, while it solves a numerical optimization problem in $2 \cdot 64 = 128$ parameters with both very good performance and accuracy, at no point did we need to provide any code that computes gradients. This is all handled by the TensorFlow framework.
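The pattern can be sketched in a few lines of TensorFlow. The following toy analogue is not the code from the paper: it replaces the gamma-matrix data by a randomly generated pair of bases and uses a single $8 \times 8$ matrix of parameters, and it is written for TensorFlow 2 eager mode rather than the graph/session style. It shows the essential idea: optimize over all of GL(8) while penalizing deviations from orthonormality, with all gradients supplied automatically by the framework:

```python
import tensorflow as tf

# Toy analogue of the basis-matching problem: find an orthogonal matrix m
# relating two bases, by unconstrained optimization over GL(8) plus a
# penalty term punishing deviations from orthonormality.
tf.random.set_seed(0)
n = 8
target, _ = tf.linalg.qr(tf.random.normal([n, n], dtype=tf.float64))
basis_a, _ = tf.linalg.qr(tf.random.normal([n, n], dtype=tf.float64))
basis_b = tf.matmul(target, basis_a)  # the two bases are related by `target`

m = tf.Variable(0.1 * tf.random.normal([n, n], dtype=tf.float64))
eye = tf.eye(n, dtype=tf.float64)

for step in range(2000):
    with tf.GradientTape() as tape:
        # Constraint violation: m should map basis_a to basis_b ...
        mismatch = tf.reduce_sum(
            tf.square(tf.matmul(m, basis_a) - basis_b))
        # ... while staying (approximately) orthonormal.
        non_ortho = tf.reduce_sum(
            tf.square(tf.matmul(m, m, transpose_b=True) - eye))
        loss = mismatch + non_ortho
    m.assign_sub(0.1 * tape.gradient(loss, m))  # plain gradient descent
```

Note that no hand-written gradient code appears anywhere; `tape.gradient` provides the full 64-parameter gradient of the combined objective.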

C TensorFlow code for Watershed Analysis
Finding saddles in the stationarity condition (2.8), as well as their principal axes, can in principle be done symbolically, at the level of tensor equations (requiring substantial effort), or by manually performing the backpropagation code transformation, and then once again on the backpropagated code (requiring substantial effort). The code below illustrates how TensorFlow allows one to achieve this objective with minimal additional coding effort (27 lines of code) if one already has code to compute the stationarity condition violation. The code shown here sets up a TensorFlow session context (roughly: an association between a graph describing tensor arithmetic operations and hardware resources) and then executes a sequence of independent minimization searches as specified, serializing results to the filesystem.

D Overview of solutions

The tag in the S-column uses the truncated, rather than rounded, value of the potential, in alignment with earlier articles that use this naming scheme; this has become necessary due to the large number of solutions without any unbroken gauge symmetry. The N-column is empty for (BF-)unstable critical points, and shows the number of unbroken supersymmetries for stable critical points; supersymmetric solutions are automatically stable. The H-column lists the residual gauge symmetry, where we make an effort to be specific about the actual group. So, if an so(3) subalgebra is embedded into so(8) in such a way that some particle state transforms as a spinor, we call the gauge group "Spin(3)" rather than "SO(3)". For a residual so(3) + so(3) algebra, we use the name "SO(4)" if all particle states can be arranged into representations of SO(4) (which would not hold e.g. for an isolated (3, 1) representation), and "SO(3) × SO(3)" otherwise. For U(1) factors, we indicate with a subscript the largest observed particle charge once the generator has been re-scaled to the minimal length that makes all charges integral.
So, for example, the gauge group of S0880733, which is associated with the Lie algebra so(3) + so(3), is "SO(4)", while the gauge group of S1075828 is "Spin(3) × U(1)$_4$".
The T-column indicates which subgroup of the "triality" outer automorphism group $S_3$ of so(8) the embedding of H is invariant under. Here, "VSC" means full triality invariance, while "SC" means invariance under an S ↔ C exchange, etc. The M-, P-, and D-columns provide different rough measures of the complexity of the solution. The M-column shows the dimension of the H-invariant submanifold M of the scalar manifold, if there is some residual gauge symmetry H. If one wanted to find exact expressions for the location and properties of a solution without resorting to inverse symbolic computation, one would want to coordinate-parametrize this manifold M; a critical point of the restricted potential on M is then guaranteed to also be a critical point on the whole scalar manifold [136]. The P-column lists the number of different numerical parameters used in this work to describe the solution. It is possible that, in some cases, one can use an SO(8) rotation to find an alternative form with even fewer parameters, so this value only gives an upper bound on how many coordinate parameters are necessary. The D-column shows the degree of the smallest polynomial with integer coefficients that could be found which has a zero at $-V/g^2$, if algebraic identification of the cosmological constant was successful given the number of available digits. An entry $a$ indicates an $a$-th order polynomial, while an entry $a_b$ indicates an $a$-th order polynomial in $x^b$. The A-column shows the negative decimal logarithm of the residual value of the stationarity condition violation $|Q_{ijkl}|^2$; so, an entry of 100 means that $|Q_{ijkl}|^2 < 10^{-100}$ for the numerical location data that have been made available alongside the preprint of this work. The p-column shows the page number on which the solution can be found, and the C-column lists major articles in which the corresponding critical point is discussed. A "*" indicates a new discovery.

E The list of solutions
For each of the 192 solutions, a location is given in the form of two symmetric traceless matrices, $M_{\alpha\beta}$ and $M_{\dot\alpha\dot\beta}$, which have been rotated to maximize (though in some cases perhaps not globally) the number of zero entries. The normalization of these matrices is fixed by a condition on the $e_7$ generator $G$ in the fundamental 56-representation that is parameterized by the matrices $M_{\alpha\beta}$, $M_{\dot\alpha\dot\beta}$. In the literature, locations of critical points are typically given in $\phi_{ijkl}$ four-form language, e.g. (4.7) in [152] for the solution S1400000; the dictionary to translate between $\phi_{ijkl}$ and $M_{\alpha\beta}$, $M_{\dot\alpha\dot\beta}$ is given by the $\gamma^{ijkl}$ dictionaries of appendix A. Precision numerical data on the location of each critical point are available in the TeX-source archive of this article on arXiv.org. This allows checking all claims made here about the existence of particular critical points to numerical machine precision with the code that has been made available as [156]. For some critical points, the numerical data provided are accurate to beyond 1000 digits, which should in some cases be sufficient to obtain algebraic expressions for the 56-bein matrix entries via inverse symbolic computation techniques, hence allowing automation(!) of the uplift to 11 dimensions along the lines of the construction presented in [120], [133]. Unfortunately, for about 40 critical points, the authors were not able to obtain precision data that go substantially beyond numerical hardware (i.e. IEEE-754 double float) precision.
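The inverse symbolic computation step mentioned above can be illustrated with mpmath's integer-relation tools. The sketch below is not the authors' pipeline; it fabricates a "numerically known" value with a known minimal polynomial and recovers an integer-coefficient polynomial for it via `findpoly` (which uses the PSLQ algorithm internally):

```python
from mpmath import mp, mpf, sqrt, findpoly, polyval

mp.dps = 60  # digits of working precision

# Stand-in for a cosmological-constant value that is only known
# numerically; here we use sqrt(3), whose minimal polynomial is x^2 - 3.
x = sqrt(3)

# Search for an integer-coefficient polynomial of degree <= 4 with x
# as a root; findpoly returns the coefficients (leading term first),
# or None if no relation is found at this precision.
coeffs = findpoly(x, 4)
assert coeffs is not None
assert abs(polyval(coeffs, x)) < mpf(10) ** -50
```

In practice one would feed in the high-precision value of $-V/g^2$ and scan over candidate degrees, which is exactly the information tabulated in the D-column of the overview.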