Systematics of Aligned Axions

We describe a novel technique that renders theories of $N$ axions tractable, and more generally can be used to efficiently analyze a large class of periodic potentials of arbitrary dimension. Such potentials are complex energy landscapes with a number of local minima that scales as $\sqrt{N!}$, and so for large $N$ appear to be analytically and numerically intractable. Our method is based on uncovering a set of approximate symmetries that exist in addition to the $N$ periods. These approximate symmetries, which are exponentially close to exact, allow us to locate the minima very efficiently and accurately and to analyze other characteristics of the potential. We apply our framework to evaluate the diameters of flat regions suitable for slow-roll inflation, which unifies, corrects and extends several forms of"axion alignment"previously observed in the literature. We find that in a broad class of random theories, the potential is smooth over diameters enhanced by $N^{3/2}$ compared to the typical scale of the potential. A Mathematica implementation of our framework is available online.


Introduction
Fields protected by exact or approximate shift symmetries play an integral role in a wide variety of physical phenomena, ranging from magnetic fluctuations of topological insulators in condensed matter physics to inflation in early universe cosmology. Axion fields are a particularly interesting example: to all orders in perturbation theory they have a continuous shift symmetry which is broken to a discrete one by non-perturbative effects. Axions were originally proposed to solve the strong CP problem of QCD [1], and many (∼ 100) axions often arise in compactifications of string theory (see e.g. [2][3][4][5][6][7][8][9][10][11][12]). Axions can be dark matter [13][14][15][16][17], drive inflation [18][19][20][21][22][23][24] and (similar to quantized fluxes [25]) can account for the observed vacuum energy [26]. In order to analyze the interplay between these distinct phenomena, one requires a comprehensive framework for multi-axion theories. The purpose of this paper is to present such a framework, which can be employed to unify the cosmological mechanisms mentioned above. Theories of N 1 axion fields constitute an extremely complex "landscape"that is, they have an exponentially large number of minima with different energies and a large diversity of regions of the potential. We will study general multi-axion theories, providing a systematic approach that renders even complex theories analytically tractable. 1 In this paper we focus on properties of the axion potential. We identify all exact and approximate shift symmetries, provide the location of local minima and characterize features of the potential through a natural partition of the axion field space. In a companion paper [27] we study the dynamics of these theories in the context of cosmology. A brief summary of our results can be found in [28]. In this paper and its companions we find that generic theories of N ∼ 100 axions with a single energy scale close to the fundamental scale and with O(1) random coefficients can simultaneously account for the Big Bang (tunneling from a parent minimum), inflation (because such potentials generically have light directions with enhanced field ranges), "fuzzy" dark matter [13] with roughly the correct abundance [14,15], and provide many minima with energies consistent with observation that can solve the cosmological constant problem anthropically [29][30][31][32][33]. 2 This paper is divided into three parts. In §2 we identify the exact and approximate discrete shift symmetries of multi-axion theories by introducing a P -dimensional auxiliary field space. We discuss how in many cases the approximate shift symmetries can be used to eliminate all phases from the potential to good accuracy. Following the discussion of symmetries, the two subsequent sections can be read independently. In §3  we apply the framework of symmetries to locate the critical points of the potential. We provide an algorithm that finds all minima in exponential time, while a representative sample of all minima can be obtained in polynomial time. More specifically we demonstrate how an exponentially large number of minima can be located analytically via a polynomial in N algorithm. We estimate the magnitude of the remaining phases, as well as the number of minima in certain ensembles of random axion theories. In §4 we provide a general discussion of the geometry of the approximately quadratic regions of the potential surrounding minima. This discussion generalizes and corrects misleading prior results in the literature (including those of one of the authors [35]).
A Mathematica demonstration of our framework for multi-axion theories is available online [36].

Systematic framework
Consider the general two-derivative theory of N axions θ i . The N continuous shift symmetries of the free theory are broken to discrete ones by P leading non-perturbative effects. 3 The Lagrangian takes the form Λ 4 I 1 − cos (Qθ + δ) I + . . . , (1.1) 3 In some cases the shift symmetries are entirely broken for some linear combinations of fields by couplings to classical objects, such as sources or fluxes. In this work we restrict our attention to fields that retain discrete shift symmetries. Appendix B discusses the appropriate coordinate transformations that eliminate axions that couple to classical sources from the theory.
where K is the metric on field space, Q is the P × N integer matrix where the Ith row contains the charge associated with the axions' coupling to the Ith non-perturbative effect, and Λ I is the energy scale of this effect. The dots denote subleading terms in the potential that we will generally neglect (but see §3.7), as well as a possible additive constant (a bare cosmological constant) that will be irrelevant in this paper, as we do not consider coupling to gravity here (but see [27]). We denote matrices and vectors by bold font, with upper and lower indices identifying rows and columns, respectively. The lower case indices i, j run from 1 to N , a runs from 1 to P − N , and I runs from 1 to P . 4 Throughout this work we assume that K is independent of the axions θ.
Potentials of the form appearing in (1.1) are very complex when N, P 1. However, because the cosine arguments consist of integer linear combinations of the axions, the potential is manifestly invariant under the N discrete shifts θ i → θ i + 2π. 5 The existence of exact discrete symmetries is a fundamental characteristic of axion theories, and it allows us to restrict our attention to a finite region in field space -a single periodic domain defined by these symmetries. What is not so obvious is that the potential in (1.1) additionally possesses as many as P −N approximate discrete shift symmetries, which can be used to eliminate all phases δ I to good accuracy. We develop a framework that identifies these symmetries, and provides a natural division of the field space into domains over which none of the individual terms in the potential exceed their period. As we shall see, the identification of approximate symmetries consists of finding short lattice vectors of a P -dimensional rank P − N lattice, which at fixed P − N requires a number of evaluations that scales polynomially in N . The (approximate) symmetries then allow us to identify regions that are very similar. Furthermore, our framework allows us to identify a vast number of distinct minima by considering the approximate symmetry transformations away from a given minimum.
Axions are protected from perturbative corrections of the potential and therefore constitute prime candidates in the constructions of models for large field inflation and tests of quantum gravity more generally [37][38][39][40][41][42][43][44][45][46]. Large field inflation requires very flat potentials, therefore there has been much interest in the invariant distances over which axion potentials remain featureless [47][48][49][50][51][52]. The potential certainly is featureless in a field space region within which none of the cosine terms traverses more than its period. The invariant size of these regions depends both on the kinetic matrix K and the charge matrix Q. Historically, two mechanisms have been proposed to construct theories with potentials that remain featureless over large invariant distances: lattice alignment [20], which relies upon an almost exact degeneracy between the axion charges, and kinetic alignment [47], which relies upon the delocalization of eigenvectors of the kinetic matrix. In this work we clarify the relation between these mechanisms and demonstrate that the diameter of featureless regions is bounded from above by where λ min ( · ) returns the smallest eigenvalue and we defined |Q| ≡ √ Q Q. The bound (1.2) is approximately saturated in large classes of axion theories. Note that the diameter, perhaps surprisingly, scales with √ P ≥ √ N .

Results for random ensembles
To illustrate our framework we apply it to random ensembles of axion theories defined by a collection of integer charge matrices Q, energy scales Λ 4 I and axion-independent field space metrics K that are loosely motivated by flux compactifications of string theory. While our techniques apply to all N, P , they are most powerful in the regime N 1 and P < 2N . We will take Q to be a P × N matrix of independent, identically distributed (i.i.d.) random integer entries with mean zero and standard deviation σ Q . As long as at least a small fraction 3/N of the entries is non-vanishing -which at large N includes very sparse matrices -the universality of random matrix theory takes over and yields simple analytic results. 6 As it turns out, in this regime the approximate shift symmetries become exponentially close to exact.
Even for the simplest case P = N + 1 we will find that the number of minima scales factorially with the number of terms in the potential (see also [26]), with a simple generalization to larger P − N . In these potentials there is a natural definition of neighboring minimum: those that are separated by no more than one traversal of the maxima of each cosine term of the potential. We will find that even when P = N + 1 the neighboring minima realize a wide variety of energy densities so long as N 1. In other words these theories have extremely complex potentials that look random in the vicinity of any point or along a randomly chosen direction. However, they also possess nearly exact symmetries that make their analysis tractable.
In particular, we can use the symmetries to identify minima with energy close to any desired value to exponential accuracy in polynomial time [28]. This kind of tractability in complex landscapes was recently discussed in [53].
We will consider both specific examples and ensembles of isotropic, positive definite kinetic matrices and parametrize the resulting field space diameters in terms of the largest eigenvalue f 2 max of K. Both the field space diameters and the distribution of energies in minima have previously been studied in such random axion theories. In this work we unify and generalize many of those results. We find that the field space distance suitable for inflation is typically as large as (see also [49]) This result is robust even when large hierarchies are present between the energy scales Λ 4 I . The approximate shift symmetries are lost in the limit P N . In this case the potential ceases to be analytically tractable and approaches a Gaussian random field instead. In appendix G we discuss a connection between multi-axion theories in this limit and Gaussian random fields with a Gaussian power spectrum.

Exact and approximate axion symmetries
The leading non-perturbative potential for the N axions θ in (1.1) is where we postpone the discussion of non-vanishing phases δ I to §2.5. This potential depends on the P energy scales Λ I and the P N integers in the charge matrix Q. At large N the field space volume becomes large, but given the limited number of parameters and the periodicity of the cosines, one might suspect that the entire structure of the potential is analytically tractable, at least so long as P is not too great. In the following we will make this expectation precise.

Auxiliary fields and a geometric picture
To analyze (2.1) it turns out to be useful to consider a set of P real scalar fields φ, subject to an auxiliary potential Comparing to (2.1) we observe that the argument of the Ith cosine (Qθ) I has been replaced by an independent field φ I . Hence the physical potential (2.1) is identical to (2.2) if φ I = (Qθ) I , or more compactly where this equation defines a hyperplane Σ in the auxiliary field space R P (which we call φ-space). Specifically, note that is a linear combination of the columns Q j of Q. The surface Σ is the hyperplane spanned by these columns (the column space of Q ≡ colsp(Q)), and (2.1) and (2.2) coincide when φ is constrained to Σ : For this reason we refer to Σ as the constraint surface (cf. Figures 2 and 3). An efficient way to impose this constraint on φ is to introduce P − N Lagrange multiplier fields into the action; we will do so explicitly in §2.5. For P ≥ N the dimension of Σ is N if the columns of Q are linearly independent, and so the map is one-to-one. In this case Q is called full rank. We can assume this to be true without loss of generality and will do so from now on (cf. appendix A). The utility of framing the problem in the extended P -dimensional space stems from the fact that the symmetries of V aux are manifest: φ I → φ I + 2πn I with n I ∈ Z, so that V aux is identical in P -cubes of side-length 2π that we take to be centered on the sites of the scaled integer lattice 2πZ P . Each cube can be labeled by an integer P -vector n: where the ∞ -norm of a vector returns its largest absolute component. 7 Within a single 7 Clearly the ∞ -norm is basis dependent. We denote the basis of a vector by its symbol, i.e. φ is P -cube the potential V aux is relatively featureless and every P -cube contains a single minimum at its center φ = 2πn where V aux = 0. Points where Σ passes through the center of a P -cube are therefore global minima of the physical potential. This set of points forms a sublattice L Σ ≡ Σ ∩ 2πZ P . It is simple to see that this sublattice is rank N if Q has integer entries and full rank. 8 The auxiliary potential is manifestly invariant under shifts between any pair of such points, and therefore so is the physical potential V . In other words, this sublattice defines the N exact shift symmetries of (1.1).
The tiling of φ-space into P -cubes (2.6) provides a useful way to divide the physical field space into distinct regions. The constraint surface Σ slices across the cubes, and the regions of intersection of Σ with various P -cubes are an N -dimensional tiling of Σ (see Figure 3). Within each tile the potential is relatively smooth because none of the individual cosine terms in (2.1) traverses its respective period. We can label each tile by the integer P -vector n of the corresponding P -cube (2.6): a vector in the standard basis for φ-space, while θ is a vector in the standard basis for θ-space (R N ). 8 One might worry that GL(N, R) transformations of the axion fields do not preserve the fact that Q has integer entries. In fact, the necessary and sufficient condition on Q such that L Σ = colsp(Q) ∩ 2πZ P is rank N is that P = Q(Q Q) −1 Q , which is the orthogonal projector onto colsp(Q), contains only rational entries. This property is preserved under GL(N, R) transformations. Figure 3. Left: Contour plot of the auxiliary potential on the constraint surface Σ in φcoordinates, for an example with N = 2, P = 3. The solid black lines denote one periodic domain of the potential, while the dashed gray lines denote the boundaries of the tiles defined in (2.7) (the intersections of the cubical tiling (2.6) of the auxiliary potential with the constraint surface). The depicted cubes constitute a full set of those with distinct intersections with Σ. Right: Contour plot of the physical potential and its tiles in aligned coordinates ω (defined later in (2.18)). Opposing boundaries of the periodic domain are identified.
Not every integer P -vector n corresponds to a tile because some P -cubes do not intersect Σ.
When some or all of the angles of Σ with respect to the grid defined by (2.6) are small, one expects that at least some shifts from an initial tile to an adjacent one (adding 1 to one of the components of n) will leave the physical potential close to invariant. This is the case, but we will see in §2.3 that we can define a different set of shifts that are in general closer to exact symmetries than these, and have the desirable property that they form a complete (but not overcomplete) basis for the set of all distinct tiles (2.7).

Exact periodicities and periodic domains
As discussed above, displacements between points in the sublattice L Σ ≡ Σ ∩ 2πZ P leave V aux unchanged and lie within the constraint surface Σ. These are the exact shift symmetries of the physical potential.
In general, a basis for a rank M lattice is a set of M linearly independent vectors with the property that integer linear combinations generate all lattice points. Consider an M -parallelepiped, with edges defined by the basis vectors of a lattice. This parallelepiped is a periodic domain for the lattice and contains exactly one lattice point. A simple example is a P -hypercube in (2.6) with e.g. n = 0, which is a periodic domain for the P -dimensional lattice 2πZ P . We will use the notation {t i } to denote the N integer vectors that generate the lattice Σ ∩ Z P (so that the vectors {2πt i } generate L Σ ). 9 A single cell of this lattice sublattice is a region in which all distinct features of the potential are captured -in other words, it is a periodic domain of the physical potential, and we can restrict our attention to one such cell.
Any basis {2πt i } for L Σ forms a primitive set for the full auxiliary lattice 2πZ P (see appendix C for a proof). This means there exists a set of P − N lattice vectors {t ∦ 1 , . . . , t ∦ P −N } that are not parallel to Σ and that when combined with the N vectors {t i } form a basis for Z P . It will be important in a moment that the P −N supplemental vectors are not unique. The only condition is that the matrix containing all P basis vectors is unimodular (determinant one with integer entries). Whenever the vectors t ∦ a are (at least somewhat) aligned with Σ, in a sense we shall make more precise below, we refer to this basis for Z P as the aligned lattice basis.

Approximate symmetries and well-aligned theories
In general the transverse lattice vectors t ∦ a will not be orthogonal to Σ. Their decomposition into Σ and its orthogonal complement Σ ⊥ will be important. We label the orthogonal projectors onto these subspaces by P and P ⊥ , respectively. Since Σ = colsp(Q), it follows that Σ ⊥ = ker(Q ) and we have the following explicit form of the projectors in terms of the charge matrix, Now consider a shift of the fields generated by the projection of a non-parallel lattice vector onto Σ : This shift is projected onto Σ and hence corresponds to a physical shift of the potential V = V aux | Σ . However it is not an exact symmetry because P t ∦ a is not an integer vector. The amount by which this shift breaks the symmetry is proportional the projection of t ∦ a onto Σ ⊥ : where in the second step we used that the potential is invariant under φ I → φ I + 2π. Therefore, if the integer vectors t ∦ a can be chosen so that each component of P ⊥ t ∦ a is much less than one -if P ⊥ t ∦ a ∞ 1 -the correction to each cosine term in (2.2) is small and the shift φ → φ + 2πP t ∦ a is an approximate symmetry. To identify both the exact and approximate symmetries, we choose a basis (2.8) for Z P which is as aligned as possible with Σ. The first N vectors t i lie in Σ (and are a basis for the lattice Σ ∩ Z P ), thus (2.14) These vectors generate the N exact shift symmetries of the physical potential and any parallelepiped with the t i as edges is a periodic domain of the potential. The remaining P − N vectors t ∦ a should satisfy The vectors 2πP t ∦ a generate P − N approximate symmetries of the potential. 10 The aligned lattice basis for a simple axion potential, one with P = 2 and N = 1, is shown in Figure 2. We refer to theories where the orthogonal projections of all elements of the aligned basis are small as well-aligned. That is, a well-aligned theory satisfies ) defines what is known as a reduced basis for the lattice generated by P ⊥ . We are purposefully vague in the precise definition of "smallest possible" and "reduced": there are multiple definitions, such as Minkowski, LLL, or Rankin reduction. For example, depending on the precise application, one may be interested in a basis that aligns only some of the P − N transverse directions. These details are irrelevant for the present discussion and we refer to the literature [55][56][57][58][59][60][61]. A particularly simple approximation is given by the Mathematica package for the extended LLL algorithm [54]. We thank Liam McAllister and John Stout for discussion on this point.
(cf. (3.28) for the origin of the 1/(P − N ) on the right-hand side).
To illustrate the utility of these approximate symmetries, suppose φ is chosen to be a global minimum of the potential that lies on Σ (for instance φ = 0). Repeated shifts by 2πP t ∦ a then identify the approximate location (and determine the energy, as we discuss in §3.3) of many physically distinct minima with slightly different properties (see Figure 2). We will see later that in the random ensembles we study P ⊥ t ∦ a ∞ can generically be chosen so that it is exponentially small in N , for all 1 ≤ a ≤ P − N . More generally it can be small when det(Q Q) is large, since this determinant appears in the denominator of the projector. This allows us to locate and characterize many minima in an otherwise intractably complex landscape very easily and to good accuracy. Furthermore, in well-aligned theories all P phases δ I can be eliminated to good accuracy (cf. §2.5).

Aligned coordinates and similar tiles
The tiling (2.7) of Σ was introduced in §2.1 as a means of delineating relatively featureless sections of the potential by using the basic infrastructure provided by the auxiliary lattice. Here we show how this tiling enables the identification of many similar regions within one periodic domain of the physical potential.
As a preliminary illustration of a more general method, consider the periodic domain surrounding a global minimum of the potential (say the origin θ = 0), and a P -cube centered at this position on Σ. Now choose a specific non-parallel lattice vector t ∦ a and consider the set of P -cubes obtained by sequentially shifting the center of each cube by 2πt ∦ a , together with the tiles defined as their intersections with Σ. In well-aligned theories the auxiliary potential evaluated on successive intersections in the list (and hence the physical potential in the corresponding tiles) will be very similar. Now, regardless of whether or not a model is well-aligned, after a certain number of shifts the P -cube will have receded far enough from Σ that it no longer intersects it. If the number of shifts before this point is m, we have identified m distinct tiles that are labeled by consecutive integer multiples of t ∦ a . 11 These tiles may be scattered across multiple periodic domains because accumulating shifts eventually push part or all of the intersection out of the original periodic domain. Shifts by integer linear combinations of the 2πt i can then be used to uniquely return all portions of tiles into the periodic domain containing the origin. All such tiles are distinct because they originated from distinct intersections with Σ.
A complete tiling of the periodic domain can be found by following a generalization of the procedure outlined above -shifting the P -cube containing the global minimum 11 Note that m ∼ 1/ P ⊥ t ∦ a ∞ , which is large in a well-aligned theory.  by linear combinations of all non-parallel directions 2πt ∦ a , and scanning this space until all intersecting P -cubes are identified. A more convenient labeling of the regions that tile the periodic domain is achieved by defining aligned coordinates ω for the auxiliary space: 12 φ ≡ t | t ∦ ω . (2.17) Recall that the matrix appearing in (2.17), which is identical to (2.8), is unimodular (determinant one) and thus has a unimodular inverse. Integer P -vectors in φ-coordinates are in one-to-one correspondence with integer vectors in ω-coordinates. The components of ω separate into components parallel and not parallel to Σ : A shift by 2π of any of the first N entries leaves the physical potential invariant, since this is a shift of φ by a 2πt i . The periodic domain in ω-coordinates is simply an N -cube of side-length 2π in the ω -plane. Since opposing sides of the periodic domain are identified, any point on Σ is readily identified with a corresponding point in the central periodic domain ω (mod 2πZ N ). It is likewise easy to recognize similar tiles in well-aligned theories: they correspond to P -cubes with similar ω ∦ -coordinates. The above is illustrated in Figure 4. Note finally that an advantage of using aligned coordinates is that distinct tiles within one periodic domain are labeled by distinct integer (P − N )-vectors m, 13 as P − N is the amount of t ∦ a s and integral shifts along the ω -directions do not change the tile.
Notational intermezzo -To simplify equations like (2.17), from now on we adopt the notation T αβ for a transformation between the components of a vector in two bases. The subscript pair is to be read left to right: "transform from α-coordinates to β-coordinates". In this notation (2.17) becomes The inverse transformation is simply When the transformation is between spaces of equal dimension as here (so that T is square), T φω = ( T ωφ ) −1 . When a transformation matrix's columns or rows separate in a useful way, as is the case here, the (rectangular) submatrices are labeled in the natural way : (2.20) We can apply this notation to the transformation (2.3) between the N -vector θ and the P -vector φ constrained to Σ, Combining the transformations (2.20) and (2.21) we can also relate the coordinates ω and θ, Finally, we can express the exact and approximate shift symmetries of the theory in terms of the θ-coordinates we started off with in (1.1) (in §2.2 and §2.3 these were only expressed in φ-coordinates). The exact symmetries are given by while the approximate symmetries are given by Returning to the tiling of the periodic domain, it is not hard to see that there are only finitely many tiles: consider sliding the center of a P -cube along any real linear combination of the t ∦ a (which corresponds to a line emanating from the origin in the (P − N )-dimensional ω ∦ -space) -a generalization of the illustration at the beginning of this section. As the perpendicular distance from the center of the cube to Σ is everincreasing along the line, eventually the cube will no longer intersect Σ. Thus there are only finitely many distinct tiles, which are labeled by certain integer (P − N )-vectors m. We now argue that the allowed values for m lie in a particular (compact) convex region C in ω ∦ -space. We define C by the set of all points in ω ∦ -space which correspond to centers of P -cubes in φ-space that have some intersection with Σ (note that this includes also vectors with irrational entries). This is the same as the collection of the ω ∦ -coordinates of all the points lying inside the P -cube of side-length 2π centered at the origin in φ-space, or yet in other terms, the orthogonal projection of that P -cube onto Σ ⊥ (in ω ∦ -coordinates). It follows that distinct tiles are labeled by distinct integer vectors m ∈ C with 14 (2.27) Quite clearly the extreme points 15 of C correspond to the projections of certain vertices 15 Extreme points of a convex set are points which are not interior points of any line segment belonging to the set. of the P -cube. Since a compact convex set equals the (closed) convex hull 16 of its extreme points by the Krein-Milman theorem [55,62], we have the following alternate definition of C : where Conv( · ) denotes the convex hull of a set. C is a polytope in the (P − N )dimensional m-space, illustrated in Figure 5. 17 To recap, the aligned coordinates ω are very convenient to identify a complete set of tiles covering exactly one periodic domain of the potential of (1.1). First, it suffices to fix ω = 0. This guarantees that only one periodic domain's worth of tiles will be counted, and the value of ω at a point is irrelevant to how or if a P -cube centered there intersects Σ. Therefore, all distinct tiles are labeled by the ω ∦ -coordinates of the centers of their P -cubes, and there is some compact and convex region C of ω ∦ -space that contains them all.
In general it is computationally very hard to identify the vertices that define the polytope C in (2.28). A simple sufficient condition for a lattice site 2πm to lie within the polytope is that its projections onto the φ-coordinate axes do not exceed π, while the inverse is not true. This subregion of the polytope is illustrated in Figure 5.
As mentioned above, another advantage of the ω-coordinates is that in well-aligned theories, regions of Σ corresponding to intersections with P -cubes that are close together in ω ∦ -space will be nearly identical, because they differ by only a small number of shifts by approximate discrete symmetries. The corresponding regions may be very far apart even after modding to one periodic domain in ω -space, because the shifts 2πP t ∦ a can be very long. Neighboring regions on Σ are not in general similar, while specific distant regions are. This is illustrated in Figure 1: there is no clear structure in the potential along an arbitrary ray in field space, but when considering lines that intersect widely separated tiles related by exact or approximate shift symmetries the potential becomes structured. 16 The convex hull of a set is the intersection of all convex sets containing that set. For the vertices of a polytope, the convex hull is the polytope. 17 In principle one could eliminate all the redundant points in the set of which the convex hull is being taken in (2.28), which correspond to vertices of cubes that lie on the constraint surface but which are not the only intersection of the P -cube with Σ (cf. Figure 5), and maintain the same polytope C. The amount of remaining (extreme) points of C is much less than the 2 P used in (2.28): we believe an upper bound scales only polynomially as P P −N −1 . However, we are not aware of a polynomial time Figure 5. Illustration of the ω ∦ -coordinates for every distinct tile on Σ, in an example where P = N + 2 = 8. The central six-sided region denotes the area bounded by the simple sufficient condition (2.29), while the full shaded region C contains all lattice sites whose Pcubes intersect the constraint surface. Blue crosses (large and small) denote the coordinates of centers of cubes which have a vertex that lies on Σ.

Phases
We now return to the phases δ appearing in the original Lagrangian (1.1), with potential Just as before, we promote the cosine arguments to P independent fields φ I that must be constrained to a hyperplane in order to reproduce the physical potential. That is, which defines a hyperplane parallel to Σ = colsp(Q), such that the constraint surface on which the auxiliary potential reproduces the physical potential (2.30) is Σ + δ.
To impose the constraint (2.31) in the action we introduce P − N Lagrange multipliers ν a : Here R is any (P − N ) × P matrix with the property that its row space is Σ ⊥ , the orthogonal complement to Σ. For instance, for R one could use any P − N linearly algorithm that can find C.
independent rows of the matrix P ⊥ = 1 P − P . The equations of motion for the ν a constrain φ − P ⊥ δ to be perpendicular to Σ ⊥ ; that is, they constrain φ to lie in Σ + P ⊥ δ = Σ + δ. In checking this the identity RP ⊥ = R is useful. Since P ⊥ δ is a vector in the (P − N )-dimensional subspace Σ ⊥ , the projection in (2.32) has already removed all but P − N of the original P phases (this reduction is simply the obvious freedom to continuously redefine the N fields θ in (2.30)). We will now demonstrate that in well-aligned theories, the remaining phases P ⊥ δ can be reduced to small values using the approximate shift symmetries of the theory. Consider the shift φ → φ + 2π T ω ∦ φ n δ , where n δ is an arbitrary integer (P − N )-vector. This shift is an exact symmetry of the cosines, but affects the constraint terms in (2.32) : To identify the integers n δ that minimize the remaining phases in (2.33), recall the relation between the aligned coordinates ω and φ-coordinates Using these definitions, the vector n δ that minimizes the remaining phases is where [. . . ] n.i. denotes the nearest integer vector. Using P ⊥ = P ⊥ T ω ∦ φ T φω ∦ P ⊥ , one can see that this choice of n δ reduces the phases to zero with an error bounded above by π P ⊥ T ω ∦ φ ∞ , which is small when the theory is well-aligned. 18 Explicitly, the field redefinition θ → θ + θ shift that reduces the phases, and the 18 The ∞ -norm of a matrix A can be defined as the maximum absolute row sum, A ∞ = max i j |A i j | . After the shift specified by (2.35), the remaining phase is δ r = 2πP ⊥ T ω ∦ φ α for some (P − N )-vector α with α ∞ ≤ 1/2. We may bound the magnitude of the largest component of this remaining phase by using the general inequality Av ∞ ≤ A ∞ v ∞ for any matrix A and vector v: δ r ∞ ≤ π P ⊥ T ω ∦ φ ∞ . To make the connection with our definition of well-aligned remaining phases are given with (2.21) and (2.35) by From now on, to simplify the discussion we will focus on well-aligned theories where we can neglect the phases. As we shall see in §3.6 this holds to good accuracy for large classes of axion theories with P 2N . An explicit example illustrating the use of the aligned lattice basis and phase reduction can be found in appendix D.

Minima and saddle points
In the previous section we identified the most suitable basis to identify the symmetries, treating all P terms in the potential on equal footing. These symmetries can be employed to systematically explore all potential minimum locations, which in principle yields all minima to arbitrary accuracy. In typical applications, however, it may be more efficient to include other data as well, such as the scale of each of the nonperturbative terms. In order to find the stable minima, for example, non-perturbative terms that are entirely subleading will have essentially no effect on the location of the minimum, but may split the degeneracy between minima as discussed in [26] and reviewed in §3.7. We will now discuss how the minima of a given axion theory can be determined to various degrees of accuracy.

Systematics of all minima
In §2 we described how to decompose the field space into tiles, each of which corresponds to an intersection of Σ with a distinct P -cube in φ-space centered on a lattice point of 2πZ P . The auxiliary potential V aux (φ) is identical inside all P -cubes, but they can have a distinct (or empty) intersection with the constraint surface, and therefore contain a distinct region of the physical potential V (θ). Just as already done in §2.5 we introduce P − N Lagrange multipliers ν a that enforce the constraint of the auxiliary potential to Σ. To find extrema on Σ within a tile labeled by m we then minimize the potential, within the corresponding P -cube φ = 2π T ω ∦ φ m + δφ, where δφ ≤ π. Requiring a vanishing gradient gives Since R is a set of row vectors that span Σ ⊥ , the first condition is the requirement that the gradient of V aux is perpendicular to Σ -in other words, that the gradient projected onto Σ vanishes. The second condition ensures that the point is in Σ. Solving the optimization problem (3.2) within all P -cubes labeled by m ∈ C yields all distinct extreme points, including all minima.

Neighboring minima
The aligned basis is ideally suited to identify similar regions of the axion potential. These tiles need not be close to one another in the physical field space, as we saw in §2.7. Recall that this is because the t ∦ a may contain large integers and therefore generate a large separation between the tiles' associated P -cubes (in φ-coordinates). This means that similar tiles are not generally immediate neighbors. For the purpose of this paper, we define immediately neighboring minima to be those whose P -cubes share a face or corner.
A minimum located at φ, which is within in the P -cube labeled by n = [φ/(2π)] n.i. , has 3 P neighboring P -cubes labeled by only some of which have a non-vanishing intersection with Σ. In order to identify all immediately neighboring minima we have to consider all neighboring P -cubes (3.3) that do intersect Σ. That means we need to consider all vectors e that correspond to lattice sites which satisfy 2π T φω ∦ P ⊥ (n + e) ∈ C (cf. (2.28)). Again, a simple sufficient condition is given by The precise locations of neighboring minima can be found by solving the optimization problem (3.2) with φ = 2π(n + e) + δφ for each candidate e.

Minima to quadratic order
In principle we can solve (3.2) and determine the location of all minima in the theory. However, although we can certainly solve (3.2) in any one P -cube to arbitrary accuracy, the very large number of domains make this impossible at large P . Instead, we will Figure 6. Contour plot of the relative error made by expanding cos(φ 1 )+cos(φ 2 ) to quadratic order. The dashed lines indicate, from inner to outer, the levels of 0.25, .5, 0.75 and 1 relative error, respectively. The maximum relative error is 1.5. The quadratic domain is indicated by the gray square of side length π while the periodic domain is the whole box. employ a quadratic expansion of the potential and the approximate symmetries to find an analytic expression for the approximate location of many minima.
The auxiliary potential has one single minimum located at the center of each Pcube, around which we can use a quadratic expansion. We will refer to the (somewhat arbitrary) region within which the quadratic expansion is a good approximation as the quadratic domain. Since the non-perturbative potential consists of simple cosines, we define the quadratic domain as such that the relative error made never exceeds 25% within that region. This choice might change if the underlying periodic function deviates from a cosine, but we chose it with some foresight in a way that this region will, in well-aligned theories, capture many minima of the full non-linear potential. The periodic and quadratic domains along with the relative error made by approximating cosines by a quadratic function are illustrated in Figure 6. The quadratic expansion dramatically simplifies the problem of finding minima. Consider a small displacement δφ from the auxiliary lattice point 2π T ω ∦ φ m. The potential in the corresponding quadratic domain evaluates to Figure 7. The blue shaded region illustrates the polytope C in ω ∦ -coordinates containing all lattice points corresponding to distinct tiles for an example with P = N + 2 = 8 (cf. Figure  5 where the same region is shown). The central six-sided region is bounded by the 2P halfplanes that determine the validity of the quadratic approximation (3.10). Tiles that contain a minimum as determined by numerical minimization are denoted by blue crosses, and nearly perfectly overlap with the quadratic region.
so a vanishing gradient is implied by the conditions (3.7) Solving this system of equations for the location of a minimum φ m on the constraint surface gives 19 where 1 − ∆ ⊥ is a non-orthogonal projector onto the constraint surface: Scanning over all m, we can check which φ m lie within the quadratic domain of their respective lattice site 2π T ω ∦ φ m; namely, within the intersection of 2P half-planes, (3.10) For these minima we will have succeeded at finding an approximate location of the constrained system. The energy density is given by where we used the specific choice R = T φω ∦ to simplify the expression. In Figure 7 we illustrate the tiles in which there is a minimum located by numerically minimizing the potential, along with those for which the quadratic approximation predicts a minimum (i.e. predicts a minimum located within the quadratic domain where the approximation is self-consistent). In general these sets of points are not immediately related, but for explicit examples we typically found a substantial overlap.
Finally, let us comment on the special case of equal scales Λ I = Λ and P = N + 1. In this case the energies of the minima are where c is an integer. The derivation of this result can be found in appendix E. The approximate signs in (3.12) denote the quadratic approximation which is valid for N vac ∼ √ det Q Q minima. No other approximations are made in (3.12).

A uniform sample over all minima
The most direct approach to find all minima is to consider the (P − N )-dimensional polytope C defined in (2.28). This region contains all vectors m ∈ Z P −N that label the tiles of one periodic domain of the potential constrained to Σ (see Figure 4). Minimizing the potential in all the tiles in C covers one entire periodic domain of the potential, yielding the set of all minima. However, the number of tiles is typically exponential in N or P , so even at moderately large values of these parameters this comprehensive approach becomes intractable. Instead, we can take advantage of the approximate symmetries of the potential that we identified. The auxiliary potential has minima only at the centers of each P -cube, which suggests that many tiles containing minima of the physical potential will occur in those P -cubes for which the constraint surface Σ passes through the quadratic domain near the center of the cube. Such tiles lie within a connected region in m-space; that is, a subset R ⊂ C. (This compactness property only applies for the specific tiling and labeling of the tiles defined by the aligned coordinates.) The region R contains exponentially fewer lattice sites than C, so the problem of comprehensively sampling R is much less computationally intensive, but still requires a number of computations that is exponential in P .
We can further simplify the problem if we are interested only in statistical properties of the potential for which a relatively small, but representative sample of distinct tilings suffices. Such a representative sample can be obtained by uniformly sampling over lattice sites contained within the polytope C, for example by performing a random walk that samples the polytope in a time polynomial in P − N [63][64][65]. A simpler but much more computationally intensive mechanism to uniformly sample a polytope would be to define a simple region that fully contains the polytope and sample that, rejecting any sample that is not contained in the polytope. Again, in order to determine a statistical sample of most of the minima it suffices to sample only the region R. The sampling techniques above apply for both the non-linear optimization problem of §3.1 and the analytic result for the energies of the minima in the quadratic approximation, (3.11).
We illustrate the distribution of energies at the minima for a specific theory obtained via three different approaches in Figure 8. The probability distribution of the energy density obtained by sampling a small number of all minima agrees well with the exact distribution. Furthermore, as expected, the quadratic approximation works best for relatively low minima, and becomes increasingly inaccurate for higher minima.
It is possible for the physical potential to have minima in tiles for which Σ does not intersect the quadratic domain (that is, tiles that are outside R). However such minima are rare, at least in the well-aligned regime N 1, P − N N . This can be understood qualitatively as follows. Each P -cube of sidelenght 2π can be decomposed various regions -the quadratic domain at the center φ ∞ < π/2 (which contains a fraction 2 −P of the volume of the cube), surrounded by rectilinear regions defined by allowing some of the components of φ to exceed π/2 in magnitude. The Hessian of the auxiliary potential (2.2) is This is positive definite precisely in the quadratic domain around the center. Because the Hessian of the physical potential is a projection of H aux onto Σ, any critical point of the physical potential that occurs in the quadratic domain must be a minimum.
For each component of φ that lies outside the quadratic domain, the auxiliary Hessian matrix (3.13) has an additional negative eigenvalue. Critical points of the physical potential in such regions can be minima (rather than saddle points) only if the negative eigenvalue(s) of the Hessian are projected out when the potential is constrained to Σ. For P − N N , only a small fraction P − N/P of the P directions are projected out and therefore it is unlikely (or impossible if P − N is less than the number of negative modes) that an critical point outside the quadratic domain will be a minimum. We have verified this expectation numerically. Therefore, in this regime the great majority of the exact minima lie within R.

Saddle points and maxima
The cosine function changes sign under a half-period shift, cos(φ) = − cos(φ+π), so the physical characteristics of maxima mirror those of minima. Due to the "1"s in (1.1), the global maximum has energy V max ≈ I 2Λ 4 I . (As explained below, this would be an equality if the phases were exactly δ I = π, and is a very good approximation in well-aligned theories.) All the techniques we apply to minima carry over to maxima and other critical points almost unchanged. In particular, the locations and energies of maxima and saddles can be found efficiently by using the approximate symmetries. In fact, the property of "well-alignedness" that allowed us to reduce the phases to values very close to zero allows us to set the phases to anything we like, subject to errors of order those in our original procedure. In other words, we can set the phases δ = δ arb + O P ⊥ T ω ∦ φ ∞ , where the P -vector δ arb denotes any arbitrary phases. Geometrically, this is possible because in well-aligned theories the angles between Σ and the grid in φ-space are small, so that Σ approaches very close to every distinct point in φ-space.
For studying maxima it is convenient to set all phases as close as possible to π.
Relating φ and θ in this way corresponds to shifting the centers of the P -cube tiling of φ-space so that they fall on global maxima rather than global minima -that is, the new cubes are centered on the corners of the original ones. With this change nearly every equation in this paper carries over unchanged or with the obvious changes from minima to maxima. In particular maxima have a quadratic domain defined in the same way as for minima in (3.5), as the cube of side-length π surrounding a now maximum of V aux , and they have identical statistics for their number, energies (except subtracted from V max rather than added to V min , etc. This trick of setting the phases to a desired value is also useful for studying saddle points of any given degree. For instance, to study saddles of degree one (critical points where the Hessian has 1 negative eigenvalue and N − 1 positive eigenvalues) we should set one phase equal to π and the rest as close as possible to zero. These points are the centers of the faces of the original cubes, and are points where the auxiliary Hessian (3.13) has precisely one negative mode (and the auxiliary potential has a degree one saddle). Tiles where Σ passes through the quadratic domain of these points often contain degree one saddles of the physical potential, and tiles that do not may not, for the same reason described in the previous subsection for the case of minima. Such degree one saddles occur between tiles that correspond to P -cubes that are neighbors along a face, and play a crucial role in the analysis of tunneling transitions (cf. [28]).

Estimates in random ensembles
In the previous section we discussed how to systematically enumerate and locate all minima of a given axion theory. We found an analytic expression for their energy densities in the quadratic approximation. We now turn to a discussion of the expected number and distribution of minima in ensembles of random axion theories.
We define these theories by ensembles of random integer charge matrices Q and energy scales Λ 4 I , as discussed in §1. To repeat our assumptions, Q is a P × N matrix of independent, identically distributed random integer entries with vanishing mean and standard deviation σ Q . We assume the universal limit of random matrix theory such that the precise distribution (including the fact that the entries of Q are integer) becomes irrelevant and all expectation values only depend on σ Q . This assumption roughly holds whenever 3/N of the entries of the charge matrix are non-vanishing, and the distribution of the entries is not heavy-tailed. The field space metric is irrelevant for the discussion in this section.
In general it is a very difficult task to analytically obtain the distribution of minima, or even the number of stable minima. Even in the quadratic approximation this problem amounts to determining the number of lattice sites within a non-trivial high-dimensional polytope defined by (3.10). In this section we therefore mostly restrict our attention to the simplest case of one single auxiliary field, P = N + 1 and equal scales Λ I = Λ, unless otherwise noted. We will find that the energy density (3.12) is valid for superexponentially many minima.
3.6.1 The quantity and energies of minima for P = N + 1 We now determine the number of distinct minima N vac that are well-approximated by the quadratic expansion in (3.12). This count is simply given by the number of sites of the P -dimensional, rank P − N = 1 sublattice δφ m = 2πP ⊥ T ω ∦ φ m that are contained within the quadratic domain δφ m ∞ ≤ π/2. When all phases in the original Lagrangian exactly vanish there exists a two-fold degeneracy of all minima. If the phases do not precisely vanish, they can be absorbed up to a finite remainder that is typically of order the change of δφ between similar minima, see §2.5. To further simplify the problem we assume identical scales Λ I = Λ. The number of distinct minima in the quadratic domain is then simply (3.14) Note that since Q has independent, identically distributed random entries, P ⊥ projects onto a random direction that is isotropically distributed, hence the vector P ⊥ T ω ∦ φ is isotropically distributed, with (E.2) its two-norm is given by and we defined the positive integer c as in (E.2). A vector that is distributed isotropically on the sphere consists of independent, normally distributed entries. Matching the expected norm of that vector to (3.15) then determines the distribution of the entries, where N (0, σ) denotes a normal distribution of mean zero and standard deviation σ. It is now straightforward to evaluate the median of the largest absolute entry of P ⊥ T ω ∦ φ , which yields the number of minima as 17) where (N ) ≡ √ 2 erf −1 (2 −1/N ) is the median largest absolute entry of an N -vector with entries that are unit normal distributed. In (3.17) we used that c is an order one integer, which we confirmed in extensive simulations for the ensembles under consideration.
The matrix Q Q is a real Wishart matrix, the determinant of which is distributed as the product of P − N chi-squared random variables with P , P − 1, . . . , P − N + 1 degrees of freedom, respectively [66], which gives for our case Finally, we find a simple expression for the expected number of minima, which is exponentially large in N in the universal regime, where at least a fraction 3/N of the entries in Q are non-vanishing. To give a sense of these numbers, with P = N + 1 = 150, and σ Q = 1, one obtains N vac ≈ 10 131 . Note that this result was only derived for P − N = 1, but as we discuss below we expect similar results to hold more generally (see (3.30)). The scaling with N is identical to that observed in [26] for a specific case where P N . Finally, let us estimate the energy levels at which the minima arise. To that end, we will assume that the distribution of minima can be well-approximated by all lattice sites that lie within the quadratic domain, i.e. |δφ I | ≤ π/2. Since the displacements δφ are proportional to P ⊥ T ω ∦ φ , which by (3.16) is roughly normal distributed, we can easily estimate the typical magnitude of the entries of the displacement vector, when the largest component is π/2, |δφ I | ≈ π 2 (P ) . (3.20) The maximum energy density in a minimum is therefore well-approximated by 20 where we used (N ) ≈ 3 for N ∼ 100 in the last approximation, and used the mean of the potential V = P i=1 Λ 4 I . Since the potential is simply quadratic the median 20 Note that this expression applies for general P . energy density is given by

Hessian eigenvalues
Beyond their energies, another interesting characteristic of critical points is the spectrum of eigenvalues of the Hessian. If the kinetic matrix is K ij = f 2 δ ij , the canonically normalized fields are Θ ≡ f θ. Defining Q ≡ K −1/2 Q = Q/f , the potential in canonically normalized coordinates is The eigenvalues of the Hessian of this potential at a critical point are then the masses of the canonical fields at that point. In §4 we will perform a more detailed analysis of the size of the tiles surrounding minima and the masses of the canonically normalized fields for various less trivial choices of kinetic matrix. Expanded around a minimum labeled by m, the Hessian of (3.23) is For simplicity let us take all Λ I = Λ. In that case Q Q is a Wishart matrix, and at large N the empirical density (i.e. the amount of eigenvalues in a small interval) follows the Marchenko-Pastur distribution [67]. For a Wishart matrix with standard deviation 1, the mean smallest eigenvalue is O(1/N ) while the largest is O(N ) (if P = N + 1 the precise values are 1/4N and 4N , respectively). The Marchenko-Pastur distribution has a sharp peak near the minimum and a long tail to larger values. The mean and median are both of order N . Putting the dimensions back in, this means the masses will range from σ 2 As mentioned previously, the Hessian on a degree one (one negative mode) saddle is of interest for questions involving tunneling from one minimum to another [28]. Such saddles are most easily analyzed by setting one phase to π and the rest to zero. This corresponds to considering points where Σ passes close to the center of one face of the P -cube (note that such points are degree one saddles of V aux ). Since Q is isotropic it does not matter which phase we set to π. Choosing the first one, it is easy to see that This is not a Wishart matrix and we are unaware of any analytic results for its eigenvalue spectrum. It has at most one negative eigenvalue. If the smallest eigenvalue λ min turns out to be positive, this critical point is in fact a minimum rather than a saddle. Numerically we established that the mean and standard deviation of the minimum eigenvalue are Hence, at large N is is extremely unlikely that there is no negative eigenvalue and the would-be degree one saddle is actually a minimum. This at least partially confirms the expectation explained in §3.4, that most local minima occur in tiles where the constraint surface intersects the quadratic domain of the minimum of V aux , rather than in neighboring regions such as these. Similarly most saddles of degree k will occur in regions where Σ intersects the quadratic domain of a degree k saddle of V aux . The relation between the number of saddles N (k) of degree k and the number of minima can then be estimated from (3.25):

Neighboring minima
In the previous section we estimated the total number of distinct, non-degenerate minima in the entire potential. For some questions one might however only be interested in the immediate neighborhood of one particular minimum. We therefore turn to determining the expected number of immediate neighboring minima, including degenerate ones. To allow for a simple estimate, consider all sites neighboring the origin, n = 0+e, with unit or zero entries for e as in (3.3). Let us count the immediate neighbors that lead to a minimum within the quadratic domain (3.5), (3.26) We verified numerically that in the universal limit the entries of the orthogonal projector matrix P ⊥ have variance (P − N )/P 2 . Using the central limit theorem and denoting The median largest entry of P ⊥ e evaluates to (P ) n e (P − N )/P , such that for P 2N a large fraction of the 3 P neighbors are stable minima, i.e. P ⊥ e ∞ π. Therefore, not only is the total number of minima extremely large, but each minimum has a vast number number of immediately neighboring minima.

Phases
As discussed in §2.5, in well-aligned axion theories the N exact and P − N approximate shift symmetries allows one to set N phases to precisely to zero and make the remaining P − N phases very small. The accuracy to which all phases can be eliminated, as measured by the largest remaining phase δ max , depends on how aligned the basis is: In order to get some analytical intuition for how well-aligned theories in our ensemble tend to be, let us assume that the vectors P ⊥ t ∦ a are orthogonal and are the shortest they could possibly be, as in (3.15), and that the volume of the cubic, but arbitrarily oriented periodic domain of the lattice generated by P ⊥ is given by det Q Q −1/2 .
reproducing (3.19) in the special case P = N + 1. We can furthermore easily obtain the largest components of the orthogonal projections of the aligned lattice basis, .
(3.31) Using Stirling's approximation we observe that when random matrix universality applies the basis is well-aligned for P ≈ N , and for order unity σ Q the basis ceases to be well-aligned with growing P at P 2N . We illustrate how the largest phase increases with the number of non-perturbative terms along with the analytic estimate (3.31) in Figure 9.

Band structure of subleading terms
Finally, let us address the last feature of the axion Lagrangian that we ignored so far; the subleading terms in the non-perturbative potential, denoted only by ellipses in (1.1). Explicitly, we have the full axion potential where we introduced a subleading potential −Λ 4 sl ≤ V sl ≤ Λ 4 sl of scale Λ 4 sl that is negligible compared to the leading P terms in the non-perturbative potential. Remember that we chose coordinates such that θ i → θ i + 2π are the discrete shift symmetries respected by the full theory, such that also V sl (θ) breaks any larger symmetries respected by the P leading terms to those fundamental symmetries. If there are any shift symmetries respected by the P leading terms that are broken by the subleading potential this will Potential (a.u.) result in a multiplicative increase in the number of distinct minima, as discussed in [26]. This effect is related to, but distinct from the mechanism discussed thus far. The leading potential is invariant under the P shifts (Qθ) I → (Qθ) I + 2π, which generates an N -dimensional lattice denoting the shift symmetries in terms of the θcoordinates. In the notation of §2 a basis for this lattice is given with (2.22) by B = T ω θ , i.e. the leading potential is (minimally) invariant under the N shifts θ → θ + 2πB i . The subleading potential, however, is only invariant under shifts on the integer lattice 2πZ N . This means that the periodic domain of the full potential contains N sl = 1/ √ det B B periodic domains of the leading potential (as this is the inverse volume of that domain). If the leading P terms in the non-perturbative potential contain N Q minima, each of these minima degenerates into N sl distinct minima due to the further symmetry breaking in the subleading potential. The total number of minima therefore becomes We illustrate this energy level splitting in Figure 10.
Of course we could have included the charges of the subleading terms in the P rows of the leading potential and found the corresponding aligned basis that includes all possible charges in the theory. However, the subleading potential is irrelevant for all practical purposes when identifying approximate shift symmetries of the potential and therefore would only introduce a spurious complication of the computational problem by increasing the dimensionality P of the auxiliary lattice. When identifying the shift symmetries according to §2 it is therefore important to identify which terms in (1.1) can safely be ignored for a problem at hand.

Aligned axion diameters
In this section we provide a systematic discussion of the theory in the vicinity of local minima; that is, within the tiles T n . Recall that the tiles are defined as regions within which none of the individual terms in the potential exceeds its maximum (see §1), and therefore define the characteristic scale on which the potential changes. Within each of these tiles the potential is relatively flat and hence provides for a natural environment to study large field inflation. Several specific cases were previously studied in the literature [19,20,[47][48][49][50][51][68][69][70][71][72][73][74][75][76][77]. In this section we describe a systematic approach to determine the size of an arbitrary tile. We restrict our discussion to well-aligned theories (cf. §2) where we can set all P phases in (1.1) to zero to good accuracy by a shift of θ.
Recall the Lagrangian (1.1) of a well-aligned axion theory, where we retain only P ≥ N leading terms in the potential. In this section we are interested in invariant field space distances, so it is convenient to introduce canonically normalized fields Θ, where √ K is the positive matrix square root. 21 The Lagrangian in canonically normalized coordinates reads where the canonical charges are related to the integer charges by The matrix square root satisfies √ K √ K = K, and is related to the matrix S K containing the (column) eigenvectors of K and its eigenvalues f 2 i by Figure 11. Illustration of the tile T (3 0) of the potential shown in Figure 4 in canonical fields Θ. The solid arrows denote the field ranges R ± along the two lightest directions in the vicinity of the minimum, while the dashed arrow denotes the diameter D of the tile.
In canonically normalized coordinates the tiles (2.7) are given by (4.5) The tiles are polytopes in N dimensions defined by the intersection of 2P half-planes 22 , and spherical shells determine the surfaces of constant invariant distance to the center of the sphere. We illustrate this polytope in Figure 11.

Diameters and field ranges in well-aligned theories
To characterize the scale of these domains we will consider two distinct measures of size. One is the diameter D n , that is, the length of the longest straight line contained in the tile T n . It is clear that this line will run between two vertices of the polytope, and that the tile with the largest diameter is the one at the origin n = 0. If we denote the vertices of the polytope by d n,l , we have therefore the corresponding diameter Note that the diameters of the tiles depend only on the charge and kinetic matrices of the theory (not on the couplings Λ 4 I ). The expression (4.6) defines a unique size for every tile, but in practice there are exponentially many vertices, making it hard to evaluate. Furthermore, the low energy physics in the vicinity of a minimum is generally dominated by the lightest degrees of freedom, which do not necessarily coincide with the axis of largest diameter. This motivates our second characterization of the scale (which does depend on the couplings), namely the field range R n± within T n along the lightest direction -the line defined by the eigenvectorΨ H of smallest eigenvalue of the Hessian H at the minimum in the tile. The Hessian is given by More precisely we define two field ranges R n (±Ψ H ) ≡ R n± as the canonically normalized field space distance between a minimum at Θ n and the boundary of the corresponding tile in the least massive directions ±Ψ H . Solving the equation defining the boundary of the tile, yields the field ranges Note that (4.9) holds for the field range R n (Θ) along an arbitrary directionΘ.
In well-aligned theories the sizes of exponentially many tiles T n are well-approximated by the size of the tile T 0 containing the origin Θ = 0 (or, when P = N , this is the only tile), so we will focus our attention on this last tile in the remainder of this section. In T 0 the expressions for the diameter and field ranges simplify. For any unit N -vectorΘ (in particular the lightest directions ±Ψ H ) we have which indeed follows from the general expression (4.9). The diameter can alternatively be expressed as This last expression for the diameter can be used to derive bounds on it in an arbitrary theory, namely 23 2π λ min (|Q|) where λ min (|Q|) denotes the smallest eigenvalue of the matrix |Q| ≡ Q Q, i.e. it is the smallest singular value of Q.

N-flation, lattice and kinetic alignment
In the previous sections we defined two notions of size of the axion field space in the vicinity of minima. To illustrate these definitions we now apply them to three special cases, focussing in particular on how the diameter in multi-axion theories may be enhanced compared to the single-axion diameter 2πf . We will consider N-flation [19], lattice 24 (or KNP) alignment [20], and finally kinetic alignment [47]. Each of these models was originally restricted to P = N non-perturbative terms, but we will generalize the main ideas behind lattice and kinetic alignment to well-aligned theories with P ≥ N . As mentioned in §4.1 in well-aligned theories it suffices to consider the representative tile T 0 around the origin, which we will do in the following.

N-flation
As our first example we consider the N -axion theory with diagonal kinetic matrix K = diag(f 2 I ) and P = N trivial charges Q = 1 N [19]. In terms of canonical coordinates the Lagrangian is given by This theory has only one distinct tile and has one minimum at the origin Θ = 0, as illustrated in Figure 12. The mass matrix is diagonal, H = diag(Λ 4 I /f 2 I ) ∝ 1 N , where for simplicity we selected the scales Λ I such that all masses are equal.
The tile consists of an N -dimensional hyperrectangle with side-lengths 2πf I , which yields the diameter as the Pythagorean sum For fixed f max ≡ max I {f I }, the largest possible diameter is obtained when all metric eigenvalues are equal, f I = f max : D = 2π √ N f max . By contrast, if there are large hierarchies in the f I the diameter is D > ∼ 2πf max . Since all directions are equally massive, the lightest direction is degenerate and the field ranges accessible from the minimum at the origin are just half of the diameter, R ± = D/2. In the cosmological context this scenario is known as N-flation, a particular realization of assisted inflation [78]: while none of the individual fields Θ I traverse a displacement larger than f max , the simultaneous displacement of N fields realizes an invariant field range parametrically as large as √ N f max .

Lattice alignment
We now consider the lattice alignment (or KNP) mechanism, first discussed by Kim, Nilles and Peloso [20] for the special case N = P . Lattice alignment relies on a small singular value of the charge matrix. For simplicity, we assume a kinetic matrix proportional to the identity, K = f 2 1 N , while the P ≥ N integer charges are left general, Figure 13. Illustration of the tile T for lattice alignment. The diameter is enhanced by the inverse of the smallest singular value of the canonical charge matrix.
Recall the general bound (4.12) on the diameter of the tile T 0 , .
Lattice alignment is the observation that one can arbitrarily enhance the field range in these theories compared to the single-field 2πf by decreasing the smallest eigenvalue λ min of |Q| = √ Q Q. 25 As λ min decreases, the lower bound on the diameter increases. We illustrate this phenomenon in Figure 13.

Kinetic alignment
Finally let us discuss models with kinetic alignment [47]. In the original discussion one assumed P = N , a trivial charge matrix Q = 1 N and a general kinetic matrix K, but the definition of kinetic alignment can just as easily be given in the more general context of well-aligned theories with P ≥ N and K, Q unspecified. Note that the upper bound in (4.12) is saturated if in other words, if the direction defined by QΨ |Q| aligns with a diagonal of the P -cube, andΨ |Q| denotes the eigenvector of |Q| with smallest eigenvalue. For this to be possible it is in particular necessary that the constraint surface Σ contains a diagonal of the P -cube. The sufficient condition (4.17) to saturate the upper bound in (4.12) is the extension of the original kinetic alignment proposal to arbitrary well-aligned theories with P ≥ N . In models with (perfect) kinetic alignment we therefore have a diameter D = 2π . (4.18) One might naively expect that alignment with a diagonal requires a great amount of fine-tuning in the canonical charge matrix Q. In large dimensions N, P 1, however, the converse is true: a P -cube has many more vertices (2 P ) than faces (2P ). Therefore, an isotropically oriented vector within an isotropically oriented constraint surface Σ is much more likely be pointing towards a vertex of a P -hypercube than towards a face. In §4.3 we will make this expectation more precise and demonstrate that in broad classes of random axion theories the relation (4.17) is indeed approximately satisfied, and kinetic alignment is generic.
For completeness let us consider the original model where P = N , Q = 1 N . Here Q = K −1/2 , and the eigenvectorΨ K −1/2 is equal to the eigenvector corresponding to the largest eigenvalue f 2 max of K. The vector QΨ K −1/2 =Ψ K −1/2 /f max points towards a diagonal whenΨ K −1/2 does, which is the condition for (perfect) kinetic alignment as given in [47]. In this case Note that here the diameter only depends on the largest metric eigenvalue f 2 max , and is independent of all other f I ≤ f max . When there are large hierarchies in the metric eigenvalues, the diameter (4.19) in kinetically aligned theories is larger by a factor of √ N relative to the diameter (4.14) of the N-flation scenario. The enhancement of the diameter by √ N relative to f max was originally referred to as kinetic alignment, which we illustrate in Figure 14. More generally we have (4.18): the enhancement of the diameter by √ P relative to the inverse of the smallest singular value of Q.

Alignment in random ensembles
We now discuss diameters and field ranges in ensembles of random axion theories. 26 The Lagrangian is given by and we study random ensembles of kinetic matrices K, integer charges Q, and dynamical scales Λ 4 I introduced in §1 and used already in §3.6. We work at 2N P ≥ N 1, where the theory (1.1) is generically very well-aligned so that it is consistent set the phases to zero in (4.20). Furthermore as in the previous section we restrict our attention to the tile around the global minimum at Θ = 0, because at least for minima in the quadratic regime (see §3.4) the diameters and field ranges along particular directions are similar up to O(1) factors.
Before discussing the details we first briefly review the main results for diameters in random axion theories. With (4.18) the diameter of a tile in a well-aligned theory is given by where, again, |Q| = Q Q. The diameter is enhanced by √ P due to the fact that a random P -vector is very likely to be aligned with a vertex of the P -cube periodic domain of the auxiliary lattice rather than with one of its faces (kinetic alignment, see §4.2.3), as well as by 1/λ min (|Q|) (lattice alignment, cf. §4.2.2).
In the universal limit the matrix Q Q resembles a Wishart matrix so we expect its eigenvalue distribution to depend only on P , N and the scale of the charges. In the simple case of K = f 2 1 N and random Q this scale is (σ Q /f ) 2 . We can substitute a naive random matrix theory expectation [79] for the smallest eigenvalue in (4.21) and obtain where the last approximate equality is valid when P − N N . For aligned theories where P ≈ N , there are three parametric enhancements that each can scale as ∼ √ N . The first factor of √ P ≈ √ N in (4.22) is due to kinetic alignment and depends on the fact that the canonical charge matrix is isotropic. The second factor can arise from the sparsity of the charge matrix, encoded in σ Q , which may be as small as ≈ 3/ √ N , while retaining universality. The last factor is due to the eigenvalue distribution of a Wishart matrix and leads to two different parametric scalings: when P − N = constant and N is large, we have a third parametric enhancement of √ N , while for P − N ∝ N and N large the third term decreases the diameter parametrically as 1/ √ N . Using the least possible entries in the integer charge matrix we therefore have the following scaling with N : both valid when N is sufficiently large. Even though these naive expectations are very crude, they turn out to accurately represent the mean diameter in a broad class of random models as we show below. Furthermore, we will find that the field range R 0± along the lightest direction scales with N in a manner very similar to the diameter, and that this scaling is robust even when there are large hierarchies present in the dynamical scales Λ 4 I . The simple expectation (4.23) from random matrix theory can then be compared to fundamental theories with axions, such as compactifications of string theory [10].
Finally, via (4.21) the results (4.22) and (4.23) can also be applied to determine the scaling of the smallest eigenvalue m 2 of the Hessian matrix (4.7) -that is, the mass-squared of the lightest field around a minimum. Up to O(1) factors and with all Λ I = Λ equal, (4.24)

Diameter estimates
To estimate the diameter in random axion theories, recall from §4.2.3 that perfect kinetic alignment implies that the diameter D 0 of the tile is given by twice the field range along the eigenvectorΨ |Q| . In the random theories we introduced (modulo a caveat on the kinetic matrices K that we will discuss) we claim that that kinetic alignment is well-satisfied at large P . More precisely 2π is satisfied with ever-increasing probability as P → ∞. 27 This hinges on the following fact: the images of eigenvectors of Q Q under Q are uniformly distributed on the unit P -sphere and therefore their entries are approximately normally distributed. In other words they are delocalized. 28 The asymptotic exactness of the relation (4.25) as P → ∞ provides us with a reliable lower bound on the diameter D 0 at large P , Here we have extracted a scale σ Q from the entries in Q = Q K −1/2 via the definition Q =Q K −1/2 , where the entries ofQ are distributed according to N (0, 1) in the universal regime. This separates a trivial scaling factor σ Q appearing in λ min (|Q|) from its more intrinsic scaling properties with N, P . The lower bound (4.26) is significant because it differs from an upper bound on D 0 (cf. §4.1) only by the logarithmic factor (P ) : An intuitive understanding of (4.25) was given in §4.2.3: in a large-dimensional P -cube the number of vertices vastly outnumbers the number of faces, thus it is much more likely for a vector to (approximately) point towards a vertex than towards a face. More quantitatively, the matrix Q Q is rotationally invariant (i.e. its form is preserved under Q → OQ with O a P × P orthogonal matrix) so from general considerations in random matrix ensembles [80] we expect the eigenvectors of Q Q (includingΨ |Q| ) to be delocalized (see also [35]). Furthermore, provided K does not introduce significant anisotropy, the vector QΨ |Q| will be delocalized as well.
In the following three sections we will verify the delocalization of QΨ |Q| in various ensembles of kinetic matrices. Having established this, we will use (4.26) to analytically obtain a reliable lower bound on the diameter of T 0 in the different ensembles. We will examine two distinct regimes : 27 Recall the definition of (P ) in (3.17), (P ) = √ 2 erf −1 (2 −1/P ) ≈ √ 2 log P at large P . 28 In fact, the eigenvectors of Q Q themselves are delocalized (in particularΨ |Q| ), but this is of subordinate relevance.  Figure 15. Mean field range alongΨ |Q| for three ensembles of kinetic matrices K. The field range alongΨ |Q| is a significant lower bound on the diameter D 0 as explained in §4. • "hard edge" : as N → ∞, P − N is held fixed, • "soft edge" : as N → ∞, N/P is held fixed.

Unit metric
In random axion theories where K = f 2 1 N with f a fixed scale, we can formulate the most precise results. In this case Q Q = (σ Q /f ) 2Q Q , is well-described by a Wishart matrix. At large N , the eigenvectorΨ |Q| is delocalized and we further verified numerically that the P -vectors QΨ |Q| are delocalized as well. The field range alonĝ Ψ |Q| is therefore given via (4.26) by For any specific N, P the probability distribution of the smallest eigenvalue in the Wishart ensemble can be calculated (either recursively [81,82] or directly [83]). When P − N is held fixed as N tends to infinity, the asymptotics of its mean satisfy λ min (Q Q ) ∼ 1/N as N → ∞, hence the name "hard edge" statistics, as the smallest eigenvalue approaches the constraint that the matrix is positive definite. The knowledge of the distribution of the smallest eigenvalue can be translated to calculate the probability distribution of a lower bound on the diameter D 0 via (4.28). In general only the first P − N moments of the probability distribution of the diameter alongΨ |Q| are finite, while higher moments diverge. 29 More specifically, at large N , we find for the zth moment (z ≤ P − N ): for some constants c(z, P − N ). 30 As σ Q ∼ 1/ √ N in sparse models, we see the mean field range alongΨ |Q| scales as N 3/2 for all P − N > 0, up to a logarithmic correction factor. 31 The standard deviation also exhibits this N 3/2 scaling with N . 32 It may be instructive to recapitulate how the N 3/2 scaling arises, namely as the product of three factors N 1/2 with different origins: one comes from the alignment of QΨ |Q| along a diagonal of the P -hypercube (kinetic alignment), another from the square root of the smallest eigenvalue of a Wishart matrix scaling like 1/ √ N in the large N -limit where P − N is held fixed (lattice alignment), and finally a √ N arising from the assumed sparsity of the charge matrices.
For soft edge statistics where N/P is held fixed, we may use the result of [79], which implies λ min (|Q|) → ( P/N − 1) √ N . We only discuss the mean field range alongΨ |Q| in this case. We find So in sparse models where the amount of non-perturbative effects scales linearly with N , the mean field range alongΨ |Q| (and hence, typically, the diameter D 0 ) is enhanced only by the minimal amount N 1/2 compared to the single-axion f . 29 In particular for P = N the distribution of the diameter alongΨ |Q| is heavy-tailed. In that case one can show that the median diameter behaves as c(1, 0)(2πf N/ (P )σ Q ) with c(1, 0) = ( √ log 4 + 1 − 1) −1 ≈ 1.84. 30 We found c(1, 1) = π/2 ≈ 1.25, c(1, 28. To our knowledge there is no closed-form expression for c(z, P − N ) -generic values must be determined numerically. This can be done algorithmically [81]. 31 This parametrically improves the lower bound on the diameter of the tile obtained in [35] for this random model, where for P − N > 0 one found a lower bound that scales only linearly with N . 32 However, the probability distribution on R 0 (Ψ |Q| ) is super-exponentially suppressed at small values, and only power-like suppressed at large values. Thus the field range alongΨ |Q| may easily become larger than the mean, but not smaller. This holds in the hard edge limit for all P ≥ N . For the P = N case see also the discussion in appendix A of [35].

Wishart metric
Here we discuss the random ensemble where K is a Wishart matrix, constrained to have largest eigenvalue equal to a fixed scale f 2 max . Specifically, we draw the entries of a matrix A ∈ R N ×N from a normal distribution with zero mean and unit variance, and form the combination A A. After, we rescale this matrix to have largest eigenvalue f 2 max . As in the case of unit kinetic matrix, we find the eigenvectorΨ |Q| to be delocalized to good accuracy for all P ≥ N . For constant P − N as N → ∞, we numerically established that to good approximation (in the distributional sense). For fixed N/P , we found for some profile g(N/P ) which decreases monotonically from 1 at N/P = 0 to 1/2 at N/P = 1. This allows us use the results of the K ∝ 1 ensemble discussed in the previous section. Thus, in the hard edge limit, the field range alongΨ |Q| satisfies while for soft edge asymptotics we find As in the ensemble with K ∝ 1, the mean field range alongΨ |Q| scales as N 3/2 in the hard edge limit and as N 1/2 for soft edge statistics, up to a logarithmic correction factor.

Heavy-tailed metric
Finally we consider an example of a heavy-tailed ensemble of kinetic matrices K, i.e., large fluctuations of its eigenvalues are polynomially suppressed (as opposed to exponentially, as in the Wishart ensemble). Specifically, we consider inverse-Wishart matrices K, with largest eigenvalue rescaled to f 2 max . (We rescale the combination (A A) −1 of the previous section.) 33 Once againΨ |Q| is delocalized to good approximation, and we established numerically that the mean field range alongΨ |Q| qualitatively behaves as and as In the hard edge case, a factor of √ N is lost compared to the unit and Wishart kinetic matrix ensembles because the distribution of the smallest singular value of the canonical charge matrix is qualitatively different. In particular, we found An intuitive explanation of this goes as follows: in the ensemble where the kinetic matrix is a rescaled Wishart matrix, K = A A/λ max (A A), the largest eigenvalue λ max is not too different from a typical eigenvalue; the ratio λ max /median i (λ i ) is of order 1. This is because large eigenvalues occur with exponentially small probability. So a typical eigenvalue of K is expected to be broadly distributed on the interval [0, f 2 max ]. In other words,

Variable couplings: dynamic alignment
To investigate the dynamics we consider the field range along the lightest direction Ψ H emanating from Θ = 0, as discussed in §4.1. From H = Q diag (Λ 4 I ) Q we observe that if the couplings Λ 4 I are not all equal, this is not the same direction as the previously consideredΨ |Q| (which we showed was well-aligned with the direction providing the actual diameter D 0 of the tile T 0 due to kinetic alignment). However if the two directions are sufficiently aligned the available field range within the tile along each will be similar. Below we illustrate in two specific examples how much the couplings may deviate from overall equality before this alignment fails and the expected field ranges alongΨ |Q| andΨ H become parametrically different in N . In these examples we find thatΨ |Q| andΨ H remain well-aligned although the couplings may differ from one another to a certain degree. While a general analysis of the alignment between the lightest and the kinematic direction lies beyond the scope of this work, 34 the insensitivity of this alignment to the distribution of couplings has been called dynamic alignment in a previous discussion [49].
For simplicity we consider ensembles with trivial kinetic matrix, K = f 2 1, and consider two qualitatively different hierarchies in the couplings. These are illustrated in Figure 16. In a first example, assume the couplings Λ 4 I are uniformly distributed on the interval [Λ 4 min , 1]. As we dial down Λ 4 min from one to zero, we expect the field range alongΨ H to diminish with respect to the field range alongΨ |Q| . In a second example, consider the case where the couplings are log-uniformly distributed on the interval [10 −10 , 1]. Although the couplings may wildly differ from one another, the expected field range along the lightest direction is very robust.

Conclusions
In this paper we presented the details of a novel formalism that allows us to analyze a class of periodic functions of many variables, focusing on those of the form (2). Our technique identifies the exact periods, breaks up a unit cell of the resulting lattice into conveniently labelled tiles, and makes it possible to identify approximate shift symmetries. This last feature is very powerful, because (at least in the large N random ensembles we consider) these approximate symmetries are extremely close to exact. As a result if we analyze one region of the function, the results can be translated to exponentially many other regions with exponential accuracy. In particular the number of critical points is exponentially large, and the spacing of their energy levels is exponentially fine.
We employ this technology to determine the number and characteristics of critical points of the potential (2), and to analyze the vicinity of a typical minimum. For N ∼ 100 there are generically (in our ensembles) an enormous number of distinct minima, each with a unique vacuum energy. This number can be larger than 10 120 and the distribution of energies is smooth, so this theory provides values for the vacuum energy consistent with observation even if the energy scales in the potential are close to the Planck scale [26,28]. Furthermore we find that there is a range of masses for the canonically normalized fields that is enhanced by powers of N . The lightest of these provide long gentle slopes that may turn out to be suitable for large-field inflation if the energy scales are high (or small field inflation if they are lower) [28]. Lastly, the characteristics of the critical points are such that the barriers between basins of attractions of adjacent minima tend to be quite thin. We will explore the consequences of this for tunneling transitions in [27], where we will also discuss the natural candidate for dark matter that arises in these theories. valuable discussions. OJ is supported by a James Arthur fellowship. The work of MK is supported in part by the NSF through grant PHY-1214302, and he acknowledges membership at the NYU-ECNU Joint Physics Research Institute in Shanghai. The work of TB and KE was supported by DOE under grant no. DE-SC0011941.

A Separating flat directions in V
In this appendix we discuss how to separate flat directions in V (that is, directions in field space with exactly zero potential) from the non-flat directions. Such directions are present when the charge matrix Q has rank R < N . This can happen either when P < N (in which case necessarily R ≤ P < N ) or because Q is not full rank. After the separation procedure described in this appendix, we are left with N − R massless fields decoupled from a reduced theory of R axions with a full rank charge matrix, to which the techniques in the bulk of our paper apply.
We start with the N -axion theory (1.1), and change coordinates to canonically normalized fields Θ = √ Kθ, in which the charge matrix takes the form Q = Q K −1/2 . Suppose the rank of Q is R < N . This happens either when P < N , or when P ≥ N but not all columns of Q are linearly independent. Then there are L = N −R flat directions; moving along these directions does not change V . In other words, the null space of Q, ker(Q), is L-dimensional, which is the same as the dimension of ker(Q). Find an orthonormal basis t 1 , t 2 , . . . , t L of ker(Q), and extend it to a basis of R N by the adherence of R vectors t L+1 , t L+2 , . . . , t N (note these are generally not integer-valued vectors). Now define new coordinates Ω via the rule where we have split the N -vector Ω into a piece of length L and a piece of length R, and the matrices T Ω L Θ and T Ω R Θ are composed by placing the t 1,...,L respectively the t L+1,...,N on consecutive columns. Note that T ΩΘ is an orthogonal matrix. In these coordinates the flat directions are manifestly separated from the non-flat ones. Indeed, since only the R fields Ω R appear in the potential. Furthermore the kinetic term reads such that the massless fields Ω L decouple. In (A.3) Q R is a full rank P × R matrix (but it is not integer-valued, in general). For the final step, note that there exists an invertible R × R matrix R such that Q R R is an integer-valued matrix. 35 Transforming to coordinates Ω R = R Ξ R , the Lagrangian in Ξ R -coordinates has a kinetic matrix R R and an integer-valued, full rank charge matrix Q R R. As P ≥ R, there are effectively more non-perturbative contributions to the potential than axions. So one can apply the techniques developed in the body of this work to the reduced system of R axions Ξ R .

B Eliminating very massive axions
In this appendix we discuss how to eliminate axions that receive large masses, e.g. due to their coupling to classical sources. At low energies these axions are effectively fixed to a certain value. Specifically let us assume the N -axion theory (1.1), is supplemented with L < N such classical sources, where the rows of a full rank L × N matrix C specify which axion combinations couple to each source, 36 and that these directions are fixed according to where δ C is a certain L-vector. We would like to perform a (linear) coordinate transformation θ → ξ that disentangles the massive directions from the others in (B.1). In order to retain the same discrete shift symmetries in the ξ-basis as in the θ-basis, such a transformation must be unimodular. To construct it, note that the directions unaffected by the classical sources lie in ker(C), which has dimension N − L ≡ R. The intersection ker(C) ∩ Z N is thus a lattice of rank R. 37 Extend a basis t 1 , t 2 , . . . , t R of this lattice to a 35 This is because colsp(Q R ) = colsp(Q), and the projector onto a linear subspace is basisindependent. So we know the projector onto colsp(Q R ) has rational entries, implying the existence of R (see also footnote 8). 36 In general identical combinations may couple to more than one source, implying that C would not be full rank, or L may be greater than N . These cases are easily dealt with. 37 We assume the projector onto ker(C) contains only rational entries. basis for Z N by adding L integer vectors t R+1 , t R+2 , . . . , t N (see appendix C for a proof that this can always be done). The N × N matrix T ξθ = ( T ξ R θ | T ξ L θ ) is unimodular, where T ξ R θ ( T ξ L θ ) is formed by placing the t 1 , t 2 , . . . , t R (t R+1 , t R+2 , . . . , t N ) on consecutive columns. If we denote the first R components of the N -vector ξ by ξ R and the final L = N − R by ξ L , and relate the coordinates ξ to θ by θ = T ξθ ξ, we have Cθ = C T ξ L θ ξ L , and thus via (B.2) The Lagrangian (1.1) then effectively becomes The axions that couple to classical sources have been eliminated while preserving the original form of the theory.

C Extending a sublattice basis
In this appendix we prove that any basis for the rank N sublattice defined by the intersection of an N -dimensional linear subspace Σ with the integer lattice Z P can be extended to a basis for the full integer lattice. In particular, one can always supplement the N P -vectors t i with P − N additional vectors t ∦ a to form a basis for Z P (cf. §2). Before giving the proof, it is perhaps worth giving an example of a sublattice that cannot be extended this way. First, recall that since we are discussing lattices, one should consider only linear combinations of the basis vectors with integer coefficients. Now consider the rank one sublattice of Z 2 that is the even integers along the x-axis; that is, the sublattice generated by the vector (2, 0). It is clear that this cannot be extended to a basis for Z 2 by the addition of any vector. However, note that this sublattice is not the intersection of any linear subspace with Z 2 -the intersection of the x-axis with Z 2 is generated by the vector (1, 0).
The proof is as follows: 38 every rank P lattice L can be thought of as a finitely generated free Abelian group (under addition of the lattice vectors). Any rank N ≤ P sublattice L of L is then a subgroup. The structure theorem for finitely generated Abelian groups implies that there always exists a special basis B ≡ {b 1 , ..., b P } for L with the property that {a 1 b 1 , ..., a N b N } is a basis for L , where {a 1 , ...a N } are a set of integers with the property that each divides the next. However if L is the intersection of a linear subspace Σ with the lattice L, then a i b i ∈ L implies b i ∈ L . Therefore {b 1 , ..., b N } must in fact be a basis for L (because it generates {a 1 b 1 , ..., a N b N }). But this basis can trivially be extended to the basis B for L by appending {b N +1 , ..., b P }.
To see that any basis for L can be extended to a basis for L, note that any basis for L is related to any other basis (for instance, {b 1 , ..., b N }) by some N × N unimodular matrix. But any such N × N unimodular matrix can obviously be extended to a blockdiagonal P × P unimodular matrix. Acting with this matrix on B gives the extended basis.

D An explicit example
In this appendix we illustrate the construction of the aligned lattice basis and the reduction of the relative phases in an explicit example with P = N + 1 = 3. In particular, we consider a theory with charges and phases and take the non-perturbative scales Λ 4 I to be identical for simplicity. The potential is therefore V (θ) = Λ 4 3 − cos θ 1 + θ 2 + 2.04 − cos 2θ 1 − 3θ 2 + 6.20 − cos −3θ 1 + 4.16 . (D. 2) The auxiliary coordinates φ are constrained by the condition P ⊥ φ = δ to reproduce (D.2) on-shell, where the orthogonal projector onto the orthogonal complement of the constraint surface Σ is given by As expected, the rank of the orthogonal projector is P − N = 1 in this example. Employing the LLL lattice reduction algorithm [54,56], we find the aligned basis vectors, It is easy to verify that the first two basis vectors, T ω φ are parallel to Σ, while the last basis vector T ω ∦ φ has a very small projection onto the orthogonal complement of Σ, Note that the length of the shortest lattice vector of the lattice generated by P ⊥ agrees with 1/ det(Q Q) = 1/ √ 115, as expected. The inverse transformation is given by We now can express the potential in terms of the aligned coordinates ω = T φω φ, V (ω ) = Λ 4 3 − cos 2.04 − ω 1 − ω 2 − cos 3ω 1 − 2ω 2 + 6.20 − cos −3ω 2 + 4.16 .
(D.7) Finally, we note that considering the shift φ → φ + 2π T ω ∦ φ n δ the constraint equation can be rewritten as P ⊥ (φ + 2π T ω ∦ φ n δ − P ⊥ δ) = 0 , E Derivation of (3.12) In this appendix we derive (3.12), starting from (3.6). For the case of equal Λ I , one has ∆ ⊥ = P ⊥ = R RR −1 R. Using (3.6), it is easy to see that energies at the minima can be written To derive (E.2) we first used that T ω ∦ φ P ⊥ T ω ∦ φ = T ω ∦ φ P ⊥ 2 T ω ∦ φ = P ⊥ T ω ∦ φ P ⊥ T ω ∦ φ is a number, because P − N = 1, and thus equal to the determinant of the matrices that form it. Then we use the identity To see this note that det T ωφ = 1 where we used the invariance of the determinant under adding linear combinations of some columns to other columns. Then, by computing det T ωφ T ωφ and using that the columns of T ω φ and P ⊥ T ω ∦ φ are orthogonal, one obtains (E.3).
F Derivation of (4.12) To derive the bounds (4.12) on the diameter of the tile containing the origin, note first the general inequality for P -vectors v : The field range inside T 0 along any specific direction is a lower bound for (half of) the diameter (cf. (4.11)). This holds in particular for the field range along the eigenvector Ψ |Q| corresponding to the smallest eigenvalue of |Q| ≡ Q Q. Combining this with the left-most inequality in (F.1) we have therefore . 39 (F.2) On the other hand, each P -vector QΘ is subject to the right-most inequality in (F.1). Therefore the same inequality holds between the maxima of both sides over allΘ ∈ S N −1 . Using this relation in the expression (4.11) for the diameter, we arrive at where we used λ min (|Q|) = QΨ |Q| 2 ≤ QΘ 2 for allΘ ∈ S N −1 .

G Axion potentials and Gaussian random fields
In the body of this work we developed tools that allow for a systematic approach to general (multi-)axion theories. This analytic approach is most powerful for well-aligned axion theories. Unfortunately, when the number P of non-trivial non-perturbative terms becomes very large, alignment typically fails and all approximate shift symmetries are broken. We now turn to a complimentary description of the axion potential, in terms of a Gaussian random field, that is valid precisely when the theory ceases to be well-aligned, and again allows for a simple statistical description of the theory. In particular, we find that at large P N the potential statistics are typically well approximated by an isotropic Gaussian random field with Gaussian covariance function, henceforth referred to by the shorthand Gaussian field. The statistical properties and the distribution of minima in Gaussian fields is very well understood, and efficient numerical algorithms exist to numerically sample such fields [35]. Therefore, by providing an effective description of the axion potential in terms of Gaussian fields a host of tools become available to study axion theories.

G.1 Gaussian fields
Before discussing the connection to axion potentials, let us review some of the basic properties of a stationary, isotropic Gaussian field V G (χ) in N dimensions χ i . We will 39 It is not hard to see that this lower bound can never be saturated. assume a Gaussian covariance function for simplicity, although more general results exist. The ensemble is specified fully by the mean and the two-point function, By considering the case where the smallest eigenvalue is no longer negative, λ min > 0, we can therefore easily solve for the mean value of stable minima in the large N -limit. We find The probability distribution function of energies at minima with barely positive definite Hessian matrix is simply obtained by considering the smallest eigenvalue of the Hessian in (G.6) and solving for V : it is approximated by the convolution of the Tracy-Widom distribution with a normal distribution.

G.2 Axion theories at large P
In the previous section we reviewed the statistical properties of Gaussian random fields with Gaussian covariance function. We are now in a position to consider the statistics of the axion potential in the large N and P N limit, and compare those results to a Gaussian random field.
The non-perturbative potential for the axions with unbroken discrete shift symmetry is given in (2.1), where as above Q is a P × N integer charge matrix. In the following we will assume that the couplings Λ 4 I are of similar magnitude and independent of the charges Q I . In the large P -limit the potential approaches a Gaussian random field. However, the potential (G.10) clearly is not isotropic (it is periodic under v i → v i + 2π only for some directions v ), nor does it have a Gaussian covariance function. Curiously, however, when sampling over random one-dimensional slices through the N -dimensional potential, the mean power spectrum is very well approximated by a Gaussian. It is therefore not very surprising that there are some similarities between the distribution of minima in Gaussian random landscapes and random axion landscapes, as we make precise below.
When sampling over ensembles of potentials defined by random Q, containing i.i.d. random integers distributed uniformly in the interval [−s, s] and random phases, the meanV np and variance Λ 8 np of the potential V respectively are given bȳ Just as in the case of a Gaussian field, the gradient is not correlated with the potential, or the Hessian. The correlations of the Hessian H ab ≡ ∂ a ∂ b V (θ) of the non-perturbative potential are given by H ab (θ)H cd (θ) = δ ab δ cd + δ ad δ bc + δ ac δ bd − 6 5 δ ac δ bc δ cd We can therefore propose an approximate random matrix model for the Hessian, that reproduces the correct correlations, except for the variance of the diagonal terms, where M is a GOE matrix with standard deviation σ M = Λ 4 np /∆ 2 np . We display the eigenvalue spectrum of the full Hessian matrix along with this simple random matrix model in the left part of Figure 17. Using the main result of [85], this implies that in the large N -limit we expect most minima at m.s 1 − 0.6 N 2/3 , (G. 16) so that the leading term behaves just like in the case of Gaussian fields. It is extremely hard to accurately sample the distribution of minima of the axion potential at large P . However, we can obtain a (not necessarily representative) sample of minima by numerically solving for local minima. We find a good agreement between the numerical results and the random matrix theory expectation.