Entanglement entropy and the large $N$ expansion of two-dimensional Yang-Mills theory

Two-dimensional Yang-Mills theory is a useful model of an exactly solvable gauge theory with a string theory dual at large $N$. We calculate entanglement entropy in the $1/N$ expansion by mapping the theory to a system of $N$ fermions interacting via a repulsive entropic force. The entropy is a sum of two terms: the"Boltzmann entropy", $\log \dim (R)$ per point of the entangling surface, which counts the number of distinct microstates, and the"Shannon entropy", $- \sum p_R \log p_R$, which captures fluctuations of the macroscopic state. We find that the entropy scales as $N^2$ in the large $N$ limit, and that at this order only the Boltzmann entropy contributes. We further show that the Shannon entropy scales linearly with $N$, and confirm this behaviour with numerical simulations. While the term of order $N$ is surprising from the point of view of the string dual - in which only even powers of $N$ appear in the partition function - we trace it to a breakdown of large $N$ counting caused by the replica trick. This mechanism could lead to corrections to holographic entanglement entropy larger than expected from semiclassical field theory.


Introduction
A longstanding challenge in quantum gravity is to understand the Bekenstein-Hawking entropy (1.1) It has been suggested that some or all of this entropy could come from entanglement [1][2][3]: the entanglement entropy associated to a region of space also follows an area law, and diverges in the ultraviolet. The entanglement entropy can be viewed as part of the one-loop correction to the Bekenstein-Hawking entropy, so it is most natural to consider the generalized entropy where S out is the entropy of fields outside the horizon [4][5][6].
In AdS/CFT, the relation between the Bekenstein-Hawking entropy and entanglement is made precise via the Ryu-Takayanagi formula and its subsequent generalizations [7][8][9][10]. The entanglement entropy of a holographic large N conformal field theory has an expansion of the form Here the large N expansion in the bulk becomes a small G expansion, where G ∼ 1/N 2 is Newton's constant in the bulk gravitational theory. The leading term of order N 2 is the area of a classical extremal surface. The first subleading correction appears at order N 0 and is given by the entanglement entropy of bulk quantum fields across the minimal surface [9]. Thus to understand the Bekenstein-Hawking entropy and its corrections, we should study entanglement entropy of gauge theories in the large N expansion. While entanglement of effective field theory on a fixed classical background captures the corrections to the Bekenstein-Hawking entropy, it does not provide a statistical explanation for the leading term in (1.2): the latter is simply a term in the effective action. Unlike field theory, string theory could account for this term in the entropy [11]. In string theory the sphere diagram contributes to the entropy at order g −2 s ∼ 1/G, and its contribution can be understood as counting open string states with endpoints anchored to the entangling surface. To calculate entanglement entropy directly within string theory remains fraught with challenging conceptual and technical issues [12][13][14][15], though in some cases it is feasible [16,17].
In this paper we will consider a particularly tractable example of large N gauge/string duality: two-dimensional Yang-Mills theory. Although Yang-Mills theory has no local degrees of freedom in two dimensions, it is a surprisingly rich theory. On the sphere, the large N theory has two phases: at weak coupling it has a deconfined phase similar to random matrix theory and at strong coupling it has a description as a two-dimensional string theory [18]. The two phases are separated by a third-order phase transition [19]. This is a useful model because it's exactly solvable, we know how to calculate the entanglement entropy for any entangling surface [20], and results are known to leading order in the 1/N expansion [21].
The entanglement entropy in this model has a description in the string theory, where it counts configurations of open strings ending on an entanglement brane [22,23].
Entanglement in gauge theory has some subtleties because the Hilbert space of physical states does not decompose as a usual tensor product [24,25] (see [26] for a recent review). Instead, the Hilbert space of a spatial region with boundary contains edge modes. Hilbert spaces are combined using an entangling product [27] which glues two regions along a shared boundary.
In two-dimensional Yang-Mills theory, this entangling product can be described quite explicitly [20,22,27]. The Hilbert space of an interval is spanned by states |R, a, b where R is an irreducible representation of the gauge group G, and a and b are indices in the irreducible representation R which live at the endpoints of the interval. We can extend this to a general subsystem consisting of an arbitrary number of circles and intervals, whose boundary consists of m points. Any density matrix ρ arising from a gauge-invariant state commutes with the action of the gauge group G and takes the form ρ = R p R ρ R , where p R is a probability distribution over the irreducible representations which specifies the state and ρ R is a maximally mixed state on all the degrees of freedom a, b. The entropy is then a sum of two terms: The Boltzmann entropy term is so called because it counts the number of indistinguishable states associated to the single "macrostate" labelled by R; a similar definition of quantum Boltzmann entropy was introduced in the context of black hole physics [29]. The Shannon entropy, on the other hand, measures fluctuations of the gauge-invariant information. Equation 1.4 can also be derived from the replica trick [20,21,30]. We note that there is an alternative algebraic approach to studying entanglement in gauge theory in which one associates entropy to the algebra of gauge-invariant local observables [31]. Here one has to be careful about the precise definition of the algebra, and different choices are possible. Generically, the local algebra will have a center, and one distinguishes the different algebras by their center. For an abelian gauge theory, S Boltzmann vanishes and the entropy associated with the electric center coincides with (1.4). However, for a nonabelian theory the algebraic entropy coincides with S Shannon and does not include the S Boltzmann term [32]. Similarly, we can consider quantities such as the mutual information I(A : B) = S(A)+S(B)−S(A∪B) between two intervals A and B separated by some finite distance. The mutual information is sometimes viewed as a regularized version of the entanglement entropy [33]. It is not hard to see that the Boltzmann entropy S Boltzmann , which is exactly additive, cancels out of the mutual information leaving only the contribution from the Shannon entropy. In this work we will consider both terms in (1.4) separately, and we will see that they behave quite differently in the large N limit.
Several authors [27,34,35] have noted the similarities between the expansion (1.4) and the holographic entanglement entropy formula (1.3). Like the leading term in the Ryu-Takayanagi formula, the Boltzmann entropy term is both linear in the state and local to the entangling surface. Moreover it is additive in the sense that it is linear in m, the number of points in the entangling surface. Generically the entanglement entropy is neither linear nor local, which raises the question of why the leading order term of the large N expansion has these properties.
The goal of this paper is to understand the relation between the expansion of the entropy (1.4) and the asymptotic expansion at large N . We therefore focus on the large N behaviour of the different terms of the entropy. Our main question is: how do they each scale with N ? Does one dominate over the other? It is a natural (and important) problem to consider, given that S Boltzmann arises purely from counting edge states; it is our analog of the Bekenstein-Hawking area term. We hope that understanding the behaviour of different parts of entanglement entropy in the large N limit of two-dimensional Yang-Mills will lead us to a better understanding of entropy in large N gauge theories in general, and particularly those relevant for holography. This paper is organized as follows. In section 2, we review two-dimensional Yang-Mills theory and the expression of the partition function as a sum over irreducible representations of the gauge group and specialize to the case where the gauge group is U(N ). We demonstrate how the entropy calculated via the replica trick splits into a sum of the local "Boltzmann" term and the nonlocal "Shannon" term as in (1.4).
In section 3, we map the U (N ) Yang-Mills theory as a theory of N interacting fermions on a one-dimensional lattice. The fermions are subject to a confining quadratic potential, as well as a repulsive "entropic force" similar to eigenvalue repulsion appearing in random matrix theory. In the fermion description the Shannon entropy term arises as the thermal entropy of the fermions, while the Boltzmann term is the expectation value of the "entropic potential". The fermionic model thus gives a natural explanation for the large N scaling of the different terms of the model: the Boltzmann term, being the expectation value of a pairwise interaction potential, is O(N 2 ); while the the Shannon term, being the entropy of the fermions, is naturally O(N ).
In section 4 we review the saddle point analysis of the large N limit, following Douglas and Kazakov [19]. In the weak coupling/low temperature phase the fermions behave like eigenvalues of a random matrix, and their density follows a Wigner semicircle distribution. At a critical value of the coupling there is a third-order transition to a strong coupling/low temperature phase where the fermions form a Fermi sea. We calculate the leading part of the entropy coming from the saddle point, find agreement with the calculation of Ref. [21] up to a constant, and confirm this result with numerical simulations. In this saddle point approximation we find that only the Boltzmann term contributes to the entropy; this shows that in this model, the analog of the "area operator" is the local operator which counts the logarithm of the number of edge states. Section 5 is dedicated to studying 1/N corrections to the entanglement entropy. Recall that the 1/N expansion of the partition function takes a particularly simple form: in the weak coupling phase there are no perturbative 1/N corrections to the partition function [18], while in the strong coupling phase the 1/N expansion of the partition function is an expansion in even powers of 1/N [36]. In fact this property of the large N expansion was the first evidence for a string description of the theory further elucidated in Refs. [37][38][39]. Surprisingly, we find that the Shannon entropy term scales linearly with N . While this would seem to be in conflict with the string expansion, we argue that it is not. The odd powers of N arise from taking the large N limit after the analytic continuation required by the replica trick. In the special case that the replica trick does not modify the topology of the underlying surface, we find that the string expansion is saved by a precise cancellation between the order N term of the Shannon entropy and the subleading order N term in the Boltzmann entropy.
We conclude in section 6 with a discussion and summary of our results.

Two-dimensional Yang-Mills theory
Two dimensional Yang-Mills theory has no local degrees of freedom and can be solved exactly [40,41]. It is therefore surprising that the theory is actually quite rich, particularly when put on the sphere. At large N and strong coupling, the theory can be described as a twodimensional string theory in which the partition function can be expressed as a sum over branched covers of spacetime by a two-dimensional worldsheet [36][37][38]. At a finite value of the coupling the string description breaks down, and the theory exhibits a third-order transition into a phase which can be understood in terms of a random matrix model [18,19]. For details we refer the reader to the extensive review [42]. For our purposes, the main feature of Yang-Mills is that, given its almost topological status, the entanglement entropy is finite and can be computed explicitly with relative ease; to wit, it is the simplest gauge theory in which one can begin to deeply understand entanglement for general gauge theories.
We consider two-dimensional Yang-Mills theory on a compact, orientable 2D Riemannian manifold. To specify the theory we specify the gauge group G, which we will take to be U (N ), and the Yang-Mills coupling constant g. We will be interested in the large N limit, for which we introduce the 't Hooft coupling λ = g 2 N which is held fixed as N → ∞. On a manifold of Euler characteristic χ and total area A, the partition function is given by This depends only on the Euler characteristic and the total area, which is a large simplification.
To calculate the partition function explicitly using (2.1), we need a parametrization of the irreducible representations R as well as their dimensions dim(R) and the quadratic Casimir C 2 (R). Every irreducible representation of SU(N ) has an associated Young diagram, usually represented as a collection of boxes arranged in N rows of length n 1 ≥ n 2 ≥ · · · ≥ n N = 0. Each diagram corresponds to a tensor representation with n indices, where n = n 1 + n 2 + . . .+n N is the total number of boxes. For example, the trivial representation corresponds to a diagram with no boxes, n = 0, while the fundamental representation is a tensor with one index and therefore corresponds to a diagram with one box, n 1 = 1, n 2 = n 3 = . . . = n N = 0. A single row of length k corresponds to a symmetric k-tensor, n 1 = k, n 2 = n 3 = . . . = n N = 0. An antisymmetric k-tensor is a single column of height k, which is represented by a diagram with n 1 = n 2 = · · · = n k = 1 and n k+1 = · · · = n N = 0. In SU(N ) we require that k < N because a column of height N is proportional to det(U ), which is trivial in SU(N ).
To instead use U(N ), we consider all products of each representation by multiples of det(U ). This amounts to shifting all the row lengths by a fixed constant, or equivalently, relaxing the restriction n N = 0 and allow for an arbitrary sequence of row lengths n 1 ≥ · · · ≥ n N ∈ Z.
In terms of the Young diagram, the expressions for C 2 (R) and dim(R) are (see e.g. [19]) The distinction between SU(N ) and U(N ) only appears at order N 0 in the partition function, while we will primarily be interested in effects at O(N 2 ) and O(N ).

Entanglement entropy via the replica trick
To calculate an entanglement entropy in the theory we will proceed by the replica trick. Suppose we start with a 2D Riemannian manifold M of Euler characteristic χ and area A. We can cut it along a one-dimensional surface Σ without boundary and we interpret the Euclidean path integral over M as producing a mixed state on the surface Σ. In general, Σ is the union of some number of circles. We further divide Σ into two disjoint parts Σ = A ∪ B.
Each of A, B consists of some number of circles and some number of intervals. Let m be the total number of points where we cut Σ into two intervals; we call this collection of points the entangling surface. We will be interested in the amount of entanglement between the states of the regions A and B in the state produced by the path integral over M .
Using the replica trick we form a new manifold M n by taking n copies of M , cutting them along Σ, and regluing. Let us denote the n copies of M by M i , i = 1, . . . n. When we cut along A, we introduce two boundaries, one on either side of the cut; we denote the two boundaries on M i by A i ± . Similarly, by cutting along B we introduce boundaries B i ± . To obtain the replicated manifold, we glue B i We continue the gluing cyclicly so that A n − gets glued back to A 1 + . The resulting replicated manifold will have area A n and Euler characteristic χ n where The latter equation is not immediately obvious but follows from the inclusion-exclusion formula for the Euler characteristic. We choose a triangulation of M and apply Euler's formula χ = V − E + F . Each edge and face of M appears n times in M n , except for the points lying on the entangling surface which appear just once. Correcting for this overcounting leads to the formula for χ n (2.4). The (unnormalized) reduced density matrix of region A then satisfies We find the entanglement entropy by differentiating with respect to the replica number n: One might worry that this does not make a lot of sense: we are taking derivatives with respect to the integer parameters n and χ. To perform this differentiation, we have to analytically continue the partition function Z to complex values of χ. In general such an analytic continuation from the integers would not be unique, but Carlson's theorem gives a set of sufficient conditions for uniqueness. The present case is simpler: the sum (2.1) is well-defined for complex A and χ as long as the real part of A is positive, so it can be taken as the definition of the analytic continuation.
To relate the entropy (2.7) to the fluctuations of the observable R we introduce the probability distribution over representations This distribution determines what a local observer would detect by measuring gauge-invariant local operators. In terms of the probability distribution p R , the entropy takes the form where the Shannon term is the entropy associated with fluctuations of the representation R, while the Boltzmann term is associated with the indistinguishable microstates residing at the endpoints of the intervals, Note that this term is local in the sense that it can be written as the expectation value of the local operator log dim(R) summed over the m points of the entangling surface. An important special case is the two-dimensional de Sitter entropy, which corresponds to an entangling surface that is a single interval on a sphere for which χ = m = 2 [20]. Another special case is the thermal entropy of 2D Yang-Mills theory on a spatial circle, for which χ = m = 0. Some more general cases were considered in [23].

Fermion description of two-dimensional Yang-Mills
To study this concretely, it is useful to map the problem to a system of fermions. The fermion mapping was studied in [43][44][45] for Yang-Mills theory on a torus, i.e. at χ = 0. Here we will allow for χ > 0.
Let us consider a system of N fermions which can occupy sites on a 1D lattice labelled by integers. Since fermions are indistinguishable and cannot occupy the same site, we can assign each configuration a sequence h 1 , . . . , h N of fermion positions where h 1 < h 2 < · · · < h N (i.e. the Pauli exclusion principle is satisfied). 2 We can associate each fermion configuration to a Young diagram by defining where c is a constant that we are free to choose. Under this mapping the difference between successive row lengths of the Young tableau is the number of empty spaces between fermions.
In terms of the fermion positions h i , the quadratic Casimir and dimension of the representation (2.2, 2.3) take a simpler form: Here we have chosen c = −(N + 1)/2 to eliminate a linear term in the potential (3.2). The constant terms ensure that the trivial representation has C 2 (R) = 0 and log dim(R) = 0. The constant term in C 2 is just shifts the ground state energy, and does not change the entropy or any other physical observable. However, the constant appearing in log dim(R) is important as it contributes directly to the entropy. In terms of the fermion system, we can write the partition function as where β is the inverse temperature of the fermion system. Given that h labels the sites in a one-dimensional lattice, the C 2 term in the energy represents a confining external quadratic potential while the log dim(R) term is a repulsive potential between the fermions akin to eigenvalue repulsion in random matrix theory. Physically, this system corresponds to a wire composed of a discrete number of sites in two spatial dimensions. The electrons repel each other by the Coulomb potential (which is logarithmic in 2D), and are confined in a quadratic potential φ ∼ r 2 (which is what we would get at fixed density ρ). Therefore the potential energy of a configuration is the sum of the external potential and the interaction potential Here q is the charge of the fermions, a is the lattice spacing, and C is a constant with dimensions of length. This gives a mapping between the variables of the fermionic model (q, ρ, a and β) and the Yang-Mills theory (A, λ and χ). This mapping is not one-to-one because the parameters have units, but we can identify dimensionless combinations of parameters of the two models.
In this simple picture we can already intuitively see why the system has two phases. The weak coupling phase occurs when the logarithmic repulsion term dominates over the external potential creating a low fermion density. As the coupling constant is increased, the fermions are compressed together and eventually the density in the center reaches a maximum because of the Pauli exclusion principle. This creates a phase transition, as the strong coupling forces the electrons into a "Fermi sea" in the middle of the wire.
We can now study the entanglement entropy of the gauge theory in the language of this fermion model. To obtain the thermal entropy of the fermion system we vary β in the partition function (2.1). In the Yang-Mills theory this corresponds to a simultaneous variation of A and χ: Thus we find the fluctuations of the representations in Yang-Mills are captured by the thermal fluctuations in the Fermi gas. Furthermore, in the fermion model, i.e., the Boltzmann entropy of the Yang-Mills theory is proportional to the fermion interaction energy.
The discrepancy between the entropy of the fermion model and the entropy of Yang-Mills theory originates in the treatment of the log dim(R) term. In the fermion model we treat this as a force term, so it does not contribute to the entropy. The expectation value of the "entropic potential" that gives rise to this entropic force is precisely the missing entropy (3.9).
The fermion model gives some insight into the large N scaling of the two terms in the entropy. The Shannon entropy is the entropy of a system of N fermions, so we expect it to scale linearly with N . The Boltzmann entropy term from the fermion perspective is the expectation value of the interaction energy; since there is a contribution for each pair of fermions we expect that this term scales as N 2 . This intuitive picture will be borne out by the precise calculations in the following sections.

Large N limit
In this section we analyze the entropy of two-dimensional Yang-Mills at leading order in the large N expansion. We find the saddle point configuration and comment on the phase transition found by Douglas and Kazakov [19], correcting the value of the constant term in the free energy. We find the entropy to leading order in the large N limit, which agrees with the result of [21] up to the aforementioned constant. To conclude the section we give a direct argument showing that only the Boltzmann entropy contributes at leading order, while the Shannon entropy is subleading. We find it is a consequence of the fact that the partition function is dominated by a single saddle point.

Continuum limit
In taking the continuum limit, it will be useful to start from the fermionic description, for which the partition function is given by (4.1) The last two terms are independent of h and simply shift the ground state energy and entropy. Now we change variables to x i ≡ i/N , h(x i ) ≡ h i /N and take N → ∞. We also define a continuum density of fermions ρ(h) = ∂x ∂h which is bounded between zero and one. In the large N limit the sums appearing in the partition function become integrals, with the outer sum over {h i } turning into a path integral over all functions h(x) satisfying h(x) − h(y) ≥ x − y (which just enforces the fact that the density of fermions is bounded by 1). The continuum partition function then takes the form The measure D[h] in this path integral is determined because it arises from a discrete sum for which the measure is fixed. The action associated with a fermion configuration is The parameter comes into the expression because the double sum was over i < j, leading to a cutoff |x − y| > 1/N . Taking → 0 + creates a principal value integral Since the exponential term in (4.2) scales as N 2 , the method of steepest descent gives a good approximation to the partition function for large N . This means we should find a minimum of the effective action. Consider a variation h(z) → h(z) + δh(z) and keep only first order terms: .
This leads to the saddle point equation where we have made a change of variables dy = ρ(s)ds to work with the density rather than the fermion positions, and we integrate from −a to a, assuming on physical grounds that ρ vanishes outside of some finite interval (−a, a). This equation is simply the condition that the force coming from the external potential balances against the entropic force coming from the other fermions. Notice that in the interval of interest, h ∈ [−a, a], the integrand has a pole.
In addition to the saddle point equation (4.6), we will have to impose boundary conditions to arrive at the solution. We also need to ensure that the Pauli exclusion principle is satisfied, i.e. ρ(s) ≤ 1. This constraint is ultimately what provokes a phase transition.

Ground state configuration
We are interested in solving the saddle point equation, which will tell us the ground state configuration of our system. This was done by Douglas and Kazakov [19], who solved the theory at large N and found that it has a phase transition. Note that if we restrict to physical values of χ = 2−2g, the phase transition is only present on the sphere, i.e. at χ = 2. However, for later convenience we will allow χ to take arbitrary positive values.
We begin by finding the ground state for the weak coupling phase, assuming that ρ(s) is an analytic function and defining the resolvent The Sokhotski-Plemelj theorem relates the value of the resolvent at the branch cut to the density ρ and is given by where R ± (h) = lim →0 R(h ± i ). So our task is to find R ± , which will immediately give us ρ. From the Cauchy integral formula we can write where we choose g(z) = z 1 − a 2 /z 2 because it has the same branch cut as the integrand in (4.6). Since g has a branch cut for z ∈ [−a, a], the contour C z (which surrounds z) must exclude this interval. After some algebra and contour manipulation (which can be found in the appendix), we arrive at which gives a = 2χ/λA. For the strong coupling phase, we no longer assume ρ(h) is analytic in all of its support. This is because it can hit a "roof" at ρ = 1, which may create a discontinuity. Indeed, the solution we found earlier only satisfies the ρ ≤ 1 constraint for 2λA ≤ χπ 2 (which is why we call it the weak coupling phase). If this inequality is not satisfied, the Wigner semicircle is no longer a physical solution.
Instead we can look for a solution in which the distribution saturates the ρ(h) = 1 bound near the origin and make the following ansatz: withρ(h) analytic. Inserting this into (4.6) gives where the principal value integral now runs over [−a, −b] and [b, a]. We define the resolvent, (4.14) using the Cauchy integral formula as before, and choose the function which has branch cuts at [−a, −b] and [b, a]. After some algebra and contour integration which can again be found in the appendix, we arrive at where Π(n|m) is the complete elliptic integral of the third kind (A.3). The constants a and b can now be determined in terms of α = λA/χ by applying two conditions. The first condition is that there is no term linear in z in the resolvent, 17) and the second is the normalization condition,

Free energy and entropy
With the ground state configuration found, we can now make the saddle point approximation (4.19) where h 0 (x) is the configuration resulting from the ground state density ρ(h) found in the previous section, and the effective action is given by Since we are working with the ground state density ρ(h), we can use the saddle point equation and integrate it with respect to η from 0 to h, Plugging into (4.20), we get In the weak coupling phase, we can use the solution (4.10) for ρ(h) to find to leading order in N . We note that this result is in disagreement with the one found in [21]; however, we find agreement with a numerical evaluation of the partition function, as shown in figure 2. The saddle point approximation for the free energy in the strong coupling regime does not yield an expression in terms of elementary functions, but it can be evaluated by numerical integration and also shows agreement with the simulation. Now we can readily show that to leading order in N , the only term that contributes to the entropy is indeed the Boltzmann term. We do not need the precise functional form of the partition function Z; it suffices to realize that the parameters of the problem enter the saddle point in the ratio λA/χ, so the saddle point ρ(h) depends on the parameters of the problem through this ratio. This is manifest in the formula (4.24), but holds in both the weak and strong coupling phases. This implies that in the saddle point approximation the partition function is of the form Z exp N 2 (λAf (λA/χ) + χg(λA/χ) , (4.25) where f and g are some functions obtained by integrating ρ which only depend on the ratio λA/χ. It is then easy to show that for Z of this form Then the Boltzmann term clearly dominates the entropy at leading order in N for both strong and weak coupling. In the weak coupling phase we have the explicit formula In the strong coupling phase we have no explicit formula, but we can still obtain saddle point results by a simple inversion; the result is plotted in figure 3. The key result is that the leading term in the entanglement entropy for large N comes solely from the local term counting the edge modes: it is linear in the density matrix and proportional to the number of entangling points m.

1/N corrections
We now consider 1/N corrections to the entropy, beyond the leading order results of the previous section. In particular, we will calculate both the leading term in S Shannon and subleading terms in the 1/N expansion of S Boltzmann . Our results were produced by numerical simulations which we will describe in section 5.1; in section 5.3 we will give some analytical checks which show agreement with the simulations. What should we expect from the subleading corrections? In the weak coupling phase, Gross and Matytsin studied subleading corrections to the sphere partition function and found that the 1/N expansion consists of only a leading order N 2 term, a subleading N 0 term and nonperturbative corrections, the largest of which is O(N −1/2 e −N ) [18]. The strong coupling  . The Boltzmann entropy from a numerical simulation at N = 41 is compared with the prediction from the large N saddle point. We see that the simulated Boltzmann entropy lies slightly below the saddle point value. This is consistent with the fact that the first subleading correction in the 1/N expansion is negative, as we will see in section 5. Here we have set χ = 2 and λ = 1, so A is dimensionless.
case is more interesting: Gross and Taylor [37] showed that the large N expansion of Z in the strong coupling phase could be organized as the partition function of a closed string theory. The genus expansion of the partition function is an expansion in even powers of N , where the term of order N χ (Σ) counts branched covers of the target space by a worldsheet Σ. In both phases, the perturbative 1/N expansion contains only even powers of N .
However, surprisingly, the Shannon entropy and the Boltzmann entropy each have a term linear in N . We argue that this is a consequence of the analytic continuation required by the replica trick. The replica trick can change the topology of spacetime, and the powers of N appearing in the partition function therefore depend on the replica number. Thus there is an issue with the order of limits; the correct thing is to do the replica trick at finite N and only then do the asymptotic expansion at large N .

Numerical method
The main approach we used is the Markov Chain Monte Carlo (MCMC) method, see e.g. [46] for a review. The state space of our system consists of a list of fermion positions h 1 > · · · > h N . Using the definition of C 2 (R) and log dim(R) in terms of fermion positions, we compute the Boltzmann factor exp(−βE 1 ) for the configuration. A new candidate fermion configuration is proposed by randomly shifting the position of one of the fermions by one lattice site. We then compute the new energy E 2 , and switch to the new configuration if the quantity exp(−E 2 +E 1 ) is greater than a random number between 0 and 1. 3 The detailed balance condition then ensures that after an initial "burn-in" period, the distribution of configurations follows the Boltzmann distribution. To reduce the burn-in time, we initialize the fermion configuration so that the density of fermions approximates the analytic form (4.10), (4.16) derived in the large N limit.
The MCMC method allows us to sample from the probability distribution of fermion configurations, but does not allow us to find the partition function or entropy directly. Instead we calculate derivatives of the partition function by using expectation values Similarly, second derivatives of the partition function are encoded in the (co)variances where we define the connected correlation functions as This lets us find the Shannon entropy in the following way, which we denote the variance method. The derivatives of the Shannon entropy are given by We can therefore carry out the simulation for various values of A and integrate ∂ A S Shannon numerically to find the Shannon entropy.
Since we have two parameters, A and χ, we can integrate the Shannon entropy along any curve in this two-dimensional space of couplings. This leads to a method we call the first law method. Recall that varying A and χ simultaneously does not change the saddle point configuration, which prompts the consideration of the combination To find S Shannon we can start from (A, χ), and integrate the derivative (5.8) along the line (A/τ, χ/τ ). More precisely, we define a function f (τ ) = S Shannon A τ , χ τ which then satisfies where the variance is calculated at area A/τ , and Euler characteristic χ/τ . Integrating this from 0 to 1 allows us to find S Shannon . This method has the added benefit that starting at τ = 1 and decreasing to τ = 0 causes the simulation to converge to the equilibrium distribution more efficiently-a phenomenon known as annealing. We call this the first law method because we are essentially integrating the first law of thermodynamics for the fermion system: is the energy of the fermion system.
We implemented both the variance and first law methods, and found they agree. However, we found the Monte Carlo method does not always converge at sufficiently strong coupling; we attribute this to the fact that at strong coupling the potential is very steep, increasing the probability that the system will get stuck in a local minimum. This led us to consider a "brute force addition" method, where we priority queue a list of configurations, sorted by their Boltzmann weights. At each iteration we remove the configuration with the largest weight, mark it as visited, then add all unvisited neighbouring configurations to the queue. This allows us to iterate over configurations in increasing energy order without repetition, from which we can directly calculate the partition function and entropy. We truncate the sum once doubling the number of terms added changes the total by less than 1%; this leads to good convergence, as illustrated in figure 4. At weak coupling, the brute force method converges more slowly, but in this regime the MCMC simulation is efficient. At intermediate value of the coupling, where both methods are reliable, we also found agreement.

Numerical results
From the simulations of the previous section we can extract both the way that S Shannon scales with N and its dependence on A.
In figure 5 we plot the Shannon entropy as a function of N for several values of the coupling, and see a clear linear scaling with N . This can be compared with the quadratic scaling of the Boltzmann entropy shown in figure 6. The N dependence of the Shannon entropy is surprising because it means S Shannon = (1 − A∂ A − χ∂ χ ) log Z has a term linear in N , while the partition function Z contains only even powers of N . As we will discuss in 5.4, this is due to the change of topology required by the replica trick. In figure 7 we plot the Shannon entropy as a function of A at fixed N . While we do not have an analytic prediction for comparison, we will show in the following section that it agrees qualitatively with an analytic approximation of the Shannon entropy.

Approximate entropy from the fermion model
The results of the previous section were obtained from numerical simulations, but we can give an analytic estimate of the O(N ) term in the Shannon entropy. One generically expects the entropy of a system of N fermions to scale linearly with N , and this is indeed what we find. However, we can go further and obtain a reasonable estimate of the dependence of the coefficient on A. This estimate also provides an upper bound for the Shannon entropy, which gives a further check on the numerical results. Let us work in the occupation number basis n = {n h } h∈Z , where n h ∈ {0, 1}. For each value of λ, A, and χ there is probability distribution p( n) for which S Shannon is the entropy, S Shannon = − n p( n) log p( n). (5.10) There are nontrivial correlations between the occupation numbers at different sites, but we can obtain a crude approximation by neglecting these correlations. From the point of view of the lattice fermion model, this is essentially a mean-field approximation where one ignores density-density correlations. Let ρ h be the probability that site h is occupied, then the "onepoint entropy" 4 is defined as In the large N limit, ρ h = ρ(h/N ), where ρ is given by (4.10) or (4.16) depending on the phase. In this limit the sum becomes an integral, and This quantity is plotted in figure 7 along with the Shannon entropy obtained from our simulation, and we find a qualitative agreement. The one-point entropy S one-point is an upper bound for S Shannon as shown in figure 7.

Large N counting and the replica trick
In section 5.2 we found that the Shannon entropy scales linearly with N to leading order. From the perspective of the fermion model, this is unsurprising: the entropy of a system of N fermions grows linearly with N . From the string description, where the partition function contains only even powers of N , it requires some explanation. The key point is that when performing the replica trick we analytically continue in the replica index, which requires analytically continuing the Euler characteristic χ of spacetime. The powers of N appearing in the partition function depend on χ, so differentiating with respect to χ disrupts the large N expansion. To see more explicitly how this works, suppose we were to try to calculate the entanglement entropy for two intervals on a sphere, order by order in N , using the Gross-Taylor expansion of the partition function [37,38]. The n th replica is a surface with area A n = nA and Euler characteristic χ n = 4 − 2n; for n = 1 we have a sphere, for n = 2 a torus, etc. The partition function of replica number n has a large N expansion beginning at order N 4−2n . This means if we hold the power of N fixed while carrying out the replica trick we will analytically continue a sequence which is identically zero beyond some finite point, leading to nonsensical results. The resolution is that we should first calculate the entanglement entropy at finite N and take the large N limit only at the end.
In some special cases, the entanglement entropy does not require analytically continuing to different spacetime topologies. When χ = m, the total entropy (2.7) can be obtained by differentiating the partition function with respect to A.
This happens when the entangling surface consists of two points on a sphere for which χ = m = 2 or when the entangling surface consists of zero points (i.e. when considering the thermal entropy) on a torus for which χ = m = 0. 5 These special cases are precisely those where the modular flow is geometric, i.e., where the density matrix ρ associated to a region can be written as e −H where H is a generator of a spacetime symmetry. In this case, we can express the partition function in the 1/N expansion and differentiate term-by-term and so the total entropy must have no term linear in N . Since the Shannon entropy scales linearly with N , the only way this can happen is if the positive O(N ) term in the Shannon entropy cancels against a subleading negative O(N ) term in the Boltzmann entropy. We test this numerically and find that this is indeed the case: this cancellation is illustrated in figure 8. We note that this cancellation is not at all obvious from the point of view of the fermion model, illustrating the advantage of having multiple dual descriptions of the same physics. This cancellation is rather special and occurs only for a single-interval entangling surface on the sphere. If we increase the number of intervals, the Boltzmann entropy scales linearly with the number of points, while the Shannon entropy stays the same. Thus for two or more intervals we get a negative O(N ) term in the entropy. This is relevant for the mutual information, for example. If we calculate the mutual information between intervals A and B, at order N the mutual information comes from the negative term in S(A ∪ B), leading to a positive mutual information of order N . Given the apparent potential hazards of applying the replica trick to the string expansion, one might wonder why the leading O(N 2 ) term seems to be unaffected. The answer is that, if we define the analytic continuation of the partition function by the sum (2.1), the saddle point exists for any χ > 0. At the saddle point, we can replace χ derivatives with A derivatives following the argument of section 4.3 and therefore we can calculate term-by-term in the 1/N expansion of the partition function. This argument applies only to the leading N 2 term in the saddle point approximation.

Discussion
We have calculated the entanglement entropy for two-dimensional Yang-Mills theory in the large N limit, focusing on the division of the entropy into what we have called the Boltzmann and Shannon entropies. The Boltzmann entropy is the expectation value of a local operator on the entangling surface analogous to the Ryu-Takayanagi formula. We have shown that this term dominates the large N limit, giving further support for this analogy.
The appearance of a term in the entropy linear in N , while evident from the fermion model, is surprising from the point of view of the closed string theory. We have explained the appearance of such terms as a breakdown of the genus expansion in the string theory when we analytically continue the target space topology. However we do not have a prescription for computing coefficients of the large N expansion of the entropy from the string expansion. This would require resumming some class of diagrams across different target space topologies; this is further complicated by the fact that we have to include diagrams with an arbitrary number of interaction vertices. The resurgence analysis applied to the torus partition function in Ref. [48] might be useful in this regard.
The analogy with random matrix theory may also provide an explanation for the term linear in N 6 . In the weak coupling phase, the fermion positions behave like the eigenvalues of a random matrix, with the factor dim(R) χ playing the role of eigenvalue repulsion in the random matrix model. The Euler characteristic χ is then analogous to the parameter β of random matrix ensembles, where β = 2 corresponds to the unitary matrices [49]. At β = 2, the Feynman diagrams have the topology of oriented surfaces and so the large N expansion contains only even powers of N . But perturbing away from β = 2 introduces non-oriented surfaces, and, in particular, diagrams with the topology of the real projective plane contribute at linear order in N . This suggests that the string description may involve non-orientable worldsheets; these appear naturally in the string description of two-dimensional Yang-Mills theory with gauge group O(N ) or Sp(N ) [50].
While the fact that the Shannon entropy grows linearly with N is evident from the fermion model, the fact that it cancels (in certain cases) with a subleading term in the Boltzmann entropy is surprising. A similar cancellation sometimes occurs in quantum field theory at one loop: while the entanglement entropy is always positive, it can cancel against the expectation value of the Wald entropy. For example, a conformally coupled scalar field theory in four dimensions has an ultraviolet-divergent entanglement entropy proportional to the area of the entangling surface, but this divergence cancels against the Wald entropy term which takes the form −ξ φ 2 integrated over the entangling surface [51]. It would be interesting to understand whether such cancellations can arise from an underlying closed string description.
Perhaps the most intriguing consequence of this work is the possibility that the breakdown of large N counting implied by the replica trick is a more general phenomenon. Suppose, for instance, that the same effect is present in N = 4 supersymmetric Yang-Mills theory. This would imply a breakdown of the closed string genus expansion in the bulk, and corrections to holographic entanglement entropy larger than expected from semiclassical field theory [9]. This would have potentially drastic implications for recent discussions of the black hole information paradox [52][53][54] which rely on an interplay between the leading term in the entanglement entropy and its subleading correction. We believe the possibility of large quantum corrections to black hole thermodynamics is compelling enough to warrant further investigation in other instances of large N gauge/string duality.
with C b chosen to surround the branch cut and exclude z. In particular, we choose The semicircular boundaries of the contour go to 0 as → 0. Notice that g(h ± i ) = ±i a 2 − h 2 , (A.8)