Entropy, Extremality, Euclidean Variations, and the Equations of Motion

We study the Euclidean gravitational path integral computing the Rényi entropy and analyze its behavior under small variations. We argue that, in Einstein gravity, the extremality condition can be understood from the variational principle at the level of the action, without having to solve the equations of motion explicitly. This setup is then generalized to arbitrary theories of gravity, where we show that the respective entanglement entropy functional needs to be extremized. We also extend this result to all orders in Newton's constant $G_N$, providing a derivation of quantum extremality. Understanding quantum extremality for mixtures of states provides a generalization of the dual of the boundary modular Hamiltonian, which is given by the bulk modular Hamiltonian plus the area operator, evaluated on the so-called modular extremal surface. This gives a bulk prescription for computing the relative entropies to all orders in $G_N$. We also comment on how these ideas can be used to derive an integrated version of the equations of motion, linearized around arbitrary states.


Introduction and summary of results
Quantum entanglement has become a crucial aspect of understanding many physical systems, including quantum gravity. A universal property of quantum gravity is that entropy satisfies an area law. This was first discovered for black holes [1,2,3], and more recently it was generalized in the context of the AdS/CFT correspondence [4,5,6] by Ryu and Takayanagi [7,8]. They gave an elegant prescription for the entanglement entropy of any spatial region $R$ in a holographic boundary theory in terms of the area of an extremal surface in the bulk spacetime: $S_R = \underset{X \sim R}{\mathrm{ext}}\, \frac{\mathrm{Area}(X)}{4 G_N}$ (1.1). Here the entanglement entropy is defined in the boundary theory as the von Neumann entropy $S_R \equiv -\mathrm{Tr}\, \rho_R \log \rho_R$ of the reduced density matrix $\rho_R$, and is a measure of entanglement between the region $R$ and its complement $\bar R$. The constraint $X \sim R$ means that the Ryu-Takayanagi (RT) surface $X$ is homologous to the boundary region $R$, and $G_N$ denotes Newton's constant. This prescription for holographic entanglement entropy was derived from AdS/CFT in [9]. Furthermore, it is valid in general time-dependent cases [10,11]. In general, the gravitational theory in the bulk is described at low energies in terms of Einstein gravity corrected by higher derivative interactions. These interactions generate higher derivative corrections to the RT formula (1.1). A prescription for these corrections was given in [12,13] and has the form $A_{\rm gen} = S_{\rm Wald} + S_{\rm extrinsic}$ (1.2), where the first term is the Wald entropy and the second consists of corrections from the extrinsic curvature of the RT surface. Since $A_{\rm gen}$ is the full classical contribution to the gravitational entropy, we will refer to it as the "generalized area".$^{1}$ However, it has been an open question whether the extremization procedure in (1.1) works for general higher derivative gravity, using variations of the action.
Our first result is that it does: $S_R = \underset{X \sim R}{\mathrm{ext}}\, A_{\rm gen}(X)$ (1.3). As a byproduct of this result, one can generalize the derivation of the integrated linearized equations of motion from the first law of entanglement [14,15,16,17] to arbitrary regions and states. This is done by defining the variation of the modular Hamiltonian using the replica trick. From the linearized equations of motion for an arbitrary state, one should in principle be able to get the nonlinear equations of motion.
The RT prescription (1.1) and its higher derivative generalization (1.3) are valid in the large-$N$ limit of the boundary theory. Beyond the leading order in this limit, they receive $1/N$ corrections from quantum effects in the bulk. A natural prescription for these quantum corrections is $S_R = \underset{X \sim R}{\mathrm{ext}}\, S_{\rm gen}(X)$, $S_{\rm gen} \equiv \langle A_{\rm gen} \rangle + S_{\rm bulk}$ (1.4), where the "generalized entropy" $S_{\rm gen}$ is the sum of the expectation value of the generalized area $A_{\rm gen}$ and a bulk entanglement entropy $S_{\rm bulk}$. The bulk entanglement entropy is defined with respect to the bulk spatial region between the RT surface $X$ and the boundary region $R$. The domain of dependence of this region defines the notion of the entanglement wedge [18,19,20]. After extremization, $X$ is known as the quantum extremal surface. The prescription (1.4) agrees with the one-loop result of [21,22] and was conjectured in [23] to hold for all loops. Our second result is to establish this from AdS/CFT to all orders in $1/N$. Furthermore, entanglement entropy is not the only measure of quantum entanglement. To better understand the structure of entanglement, we also need the modular Hamiltonian $K_\rho \equiv -\log \rho$ (1.5) for a quantum state described by the density matrix $\rho$, as well as the relative entropy $S_{\rm rel}(\rho|\sigma) \equiv \mathrm{Tr}(\rho \log \rho) - \mathrm{Tr}(\rho \log \sigma)$ (1.6), which is a measure of distinguishability between an arbitrary state $\rho$ and a reference state $\sigma$. Our third result is $K_{R,\sigma} = \underset{X \sim R}{\mathrm{ext}} \left( A_{\rm gen} + K_{{\rm bulk},\sigma} \right)$ (1.7), where $K_{R,\sigma}$ is the modular Hamiltonian for the boundary region $R$ for the state $\sigma$, $A_{\rm gen}$ is viewed as an operator on the surface $X$ giving its generalized area, and $K_{\rm bulk}$ is the bulk modular Hamiltonian in the spatial region between $X$ and $R$. After extremization we call $X$ the "modular extremal surface" for the state $\sigma$.
Using the prescription (1.7) for the modular Hamiltonian, we find for the relative entropy $S_{\rm rel}(\rho|\sigma) = \left\langle A_{\rm gen}^{X_\sigma} + K_{{\rm bulk},\sigma}^{X_\sigma} \right\rangle_\rho - \left\langle A_{\rm gen}^{X_\rho} + K_{{\rm bulk},\rho}^{X_\rho} \right\rangle_\rho$ (1.8), where $X_\sigma$ and $X_\rho$ are modular extremal surfaces defined by (1.7) for the states $\sigma$ and $\rho$ respectively. Here we have dropped explicit references to the boundary region $R$ for brevity, and $\langle \cdots \rangle_\rho$ denotes the expectation value $\mathrm{Tr}(\rho \cdots)$ in the state $\rho$.
The results (1.7) and (1.8) agree with the one-loop results of [24]. As we will show using AdS/CFT, they are valid to all orders in $1/N$. It is interesting to note from (1.8) that the boundary relative entropy is equal to the bulk relative entropy only at one-loop order [24], and they generally differ at two loops or higher. This is because the two modular extremal surfaces $X_\sigma$ and $X_\rho$ differ by $O(G_N)$ in general.
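As a short consistency check, the following sketch shows how a formula of the form (1.8) follows once a prescription like (1.7) is granted for both states. It uses only the standard definition of relative entropy and $K_\rho \equiv -\log\rho$; the shorthand $S(\rho)$ for the von Neumann entropy is introduced here for illustration, and the last step simply assumes that (1.7) may be inserted inside expectation values.

```latex
\begin{align}
S_{\rm rel}(\rho|\sigma)
  &= \mathrm{Tr}(\rho\log\rho) - \mathrm{Tr}(\rho\log\sigma) \nonumber\\
  &= \langle K_\sigma \rangle_\rho - S(\rho),
  \qquad S(\rho) \equiv -\mathrm{Tr}(\rho\log\rho) = \langle K_\rho \rangle_\rho , \nonumber\\
  &= \langle K_\sigma \rangle_\rho - \langle K_\rho \rangle_\rho . \nonumber
\end{align}
% Assumption: each boundary modular Hamiltonian is replaced by its bulk dual,
% evaluated on its own modular extremal surface (X_sigma or X_rho).
```

Because the two modular Hamiltonians are evaluated on different surfaces, the difference does not reduce to the bulk relative entropy unless $X_\sigma$ and $X_\rho$ coincide to the order considered.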
Recently, the AdS/CFT dictionary has been clarified by viewing holography as a quantum error correcting code [25]. The relation between the bulk and boundary relative entropy was used in [26] to prove a theorem for reconstructing bulk operators in the entanglement wedge of R in terms of boundary operators on R, and the one-loop result can be used to obtain an explicit large-N reconstruction formula in terms of the modular flow [27]. As we will see, the all-loop result (1.8) can be used to extend the reconstruction theorem to all orders in 1/N, at least for bulk operators at a fixed distance away from the RT surface, but it is not yet clear how to generalize the modular flow construction beyond one loop. A related issue is that the complementary recovery property discussed in [28] holds only at the one-loop order.
The outline of this paper is as follows. We begin in Section 2 with a review of the classical statement of extremality and rephrase it in a way that can easily be generalized to arbitrary theories of gravity, using variations of the action. Section 3 is independent of the rest of the paper and uses the variational principle to derive the integrated equations of motion around an arbitrary background using the first law. In Section 4, we generalize the classical discussion of Section 2 by including quantum fields in the bulk theory, providing a derivation of quantum extremality. In Section 5, we use quantum extremality for mixtures of states to write a formula for the bulk dual of the modular Hamiltonian to all orders in $G_N$. We conclude with some closing thoughts in the discussion.

Classical statement of extremality from variations
Let us start with a review of the replica trick applied to AdS/CFT. In the boundary theory, the von Neumann entropy may be determined by the $n \to 1$ limit of the Rényi entropy $S_n \equiv \frac{1}{1-n} \log \mathrm{Tr}\, \rho^n$ (2.1), where $n$ is known as the Rényi index. When $n$ is an integer greater than 1, the Rényi entropy can be calculated from $S_n = \frac{1}{1-n} \left( \log Z_n - n \log Z_1 \right)$ (2.2), where $Z_n$ is the partition function of the boundary theory on a manifold known as the $n$-fold branched cover. This partition function can be calculated via AdS/CFT. In the large-$N$ limit, we find the solution $M_n$ to the bulk equations of motion with the $n$-fold cover as the boundary condition and calculate its on-shell action $I_n$. Up to $1/N$ corrections, we have $\log Z_n = -I_n$. When there is more than one bulk solution, we choose the dominant one, which has the smallest on-shell action.
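For readers who want the $n \to 1$ limit spelled out, here is a minimal worked version. It assumes the standard Rényi definition $S_n = \frac{1}{1-n}\log \mathrm{Tr}\,\rho^n$ and a normalized density matrix with eigenvalues $p_i$ (the symbol $p_i$ is introduced here only for illustration).

```latex
\begin{align}
\mathrm{Tr}\,\rho^n &= \sum_i p_i^{\,n}, \qquad \sum_i p_i = 1, \nonumber\\
\log \mathrm{Tr}\,\rho^n
  &= (n-1)\sum_i p_i \log p_i + O\big((n-1)^2\big), \nonumber\\
\lim_{n\to 1} S_n
  &= \lim_{n\to 1} \frac{1}{1-n}\,\log \mathrm{Tr}\,\rho^n
   = -\sum_i p_i \log p_i
   = -\mathrm{Tr}\,\rho\log\rho . \nonumber
\end{align}
```

The second line is just the first-order Taylor expansion of $\log \sum_i p_i^{\,n}$ around $n = 1$, where the function itself vanishes because $\mathrm{Tr}\,\rho = 1$.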
The $n$-fold cover on the boundary enjoys a $\mathbb{Z}_n$ symmetry permuting the $n$ replicas cyclically. As in [9], we assume that the $\mathbb{Z}_n$ replica symmetry extends to the dominant bulk solution $M_n$. Let us take the quotient of the bulk solution $M_n$ by the $\mathbb{Z}_n$ replica symmetry. This quotient amounts to considering the action $\hat I_n = I_n/n$, which can be thought of as the on-shell action of the orbifold geometry $\hat M_n \equiv M_n/\mathbb{Z}_n$. The orbifold has a conical singularity at the $\mathbb{Z}_n$ fixed points. The derivative of the orbifold action with respect to $n$ is the modular entropy introduced in [29]: $\tilde S_n \equiv -n^2 \partial_n \left( \frac{1}{n} \log \mathrm{Tr}\, \rho^n \right) = n^2 \partial_n \hat I_n$ (2.3).
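The equality between the two expressions for the modular entropy can be checked in a few lines. The sketch below assumes the usual replica normalization $\mathrm{Tr}\,\rho^n = Z_n / Z_1^n$ together with the leading-order relation $\log Z_n = -I_n$ quoted above; the tilde on $\tilde S_n$ is used here only to distinguish the modular entropy from the Rényi entropy $S_n$.

```latex
\begin{align}
\log \mathrm{Tr}\,\rho^n &= \log Z_n - n \log Z_1 = -I_n + n I_1, \nonumber\\
\frac{1}{n}\log \mathrm{Tr}\,\rho^n &= -\hat I_n + I_1,
  \qquad \hat I_n \equiv \frac{I_n}{n}, \nonumber\\
\tilde S_n \equiv -n^2\,\partial_n\!\left(\frac{1}{n}\log \mathrm{Tr}\,\rho^n\right)
  &= n^2\,\partial_n \hat I_n , \nonumber\\
\tilde S_n &= n^2\,\partial_n\!\left(\frac{n-1}{n}\,S_n\right)
  \;\xrightarrow{\;n\to 1\;}\; S . \nonumber
\end{align}
% I_1 is independent of n, so it drops out of the derivative in the third line.
```

The last line uses $\frac{n-1}{n} S_n = -\frac{1}{n}\log\mathrm{Tr}\,\rho^n$, so the modular entropy reduces to the von Neumann entropy at $n = 1$.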
Since the orbifold geometry is seemingly singular, when doing variations one has to be careful with possible boundary terms at the tip of the cone. In other words, (2.3) reduces to a boundary term on the conical defect, and taking the $n \to 1$ limit we find the von Neumann entropy $S$ in terms of some geometric quantity $A_{\rm gen}$ on a codimension-2 surface $X$.
The goal of this section is to show that for classical theories of gravity, the equations of motion close to $n \approx 1$ imply that the surface $X$ has to be extremal with respect to the entanglement entropy functional $A_{\rm gen}$: $\delta_{\rm diff} A_{\rm gen} = 0$ (2.4), where $\delta_{\rm diff}$ denotes a diffeomorphism that would change the location of $X$ where the functional is evaluated.

Double variations
If we vary the action around the solution $g_n$ to the equations of motion with an off-shell deformation $\delta g_n$ that preserves the conical deficit angle and vanishes on the asymptotic boundary, we have, schematically, $\delta \hat I_n = \int_{\hat M_n} E_n\, \delta g_n + \int_{r=\epsilon} \Theta(g_n, \delta g_n) = 0$ (2.5), where we have used the notation of [30]: $E_n \equiv \delta \hat I_n / \delta g$ denotes the equations of motion at integer $n$, and $\Theta(g_n, \delta g_n)$ is the boundary term at the tip of the cone, linear in $\delta g$ and obtained from integrating the Lagrangian by parts after a variation. The solution $g_n$ satisfies the equations of motion, leading to $E_n = 0$. The boundary term is evaluated on a regulated surface $r = \epsilon$, where $r$ is the radial distance from the tip of the cone, and we take the $\epsilon \to 0$ limit at the end of the calculation. The claim of (2.5) is that the boundary term vanishes in this limit. For integer $n$, it is clear that (2.5) holds, since we can go to the parent space $M_n$, where there is no physical boundary at the $\mathbb{Z}_n$ fixed points.
In the next subsection, we will argue that (2.5) holds for general values of $n$. For now we will explore the consequences of this, saving the details for later. Since (2.5) is zero for any $n$, its derivative with respect to $n$ is also zero: $\partial_n \delta \hat I_n |_{n=1} = 0$ (2.6). Note that this follows as long as the equations of motion are obeyed at $n \approx 1$. We can take the two variations $\partial_n$ and $\delta$ in (2.6) in the opposite order, so that $\partial_n$ gives us the entanglement entropy functional $A_{\rm gen}$ for a metric in the neighbourhood of the on-shell metric. Up until now we have kept the variation of the metric $\delta g_n$ arbitrary except for the boundary conditions of preserving the conical deficit angle and vanishing on the asymptotic boundary. Let us now choose $\delta g_n$ to become a diffeomorphism at $n = 1$.$^{2}$ If we consider the variations in the opposite order for a diffeomorphism at $n = 1$, we obtain the pair of equations (2.7), in which $A_{\rm gen}$ is defined and can be computed using the conical method of [12,13,31] or directly using the $n \to 1$ limit of the Wald entropy (see Section 2.3). This discussion is independent of how one computes it. To derive the second line of (2.7), we used that $g_1 + \delta g_1$ is a solution to the equations of motion at $n = 1$, so we can use the same entropy functional $A_{\rm gen}$ evaluated on a slightly displaced surface $X + \delta X$. In Section 2.3, it will be clear how this works when one can take the $\partial_n$ variation inside the action. Taking the difference of the two equations in (2.7) and comparing it with (2.6), we get (2.8). In other words, the entanglement entropy functional should be stationary with respect to shifts in the surface. This argument uses the equations of motion linearized in $n - 1$, which is the same condition that led to extremality in [9]. However, the advantage of our method here is that by considering variations of the action, we do not have to evaluate the equations of motion explicitly. We expect this to be true for an arbitrary theory of gravity.
In the next subsections we discuss the subtleties that lie in these cases.
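The logic of the double-variation argument can be summarized in one chain of equalities. This is only a schematic sketch under assumed notation: the functional arguments $\hat I[g]$ and $A_{\rm gen}(X)$ are written out for illustration, the variation $\delta g_n$ is taken to reduce to a diffeomorphism at $n = 1$, and no claim is made that this reproduces the precise form of the displayed equations of the text.

```latex
\begin{align}
0 &= \partial_n\, \delta \hat I_n \Big|_{n=1}
   = \delta\, \partial_n \hat I_n \Big|_{n=1} \nonumber\\
  &= \partial_n \hat I[g_n + \delta g_n]\Big|_{n=1}
   - \partial_n \hat I[g_n]\Big|_{n=1} \nonumber\\
  &= A_{\rm gen}(X + \delta X) - A_{\rm gen}(X) . \nonumber
\end{align}
% First equality: the vanishing of the varied action for all n near 1.
% Last equality: the same entropy functional evaluated on the displaced surface.
```

Read from bottom to top, the vanishing of the first line forces the entropy functional to be stationary under displacements of $X$, which is the extremality statement.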

Boundary terms and the n → 1 limit
In the previous discussion, we used the equations of motion at integer $n$ and at the same time deformed the metric off-shell (at finite $n - 1$). However, since we want to do two variations of the action, we want to be able to define $\partial_n I(g_n)$ for a slightly off-shell metric, $g_n + \delta g_n$. We want to restrict to "regular" $\delta g_n$: deformations of the metric which give a finite contribution to the action and do not change the strength of the conical singularity. This constrains the variation and allows for a well-defined action for the deformed off-shell geometry.
We would first like to show that $\delta \hat I_n = 0$ for all $n$. We can first consider Einstein gravity, where we get (2.9). Because $\sqrt{g_n} \propto r = \epsilon$, it is clear that one can get a non-zero answer only if $\delta g_n$ diverges approaching the tip. More generally, for an arbitrary higher derivative theory, we have (2.10) [30], where $E^{rbcd}$ would be the equations of motion for $R_{abcd}$, viewed as an independent field.
For example, for $f(\mathrm{Riemann})$ theories, $E^{rbcd} = \partial L / \partial R_{rbcd}$. It is clear for Einstein gravity that a regular variation of the metric cannot give a finite contribution to the boundary term. However, while (2.10) vanishes at integer $n$, we would also like to argue that this is true for $1 < n < 2$. The regularity condition for the variation requires the boundary term (2.10) to be finite if not zero. This is because there are no divergent terms at $n = 1$ and we are choosing $\delta g_n$ to keep the variation finite for $n > 1$. However, the most general metric compatible with replica symmetry will be an expansion with positive powers of $r^{n-1}$ and integer powers of $r$ (see next section). Given that we are working at integer $n$ until the very end, $\epsilon^{n-1} \to 0$, which implies that there cannot be a finite term. This implies that (2.10) is zero.

Variational approach for the gravitational entropy
While $\partial_n \hat I_n |_{n=1}$ in (2.7) can be computed explicitly using squashed cones, that approach requires being careful with several subtleties that arise in the $n \to 1$ limit, and there is currently no complete formula for an arbitrary theory of gravity. In this subsection, we propose an equivalent but perhaps clearer approach than (2.7), where we think of $\partial_n$ as a variation inside the action.
We would like to understand if we can treat $\partial_n g_n$ outside the $r = \epsilon$ tube as a small variation inside the action integral. This is not true at $n = 1$: the metric might include terms $g_n \propto \epsilon^{2(n-1)}$, which give $\partial_n g_n \propto \epsilon^{2(n-1)} \log \epsilon$, which is not small as $n \to 1$ (at fixed but small $\epsilon$). However, we can avoid this issue by working at $n > 1$. In this case, we expect that $\partial_n g_n$ is a small variation$^{4}$ and thus we can apply (2.10) for $\partial_n g_n$. All the contribution from $\partial_n g_n$ comes from the $g_{\tau\tau}$ component in (2.11). This gives the Wald entropy at finite (but non-integer) $n - 1$. The resulting formula is valid for non-integer $n$, and it is a finite $n - 1$, off-shell version of (2.7).$^{5}$ In order to avoid contradictions, it is important that the $n \to 1$ limit of the Wald entropy at finite $n - 1$ is not the Wald entropy at the $n = 1$ solution. The reason is that the Wald entropy at finite $n - 1$ is written in terms of the $g_{n,0}$ fields in (2.11), while at $n = 1$ one only has access to the sum over them (2.12). We expect the equations of motion close to $n = 1$ to determine $g_{1;0}$ in terms of $g_{n=1}$. By carefully taking the limit, one gets the generalized area (2.14), where we used the $n = 1$ equations of motion. Note that this approach was used before for Einstein gravity in [9,29]: because of the simplicity of this theory, one can evaluate (2.14) directly at $n = 1$ without worrying about subtleties in the limit. For readers familiar with the squashed cone approach to higher-derivative entanglement entropy [12,13,31], (2.14) might look surprising, because $A_{\rm gen}$ has a contribution from the Wald entropy at $n = 1$, but it also has an "anomalous" contribution which depends on the extrinsic curvature [12]. The anomalous contribution depends on the details of how the metric splits. In our case, $S_{\rm Wald}$ is explicitly defined in terms of the Lagrangian and the $g_{1;0}$ metric.
In this way, our approach gives an explicit formula for the holographic entanglement entropy: the Wald entropy of the split metric $g_{1;0}$. However, to determine its form in terms of $n = 1$ quantities, one has to solve the most divergent part of the equations of motion.
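The statement that $\partial_n g_n$ fails to be small at $n = 1$ but is small at $n > 1$ can be made concrete with a one-line calculus check. The function $f(n,\epsilon)$ below is a stand-in introduced purely for illustration of a metric component scaling like $\epsilon^{2(n-1)}$; the sample values of $\epsilon$ and $n$ are arbitrary.

```latex
\begin{align}
f(n,\epsilon) &\equiv \epsilon^{2(n-1)}, \qquad
\partial_n f = 2\,\epsilon^{2(n-1)} \log\epsilon, \nonumber\\
\partial_n f\,\big|_{n=1} &= 2\log\epsilon \;\to\; -\infty
  \quad (\epsilon \to 0), \nonumber\\
\partial_n f\,\big|_{n>1} &= 2\,\epsilon^{2(n-1)}\log\epsilon \;\to\; 0
  \quad (\epsilon \to 0,\ n \text{ fixed}). \nonumber
\end{align}
% Example: epsilon = 10^{-3} gives 2 log(epsilon) ~ -13.8 at n = 1,
% but 2 epsilon log(epsilon) ~ -0.0138 at n = 3/2.
```

The two limits $n \to 1$ and $\epsilon \to 0$ therefore do not commute for this term, which is why the order in which they are taken matters in the argument above.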
One should be able to show explicitly how (2.14) relates the squashed cones contribution and the Wald entropy. For Lovelock theories, it is easy to see how this works: A gen is just given by the Wald entropy in terms of induced Riemann tensor, which is the n → 1 limit of the projected Riemann tensor on the surface. In Appendix A, we consider a set of two-dimensional examples which we believe capture (2.14) more generally.
In our discussion, we have always focused on families of metrics (not necessarily onshell) which keep the action finite. It is often the case that in the r ≈ 0 expansion, the most general form for the metric gives rise to an infinite action. In other words, there are some divergent terms in the equations of motion which give a divergent contribution to the gravitational action, while other metric contributions with divergent equations of motion have a finite action (for example, changes in the location of the surface). We will always work with metrics which have a finite action, which is equivalent to imposing the most divergent part of the equations of motion. Even if this class of metrics will depend on the Lagrangian, it is rather universal: it will not depend on the location of the conical singularity. In this way, by requiring the action to be finite, we expect that one can understand the relation between g 1;0 and g n=1 , which would determine S Wald explicitly in terms of n = 1 quantities.

The first law of entanglement and equations of motion
This section is a byproduct of the previous section. It is independent of the rest of the paper and it will not be mentioned again until the discussion. In the previous sections, we have explained how, in classical gravity, the commutativity of the double variation $\partial_n$, $\delta_{\rm diff}$ implies the extremality of the entangling functional. We can also use this framework to consider more general variations which do not vanish at the boundary. In holography, it is natural to consider turning on a small source. This framework naturally allows us to derive the integrated equations of motion by assuming that the entanglement entropy is given by the area, thus generalizing [16,17].
The idea is that, from the field theory perspective, we can think of the second variation commuting as the first law [14,15]: [δ, ∂ n ] log Trρ n n | n=1 = δS − ∂ n Trδρρ n−1 = δS − δK. We would like to understand if we can recover this from the bulk point of view.
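For reference, the field-theory first law $\delta S = \delta K$ invoked here follows from a short computation. The sketch below assumes a normalized state with a trace-preserving perturbation, $\mathrm{Tr}\,\delta\rho = 0$, and reads $\delta K$ as the change in the expectation value of the unperturbed modular Hamiltonian $K = -\log\rho$; that reading is an assumption about the shorthand.

```latex
\begin{align}
S(\rho) &= -\mathrm{Tr}(\rho\log\rho), \nonumber\\
\delta S &= -\mathrm{Tr}(\delta\rho\,\log\rho) - \mathrm{Tr}(\rho\,\delta\log\rho)
          = -\mathrm{Tr}(\delta\rho\,\log\rho) - \mathrm{Tr}\,\delta\rho \nonumber\\
         &= \mathrm{Tr}(\delta\rho\, K) \equiv \delta\langle K \rangle . \nonumber
\end{align}
% Uses Tr(rho delta log rho) = Tr(delta rho) = 0 at first order in delta rho.
```

The bulk question posed in the text is whether the same equality can be recovered from the commutativity of the two variations acting on the gravitational action.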
In order to do this, we want to be in the same setup as [17]. Consider a deformation of the density matrix which changes the one-point function of the stress tensor by a small amount, $\delta \langle T_{\mu\nu} \rangle \ll 1$, which is achieved by turning on the respective source, the boundary metric. If we add a term $\lambda \int d^d x\, \delta g^{\mu\nu}_{\rm bdy} T_{\mu\nu}$ to the Lagrangian, then the stress tensor will get an expectation value linear in $\lambda$ (to first order in the deformation). In the original geometry, we expect the same change in the action, which we obtain by computing the variation of the action. This variation is given by the equations of motion plus a boundary term, the usual integral of the Brown-York stress tensor. This boundary term will vanish if the expectation value of the stress tensor is zero. We can now repeat the same steps for the Rényi entropies, analytically continue the resulting expression in $n$, take its $n$ derivative, and express it in terms of boundary quantities using the standard dictionary $T_{\rm BY} = \langle T \rangle$. The resulting formula for the variation of the boundary Hamiltonian, obtained by analytically continuing the one-point function at integer $n$, was discussed previously in [35,36]. Note that in the case where the modular Hamiltonian is local, the right-hand side (RHS) will be given by $\int_R d\Sigma^\mu \xi^\nu \delta \langle T_{\mu\nu} \rangle$, and this can be understood from the left-hand side (LHS) because $\delta \langle T_{\mu\nu} \rangle = \int d^d x\, \langle T_{\mu\nu} T_{\alpha\beta} \rangle\, \delta g^{\alpha\beta}_{\rm bdy}$. So we are in exactly the same setup as [17]. We can also take the variations in the opposite order, without yet using any equation of motion.
In this way, given that the variations commute with each other, we obtain an equation relating the two orderings. We have derived this equation by assuming that there is some action, but it should hold independently of how we derive it. Note that, to derive it, we did not need to use the background equations of motion, since they cancel in the double variation. This gives a gravitational entanglement first law, in a way very similar to Wald's first law [37]. In both cases one derives the first law by varying the Lagrangian. In Wald's case, the first law is a consequence of having a Killing vector: the conservation of the diffeomorphism current relates the difference between the area of the extremal surface and the energy at infinity to the gravitational constraints, integrated over a Cauchy slice in the entanglement wedge. In our case, we do a $\partial_n$ variation, which is less symmetric, and we obtain that the two boundary terms differ by a codimension-0 integral. In this way, under the assumptions that the entanglement entropy is given by the generalized area and that the background equations of motion are satisfied close to $n = 1$, we have derived an equation involving $\tilde g = \partial_n g$, but the equation is true even if we do not know what $\tilde g$ is. In this case, $\delta E$ is integrated over the whole manifold. Since we have fewer symmetries than in Rindler (where there is a Killing vector), the integral is higher dimensional, but it does not seem possible to do better from the first law. From the assumptions that the background metric satisfies the background equations of motion at leading order in $n - 1$, that the standard bulk-boundary dictionary holds, and that the entropy is given by the area, we have deduced that $\delta S = \delta K \iff \int \delta E\, \tilde g = 0$. Since this is true for an arbitrary entangling surface, this probably implies $\delta E = 0$ everywhere. In principle, the linearized equations around an arbitrary background could be integrated to give the nonlinear equations of motion.
However, given that the leading order in $(n - 1)$ background equation of motion is a necessary assumption for this discussion, one might need to assume the background equations of motion for all $n$ to derive the nonlinear Einstein equations.$^{6}$ Note also that this expression for the modular Hamiltonian is compatible with [24]. In fact, for Einstein gravity, we can think of $\delta A = \int_{\rm RT} \gamma^{\alpha\beta} \delta g_{\alpha\beta}$ and express $\delta g = \int_{M_\infty} dx\, G(X, x) T(x)$. This gives an expression for $\delta K$ from which we can read $\langle K T_{\mu\nu} \rangle$ in holographic theories (similar comments were made in [14,38]). The reason why this is only true given the equations of motion is that, in order to write the metric operator in terms of the boundary fields, one imposes the linearized equations of motion for the graviton. The advantage of the Euclidean prescription described above is that it provides a bulk definition of the modular Hamiltonian which is independent of the area, through the asymptotic one-point functions at $n \sim 1$.

Quantum corrections to entanglement entropy
In the presence of quantum corrections, we will have a path integral in the replicated space $M_n$. The presence of quantum corrections will modify the equations of motion to all orders in $G_N$. We denote the backreacted background metric by $g_{{\rm cl},n}$ and expand it in $G_N$: $g_{{\rm cl},n} = g^{(0)}_{{\rm cl},n} + G_N\, g^{(1)}_{{\rm cl},n} + \cdots$.$^{7}$ As in [9], we assume that the background metric $g_{{\rm cl},n}$ is $\mathbb{Z}_n$ symmetric.
We define the "orbifolded" partition function by dividing by $n$. Let us review the discussion of [21], which describes how to think about $\log Z_n$ and $\partial_n \hat I_n$ at non-integer $n$. In the previous classical discussion, because of the $\mathbb{Z}_n$ symmetry of the background, the calculation of the action only needed the metric in the quotient space; however, the quantum partition function is only defined in the parent space.$^{8}$ We can exploit the $\mathbb{Z}_n$ symmetry of the background metric to write the partition function as (4.2), where the gravitational density matrix $\rho_n$ is defined by the boundary condition that the background metric $g_{{\rm cl},n}$ has a conical singularity of strength $1/n$. By taking $n$ powers of this seemingly singular density matrix, one ends up with a geometry which does not have a conical singularity. Given that $\rho_n$ is defined for arbitrary $n$, one can analytically continue (4.2) to real $n$: it is just the $n$-th power of $\rho_n$. In this way, we can express the derivative of $\hat I_n$ as the sum of the derivatives with respect to the lower and upper arguments of $\mathrm{Tr}\, \rho_n^n$, which gives (4.3). The first term is obtained by taking a derivative with respect to the background metric inside the path integral and using the expectation value of the equations of motion as in [21] (but to all orders in $G_N$). To exploit the semiclassical part of the problem (which allowed us to use the $\rho_n$ notation), where we have a well-defined background metric, one needs to work perturbatively in $G_N$ around a given saddle $g_{{\rm cl},n}$. This discussion only makes sense in the $G_N$ expansion. This formula is formally true for arbitrary $n$; however, to get the corrections to the background metric $g^{(k>0)}_{{\rm cl},n}$, one needs to analytically continue the expectation value of the stress tensor, $\langle T \rangle_n$, to non-integer $n$.
Taking the $n \to 1$ limit gives an expression which, to one loop, is the same as that of [21]. The notation is a little different. There, $A_{\rm gen}$ was explicitly separated into two terms: one coming from the generalized area evaluated in the background metric $g_{\rm cl}$ (which was denoted $\frac{\delta A}{4 G_N}$) and a contribution coming from matter fields which couple to derivatives of the metric, $S_{\text{Wald-like}}$. This last term is easily illustrated with a scalar field with a term $R \phi^2$, where $S_{\text{Wald-like}} = \int_{\rm RT} \phi^2$. In this original notation, the expectation value of the area due to graviton fluctuations should be thought of as included in $S_{\text{Wald-like}}$.
This procedure is in principle well defined to all orders in $G_N$: $\log Z_n$ is a completely standard partition function, although equation (4.3) requires introducing an artificial boundary at $r = \epsilon$ in our gravitational background. This "brick wall" partition function has been discussed in detail in [39,40].
More concretely, at integer $n$, the partition function is well defined and nothing special happens at the $\mathbb{Z}_n$ symmetric fixed point. In order to take the $n$ derivative, it is convenient to define the partition function with a boundary at $r = \epsilon$. We want to do this in a way that recovers the original partition function when $\epsilon \to 0$. This is achieved by choosing a set of boundary conditions for the quantum fields at $r = \epsilon$ and then integrating independently over all possible boundary conditions. This integration is often referred to as summing over edge modes [39,40]. There, the partition function for abelian gauge fields in a smooth black hole background is written in terms of the partition function in a brick wall geometry, summed over all possible electric fluxes across the boundary. Of course, after setting up these boundary conditions to define the partition function in the presence of a boundary, the entropy ($n$ derivative) will also have the same boundary conditions and edge modes. We can think of these edge modes as the center variables of [41]. We expect this story to generalize straightforwardly to gravity; see [24] for a discussion of gauge-invariant boundary conditions for free gravitons.

Variations
In order to take variations with respect to the background metric, we have to define our partition function slightly off-shell. We can do this by adding a background stress tensor which couples to the metric operator: $\int d^d x \sqrt{g}\, T^{\rm bkg}_{\mu\nu} h^{\mu\nu}$, with $h^{\mu\nu} = g^{\mu\nu} - g^{\mu\nu}_{{\rm cl},n}$ the background-subtracted metric. It is hopefully clear from context that $h$ and $g$ denote operators, while $g_{\rm cl}$ is a c-number. This term in the Lagrangian naturally splits the metric operator into the background metric $g_{{\rm cl},n}$ and the background-subtracted fluctuation, which we will denote by $h$.$^{9}$ Derivatives with respect to the background stress tensor then generate background-subtracted metric correlators. The role of the background stress tensor is to put an arbitrary background metric on-shell,$^{10}$ which allows us to think of the partition function as a function of the background metric.
At integer $n$, we will consider the variation of $\hat I$ with respect to the background metric, (4.5), where we used the quantum corrected equations of motion and the results from the previous section. Since this equation is valid for arbitrary $n$, its $n$ derivative will be zero. The boundary term appears when $g_{{\rm cl},n}$ has to be integrated by parts, and it should be thought of as including an expectation value with respect to the fluctuating fields, which we omitted to simplify the notation. By turning on a background stress tensor, we can also take the variation of (4.3), which gives (4.6). As our variation would be off-shell at integer $n$, the last term will not cancel. However, if we consider a variation which is on-shell close to $n = 1$, a diffeomorphism, the variation of the last term will be zero. Asking for $\delta \partial_n \hat I_n = \partial_n \delta \hat I_n = 0$ then implies (4.7), which is the quantum extremality condition of [23]. To leading order in $G_N$, we will later show explicitly that this is true using the equations of motion at $n \sim 1$, but this approach is valid to higher orders in $G_N$. An example with finite backreaction would be that of the Polyakov action (see Appendix B), but this example might be too simple, since its effective action is local.

$G_N$ perturbation theory, the stress tensor and gravitons
The previous discussion applied order by order in $G_N$, and here we will be a little more explicit about how it is defined.
$^{9}$ To each order in $G_N$, we can think of the Einstein equation as simply the tadpole equation for the metric operator: $g_{{\rm cl},n} = \langle g \rangle_n$.
$^{10}$ We can think of $T^{\rm bkg} = T^{\rm bkg}[g_{\rm cl}]$, since the equations of motion (tadpole equations) are $E(g_{\rm cl}) - G_N \langle T \rangle = T^{\rm bkg}$ and the LHS defines $T^{\rm bkg}[g_{\rm cl}]$. Equivalently, we can do a Legendre transformation and obtain the effective action, which is a function of the off-shell background metric.
The Einstein equation is an operator equation, which means that (4.8) holds. We can expand the Einstein tensor in terms of $g = g_{\rm cl} + h$ in $G_N$ and, to each order, we can basically think of the gravitons $h$ as interacting matter with an effective stress tensor determined by the expectation value of the Einstein tensor, expanded around $E(\langle g \rangle)$. In this way, we can write the $O(G_N^k)$ term in the previous equation as (4.9), where the first term on the LHS is the linearized Einstein tensor and this equation determines $g_{{\rm cl},n}$. Note that $T_{\rm grav}$ is defined order by order in $G_N$ by expanding $\langle E_n(g) \rangle$. We schematically denote the RHS as $\langle T \rangle$.
We can think of the equations of motion as a background field expansion of the action order by order in $G_N$ and consider the variation of the (effective) action with respect to the background metric. If we think about gravitons order by order, they are basically the same as complicated matter with an effective stress tensor determined by the previous equation.

The definition of quantum extremal surfaces
In the previous sections, we derived the quantum extremality condition. In this section, we will explore the quantum extremality equations. Note that, in order to have a nontrivial quantum extremal surface, there has to be some asymmetry between the inside and outside region, and, for the symmetric case of a sphere in the vacuum, there will not be corrections to the extremal surface.
In our framework, we will always have a well defined background metric $g_{cl}$ and interacting gravitons on top of it. We can think of the location of the entangling surface in similar terms: $X = X_0 + G_N X_1 + \cdots$, where $X$ denotes the location of the surface to all orders. 11 For Einstein gravity (it generalizes trivially to higher derivatives, but we are going to focus on Einstein gravity for simplicity), the leading term corresponds to the location of the extremal surface, where $K$ is the extrinsic curvature of the surface at $X_0$; it depends on the position $y$ on the RT surface and on the background metric. Since this is a codimension-2 surface, there are two normal directions, which we denote by $I$. To leading order in $G_N$, we can write an equation for the quantum extremal surface using the results of [35,42]. One can use perturbation theory to understand how the entropy changes under a small change in the subregion. As in the previous discussion, we are going to denote by $r = \epsilon$ the tubular region close to the entangling surface. Using their work, one can show that to first order in $G_N$: This is a linear equation for $X_1$, determined in terms of quantities evaluated at $X_0$ (the classical extremal surface), which are well defined. $\langle T \rangle$ is the RHS of (4.9) and it is evaluated a distance $\epsilon$ away from the entangling surface. The finite contribution to the variation of the entanglement entropy comes from a divergent contribution of $\langle T K \rangle$. In general terms, we expect this object to diverge when the stress tensor approaches the boundary of the region, and the leading divergence goes like $1/\epsilon^{d-2}$. All the contributions that give a divergent variation of the entropy will correspond to the renormalization of the gravitational couplings, and should disappear after adding the proper counterterms. So only the divergent contribution $\langle T K \rangle \propto 1/\epsilon$ will contribute. If the background has a Killing vector, this correlator will not have an odd divergent term.
11 Note that there are no G
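The way a linear equation for $X_1$ arises can be illustrated with a schematic expansion. This is only a sketch: it assumes the quantum extremality condition of [23] in the form $\delta_X (A/4G_N + S_{bulk}) = 0$ with $S_{bulk} = O(G_N^0)$, and it writes functional derivatives with respect to the embedding as if $X$ were a finite set of coordinates. It does not reproduce the explicit stress-tensor form of the equation used in the text.

```latex
% Schematic expansion of the extremality condition around X_0 (assumed form, see lead-in).
\begin{align}
0 &= \frac{\delta}{\delta X}\left[\frac{A[X]}{4G_N} + S_{bulk}[X]\right]_{X = X_0 + G_N X_1 + \cdots} \\
  &= \frac{1}{4G_N}\,\frac{\delta A}{\delta X}\bigg|_{X_0}
   + \left[\frac{1}{4}\,\frac{\delta^2 A}{\delta X^2}\bigg|_{X_0} X_1
   + \frac{\delta S_{bulk}}{\delta X}\bigg|_{X_0}\right] + O(G_N).
\end{align}
% Order 1/G_N : \delta A / \delta X |_{X_0} = 0, i.e. X_0 is the classical extremal surface.
% Order G_N^0 : (1/4) (\delta^2 A / \delta X^2)|_{X_0} X_1 = - (\delta S_{bulk} / \delta X)|_{X_0},
%               which is linear in X_1 with all coefficients evaluated at X_0.
```

In this form the second variation of the area acts as a linear operator on $X_1$, and the source term is the first-order shape variation of the bulk entropy, which is the quantity that entanglement perturbation theory expresses through the stress tensor near $r = \epsilon$.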
The higher orders can be obtained from solving the exact equation where $K_X$ is the modular Hamiltonian of the bulk surface $X$. This equation can be expanded in $X$ order by order in $G_N$. Of course, the extrinsic curvature $K$ should be thought of as an expectation value and (4.12) as a tadpole equation for $X$; for example, to $O(G_N^0)$, we can think of adding an extra term $-\langle K_{X_0}(h; y) \rangle$ to the RHS.
To leading order in $G_N$, we can also see how one would obtain the quantum extremality condition from the equations of motion around $n \sim 1$. The extremality of the area in RT is obtained by expanding the equations of motion near $n \sim 1$, $r \sim 0$ [9]. Schematically: that is, extremality is derived from regularity of the metric close to the $Z_n$ symmetric fixed point. In the presence of quantum matter, we will have: (4.14) It is now clear that if there is a $1/r$ divergent term in $\langle T K \rangle$, regularity of the metric close to the $Z_n$ symmetric fixed point will shift the surface to the quantum extremal surface. It is also clear from this equation that the stress tensor that appears in (4.12) is just the RHS of the Einstein equations.

Subtleties with gravitons
It might not be completely clear how to evaluate the entanglement entropy in the quantum extremal surface for gravitons, or whether it is well defined (see [24] for a set of boundary conditions that works for extremal surfaces). We certainly expect $\log Z_n$ to be well defined to all orders in $G_N$, and $g_{cl,n}$ should also be well defined in the $G_N$ expansion. Upon the inclusion of a boundary and summing over the proper edge modes, we expect that (4.4) and (4.12) make sense order by order in $G_N$. Of course, in order to make this more concrete one should understand better the entanglement entropy of gravitons. For free gravitons, we expect that one can apply the ideas of [39,40] together with [24] to compute the entanglement entropy. Then, we expect that the interacting graviton can be treated in the same way, by considering the interaction in entanglement perturbation theory [35,43,44]. In the same way, we expect that the deformation of the surface away from extremality can be understood in similar terms. More explicitly, as long as the displacement is small, we will schematically have
$S_{bulk} = \sum_m \int_{RT} dy_1\, ds_1 \cdots \int_{RT} dy_m\, ds_m\, \delta X(y_1) \cdots \delta X(y_m) \times \langle T_{s_1}(X_0, y_1) \cdots T_{s_m}(X_0, y_m) f(K_0) \rangle$, (4.15)
with $T_s = e^{iK_0 s}\, T\, e^{-iK_0 s}$ the modular evolved stress tensor. That is, the bulk entanglement entropy in a neighboring surface will be a correlator of (modular evolved) stress tensors and some function of the modular Hamiltonian $K_0$, integrated several times over the extremal surface. So, in principle, we might only need the modular Hamiltonian in the extremal surface to obtain the entanglement entropy in other surfaces. In this expression, part of the $G_N$ dependence will come from $\delta X$, part from changing the background metric and part from the correlator of stress tensors and $K_0$. For example, to $O(G_N)$ we will have
$S_{bulk} = S_{bulk,free}(X_0) + S_{bulk,G_N}(X_0) + \int_{RT} dy\, \delta X^I \langle T_{Ir}(X_0, y) K_0 \rangle + \int dx\, \delta g^{ab} \langle T_{ab}(x) K_0 \rangle$.
Alternatively, we could just define this graviton entanglement entropy in terms of the boundary replica trick. We expect the partition function in this smooth manifold to be perfectly well defined.
Note that quantum extremality relates the contributions from δX of S bulk to the contribution from the area. We will discuss this more explicitly in the next section.

Quantum extremality and mixtures
Up to here, we have discussed quantum extremality in terms of partition functions $\mathrm{Tr}\,\rho^n$ which have a well defined path integral preparation and correspond to a unique classical saddle. We would like to understand how to extend the previous methods to mixtures of states: Even if $Z_{\rho+\sigma,n}$ cannot be prepared in the Euclidean path integral, each of the terms in the RHS of (4.16) can, so we can think of (4.16) as a sum of path integrals. So $Z_{\rho+\sigma,n}$ is in principle well defined for integer $n$: we have an asymptotic circle with perimeter $2\pi n$ which is divided into $n$ slices, and we set boundary conditions in each of the slices determined by a configuration in the RHS of (4.16). Because this definition is an $n$-dependent sum of path integrals, it seems hard to analytically continue in $n$.
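To see why the sum is $n$-dependent, it may help to write out the first few integer cases. The sketch below assumes that (4.16) is the expansion of $\mathrm{Tr}(\rho+\sigma)^n$ into traces of ordered products, and it uses only cyclicity of the trace; $\rho$ and $\sigma$ are not assumed to commute.

```latex
% Expansion of Tr (rho + sigma)^n for n = 2, 3, 4, grouping words related by cyclicity.
\begin{align}
\mathrm{Tr}(\rho+\sigma)^2 &= \mathrm{Tr}\,\rho^2 + 2\,\mathrm{Tr}\,\rho\sigma + \mathrm{Tr}\,\sigma^2, \\
\mathrm{Tr}(\rho+\sigma)^3 &= \mathrm{Tr}\,\rho^3 + 3\,\mathrm{Tr}\,\rho^2\sigma
   + 3\,\mathrm{Tr}\,\rho\sigma^2 + \mathrm{Tr}\,\sigma^3, \\
\mathrm{Tr}(\rho+\sigma)^4 &= \mathrm{Tr}\,\rho^4 + 4\,\mathrm{Tr}\,\rho^3\sigma
   + 4\,\mathrm{Tr}\,\rho^2\sigma^2 + 2\,\mathrm{Tr}\,\rho\sigma\rho\sigma
   + 4\,\mathrm{Tr}\,\rho\sigma^3 + \mathrm{Tr}\,\sigma^4 .
\end{align}
% The coefficients add up to 2^n (4, 8, 16): one term per assignment of rho or sigma to each slice.
```

Each trace corresponds to one choice of boundary condition on every one of the $n$ slices, and from $n = 4$ on, inequivalent orderings such as $\rho^2\sigma^2$ and $\rho\sigma\rho\sigma$ appear separately. Both the number and the type of terms therefore change with $n$.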
At this point, it is useful to make a remark about mixtures of path integrals in general. In the effective action formalism that we described before, whenever we have a mixture of states, we want to fix the same background source across the different states. Since there is only one background source, there is only one corresponding classical value for the field, that is, one tadpole equation. Consider the example of the linear mixture of two density matrices: $\frac{1}{2}(\rho + \sigma)$. In the presence of the same background tensor, $\frac{\delta}{\delta T_{bkg}}(Z_\rho + Z_\sigma) = \langle g \rangle_\rho + \langle g \rangle_\sigma = 2 g_{cl,\rho+\sigma}$, we can Legendre transform by adding the term $-\int d^d x\, T_{bkg}\, g_{cl,\rho+\sigma}$ to the path integrals. This means that $\frac{\delta}{\delta g_{cl,\rho+\sigma}} Z = 0$ will give the sum of the equations of motion; the tadpole condition will be $\langle g - g_{cl,\rho+\sigma} \rangle_{\rho+\sigma} = 0$, which is not explicitly linear in $\rho + \sigma$ because it is expanded around a background. If $Z_\rho$ and $Z_\sigma$ share the same saddle to leading order in the saddle point expansion, $g^{(0)}_{cl,\rho} = g^{(0)}_{cl,\sigma}$, 12 then we can understand this formalism as adding a quantum mixture of states to the classical geometry and solving the sum of the equations of motion $E(g_{cl,\rho+\sigma}) = \langle T \rangle^{g_{cl,\rho+\sigma}}_{\rho+\sigma}$, which we can now compute in $G_N$ perturbation theory. 13 Note that even if the Einstein equations $E(g) = T_{matter}$ are linear in the mixture, the expectation value of the tadpole is background dependent. This makes the linearity of the Einstein equations hard to see if we write them around $g_{cl,\rho+\sigma}$; however, it is clear that $g_{cl,\rho+\sigma} = \frac{1}{2}(g_{cl,\rho} + g_{cl,\sigma})$ (this is clear because $\langle g \rangle_\psi = g_{cl,\psi}$).
The previous discussion gives a prescription to extend our result to mixtures of states that have the same $O(G_N^0)$ value of the metric: $g^{(0)}_{cl,\rho} = g^{(0)}_{cl,\sigma}$. These two states share, to leading order in $G_N$, the same ($Z_n$ symmetric) saddle for $Z_n$. We can think of the sum of path integrals in terms of a mixture of quantum states in the $g^{(0)}_{cl}$ geometry, satisfying the equations of motion: where we think of the RHS as a sum over partition functions and the superscript denotes that we are expanding the gravitons around the $g_{cl,(\rho+\sigma)_n}$ background. It is key that we phrase the problem in terms of a unique geometry and not a mixture of them, since this will allow us to analytically continue in $n$. To do this, we note that $g_{cl,(\rho+\sigma)_n}$ is $Z_n$ symmetric, which allows us to think of $\mathrm{Tr}(\rho + \sigma)^n$ in terms of taking the $n$-th power of $(\rho + \sigma)_n$, where the subscript $n$ denotes that it has the metric determined by (4.17). Upon analytic continuation of the RHS of (4.17), the previous gives a prescription to compute $Z_n(\rho + \sigma) = \mathrm{Tr}[(\rho + \sigma)_n]^n$ for non-integer $n$. Given this, the discussion from the previous section follows and we get quantum extremality for mixtures: It is clear that we want to think of the $n = 1$ solution as given by a unique geometry, $g_{cl,\rho+\sigma}$, where quantum states can be entangled. Note that the fact that at integer $n$ we have complicated sums of partition functions makes the quantum extremal surface nonlinear in the state, since it depends on the modular Hamiltonian of the mixture, i.e. $X_{\rho+\sigma} \neq X_\rho + X_\sigma$ because $g_{cl,(\rho+\sigma)_n} \neq g_{cl,\rho_n} + g_{cl,\sigma_n}$.
12 If $g^{(0)}_{cl,\rho} \neq g^{(0)}_{cl,\sigma}$, $g_{cl,\rho+\sigma}$ is not a saddle. While $\langle g \rangle_{\rho+\sigma}$ appears when coupling the two path integrals through a background stress tensor, it does not have a clear semiclassical interpretation and we will not be considering this situation.
13 This gives a well-defined procedure to compute the partition functions. If the two states are macroscopically distinguishable, the gravitons $h = g - g_{cl,\rho+\sigma}$ would not have a well-controlled one-loop partition function. However, for $h = g - g_{cl,\rho+\sigma}$, this graviton is only slightly off-shell with respect to the path integral of $\rho$ or $\sigma$, so the difference is small and it has a well-defined partition function. Alternatively, we can compute these partition functions with respect to their on-shell background first and use linearity, $g_{cl,\rho+\sigma} = g_{cl,\rho} + g_{cl,\sigma}$.

Modular extremality
A simple consequence of quantum extremality for mixtures is that we can compute the expectation value of modular Hamiltonians for states close to each other (same $g_{cl}$). The modular Hamiltonian is just the log of the density matrix: We can get this from a mixture $\sigma + \lambda\rho$, since $\partial_\lambda \mathrm{Tr}(\sigma + \lambda\rho)^n |_{\lambda=0} = n\, \mathrm{Tr}\,\rho\sigma^{n-1}$.
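The two steps behind this statement can be written out explicitly, keeping the overall factor of $n$. This is a short sketch that assumes $\sigma$ is normalized and uses the convention $K_\sigma \equiv -\log\sigma$, which is our reading of "the log of the density matrix"; the text may use a different sign or normalization.

```latex
% Step 1: differentiate the mixture and use cyclicity of the trace.
% Step 2: continue in n and differentiate at n = 1.
\begin{align}
\partial_\lambda \mathrm{Tr}(\sigma+\lambda\rho)^n\big|_{\lambda=0}
  &= \sum_{k=0}^{n-1} \mathrm{Tr}\left(\sigma^{k}\,\rho\,\sigma^{n-1-k}\right)
   = n\,\mathrm{Tr}\left(\rho\,\sigma^{n-1}\right), \\
-\partial_n \mathrm{Tr}\left(\rho\,\sigma^{n-1}\right)\big|_{n=1}
  &= -\mathrm{Tr}\left(\rho\log\sigma\right)
   = \langle K_\sigma\rangle_\rho , \qquad K_\sigma \equiv -\log\sigma .
\end{align}
```

The first line only needs the derivative at linear order in $\lambda$, so it involves a single insertion of $\rho$ among $n-1$ copies of $\sigma$. This is why the replica computation for the mixture gives direct access to $\langle K_\sigma \rangle_\rho$.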
In this way, if we combine this with quantum extremality for mixtures 14 , we get a formula for the dual to the modular Hamiltonian: The boundary modular Hamiltonian is just given by the area plus the expectation value of the bulk modular Hamiltonian of $\sigma$ in the $\rho$ background. For simplicity of notation, we will illustrate this for Einstein gravity, but it generalizes trivially to higher derivatives. The surface where these terms are evaluated is determined by quantum extremality for the mixture, which implies that the sum of the two terms is extremized. We will call the $X_\sigma$ surface modular extremal. The variation can be carried out [44]: 15 $\langle :T^{Ir}_s(r = \epsilon; y):_\sigma \rangle_\rho$ (5.3) where $:T:_\rho = T - \langle T \rangle_\rho$ and $:T_s:_\sigma \equiv \exp(iK^{X_\sigma}_{bulk,\sigma} s) :T:_\sigma \exp(-iK^{X_\sigma}_{bulk,\sigma} s)$. As we discussed before, one should also add an expectation value of the extrinsic curvature for the gravitons in the RHS, but we omitted it for simplicity. The finite contribution arises from a $1/\epsilon$ divergence in the first term, as in quantum extremality, and the second term can in principle get finite contributions from the $s$ integral (for a local modular Hamiltonian there are contributions from $s \sim -\log\epsilon$ that make this finite). We can think of the first term as the variation of $\rho$ and the second term as the variation of $\log\sigma$. If $\rho = \sigma$, then the second term does not contribute, since it is proportional to the one point function of the stress tensor, and we recover quantum extremality for a single state.
To leading order in $G_N$, $X_\sigma(\rho)$ is just the classical extremal surface and this is the bulk expression for the modular Hamiltonian discussed in [24]. In that paper, it was also discussed what the dual of the relative entropy is to leading order in $G_N$, and our result generalizes it to higher orders: Given that the surfaces where the modular Hamiltonians are evaluated are different, the relative entropy does not have a simple description. Its difference from the bulk relative entropy can be understood as coming from the difference in areas localized $O(G_N)$ away from the classical extremal surface. From our point of view, the object which has a natural bulk description is the modular Hamiltonian, since it has a well defined path integral.
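The role of the two different surfaces can be made explicit by writing the relative entropy as a difference of modular Hamiltonian expectation values. This is a hedged sketch for Einstein gravity: the first line is the standard definition, while the second line applies the "area plus bulk modular Hamiltonian" statement of the text to each term separately. The bracket-with-subscript notation for "evaluated on the surface" is introduced here for illustration and is not taken from the text.

```latex
% Relative entropy as a difference of two modular Hamiltonian expectation values,
% each with its own bulk surface: X_sigma(rho) (modular extremal) and X_rho(rho) (quantum extremal).
\begin{align}
S(\rho|\sigma) &= \mathrm{Tr}\,\rho\log\rho - \mathrm{Tr}\,\rho\log\sigma
   = \langle K_\sigma\rangle_\rho - \langle K_\rho\rangle_\rho , \\
\langle K_\sigma\rangle_\rho &\to
   \left[\frac{\langle A\rangle_\rho}{4G_N} + \langle K_{bulk,\sigma}\rangle_\rho\right]_{X_\sigma(\rho)}, \qquad
\langle K_\rho\rangle_\rho \to
   \left[\frac{\langle A\rangle_\rho}{4G_N} + \langle K_{bulk,\rho}\rangle_\rho\right]_{X_\rho(\rho)} .
\end{align}
```

If both brackets were evaluated on the same surface, the areas would cancel and the difference would reduce to the bulk relative entropy. With $X_\sigma(\rho) \neq X_\rho(\rho)$ the area terms no longer cancel, which is the source of the extra contribution described above.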

A linear mapping of surfaces
From the point of view of the path integral at integer $n$, $\mathrm{Tr}\,\rho\sigma^{n-1}$, it is clear that our expression should be linear in $\rho$. At $n = 1$, this is the statement that we should be thinking of the position of the modular extremal surface $X_\sigma(\rho)$ as a linear function of the state $\rho$.
In this way, given a state σ and its quantum extremal surface X σ (σ) , we can think of X σ (ρ) as a mapping from the quantum extremal surface in the σ background to a surface in the ρ background (this is similar to [26], where some unspecified mapping was proposed).
The $G_N$ corrections generalize the extremal area operator appearing in [24] to the $\sigma$-dependent modular area operator: $A_{X_\sigma}$ depends on the modular Hamiltonian of $\sigma$. Since our equation can be understood as the expectation value of an operator in the state $\rho$, we can write it as an operator equation:

Linearity and state dependent divergences
In principle, one could worry about the fact that (5.3) is not linear in $\rho$ because of $\lim_{\epsilon \to 0} \epsilon \left[ \langle T^{Ir}(r = \epsilon, y) K_{bulk,\sigma} \rangle_\rho - \langle T^{Ir}(r = \epsilon, y) \rangle_\rho \langle K_{bulk,\sigma} \rangle_\rho \right]$ (5.6). Note, however, that in the second term the divergent contribution has to come from $\langle K_{bulk,\sigma} \rangle_\rho$, since there is nothing special happening at $r = \epsilon$ in the original state. Now, if this divergent contribution from $K_{bulk,\sigma}$ were state independent, then $\langle K^X_{bulk,\sigma} \rangle_\rho = c(X)/\epsilon$ and thus we recover a linear expression. 16 $\langle K_{bulk,\sigma} \rangle_\rho$ could in principle have state dependent divergences. State dependent divergences in the entropy were studied in [45]; they look like $\int_{\partial R} \langle O \rangle_\rho$, which, using the first law, can be mapped to a contribution $\int_{\partial R} O$ to the modular Hamiltonian [24], leading to state dependent divergences in the modular Hamiltonian. However, because our contribution to the entropy includes $A_{gen}$, $S_{gen}$ will not have these divergences. In other words, $K_{bulk,\sigma} + A_{gen}$ does not have state dependent bulk divergences, and we can just shift the possible term from $K_{bulk,\sigma}$ to $A_{gen}$ in such a way that none of the terms has state dependent divergences and (5.6) is clearly linear.
From this expansion, one could require the relative entropy to be given by the bulk relative entropy of some surface, which is neither the modular nor the quantum extremal surface. We could set up an equation which should be solved order by order in $G_N$ by expanding the RHS using (5.8) and solving for $X_S(\rho, \sigma)$. While it is clear that this can be done to leading order, we are not completely sure whether it has a solution to all orders. If that is true, it might be helpful to think about the interpretation of modular extremality: it relates variations of the area with variations of the modular Hamiltonian, and this can be used to write the relative entropy as the bulk relative entropy in some surface $X_S$. However, even if that is the case, it is clear that $X_S$ will be complicated and nonlinear in $\rho$, $\sigma$.

$\langle K_{bulk,\sigma} \rangle_\rho$ and local modular Hamiltonians
At this point, even if we have a formal definition for these modular extremal surfaces, it would be nice to understand better what the different terms mean.
To compute $\langle K_{bulk,\sigma} \rangle_\rho$ in gravity in $G_N$ perturbation theory, we have to account for three facts: the surface changes, the background metric changes, and the quantum state changes. Only the last is present in usual field theories. As we discussed before, the fact that the surface changes can be understood in terms of entanglement perturbation theory (and can be combined with the change in the area), and we are going to ignore this dependence in this section. Given that the background metric changes, we should think of the change in the state as a combination of a change in the matter fields plus a shift in the metric due to backreaction. We could deal with this by deforming the path integral, inserting an operator that changes the metric; this would give us a deformed modular Hamiltonian, as for shape deformations. However, given that the theory is gravitational, there seems to be a more natural way to do it: we should think of the bulk modular Hamiltonian in terms of the $G_N$ expansion; to leading order it will be quadratic in the fields, and interactions will be present at higher orders. Backreaction is easily introduced by just shifting the tadpole $g_{cl}$ which appears in the modular Hamiltonian. This is just a shift of the variables, but the different expressions are useful when evaluated in the respective $g_{cl}$ state.
As an example, we can consider $K_\sigma$, the modular Hamiltonian of a sphere $R$ in the vacuum, and $\rho$ some state which varies by an $O(1)$ expectation value of the boundary stress tensor. In this case, the modular Hamiltonian is local: When we have a local modular Hamiltonian, we can use Wald's version of Gauss' law [46,30] (see also [24]): where $S$ is an arbitrary gauge-invariant surface that is well defined for the original and the perturbed state (for example by picking a gauge where the surface stays at the same position). $\Sigma_S$ is the spacelike surface between the boundary region $R$ and the surface $S$. Now, we can use (5.11) to integrate in $\langle K_\sigma \rangle_\rho$ for $\rho$ perturbatively close to $\sigma$, to all orders in perturbation theory. The reason is simple: if we write $g_\rho = g_\sigma + \sum_k \lambda^k \delta_k g$, we have that $\int_R \xi \cdot \langle T \rangle_\rho - \int_R \xi \cdot \langle T \rangle_\sigma = \sum_k \lambda^k E_\infty(\delta_k g)$ is linear in the metric (and $\rho$), and we can use the gravitational Gauss' law for each term individually.
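For concreteness, the standard local expression for the vacuum modular Hamiltonian of a ball can be written as follows. This is a reference sketch and not a reconstruction of the omitted equation: it assumes a $d$-dimensional CFT in flat space, a ball of radius $R_0$ on the $t = 0$ slice (the symbol $R_0$ is introduced here to avoid a clash with the region $R$), and the convention in which the factor $2\pi$ is written explicitly rather than absorbed into $\xi$.

```latex
% Vacuum modular Hamiltonian of a ball of radius R_0, up to an additive normalization constant.
\begin{equation}
K_\sigma = 2\pi \int_{|\vec x| \le R_0} d^{d-1}x \; \frac{R_0^2 - |\vec x|^2}{2R_0}\; T_{tt}(\vec x)
\end{equation}
% \xi^t = (R_0^2 - |\vec x|^2)/(2 R_0) is the time component, on the t = 0 slice, of the
% conformal Killing vector preserving the causal diamond of the ball; it vanishes at |\vec x| = R_0.
```

Because $K_\sigma$ is an integral of the stress tensor against a fixed vector field, $\langle K_\sigma \rangle_\rho - \langle K_\sigma \rangle_\sigma$ depends on $\rho$ only through $\langle T \rangle_\rho$. This is the property that lets Gauss' law be applied order by order in the metric perturbation.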
Now, $E_{tt,lin}(\delta_k g)$ is nothing but the tadpole of equation (4.9) (technically, (4.9) referred to the $G_N$ expansion, but it of course applies to any other perturbative expansion), which we can morally think of as the stress tensor to that order. So, we can write the previous formula as: We expect that this can be used to write $K_{bdy} = K^S_{bulk} + A^S$ for an arbitrary gauge invariant surface $S$, but this requires a careful analysis of boundary terms which we will not pursue further. 17 This means that modular extremality is not very helpful for local modular Hamiltonians. As the surface $S$, the most natural candidates are classical extremal or modular extremal surfaces, but one could choose any other family of gauge invariant surfaces. It is clear from this discussion that we should think of the change in background in $K_{bulk,\sigma}$ as simply shifting the tadpole from $g_\sigma$ to $g_\rho$. Now, we would like to connect the previous story with that of [47]. We can think of their setup in our terms as $\rho$ being a bulk coherent state on top of $\sigma$ with a semiclassical amplitude, schematically $|\Psi_\rho\rangle = e^{i\sqrt{\lambda/G_N}\, a^\dagger} |\Psi_\sigma\rangle$, with $a^\dagger$ the graviton creation operator.
We can work in the limit where the amplitude is large (so that the state is classical) but the states only change the metric perturbatively in $\lambda$. Since $g_\rho$, $g_\sigma$ correspond to the same saddle, we can apply our discussion. In this limit, even if in the entanglement entropy the area changes to order $G_N^{-1}$, the bulk entanglement entropy stays $O(G_N^0)$, so we do not need quantum extremality. It is less clear if the modular extremal surface changes for coherent states, but we do not need it because of (5.12). We can instead consider the simpler case when $S$ is the extremal surface. In this case, since the bulk entanglement entropy is $O(G_N^0)$ but the bulk modular Hamiltonian is $O(G_N^{-1})$, we deduce that: where we used our expectation that $K_{bdy} = K^S_{bulk} + A^S$, and for $S$ being the extremal surface the areas cancel in the relative entropy; they would not cancel for modular extremal surfaces. In this way, it is very suggestive to think of the Hamiltonian of [47] as the bulk modular Hamiltonian in the entanglement wedge, in which case the positivity of relative entropy would be a consequence of the positivity of the bulk relative entropy. Again, modular extremality does not seem important in their case because, in this symmetric situation, one can choose an arbitrary gauge invariant surface on which to integrate the boundary modular Hamiltonian. Of course, to make the connection between (5.12), modular extremality and [47] more precise, one should understand better how the boundary terms and $E_{lin}$ combine to give the bulk modular Hamiltonian to all orders.
More broadly, understanding whether (classical) coherent states give an $O(1)$ shift to the position of the extremal surface when considering modular extremality seems interesting, since quantum extremal surfaces can only shift the entangling surface by $O(G_N)$. This might give a simpler classical setup to compute the dual of the modular Hamiltonian. For example, if we consider a coherent state of scalar fields, where $\langle \phi \rangle_\lambda = 0 + \lambda G_N^{-1} \phi_{cl}$, we expect the modular extremal surface to shift by a classical $\delta X^I(X, y) \propto \lambda^2 \int \frac{ds}{\sinh^2(s/2 + i\varepsilon)} \langle T^{Ir}_s \rangle$ when computing $\langle K_{bdy}(\lambda = 0) \rangle_\lambda$ holographically. Of course, this is hard to do explicitly, because we have little control over modular Hamiltonians other than those which are local, where we can apply (5.12).
is the bulk modular Hamiltonian modulo boundary terms which turn the linearized area operator into the full area operator. One can see how this works to second order by carefully rewriting T grav as the canonical energy (bulk modular Hamiltonian) plus the quadratic area operator [24].

Discussion
In this paper, we have exploited the variational principle at the level of the replicated path integral to derive the extremality of the entangling functional of higher derivatives, quantum extremality and modular extremality. This is done by thinking about the Rényi entropies and taking the $n \to 1$ limit carefully. This gives closure to the approach of [9], which naturally gives the entanglement entropy functional but makes it hard to derive the extremality condition for general gravitational theories and higher orders in $G_N$. This variational framework is also useful to generalize the relation between the equations of motion and the first law for general states.
We would like to close with some comments and future directions. As a general note, across this paper, we have assumed that the bulk saddles have replica symmetry. It would be nice if one could relax this or justify it better (see [48,49] for some discussion about this).

Higher-derivative gravity
By working at integer n > 1 and then taking the n → 1 limit in higher-derivative theories of gravity, we have discussed how one should in principle determine the splitting terms of [32]. These are determined by demanding that the gravitational action is finite. After fixing these terms, the only remaining freedom comes from changing the location of the surface, and this deformation keeps the action finite.
In Appendix A, we have demonstrated in some nontrivial examples how the n → 1 limit of the Wald entropy at n > 1 gives the gravitational entropy of [12,13]. While our approach is strongly suggestive that this is true generally, it would be useful to work it out explicitly for more general examples.

The equations of motion
Regarding the equations of motion, it would be nice to understand better whether, by varying the regions under consideration, one can derive the local equations of motion from the integrated equations of motion. Note that, in contrast with [17], the equations are integrated over one more dimension because of the lack of symmetry.
In order to derive the equations of motion from the first law of entanglement of [14,15], one has to understand the modular Hamiltonian. In general it is complicated, yet its variations are well defined in terms of analytically continued one point functions in the replicated theory. We expect that, in the absence of a more explicit expression for the boundary Hamiltonian, the only way in which one can obtain the equations of motion from the first law is by using the replica trick via the procedure described in Section 3.
Of course, there are other ways in which one could try to get the equations of motion from the RT formula. An alternative option pursued by [50,51] is to show that the boundary expression for the relative entropy around the vacuum for a spherical region matches the expression for bulk relative entropy. The bulk and boundary relative entropies differ off-shell by an integral of the equations of motion and thus one can derive the backreacted equations of motion from the equality of these two quantities. More generally, one might be able to use similar ideas to the ones that we described combined with modular perturbation theory to generalize this approach to other surfaces and states.

Entanglement entropy of gravitons
We defined the entanglement entropy of gravitons by analytically continuing the finite $n-1$ partition function. Technically speaking, only $S_{gen}$ is well defined, since the separation into two terms is ambiguous: it depends on the details of how the boundary is inserted. This ambiguity is related to the choice of center of [41]. It would be nice to understand better the graviton entanglement entropy from a Hilbert space perspective, along the lines of [39,40,24]. It would be interesting to carry out the perturbation theory described in Section 4.2 to define the entanglement entropy of gravitons beyond the extremal surfaces in $G_N$ perturbation theory.

Local modular Hamiltonians and modular extremality
We have also given an argument of how one can in principle think of the results of [47] in terms of bulk relative entropy. Of course, it would be nice to understand this more precisely, by being careful about the boundary terms in the graviton modular Hamiltonian to higher orders.
Modular extremality does not seem necessary when the modular Hamiltonian is local, since there we can just use Gauss' law to integrate in the energy at infinity. It seems hard yet very interesting to understand explicitly some examples of modular extremality for modular Hamiltonians which are non-local. In contrast with quantum extremality, we expect the modular extremal surface to be different from the extremal surface for deformations which are classical (coherent states).

Modular flow and bulk reconstruction
To leading order in G N , the commutator between a properly dressed local operator at a point Z in the entanglement wedge and the modular Hamiltonian is given by the commutator with the bulk modular Hamiltonian. This was used in [26] to show that one can reconstruct operators in the entanglement wedge in terms of the boundary subregion and more recently, it was used in [27] to derive a boundary expression of the bulk operators.
Furthermore, [26] showed that if $\rho_{bulk} = \sigma_{bulk} \to \rho = \sigma$, which is clearly true from (1.8), then one can also reconstruct operators deep inside the entanglement wedge. As has been argued recently [52], the analysis of [25,26] is stable under $G_N$ perturbations, and we expect that our discussion can help find the explicit bulk to boundary mapping in the presence of backreaction. Because of the previous, we do not expect the approach of [27] to break down when $G_N$ corrections are considered. To next order, it seems like the correction to the difference between modular flows is determined by the shift in the surface: We leave for future work understanding this contribution to the commutator, but we expect that by carefully understanding the previous one can generalize [27] to higher orders in $G_N$.

the Institute for Advanced Study. A.L. acknowledges support from the Simons Foundation through the It from Qubit collaboration, as well as the support of a Myhrvold-Havranek Innovative Thinking Fellowship. A.L. would also like to thank the Department of Physics and Astronomy at the University of Pennsylvania for hospitality during the development of this work.

A Dilaton gravity with higher derivative interactions
In this appendix, we study the gravitational entropy in toy models of higher derivative gravity: 2d dilaton gravity coupled to matter fields with higher derivative interactions. These theories can arise from dimensional reduction of higher derivative gravity in more than two dimensions. We demonstrate how to solve the "splitting problem" and calculate the entropy functional A gen in these toy models. Furthermore, we verify (1.3) and (2.14) by showing directly from the equations of motion that the entropy is obtained by extremizing A gen , and its extremal value agrees with the n → 1 limit of the Wald entropy.
Throughout this appendix, we define $\epsilon \equiv n - 1$ and adopt a complex coordinate system $(z, \bar z)$ on $M_n$ such that the metric is in the conformal gauge $ds^2 = e^{2\psi(z,\bar z)}\, dz\, d\bar z$ (A.1) and the origin is the $Z_n$ fixed point. The $Z_n$ symmetry acts as a discrete rotation $z \to z e^{2\pi i/n}$. We will study solutions of the equations of motion for $\psi$, the dilaton $\phi$, and additional matter fields. At $n = 1$, these fields have regular Taylor expansions around $z = 0$. For example, we have for the dilaton. Away from $n = 1$, such expansions become much more complicated. Near $n \approx 1$, we may generally expand the dilaton as $\phi(z, \bar z) = \phi_0 + \phi_1 (z\bar z)^\epsilon + \phi_2 (z\bar z)^{2\epsilon} + \cdots + z^{1+\epsilon} [\phi_{z,0} + \phi_{z,1} (z\bar z)^\epsilon + \cdots] + \text{c.c.}$
and similarly for other fields. Here c.c. denotes complex conjugate. As we go away from $n = 1$, each term in the expansion (A.2) "splits" into a Taylor expansion in $(z\bar z)^\epsilon$. Continuity at $n = 1$ therefore requires the following matching conditions: $\phi_{\mu\nu} = \phi_{\mu\nu,0} + \phi_{\mu\nu,1} + \phi_{\mu\nu,2} + \cdots$, (A.6) and their higher-order analogues. Here $\mu = z, \bar z$, and we have only kept zeroth-order terms in $\epsilon$ in coefficients such as $\phi_m$ and $\phi_{\mu,m}$. Higher-order terms in $\epsilon$ are negligible for the purpose of calculating the von Neumann entropy in our examples. The gravitational entropy $A_{gen}$ can be calculated as in [12], but the result would depend on how the $n = 1$ coefficients split into $n \neq 1$ coefficients in (A.4)-(A.6). On the other hand, $A_{gen}$ should depend only on the $n = 1$ solution (A.2) in order to be a useful entropy functional. This is the "splitting problem." As we will demonstrate explicitly below, the solution to this problem is that the equations of motion near $n \approx 1$ are sufficient to fix the split of coefficients in (A.4)-(A.6), at least to the extent of allowing us to write $A_{gen}$ in terms of the $n = 1$ coefficients appearing in (A.2).
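The origin of the matching conditions can be seen by taking $\epsilon \to 0$ in the near-$n \approx 1$ expansion of the dilaton at fixed $z \neq 0$. The sketch below assumes, following the visible form of (A.6), that the $n = 1$ Taylor coefficients are written without a mode index, and it drops the $O(\epsilon)$ corrections to the coefficients as stated in the text.

```latex
% At fixed z \neq 0, (z \bar z)^{m \epsilon} \to 1 and z^{1+\epsilon} \to z as \epsilon \to 0.
\begin{align}
\lim_{\epsilon\to0}\phi(z,\bar z)
 &= (\phi_0+\phi_1+\phi_2+\cdots) + z\,(\phi_{z,0}+\phi_{z,1}+\cdots) + \text{c.c.} + \cdots \\
\Rightarrow\quad \phi &= \sum_{m\ge0}\phi_m, \qquad \phi_\mu = \sum_{m\ge0}\phi_{\mu,m}, \qquad \mu = z, \bar z .
\end{align}
% By contrast, at z = 0 with \epsilon > 0 only the m = 0 term survives: \phi(0) = \phi_0.
```

Only the sums over $m$ are fixed by continuity, so the individual $\phi_m$ are undetermined at this stage. That freedom is the splitting problem, and the last comment in the block is why quantities evaluated at the fixed point for $n > 1$ see only the $m = 0$ coefficient.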

A.1 One matter field
Let us first consider the following theory of dilaton gravity coupled with a single scalar field $\sigma$ with higher derivative interaction: The equation of motion for the metric is whereas the equations of motion for the dilaton $\phi$ and the scalar $\sigma$ are Using (A.9) we find a flat space with the conformal factor $\psi = 0$, greatly simplifying the other equations. If we want, we could get an AdS solution instead by replacing $R$ with $R + 2$ in (A.7); this leads to $\psi = -\log(1 - z\bar z/4)$, but our conclusion is largely unaffected. Solving the other equations of motion near $n \approx 1$, we find $\sigma_{m>0} = 0$, $\sigma_{\mu,m>0} = 0$, $\sigma_{\mu\nu,m>0} = 0$, (A.11) $\phi_{z,0} = 2\lambda \sigma_{z,0} \sigma_{z\bar z,0}$, $\phi_{z,1} = 2\lambda \sigma_{\bar z,0} \sigma_{zz,0}$, $\phi_{z,m>1} = 0$. $\sigma_0 = \sigma$, $\sigma_{\mu,0} = \sigma_\mu$, $\sigma_{\mu\nu,0} = \sigma_{\mu\nu}$, $\phi_0 = \phi - 2\lambda \sigma_z \sigma_{\bar z}$, (A.14) Let us make two comments before continuing. First, these relations are uniquely determined from a local analysis of the equations of motion near a small conical defect in the quotient space $\hat M_n$, and are universal in the sense that they do not depend on whatever boundary conditions we impose at the asymptotic boundary of spacetime. The reason for this is that these relations arise from setting to zero the most singular terms in the equations of motion expanded around $z = 0$. This is a good feature because the entropy functional $A_{gen}$ should only depend on local geometric quantities once we fix the gravitational action. Our second comment is that the split of $\sigma_z$ (and $\sigma_{\bar z}$) is over-constrained as shown in (A.15), but we will see that this is a feature, not a bug.
The gravitational entropy can be easily calculated as in [12]: (A.16). As promised, this entropy functional can be rewritten$^{18}$ in terms of fields and their derivatives at n = 1: $A_{\rm gen} = S_{\rm Wald} + S_{\rm anomaly}$, $S_{\rm Wald} = 2\pi\bar\phi$, $S_{\rm anomaly} = -4\pi\lambda\,\bar\sigma_z\bar\sigma_{\bar z}$. (A.17) Moreover, it agrees with the $n \to 1^+$ limit of the Wald entropy, (A.18), which is identical to (A.17) after using (A.14). It is worth noting that in taking the above limit we need to calculate the Wald entropy at n > 1, and $\phi_1$ does not contribute to this. Therefore, the Wald entropy has a discontinuity of $2\pi\phi_1$ at n = 1, which is precisely compensated by $S_{\rm anomaly}$ in (A.17). The extremality condition $\partial_\mu A_{\rm gen} = 0$ is satisfied because $\partial_z A_{\rm gen} = \partial_z\left(2\pi\bar\phi - 4\pi\lambda\,\bar\sigma_z\bar\sigma_{\bar z}\right) = 2\pi\left[\bar\phi_z - 2\lambda\left(\bar\sigma_z\bar\sigma_{z\bar z} + \bar\sigma_{\bar z}\bar\sigma_{zz}\right)\right]$ (A.19) vanishes due to the extra constraint (A.15).
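The two consistency statements above can be spelled out in a few lines. The sketch below writes the n = 1 coefficients with bars (a notational assumption) and assumes that the $n \to 1^+$ Wald entropy equals $2\pi\phi_0$, as suggested by the remark that $\phi_1$ does not contribute; neither is taken from an equation displayed here.

```latex
% Product rule applied to the anomaly term in (A.17):
\begin{align}
  \partial_z\!\left(\bar\sigma_z\,\bar\sigma_{\bar z}\right)
    &= \bar\sigma_{zz}\,\bar\sigma_{\bar z}
     + \bar\sigma_z\,\bar\sigma_{z\bar z} , \\
  \partial_z A_{\rm gen}
    &= 2\pi\,\bar\phi_z
     - 4\pi\lambda\left(\bar\sigma_z\,\bar\sigma_{z\bar z}
     + \bar\sigma_{\bar z}\,\bar\sigma_{zz}\right) .
\end{align}
% Hence \partial_z A_gen = 0 is equivalent to
%   \bar\phi_z = 2\lambda (\bar\sigma_z \bar\sigma_{z\bar z}
%                          + \bar\sigma_{\bar z} \bar\sigma_{zz}).
%
% Wald limit (assumed to be 2\pi\phi_0), using the last relation in (A.14):
\begin{align}
  2\pi\phi_0
    = 2\pi\left(\bar\phi - 2\lambda\,\bar\sigma_z\bar\sigma_{\bar z}\right)
    = S_{\rm Wald} + S_{\rm anomaly}
    = A_{\rm gen} .
\end{align}
```

Under these assumptions the gap between the n = 1 Wald entropy $2\pi\bar\phi$ and its $n \to 1^+$ limit is exactly $-S_{\rm anomaly}$, consistent with the discontinuity described in the text.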

A.2 Two matter fields
The previous example may seem too simple for experts, so let us now study a more complicated theory of dilaton gravity coupled to two scalar fields σ and ω with a higher-derivative interaction: (A.20). The equation of motion for the metric is (A.21), whereas the equations of motion for the dilaton φ and the other scalars σ, ω are (A.22)-(A.24). Again we find a flat space with the conformal factor ψ = 0. It is difficult to solve the other equations of motion for arbitrary λ, so we will work perturbatively in λ and write the solution as $\phi = \phi^{(0)} + \lambda\phi^{(1)} + \lambda^2\phi^{(2)} + \cdots$ (A.25) with similar expansions for other fields. At the zeroth order in λ, we find the familiar case of dilaton gravity without any higher-derivative interaction: … z,1 = −σ … From these results we can determine the gravitational entropy as in [12]. Let us find the contribution to $A_{\rm gen}$ from each term in the action (A.20). We will work to second order in λ. The contribution of the φR term is 2πφ (A.36). From the $(\nabla\sigma)^2$ term we get … This also agrees with the $n \to 1^+$ limit of the Wald entropy, $\lim_{n\to 1^+} S_{\rm Wald}(g_n) = 2\pi\phi$ …, which can easily be shown to be identical to (A.40).
It is worth noting that if we forgot about splitting and proceeded naïvely, we would miss the $\lambda^2$ term in (A.45). Therefore, this example shows that we cannot in general forget about splitting in calculating the gravitational entropy.
It is possible to check the extremality condition $\partial_\mu A_{\rm gen} = 0$ by working out the relevant part of the zz component of (A.21) to second order in λ.

B Polyakov action
A toy model for understanding these issues is 2d dilaton gravity in the presence of m quantum scalar fields [53]: $I = \frac{1}{2\pi}\int d^2x\,\sqrt{g}\,e^{-2\phi}\left(R + 4(\partial\phi)^2 + 4\lambda^2\right) - \frac{m}{96\pi}\,R\,\nabla^{-1}R$ (B.1). In the limit of large m, one can compute quantities at finite $N = m\hbar$. The second term can be thought of as $(\partial\eta)^2 - 2\eta R$, with $\nabla\eta = R$. This expression suggests that $S_{\rm Wald} = \frac{N}{12}\eta_0$. This might seem too quick, but [54] showed using Noether charge methods that $S_{\rm Wald} = \frac{N}{12}\eta_0$, so that the total entropy is … where $\eta_0$, which is non-local in general, was expressed in terms of the metric in conformal gauge, $ds^2 = e^{2\rho}\,dz\,d\bar z$. The quantum extremality condition would be $-4e^{-2\phi_0}\,\partial\phi_0 + \frac{N}{6}\,\partial\rho_0 = 0$.
The equations of motion are [53]: … and similarly for $\bar\partial$. Now, the question is whether, given some metric ρ, the equations of motion can still be solved if one adds a small conical singularity $\delta_n\rho = (n-1)\log(z\bar z)$. If N = 0, then it was shown in [9] that a change $\delta_n\phi$ cannot cancel the singularity of $\partial\delta_n\rho = \frac{n-1}{z}$, so one concludes that $\partial\phi = 0$.
In the presence of N, there will be two kinds of terms linear in $\delta_n$: $\partial\delta_n\rho\left(4\,\partial\phi\,e^{-2\phi} - \frac{N}{6}\,\partial\rho\right) + \delta_n\!\left(e^{-2\phi}\,\partial\phi\right)\partial\rho + 2\,\delta_n\!\left(e^{-2\phi}\,\partial^2\phi\right) - \frac{N}{12}\,\partial^2\delta_n\rho = 0$ (B.4) with $\delta_n\rho = (n-1)\log(z\bar z)$. If we consider φ = ρ = 0, then the equation is solved by setting $\delta_n\phi = \frac{N}{24}\,\delta_n\rho$. We expect that for a nontrivial background, one can cancel the $\frac{n-1}{z^2}$ singularity by picking an appropriate $\delta_n\phi$. This then results in the condition $4\,\partial\phi\,e^{-2\phi} - \frac{N}{6}\,\partial\rho = 0$, which is the quantum extremality condition. Naively, it seems nontrivial that one would get the quantum extremality condition, because the gravitational action is non-local. However, in this particular case, after adding an extra field the action becomes local and thus the usual arguments apply.
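The statement about the trivial background can be checked directly from (B.4). The sketch below assumes that $\delta_n$ acts as an ordinary first-order variation and that the background is exactly φ = ρ = 0, so that all background derivatives vanish.

```latex
% Check of (B.4) on the trivial background \phi = \rho = 0.
% The first two terms vanish because \partial\phi = \partial\rho = 0.
% Linearizing the third term and then setting \phi = 0:
\begin{align}
  \delta_n\!\left(e^{-2\phi}\,\partial^2\phi\right)
    = e^{-2\phi}\left(\partial^2\delta_n\phi
      - 2\,\delta_n\phi\,\partial^2\phi\right)
    \;\to\; \partial^2\delta_n\phi
    \qquad (\phi = 0) .
\end{align}
% Equation (B.4) therefore reduces to
\begin{align}
  2\,\partial^2\delta_n\phi - \frac{N}{12}\,\partial^2\delta_n\rho = 0 ,
\end{align}
% which is satisfied by \delta_n\phi = (N/24)\,\delta_n\rho. Explicitly,
\begin{align}
  \delta_n\rho = (n-1)\log(z\bar z)
  \;\Rightarrow\;
  \partial\delta_n\rho = \frac{n-1}{z} , \qquad
  \partial^2\delta_n\rho = -\frac{n-1}{z^2} .
\end{align}
```

On this background the $\frac{n-1}{z^2}$ singularities of the two surviving terms cancel against each other, which is the cancellation the text expects to persist for nontrivial backgrounds.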