Relative entropy and the RG flow

We consider the relative entropy between vacuum states of two different theories: a conformal field theory (CFT), and the CFT perturbed by a relevant operator. By restricting both states to the null Cauchy surface in the causal domain of a sphere, we make the relative entropy equal to the difference of entanglement entropies. As a result, this difference has the positivity and monotonicity properties of relative entropy. From this follows a simple alternative proof of the c-theorem in d=2 space-time dimensions and, for d>2, a proof that the coefficient of the area term in the entanglement entropy decreases along the renormalization group (RG) flow between fixed points. We comment on the regimes of convergence of relative entropy, depending on the space-time dimension and the conformal dimension $\Delta$ of the perturbation that triggers the RG flow.


Introduction
The renormalization group (RG) flow describes how physics changes with scale in a quantum field theory (QFT). In recent years, interesting connections of these flows with quantum information theory (QIT) have been discovered. A universal term in the vacuum entanglement entropy (EE) was shown to decrease monotonically along the RG for space-time dimensions d = 2, 3 [1,2,3]. This gives an alternative proof of the c-theorem in d = 2 [4] and a proof of the F-theorem in d = 3 [5,6]. In addition to unitarity and Lorentz covariance of the QFT, the key property of these proofs is strong subadditivity of entanglement entropy. Holographically, the monotonicity of the RG flow is related to the null energy condition in the bulk [5,7]. More generally, the fine-grained RG flow in terms of tensor networks [8] has been proposed as a description of the spatial structure of the holographic gravity dual [9].
A natural information theory tool to study changes between states is the relative entropy. It measures the distinguishability between different states in a precise operational way [10]. In the context of renormalization group flows, a natural idea is to use relative entropy to quantify how a theory (or its vacuum state) gets modified as we change the scale. In this work we consider quantum relative entropies in real time, between vacuum states of two theories reduced to certain regions, and look at the consequences of positivity and monotonicity of relative entropy. We follow the steps of the recent work [13], where relative entropy was shown to lead to a simple proof of the g-theorem for d = 2 conformal field theories (CFT) in a space with a boundary at x = 0.
Evidently, not every pair of vacuum states of two different theories can be compared through the relative entropy. Different theories, e.g. theories containing one and two free scalar fields respectively, usually live in different Hilbert spaces, and there is no natural meaning in taking a relative entropy in this case. In order to compute a relative entropy, we need the microscopic constituents of the two models to be the same (at least in the presence of a physical UV cutoff such as a lattice). For this reason, we will study theories with the same UV fixed point, where this can in principle be achieved. More precisely, we will fix as a reference state the UV conformal fixed point itself, and study the relative entropy with another state arising from the CFT by perturbing it with a relevant operator. We will argue that relative entropy gives a useful notion of statistical distance between these theories, and is well-suited for capturing global properties of RG flows.
Relative entropy is notoriously efficient in distinguishing states. It essentially takes into account all fine grained information about the states. In our setup this is reflected in the possible presence of divergences. In order to get definite results for RG flows, we need to avoid these divergences and prevent the relative entropy from distinguishing the states too much.
Divergences may be of UV origin, due to the fact that even if the two theories we consider approach each other at short distances, the correlators of the deformed theory do not converge to the ones of the CFT fast enough to make the relative entropy finite. We will find a range of the conformal dimension ∆ of the perturbation that triggers the RG flow where relative entropy is free from UV divergences.
There are also divergences of infrared origin, coming from differences between the states that pile up at large distances. In fact, if we take the two full vacuum states, the relative entropy will always be divergent, as they correspond to two different pure states. However, this problem is circumvented by looking at the states reduced to a finite region in space. The size $R$ of the region will be the parameter with which we can look at the RG scale. In general, we find that relative entropy increases super-volumetrically, as $R^d$, due to the contribution of the modular Hamiltonian. Following [13], we will then compare the states on a null surface. This effectively reduces the relative entropy to terms increasing like the area $\sim R^{d-2}$, giving direct information on the entanglement entropy and aspects of its RG flow.
The main result is a new proof of the c-theorem in d = 2, which extends to higher dimensions d > 2 as a statement about the renormalization of the area term in entanglement entropy. This term is shown to always decrease between fixed points, but the change is finite only in the restricted window of conformal dimensions $\Delta < (d+2)/2$. This is parallel to studies of the renormalization of the Newton constant [14,15,16,17].
The expression in terms of relative entropy gives a more transparent information-theoretic interpretation to these RG monotonicity results. The c-theorem is equivalent to the following QIT statement: the vacuum $\rho^1$ of an RG-running theory can be distinguished (using the relative entropy measure) from the vacuum $\rho^0$ of the UV fixed point, compared on the null Cauchy surface of a sphere of radius $R$, by the amount
$$S(\rho^1|\rho^0) = \frac{c_{UV}-c_{IR}}{3}\,\log(mR)$$
for radius $R$ much larger than the length scale $m^{-1}$ set by the mass scale $m$ characterizing the RG flow; $c_{UV}$ and $c_{IR}$ are the central charges of the UV and IR fixed points. The central charge difference $c_{UV}-c_{IR}$ thus controls the distinguishability, or statistical distance, between the two theories. The c-theorem then amounts to positivity and monotonicity of the relative entropy, and can be explained as due to the increased distinguishability of two states as we increase the algebra of operators that are available to probe them. In higher dimensions, we prove a similar inequality for the difference in the EE area terms of the two theories.
The work is organized as follows. In Sec. 2 we study relative entropy for the vacuum states of two theories, its dependence on the Cauchy surface where the states are compared, and whether this relative entropy is finite or UV divergent. In Sec. 3 we study the consequences of positivity and monotonicity of relative entropy evaluated on a null Cauchy surface. We prove the c-theorem in d = 2 and the area theorem for the entanglement entropy in d > 2. In Sec. 4 we discuss the results. Finally, the Appendix describes explicit computations for free fields.

Relative entropy for states of different theories
The relative entropy between two density matrices $\rho^0$ and $\rho^1$ is defined by
$$S(\rho^1|\rho^0) = \mathrm{tr}\left(\rho^1\log\rho^1 - \rho^1\log\rho^0\right). \qquad (2.1)$$
We are interested in the relative entropy of the vacuum states of two theories, reduced to certain surfaces. The surfaces are usually spatial, but we will also consider the null case. The two theories are denoted by $T_0$ and $T_1$. We take $T_0$ to be a CFT, and $T_1$ is obtained by perturbing $T_0$ with a relevant deformation, starting an RG flow. In terms of the actions $I_{T_0}$ and $I_{T_1}$ of the two theories,
$$I_{T_1} = I_{T_0} + g\int d^dx\,\mathcal{O}(x). \qquad (2.2)$$
The scaling dimension of the operator $\mathcal{O}$ at the fixed point $g = 0$ is denoted by $\Delta$; the perturbation is relevant for $\Delta < d$. This construction ensures that $T_0$ and $T_1$ have the same operator content in the UV. As these states belong to two different theories, they are evolved in time with two different Hamiltonians. Hence, we have to be more specific about the instant of time at which we compare the states: they undergo different unitary evolutions, and as a consequence the relative entropy depends on time.
As shown in [13] for the simpler setup of the g-theorem, the dependence of relative entropy on the Cauchy surface can be exploited to reduce (and eventually eliminate) contributions from the modular Hamiltonian to relative entropy. In this case, the entanglement entropy inherits the monotonicity and positivity properties of relative entropy, and this can be used to understand RG flows. We will apply this idea to flows of the type (2.2). In this section we study the dependence of relative entropy on the Cauchy surface, and analyze in detail the null limit. In Sec. 3 we will consider the consequences for the RG.
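The two properties of relative entropy used throughout, positivity and monotonicity under restriction to a smaller algebra, can be illustrated in the simplest setting. The sketch below assumes two density matrices that are diagonal in the same basis (so that the relative entropy reduces to the classical expression) on a hypothetical two-spin system; the numerical values of `rho1` and `rho0` are illustrative choices, not taken from the text.

```python
import math

def relative_entropy(p, q):
    """S(p|q) = sum_i p_i (log p_i - log q_i) for commuting (diagonal) states."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue            # convention 0 log 0 = 0
        if qi == 0.0:
            return math.inf     # support of p not contained in support of q
        total += pi * (math.log(pi) - math.log(qi))
    return total

def reduce_to_first_spin(joint, n_a=2, n_b=2):
    """Partial trace over the second spin; joint is stored row-major on A x B."""
    return [sum(joint[a * n_b + b] for b in range(n_b)) for a in range(n_a)]

# Illustrative diagonal states on two spins (assumed values).
rho1 = [0.5, 0.2, 0.2, 0.1]
rho0 = [0.25, 0.25, 0.25, 0.25]

s_full = relative_entropy(rho1, rho0)
s_reduced = relative_entropy(reduce_to_first_spin(rho1), reduce_to_first_spin(rho0))

# Positivity: both are >= 0.  Monotonicity: restricting to fewer operators
# cannot increase the distinguishability, so s_reduced <= s_full.
print(s_full, s_reduced)
```

In the field theory setting of this section the role of the partial trace is played by the restriction of the states to the algebra of a smaller region.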

Reduction to a spatial region of two states of different theories
In order to clarify the dependence of relative entropy on time, let us first consider only one QFT and review the standard way the state reduction is achieved in space-time. We can describe the operator content of the theory $T_0$ in any global Cauchy surface $\Sigma_{gl}$ (where gl stands for global) by a set of fields we call generically $\phi_\lambda(x)$, with $x\in\Sigma_{gl}$, that form a complete set of generators for the operators in the Hilbert space. This set of operators may include time derivatives of the fields or, to adapt this description to $\Sigma_{gl}$, derivatives in the direction normal to $\Sigma_{gl}$ instead of time derivatives. For any $\Sigma\subseteq\Sigma_{gl}$ we can form the algebra $A_\Sigma$ generated by polynomials of the operators localized in this spatial region. Given a global state $\rho^0_{gl}$, its restriction to $A_\Sigma$ gives the reduced state $\rho^0_\Sigma$ on $\Sigma$. This is just the state (footnote 2) on $A_\Sigma$ that gives the same expectation values as the global state for all operators in this region. Notice that we can take an arbitrary state, and that we have not used the dynamics or the Hamiltonian of the theory in this construction. Let us consider another spatial surface $\Sigma'$ with the same causal development $D$ as $\Sigma$ (see figure 1). In the Heisenberg representation, states do not depend on time and operators obey the usual Heisenberg equations of motion. Operators localized at points in $\Sigma'$ belong to the causal development of $\Sigma$ and can be written in terms of the ones in $\Sigma$ using the equations of motion. This identification depends on the Hamiltonian of the theory. Taking this into account, we see that the algebra generated by the fields on $\Sigma'$ coincides with the one on $\Sigma$.
Footnote 2: We are using the abstract definition of a state as a positive normalized linear function on the operators of an algebra with values in the complex numbers. See for example [18]. This is a density matrix once a basis has been selected to write the operators. We often interchange between the abstract state and its density matrix representation.
Since the global state does not depend on any choice of Cauchy surface in the Heisenberg representation, and the algebra on the two surfaces is the same, we conclude that the reduced states $\rho^0_\Sigma$ and $\rho^0_{\Sigma'}$ are the same. That is, they give the same expectation values for the same operators on the same algebra, where operators are identified between $\Sigma$ and $\Sigma'$ using the equations of motion. Therefore, the entanglement entropies are the same if they are regularized in the same manner (for instance, by using the mutual information to provide a geometric cutoff). Relative entropies for two different states in this theory will be independent of the choice of Cauchy surface. The subalgebra of operators, and the reduced states, can then be thought of as functions of the causal completion or causal development $D$ of $\Sigma$ (which coincides with the one of $\Sigma'$), rather than functions of Cauchy surfaces. Now, let us modify the Hamiltonian by adding a source term as in (2.2), in such a way that we can still describe a generating basis for the operators in a Cauchy surface by the same set of fields, which we call $\tilde\phi_\lambda(x)$ for this new theory $T_1$. We might need to impose a cutoff to do so (footnote 3). Let us also consider the Heisenberg representation with respect to this new Hamiltonian, and another global state $\rho^1_{gl}$ for this new theory. Again, $\rho^1_\Sigma$ and its entropy will be invariant under a change of Cauchy surface between $\Sigma$ and $\Sigma'$ (the density matrix representing this state can of course change if we change basis). Accordingly we will drop the subindex $\Sigma$ of these states.
If we want to compare the two states of the two theories with relative entropy we need to identify the Hilbert spaces, or equivalently, the algebras of operators of the theories, in a precise way. For this identification we will use a Cauchy surface. Given a Cauchy surface $\Sigma$ we naturally identify the field operators $\phi_\lambda(x)$ with $\tilde\phi_\lambda(x)$ for $x\in\Sigma$. Formally, the identification $\phi_\lambda(x)\leftrightarrow\tilde\phi_\lambda(x)$ is carried out by a unitary operator $U_\Sigma$ that maps Hilbert spaces and operators between theories such that
$$U_\Sigma\,\tilde\phi_\lambda(x)\,U_\Sigma^\dagger = \phi_\lambda(x), \qquad x\in\Sigma. \qquad (2.3)$$
The expectation values of the operators $\phi_\lambda(x)$ on $\Sigma$ computed with the two states $\rho^0$ and $U_\Sigma\rho^1U_\Sigma^\dagger$ define two different reduced states on the same algebra. The state $U_\Sigma\rho^1U_\Sigma^\dagger$ gives just the same expectation values on the fields of the first theory as $\rho^1$ on the fields of the second theory,
$$\mathrm{tr}\left(U_\Sigma\rho^1U_\Sigma^\dagger\,\phi_{\lambda_1}(x_1)\cdots\phi_{\lambda_n}(x_n)\right) = \mathrm{tr}\left(\rho^1\,\tilde\phi_{\lambda_1}(x_1)\cdots\tilde\phi_{\lambda_n}(x_n)\right), \qquad x_i\in\Sigma.$$
We can then compute the relative entropy $S(U_\Sigma\rho^1U_\Sigma^\dagger|\rho^0)$. Analogously we can compute $S(\rho^1|U_\Sigma^\dagger\rho^0U_\Sigma)$, with the same result. This follows from the invariance of relative entropy under the simultaneous change of the states by the same unitary.
To be clear, both states, $\rho^0$ of $T_0$ and $\rho^1$ of $T_1$, define expectation values for operators in $D$ in each theory. To compute the relative entropy between these states we map the algebras by identifying their local basis elements, $\phi_\lambda(x)\leftrightarrow\tilde\phi_\lambda(x)$, that is, with (2.3). We can write this relative entropy simply as
$$S_\Sigma(\rho^1|\rho^0) \equiv S(U_\Sigma\rho^1U_\Sigma^\dagger|\rho^0).$$
This construction does not differ from the usual way relative entropy is computed in lattice systems. For instance, we can imagine a lattice on the surface where spin degrees of freedom sit at the vertices. We have two states, coming for example from the ground states of two different Hamiltonians. Then we can compute the relative entropy between these two states by assuming the spin operators are identified.
We do this at each Cauchy surface under consideration. If we pick another Cauchy surface $\Sigma'$ in the same causal domain $D$ of $\Sigma$, the relative entropy we have defined will depend on the Cauchy surface; $S_{\Sigma'}(\rho^1|\rho^0)$ will differ from $S_\Sigma(\rho^1|\rho^0)$. The reason for this change is that the identification of local basis elements $\phi_\lambda(x')\leftrightarrow\tilde\phi_\lambda(x')$, $x'\in\Sigma'$, will be different from the identification in $\Sigma$, or, in the above language, $U_{\Sigma'}$ is different from $U_\Sigma$. This is because the local fields $\phi_\lambda(x)$ of $\Sigma$ can be expressed, by the equation of motion of $T_0$, as a certain non-local function $F^{\Sigma'}_0$ of the fields on $\Sigma'$, while in $T_1$ a different function $F^{\Sigma'}_1$ is needed to express the fields $\tilde\phi_\lambda(x)$, because the theories have different equations of motion. Since $F^{\Sigma'}_0$ and $F^{\Sigma'}_1$ are different functions, the identification of local fields on $\Sigma$ is not compatible with the identification of local fields on $\Sigma'$. As a result, identifying local operators in different surfaces leads to different relations between Hilbert spaces.
Footnote 3: … relative entropy in the continuum limit will be translated into the question about the finiteness of this quantity as we remove the cutoff.
In a general interacting theory it is difficult to obtain $F^{\Sigma'}$ explicitly. Fortunately we will not need it. As an example where the evolution between surfaces can be made explicit, consider as $T_0$ a free scalar field of mass $m_0$. In this case, eq. (2.6) gives $\phi(x)$ as an integral over $\Sigma'$ that is linear in the field $\phi(x')$ and its normal derivative $\eta^\mu\partial_\mu\phi(x')$, with coefficients given by the commutator function $C_0(x-x')$ of the scalar field of mass $m_0$ and its normal derivative. Here $x\in D$ and $x'\in\Sigma'$, $h$ is the induced metric on $\Sigma'$ entering the integration measure, and $\eta^\mu$ is the unit vector normal to $\Sigma'$. The normal derivative is the momentum operator adapted to $\Sigma'$, and has to be considered an independent operator on this surface. We can consider as the theory $T_1$ a scalar field with a different mass $m_1$. This has a different commutator function $C_1$ in place of $C_0$ in (2.6), giving $\tilde\phi(x)$ as a different combination of fields in $\Sigma'$.

Conformal interaction picture
The previous construction based on the Heisenberg representation makes manifest the dependence of the relative entropy on the choice of Cauchy surface. However, it is not the most convenient approach for concrete calculations. For this reason, we now present an equivalent discussion in terms of a "conformal interaction picture", which is a generalization of the standard interaction picture representation of QFT.
In the interaction picture of weakly coupled QFT, the Hamiltonian is split into a free part $H_0$ and an interacting part $H_{int}$. Operators in the interaction basis are chosen in the Heisenberg representation of the free Hamiltonian $H_0$, and states then evolve unitarily according to the evolution operator for $H_{int}$,
$$U(t,t_0) = T\exp\left(-i\int_{t_0}^{t}dt'\,H_{int}(t')\right). \qquad (2.8)$$
Here $T$ denotes time-ordering, and $H_{int}$ is written in the interaction picture. This leads to the standard perturbative expansion around the free theory. In our case, instead of a free theory we have a CFT, and the interaction is given by the perturbation in (2.2). We then define a conformal interaction picture where operators are in the Heisenberg representation of the CFT Hamiltonian, while the state evolution is given by $H_{int}$. In more detail, let us denote the Heisenberg vacuum of the CFT $T_0$ by $|0\rangle$ and its Heisenberg operators by $\phi_\lambda(x)$ as in the previous section. For the perturbed theory $T_1$, we denote the corresponding objects by $|\Omega\rangle$ and $\tilde\phi_\lambda(x)$. Time-ordered correlators of $T_1$ become, in the interaction picture,
$$\langle\Omega|T\,\tilde\phi_{\lambda_1}(x_1)\cdots\tilde\phi_{\lambda_n}(x_n)|\Omega\rangle = \frac{\langle 0|T\,\phi_{\lambda_1}(x_1)\cdots\phi_{\lambda_n}(x_n)\,e^{-i\int dt\,H_{int}(t)}|0\rangle}{\langle 0|T\,e^{-i\int dt\,H_{int}(t)}|0\rangle}. \qquad (2.9)$$
The factor in the denominator arises from the evolution that maps $|0\rangle\langle 0|$ into $|\Omega\rangle\langle\Omega|$. In this way, an expectation value in $T_1$ is reduced to the calculation of a correlation function in the CFT $T_0$. In particular, for small $g$ the right hand side of (2.9) can be evaluated using the standard rules of conformal perturbation theory.
We can now redo the steps in section 2.1 in the interaction picture. The operators for T 0 and T 1 are now the same, φ λ (x), corresponding to the Heisenberg CFT operators. Therefore, and recalling the map (2.9), we can now think in terms of two different states ρ 0 and ρ 1 in the same theory. For concreteness, consider reduced states on a spatial region associated to the vacuum states (it is easy to extend the following discussion to more general states). As before, we choose a global Cauchy surface Σ gl , and let Σ be a part of it. The Heisenberg vacuum of T 0 gives a state ρ 0 that is independent of Σ. However, the state ρ 1 for T 1 evolves explicitly with time.
For a surface of constant time, the evolution is given by (2.8). For instance, the state at $t = 0$ is obtained by acting with this evolution operator on the CFT vacuum, with $K$ a normalization factor that sets $\mathrm{tr}\,\rho^1 = 1$. For a more general surface, we can evolve the state using a source $g(x;\Sigma_{gl})$ that is nonzero and equals $g$ only for $x$ in the region of spacetime below the surface $\Sigma_{gl}$. From here we obtain, for two surfaces $\Sigma$ and $\Sigma'$, with $\Sigma'$ to the future of $\Sigma$, that $\rho^1_{\Sigma'}$ follows from $\rho^1_\Sigma$ by the evolution generated by the perturbation integrated over $V_{\Sigma\Sigma'}$, the spacetime region between $\Sigma$ and $\Sigma'$. This exhibits how the state $\rho^1_\Sigma$ depends explicitly on $\Sigma$ in the interaction picture; expectation values calculated with this state (and hence the relative entropy) will also depend on the Cauchy surface.

Modular Hamiltonian
It is convenient to express the relative entropy by the equivalent expression
$$S(\rho^1|\rho^0) = \Delta\langle H\rangle - \Delta S,$$
where
$$\Delta S = S(\rho^1) - S(\rho^0) \qquad (2.16)$$
is the difference of von Neumann entropies $S(\rho) = -\mathrm{tr}(\rho\log\rho)$, and
$$\Delta\langle H\rangle = \mathrm{tr}(\rho^1 H) - \mathrm{tr}(\rho^0 H) \qquad (2.17)$$
is the difference of the expectation values of the modular Hamiltonian
$$H = -\log\rho^0.$$
In (2.16) and (2.17) the states appear in the same order as they enter in the arguments of $S(\rho^1|\rho^0)$. In the present case, $\Delta S$ gives the difference between the entanglement entropies of the two vacuum states in the same region. This term does not depend on the choice of Cauchy surface. The dependence on $\Sigma$ comes exclusively from the expectation value of the modular Hamiltonian,
$$\Delta\langle H\rangle_\Sigma = \mathrm{tr}(\rho^1_\Sigma H) - \mathrm{tr}(\rho^0 H).$$
$H$ is an operator in the theory $T_0$. Its expectation value in the state $\rho^0$ is independent of the Cauchy surface. However, its expectation value in the second state $\rho^1_\Sigma$ depends on the surface on which we have identified operators.
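The decomposition of the relative entropy into a modular Hamiltonian term and an entropy term can be checked explicitly for a single spin. The sketch below assumes qubit states written as $\rho = (1 + \vec r\cdot\vec\sigma)/2$ with Bloch vector $\vec r$, for which $\log\rho$ is known in closed form; the two Bloch vectors are illustrative values, and the function names are hypothetical.

```python
import math

def bloch_norm(r):
    return math.sqrt(sum(c * c for c in r))

def entropy(r):
    """von Neumann entropy S = -tr(rho log rho) of rho = (1 + r.sigma)/2."""
    n = bloch_norm(r)
    eigenvalues = ((1.0 + n) / 2.0, (1.0 - n) / 2.0)
    return -sum(l * math.log(l) for l in eigenvalues if l > 0.0)

def modular_expectation(r_state, r_ref):
    """tr(rho_state H) with H = -log(rho_ref); requires 0 < |r_ref| < 1.

    Uses log(rho_ref) = a + b * (unit vector of r_ref).sigma, with
    a = (1/2) log(l_plus * l_minus) and b = (1/2) log(l_plus / l_minus).
    """
    n = bloch_norm(r_ref)
    a = 0.5 * math.log((1.0 - n * n) / 4.0)
    b = 0.5 * math.log((1.0 + n) / (1.0 - n))
    overlap = sum(x * y for x, y in zip(r_state, r_ref)) / n
    return -(a + b * overlap)

# Illustrative states (assumed values): rho1 is compared against the reference rho0.
r1 = (0.3, 0.0, 0.4)
r0 = (0.0, 0.0, 0.6)

delta_S = entropy(r1) - entropy(r0)
delta_H = modular_expectation(r1, r0) - modular_expectation(r0, r0)

# Direct definition: tr(rho1 log rho1) - tr(rho1 log rho0).
s_rel = -entropy(r1) + modular_expectation(r1, r0)

print(s_rel, delta_H - delta_S)
```

The check that `modular_expectation(r0, r0)` reproduces `entropy(r0)` is what makes the two expressions agree: the expectation value of $-\log\rho^0$ in $\rho^0$ is its own entropy.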
In order to proceed we will choose $T_0$ to be a CFT and $\rho^0$ its vacuum state, and restrict attention to the case where the boundary of $\Sigma$ is a $(d-2)$-dimensional sphere. The modular Hamiltonian for this case has a simple expression in terms of the energy momentum tensor $T_{\mu\nu}$ of the theory $T_0$ [19,20],
$$H = \int_\Sigma d\sigma\,\eta^\mu\xi^\nu\,T_{\mu\nu}. \qquad (2.20)$$
Here $\eta^\mu$ is a norm one, future pointing, normal vector to the Cauchy surface $\Sigma$, and $\xi^\nu$ is the conformal Killing vector corresponding to conformal transformations keeping the sphere fixed. For a sphere centered at the origin in the plane $x^0 = 0$,
$$\xi^\mu\partial_\mu = \frac{\pi}{R}\left[\left(R^2 - (x^0)^2 - \vec x^{\,2}\right)\partial_0 - 2x^0x^i\partial_i\right], \qquad (2.21)$$
where $R$ is the radius of the sphere. One can check that the current $j^\mu = \xi_\nu T^{\mu\nu}$ is conserved using that $T^{\mu\nu}$ is symmetric, conserved, and has zero trace. This makes $H$ a conserved charge independent of the Cauchy surface in $T_0$, but this is not the case when we evaluate its expectation value using $\rho^1_\Sigma$. In order to evaluate $\Delta\langle H\rangle_\Sigma$ we need to understand the change in expectation value of the stress tensor, $\langle\Delta T_{\mu\nu}(x)\rangle_\Sigma$. This is a local operator and its expectation value in the new state $\rho^1_\Sigma$ depends on the structure of the state (the correlation functions) near the point $x$ on this surface. Then, we expect a local expression that can involve only local tensors. These are $g_{\mu\nu}$ and all local geometrical quantities that can be constructed with the Cauchy surface, such as the normal $\eta^\mu$, the extrinsic and intrinsic curvatures, etc. Given the Lorentz invariance of each vacuum in its respective theory, no other tensors can appear.
However, curvature terms can only appear as corrections accompanied by positive powers of the cutoff, for example in the form $K_{ij}^2\epsilon^2$, with $K_{ij}$ the extrinsic curvature of $\Sigma$ and $\epsilon$ a short distance cutoff. This is because we are evaluating the expectation value of a local operator for a QFT in flat space, and the shape of $\Sigma$ only enters in the correlation functions through the distance between points. For example, in a lattice regularization $T_{\mu\nu}$ can be written in terms of operators at a point and a few of its neighbors, and the expectation value in the state $\rho^1_\Sigma$ depends only on short distance correlation functions on the lattice. We show some explicit examples for free fields in the Appendix. The curvature then only enters by modifying the distance between nearby points, and is always accompanied by the cutoff. These terms can be neglected if the curvature is much smaller than the cutoff scale. We will always assume that this is the case. This is also necessary since we can define the position of the Cauchy surface only at scales larger than the cutoff.
Therefore we have the general form
$$\langle\Delta T_{\mu\nu}(x)\rangle_\Sigma = k\left(\eta_\mu\eta_\nu - \frac{\eta^2}{d}\,g_{\mu\nu}\right), \qquad (2.22)$$
where $k$ is a constant and $\eta^2 = \eta^\rho\eta_\rho = \pm 1$ according to the signature convention. We have used the fact that the stress tensor of the CFT is traceless. This expectation value depends on the Cauchy surface through the normal vector $\eta^\mu$, and this is crucial in order to have a traceless symmetric tensor in an otherwise Lorentz invariant computation. Note that $\langle\Delta T_{\mu\nu}(x)\rangle_\Sigma$ does not transform as a Lorentz tensor unless $\Sigma$ is also transformed. Eq. (2.22) will be quite important for our arguments below. For this reason, in the Appendix we perform explicit calculations of $\Delta T_{\mu\nu}$ for mass flows in free scalar field theories, and exhibit the dependence on the Cauchy surface. Let us find out the possible behavior of the constant $k$ with the cutoff. If $k$ is divergent with the cutoff, we expect that a perturbative calculation gives its leading behavior. The reason is that the coupling $g$ in (2.2), responsible for deforming $T_0$ into $T_1$, is relevant and hence goes to zero in the UV. Perturbative corrections start at second order in $g$, since $\langle T_{\mu\nu}\mathcal{O}\rangle = 0$ for a primary operator in a CFT. Since $g$ has mass dimension $d-\Delta$ and $T_{\mu\nu}$ has dimension $d$, dimensional analysis then gives $k\sim g^2\epsilon^{d-2\Delta}$ for the divergent part, which diverges as $\epsilon\to 0$ when $\Delta > d/2$.
Inserting (2.22) into (2.20), $\Delta\langle H\rangle_\Sigma$ is proportional to $k$ times the flux of the field $\xi^\nu$ through $\Sigma$, eq. (2.24) (see figure 2). This is the simple geometrical dependence on the Cauchy surface. The result changes with $\Sigma$ because $\xi^\nu$ is not divergence-free, so the flux in (2.24) is not constant: as a consequence of Gauss' theorem, the difference between the fluxes through two surfaces $\Sigma$ and $\Sigma'$ is the integral of $\partial_\nu\xi^\nu$ over $V_{\Sigma\Sigma'}$, the space-time region between the two surfaces. The infrared behavior of the expectation value of the modular Hamiltonian follows from this integral. For the planar Cauchy surface at $x^0 = 0$ we get $\Delta\langle H\rangle\propto k\,\Omega\,R^d$, where $R$ is the radius of the spherical entangling surface and $\Omega = 2\pi^{(d-1)/2}/\Gamma((d-1)/2)$ is the area of the unit sphere immersed in $\mathbb{R}^{d-1}$ (the $S^{d-2}$ sphere). The same superextensive behavior $\sim R^d$ holds for other spatial surfaces that do not come too close to the null horizon of the causal development of the sphere.
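The conservation of $j^\mu = \xi_\nu T^{\mu\nu}$ relies on $\xi$ being a conformal Killing vector: for symmetric, conserved $T^{\mu\nu}$ one has $\partial_\mu j^\mu = \frac{1}{d}(\partial\cdot\xi)\,T^\mu{}_\mu$, which vanishes for a traceless tensor. The following sketch checks the conformal Killing equation numerically. It assumes the standard form $\xi^\mu\partial_\mu = \frac{\pi}{R}[(R^2-(x^0)^2-\vec x^{\,2})\partial_0 - 2x^0x^i\partial_i]$, a mostly-plus metric, $d = 3$, and an illustrative radius; all function names are hypothetical.

```python
import math

R = 2.0                  # sphere radius (illustrative value)
D = 3                    # spacetime dimension used for the check
ETA = [-1.0, 1.0, 1.0]   # diagonal Minkowski metric, mostly-plus (assumed convention)

def xi_upper(x):
    """Components xi^mu at the point x = (t, x1, x2) for the assumed vector field."""
    t, space = x[0], x[1:]
    r2 = sum(c * c for c in space)
    return [math.pi / R * (R * R - t * t - r2)] + [-2.0 * math.pi / R * t * c for c in space]

def xi_lower(x):
    return [ETA[m] * v for m, v in enumerate(xi_upper(x))]

def d_xi_lower(x, mu, nu, h=1e-4):
    """Central finite difference for the derivative of xi_nu along direction mu."""
    xp, xm = list(x), list(x)
    xp[mu] += h
    xm[mu] -= h
    return (xi_lower(xp)[nu] - xi_lower(xm)[nu]) / (2.0 * h)

def conformal_killing_residual(x):
    """Largest component of d_mu xi_nu + d_nu xi_mu - (2/D) eta_{mu nu} (d.xi)."""
    grad = [[d_xi_lower(x, mu, nu) for nu in range(D)] for mu in range(D)]
    divergence = sum(ETA[mu] * grad[mu][mu] for mu in range(D))
    worst = 0.0
    for mu in range(D):
        for nu in range(D):
            metric = ETA[mu] if mu == nu else 0.0
            residual = grad[mu][nu] + grad[nu][mu] - 2.0 / D * divergence * metric
            worst = max(worst, abs(residual))
    return worst

print(conformal_killing_residual([0.3, 0.5, -0.7]))  # small, limited by rounding
print(xi_upper([0.0, R, 0.0]))                        # vanishes on the entangling sphere
```

Because the assumed $\xi$ is quadratic in the coordinates, the central differences are exact up to rounding, so the residual is a direct test of the conformal Killing equation rather than of the discretization.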
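The $R^d$ growth on the planar surface can be made concrete with a short numerical check. The sketch below assumes that on $x^0 = 0$ the relevant component of the conformal Killing vector is $\xi^0 = \pi(R^2 - r^2)/R$ (the standard form for the sphere), and integrates it over the ball of radius $R$; the overall factor of $k$ and any $d$-dependent coefficient from the stress tensor are left out, so only the scaling with $R$ is tested. Function names are hypothetical.

```python
import math

def unit_sphere_area(d):
    """Area of the unit sphere S^{d-2} embedded in R^{d-1}."""
    return 2.0 * math.pi ** ((d - 1) / 2.0) / math.gamma((d - 1) / 2.0)

def xi_flux_planar(d, R, steps=20000):
    """Midpoint-rule integral of xi^0 = pi (R^2 - r^2) / R over the ball r < R."""
    dr = R / steps
    radial = 0.0
    for i in range(steps):
        r = (i + 0.5) * dr
        radial += r ** (d - 2) * (R * R - r * r)
    return math.pi / R * unit_sphere_area(d) * radial * dr

def xi_flux_closed_form(d, R):
    """Exact value of the same integral: 2 pi Omega R^d / ((d - 1)(d + 1))."""
    return 2.0 * math.pi * unit_sphere_area(d) * R ** d / ((d - 1) * (d + 1))

for d in (2, 3, 4):
    exponent = math.log(xi_flux_planar(d, 2.0) / xi_flux_planar(d, 1.0), 2)
    print(d, xi_flux_planar(d, 1.0), xi_flux_closed_form(d, 1.0), exponent)
```

Doubling $R$ multiplies the flux by $2^d$, one power of $R$ more than the volume $R^{d-1}$ of the region, which is the superextensive behavior referred to above.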

The null limit
Having understood the general dependence on the Cauchy surface, we are now ready to approach the null limit. From the expression (2.21) for $\xi^\nu$ and the definition of $\eta^\mu$, we find that the contraction of the two vectors entering (2.24) vanishes in the limit of the null Cauchy surface for the sphere. In fact, both vectors become null vectors on the null Cauchy surface. With this, and (2.24), we obtain the interesting result
$$\Delta\langle H\rangle_{\Sigma_{null}} = 0.$$
This limit, however, is not necessarily justified, because the coefficient in (2.24) can be divergent. As we mentioned above, we need to assume that the typical scale of curvature of $\Sigma$ is large with respect to the cutoff $\epsilon$. As we go to the null surface, the extrinsic and intrinsic curvature of a spatial surface will typically diverge. For example, a hyperboloid $(x^0)^2 - \vec x^{\,2} = a^2$ has a curvature scale of order $a^{-1}$, and the null limit is $a\to 0$. Put differently, we need the cutoff scale to be always much smaller than the total length across the surface $\Sigma$ in order, for example, to associate the cutoff to a physical lattice on the surface. Hence, we need to keep $a\gg\epsilon$ as we take the null limit $a\to 0$. We can take the ratio $a/\epsilon$ to be some arbitrarily large number, but keep it fixed as we take the simultaneous limit $\epsilon\sim a\to 0$. This automatically keeps the curvature terms in (2.22) under control. Given this, we should understand next when $\Delta\langle H\rangle_\Sigma$ vanishes.
Let us examine the expression (2.24) in the null limit. For simplicity we consider as Cauchy surfaces a family of hyperboloids $\Sigma_a$ parametrized by the radius $a$. The outcome of this analysis is that the null limit enlarges the window where the modular Hamiltonian gives a finite contribution from $\Delta < d/2$ to
$$\Delta < \frac{d+2}{2}. \qquad (2.34)$$
In this window, in fact, this contribution vanishes in the null limit. We do not have control of the null limit for $\Delta\geq(d+2)/2$.
In some special theories having a UV fixed point with free scalars, the modular Hamiltonian has an additional boundary term [21,22,23,16]. This term scales like the area R d−2 and does not depend on the Cauchy surface. Then it does not vanish in the null limit. However, this does not alter the conclusions about the relative entropy we want to make in this section. We discuss boundary terms in the modular Hamiltonian in more detail in the Appendix.

Entanglement entropy and regimes of relative entropy
Let us now briefly analyze the contribution of the entanglement entropy to the relative entropy. As we mentioned before, this does not depend on the Cauchy surface. The contribution of the entanglement entropy, in contrast to the one of the modular Hamiltonian, will generically be a complicated function of $R$ that depends on the full RG running of the model. We will say more about the entanglement entropy in the next section; however, the main features are well known. At the fixed points its leading term is proportional to the area, except for d = 2 where it can grow logarithmically with $R$. We can ask when the EE will give a finite or divergent contribution. Again we expect that in the divergent case we can do a perturbative treatment. The divergent terms are going to be proportional to the boundary area, since divergences are related to local entanglement that is extensive on the boundary of the region. Then we expect on dimensional grounds
$$\Delta S\sim g^2R^{d-2}\,\epsilon^{\,d+2-2\Delta}. \qquad (2.35)$$
The allowed window for having finite $\Delta S$ is $\Delta < (d+2)/2$. This is well known from holographic calculations [24,25,26] and direct computations of the renormalization of the area terms [27,28,29,16,17]. This coincides with the window (2.34) for having vanishing $\Delta\langle H\rangle$ in the null limit. We do not know of a deeper reason for this agreement. With this information and that of the modular Hamiltonian, we can summarize the different regimes for relative entropy between the two theories.
First, for spatial surfaces (flat, or with curvature $\sim R^{-1}$) the relative entropy is dominated by the contribution of the modular Hamiltonian for large distances. In the infrared it grows superextensively as $R^d$. It is UV finite only for the window of perturbations with dimensions $\Delta < d/2$. For this range of $\Delta$ and at short distances, the entanglement entropy is finite; conformal perturbation theory then gives $\Delta S\sim g^2R^{2(d-\Delta)}$, which goes to zero faster than $R^d$ for small $R$. The modular Hamiltonian thus dominates over the entanglement entropy at all scales for $\Delta < d/2$. Since the entanglement entropy is independent of $\Sigma$, the relative entropy changes with Cauchy surface in the same simple geometric form as the modular Hamiltonian,
$$S_{\Sigma'}(\rho^1|\rho^0) - S_\Sigma(\rho^1|\rho^0) = \Delta\langle H\rangle_{\Sigma'} - \Delta\langle H\rangle_\Sigma.$$
On the other hand, the limit of relative entropy on null surfaces is finite for dimensions $\Delta < (d+2)/2$, extending the range $\Delta < d/2$ of spatial surfaces. In this window the contribution of the modular Hamiltonian vanishes and the relative entropy is entirely due to the entanglement entropies, $S(\rho^1|\rho^0) = -\Delta S$. It grows as the area $\sim R^{d-2}$ in the infrared. The null relative entropy is finite for the same window in which it can be defined as a limit from the relative entropy of spatial surfaces, $\Delta < (d+2)/2$.
The result $S(\rho^1|\rho^0)_{\Sigma_{null}} = -\Delta S$ (or $\Delta\langle H\rangle_{\Sigma_{null}} = 0$) gives the null surface a special status. The relative entropy computed on it does not distinguish the vacuum states $\rho^1$, $\rho^0$ as much as when computed on other (spatial) surfaces of the same causal domain. The reason for this is that, as we take the null limit, correlations in the direction that is becoming null turn into short distance correlations, and are then less efficient in distinguishing the state from its UV limit.
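The regimes summarized above depend only on $d$ and $\Delta$, so they can be tabulated with a few lines of code. The sketch below is a bookkeeping aid, not part of the derivation: it assumes the perturbative power counting in which the divergent part of the modular Hamiltonian term scales as $\epsilon^{d-2\Delta}$ on spatial surfaces and the entropy term as $\epsilon^{d+2-2\Delta}$, and the function and key names are hypothetical.

```python
def relative_entropy_regimes(d, delta):
    """Classify a relevant perturbation of dimension delta in d spacetime dimensions."""
    if not 0 < delta < d:
        raise ValueError("a relevant perturbation requires 0 < delta < d")
    return {
        # Spatial Cauchy surfaces: UV finite only for delta < d/2.
        "finite_on_spatial_surface": delta < d / 2,
        # Null Cauchy surface: finite, and equal to -Delta S, for delta < (d+2)/2.
        "finite_on_null_surface": delta < (d + 2) / 2,
        # Assumed powers of the cutoff; a negative power signals a UV divergence.
        "cutoff_power_spatial": d - 2 * delta,
        "cutoff_power_null": d + 2 - 2 * delta,
    }

# d = 2: every relevant operator (0 < delta < 2) is inside the null window.
print(relative_entropy_regimes(2, 1.5))
# d = 4: delta = 3.5 is relevant but outside the null window delta < 3.
print(relative_entropy_regimes(4, 3.5))
```

For d = 2 the null window $\Delta < 2$ coincides with the full range of relevant operators, while for d > 2 there is a band $(d+2)/2 \le \Delta < d$ of relevant perturbations that the null-surface argument does not cover.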

Consequences for the entanglement entropy
The previous result $\Delta\langle H\rangle_{\Sigma_{null}} = 0$ in the window
$$\Delta < \frac{d+2}{2} \qquad (3.1)$$
implies
$$S(\rho^1|\rho^0) = -\Delta S \qquad (3.2)$$
on a null Cauchy surface. This reveals that $-\Delta S$ has the positivity and monotonicity properties of the relative entropy,
$$\Delta S\leq 0, \qquad \frac{d\,\Delta S}{dR}\leq 0.$$
In this section we explore the consequences of this result in two and higher dimensions. For d = 2 we find a simple alternative proof of the c-theorem, while for d > 2 this will lead to the monotonicity of the area term in the entanglement entropy.

A simple proof of the c-theorem
Let us consider the implications of (3.2) for RG flows in d = 2 spacetime dimensions. In this case, the window (3.1) becomes $0 < \Delta < 2$, capturing all possible deformations by relevant operators. We take the theory $T_0$ as a UV 2d CFT with central charge $c_{UV}$. We recall that, in this case, the entanglement entropy for an interval of size $R$ is of the form
$$S_0(R) = \frac{c_{UV}}{3}\log\frac{R}{\epsilon} + c_0, \qquad (3.4)$$
where $\epsilon$ is a short-distance cutoff and $c_0$ is a nonuniversal constant. In contrast, the entropy for $T_1$ will have a more complicated radial dependence because it undergoes a nontrivial RG flow. However, at distances much longer than the length scale $m^{-1}\sim g^{-1/(d-\Delta)}$ of the RG flow, $T_1$ goes to the IR fixed point of central charge $c_{IR}$. Taking into account that the UV divergences are still controlled by the UV fixed point of central charge $c_{UV}$, the EE for $T_1$ at large distances is given by
$$S_1(R) = \frac{c_{IR}}{3}\log(mR) + \frac{c_{UV}}{3}\log\frac{1}{m\epsilon} + c_1, \qquad (3.5)$$
with $c_1$ another constant. Subtracting (3.4) from (3.5), we obtain that the difference in EE between both theories at long distances is given by
$$\Delta S = -\frac{c_{UV}-c_{IR}}{3}\log(mR) + \mathrm{const}.$$
By (3.2), $-\Delta S$ must be positive and increasing with $R$, which at large $R$ requires $c_{UV}\geq c_{IR}$. This provides a new derivation of Zamolodchikov's c-theorem [4] using the relative entropy on null surfaces.
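The large-distance form of the null relative entropy in d = 2 is simple enough to evaluate directly. The sketch below assumes the leading behavior $S(\rho^1|\rho^0)\simeq\frac{c_{UV}-c_{IR}}{3}\log(mR)$ for $mR\gg 1$, dropping the $R$-independent constant, and uses as an illustrative flow a massive free Dirac fermion ($c_{UV} = 1$, $c_{IR} = 0$); the function name and the choice $m = 1$ are hypothetical.

```python
import math

def null_relative_entropy_2d(c_uv, c_ir, m, R):
    """Leading large-R null relative entropy, (c_uv - c_ir)/3 * log(m R)."""
    if m * R <= 1.0:
        raise ValueError("the asymptotic form assumes m * R >> 1")
    return (c_uv - c_ir) / 3.0 * math.log(m * R)

# Illustrative flow (assumed): massive free Dirac fermion, c_UV = 1 -> c_IR = 0.
c_uv, c_ir, m = 1.0, 0.0, 1.0
radii = [10.0, 100.0, 1000.0]
values = [null_relative_entropy_2d(c_uv, c_ir, m, R) for R in radii]

# Positivity and monotonicity in R both hold because c_uv - c_ir >= 0; with the
# opposite sign the same expression would become negative at large R.
print(values)
```

Each decade in $R$ adds $\frac{c_{UV}-c_{IR}}{3}\log 10$ to the relative entropy, so the central charge difference is read off as the slope in $\log R$.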

Monotonicity of the area term in entanglement entropy
Having understood the result for d = 2, let us now consider QFTs in d > 2. Note that for d > 2 the restriction (3.1) puts an upper bound ∆ < (d + 2)/2 on the dimensions of RG perturbations. When (d + 2)/2 < ∆ < d, the perturbation is still relevant but the change in the modular Hamiltonian no longer vanishes; it is then not clear whether −∆S, which is also divergent in this range, inherits the monotonicity and positivity properties of the relative entropy. It would be interesting to study in more detail the regime (d + 2)/2 < ∆ < d, looking for possible cancellations of divergences, but in this work we restrict for simplicity to ∆ < (d + 2)/2. The EE for a QFT on a sphere of radius R, much bigger than all the length scales of the theory, is extensive on the boundary of the sphere, and hence where µ is a constant of mass dimension d − 2, and '. . .' are terms subleading in R. We want to understand properties of this area term along RG flows.
For a CFT such as theory $T_0$ above, dimensional analysis dictates that
$$ \mu_0 = \frac{k_0}{\epsilon^{d-2}} , $$
where $k_0$ is a nonuniversal constant. On the other hand, theories with RG flows have additional mass scales that can also enter here. For $T_1$ this is determined by $g$, the coefficient of the relevant perturbation. If conformal perturbation theory applies, the first correction is of order $g^2$, and hence we expect, up to a numerical coefficient in the second term,
$$ \mu_1 \sim \frac{k_0}{\epsilon^{d-2}} + \frac{g^2}{\epsilon^{2\Delta-d-2}} + \ldots $$
See also (2.35). The second term is divergent for $\Delta > (d+2)/2$, which is outside the range of dimensions (3.1) under consideration. Instead, for $\Delta < (d+2)/2$, the contribution to the area term sourced by the RG will be finite,
$$ \mu_1 = \frac{k_0}{\epsilon^{d-2}} + k_1 m^{d-2} . $$
The dimensionless coefficient $k_1$ is in general nonperturbative. Comparing $T_0$ and $T_1$ through the relative entropy on a null surface implies $\Delta S < 0$; this says that the coefficient of the area term decreases along RG flows,
$$ \Delta\mu < 0 , \quad \text{or} \quad \mu_{UV} > \mu_{IR} . \qquad (3.12) $$
We call this the area theorem. Note that the nonuniversal divergent term proportional to $1/\epsilon^{d-2}$ is the same in both theories, and hence it cancels out from this inequality. Therefore the finite renormalization of the area term in $T_1$ has to be negative, $k_1 m^{d-2} < 0$. We also note that the monotonicity condition $d\Delta S/dR \le 0$ does not give rise to new inequalities in this analysis of the IR behavior. For d = 2 and d = 3 eq. (3.12) also follows from strong subadditivity [1,2].
This result has some interesting implications for gravity. The idea that part of the black hole entropy is due to entanglement entropy suggests that the universal area term in the EE should agree with the renormalization of Newton's constant. This was made more precise in [28,29,16,17], which related the Adler-Zee formula [14,15], eq. (3.13), written in terms of the two-point function of the trace of the stress tensor $\Theta(x) = T^\mu{}_\mu(x)$, to the finite part of the area term in the EE. These derivations use the first law of EE [30] or holography [17].
In our approach, the universal part of the area term (given by $\Delta\mu = \mu_{IR} - \mu_{UV}$) is proven to be negative due to its relation to relative entropy. This does not use positivity of the stress-tensor two-point function, as in (3.13), and does not need to go through the first law of EE or holography. The situation is analogous to what happened in d = 2, where positivity of the stress-tensor two-point function leads to the c-theorem [31], while our proof relied on positivity of the relative entropy. In fact, the derivation based on the relative entropy emphasizes the common origin of the c-theorem and the area theorem, something that was also seen in the holographic context in [17]. Furthermore, our approach identifies $\Delta\mu$ with a well-defined continuum quantity, and suggests further connections between quantum corrections to gravity and relative entropy.
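The power counting behind the area term of this section can be spelled out in one line. The following is a sketch that assumes only that the coupling $g$ has mass dimension $d - \Delta$, that the leading correction is of order $g^2$, and that $\epsilon$ is the short-distance cutoff; the exponent $a$ is an auxiliary symbol introduced here, and numerical coefficients are not tracked.

```latex
% Dimensional analysis for the O(g^2) correction to the area coefficient \mu
[g] = d - \Delta , \qquad [\mu] = d - 2 , \qquad
\delta\mu \sim g^{2}\,\epsilon^{a}
\;\Longrightarrow\; 2(d-\Delta) - a = d - 2
\;\Longrightarrow\; a = d + 2 - 2\Delta .
% Hence \delta\mu \sim g^2 / \epsilon^{2\Delta - d - 2}:
%   \Delta > (d+2)/2 : a < 0, the correction diverges as \epsilon \to 0;
%   \Delta < (d+2)/2 : a > 0, no cutoff dependence survives and the only
%                      available scale is m \sim g^{1/(d-\Delta)},
%                      so \delta\mu = k_1 m^{d-2} .
```

For instance, in d = 3 with $\Delta = 2$ one finds $a = 1$, so the cutoff-dependent piece vanishes as $\epsilon \to 0$ and the RG contribution to the area term is of the form $k_1 m$.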

Conclusions
In this work we have shown that the c-theorem in d = 2 and the decrease in the area term of the entanglement entropy between short and large distances are required by positivity and monotonicity of relative entropy. These results coincide with analogous results that use either reflection positivity of stress tensor correlators or strong subadditivity of entanglement entropy. However, as a bonus, the present proof relying on relative entropy gives a more direct QIT interpretation for the irreversibility of the RG: it corresponds to an increased distinguishability of vacuum states in a region as this region gets larger, allowing more operators to be used to distinguish states.
In this sense, these monotonicity properties of the RG are a common quantum mechanical phenomenon. However, relativity and QFT enter crucially in the proof, in the fact that we needed to compare the states on null surfaces. Otherwise the relative entropy distinguishes the states too much, giving uninteresting information. The null surface decreases distinguishability in such a way that the relative entropy reduces to minus the difference in entanglement entropies. The reason why the vacuum of the theory and that of its CFT ultraviolet fixed point become more similar when compared on the null surface is physically clear. The correlators along null directions are UV correlators and cannot be used to distinguish them. Only correlation functions in the transverse directions matter.
This relative entropy in the null limit is finite only for $\Delta < (d+2)/2$. Otherwise correlators are different enough at arbitrarily short distances to allow for perfect distinguishability. When the relative entropy between the two vacuum states is not finite we may think they live in "different Hilbert spaces". 6 For $\Delta < (d+2)/2$ this is not the case. However, for large regions relative entropy grows at least as $R^{d-2}$. Indeed, it is necessary to have divergent relative entropy for the full space, as in this limit we have two different pure states. It would be very interesting to develop techniques that could be applied to the full range of dimensions.
When the renormalization of the area term is finite, the result can be interpreted as an increase of Newton's constant towards the IR, due to QFT effects. This implies anti-screening of gravity. But at the same time it shows that the area term cannot be purely induced and finite, since it would be negative, and we would have a negative Newton constant. The entropy cannot be negative and needs an additional positive UV term to compensate for the sign, and the same should occur with the Newton constant. Of course this is an old problem (see [14] for example) and we just see it from a new perspective.
It is interesting that the null relative entropy does not coincide with −∆S for theories with free scalars in the UV, due to a boundary term in the modular Hamiltonian. In the Appendix we show calculations that suggest that taking the relative entropy as a form of regularized entropy restores the naive counting of divergent terms induced by the mass that fails for the free scalar. In this sense the relative entropy gives a different regularization of entropy than, for example, mutual information. However, the change with respect to other regularizations is a term exactly proportional to the area that, for example, does not alter the c-function. It corresponds to a specific choice of contact term in (3.13).

Acknowledgments
This work was supported by CONICET PIP grant 11220110100533, Universidad Nacional de Cuyo, CNEA, and the Simons Foundation "It from Qubit" grant.

A Free field examples
In the main text we compared the two theories $T_0$ and $T_1$ in terms of the relative entropy. A crucial consequence of this analysis is the dependence on the choice of Cauchy surface, which enters via $\Delta\langle T_{\mu\nu}\rangle$ as in (2.22). In this Appendix we illustrate how this happens in detail for free scalar fields. The required calculations can be performed explicitly, and we discuss the results with different cutoffs. We also show how the divergence in $\Delta S$ at d = 4 is canceled by the boundary term in the modular Hamiltonian.

A.1 Massless and massive scalar fields
In free field theory we can consider an RG flow given by perturbing a massless scalar with a mass term. The UV fixed point is simply the free massless scalar, and the relevant mass deformation triggers a flow that ends in a trivial gapped theory. In fact, it will be useful to consider a slightly more general setup, where $T_0$ is the theory of a free scalar with squared mass $m_0^2$, while $T_1$ is another theory with squared mass $m_1^2$. We want to compute the variation $\Delta\langle T_{\mu\nu}\rangle$ between both theories, with $T_{\mu\nu}$ the stress-tensor operator for $T_0$.
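Recall that a massive scalar field, with Lagrangian density (written here in mostly-minus signature)
$$ \mathcal{L} = \frac{1}{2}\partial_\mu\phi\,\partial^\mu\phi - \frac{1}{2} m^2\phi^2 , \qquad (A.1) $$
has an energy-momentum tensor given by
$$ T_{\mu\nu} = \partial_\mu\phi\,\partial_\nu\phi - \frac{1}{2}\eta_{\mu\nu}\left(\partial_\alpha\phi\,\partial^\alpha\phi - m^2\phi^2\right) - \frac{d-2}{4(d-1)}\left(\partial_\mu\partial_\nu - \eta_{\mu\nu}\partial^2\right)\phi^2 . \qquad (A.2) $$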
The last term is the improvement term. We have added it to have a traceless tensor in the massless limit. We will compute $\Delta\langle T_{\mu\nu}\rangle$ with different regulators, and choose the spatial Cauchy surface $x^0 = 0$. A possible physical regulator is a point splitting associated with the choice of Cauchy surface; in the present case, we can split the points infinitesimally along the spatial surface. For this, we need the Minkowskian propagator in d dimensions. The expectation value of $T_{00}$ for a scalar field of mass $m_0$, with the point-splitting regularization, evaluated in the vacuum of mass $m_1$, is then given by (A.4). Here $\langle\ldots\rangle_1$ means that the expectation value is taken in the state specified by $T_1$. It is important to first set $x^0 = y^0$ and then take the limit $|\vec{x} - \vec{y}| \to 0$. Note that the last term of (A.4), the improvement-term contribution to $T_{00}$, vanishes identically by translation invariance.
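For orientation, the two-point function of a free scalar of mass $m$ at equal times, with spatial separation $r = |\vec{x} - \vec{y}|$, has a standard closed form in terms of a modified Bessel function. The expression below is the textbook result, quoted as an assumption about the form of the propagator referred to above; normalization conventions may differ.

```latex
% Equal-time (spacelike) two-point function of a free scalar in d dimensions
\langle \phi(x)\,\phi(y) \rangle
  = \frac{1}{(2\pi)^{d/2}} \left(\frac{m}{r}\right)^{\frac{d-2}{2}}
    K_{\frac{d-2}{2}}(m r) ,
\qquad r = |\vec{x} - \vec{y}| , \quad x^0 = y^0 ,
% Massless limit for d > 2, using K_\nu(z) \simeq \tfrac12 \Gamma(\nu) (2/z)^\nu :
\langle \phi(x)\,\phi(y) \rangle \;\xrightarrow{\; m \to 0 \;}\;
  \frac{\Gamma\!\left(\frac{d-2}{2}\right)}{4\pi^{d/2}\, r^{d-2}} .
```

As a check, d = 3 gives $e^{-mr}/(4\pi r)$ and d = 4 gives $m K_1(mr)/(4\pi^2 r)$, which reduces to $1/(4\pi^2 r^2)$ in the massless limit.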
Before proceeding to the calculation, let us see how (2.22) works out in this case. If we set $m_0 = 0$, $T_{\mu\nu}$ is a traceless operator, which we should write in terms of $\phi(x)$ and $\pi(x)$ for $x$ on the spatial surface $x^0 = 0$ before proceeding to evaluate expectation values in the theory of mass $m_1$. For d > 2 and for the $i, j$ components of the stress tensor, this requires the massless equations of motion, since these components contain $\partial_0^2\phi$ in the improvement term. Once this is done, the operator $T_{\mu\nu}$ is explicitly traceless. Using the isotropy of the spatial surface, we have $\langle T_{0i}\rangle = 0$ and $\langle T_{ij}\rangle = \delta_{ij}\langle T_{00}\rangle/(d-1)$. Comparing with (2.22), we then have $\eta_\mu = \delta_{\mu 0}$ and $k = \frac{d}{d-1}\langle T_{00}\rangle$. This illustrates how the dependence on the Cauchy surface appears for the simple case of a free scalar.
Given (A.4), we can now evaluate $\Delta\langle T_{00}\rangle = \langle T_{00}\rangle_1 - \langle T_{00}\rangle_0$. In d = 2 and with $x^0 = 0$ we have
$$ \Delta\langle T_{00}\rangle = \frac{1}{8\pi}\left( m_1^2 - m_0^2 + m_0^2 \log\frac{m_0^2}{m_1^2} \right) $$
in the limit $|\vec{x}| \to 0$. This function is positive for all $m_0$ and $m_1$, reaching its minimum value of zero at $m_1 = m_0$. As we have seen, for $m_0 = 0$ this positivity is necessary to have a positive relative entropy in the interval. For $m_0 \neq 0$ the positivity of this quantity is still needed for positivity of relative entropy in Rindler space, where the modular Hamiltonian is still given in terms of $T_{00}$. If instead of doing the point splitting on the $x^0 = 0$ surface we choose another spacelike direction $x^0 = \alpha x^1$, with $|\alpha| < 1$, we can split the points along this line to find
$$ \Delta\langle T_{00}\rangle = \frac{1}{8\pi}\left( \frac{1+\alpha^2}{1-\alpha^2}\,(m_1^2 - m_0^2) + m_0^2 \log\frac{m_0^2}{m_1^2} \right) $$
as the regulator vanishes. This is not positive over the whole range of $m_0$, $m_1$. The reason is that, in using a point splitting in a slanted direction, we have made use of correlators of the $T_1$ theory outside the Cauchy surface. Recalling that these expectation values on different Cauchy surfaces belong to different states of the $T_0$ theory, we are not able to justify positivity from relative entropy, and in fact positivity fails.
Let us consider next a hard momentum cutoff. Since the Cauchy surface at $x^0 = 0$ distinguishes space and time, we will allow for two different cutoffs on momenta, $|p_0| < \Lambda_0$, $|\vec{p}\,| < \Lambda$. The physical limit corresponds to $\Lambda_0 \gg \Lambda$, so that we have a spatial lattice that propagates in a continuous time variable. For Lorentz (or euclidean) invariant quantities, the order in which the cutoffs are sent to infinity does not matter. One then usually chooses $\Lambda_0 = \Lambda$ to be able to use euclidean invariance. Here, however, we will see that $\Lambda_0 \gg \Lambda$ and $\Lambda_0 \ll \Lambda$ give different results for $\Delta\langle T_{00}\rangle$. To simplify the formulas, consider $m_0 = 0$, $m_1 = m$, and let us work in d = 2. Fourier-transforming $T_{00} = \frac{1}{2}(\partial_0\phi)^2 + \frac{1}{2}(\partial_1\phi)^2$ and rotating to euclidean signature, we have
$$ \Delta\langle T_{00}\rangle = \frac{1}{2}\int_{|p_0|<\Lambda_0,\;|p_1|<\Lambda} \frac{d^2p}{(2\pi)^2}\,\left(p_1^2 - p_0^2\right)\left[ \frac{1}{p_0^2 + p_1^2 + m^2} - \frac{1}{p_0^2 + p_1^2} \right] . $$
If $\Lambda_0 \gg \Lambda$, we perform first the integral over $p_0$, and can take $\Lambda_0 \to \infty$.
The resulting integral over $p_1$ is then finite and agrees with the point-splitting result (A.7). If we instead take $\Lambda \gg \Lambda_0$ and integrate over $p_1$ first, the final result has the opposite sign, which is not physical for the energy. As a last example for d = 2, we can compute $\langle T_{00}\rangle$ on a lattice, using nearest-neighbor differences and the lattice correlators in the ground state, (A.11). The result for $\Delta\langle T_{00}\rangle$ in the limit of small lattice spacing coincides with (A.6).
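The d = 2 result on the surface $x^0 = 0$ can be cross-checked numerically. The sketch below rests on two assumptions that are ours rather than quoted: that the equal-time point splitting reduces to the mode sum $\Delta\langle T_{00}\rangle = \frac{1}{\pi}\int_0^\infty dp\,(\omega_1 - \omega_0)^2/(4\omega_1)$ with $\omega_i = \sqrt{p^2 + m_i^2}$, and that its closed form is $\frac{1}{8\pi}\left(m_1^2 - m_0^2 + m_0^2\log(m_0^2/m_1^2)\right)$. The function names are hypothetical, and a plain midpoint rule is used to avoid external dependencies.

```python
import math

def delta_T00_numeric(m0, m1, n=20000):
    """Midpoint-rule estimate of (1/pi) * int_0^inf (w1 - w0)^2 / (4 w1) dp.

    The substitution p = t / (1 - t) maps (0, 1) onto (0, inf).
    Requires m1 > 0.
    """
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h                 # midpoint in (0, 1)
        p = t / (1.0 - t)
        jac = 1.0 / (1.0 - t) ** 2        # dp/dt
        w0 = math.sqrt(p * p + m0 * m0)
        w1 = math.sqrt(p * p + m1 * m1)
        diff = (m1 * m1 - m0 * m0) / (w1 + w0)   # w1 - w0 without cancellation
        total += diff * diff / (4.0 * w1) * jac
    return total * h / math.pi

def delta_T00_closed(m0, m1):
    """Assumed closed form; the m0 = 0 case is treated separately."""
    if m0 == 0.0:
        return m1 ** 2 / (8.0 * math.pi)
    return (m1 ** 2 - m0 ** 2 + m0 ** 2 * math.log(m0 ** 2 / m1 ** 2)) / (8.0 * math.pi)

for m0, m1 in [(0.0, 1.0), (0.5, 2.0), (2.0, 0.5)]:
    print(m0, m1, delta_T00_numeric(m0, m1), delta_T00_closed(m0, m1))
```

Written in this form the integrand is a perfect square divided by $4\omega_1$, which makes the positivity for all $m_0$, $m_1$ and the zero at $m_1 = m_0$ manifest.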
The results for higher dimensions can be calculated similarly. Using point splitting we obtain expressions that are positive for all $m_0$, $m_1$, as expected. The perturbation of the Hamiltonian due to a mass has dimension $\Delta = d - 2$, with coupling constant $m_1^2$. These results match the expectation of a finite $\Delta\langle T_{\mu\nu}\rangle$ for $\Delta < d/2$, which gives d < 4. In fact, for the finite cases d = 2, 3 we obtain the same results with other regularizations, such as a lattice. For the divergent cases d ≥ 4 the results also match the expectations from conformal perturbation theory (for $m_0 = 0$), that is, $\Delta\langle T_{00}\rangle \sim g^2/\epsilon^{2\Delta - d} = m_1^4/\epsilon^{d-4}$. We obtain similar results for free fermions. However, for fermions $\Delta\langle T_{00}\rangle \sim m^2/\epsilon^{d-2}$ diverges in all dimensions, as corresponds to $\Delta = d - 1$. Nevertheless, on the null surface the relative entropy is finite for d = 2, 3, 4, 5 for scalars (up to a subtlety that we will address next), and for d = 2, 3 for fermions.
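The dimension counting quoted above is easy to tabulate. The following sketch simply encodes the two conditions stated in the text, $\Delta < d/2$ for a finite $\Delta\langle T_{\mu\nu}\rangle$ and $\Delta < (d+2)/2$ for a finite relative entropy on the null surface, together with $\Delta = d - 2$ for the scalar mass term and $\Delta = d - 1$ for the fermion mass term. The function names and the range of dimensions scanned are arbitrary choices.

```python
def finite_stress_tensor(delta, d):
    """Condition Delta < d/2, written with integers to avoid rounding."""
    return 2 * delta < d

def finite_null_relative_entropy(delta, d):
    """Condition Delta < (d + 2)/2 for the null-surface relative entropy."""
    return 2 * delta < d + 2

def scalar_mass_dimension(d):
    return d - 2   # dimension of phi^2 for a free scalar

def fermion_mass_dimension(d):
    return d - 1   # dimension of the mass bilinear for a free fermion

DIMS = range(2, 9)  # space-time dimensions scanned (arbitrary upper end)

def finite_dims(dimension_of, criterion):
    """Dimensions in DIMS for which the given finiteness criterion holds."""
    return [d for d in DIMS if criterion(dimension_of(d), d)]

print("scalar,  finite stress tensor:    ", finite_dims(scalar_mass_dimension, finite_stress_tensor))
print("scalar,  finite relative entropy: ", finite_dims(scalar_mass_dimension, finite_null_relative_entropy))
print("fermion, finite stress tensor:    ", finite_dims(fermion_mass_dimension, finite_stress_tensor))
print("fermion, finite relative entropy: ", finite_dims(fermion_mass_dimension, finite_null_relative_entropy))
```

This naive counting does not include the boundary-term subtlety for scalars in d ≥ 4.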

A.2 Boundary term in the modular Hamiltonian
In the power-counting classification of Sec. 2.5 there is a subtle point for free scalars. These have divergent $\Delta S$ for d ≥ 4, and the dimension of the relevant perturbation $m^2\phi^2$ is $\Delta = d - 2$. Hence they violate the standard counting, which would produce divergences only for $\Delta = d - 2 \ge (d+2)/2$, that is, for d ≥ 6.
We will now see that in fact this divergence cancels out (at least for d = 4) from the relative entropy because of an additional boundary term in the modular Hamiltonian [21,22,23,16].
The free scalar theory contains a subtlety that is generically absent from more general flows: the improvement term in the conformal stress tensor (A.2). The modular Hamiltonian in Rindler space is constructed with the canonical stress tensor rather than the conformal one. The sphere modular Hamiltonian comes from the Rindler one by a conformal transformation, and there we have to use the conformal tensor. Adding the improvement term to the canonical tensor gives an additional boundary term proportional to $\phi^2$ [16],
$$ H_{bdry} = \frac{\pi (d-2)}{2(d-1)} \int d\sigma\, \phi^2 , \qquad (A.14) $$
where the integral is over the boundary of the spherical entangling surface. This term does not change with the Cauchy surface and persists in the null limit. Hence we have to add (A.14) to $-\Delta S$ to obtain the relative entropy. The expectation value of $\phi^2$ in the state corresponding to the massive theory is given in (A.15). This is finite for d = 2, 3 and divergent for d ≥ 4. For d ≥ 4 we can still get a universal part, which is the finite term for d odd and the logarithmic term for d even. These universal pieces agree when computed using different regularizations, for example dimensional regularization, heat kernel, or point splitting. The general result for the universal term using dimensional regularization reads
$$ \Delta\langle\phi^2\rangle = \frac{\Gamma(1 - d/2)}{(4\pi)^{d/2}}\, m^{d-2} , \qquad (A.16) $$
where an expansion in d is assumed for even dimensions to get the logarithmic term. The boundary contribution corresponding to this universal part then follows by inserting (A.16) into (A.14), with the logarithmic term understood for d even [27]. Therefore, for d = 3 we have $\Delta\langle H_{\Sigma_{\rm null}}\rangle = -\frac{1}{16} mA$ and $\Delta S = -\frac{1}{12} mA$. Note that $\Delta\langle H_{\Sigma_{\rm null}}\rangle$, coming exclusively from the boundary term, is negative. However, the relative entropy is still positive, with a smaller area term than $-\Delta S$,
$$ S_{rel} = \Delta\langle H_{\Sigma_{\rm null}}\rangle - \Delta S = \frac{1}{48}\, mA . $$
In d = 4 the logarithmically divergent terms of $\Delta\langle H_{\Sigma_{\rm null}}\rangle$ and $\Delta S$ coincide. Therefore, these divergences cancel out of the relative entropy. Viewing the relative entropy on the null surface as a form of regularization of the entropy, this restores the validity of the counting argument in Sec. 2.5 for free scalars. Once the divergent parts cancel, there must remain a finite term proportional to $m^2 A$ for $S_{rel}$ in d = 4.
Obtaining this area term requires using the same cutoff for the entropy and for $\Delta\langle\phi^2\rangle$. It would be interesting in the future to calculate $S_{rel}$ explicitly in terms of a physical cutoff. Here we will simply assume that the power-counting analysis of Sec. 2.5 becomes valid due to cancellations between $\Delta\langle H_{bdry}\rangle$ and $\Delta S$. For d = 5 both $\Delta S \sim \epsilon^{-1}$ and $\Delta\langle\phi^2\rangle \sim \epsilon^{-1}$. If the boundary term generally restores the counting of divergences for the scalar, we should also have finite relative entropy in d = 5. This would mean that the leading divergences cancel, and we end up with the universal pieces. For these we have $\Delta S \sim m^3 A/(72\pi)$ and $\Delta\langle H_{\Sigma_{\rm null}}\rangle \sim m^3 A/(64\pi)$. Again the result is positive,
$$ S_{rel} = \frac{1}{576\pi}\, m^3 A . $$
Finally, for d = 6 the naive counting gives a logarithmically divergent $S_{rel}$. If all higher powers cancel, we have from the universal parts $\Delta S \sim -\frac{1}{192\pi^2}\log(m\epsilon)\, m^4 A$ and $\Delta\langle H_{\Sigma_{\rm null}}\rangle \sim -\frac{1}{160\pi^2}\log(m\epsilon)\, m^4 A$. This gives the divergent, though positive, result
$$ S_{rel} = -\frac{1}{960\pi^2}\log(m\epsilon)\, m^4 A . \qquad (A.23) $$
For d ≥ 7 the combination of the universal parts is not positive, which is consistent with the relative entropy having leading divergent nonuniversal terms that compensate for the sign.
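The arithmetic behind the universal area terms can be verified with exact fractions. The sketch below assumes $S_{rel} = \Delta\langle H_{\Sigma_{\rm null}}\rangle - \Delta S$ and uses the universal coefficients for d = 3, 5, 6; for d = 5 it takes the larger coefficient $1/(64\pi)$ for $\Delta\langle H_{\Sigma_{\rm null}}\rangle$ and $1/(72\pi)$ for $\Delta S$, which is the assignment under which the result is positive. Powers of $\pi$, $m$, $A$ and the logarithm are factored out as indicated in the comments, and the helper name is hypothetical.

```python
from fractions import Fraction

def s_rel_coefficient(delta_H, delta_S):
    """Coefficient of S_rel = Delta<H> - Delta S, in the common units of its inputs."""
    return delta_H - delta_S

# d = 3, in units of m A
d3 = s_rel_coefficient(Fraction(-1, 16), Fraction(-1, 12))

# d = 5, in units of m^3 A / pi (assumed assignment: Delta<H> = 1/64, Delta S = 1/72)
d5 = s_rel_coefficient(Fraction(1, 64), Fraction(1, 72))

# d = 6, in units of log(m eps) m^4 A / pi^2; log(m eps) < 0 for m eps << 1
d6 = s_rel_coefficient(Fraction(-1, 160), Fraction(-1, 192))

print(d3, d5, d6)
```

The d = 6 coefficient is negative, but it multiplies $\log(m\epsilon) < 0$, so the relative entropy is positive in all three cases.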