Relative Entropy and Proximity of Quantum Field Theories

We study the question of how reliably one can distinguish two quantum field theories (QFTs). Each QFT defines a probability distribution on the space of fields. The relative entropy provides a notion of proximity between these distributions and quantifies the number of measurements required to distinguish between them. In the case of nearby conformal field theories, this reduces to the Zamolodchikov metric on the space of couplings. Our formulation quantifies the information lost under renormalization group flow from the UV to the IR and leads us to a quantification of fine-tuning. This formalism also leads us to a criterion for distinguishability of low energy effective field theories generated by the string theory landscape.


INTRODUCTION
A central aim of high energy physics is to determine the operator content, correlation functions, and coupling constants of the real world. This problem is challenging, especially in the context of string theory, because there are a priori many UV completions of a given low energy effective field theory such as the Standard Models of particle physics and cosmology.
Here we ask: given two competing theories of the world p and q, how reliably can we distinguish them given a finite number of measurements?
Broadly speaking, theory determination is a basic question in statistical inference and information theory. By interpreting the Euclidean continuation of a quantum field theory as a probability distribution on the space of field configurations, we shall convert well-studied information theoretic notions of proximity between probability distributions into analogous measures of proximity between QFTs.
This formalism gives a concrete method for evaluating the proximity of QFTs in any UV complete theory with a landscape of low energy effective field theories. Additionally, it provides a way to coarse-grain for specific features, and to quantify the distinguishability of different effective field theories.

PROXIMITY IN QUANTUM FIELD THEORY
One way to quantify the proximity between two probability distributions p(z) and q(z) on a probability space is via the relative entropy (also called the Kullback-Leibler (KL) divergence) [1]:

$$ D_{KL}(p\|q) = \int d\mu(z)\, p(z) \log\frac{p(z)}{q(z)}. \tag{1} $$

Here, z is the outcome, and dµ(z) is a choice of measure on the space of outcomes. The relative entropy D_KL(p||q) ≥ 0 and vanishes if and only if p = q almost surely.
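For discrete outcomes, where dµ(z) is counting measure, the definition reduces to a finite sum. The short sketch below is an illustration only, and the function name `kl_divergence` is hypothetical. It shows four properties: D_KL(p||q) ≥ 0, it vanishes for p = q, it is not symmetric, and it is infinite when p assigns probability to an outcome that q excludes.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Relative entropy D_KL(p||q) = sum_z p(z) log(p(z)/q(z)) for discrete
    distributions on the same finite set of outcomes (counting measure)."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same outcomes")
    total = 0.0
    for pz, qz in zip(p, q):
        if pz == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if qz == 0.0:
            return math.inf  # p has support where q has none
        total += pz * math.log(pz / qz)
    return total


p = [0.5, 0.25, 0.25]
q = [0.4, 0.4, 0.2]
print(kl_divergence(p, q), kl_divergence(q, p))  # both positive, but unequal
print(kl_divergence(p, p))                       # 0.0
print(kl_divergence([0.5, 0.5], [1.0, 0.0]))     # inf
```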
In information theory, the KL divergence quantifies the amount of information which is lost when one uses the distribution q(z) to model the distribution p(z). In the context of statistical inference, one can consider N independent events E = {e_1, ..., e_N} drawn from the distribution q(z). At large N, the probability that these draws could have been obtained from p(z) is

$$ \Pr(p|E) \simeq \exp\left(-N\, D_{KL}(p\|q)\right). \tag{2} $$

For additional discussion and references to the literature, see e.g. [2–4].

As it is not symmetric, the KL divergence is not really a distance. Nevertheless, in the limit where p and q are nearby, the KL divergence reduces to a metric. Consider a parametric family of distributions q(z|{ξ}) such that for some value ξ = ξ*, q(z|{ξ*}) = p(z). Expanding D_KL(p||q) around this point yields the Fisher information metric:

$$ D_{KL}(p\|q) = \frac{1}{2} G_{ij}\, \delta\xi^i \delta\xi^j + O(\delta\xi^3), \qquad G_{ij} = \int d\mu(z)\, p(z)\, \frac{\partial \log q(z|\xi)}{\partial \xi^i}\, \frac{\partial \log q(z|\xi)}{\partial \xi^j}\bigg|_{\xi=\xi^*}, $$

where δξ = ξ − ξ*.

In this note we will consider probability distributions generated by a Euclidean quantum field theory with action S[φ] depending on field configurations {φ(x)}. The action of the Euclidean field theory defines an (unnormalized) probability distribution exp(−S[φ]). This distribution defines the quantum field theory via its analytic continuation to Lorentzian signature. The normalized probability distribution on the space of field configurations {φ(x)} is

$$ p[\phi] = \frac{e^{-S_p[\phi]}}{Z_p}, \qquad Z_p = \int \mathcal{D}\phi\, e^{-S_p[\phi]}. \tag{5} $$

Here Z_p is the partition function. A draw from this probability distribution is a Euclidean field configuration φ(x).
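The quadratic expansion can be checked numerically for a simple one-parameter family. The sketch below assumes a hypothetical family of zero-mean Gaussians q(z|σ) with width σ, for which the Fisher metric is G_σσ = 2/σ². It computes both D_KL and G by Riemann sums on a grid and compares D_KL with ½ G δσ². The agreement improves as δσ shrinks.

```python
import numpy as np

# Hypothetical one-parameter family q(z|sigma): zero-mean Gaussians of width sigma.
z = np.linspace(-20.0, 20.0, 200_001)
dz = z[1] - z[0]


def log_density(sigma):
    return -0.5 * (z / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))


def kl(sigma_p, sigma_q):
    """D_KL(p||q) as a Riemann sum; working with log-densities avoids underflow."""
    log_p = log_density(sigma_p)
    return float(np.sum(np.exp(log_p) * (log_p - log_density(sigma_q))) * dz)


def fisher(sigma):
    """G = E_q[(d log q / d sigma)^2], using the analytic score -1/sigma + z^2/sigma^3."""
    score = -1.0 / sigma + z ** 2 / sigma ** 3
    return float(np.sum(np.exp(log_density(sigma)) * score ** 2) * dz)


sigma_star = 1.3
G = fisher(sigma_star)                 # analytic value: 2 / sigma_star**2
for d in (0.1, 0.03, 0.01):
    print(d, kl(sigma_star, sigma_star + d), 0.5 * G * d ** 2)
```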
Note that in quantum physics we are often also interested in a different distribution, namely the square of the ground state wavefunction over equal-time configurations in the Lorentzian theory. The ground state wavefunction for a given spatial configuration, Ψ(φ_0), is generated by summing the distribution exp(−S[φ]) over Euclidean trajectories that approach the boundary condition φ_0 on a fixed Euclidean time surface. We will not study the associated distribution |Ψ(φ_0)|² here, although it is also a quantity of physical interest.
Given two quantum field theories which depend on the same class of field configurations, we can now study the KL proximity between two theories with actions S_p[φ] and S_q[φ]:

$$ D_{KL}(p\|q) = \int \mathcal{D}\phi\, p[\phi] \log\frac{p[\phi]}{q[\phi]} = \langle S_q - S_p \rangle_p + \log\frac{Z_q}{Z_p}. $$

This is the expectation value of (S_q − S_p) + log(Z_q/Z_p) in the ground state of theory p. Note that this D_KL is not the same thing as the quantum relative entropy, Tr(ρ_p log ρ_p − ρ_p log ρ_q), between the ground states of theories p and q, where ρ_i is the density matrix for the ground state of theory i. Nor is D_KL the quantum relative entropy between two density matrices of a given theory.
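The identity D_KL = ⟨S_q − S_p⟩_p + log(Z_q/Z_p) can be checked in a zero-dimensional toy "field theory" in which the field is a single real number φ. In the sketch below, the quartic action S_q and the coupling value g = 0.8 are hypothetical choices made for illustration. The integrals are Riemann sums on a finite grid.

```python
import numpy as np

# Zero-dimensional toy "field theory": the field is a single real number phi.
phi = np.linspace(-8.0, 8.0, 80_001)
dphi = phi[1] - phi[0]


def S_p(x):
    return 0.5 * x ** 2                       # free (Gaussian) action


def S_q(x, g=0.8):
    return 0.5 * x ** 2 + g * x ** 4 / 24.0   # hypothetical quartic deformation


Z_p = float(np.sum(np.exp(-S_p(phi))) * dphi)   # partition functions
Z_q = float(np.sum(np.exp(-S_q(phi))) * dphi)
p = np.exp(-S_p(phi)) / Z_p                    # normalized distributions
q = np.exp(-S_q(phi)) / Z_q

# Definition: integral of p log(p / q)
kl_direct = float(np.sum(p * np.log(p / q)) * dphi)
# Action form: <S_q - S_p>_p + log(Z_q / Z_p)
kl_action = float(np.sum(p * (S_q(phi) - S_p(phi))) * dphi + np.log(Z_q / Z_p))
print(kl_direct, kl_action)  # agree to numerical precision
```

Only the actions and partition functions enter the second form, which is what makes the perturbative treatment below possible.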

Master Theories
At first glance, our notion of proximity would appear to only work for comparing QFTs with the same field / operator content. All that is really required, however, is that there is some "master UV theory" p_master[φ_master]. This master theory could be a lattice formulation of a field theory, a continuum CFT, or a particular string compactification.
From this master theory we can consider deformations to various low energy effective field theories q_1, ..., q_M. Since all of the q's descend from the same master theory, we can continue to label the field and operator content according to that of theory p_master. Hence, we can still speak of the KL divergence D_KL(q_i||q_j) for all i and j. This point will be especially important when we turn to the study of effective field theories generated by the string landscape.

Perturbative Calculability
The notion of proximity we have introduced is calculable in perturbation theory. Consider a Euclidean theory with action S_p[φ], which we perturb by a linear combination of local sources for operators:

$$ \Theta_\lambda(x) = \sum_i \lambda_i(x)\, \mathcal{O}_i(x). $$

Each λ_i(x) specifies a source, i.e. a position-dependent coupling constant, although we will primarily focus on the case where the λ_i are constant. We then get a family of probability distributions q[φ|{λ}] defined by the deformation:

$$ S_q[\phi] = S_p[\phi] + \int d^Dx\, \Theta_\lambda(x). $$

This deformation is proportional to the trace of the difference in stress energies between theories p and q:

$$ \Theta_\lambda(x) \propto T^{(q)\mu}{}_{\mu}(x) - T^{(p)\mu}{}_{\mu}(x). $$

The KL proximity between p and q is calculable in perturbation theory since the term S_q − S_p depends only on the expected value of the deformation, while the partition functions Z_p and Z_q are the generating functions of the correlation functions of p and q. Operationally, we do not even need an action for either theory, but only their correlation functions. In this sense the KL divergence also quantifies the amount of information contained in the correlation functions of a theory.

We now study the leading order behavior of the KL divergence, with Θ_λ(x) treated as a small perturbation to the original theory. Expanding q[φ] to quadratic order in the perturbation, the KL divergence is

$$ D_{KL}(p\|q) = \frac{1}{2} \int d^Dx\, d^Dy\, \lambda_i(x)\, \lambda_j(y)\, \langle \mathcal{O}_i(x)\, \mathcal{O}_j(y) \rangle_{p,c} + O(\lambda^3), \tag{10} $$

where ⟨...⟩_{p,c} denotes the connected correlator in theory p. The terms linear in λ cancel between ⟨S_q − S_p⟩_p and log(Z_q/Z_p). We conclude that the Fisher information metric on the space of (constant) couplings is

$$ G_{ij} = \int d^Dx\, d^Dy\, \langle \mathcal{O}_i(x)\, \mathcal{O}_j(y) \rangle_{p,c}. \tag{12} $$

Integrated two-point functions of this sort have two types of divergences: an IR divergence proportional to the volume and a UV divergence coming from contact terms where x and y coincide. The IR divergence is easily regulated by putting the system at finite volume. The UV divergent contact terms are generally scheme dependent. If a particular finite UV completion is known, this fixes the regularization scheme unambiguously. As one might expect, the largest contribution to the KL divergence comes from such UV divergent pieces.
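In a zero-dimensional toy, the second-order expansion can be compared with an exact result. The sketch below assumes the hypothetical deformation Θ = λO with O = φ²/2, added to a Gaussian theory p. With this choice q is again Gaussian and D_KL(p||q) = ½[λ − log(1 + λ)] exactly. The connected correlator ⟨O O⟩_{p,c}, whose exact value is 1/2, is computed numerically.

```python
import math
import numpy as np

# Gaussian "theory p" for a single real variable phi (zero-dimensional toy).
phi = np.linspace(-10.0, 10.0, 100_001)
dphi = phi[1] - phi[0]
p = np.exp(-0.5 * phi ** 2)
p /= np.sum(p) * dphi

# Hypothetical deformation S_q = S_p + lam * O with O = phi^2 / 2.
O = 0.5 * phi ** 2
mean_O = np.sum(p * O) * dphi
OO_connected = float(np.sum(p * (O - mean_O) ** 2) * dphi)  # <O O>_{p,c}; exactly 1/2


def kl_exact(lam):
    """Closed form: q is Gaussian with variance 1/(1 + lam)."""
    return 0.5 * (lam - math.log1p(lam))


def kl_second_order(lam):
    """Leading perturbative term: (1/2) lam^2 <O O>_{p,c}."""
    return 0.5 * lam ** 2 * OO_connected


for lam in (0.1, 0.01, 0.001):
    print(lam, kl_exact(lam), kl_second_order(lam))
```

The relative discrepancy shrinks linearly in λ, as expected from the O(λ³) remainder.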
The finite part of D_KL(p||q) is independent of the UV cutoff Λ_UV and can be packaged in terms of the values of the couplings at an RG scale µ. Operationally, we introduce a regularization scheme, along with some counterterms in theory p. This finite piece can depend on the choice of scheme, and can a priori be positive, zero, or negative. As we change Λ_UV, the values of the couplings at µ must be adjusted to hold fixed the long distance behavior. This active tuning of the couplings is reflected in the beta function,

$$ \beta_i = \mu \frac{d\lambda_i}{d\mu}, $$

which is independent of Λ_UV and can likewise a priori be positive, zero, or negative.

An important special case is when p and q are related by renormalization group flow; they describe a given quantum field theory with UV momentum cutoffs Λ^(p)_UV and Λ^(q)_UV, respectively. We will take Λ^(q)_UV < Λ^(p)_UV. We can regard theories p and q as defining different probability measures on the same configuration space, in the usual way. One first integrates out field configurations φ_k in theory p with momenta Λ^(q)_UV < |k| < Λ^(p)_UV. One then rescales positions and momenta as x → bx and k → k/b, with b = Λ^(p)_UV/Λ^(q)_UV. This defines two distributions with different coupling constants related by renormalization group flow. The KL divergence then provides a measure of the information lost as one coarse grains from Λ^(p)_UV to Λ^(q)_UV.²

² Another quantity of interest is the mutual information. Starting with a master theory p_master[Λ_IR, Λ_UV] with IR and UV cutoffs Λ_IR and Λ_UV, marginalize out either high or low momentum shells, to respectively produce distributions p_hi[Λ_IR, µ] and p_lo[µ, Λ_UV]. The second operation is somewhat awkward in local quantum field theory, but makes sense both in the context of non-commutative field theories and in theories which have a gravity dual with a finite length AdS throat. The product distribution p_hi × p_lo has support on the same momentum modes as p_master, and D_KL(p_master||p_hi × p_lo) = I(UV, IR) is the mutual information between the UV and IR. For related discussion for weakly coupled field theories, see e.g. [6].

CONFORMAL FIELD THEORIES
We now consider the special limit where either theory p or theory q is a conformal field theory (CFT).
First, suppose that p is a CFT and that q is another CFT obtained by perturbing p by a linear combination of exactly marginal scalar operators. The small variation δλ^i can be viewed as a vector in the space of marginal couplings. In our conventions, the two-point function for a marginal primary scalar of dimension ∆ = D is

$$ \langle \mathcal{O}_i(x)\, \mathcal{O}_j(y) \rangle = \frac{G^{(Zam)}_{ij}}{|x-y|^{2D}}, $$

where G^(Zam)_ij is the Zamolodchikov metric of the CFT [7]. Then, since the CFT one-point functions vanish, the KL divergence (10) is proportional to the length of the vector δλ^i with respect to the Zamolodchikov metric:

$$ D_{KL}(p\|q) \propto G^{(Zam)}_{ij}\, \delta\lambda^i \delta\lambda^j. $$

Thus the Fisher information metric is proportional to the Zamolodchikov metric! To make this more explicit, consider theories defined on a lattice of volume V = ℓ_IR^D with lattice spacing ℓ_UV. Then, up to a numerical factor of order one,

$$ D_{KL}(p\|q) \simeq \frac{V}{\ell_{UV}^D}\, \delta\lambda^2, \qquad \delta\lambda^2 \equiv G^{(Zam)}_{ij}\, \delta\lambda^i \delta\lambda^j. $$

The distance δλ in the space of couplings can then be interpreted as the KL density, as follows. Since there are K = V/ℓ_UV^D lattice sites, each draw from the Euclidean probability distribution gives K pieces of data about the couplings of the theory. Following (2), a measurement of the field configuration at one lattice site will fail to distinguish between the two theories with probability e^{−δλ²}.
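The statement that each draw carries K = V/ℓ_UV^D pieces of data can be illustrated in a Gaussian toy model. The sketch below is not a CFT computation. It assumes two hypothetical free massive scalars on a periodic one-dimensional lattice with unit spacing and masses m_p = 0.5 and m_q = 0.55, and it computes D_KL(p||q) exactly, mode by mode in momentum space. Once the lattice is much larger than the correlation length 1/m, D_KL grows linearly with the number of sites K.

```python
import numpy as np


def kl_free_lattice(num_sites: int, m_p: float, m_q: float) -> float:
    """Exact D_KL(p||q) between two free massive scalars on a periodic 1D lattice
    with unit spacing. Both are zero-mean Gaussians, diagonal in momentum space."""
    k = np.arange(num_sites)
    laplacian = 4.0 * np.sin(np.pi * k / num_sites) ** 2   # eigenvalues of -Laplacian
    r = (laplacian + m_q ** 2) / (laplacian + m_p ** 2)      # precision ratio per mode
    return 0.5 * float(np.sum(r - 1.0 - np.log(r)))


for K in (16, 64, 256, 1024):
    d = kl_free_lattice(K, m_p=0.5, m_q=0.55)
    print(K, d, d / K)   # d / K approaches a constant: D_KL is extensive
```

Each momentum mode contributes ½(r − 1 − log r) ≥ 0 independently, which is why the total scales with the number of modes, i.e. with the number of sites.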
From the perspective of a continuum theory, the lattice described above is a particular regularization scheme, which will contain UV divergences as the lattice spacing is taken to zero. It is also of interest to extract the finite piece which remains in the continuum limit. In a CFT, different choices of regularization scheme correspond to different choices of contact terms in the OPE. Naively, these might appear to change the Zamolodchikov metric. However, changes of scheme can be interpreted as coordinate transformations on the space of couplings λ^i, and the Zamolodchikov metric transforms covariantly under these diffeomorphisms (see e.g. [8]). The scheme-independent piece of D_KL is, up to a factor of order one, proportional to the Zamolodchikov distance in the space of couplings:

$$ D_{KL}(p\|q)\big|_{\rm finite} \sim G^{(Zam)}_{ij}\, \delta\lambda^i \delta\lambda^j. \tag{17} $$

The factor of V/ℓ_UV^D has disappeared, since the regulated integrals appearing in (10) must be proportional to ℓ_UV^D/V for dimensional reasons. We can interpret this as follows: the finite piece of D_KL is not proportional to the volume V because there are long range correlations in a conformal field theory, and hence a given draw from the distribution essentially gives one piece of data about the theory. The precise coefficient in (17) depends on the nature of the IR regularization.

Let us now consider the case when the theory q is not a CFT, but is related to theory p by the addition of some non-marginal operators O_i of dimensions ∆_i. We will work in a basis where G^(Zam)_ij is diagonal. Then, when the perturbations δλ_i are small, the above derivation can be easily generalized. In the lattice regularization

$$ D_{KL}(p\|q) \simeq \sum_i c_i\, \delta\lambda_i^2 \left( \ell_{IR}^{2(D-\Delta_i)} - \ell_{IR}^{D}\, \ell_{UV}^{D-2\Delta_i} \right), \tag{18} $$

where the c_i are numerical constants of order one.
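A rough power-counting estimate shows where the volume dependence in (17) and (18) comes from. Assume, for illustration only, three simplifications: a two-point function normalized as |x − y|^{−2∆}, a hard radial IR cutoff |x − y| < ℓ_IR, and no boundary effects. Let Ω_{D−1} denote the area of the unit (D−1)-sphere. Then, for ∆ ≠ D/2,

$$
\int_V d^Dx \int_{\ell_{UV}<|x-y|<\ell_{IR}} \frac{d^Dy}{|x-y|^{2\Delta}}
\simeq V\,\Omega_{D-1}\int_{\ell_{UV}}^{\ell_{IR}} dr\, r^{D-1-2\Delta}
= \frac{\Omega_{D-1}}{D-2\Delta}\left(\ell_{IR}^{2(D-\Delta)} - \ell_{IR}^{D}\,\ell_{UV}^{D-2\Delta}\right),
\qquad V=\ell_{IR}^D .
$$

For a marginal operator (∆ = D) the right-hand side becomes (Ω_{D−1}/D)(V/ℓ_UV^D − 1). This is a divergent piece proportional to the number of lattice sites plus a volume-independent finite piece, consistent with the disappearance of the factor V/ℓ_UV^D from the finite part. For general ∆, the cutoff-independent piece scales as ℓ_IR^{2(D−∆)}.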
For each summand in equation (18), the finite piece of D_KL(p||q) is proportional to c_i δλ_i² ℓ_IR^{2(D−∆_i)}. Since V = ℓ_IR^D, this scales as V^{2(D−∆_i)/D}. As expected, D_KL(p||q) is dominated by the contribution of the lowest-dimension (most relevant) operator when ℓ_IR is large. Irrelevant couplings (∆ > D) make a finite contribution to D_KL that is suppressed by the infrared scale, because the low-energy effective theories are identical. Nearly marginal perturbations (∆ ∼ D) contribute to D_KL in a way that is almost insensitive to volume. This is because nearly conformal theories have long-range spatial correlations, so measurements at different locations do not give independent information about the theory. Relevant perturbations with dimensions above the Breitenlohner-Freedman bound (D > ∆ > D/2) all lead to sub-extensive scaling of D_KL with volume. The unitarity bound ∆ > D/2 − 1, however, leaves a narrow window (D/2 − 1 < ∆ < D/2) with super-extensive scaling. It would be interesting to understand how this arises in terms of measurements distinguishing p from q.
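The helper below is a hypothetical illustration, not code from this work. It converts the exponent in V^{2(D−∆)/D} into the qualitative regimes listed above. It treats ∆ < D/2 − 1 as excluded by unitarity. It flags ∆ = D/2 separately, because the power counting there is logarithmic.

```python
import math


def volume_exponent(D: float, Delta: float) -> float:
    """Exponent a in D_KL(finite) ~ V**a, using l_IR**(2(D - Delta)) with V = l_IR**D."""
    return 2.0 * (D - Delta) / D


def scaling_regime(D: float, Delta: float) -> str:
    if Delta < D / 2 - 1:
        raise ValueError("Delta is below the scalar unitarity bound D/2 - 1")
    a = volume_exponent(D, Delta)
    if math.isclose(a, 0.0, abs_tol=1e-12):
        return "marginal: independent of volume"
    if a < 0:
        return "irrelevant: suppressed at large volume"
    if math.isclose(a, 1.0):
        return "Delta = D/2: extensive, up to logarithms"
    if a < 1:
        return "relevant: sub-extensive"
    return "relevant: super-extensive"


for Delta in (5.0, 4.0, 3.0, 2.0, 1.5):
    print(Delta, volume_exponent(4, Delta), scaling_regime(4, Delta))
```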
Conversely, suppose q is an IR fixed point, and p is nearby. Similar statements apply, since to leading order D_KL(p||q) = D_KL(q||p) + O(δλ³).
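The leading-order symmetry can be checked with a hypothetical pair of zero-mean Gaussians whose variances differ by a small amount ε. The asymmetry D_KL(p||q) − D_KL(q||p) should then be of order ε³, so halving ε should shrink it by roughly a factor of 8.

```python
import math


def kl_gauss_var(var_p: float, var_q: float) -> float:
    """Closed form of D_KL(N(0, var_p) || N(0, var_q))."""
    ratio = var_p / var_q
    return 0.5 * (ratio - 1.0 - math.log(ratio))


def asymmetry(eps: float) -> float:
    """D_KL(p||q) - D_KL(q||p) with var_p = 1 and var_q = 1 + eps."""
    return kl_gauss_var(1.0, 1.0 + eps) - kl_gauss_var(1.0 + eps, 1.0)


for eps in (0.02, 0.01, 0.005):
    print(eps, kl_gauss_var(1.0, 1.0 + eps), asymmetry(eps))

print(asymmetry(0.01) / asymmetry(0.005))  # close to 8, i.e. O(eps^3)
```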

Renormalization Group Flows
Let us now consider the case where the deformation initiates an RG flow from the UV theory p to the IR theory q. The resulting flow, and hence the form of the KL divergence, will be dominated by the operator(s) of lowest dimension ∆ < D. In the special case where the operator is marginally relevant, i.e. has dimension ∆ = D − δ for δ ≪ 1, this flow is short. In many situations, such as 2D minimal models with central charge close to one and various 4D supersymmetric quantum field theories, δ is calculable. In this case the KL divergence can again be computed, and in a lattice regularization we recover precisely (17).
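We can also consider the KL divergence between two points along an RG flow, D_KL(t_p||t_q), as a function of the RG flow parameter t_RG = log µ. For nearby points along the flow, the couplings change by δλ^i = β^i δt_RG, so to leading order

$$ D_{KL}(t_p\|t_q) \simeq \frac{1}{2}\, G_{ij}\, \beta^i \beta^j\, (t_p - t_q)^2 . $$

Specializing to the case of a 2D CFT, Zamolodchikov's c-theorem [7] relates the combination G_ij β^i β^j to the rate of change dc/dt_RG of the c-function c(t_RG) along the flow. We thus learn that the initial change in the central charge is closely related to the information lost in moving from the UV to the IR.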

Metric Proximity
More generally, there is a deep intuition that the conformal anomalies of a CFT measure its degrees of freedom. We now show that this statement has a sharp information-theoretic interpretation. Consider a Euclidean signature conformal field theory on a D-dimensional manifold M_D. Varying the background metric g_µν defines a family of theories p[φ|{g_µν}], and it is natural to consider the proximity of two such members. Perturbing about a fixed background g → g + δg, and normalizing T^µν = −(2/√g) δS/δg_µν, the line element for the information metric is

$$ ds^2 = \frac{1}{4} \int d^Dx \sqrt{g(x)} \int d^Dy \sqrt{g(y)}\; \delta g_{\mu\nu}(x)\, \delta g_{\rho\sigma}(y)\, \langle \hat{T}^{\mu\nu}(x)\, \hat{T}^{\rho\sigma}(y) \rangle, $$

where T̂^µν = T^µν − ⟨T^µν⟩ is the stress energy tensor with the one-point function subtracted off. For D odd this one-point function vanishes, and for D even, it is determined by the conformal anomaly.
Evaluating on M_D conformally equivalent to flat space, the two-point function for T_µν is closely related to a particular linear combination of central charges which counts the local degrees of freedom in the field theory. Recall that in flat space we have

$$ \langle T_{\mu\nu}(x)\, T_{\rho\sigma}(0) \rangle = C_T\, \frac{\mathcal{I}_{\mu\nu,\rho\sigma}(x)}{x^{2D}}, $$

where I_µν,ρσ(x) is a specific dimensionless combination of terms quadratic in the positions, as dictated by conformal invariance (see e.g. [10]). Here, C_T > 0 in a reflection positive theory, which agrees with the information theoretic condition D_KL(p||q) ≥ 0. In two and four dimensions, C_T is proportional to c. In three-dimensional N = 2 supersymmetric field theories it is proportional to τ_RR, the normalization constant for the R-symmetry current two-point function (see e.g. [11]). This is a satisfying result. Since C_T sets the normalization of the information metric, D_KL is proportional to C_T, and C_T directly quantifies the level of distinguishability encoded in local degrees of freedom.
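For concreteness, the dimensionless structure commonly used in the literature is I_µν,ρσ(x) = ½(I_µρ I_νσ + I_µσ I_νρ) − δ_µν δ_ρσ/D. It is built from the inversion tensor I_µν(x) = δ_µν − 2x_µx_ν/x². The NumPy sketch below constructs this structure and checks the properties required of a CFT stress-tensor two-point function: it is traceless, symmetric, and invariant under rescaling x → sx. Normalization conventions for C_T vary between references, so only these structural properties are tested.

```python
import numpy as np


def inversion_tensor(x):
    """I_{mu nu}(x) = delta_{mu nu} - 2 x_mu x_nu / x^2."""
    x = np.asarray(x, dtype=float)
    return np.eye(x.size) - 2.0 * np.outer(x, x) / x.dot(x)


def tt_structure(x):
    """I_{mu nu, rho sigma}(x) appearing in <T T> = C_T I(x) / x^(2D)."""
    I = inversion_tensor(x)
    D = I.shape[0]
    delta = np.eye(D)
    sym = 0.5 * (np.einsum("ac,bd->abcd", I, I) + np.einsum("ad,bc->abcd", I, I))
    return sym - np.einsum("ab,cd->abcd", delta, delta) / D


x = np.array([0.3, -1.2, 0.7, 2.0])   # an arbitrary point in D = 4
T = tt_structure(x)
I = inversion_tensor(x)
print(np.allclose(I @ I, np.eye(4)))                 # inversion squares to 1
print(np.allclose(np.einsum("aacd->cd", T), 0.0))    # traceless in (mu, nu)
print(np.allclose(T, T.transpose(1, 0, 2, 3)))       # symmetric in (mu, nu)
print(np.allclose(T, tt_structure(3.0 * x)))         # dimensionless: scale invariant
```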
We can also extend this calculation to cover the case of RG flows. Along these lines, we introduce a UV cutoff Λ_UV and consider two UV CFTs which differ only in their background metrics, g_µν and g_µν + δg_µν. Suppose we perturb this UV CFT by a relevant operator. Upon flowing to the IR, we can evaluate the KL proximity for these two background metrics. Since D_KL is proportional to C_T, if the two theories are closer together in the IR, then C_T^UV > C_T^IR. In other words, the statement that C_T typically decreases under RG flow means that CFTs typically get closer in the IR.

LANDSCAPES
One clear lesson from recent work in string theory is the existence of a large landscape of self-consistent low energy effective field theories. We now show how to deploy our formalism in the study of the landscape.

Flux Vacua
In flux vacua (see e.g. [12] for a review), the flux quantum numbers define an integrally quantized lattice vector N⃗, and with it an effective action S[ϕ_1, ..., ϕ_n; N⃗] for some fields ϕ_1, ..., ϕ_n. Given two flux vectors N⃗ and M⃗, we can compute the proximity D_KL(N⃗||M⃗) between the two effective theories. Note that a priori, this has nothing to do with the distance between N⃗ and M⃗ on the lattice of fluxes.
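To illustrate, consider a toy model in which our flux vector N⃗ generates an effective action for a single canonically normalized real scalar φ with l isolated vacua:

$$ S[\phi; \vec{N}] = \int d^Dx \left[ \frac{1}{2} (\partial\phi)^2 + U_{\vec{N}}(\phi) \right], \tag{22} $$

where the potential U_N⃗(φ) has l isolated minima at φ = φ^(1), ..., φ^(l). This form readily generalizes to complex scalars, as well as supersymmetric models.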
Suppose now that we have another flux vector M⃗ such that the effective potential for this flux vector has minima near the minima of the theory with flux vector N⃗. The KL divergence can then be evaluated by just varying the l minima in the distribution p(φ|{φ^(1), ..., φ^(l)}):

$$ D_{KL} \simeq \frac{1}{2}\, G_{ij}\, \delta\phi^{(i)} \delta\phi^{(j)}, $$

where G_ij is the information metric from varying with respect to the locations of the minima. Working in a saddle-point approximation around each of the l massive vacua yields the approximation

$$ G_{ij} \approx \delta_{ij}\, m_i^2\, V, $$

up to the relative weight of each vacuum in the partition function. Here m_i² is the mass-squared of the real scalar expanded around the i-th critical point (the second derivative of the effective potential there), and V is the regulated volume of the spacetime.
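The scaling of the information metric with m_i² and V can be seen in the simplest case of a single Gaussian vacuum. The sketch below assumes a hypothetical free massive scalar on a periodic one-dimensional lattice with unit spacing, so that V equals the number of sites. Shifting the location of the minimum by a constant δφ leaves the covariance unchanged. The exact Gaussian result D_KL = ½ δφᵀ A δφ, with A the precision matrix, then reduces to ½ m² V δφ², because the lattice Laplacian annihilates constant shifts.

```python
import numpy as np


def precision_matrix(num_sites: int, mass: float) -> np.ndarray:
    """A = -(lattice Laplacian) + m^2 for a periodic 1D lattice with unit spacing."""
    A = np.diag(np.full(num_sites, 2.0 + mass ** 2))
    for x in range(num_sites):
        A[x, (x + 1) % num_sites] -= 1.0
        A[x, (x - 1) % num_sites] -= 1.0
    return A


def kl_shifted_vacuum(num_sites: int, mass: float, shift: float) -> float:
    """D_KL between two Gaussians with the same precision matrix A whose means
    differ by a constant shift: 0.5 * dmu^T A dmu."""
    A = precision_matrix(num_sites, mass)
    dmu = np.full(num_sites, shift)
    return 0.5 * float(dmu @ A @ dmu)


V, m, dphi = 50, 0.7, 0.1   # hypothetical volume (sites), mass, and shift
print(kl_shifted_vacuum(V, m, dphi), 0.5 * m ** 2 * V * dphi ** 2)   # equal
```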
Weyl rescaling of our background metric. We thank H. Verlinde for this comment.

2D CFTs
2D CFTs are another machine for generating a vast number of vacua in (perturbative) string theory. The Zamolodchikov metric is insufficient to define a notion of distance, since it cannot connect all CFTs [13]. The formal notion of distance proposed in [13] is instead based on a metric on the values of local n-point functions. Though the details differ, the KL divergence intuitively agrees with this, since it involves integrated n-point functions.
Let us illustrate in more detail for 2D CFTs defined on a torus. Consider the two c = 4/5 theories with diagonal and off-diagonal partition functions, which respectively correspond to the tetracritical Ising model and the three-state Potts model. Although they cannot be connected by an operator deformation, they both descend from the same UV spin system, and so there ought to be a "distance" between these theories [13]. Following our general considerations, we take the UV spin system to define our master theory p_master, with the two c = 4/5 models viewed as effective field theories q_i.
The value of the KL divergence depends strongly on the UV lattice spacing and, as we now argue, diverges as we pass to the continuum limit. To see this, observe that the off-diagonal theory is a Z_2 orbifold of the diagonal theory [14]. Though the untwisted sector of the orbifold coincides with the singlet sector of the parent, the parent has non-singlet states, and the orbifold has twisted sector states. These additional states mean that some states of each theory are not present in the other, and so D_KL is infinite in both directions.

Quantifying Fine-Tuning
To a low energy effective field theorist, what really matters is whether such UV completions lead to novel constraints on IR physics. Starting from some UV master theory, we might imagine that upon an appropriate operator deformation, there is a collection of intermediate values of the couplings, and corresponding theories p_1, ..., p_M which upon further flow respectively descend to q_1, ..., q_M. Given a set of M such RG trajectories, we can evaluate D^(p)_ij = D_KL(p_i||p_j) and D^(q)_ij = D_KL(q_i||q_j), and the corresponding ratios

$$ F_{ij} = \frac{D^{(p)}_{ij}}{D^{(q)}_{ij}}. $$

We say that a pair of theories is fine-tuned when F_ij ≪ 1, i.e. when nearby UV theories flow to IR theories that are far apart.
When F_ij ≫ 1, we say that the pair of theories has no fine-tuning. Intermediate cases can be evaluated in the same way.
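A toy model makes the ratio concrete. The sketch below assumes, purely for illustration, that each UV theory p_i and IR theory q_i is a unit-variance Gaussian labelled by one parameter, related by a hypothetical linear RG map. Take F_ij = D^(p)_ij/D^(q)_ij, the orientation in which F_ij ≪ 1 signals fine-tuning. A map that amplifies UV differences by 1/ε then gives F_ij = ε², while a contracting map gives F_ij ≫ 1.

```python
import numpy as np


def kl_unit_gaussians(mu_i: float, mu_j: float) -> float:
    """D_KL between unit-variance Gaussians that differ only in their mean."""
    return 0.5 * (mu_i - mu_j) ** 2


def fine_tuning_ratios(uv_params, rg_map):
    """F_ij = D^(p)_ij / D^(q)_ij; rg_map sends a UV parameter to its IR value."""
    ir_params = [rg_map(x) for x in uv_params]
    M = len(uv_params)
    F = np.full((M, M), np.nan)   # diagonal undefined (0 / 0)
    for i in range(M):
        for j in range(M):
            if i != j:
                D_p = kl_unit_gaussians(uv_params[i], uv_params[j])
                D_q = kl_unit_gaussians(ir_params[i], ir_params[j])
                F[i, j] = D_p / D_q
    return F


eps = 1e-3
uv = [0.10, 0.11, 0.12]
print(fine_tuning_ratios(uv, lambda x: x / eps))   # off-diagonal ~ eps**2: fine-tuned
print(fine_tuning_ratios(uv, lambda x: eps * x))   # off-diagonal ~ 1/eps**2: no fine-tuning
```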
As an illustrative example, consider the theory of a single real scalar with potential V(φ) = m²φ²/2 + λφ⁴/4!. This theory is fine-tuned because small perturbations in the UV boundary conditions of the coupling constants lead to large changes in the IR parameters of the effective theory. Treating m² and λ as bare parameters of a UV theory in D = 4, we can evaluate the 2 × 2 information metric in the basis (m², λ). The leading order cutoff dependent contributions are

$$ G \sim V \begin{pmatrix} \log(\Lambda_{UV}^2/m^2) & \Lambda_{UV}^2 \\ \Lambda_{UV}^2 & \Lambda_{UV}^4 \end{pmatrix}, \tag{27} $$

where each entry is multiplied by an "order one number", as follows from dimensional analysis considerations. As expected, if we evaluate the proximity of two theories in the IR, there is a power law divergent contribution. In the MS scheme, power-law divergences are absent and the leading contribution is logarithmic, controlled by L = log(µ²/m²)/16π². It is again proportional to V, the regulated volume of the spacetime.
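The pattern of cutoff dependence in the 2 × 2 metric follows from free-field power counting. The helper below is hypothetical and for illustration only. It computes the superficial degree of UV divergence of ∫d^Dx ⟨O_i(x)O_j(0)⟩ ∼ ∫d^Dx |x|^{−(∆_i+∆_j)}, namely ∆_i + ∆_j − D, using the free-field dimensions 2 and 4 of φ²/2 and φ⁴/4! in D = 4. The overall factor of V and the order-one coefficients are not tracked.

```python
def divergence_degree(D: int, Delta_i: int, Delta_j: int) -> int:
    """Superficial UV degree of int d^D x |x|^-(Delta_i + Delta_j) near x = 0."""
    return Delta_i + Delta_j - D


def cutoff_dependence(degree: int) -> str:
    if degree < 0:
        return "UV finite"
    if degree == 0:
        return "log(Lambda_UV)"
    return f"Lambda_UV^{degree}"


# phi^4 theory in D = 4: O_{m^2} = phi^2 / 2 (dimension 2), O_lambda = phi^4 / 4! (dimension 4)
dims = {"m^2": 2, "lambda": 4}
for a, Da in dims.items():
    for b, Db in dims.items():
        print(f"G[{a}, {b}] ~ V * {cutoff_dependence(divergence_degree(4, Da, Db))}")
```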

Large Field Range Inflation
A common claim in the study of string compactifications is that since the inflaton is sensitive to Planck scale physics, learning the exact shape of the inflaton potential would provide a wealth of information on the UV structure of a theory. Here, we quantify the amount of information obtained from the first correction to the simple m²φ²/2 potential of large field range inflation. We consider a correction term of order λφ⁴/4! and address the distinguishability of the theory with λ = 0 versus λ ≠ 0.
Along these lines, we return to our calculation for φ⁴ theory, viewing the reduced Planck scale M_PL ∼ 10^18 GeV as a UV cutoff and m as a soft IR cutoff, so that V ∼ m^{−4}. Using our methodology, we get that the KL divergence scales as G_λλ in equation (27):

$$ D_{KL} \sim \lambda^2 G_{\lambda\lambda} \sim \lambda^2 \left( \frac{M_{PL}}{m} \right)^4 . $$

On the other hand, to not spoil slow roll in the first place, we need to assume λ < (m/∆φ)². For a field range ∆φ ∼ 10 M_PL, we learn that the KL divergence is bounded above:

$$ D_{KL} \lesssim \left( \frac{M_{PL}}{\Delta\phi} \right)^4 \sim 10^{-4} . $$

This upper bound is rather charitable, as it is the information content over the entire volume of the spacetime.
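The arithmetic behind the bound can be spelled out as a short sketch. It rests on three assumptions: D_KL ∼ λ² G_λλ with G_λλ ∼ V M_PL⁴ (M_PL playing the role of the UV cutoff), a soft IR cutoff m giving V ∼ m⁻⁴, and dropped order-one factors. The inflaton mass values below are hypothetical and cancel out of the result.

```python
def kl_upper_bound(m: float, delta_phi: float, M_PL: float = 1.0) -> float:
    """Order-of-magnitude bound on D_KL for lambda = 0 versus lambda != 0.

    Assumes D_KL ~ lambda^2 * V * M_PL^4 with V ~ m^-4, and the slow-roll
    condition lambda < (m / delta_phi)^2. Order-one factors are dropped.
    """
    lam_max = (m / delta_phi) ** 2
    volume = m ** -4
    return lam_max ** 2 * volume * M_PL ** 4


# Units of the reduced Planck mass; m is a hypothetical inflaton mass.
print(kl_upper_bound(m=1e-5, delta_phi=10.0))   # approximately (M_PL / delta_phi)^4 = 1e-4
print(kl_upper_bound(m=1e-6, delta_phi=10.0))   # m cancels
```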

DISCUSSION
Viewing quantum field theory as a machine for generating probability distributions on the space of fields, the relative entropy leads to a measure of proximity in the space of QFTs. In the special case of conformal field theories connected by marginal deformations, we recover the familiar case of the Zamolodchikov metric. We have also seen how to track information loss both in terms of RG flows and the value of C_T (for CFTs).
Using this setup, we can coarse-grain any landscape of low energy effective field theories. We simply ask how many idealized measurements (i.e. draws of field configurations) must be performed before we can reliably distinguish two theories. This dovetails with recent investigations aimed at understanding how well a low energy observer could reconstruct, even in principle, different UV completions [16,17].
In future work, it would be interesting to study the behavior of the KL divergence in various covariant regulator schemes such as [18], and also to apply our formalism in various scenarios where operators of the Standard Model mix with an extra sector. It would also be exciting to consider more general systems such as spin glasses, where it is quite common to encounter statistical ensembles of coupling constants.