The information theoretic interpretation of the length of a curve

In the context of holographic duality with AdS$_3$ asymptotics, the Ryu-Takayanagi formula states that the entanglement entropy of a subregion is given by the length of a certain bulk geodesic. The entanglement entropy can be operationalized as the entanglement cost necessary to transmit the state of the subregion from one party to another while preserving all correlations with a reference party. The question then arises as to whether the lengths of other bulk curves can be interpreted as entanglement costs for some other information theoretic tasks. Building on recent results showing that the length of more general bulk curves is computed by the differential entropy, we introduce a new task called constrained state merging, whereby the state of the boundary subregion must be transmitted using operations restricted in location and scale in a way determined by the geometry of the bulk curve. Our main result is that the cost to transmit the state of a subregion under the conditions of constrained state merging is given by the differential entropy and hence the signed length of the corresponding bulk curve. When the cost is negative, constrained state merging distills entanglement rather than consuming it. This demonstration has two parts: first, we exhibit a protocol whose cost is the length of the curve and second, we prove that this protocol is optimal in that it uses the minimum amount of entanglement. In order to complete the proof, we additionally demonstrate that single-shot smooth conditional entropies for intervals in 1+1-dimensional conformal field theories with large central charge are well approximated by their von Neumann counterparts. We also revisit the relationship between the differential entropy and the maximum entropy among locally consistent entropy density operators, demonstrating large quantitative discrepancy between the two quantities in conformal field theories. We conclude with a brief discussion of extensions and lessons.


Introduction
Holographic duality (AdS/CFT correspondence) [1,2] is an equivalence between a d-dimensional conformal field theory (CFT) and quantum gravity with asymptotically anti-de Sitter (AdS) boundary conditions in d + 1 dimensions. Since its discovery in 1997, it has been the focus of a massive body of research driven by diverse theoretical and phenomenological motivations. The AdS/CFT correspondence has been used to model a variety of condensed matter systems, yielding new insights not apparent using the standard techniques of field theory [5,6]. On a more formal level, famous puzzles of quantum gravity, including its unitarity and nonperturbative definition, have been addressed and arguably solved by adverting to the field theory side of the duality (see, e.g., [7,8]).
This paper focuses on another foundational application of the AdS/CFT correspondence: the goal of understanding the fundamental constituents of spacetime. Note that the duality relates a lower-dimensional field theory to a higher-dimensional theory of quantum gravity. The extra dimension is said to be emergent: on the field theory side it is not directly visible, but becomes apparent only when we discuss an appropriate set of quantities. In this way, the holographic duality is a toy model for how a geometric spacetime may arise from an amorphous collection of quantum gravity degrees of freedom, which lack an a priori spatial organization. In order to reap this benefit of holography, we must understand quantitatively how the lower-dimensional field theory gives rise to the extra dimension present in the gravitational spacetime. Recent years have made it increasingly clear that the right language for this problem involves quantum information theory. The conceptual link between information theory and the geometry of a holographic spacetime is the subject of the present paper.
Until recently, the understanding of the extra dimension (usually called radial) in holography had been mostly qualitative. It was understood early on that small distance physics in the field theory controls large radial scales on the gravity side, a rule of thumb known as the UV-IR connection [9]. Consequently, the radial scale was conjectured to be related to a renormalization group (RG) scale in field theory [10-12]. But the RG scheme implementing the radial evolution in gravity has never been explicitly identified (see [13,14] for recent progress). The first truly quantitative advance, one whose consequences continue to be explored, came in 2006. The Ryu-Takayanagi proposal [15,16] posits that areas of minimal surfaces on a static slice of anti-de Sitter space compute entanglement entropies of spatial regions in field theory. To appreciate the significance of this proposal, recall that a combination of entanglement entropies called mutual information bounds the connected correlator of any two observables applied in two spacelike separated regions [17]. This means that entanglement entropies organize the correlations in a quantum state as a function of distance or scale. In effect, the Ryu-Takayanagi proposal posits that the amount of correlation up to a given scale $\mu_0$ in field theory can be represented in anti-de Sitter space as a minimal surface, which spans different radial slices down to some minimal scale $R_0$ that depends on $\mu_0$. Amazingly, this geometric representation is quantitatively accurate.
The Ryu-Takayanagi proposal underscores the centrality of information theory to the emergence of a holographic spacetime. For example, it clarifies why spacetimes with horizons correspond to mixed states of the field theory and why a black hole with two asymptotic regions maps to the thermofield double state [18-20]. Moreover, in a thought experiment in which we disentangle two regions of field theory by hand, the holographic spacetime pinches off into disconnected components [21,22]. (When more than two regions are considered, however, mutual information can be zero between two connected regions of spacetime [23].) More quantitatively, representing entanglement entropies with minimal surfaces is automatically consistent with the strong subadditivity of entropy as a consequence of the geometric properties of anti-de Sitter space [24]. Indeed, the strong subadditivity inequality plays a fundamentally geometric role in the holographic construction: in the AdS$_3$/CFT$_2$ context, for instance, it underlies the triangle inequality in AdS$_3$ [25] and reduces the c-theorem in CFT$_2$ to Lorentz invariance [26]. More intricate relations among minimal surfaces have been used to identify special properties of states in holographic field theories, including the monogamy of mutual information [27]. This web of connections has motivated several authors to conjecture that a spacetime should be identified with (or defined as) a geometric encoding of field theory correlations organized by scale [21,28-37]. If so, every geometric construct in a holographic spacetime should have a meaning in information theory. The present paper interprets in information theoretic terms one of the most basic geometric objects: the length of a convex curve.
We work primarily in pure, three-dimensional anti-de Sitter space, which is dual to the vacuum of a two-dimensional conformal field theory. Some of our results are more general, but we defer a discussion of the generality to Sec. 6. A key technical fact borrowed from [38], and a starting point of our work, is that the length of a convex curve in AdS$_3$ can be written as a linear combination of lengths of minimal curves, that is, geodesics. By virtue of the Ryu-Takayanagi proposal, the latter compute entanglement entropies of intervals in the dual CFT$_2$. In consequence, the length of a convex curve can be expressed as:
$$\begin{aligned} \frac{\text{length}}{4G} &= \int \Big[ S\big(I(x)\big) - S\big(I(x) \cap I(x - dx)\big) \Big] \\ &= \int S\big(I(x) - I(x - dx) \,\big|\, I(x) \cap I(x - dx)\big) \end{aligned} \qquad (1)$$
Here $S(\cdot)$ denotes the entanglement entropy and $I(x)$ is a one-parameter family of boundary intervals, each centered at $x$, which determine the shape of the curve in question. The integral expressions in (1) were called differential entropy in [38]. Note that the second line of (1) involves only conditional entropies in field theory. This suggests that the length of a curve may be interpreted in information theory as the entanglement cost of a merging task [39,40]. The details of such a task are the subject of Sec. 2.
Suppose that Alice, who controls a CFT from the outside, wishes to send the state on an interval $I$ to Bob. By sending we mean transferring the entanglement between $I$ and $I^c$, the complement of $I$ in the CFT, to another system controlled by Bob. Because the key object being transferred is the entanglement, classical communication is considered free. The cost refers to an inherently quantum resource: the entanglement between Alice's and Bob's systems, which is used in the process of sending the state. A natural choice of currency for quantifying the cost is Bell pairs, which are initially shared by Alice and Bob.
One way for Alice to send the state to Bob is to compress the state on $I$ to $S(I)$ binary degrees of freedom [41] and then to teleport them [42] to Bob. This will use up exactly $S(I)$ units of the entanglement currency. As such, sending the state gives an operational meaning to the entanglement entropy and, by virtue of the Ryu-Takayanagi proposal, to the length of a geodesic in AdS$_3$. To make contact with (1), we now imagine that the merging is done in steps indexed by $x$, such that both Alice and Bob may only act on the interval $I(x)$ at step $x$. The details of this constrained merging protocol are given in Sec. 2. A key point is that the optimal cost of sending a state subject to the locality restrictions imposed by the intervals $I(x)$ is exactly the length of the curve given in (1). The proof of the optimality of (1) is given in Sec. 3. Sec. 4 establishes a technical point crucial to identifying the constrained merging cost with the differential entropy, specifically that smooth min- and max-entropies are well approximated in CFTs with large central charge by the von Neumann entropy.
From a geometric viewpoint, the intervals $I(x)$ determine the shape of the bulk curve. In a traditional view of the radial direction as an RG scale in field theory, we could think of $I(x)$ as determining a spatially dependent cutoff in the CFT. The present paper offers an alternative view, which may serve as a gateway toward a quantitative formulation of holographic RG. We think of $I(x)$ as restricting the class of operators available to external agents manipulating the state. This can be viewed as a spatially dependent restriction of the class of operators of the field theory, which excludes IR-sensitive observables. The restrictions may only increase the cost of sending a state; when we lift the restrictions, the cost becomes the entanglement entropy. This gives an information theoretic interpretation of the definition of a geodesic as the shortest curve connecting two points.

Differential entropy and constrained state merging
In this section we focus on pure three-dimensional anti-de Sitter space (AdS$_3$). Our results apply in other asymptotically AdS$_3$ geometries and in higher-dimensional holographic spacetimes, but they are subject to a number of technical caveats. We discuss the generality of our results in Sec. 6.
We start with the metric on the Poincaré patch of AdS$_3$, Eq. (2). We assume that this geometry arises as the dual description of the vacuum state of a conformal field theory (CFT) living on its asymptotic boundary, that is, on an infinite line cross time. We denote the transversal coordinate in the bulk as $\tilde{x}$, in contrast to $x$, which we reserve for the spatial coordinate on the boundary.
The Ryu-Takayanagi proposal [15,16] relates the entanglement entropy of an interval $I = (-a/2, a/2)$ in the CFT to the length of the spacelike geodesic that asymptotes to the endpoints of $I$:
$$S(I) = \frac{c}{3} \log \frac{a}{\mu} = \frac{\text{length of the geodesic}}{4G} \qquad (3)$$
The quantity $\mu$, which is a UV cutoff in the CFT, also defines an IR cutoff $L^2/\mu$ on the dual gravity side, which regulates the otherwise infinite length of the geodesic. Eq. (3) relies on the Brown-Henneaux relation $c = 3L/2G$, which fixes the central charge of the 1+1-dimensional CFT in terms of the curvature scale $L$ of the dual AdS$_3$ in Planck units ($G$ is Newton's constant) [43].
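The relation between the regulated geodesic length and the interval entropy can be checked numerically. The following is a minimal sketch, not part of the original derivation. It assumes the Poincaré metric in the form $ds^2 = L^2(dx^2 + dz^2)/z^2$ on a constant time slice, with the cutoff imposed at $z = \mu$ (equivalent to a radial cutoff $L^2/\mu$ under $R = L^2/z$), and the vacuum entropy $S = (c/3)\log(a/\mu)$. The function name and the midpoint-rule integration are illustrative choices.

```python
import math

def regulated_geodesic_length(a, mu, L=1.0, n=100000):
    """Length of the semicircular geodesic x^2 + z^2 = (a/2)^2 above z = mu.

    Assumes the metric ds^2 = L^2 (dx^2 + dz^2) / z^2 (an assumption of this
    sketch). With x = (a/2) cos(phi), z = (a/2) sin(phi) the line element is
    ds = L dphi / sin(phi).
    """
    phi0 = math.asin(mu / (a / 2))          # angle at which the cutoff is reached
    h = (math.pi / 2 - phi0) / n            # midpoint-rule step
    half = sum(h / math.sin(phi0 + (k + 0.5) * h) for k in range(n))
    return 2 * L * half                     # both halves of the semicircle

L_ads, G = 1.0, 1.0
c = 3 * L_ads / (2 * G)                     # Brown-Henneaux relation
a, mu = 1.0, 1e-3

length_over_4G = regulated_geodesic_length(a, mu, L_ads) / (4 * G)
entropy = (c / 3) * math.log(a / mu)
print(length_over_4G, entropy)
```

The two printed numbers agree up to corrections that vanish as $\mu/a \to 0$, which is the sense in which the UV cutoff of the CFT regulates the geodesic length.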
Ref. [38] (see also [25,44-48]) showed how to use relation (3) to give a boundary computation of the length of an arbitrary differentiable curve on a constant time slice in geometry (2). Given a convex curve $R = R(\tilde{x})$, for every point $\tilde{x}$ one finds the geodesic that is tangent to the curve at $\tilde{x}$. The endpoints of the geodesic lie on the asymptotic boundary, so they select a boundary interval. We shall refer to this interval as $I(x)$, where $x$ is the midpoint of the interval. Likewise, we denote the linear size of $I(x)$ by $a_I(x)$. Note that $x$ depends on $\tilde{x}$ (the tangency point in the bulk) but is not equal to it. The construction is illustrated in the case of a geodesic curve in Fig. 1a and a nongeodesic curve in Fig. 2a.
The length of the curve is then given by the formula:
$$\frac{\text{length}}{4G} = \int S\big(I(x) - I(x - dx) \,\big|\, I(x) \cap I(x - dx)\big) \qquad (4)$$
Note that the integrand in (4), or rather the first nonvanishing term in its Taylor expansion, is a one-form, so the integral is well-defined. The right hand side was called "differential entropy" in [38], because the integrand can be expressed in terms of $dS/da$. For the purposes of this paper, however, it is most practical to work directly with expression (4), which involves conditional entropies. By definition, the conditional entropy $S(A|B)$ of two disjoint subsystems $A$ and $B$ is the difference $S(AB) - S(B)$.
To state our result, it is useful to introduce a little extra notation and discretize (4). Let $x_j = -a/2 + j \cdot a/N$ and define $A_j = I(x_j) - I(x_{j-1})$ (hinting at Alice) and $B_j = I(x_j) \cap I(x_{j-1})$ (hinting at Bob). Then (4) becomes:
$$\frac{\text{length}}{4G} \approx \sum_{j=1}^{N} S(A_j | B_j) \qquad (5)$$
In order to interpret (5), suppose that two agents, Alice and Bob, each hold a system described by a CFT, which they can manipulate from outside. For example, the system may be a one-dimensional spin lattice at a quantum phase transition that is sitting in Alice's laboratory; Bob's lab contains an isomorphic lattice. Initially Alice's and Bob's systems are not entangled, i.e. their states factorize. Alice's goal is to "teleport" [42] the state of an interval $I$ of her CFT to Bob. That is, using only Alice-Bob Bell pairs $\frac{1}{\sqrt{2}}(|00\rangle + |11\rangle)$ and classical communication plus local operations (we discuss the locality constraints below), Alice and Bob will prepare a state in Bob's lab equal to Alice's original state on $I$ (up to isomorphism) and purifying $I^c$, the complement of $I$ in Alice's system.
A crucial role in interpreting lengths of curves is played by the locality restrictions, which constrain the type of operations Alice and Bob can perform. If we allow Alice and Bob to each act on the whole of their respective intervals $I$, their task reduces to standard teleportation [42], whose cost is famously given by the entropy of the state to be teleported, $S(I)$. We wish to consider a situation in which Alice and Bob are subject to tighter locality constraints. The procedure will act in $N$ discrete steps and at the $j$th step Alice and Bob are allowed to act only on their respective intervals $I(x_j)$. ($N$ will ultimately be allowed to go to infinity as the UV cutoff $\mu$ goes to zero and the central charge to infinity.) We will consider all possible procedures that Alice and Bob could use to "merge" Alice's $I$ to Bob, subject only to the prescribed constraints. (A mathematically precise definition of constrained merging can be found in Sec. 3.) Among all such procedures, the minimal number of Bell pairs is asymptotically given by (5), the length of the curve in Planck units.

A geodesic: the cost of sending a state
We begin with the interpretation of the length of a spacelike geodesic $g_I$. For definiteness, suppose the geodesic subtends the interval $I = (-a/2, a/2)$ on the boundary. According to (3), the (IR-regulated) length of $g_I$ equals $S(I)$, the (UV-regulated) entanglement entropy of the interval $I$.
As a first step, we must find the set of intervals $I(x)$ such that geodesics subtending $I(x)$ are tangent to $g_I$. The task seems trivial, because at every point on $g_I$ the tangent geodesic is $g_I$ itself. But this conclusion holds only in the bulk; on the asymptotic boundary distinct geodesics become tangent to one another if their asymptotic endpoints coincide; see Fig. 1. Thus, the sequence of intervals $I(x) = (-a/2, a/2 + 2x)$ (for $-a/2 \le x \le 0$) and $I(x) = (2x - a/2, a/2)$ (for $0 \le x \le a/2$) satisfies the tangency condition. (While we will use the notation of infinitesimals for simplicity, the reader should remember that we are always describing discretized expressions and processes, both because the CFT has a UV cutoff $\mu$ and because the procedure we implement will take place in finite steps.)

Let us consider the entanglement cost of merging $I$ to Bob subject to the constraints described above. For early values of $j$, corresponding to red and orange intervals in Fig. 1, Alice and Bob are only permitted to act on the left side of $I$, while for blue and purple intervals they can only act on the right. For $x_j = 0$, however, corresponding to the full length green interval, they have access to all of $I$. So Alice could simply compress the state of $I$ [41] in the $x_j = 0$ step and teleport it to Bob, who would then decompress on his end. At all other steps, Alice and Bob would do nothing. The entanglement cost, postponing until later issues of single-shot versus von Neumann entropies and approximation, would be $S(I)$. This is, of course, the familiar interpretation of the entropy as the effective number of Bell pairs required to faithfully compress and teleport the state of $I$ without any constraints at all.

A non-geodesic curve: the cost of sending a state with constrained merging
Now consider a smooth, convex curve $R = R(\tilde{x})$, which asymptotes to the endpoints $\pm a/2$ of the interval $I$; see Fig. 2. We again start by finding the geodesics that are tangent to the curve at every point. These geodesics select a sequence of boundary intervals, which we denote $J(x)$, with $-a/2 \le x \le a/2$, to distinguish them from the intervals discussed in the previous subsection. It will be useful to introduce a special notation for the size of the interval $J(x)$: we call this $a_J(x)$. Note that for the geodesic we have $a_I(x) = a - 2|x|$ for the function giving the lengths of the intervals $I(x)$, as illustrated in Fig. 1. For our nongeodesic curve that asymptotes to $g_I$, $a_J(x)$ has the following properties [25]: An example of such an $a_J(x)$ is shown in Fig. 2b. According to (4), the length of the curve is given by:
$$\frac{\text{length}}{4G} \approx \sum_{j=1}^{N} S(A_j | B_j) \qquad (7)$$
where now $A_j = J(x_j) - J(x_{j-1})$ and $B_j = J(x_j) \cap J(x_{j-1})$.

Figure 2: a) A curve, which asymptotes to the geodesic $g_I$. We have marked in color the geodesics tangent to the curve and the boundary intervals $J(x)$, which they subtend. b) A plot of $a_J(x)$, the length of the intervals $J(x)$ as a function of the centerpoint $x$. If the curve asymptotes to the geodesic $g_I$ then $a_J(x)$ must agree with $a_I(x)$ (shown for comparison in dashed gray) outside some interval $(x_L, x_R)$ [25].
In contrast to the geodesic case, for the nongeodesic curve of Fig. 2a, none of the intervals $J(x_j)$ spans all of $I$. The simple-minded strategy of Sec. 2.1 therefore cannot succeed. Instead, Alice and Bob will act non-trivially in each interval $J(x_j)$. Specifically, in the $j$th step, Alice will merge $A_j$ to Bob. Since $A_j \subseteq J(x_j)$, Alice's actions are consistent with the constraint. Moreover, by the $j$th step, Bob will already have reconstructed the entire interval $\cup_{i=1}^{j-1} J(x_i)$, of which the rules give him access only to the portion intersecting $J(x_j)$, namely $J(x_{j-1}) \cap J(x_j) = B_j$.
The question then becomes: what is the cost of each incremental step? A celebrated result in quantum information theory is that the number of Bell pairs required to merge $A_j$ provided Bob has access to $B_j$ is $S(A_j|B_j)$, again ignoring approximations for the time being [39,40]. By (7), the length of the bulk curve will therefore be approximated by $4G$ times the number of Bell pairs required to merge $I$ to Bob, subject to the locality constraints.
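The step-by-step cost $\sum_j S(A_j|B_j)$ can be evaluated directly for a given family of intervals. The sketch below is illustrative only. It assumes the vacuum entropy $S = (c/3)\log(\text{length}/\mu)$, set to zero for intervals shorter than the cutoff (a regulator choice made here, not taken from the text), and uses $S(A_j|B_j) = S(J(x_j)) - S(B_j)$. The "plateau" family, whose widths follow the geodesic profile $a - 2|x|$ but are capped at a hypothetical value 0.5, is an invented example of a family with $a_J(x) \le a_I(x)$.

```python
import math

def vacuum_entropy(length, c=1.0, mu=1e-3):
    # Assumed regulator: intervals at or below the cutoff carry no entropy.
    if length <= mu:
        return 0.0
    return (c / 3.0) * math.log(length / mu)

def merging_cost(width, a, n_steps, c=1.0, mu=1e-3):
    """Sum of S(A_j|B_j) = S(J(x_j)) - S(J(x_j) & J(x_{j-1})) over all steps."""
    total = 0.0
    prev = None
    for j in range(n_steps + 1):
        x = -a / 2 + j * a / n_steps
        w = width(x)
        cur = (x - w / 2, x + w / 2)        # interval centered at x
        if prev is None:
            overlap = 0.0                   # nothing merged yet
        else:
            overlap = max(0.0, min(cur[1], prev[1]) - max(cur[0], prev[0]))
        total += vacuum_entropy(w, c, mu) - vacuum_entropy(overlap, c, mu)
        prev = cur
    return total

a = 2.0
geodesic = lambda x: a - 2 * abs(x)            # a_I(x) for the geodesic
plateau = lambda x: min(a - 2 * abs(x), 0.5)   # hypothetical capped profile

cost_geodesic = merging_cost(geodesic, a, 2000)
cost_plateau = merging_cost(plateau, a, 2000)
print(cost_geodesic, cost_plateau)
```

For the geodesic family the sum telescopes to $S(I)$, while the capped family costs more, in line with the claim that locality restrictions can only raise the entanglement cost.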
For readers unfamiliar with state merging, it can be helpful to keep some simple examples in mind to motivate the appearance of the conditional entropy. If the state is a Bell pair shared between $A$ and a third system $R$, then merging $A$ to Bob is just teleportation of $A$ and the cost in Bell pairs is indeed $S(A|B) = S(A) = 1$.
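If the initial state is instead a GHZ state $|\psi\rangle_{ABR} = \frac{1}{\sqrt{2}}(|000\rangle_{ABR} + |111\rangle_{ABR})$, however, then Bob has a head start in the form of some correlation with Alice, so the cost should be reduced. In fact, since $S(A|B) = S(AB) - S(B) = 1 - 1 = 0$, merging should be possible without any entanglement at all. Let's see how this is done. Alice could measure $A$ in the basis $|\pm\rangle = \frac{1}{\sqrt{2}}(|0\rangle \pm |1\rangle)$ and report the outcome to Bob. The measurement leaves $BR$ in the state $\frac{1}{\sqrt{2}}(|00\rangle \pm |11\rangle)$; Bob removes the relative sign with a local phase flip when the outcome is $-$ and then rebuilds the GHZ state locally by copying his qubit in the computational basis onto a fresh ancilla. No Bell pairs are consumed.

The method used in the general case is in the spirit of the GHZ example above and involves performing a random incomplete measurement on $A$ just fine-grained enough to approximately destroy all correlation between $R$ and $A$. The reader can consult [40] for a detailed description.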
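The GHZ example can be traced explicitly with a few lines of state-vector bookkeeping. This is a minimal sketch under stated assumptions: amplitudes are real and stored in a dictionary keyed by bit strings, Bob's correction is taken to be a $\sigma_z$ on $B$ when Alice reports the outcome $-$, and Bob's final local step is a CNOT from $B$ onto a fresh ancilla $A'$. The function name `ghz_merge` is hypothetical.

```python
import math

SQRT_HALF = 1 / math.sqrt(2)

def ghz_merge(outcome):
    """Merge A to Bob for a GHZ state on (A, B, R); outcome is +1 or -1."""
    ghz = {(0, 0, 0): SQRT_HALF, (1, 1, 1): SQRT_HALF}
    # Alice projects A onto <+| or <-| = (<0| +/- <1|)/sqrt(2).
    post = {}
    for (a, b, r), amp in ghz.items():
        sign = 1 if a == 0 else outcome
        post[(b, r)] = post.get((b, r), 0.0) + sign * SQRT_HALF * amp
    prob = sum(v * v for v in post.values())          # outcome probability
    post = {k: v / math.sqrt(prob) for k, v in post.items()}
    # Bob applies sigma_z on B if Alice reports the '-' outcome.
    if outcome == -1:
        post = {(b, r): (-v if b == 1 else v) for (b, r), v in post.items()}
    # Bob appends ancilla A' in |0> and applies CNOT (control B, target A').
    merged = {(b, b, r): v for (b, r), v in post.items()}
    return prob, merged

prob_plus, state_plus = ghz_merge(+1)
prob_minus, state_minus = ghz_merge(-1)
print(prob_plus, state_plus)
print(prob_minus, state_minus)
```

In both branches the final state on $(A', B, R)$ has the same amplitudes as the original GHZ state, with $A'$ and $B$ now both in Bob's lab, and only one classical bit was communicated.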

Geodesics revisited: merging scale-by-scale
Let us return to the geodesic case in order to study in more detail how the entanglement entropy is recovered from the differential entropy formula.
If we substitute the sequence of intervals from Fig. 1b into (4), we obtain: This is because for $-a/2 \le x \le 0$ we have $I(x - dx) \subset I(x)$; for $0 \le x \le a/2$ the integrand in (4) vanishes because there $I(x) \subset I(x - dx)$, so $A_j = \emptyset$. The term $I(-a/2)$ acts as a UV regulator; it reproduces (4) exactly if we cut off the integral at $x = -a/2 + \mu/2$.
In the notation that $S(I) = S(a)$, (8) can be written as: Changing the variable of integration to $a_I = a + 2x$ highlights the way in which the differential entropy formula recovers ordinary entanglement entropy. It assembles it from successive pieces, which incorporate the entanglement at incrementally larger scales.
According to the rules of constrained merging, in the geodesic case it is permissible for Alice to send the interval $I$ to Bob all at once, as described in Sec. 2.1. She is not required to, however. Instead, she could use the incremental procedure described for general curves in Sec. 2.2 and the cost would be the same. In the incremental procedure, she starts out by sending Bob the most ultraviolet data on $I$: the state of the smallest sensible interval tucked at the left endpoint of $I$, where "smallest" means roughly comparable to the UV cutoff in the CFT, in a sense that will be made precise in Sec. 4. In the next steps, she will send data necessary to recover the state on successively larger intervals. If Bob has the state on $I(x - dx)$, the number of ebits necessary for him to recover the state on $I(x)$ is $S(I(x)) - S(I(x - dx))$, which is the integrand in (8). The vanishing of the integrand in (4) for $0 \le x \le a/2$ expresses the fact that once Bob has the state on $I(x = 0) = I$, there is nothing more to learn and thenceforth the cost is zero.

Minimality of geodesics: the most efficient merging protocol
This perspective is also helpful for interpreting the length of the nongeodesic curve of Fig. 2. Because $a_J(x) = a_I(x)$ for $x < x_L$, as shown in the right-hand panel of the figure, the incremental protocol for the nongeodesic curve will begin in exactly the same way as for the geodesic, sending the most UV information near the point $-a/2$ first and then the data required to reconstruct the state of successively longer intervals $(-a/2, x)$. Once the interval reaches $I(x_L)$, however, the protocols diverge. In the nongeodesic case, Alice and Bob are constrained to act over shorter distances than in the geodesic case, so they cannot access the IR information as efficiently. The result is an increased entanglement cost in the merging protocol, which matches the difference in length between the two curves.
This gives an information theoretic interpretation of the definition of the geodesic as the shortest path between two points. Any other curve with the same endpoints as $g_I$ will select a different $a_J(x)$, which corresponds to a constrained merging protocol. Any restriction imposed on the merging protocol can only increase the cost of communication. Consequently, any other path connecting the same endpoints on the boundary is longer than a geodesic.
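A closed-form illustration of this minimality is available for a simple, hypothetical family: interval widths that follow the geodesic profile up to a cap $w \le a$ and stay at $w$ in between. Assuming the vacuum entropy $S(a) = (c/3)\log(a/\mu)$, each step of size $dx$ along the flat part costs $S(w) - S(w - dx) \approx (c/3)\,dx/w$, so the total cost is $S(w) + (c/3)(a - w)/w$. The sketch below scans the cap and compares with the unconstrained cost $S(a)$; the function names are illustrative.

```python
import math

def geodesic_cost(a, c=1.0, mu=1e-3):
    # Unconstrained cost: the entanglement entropy S(a) of the full interval.
    return (c / 3) * math.log(a / mu)

def plateau_cost(a, w, c=1.0, mu=1e-3):
    # S(w) from the geodesic-like start, plus (c/3)(a - w)/w from the flat part.
    return (c / 3) * (math.log(w / mu) + (a - w) / w)

a = 2.0
caps = [a * k / 20 for k in range(1, 21)]       # w = a/20, ..., a
excess = [plateau_cost(a, w) - geodesic_cost(a) for w in caps]
for w, e in zip(caps, excess):
    print(f"cap w = {w:.2f}  extra cost = {e:.4f}")
```

The excess equals $(c/3)\,[\,a/w - 1 - \log(a/w)\,]$, which is non-negative and vanishes only when the cap is lifted to $w = a$, i.e. when the constrained protocol reduces to the geodesic one.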
To see this algebraically, we first need to obtain an analogue of (9) for the nongeodesic case: With the formula in hand, subtract (9) from (10). After changing variables to $r_I(x) = x + a_I(x)/2$ and likewise for $J$ and then relabelling them both $r$, we obtain: where $\bar{a}_I(r_I(x)) = a_I(x)$ and likewise for $J$. Reading off $\bar{a}_{I,J}(r)$ is illustrated in Fig. 3; note that $\bar{a}_I(r) = r + a/2$. The inequality follows from the concavity of entropy since $\bar{a}_J(r) \le \bar{a}_I(r)$.

To make this argument, we have implicitly assumed that the incremental merging protocol described here, one whose entanglement cost is $S_{\rm diff}$, is the most efficient one possible given the constraints. We have sketched how to achieve the cost $S_{\rm diff}$, but we have not yet proved its optimality. Doing so is the purpose of Sec. 3. Assuming the result, however, we find a remarkable new addition to the holographic dictionary. The Ryu-Takayanagi formula states that the entropy of a boundary interval $I$ is the length of the shortest bulk curve starting and ending at the endpoints of $I$. Entropy can be interpreted as the minimal number of qubits required to compress a state [41] and, therefore, the minimal entanglement cost required to teleport it. We have demonstrated that those two minimizations, over bulk curves and boundary teleportation procedures, are effectively equivalent. Non-minimal length convex curves define constrained boundary state merging tasks whose optimal entanglement costs are the lengths of the curves themselves.

Orientation reversal and negative length -the "cost" of purifying a state
In fact, formula (4) computes the signed length of an oriented curve. A detailed explanation of this can be found in [25]; see also [48]. The orientation of the curve is natural from the viewpoint of the merging protocol: it is decided by the direction of the flow of information.
To understand this, return to the geodesic $g_I$ drawn in Fig. 1. In Sec. 2.3 we considered a stepwise merging protocol, in which Bob constructs the state on an interval $I$ of size $a$ from successive pieces received from Alice. Bob starts with the UV sector and builds up to the scale $a$. But we can consider the opposite situation, in which Bob initially holds the state on $I^c$, the complement of $I$. Now Alice will send Bob the information about the purifier system $I$. She will again do so piecewise, this time starting from the data about the largest scale $a$ that is inaccessible to Bob and zooming down to the UV.
In the unconstrained merging protocol, Bob will use his full knowledge of the previously received state to merge each incoming chunk of information. To track his progress, we can use the intervals $I(x)$ from Fig. 1a, except now starting from $x = 0$ up to $x = a/2$. This is convenient for comparison with Sec. 2.3, because it corresponds to complementing Bob's state from the left endpoint of the interval $I$ at $x = -a/2$ to the right endpoint at $x = a/2$, i.e. in the direction of increasing $x$. (Of course, Alice could send Bob data about the interval $I$ starting from the right endpoint, in which case the $I(x)$ with $-a/2 \le x \le 0$ would be natural.) Overall, prior to step $x$ Bob holds the state on $I(x - dx)^c$, while after step $x$ he knows the state on $I(x)^c$. The cost of this step in the merging protocol is where we assume that the global state is pure. Adding up the cost of all steps, we get: The "cost" of purifying a given mixed state is apparently negative! What this means is that rather than requiring an investment of Bell pairs, state merging produces Bell pairs as a side effect [39].
For readers unfamiliar with state merging, another simple (in fact, trivial) example may again be helpful. Suppose that Alice and Bob share the state $|\psi\rangle_{AB} = \frac{1}{\sqrt{2}}(|00\rangle_{AB} + |11\rangle_{AB})$ and that Alice wishes to merge $A$ to Bob, who holds $B$. The objective is to prepare a state in Bob's laboratory identical to $|\psi\rangle_{AB}$. The key point in this case, as compared to the merging examples in Sec. 2.2, is the absence of a third system $R$. In general, the Alice-Bob merging procedure is required to maintain all the correlations with outside systems like $R$. In the absence of such systems, there are no correlations to preserve. Therefore, a perfectly good merging protocol is for Bob to prepare the state $|\psi\rangle_{A'B'} = \frac{1}{\sqrt{2}}(|00\rangle_{A'B'} + |11\rangle_{A'B'})$ in his own laboratory, without any help from Alice. The initial state $|\psi\rangle_{AB}$ is left untouched, so at the end of the merging protocol Alice and Bob share 1 Bell pair. This is exactly as it should be: for $|\psi\rangle_{AB}$, we have $S(A|B) = -1$, so instead of consuming a Bell pair, Alice and Bob return one. In more realistic situations, the state initially shared between Alice and Bob will be mixed. In that case, negative cost merging amounts to entanglement distillation: the extraction of good Bell pairs from noisy entanglement using only local operations and classical communication [49,50].
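The three toy values of $S(A|B)$ quoted in the text ($+1$ for a Bell pair between $A$ and $R$, $0$ for the GHZ state, $-1$ for a Bell pair between $A$ and $B$) can be reproduced with a short calculation. The sketch below is restricted to three-qubit pure states with real amplitudes on $(A, B, R)$, where purity gives $S(AB) = S(R)$, so only single-qubit entropies are needed. The padding of the two-party examples with an idle qubit in $|0\rangle$ and all function names are assumptions of this illustration.

```python
import math

def qubit_entropy(psi, k):
    """Entropy in bits of qubit k (0=A, 1=B, 2=R) of a real 3-qubit pure state."""
    rho = [[0.0, 0.0], [0.0, 0.0]]
    for bits1, amp1 in psi.items():
        for bits2, amp2 in psi.items():
            # Trace out the other two qubits: they must match.
            if bits1[:k] + bits1[k + 1:] == bits2[:k] + bits2[k + 1:]:
                rho[bits1[k]][bits2[k]] += amp1 * amp2
    det = rho[0][0] * rho[1][1] - rho[0][1] * rho[1][0]
    gap = math.sqrt(max(0.0, 1.0 - 4.0 * det))   # eigenvalues are (1 +/- gap)/2
    probs = [(1 + gap) / 2, (1 - gap) / 2]
    return -sum(p * math.log2(p) for p in probs if p > 1e-15)

def conditional_entropy_A_given_B(psi):
    # Global state is pure, so S(AB) = S(R) and S(A|B) = S(R) - S(B).
    return qubit_entropy(psi, 2) - qubit_entropy(psi, 1)

s = 1 / math.sqrt(2)
bell_AR = {(0, 0, 0): s, (1, 0, 1): s}   # Bell pair on A, R; B idle in |0>
ghz_ABR = {(0, 0, 0): s, (1, 1, 1): s}   # GHZ state on A, B, R
bell_AB = {(0, 0, 0): s, (1, 1, 0): s}   # Bell pair on A, B; R idle in |0>

costs = [conditional_entropy_A_given_B(p) for p in (bell_AR, ghz_ABR, bell_AB)]
print(costs)
```

A positive value counts Bell pairs consumed, zero means merging is free, and a negative value counts Bell pairs returned.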
Can (13) be interpreted as a differential entropy? Above we switched the roles of $I(x)$ and $I(x)^c$. If we substitute the complementary intervals in the definition (4), we obtain Eq. (14). We take this to be the definition of the differential entropy under reversal of orientation [25,44,48]. This extension is sensible and necessary. Recall that the intervals $I(x)$ are defined by the requirement that the geodesics subtending them are tangent to the bulk curve. But this definition is ambiguous: if a family $I(x)$ satisfies it, so does $I(x)^c$. This ambiguity is fixed by a choice of orientation on the curve. Augmenting the definition by (14) makes it covariant under orientation reversal, whose boundary counterpart is to take the complement of each set $I(x)$.
Does this amendment make the definition ambiguous? Yes, but only up to a sign. Given a curve $R = R(\tilde{x})$ that subtends a boundary interval $I$, select a family of intervals $I(x)$. We can now compute the length of the curve using formula (4) or using formula (14). One computes the number of Bell pairs required by Bob to learn the state on $I$ starting from nothing, while the other computes the number of Bell pairs that can be extracted as Bob purifies his initial state on $I^c$. They always give opposite answers, because for the family of intervals $\tilde{I}(x) = I(x) \cup I(x - dx)$, which is equivalent to $I(x)$ when $dx \to 0$.

Closed curves: constrained state swapping
The preceding subsections consider the length of a convex curve with endpoints on the boundary. Its information theoretic interpretation involves sending the quantum state on the boundary interval lying between the curve's endpoints, subject to constraints that specify the shape of the curve. We now give a similar interpretation of the length of a closed, convex curve in AdS$_3$. This introduces several important differences.
First, we can only speak of closed convex curves in global AdS$_3$ and not on the Poincaré patch. The dual field theory now lives on a circle instead of a line. Our curves will be given by $R = R(\tilde{\theta})$ in coordinates instead of $R(\tilde{x})$ in coordinates (2). The coordinate $\tilde{\theta}$ is an angular coordinate with period $2\pi$.
As before, we distinguish the bulk coordinate $\tilde{\theta}$ from $\theta$, which we reserve for the asymptotic boundary. Following our earlier prescription, every point $\tilde{\theta}$ on $R = R(\tilde{\theta})$ determines an interval $J(\theta)$ with center at $\theta$ and width $a_J(\theta)$, such that the geodesic subtending it is tangent to the curve. Once more, we caution that $\theta$ depends on $\tilde{\theta}$, but is not equal to it. The construction is illustrated in Fig. 4a.
The second difference is that, in contrast to the curves asymptoting to the boundary that we considered before, a closed bulk curve does not select a boundary interval. In consequence, the length of a closed curve does not compute the cost of sending the state on any one interval. Instead, we now consider Alice and Bob, who control complementary intervals on the CFT and wish to swap them. As before, the intervals $J(\theta)$ define a set of locality constraints, which limit the type of operations Alice and Bob can perform. The length of the closed curve is the total cost in Bell pairs for Alice and Bob to swap their states completely.
Our protocol will be carried out in discrete steps. To index them, we define angles θ_j for j = 1, ..., 2N. Initially, let Alice and Bob control the states on the intervals (17) and (18), respectively. A simple calculation in set arithmetic confirms that these intervals are complementary provided that J(θ_N) ∩ J(θ_2N) = ∅. In pure AdS3, this and the more general condition J(θ_j) ∩ J(θ_{j+N}) = ∅ follow directly from the concavity of the bulk curve. As a consequence, note that Alice controls all of J(θ_N) and Bob controls all of J(θ_2N). The intervals (17)(18) are illustrated in Fig. 4b.

Figure 4: a) The intervals J(θ) defined by a closed, convex curve in global AdS3. b) The intervals (17)(18) initially held by Alice and Bob. We have indicated J(θ_N) and J(θ_2N) and the bulk axis, which joins their centers.
Each discrete step consists of two parts. First, Alice sends to Bob the state on an infinitesimal piece on one end of her interval, i.e. A_1 = J(θ_1) − J(θ_2N). Then Bob sends to Alice the state on an infinitesimal interval on the other end, A_{N+1} = J(θ_{N+1}) − J(θ_N). Both state transfers happen via constrained state merging. This means that Bob (Alice) can only use the operations on B_1 (respectively B_{N+1}) to merge the quantum state on A_1 (respectively A_{N+1}). For the Bob→Alice transfer, we assume that J(θ_1) ∩ J(θ_N) = ∅, so that Alice can utilize all operations on J(θ_N) to decode the message from Bob. At sufficiently large N this assumption is true for every convex curve of finite size. At the end of this first step, Alice and Bob control updated intervals, where we have used the periodicity in θ to rewrite θ_1 = θ_{2N+1} in (20). Comparing with (17)(18), we see that these intervals are of the same form as before, except that the indices that set the interfaces between Alice and Bob have shifted by 1. Since both parts of the first step were constrained state merging, Alice and Bob have paid an entanglement cost equal to S(A_1|B_1) + S(A_{N+1}|B_{N+1}). To effect a full swap, we must shift the index of the interface by N, so we repeat the steps outlined above N times. The total cost is the differential entropy Σ_{j=1}^{2N} S(A_j|B_j).
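As a rough numerical illustration of the total swapping cost, the following sketch evaluates the sum of conditional entropies for a circle of fixed radius, for which every J(θ_j) has the same angular width α. Several ingredients are assumptions made for the sake of the example and are not taken from the text: equally spaced angles θ_j = πj/N, the standard vacuum entropy S(α) = (c/3) log((2/µ) sin(α/2)) for an arc on a circle of circumference 2π, and S(A_j|B_j) = S(J(θ_j)) − S(J(θ_j) ∩ J(θ_{j−1})). The function names and parameter values are hypothetical.

```python
import math

def interval_entropy(alpha, c=24.0, mu=1e-3):
    """Vacuum entropy of an arc of angular width alpha (assumed CFT-on-a-circle formula)."""
    return (c / 3.0) * math.log((2.0 / mu) * math.sin(alpha / 2.0))

def swap_cost(alpha, n_half, c=24.0, mu=1e-3):
    """Sum of S(A_j|B_j) over j = 1..2N for equal-width intervals.

    Assumes the interval centers are theta_j = pi * j / N, so consecutive
    intervals are shifted by pi / N and overlap on a width alpha - pi / N.
    """
    step = math.pi / n_half
    overlap = alpha - step  # width of B_j
    if overlap <= 0.0:
        raise ValueError("N too small: consecutive intervals must overlap")
    conditional = interval_entropy(alpha, c, mu) - interval_entropy(overlap, c, mu)
    return 2 * n_half * conditional

def continuum_cost(alpha, c=24.0):
    """Large-N limit of swap_cost: 2*pi * dS/d(alpha) = 2*pi * (c/6) * cot(alpha/2)."""
    return 2.0 * math.pi * (c / 6.0) / math.tan(alpha / 2.0)

if __name__ == "__main__":
    alpha = 1.0
    for n_half in (10, 100, 1000):
        print(n_half, swap_cost(alpha, n_half), continuum_cost(alpha))
```

The UV cut-off µ drops out of each conditional entropy, so the discrete sum approaches a cut-off independent value as N grows.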
3 Optimality: Minimization over all possible constrained merging strategies

The need for an optimality proof
The goal of this article is to give an information theoretic interpretation of the length of a convex spacelike curve. So far, what we have demonstrated is a boundary merging procedure meeting a set of locality and scale constraints whose entanglement cost is the length of the curve. Who is to say, however, that we should pay any attention to the cost of that specific procedure? If we are to claim that the length of the curve is the cost of merging Alice's interval to Bob subject to the constraints, we need to be sure that there is not some other way of achieving the same goal but at a reduced entanglement cost. This is a crucial point. Consider the special case of the interpretation of the entropy S(I) as the logarithm of the effective Hilbert space dimension, that is, the number of qubits required to compress I. There are two halves to the interpretation. First, that there exists a subspace of dimension 2^{S(I)+subleading} containing nearly all the support of the density operator and, second, that no significantly smaller subspace can do so. Indeed, if there were a subspace of dimension 2^{S(I)/2} containing nearly all the support of the density operator, the effective Hilbert space dimension would obviously be at most 2^{S(I)/2}, not 2^{S(I)}.
In keeping with this credo, the purpose of this section is to complete the interpretation of the length of a curve by proving that no constrained merging procedure can have entanglement cost less than S_diff.
Throughout this section we will be discussing the properties of general constrained merging protocols. In order not to create confusion, we will refer to the constrained merging protocol described in Sec. 2.2 as the greedy constrained merging protocol because at each stage as much state as possible is merged from Alice to Bob.

Formal definition of constrained merging and statement of the theorem
First we need to formally define the permissible procedures. Write Q_E for the Hilbert space (C^2)^{⊗E} and let H_A ≡ H_B ≡ ⊗_{x∈Z} H_x, with dim H_x < ∞ and only a finite number of x such that dim H_x > 1. Let I ⊆ Z be a finite interval and I(x) ⊆ I itself be an interval for each x, such that the left and right endpoints ℓ(x) and r(x) of the intervals are non-decreasing with x. [...] in which the space Q^A_{E_x} ⊗ Q^B_{E_x} is initially prepared as a maximally entangled state, with [...] (F_x will be used to store entanglement distilled by the merging procedure in the event that the cost is negative.) LOCC stands for Local Operations and Classical Communication. The details of the definition of LOCC are a bit complicated [49] but for the purposes of the optimality proof, it suffices to know that any LOCC transformation of a density operator ρ_AB will produce an ensemble of states σ^(k)_AB, each occurring with probability p_k. The index k can roughly be thought of as recording the outcomes of the measurements that were part of the procedure. LOCC maps obey the inequality S(ρ_B) ≥ Σ_k p_k S(σ^(k)_B) [49]. That is, they cannot cause the entanglement entropy to increase on average. (They can, however, increase the entanglement for individual measurement outcomes k.)
Given an initial state |ψ⟩ ∈ H_A, an E-ebit constrained merging protocol will produce an ensemble (p_k, σ^(k)_AB) [...] While this is an operationally sensible definition, it turns out that just requiring the merging error to be small is not quite enough to ensure the optimality of S_diff. Instead we will impose a slightly stronger condition, namely that at each of the N steps of the protocol, those sites of A that have never been acted upon, combined with those sites of B that will never be acted upon again, are consistent with |ψ⟩. This is a reasonable demand: at any given time the interval I can be divided into three sections: the completed section, a portion under construction, and an untouched section. We will require that the completed and untouched sections be properly correlated. Formalizing that notion requires some further notation. [...] After the step of the protocol acting on I(x), the interval (r(x), R] = {r(x) + 1, r(x) + 2, ..., R} remains untouched in both A and B. Similarly, the definition of constrained merging implies that none of the remaining steps x + 1, ..., N will act on the [L, ℓ(x + 1)) subsystem of B. See Fig. 5 for a visual depiction of these assertions.
Write (p_{k,x}, σ^(k,x)_AB)_k for the ensemble of states produced after completion of step x of the protocol. Given the notational complexity, let us begin with the special case in which the protocol never generates entanglement so that F_x = 1 for all x. An E-ebit constrained merging protocol is then said to have sequential merging error ε if condition (24) holds for all initial states |ψ⟩ ∈ H_A and all x ∈ I. The definition is quite subtle. Intuitively, it enforces the requirement that long-range entanglement in |ψ⟩ be transferred from Alice to Bob rather than just manufactured entirely in Bob's laboratory, as illustrated in Fig. 6. The analogous conditions for [L, ℓ(x + 1))_B and (r(x), R]_A alone are in fact already consequences of the weaker definition (23). Small sequential merging error imposes the additional requirement that the joint density operator of [L, ℓ(x + 1))_B and (r(x), R]_A, along with I^c, have the proper form. This ensures that the correlations Bob arranges in his lab are with the (r(x), R]_A sitting in Alice's lab, not some new state he will manufacture himself later on in the protocol.
Requiring (24) still leaves a great deal of freedom. In the case of a geodesic, for example, it is flexible enough to be consistent with both the all-at-once strategy of Sec. 2.1 and the greedy scale-by-scale merging strategy of Sec. 2.3. In addition, after step x, the sites [ℓ(x + 1), r(x)]_B are permitted to be in arbitrarily messy intermediate states very different from the good approximations to ψ produced at that stage by either of those merging protocols.
However, we will demonstrate that there is no advantage to be gained from all this freedom: greedy constrained merging is optimal.
Figure 6: An interval [L, R] and a set of merging constraints in the form of intervals I(x), depicted as green ovals. The state to be merged is a maximally entangled state shared between L and R, as illustrated in the second line. One way to merge this state is for Bob to prepare a maximally entangled state between L and L + 1 in the first sequential merging step and then, in the subsequent steps, simply swap the information to the right until a maximally entangled state is established between L and R. This procedure doesn't require any Alice-Bob entanglement and yet produces long-range entanglement in Bob's lab. It is prohibited by the sequential merging error condition because, while the final state is correct, the intermediate state fails the sequential merging condition. The joint state of Bob's L and Alice's R is not correct at the intermediate merging steps after L has been completed and before R has been reached; they should be maximally entangled but instead they are product.

The definition in the case in which the protocol can generate entanglement requires replacing (24) with [...] portion of each of those sites must be left with high fidelity maximally entangled states of Schmidt rank F_x.

Theorem 3.1 For any E-ebit constrained merging protocol with sequential merging error ε < 1/4, the following inequality holds for every initial state |ψ⟩: [...] where f(ε) vanishes as ε → 0.

Proof
To prove the theorem, begin by fixing an initial state |ψ⟩ and an arbitrary E-ebit constrained merging protocol with sequential merging error ε. Again to keep notation relatively simple, we will assume that the protocol never generates any entanglement (F_x = 1 for all x), the general case being a straightforward if cumbersome modification. We will write S(J)_{B_x} for the entropy averaged over reduced states on subsystem J of B that are produced after the step of the constrained merging protocol acting on interval I(x).
There is an entanglement gain S(I(x))_{B_x} − S(I(x))_{B_{x−1}} for the I(x) step of the protocol, between the I(x) portion of B and its purification. Since the average entanglement entropy cannot increase under LOCC, we must have E_x ≥ S(I(x))_{B_x} − S(I(x))_{B_{x−1}} and, therefore, E ≥ Σ_{x=1}^{N} [S(I(x))_{B_x} − S(I(x + 1))_{B_x}], where we have set S(J)_{B_0} = 0 to reflect that the initial B state is a pure product state and have defined I(N + 1) = ∅. As discussed above, after step x, the protocol has not yet acted on (r(x), R] so S(I(x + 1))_{B_x} = S([ℓ(x + 1), r(x)])_{B_x}. That allows us to write E ≥ Σ_{x=1}^{N} S([ℓ(x), ℓ(x + 1)) | [ℓ(x + 1), r(x)])_{B_x}, where, as usual, S(J|K)_{B_x} ≡ S(JK)_{B_x} − S(K)_{B_x}. By the sequential merging condition, Σ_k p_{k,x} ε_{k,x} ≤ ε, where [...] Since these two density operators are close to each other, their purifications are related by isometric Hilbert space transformations. The Hilbert space purifying the ψ mixed state is obviously H^A_{[ℓ(x+1), r(x)]}. The Hilbert space purifying the σ^(k,x) mixed state, on the other hand, is [...]. But outside of [L, r(x)], the state σ^(k,x) is just |00···0⟩ so the purification can be taken to be a state of [...]. Using Uhlmann's theorem [51] and standard inequalities relating the trace distance and fidelity [52], we conclude that there exists an isometry taking [...] The function f(ε_{k,x}) also vanishes with ε_{k,x}. The first inequality holds thanks to the existence of the isometry relating the two systems being conditioned upon and the Alicki-Fannes conditional entropy continuity inequality [53], while the second inequality is an application of strong subadditivity.
The function f is concave and, for ε < 1/4, also monotone, so [...] Averaging (32) over k and then summing over x finally gives a bound involving Σ_{x∈I} S([ℓ(x), ℓ(x + 1)) | [ℓ(x + 1), r(x)]) [...], which completes the proof since we saw earlier that the first line was a lower bound on the entanglement cost E.
A few remarks are in order. The careful reader will have noticed that the theorem as stated only applies to systems with finite dimensional constituent Hilbert spaces H_x. That is manifestly not true for the Hilbert space of the CFT, even after imposing the UV cut-off µ. Imposing the cut-off does, however, ensure that each of the entropies S(x) is finite. In the limit of large central charge, therefore, we could compress the initial state |ψ⟩ on each of the lattice sites x ∈ I, disturbing the state by a total amount ε′. The compressed state would sit inside a Hilbert space satisfying the hypotheses of the theorem. Moreover, any constrained merging protocol with error ε for the original state would have error at most ε + ε′ for the compressed state by the triangle inequality. Therefore, S_diff is indeed a lower bound on the entanglement cost for constrained merging in the CFT.

Single-shot versus von Neumann entropies
So as not to complicate the presentation in Sec. 2, we ignored two important issues. First, in the preceding discussion the conditional von Neumann entropy was identified as the entanglement cost in each of the N merging steps of the greedy constrained merging protocol, but the cost in the single-shot setting appropriate for us here is in fact the smooth conditional max-entropy H_max(A_j|B_j) [54]. H_max asymptotes to the von Neumann entropy S(A|B) in the limit of many copies of a state, but our procedure is intended to act on a single copy of the CFT state, so we must work with the max-entropy. Second, other than in extremely simple cases, achieving the optimal merging cost requires allowing small imperfections in the final state. Tracking the accumulation of those imperfections through the multistep protocol will be important.
The definition of the smooth conditional max-entropy is somewhat complicated [54,55] and will not actually be necessary. We will only need the following two facts: [...] H_min(B) + const [55].
In light of these results, to get an upper bound on the entanglement cost of merging an interval of the CFT it suffices to bound the smooth H_max(AB) from above and the smooth H_min(B) from below. The unconditioned min- and max-entropies are much simpler to define and work with.
The smooth min- and max-entropies are defined as follows. Given a state ρ, consider the set B(ρ, ε) of all σ with tr(σ) ≤ 1, σ ≥ 0, and ‖σ − ρ‖_1 < ε. This set is relevant because we want to determine the optimal resource cost, but without the unrealistic assumption that the state ρ is perfectly transmitted. The smooth max-entropy H_max is then defined as H^ε_max(ρ) = min_σ log rank(σ), where the minimum is taken over B(ρ, ε). In essence H_max instructs us to truncate ρ to its largest eigenvalues of total weight 1 − ε. In other words, we throw away the smallest eigenvalues of ρ up to weight ε, but then we must transmit the remaining state in its entirety since this is a single-shot protocol. Once more, we are allowed to ignore very rare events, but once these events have been cut out of ρ, all the remaining states must be sent to guarantee that the protocol succeeds in a single shot.
The smooth min-entropy H_min is defined as H^ε_min(ρ) = max_σ [− log λ_max(σ)], with the maximum taken over B(ρ, ε), where λ_max(σ) is the largest eigenvalue of σ. For the smooth min-entropy we are doing a similar kind of truncation as with the smooth max-entropy, except that now we are truncating the largest eigenvalues of ρ up to weight ε. The smooth min-entropy is then the negative logarithm of the new largest eigenvalue after the truncation, which is a measure of one's ability to guess the quantum state correctly. As noted above, it plays an important role in bounding the single-shot state merging cost.
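The truncation picture described above can be made concrete on a list of eigenvalues. The sketch below is a literal transcription of that description rather than a full optimization over the smoothing set: it discards the smallest (respectively largest) eigenvalues as long as the discarded weight does not exceed ε. The function names, the use of base-2 logarithms, and the use of a non-strict inequality on the discarded weight are choices made for the example.

```python
import math

def smooth_max_entropy(spectrum, eps):
    """log2 of the rank left after discarding the smallest eigenvalues (total weight <= eps)."""
    kept = sorted(lam for lam in spectrum if lam > 0.0)  # ascending order
    discarded = 0.0
    while len(kept) > 1 and discarded + kept[0] <= eps:
        discarded += kept.pop(0)
    return math.log2(len(kept))

def smooth_min_entropy(spectrum, eps):
    """-log2 of the largest eigenvalue left after discarding the largest ones (total weight <= eps)."""
    kept = sorted((lam for lam in spectrum if lam > 0.0), reverse=True)  # descending order
    discarded = 0.0
    while len(kept) > 1 and discarded + kept[0] <= eps:
        discarded += kept.pop(0)
    return -math.log2(kept[0])

if __name__ == "__main__":
    flat = [1.0 / 8.0] * 8               # equal-weight mixture of 8 states
    peaked = [0.5, 0.25, 0.125, 0.125]   # von Neumann entropy 1.75 bits
    print(smooth_max_entropy(flat, 0.0), smooth_min_entropy(flat, 0.0))
    print(smooth_max_entropy(peaked, 0.2), smooth_min_entropy(peaked, 0.6))
```

For a flat spectrum the two quantities coincide with the von Neumann entropy, while for a peaked spectrum they bracket it.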
If the state ρ were an equal weight mixture of M pure states, it would immediately follow that both H_max and H_min are within O(ε) of the von Neumann entropy log(M). In this case the single-shot cost is the same as the asymptotic cost (as it should be). What we now show is that, to leading order in the central charge c, the smooth min- and max-entropies for an interval in a CFT are also given by the interval's von Neumann entropy. Hence for CFT intervals (with some care taken about the errors) we find that the single-shot cost approximately reproduces the asymptotic cost.
To compute these smooth entropies, we use a formula found by Calabrese and Lefevre for the eigenvalue distribution of an interval's reduced density operator in either a vacuum or thermal state of a 1+1 dimensional CFT [56]: P(λ) = δ(λ_max − λ) + b Θ(λ_max − λ) I_1(2 √(b log(λ_max/λ))) / (λ √(b log(λ_max/λ))), where b = H_min(I) = − log(λ_max), Θ is the step function, and I_1 is a modified Bessel function of the first kind. This formula is derived starting from the fact that the Rényi entropies of an interval I, S_α(I) = (1/(1 − α)) log tr(ρ_I^α), are given by S_α(I) = (1/2)(1 + 1/α) S(I) (39), where S(I) is the von Neumann entropy. The limit α → ∞ gives S_∞ = H_min, so b = S(I)/2. Furthermore, conformal invariance fixes S(I) in the ground state to be S(I) = (c/3) log(a/µ), where a is the size of the interval I and µ is the UV cut-off. Finally, it should be noted that for a given regulator, the form (39) will be corrected due to irrelevant operators, but for our purposes the important feature is that these corrections are expected to be subleading in c, the central charge.
Given that b ∝ c, the large central charge limit corresponds to the limit of large b. Investigation of the limit b → ∞ reveals the distribution of eigenvalues of ρ_I to be well approximated by a tightly peaked Gaussian after a suitable change of variables, yielding the estimates (41). These estimates follow from a straightforward change of variables. Setting y(λ) = 2 √(b log(λ_max/λ)) ≥ 0, the distribution of eigenvalues becomes P(y) ≡ P(λ(y)) |dλ/dy| = I_1(y), where we have neglected the δ function at λ_max (dropping one eigenvalue makes no difference at large c). Large b corresponds to large y, where I_1 is well approximated by the form I_1(y) ≈ e^y / √(2πy). Thus, the distribution of eigenvalues is roughly exponential in the y variable. The leading correction to the asymptotic behavior is given by I_1(y) ≈ (e^y / √(2πy)) (1 − 3/(8y) + ...), which is only a minor power-law correction.
The normalization Σ_i λ_i = 1 translates to the statement that ∫ dy P(y) λ(y) = 1, with λ(y) = λ_max e^{−y²/(4b)} = λ_max e^{−y²/(2S)}. The object P(y)λ(y) at large y is well approximated by P(y)λ(y) ≈ e^{−(y−S)²/(2S)} / √(2πy) ≈ e^{−(y−S)²/(2S)} / √(2πS), where in the last step we replaced √y with √S (its central value) in the denominator, a mild simplifying approximation. The final form of P(y)λ(y) is thus a normalized Gaussian with polynomial in (y − S) corrections (arising from subleading terms in I_1 and the expansion of y^{−1/2} about y = S). Once the tightly peaked nature of the Gaussian is taken into account, all subleading corrections are O(S^{−1/2}) or smaller.
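The Gaussian approximation can be checked numerically. The sketch below assumes that the density in the y variable is P(y) = I_1(y), as follows from the Calabrese-Lefevre form under the change of variables above, and that λ(y) = exp(−S/2 − y²/(2S)). The Bessel function is evaluated from its standard integral representation with a plain trapezoid rule so that only the standard library is needed; the function names, grid sizes and the value S = 100 are hypothetical choices for the example.

```python
import math

def scaled_bessel_i1(y, n=400):
    """exp(-y) * I_1(y), using I_1(y) = (1/pi) * int_0^pi exp(y cos t) cos t dt."""
    h = math.pi / n
    total = 0.0
    for k in range(n + 1):
        cos_t = math.cos(k * h)
        weight = 0.5 if k in (0, n) else 1.0  # trapezoid end-point weights
        total += weight * math.exp(y * (cos_t - 1.0)) * cos_t
    return total * h / math.pi

def weight_density(y, S):
    """P(y) * lambda(y) = I_1(y) * exp(-S/2 - y^2/(2S)), written in an overflow-safe form."""
    return scaled_bessel_i1(y) * math.exp(-((y - S) ** 2) / (2.0 * S))

def moments(S, dy=0.5, width=12.0):
    """Total weight, mean and variance of P(y) * lambda(y) on a grid around y = S."""
    lo = max(0.0, S - width * math.sqrt(S))
    hi = S + width * math.sqrt(S)
    n = int(round((hi - lo) / dy))
    ys = [lo + i * dy for i in range(n + 1)]
    ws = [weight_density(y, S) * dy for y in ys]
    total = sum(ws)
    mean = sum(w * y for w, y in zip(ws, ys)) / total
    var = sum(w * (y - mean) ** 2 for w, y in zip(ws, ys)) / total
    return total, mean, var

if __name__ == "__main__":
    print(moments(100.0))  # total weight, mean, variance
```

Under these assumptions the total weight should be close to 1, and the mean and variance should both be close to S, in line with the normalized Gaussian quoted above.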
To compute the smooth entropies with generic smoothing parameter ε, the probability distribution P(y)λ(y) must be truncated to weight 1 − ε. Define y_min and y_max by the equations [...] In (48) we assumed, up to corrections of order e^{−S}, that y runs over the whole real line. It follows from symmetry that y_min = S − δy and y_max = S + δy, where [...] The corrections may be bounded with elementary properties of the error function. With erf(x) defined as erf(x) = (2/√π) ∫_0^x e^{−t²} dt, it follows that: [...] The asymptotic form of the error function is 1 − erf(x) ≈ e^{−x²} / (x √π), so to leading order [...] An upper bound on δy is obtained by neglecting the denominator on the LHS, which gives the estimate: [...] The smooth max-entropy is then the logarithm of the rank of the truncated state (the number of eigenvalues), yielding [...] Similarly, the smooth min-entropy is the negative logarithm of the largest remaining eigenvalue after truncating the largest eigenvalues, so [...] Pulling out a factor of S, the smooth entropies take the claimed form (41).
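The size of the shift δy can be illustrated numerically. The sketch below assumes that P(y)λ(y) is exactly a Gaussian of mean S and variance S and that a weight ε is removed from one tail, so that δy solves (1/2) erfc(δy/√(2S)) = ε. The closed-form bound used for comparison follows from the elementary inequality erfc(x) ≤ exp(−x²) for x ≥ 0; it is our own choice of bound and need not coincide with the estimate in the text. All names are hypothetical.

```python
import math

def gaussian_tail(delta, S):
    """Weight of a Gaussian with mean S and variance S lying above S + delta."""
    return 0.5 * math.erfc(delta / math.sqrt(2.0 * S))

def solve_delta(S, eps, tol=1e-10):
    """Bisection for delta such that gaussian_tail(delta, S) = eps, with 0 < eps < 1/2."""
    lo, hi = 0.0, 40.0 * math.sqrt(S)
    while hi - lo > tol * math.sqrt(S):
        mid = 0.5 * (lo + hi)
        if gaussian_tail(mid, S) > eps:
            lo = mid   # tail still too heavy: move the cut outward
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tail_bound(S, eps):
    """Upper bound on delta from erfc(x) <= exp(-x^2): sqrt(2 S log(1 / (2 eps)))."""
    return math.sqrt(2.0 * S * math.log(1.0 / (2.0 * eps)))

if __name__ == "__main__":
    for S in (1e2, 1e4, 1e6):
        delta = solve_delta(S, 1e-3)
        print(S, delta, tail_bound(S, 1e-3), delta / S)
```

At fixed ε the ratio δy/S decreases like 1/√S, which is the scaling behind the statement that the smooth entropies agree with S up to a small relative correction when S is large.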
We now have the tools to bound the parameters required to achieve a cumulative error of δ in an N-step constrained merging protocol. Each step should contribute an error of at most δ/N, which means that ε = poly(δ/N) in (41) thanks to (35). The error term in (41) is then [...] where a_I is the length of the entire interval I. N must be allowed to go to infinity but not so quickly as to invalidate the Rényi entropy formula (39). Choosing N = (a_I/µ)^γ for 0 < γ < 1 is sufficient, in which case taking the limit of vanishing UV cut-off µ implies that the cost of each merging step is bounded above by the conditional von Neumann entropy, with a multiplicative correction of order O(1/√c). The total entanglement cost of the greedy constrained merging procedure is therefore the differential entropy, up to the same O(1/√c) corrections.

Differential entropy and Markov chains
From the point of view of the boundary field theory, the differential entropy is an entropic function of a collection of reduced density matrices. In the first part of this work we provided an information theoretic interpretation for differential entropy in terms of the entanglement cost of a restricted communication task. However, there may be alternative interpretations in terms of the entanglement entropy of a reconstructed global state. In fact, it has been conjectured that the differential entropy is the maximum entropy among all global states consistent with the marginals [38]. (In the reconstructability literature, the reduced density matrices are referred to as marginals, a nomenclature we follow in this section.) Arguments against this conjecture were first given in [57]. Recently, Kim and Swingle showed that this conjecture is false by arguing that it does not apply to a global pure state or subsystems with local modular Hamiltonians [58]. While their argument disproves the conjecture, there remains an intriguing connection between differential entropy and the problem of reconstructing the global state. Indeed, the greedy constrained merging protocol is nothing but an operational way of reconstructing the global state by assembling its marginals. We believe that density matrix reconstructability deserves further investigation, as it may be relevant for thermal states or highly excited energy eigenstates [58]. Hence, we devote this section to the study of reconstructability and its intimate connection with quantum Markov chains.

Reconstructability and Markov chains
Given a set of marginals, there can be many global states consistent with that local data because local data need not fix long-range correlations. In fact, it is unlikely that the set of all consistent global states can even be characterized efficiently [59]. Nevertheless, all consistent global states should satisfy certain entropic inequalities. In particular, the entropy of a consistent global state cannot be arbitrarily large. In the absence of further information, the best guess for the global state is the state that maximizes global entropy; we denote this state by ρ_max.
Suppose we are given a set of local density matrices {ρ_{I_j}} for the intervals {I_j}. If the I_j do not overlap, then the subadditivity of entropy requires that S(ρ) ≤ Σ_j S(ρ_{I_j}) (58) for any global state ρ consistent with the marginals. The product state ⊗_j ρ_{I_j} saturates the above inequality and is therefore equal to ρ_max. Equation (58) also suggests a strategy for moving beyond this simple situation. The quantity Σ_j S(ρ_{I_j}) − S(ρ), which quantifies "unnecessary" correlations in ρ, is nothing other than the relative entropy S(ρ‖ρ_max) provided ρ is consistent. This relative entropy measure of deviation from the maximum entropy consistent state generalizes naturally to the case of overlapping intervals.
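The identification of the excess correlation with a relative entropy can be verified directly in the simplest setting. The sketch below uses two non-overlapping classical (diagonal) subsystems, for which density matrices reduce to probability tables; this restriction, the example table and all function names are assumptions made for illustration.

```python
import math

def entropy(probs):
    """Shannon entropy in nats of a list of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def marginals(joint):
    """Row and column marginals of a joint probability table joint[a][b]."""
    p_a = [sum(row) for row in joint]
    p_b = [sum(col) for col in zip(*joint)]
    return p_a, p_b

def excess_correlation(joint):
    """S(A) + S(B) - S(AB): the amount by which subadditivity fails to be saturated."""
    p_a, p_b = marginals(joint)
    flat = [p for row in joint for p in row]
    return entropy(p_a) + entropy(p_b) - entropy(flat)

def relative_entropy_to_product(joint):
    """Relative entropy of the joint distribution with respect to the product of its marginals."""
    p_a, p_b = marginals(joint)
    return sum(
        p * math.log(p / (p_a[i] * p_b[j]))
        for i, row in enumerate(joint)
        for j, p in enumerate(row)
        if p > 0.0
    )

if __name__ == "__main__":
    correlated = [[0.4, 0.1], [0.1, 0.4]]
    print(excess_correlation(correlated), relative_entropy_to_product(correlated))
```

The two printed numbers agree, and both vanish when the table is a product of its marginals.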
Reconstructing the global state from local data is much more interesting when regions overlap. The density matrix of two overlapping regions A and B always satisfies the strong subadditivity of entropy: S(ρ_A) + S(ρ_B) ≥ S(ρ_{A∪B}) + S(ρ_{A∩B}) (59). In analogy with the case of nonoverlapping intervals in which ρ_max was the state saturating (58), in the general case it would be natural to expect ρ_max to saturate (59). Any consistent state saturating strong subadditivity is uniquely determined from the local data according to the prescription log ρ_{A∪B} = log ρ_A + log ρ_B − log ρ_{A∩B} (60). (Equality in (60) is in fact a necessary and sufficient condition for the saturation of strong subadditivity [63].) This equation gives a prescription for writing down ρ_{A∪B} that will always yield a positive semidefinite Hermitian operator. In general, however, ρ_{A∪B} need not have unit trace. When it does not, ρ_max will necessarily fail to saturate strong subadditivity.
The neighboring regions in the formula for differential entropy, I(x) and I(x − dx), have significant overlap. In this case, the conditional mutual information computed for a 2D CFT is negligible: [...] Therefore, one might hope that the maximum entropy consistent global state could be found by a repeated application of (60) using the local data. Assuming the saturation of strong subadditivity for all neighboring regions and iterating (60) over all j defines an operator σ satisfying log σ = Σ_j [log ρ_{I_j} − log ρ_{I_j ∩ I_{j−1}}] (62), which we call the Markov operator. In those cases when σ is normalized, it defines a state consistent with the marginals whose only long-range correlations are purely Markovian.
A key observation is that if the global state is a pure state |Ψ⟩, then the formula for the differential entropy in (5) is nothing but the relative entropy of |Ψ⟩ with respect to σ: S_diff = S(|Ψ⟩⟨Ψ| ‖ σ), as can be verified by a simple calculation. If σ were properly normalized, this would have constituted a proof for the conjecture that the differential entropy is the entropy of the maximum entropy consistent state.
In Appendix D we demonstrate that on a line (without periodic boundary conditions) tr(σ) ≤ 1. This inequality is saturated in the special case where all ρ_{I_j} and ρ_{I_j ∩ I_{j−1}} commute. Therefore, the Markov operator σ in this case is the consistent global state with maximum entropy, and its von Neumann entropy is the differential entropy.
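The commuting case can be made fully explicit with classical probability tables, for which all marginals are diagonal and therefore commute. The sketch below is an illustration under that assumption: for three sites with marginals on {1,2} and {2,3} that agree on site 2, the Markov operator becomes σ(x1,x2,x3) = p12(x1,x2) p23(x2,x3) / p2(x2). The example tables and function names are hypothetical.

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy in nats of an iterable of probabilities."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def markov_reconstruction(p12, p23):
    """Classical Markov operator built from marginals on sites {1,2} and {2,3}."""
    n1, n2, n3 = len(p12), len(p12[0]), len(p23[0])
    p2 = [sum(p12[x1][x2] for x1 in range(n1)) for x2 in range(n2)]
    sigma = {}
    for x1, x2, x3 in product(range(n1), range(n2), range(n3)):
        # Divide out the shared marginal on the overlap (site 2).
        sigma[(x1, x2, x3)] = p12[x1][x2] * p23[x2][x3] / p2[x2] if p2[x2] > 0.0 else 0.0
    return sigma, p2

def flatten(table):
    return [p for row in table for p in row]

if __name__ == "__main__":
    p12 = [[0.3, 0.2], [0.1, 0.4]]   # marginal on site 2: (0.4, 0.6)
    p23 = [[0.1, 0.3], [0.5, 0.1]]   # rows sum to (0.4, 0.6), consistent with p12
    sigma, p2 = markov_reconstruction(p12, p23)
    trace = sum(sigma.values())
    s_sigma = entropy(sigma.values())
    s_sum = entropy(flatten(p12)) + entropy(flatten(p23)) - entropy(p2)
    print(trace, s_sigma, s_sum)
```

In this commuting example σ has unit trace, reproduces both marginals, and has entropy S(12) + S(23) − S(2), which is the discrete analogue of the differential entropy for the two intervals.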

Markov operator in conformal field theory
In this section, we show explicitly that the Markov operator σ corresponding to marginals of size R in the vacuum of a 1+1-dimensional conformal field theory does not have unit trace. We thereby give a quantitative refutation of the conjecture that the entropy of the maximum entropy consistent state is S_diff. We will show that the maximum entropy state consistent with marginals of size R in the vacuum has entropy at most 2/3 of S_diff.
The reduced density matrix of a region A of size 2R in a 1+1-dimensional conformal field theory on a line is [64] [...] As we saw in the previous section, a sufficient condition for differential entropy to be the entropy of a consistent state is that reduced density matrices on different intervals commute. Naively, it appears that reduced density matrices should commute because they are functions of only one operator, T_00. However, it is well known that in field theory equal-time commutators of symmetry currents need not be zero [65]. In particular, in relativistic field theories the so-called Schwinger term quantifies the amount by which stress tensors fail to commute: [...] According to (62), knowledge of the reduced density matrices in (64) is sufficient to directly compute the Markov operator. In Appendix B we find σ for a collection of marginals of size R in a CFT on a line to be [...], which is proportional to the thermal state at temperature T = 3/(4πR). The identity S_diff = S(|Ω⟩⟨Ω| ‖ σ) fixes the value of C when the global state is the vacuum |Ω⟩: [...] Therefore, [...] where ρ_T and Z_T are the thermal state and partition function at temperature T = 3/(4πR), respectively. In this case, Z_T = e^{S_thermal/2} = e^{cL/(8R)}, and S_diff = cL/(8R).
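The thermal quantities quoted above can be cross-checked against the standard high-temperature expressions for a 1+1-dimensional CFT on a line of length L. These expressions are assumed here rather than derived in the text, and they neglect Casimir-type corrections:

```latex
\begin{align}
  \log Z_T &= \frac{\pi c}{6}\, L\, T ,
  &
  S_{\text{thermal}} &= \frac{\pi c}{3}\, L\, T = 2 \log Z_T , \\
  T = \frac{3}{4\pi R}
  \quad\Longrightarrow\quad
  \log Z_T &= \frac{cL}{8R} ,
  &
  S_{\text{thermal}} &= \frac{cL}{4R} .
\end{align}
```

Under these assumptions the relation Z_T = e^{S_thermal/2} = e^{cL/(8R)} follows immediately.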
Consider ρ_max, the maximum entropy consistent density matrix for the set of marginals of size R. The relative entropy of this state with respect to σ is S(ρ_max ‖ σ) = S_diff − S(ρ_max) by direct substitution of the definition of σ into the formula for the relative entropy. However, as we have seen, σ = ρ_T e^{−cL/(24R)}. Using the nonnegativity of relative entropy then leads to S(ρ_max) ≤ S_diff − cL/(24R). Therefore, there exists no consistent density matrix with an entropy that is near the differential entropy; the largest entropy among all of them is at most 2/3 of S_diff.
As a final remark, it is worth noting that the thermal density matrix at temperature T = 3/(4πR) is not consistent with the vacuum at scales smaller than 2R. In fact, the thermal state at temperature T remains distinguishable from the vacuum in the large central charge limit since S(ρ_T ‖ |Ω⟩⟨Ω|) = O(c).

Generalization to more curves and surfaces
Our discussion of the constrained merging protocol was given mostly for a convex curve with endpoints on the boundary on a static slice of pure AdS3, which is dual to the ground state of a 1+1-dimensional CFT. In fact, our interpretation of the length of a convex curve works for a broader class of curves and even extends to areas of some surfaces in higher dimensions.
A key geometric fact underlying the scope of our results is that the area of any spacelike convex surface in a holographic spacetime can be written in the form Area = Σ_j [Area(A_j ∪ B_j) − Area(B_j)] (71) for some family of sets I_j, where as before A_j = I_j − I_{j−1} and B_j = I_j ∩ I_{j−1} [45,48]. This includes lengths of curves in asymptotically AdS3 spacetimes which do not lie on a constant time slice [48]. The areas on the right hand side of (71) are of extremal surfaces which asymptote to A_j ∪ B_j and B_j. So long as these areas compute appropriate boundary entanglement entropies, the total area on the left hand side becomes a sum of conditional entropies: Area/4G = Σ_j S(A_j|B_j) (72). For example, Fig. 7 depicts two surfaces on a static slice of the Poincaré patch of AdS4. Their areas compute the entanglement costs of the constrained merging and constrained swapping protocols discussed in Sec. 2.
However, applying our results in more general settings is subject to a number of caveats, which we discuss below.

Extremal but nonminimal surfaces. The surfaces appearing on the right hand side of (71) are guaranteed to be extremal, but not necessarily minimal. In pure AdS spacetimes and some excited geometries [66,67] this distinction does not play a role. In generic holographic spacetimes, however, more than one extremal surface may be anchored on the same boundary region [68-70] and only one of them computes the entanglement entropy of the said region. In discussions of the Ryu-Takayanagi proposal, the non-uniqueness of the extremal surfaces came into focus with the introduction of the homology constraint, without which the proposal gives incorrect answers for entanglement entropies of subregions in the thermal state [68]. Another easy example of an extremal but nonminimal surface is a geodesic in the BTZ spacetime, which wraps around the black hole multiple times in a spacelike analogue of gravitational lensing.
In order to interpret the area of a surface written in the form (71) as the cost of a merging protocol (72), all surfaces on the right hand side must be minimal. It is difficult to give a general characterization of bulk surfaces for which this is true; for a discussion of this consult [45]. Here we content ourselves with some qualitative rules of thumb. The right hand side of (71) typically involves extremal but nonminimal surfaces (a) if the surface approaches a horizon, (b) if its extrinsic curvature is anywhere large, and/or (c) if it is locally approximately radial. In addition, closed convex curves in higher dimensions always involve nonminimal surfaces in (71).
It would be interesting to understand (71) in information theoretic terms also in cases where nonminimal surfaces make an appearance. A possible starting point was given in [46], which studied the field theory meaning of the lengths of nonminimal surfaces in the simplest setting: the conical defect geometry AdS3/Z_n. The field theory state dual to this geometry is an excited state, so the level spacing in its neighborhood is reduced relative to the vacuum. Converting the level spacing into a length scale, we obtain a scale larger than system size: a dynamical scale, which cannot be spatially realized in a single copy of the system. In a certain technical sense, [46] associated nonminimal geodesics with the physics of such extended, dynamical scales. Now recall that in Sec. 2.3 we used differential entropy to decompose entanglement (that is, a minimal surface) into scale-specific components. This suggests that it may be possible to interpret (71) in information theoretic terms in the general case, perhaps in terms of a communication task where Alice transmits to Bob data about all scales in the theory, including scales larger than system size, which are not captured by the entanglement entropy of any subregion.

Overlapping A_j's. In the constrained merging protocol Alice sends data about successive regions A_j to Bob. This makes sense only if the sets A_j are disjoint. In more than 2+1 bulk dimensions this is not guaranteed. The large freedom in choosing shapes of boundary regions B_j and A_j makes it possible to construct an example where distinct A_j's overlap even though the bulk surface is convex.

Non-convex regions. All our results pertain to convex surfaces. As explained in [25], the differential entropy formula also computes lengths of nonconvex curves, but as a difference of two terms: integral (4) taken over the segments where the curve is convex minus integral (14) taken over concave segments. Integral (14) showed up in Sec. 2.5, where we discussed orientation reversal and the negative "cost" of purifying an initial mixed state. This suggests that an information theoretic interpretation of the length of a nonconvex curve might involve a flow of information in both directions, from Alice to Bob and from Bob to Alice. However, we have not yet succeeded in finding a well motivated quantum communication task whose cost would be precisely the length of a nonconvex curve.

Technical proofs
To prove that the one-shot and von Neumann entropies coincide in the large $c$ limit, we used the eigenvalue distribution of the reduced density matrix of an interval given in [56]. This eigenvalue distribution applies to an interval in the vacuum or in the thermal state of a 1+1-dimensional CFT. Consequently, the proof in Sec. 4 is valid only for curves in pure AdS$_3$ or in the BTZ spacetime. It would be surprising, however, if analogous results did not hold in higher dimensions.
The optimality proof of Sec. 3 applies to the constrained merging protocol. We have not proven the optimality of the constrained swapping protocol, which is relevant to closed convex curves in AdS$_3$ and to higher-dimensional surfaces like that in Fig. 7b.

Summary of results
In this paper we have studied holographic theories of gravity, an important class of gravitational models which enjoy an equivalent description as field theories. In recent years, it has become increasingly clear that in these models the geometric structure of spacetime is intimately related to quantum information theory. In order to clarify this relation, we have given an explicit, information theoretic interpretation of one of the most basic geometric quantities in spacetime: the length of a convex curve. This interpretation involves a certain communication task in the dual field theory, whose details are encoded in the shape of the curve. Our discussion was set in the context of pure AdS$_3$, which is dual to the vacuum of a 1+1-dimensional conformal field theory. Our findings generalize to varying degrees, as discussed in Sec. 6.
The specific results are:
Sec. 2: The exhibition of a protocol for merging the state of a boundary interval from Alice to Bob at an entanglement cost equal, to leading order in the CFT central charge, to the length of a bulk curve starting and ending at the endpoints of the interval. In each step of the protocol, Alice and Bob act only in subintervals of the boundary determined by the geometry of the bulk curve. These constraints provide a precise operational implementation of the UV-IR relation.
Sec. 3: A proof that, subject to appropriate locality constraints on Alice and Bob's actions, the entanglement cost is optimal: no procedure meeting the locality constraints can use less entanglement. The minimal constrained merging cost is, therefore, the length of the curve. Together, the protocol and optimality proof add a new entry to the holographic dictionary: convex bulk curves are in one-to-one correspondence with constrained boundary merging tasks whose optimal costs are the lengths of the curves themselves. From the information theory point of view, the optimality proof characterizes the rates achievable in "streaming" state merging protocols [71].
Sec. 4: A demonstration that the smooth conditional min-entropy in a 1+1-dimensional CFT with large central charge $c$ is well approximated by the conditional von Neumann entropy. As a consequence, the error terms by which the length of the curve and the entanglement cost of the communication protocol differ vanish in the limit of large $c$.
Sec. 2.6: An analogous protocol for closed bulk curves, in which case Alice and Bob are required to swap their boundary intervals. There is at the moment, however, no matching optimality proof.
Sec. 5: A quantitative refutation of the conjecture that the length of the bulk curve is the maximum entropy among all boundary states matching certain consistency criteria. This demonstration complements an earlier refutation [58]. Detailed CFT calculations combined with the structure theory of quantum Markov chains reveal that the entropy of the maximally entropic consistent state is at most 2/3 of the differential entropy.
While these results establish a clear information theoretic interpretation for the length of a curve in AdS$_3$, they leave open a number of questions. We have only worked to first order in the central charge, which we assumed was large. Quantum gravity effects typically enter as corrections to this leading order behavior, so it would be interesting to compare those corrections with more detailed calculations of the constrained merging cost to see if there is agreement. Likewise, the constrained merging interpretation proposed here depends on being able to arrange the boundary intervals in sequence, a requirement that breaks down for non-convex curves in AdS$_3$ and generically in higher dimensions. Therefore, finding an appropriate generalization of constrained state merging remains a challenge. Limitations aside, our interpretation provides a quantitative operational way of associating a bulk curve to a set of boundary degrees of freedom, helping to illuminate the meaning of holographic renormalization group flow.
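Consider a lattice of $N$ sites at $x_i = iL/N$ for some infrared cut-off $L$. Our starting point is the vacuum density matrix of a region of size $2R(x_i)$ centered at $x_i$ in a 1+1-dimensional conformal field theory [64]:
$$\rho_{I(x_i)} \propto \exp\left(-\int_{x_i-R(x_i)}^{x_i+R(x_i)} dx\, f_{R(x_i)}(x,x_i)\, T_{00}(x)\right), \qquad f_R(x,x_i) = \frac{\pi\left(R^2-(x-x_i)^2\right)}{R}.$$
The Markov operator $\sigma$ is defined in (62); here it is given by
$$-\log\sigma = \sum_i \left[\int_{x_i-R(x_i)}^{x_i+R(x_i)} dx\, f_{R(x_i)}(x,x_i)\, T_{00}(x) - \int_{x_i-(R(x_i+\delta)-\delta)}^{x_i+R(x_i)} dx\, f_{R_{i,i+1}}(x,x_{i,i+1})\, T_{00}(x)\right] + c,$$
where $\delta = L/N$, $R_{i,i+1} = (R(x_i)+R(x_i+\delta)-\delta)/2$ and $x_{i,i+1} = x_i+\delta/2+(R(x_i)-R(x_i+\delta))/2$. The second integral runs over the overlap of neighboring intervals, which has half-width $R_{i,i+1}$ and center $x_{i,i+1}$.
Expanding to first order in $\delta$ gives
$$-\log\sigma = \sum_i \int_{x_i-(R(x_i+\delta)-\delta)}^{x_i+R(x_i)} dx \left[f_{R(x_i)}(x,x_i) - f_{R_{i,i+1}}(x,x_{i,i+1})\right] T_{00}(x) + c = \sum_i \frac{\pi\delta}{2} \int_{x_i-(R(x_i+\delta)-\delta)}^{x_i+R(x_i)} dx\, \left(1-R'(x_i)\right)\left(1-\frac{x-x_i}{R(x_i)}\right)^2 T_{00}(x) + c. \qquad (75)$$
Here, we have used the identities
$$\partial_R f_R(x,x_i) = \pi\left(1+\frac{(x-x_i)^2}{R^2}\right), \qquad \partial_{x_i} f_R(x,x_i) = \frac{2\pi (x-x_i)}{R}.$$
In the limit $\delta \to 0$ the sum over intervals becomes an integral over $x_i$, and the expression for the Markov operator simplifies to
$$-\log\sigma = \int dx\, \beta(x)\, T_{00}(x) + c,$$
where
$$\beta(x) = \frac{\pi}{2}\int_{|x-x_i|\le R(x_i)} dx_i\, \left(1-R'(x_i)\right)\left(1-\frac{x-x_i}{R(x_i)}\right)^2.$$
We find $\sigma$ to be proportional to a thermal density matrix with local inverse temperature $\beta(x)$. For intervals of constant size $R(x_i) = R$ the Markov operator is the thermal state at temperature $\frac{3}{4\pi R}$:
$$\sigma = C \exp\left(-\frac{4\pi R}{3}\int dx\, T_{00}(x)\right)$$
for some constant $C$.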
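The constant-$R$ result can be checked numerically on the lattice. The following is a minimal sketch, assuming the kernel takes the standard form $f_R(x,x_i) = \pi(R^2-(x-x_i)^2)/R$ for the vacuum modular Hamiltonian of an interval; the function names `kernel` and `lattice_beta` are illustrative and do not appear in the text. It sums, over lattice sites, the interval kernel minus the overlap kernel at a fixed point $x$ and compares the result with $4\pi R/3$, the inverse of the temperature $3/(4\pi R)$ quoted above.

```python
import numpy as np

def kernel(x, center, half_width):
    """Assumed kernel f_R(x, x_i) = pi * (R^2 - (x - x_i)^2) / R, zero outside the interval."""
    inside = np.abs(x - center) <= half_width
    return np.where(inside, np.pi * (half_width**2 - (x - center)**2) / half_width, 0.0)

def lattice_beta(x, R, delta, n_sites):
    """Coefficient of T_00(x) in -log(sigma) for constant R on a lattice of spacing delta."""
    centers = delta * np.arange(-n_sites, n_sites + 1)
    # Contribution of each interval I(x_i) = [x_i - R, x_i + R].
    intervals = kernel(x, centers, R)
    # Contribution of each overlap of neighbours: centre x_i + delta/2, half-width R - delta/2.
    overlaps = kernel(x, centers + delta / 2, R - delta / 2)
    return float(np.sum(intervals - overlaps))

R, delta = 1.0, 1e-3
beta = lattice_beta(0.2, R, delta, n_sites=5000)
print(beta, 4 * np.pi * R / 3)
```

The two printed numbers should agree up to a correction of order $\delta$, and the result should not depend on the point $x$ as long as it lies well inside the lattice.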

C Smooth entropy conversions
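The recent literature on single-shot entropies [54,55,73] usually defines smoothing with respect to the purified distance $P(\rho,\sigma)$ instead of the trace distance $T(\rho,\sigma) = \|\rho-\sigma\|_1 + |\mathrm{tr}(\rho)-\mathrm{tr}(\sigma)|$ that we have elected to use here (the extra term is required when considering non-normalized states). Thanks to the inequality [73]
$$\tfrac{1}{2}\, T(\rho,\sigma) \le P(\rho,\sigma) \le \sqrt{T(\rho,\sigma)}, \qquad (79)$$
the two choices of smoothing can be converted into one another. Likewise, the merging bound (35) was originally stated in terms of $-\tilde H^{\epsilon^2/13}_{\min}(A|R)$ [54], where the tilde denotes smoothing with respect to $P$ rather than $T$. The virtue of the purified distance, however, is that it obeys a convenient relationship between min- and max-entropies: for a pure state $|\varphi\rangle_{ABR}$, semidefinite programming duality can be used to show that $-\tilde H^{\epsilon^2/13}_{\min}(A|R) = \tilde H^{\epsilon^2/13}_{\max}(A|B)$ [73]. The latter is bounded above by $H^{\epsilon^4/169}_{\max}(A|B)$ by (80).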
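The relationship between the two distances can be illustrated numerically. The sketch below assumes normalized states, for which the fidelity reduces to the standard expression $F(\rho,\sigma) = \|\sqrt{\rho}\sqrt{\sigma}\|_1$ and the purified distance to $P = \sqrt{1-F^2}$; it then checks the bounds $T/2 \le P \le \sqrt{T}$ on random density matrices. All function names are illustrative.

```python
import numpy as np

def random_density_matrix(dim, rng):
    """Random full-rank density matrix built from a complex Gaussian matrix."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def psd_sqrt(m):
    """Square root of a positive semidefinite Hermitian matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.conj().T

def trace_norm(m):
    """Sum of singular values."""
    return float(np.linalg.svd(m, compute_uv=False).sum())

def trace_distance_T(rho, sigma):
    """T(rho, sigma) = ||rho - sigma||_1 + |tr(rho) - tr(sigma)|."""
    return trace_norm(rho - sigma) + abs(np.trace(rho).real - np.trace(sigma).real)

def purified_distance(rho, sigma):
    """P = sqrt(1 - F^2) with F = ||sqrt(rho) sqrt(sigma)||_1 (normalized states assumed)."""
    fidelity = trace_norm(psd_sqrt(rho) @ psd_sqrt(sigma))
    return float(np.sqrt(max(0.0, 1.0 - fidelity**2)))

rng = np.random.default_rng(0)
rho, sigma = random_density_matrix(4, rng), random_density_matrix(4, rng)
T, P = trace_distance_T(rho, sigma), purified_distance(rho, sigma)
print(T / 2, P, np.sqrt(T))  # the three values should appear in non-decreasing order
```

For normalized states the lower bound is the Fuchs-van de Graaf inequality; the upper bound follows from $1-F^2 \le 2(1-F)$.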
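In fact, the purified distance can be computed in our setting. Given two positive operators $\rho$ and $\sigma$ with at least one of them normalized, the purified distance $P(\rho,\sigma)$ is defined in terms of the fidelity $F(\rho,\sigma) = \|\sqrt{\rho}\sqrt{\sigma}\|_1$ as
$$P(\rho,\sigma) = \sqrt{1-F(\rho,\sigma)^2}.$$

D Norm of the Markov operator
Theorem D.1 Consider a global state $\rho$ on a line and its marginals on a set of intervals $I_j$ that we denote by $\rho_{I_j}$. Assume that for all $j$, $I_j$ is to the right of $I_{j-1}$, that is
$$\forall j: \quad \emptyset \neq I_j \cap I_{j-1} = I_j \cap \left(\cup_{i=1}^{j-1} I_i\right). \qquad (87)$$
Then the Markov operator $\sigma$ of (62) built from these marginals satisfies $\mathrm{tr}\,\sigma \le 1$. If all $\rho_{I_j}$ and $\rho_{I_j \cap I_{j-1}}$ commute, then the inequality is saturated.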
Consider the first three intervals $I_1$, $I_2$ and $I_3$. Since $\log\rho$ is a Hermitian and bounded operator for all intervals, the trace inequality of [74] can be applied to $\sigma_{12} \equiv \exp\left(\log\rho_{I_1} + \log\rho_{I_2} - \log\rho_{I_1 \cap I_2}\right)$ together with the marginals on $I_3$ and $I_2 \cap I_3$. The inequality in (91) is an equality if all $\rho_{I_j}$ and $\rho_{I_j \cap I_{j-1}}$ commute. On a circle there is an extra term in the exponent of $\sigma$ with a negative sign. Unfortunately, we do not know how to generalize our argument to apply to this case.
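The simplest instance of the statement can be checked numerically. The sketch below is an illustration rather than the construction used in the text: it takes three qubits with two overlapping "intervals" $I_1 = \{1,2\}$ and $I_2 = \{2,3\}$, builds the operator $\exp(\log\rho_{12} + \log\rho_{23} - \log\rho_{2})$, and evaluates its trace, which should not exceed 1 and should equal 1 when the marginals commute (here, for a product state). Function names are hypothetical.

```python
import numpy as np

def random_density_matrix(dim, rng):
    """Random full-rank density matrix built from a complex Gaussian matrix."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def hermitian_log(m):
    """Matrix logarithm of a positive definite Hermitian matrix."""
    w, v = np.linalg.eigh(m)
    return (v * np.log(w)) @ v.conj().T

def markov_operator_trace(rho):
    """tr exp(log rho_12 + log rho_23 - log rho_2) for a three-qubit state rho."""
    t = rho.reshape(2, 2, 2, 2, 2, 2)
    rho_12 = np.einsum('abcxyc->abxy', t).reshape(4, 4)  # trace out qubit 3
    rho_23 = np.einsum('abcayz->bcyz', t).reshape(4, 4)  # trace out qubit 1
    rho_2 = np.einsum('abcayc->by', t)                   # trace out qubits 1 and 3
    eye = np.eye(2)
    exponent = (np.kron(hermitian_log(rho_12), eye)
                + np.kron(eye, hermitian_log(rho_23))
                - np.kron(np.kron(eye, hermitian_log(rho_2)), eye))
    # The exponent is Hermitian, so the trace of its exponential is a sum over eigenvalues.
    return float(np.exp(np.linalg.eigvalsh(exponent)).sum())

rng = np.random.default_rng(0)
generic = random_density_matrix(8, rng)
qubits = [random_density_matrix(2, rng) for _ in range(3)]
product = np.kron(np.kron(qubits[0], qubits[1]), qubits[2])
print(markov_operator_trace(generic), markov_operator_trace(product))
```

For the product state the three logarithms commute and the exponent collapses to $\log(\rho_1\otimes\rho_2\otimes\rho_3)$, so the trace is exactly 1 up to rounding.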

Figure 1: a) Geodesic $g_I$, which subtends a boundary interval $I$ (black), and geodesics which are tangent to $g_I$ on the boundary, along with the corresponding boundary intervals $I(x)$ (color). The dashed geodesics contribute zero to integral (4). b) $a_I(x)$, the linear size of the interval $I(x)$ centered at $x$.

Figure 3: The way to read off $\bar a_{I,J}(r)$ from the plot of $a_{I,J}(x)/2$.
Without loss of generality, let $I = [1, N] = \{1, 2, \ldots, N\}$. An $E$-ebit constrained merging protocol consists of an $N$ step procedure. $H_B$ is initially prepared in the fixed product state $|00\cdots0\rangle_B$. Write $D(H)$ for the density operators on $H$. Step $x$ consists of an $A \leftrightarrow B$ LOCC transformation. Let $I = \{L, L+1, \ldots, R\} =: [L, R]$ so that $L$ and $R$ are the left and right endpoints of $I$. Likewise, let $I(x) = [\ell(x), r(x)]$. (We will also have occasion to make use of abbreviations like $[L, \ell(x+1))_B$ to indicate the subsystem corresponding to $\bigotimes_{x' \in [L, \ell(x+1))} H_{B_{x'}}$.)

Figure 5: Intermediate stage in a constrained merging protocol. The top row depicts the interval $I = [L, R]$ with endpoints $L$ and $R$. Step $x$ of the protocol acts on interval $I(x) = [\ell(x), r(x)]$, drawn in the second row. Because $r(x)$ is non-decreasing with $x$, the sites marked by the orange bar have not yet been acted upon. The third row depicts the interval $I(x+1) = [\ell(x+1), r(x+1)]$. Once step $x$ is complete, none of the sites indicated by the green bar will ever be acted upon again since $\ell(x)$ is non-decreasing with $x$. Therefore, after step $x$ the reduced density operators corresponding to the green marked sites on $B$ and the orange marked sites on $A$ should approximate the reduced density operator of the target state $|\psi\rangle$.

Figure 6: Sequential merging error. The top line depicts the interval $I = [L, R]$ and a set of merging constraints in the form of intervals $I(x)$ depicted as green ovals. The state to be merged is a maximally entangled state shared between $L$ and $R$, as illustrated in the second line. One way to merge this state is for Bob to prepare a maximally entangled state between $L$ and $L+1$ in the first sequential merging step and then, in the subsequent steps, simply swap the information to the right until a maximally entangled state is established between $L$ and $R$. This procedure doesn't require any Alice-Bob entanglement and yet produces long-range entanglement in Bob's lab. It is prohibited by the sequential merging error condition because, while the final state is correct, the intermediate state fails the sequential merging condition. The joint state of Bob's $L$ and Alice's $R$ is not correct at the intermediate merging steps after $L$ has been completed and before $R$ has been reached; they should be maximally entangled but instead they are product.
) is saturated if and only if the quantum state decomposes into a quantum Markov chain $A - B \to A \cap B \to B - A$ [61,62]. Informally, that means that there exists an incomplete projective measurement of $A \cap B$ leaving the state invariant but such that $A - B$ and $B - A$ factorize conditioned on the measurement outcome. In other words, all correlations between $A - B$ and $B - A$ are classical correlations mediated by $A \cap B$. (See Appendix A for more precise statements.)
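The saturation condition can be illustrated with a small numerical example. The sketch below assumes that the inequality in question is strong subadditivity, $S(A) + S(B) \ge S(A \cup B) + S(A \cap B)$, and uses three qubits $X$, $Y$, $Z$ as hypothetical stand-ins for $A - B$, $A \cap B$ and $B - A$. It computes the conditional mutual information $S(XY) + S(YZ) - S(Y) - S(XYZ)$, which vanishes exactly when the inequality is saturated, for a state whose $X$-$Z$ correlations are mediated by a classical label on $Y$ and for a state in which $X$ and $Z$ share a Bell pair.

```python
import numpy as np

def entropy(rho):
    """Von Neumann entropy in nats."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]
    return float(-(w * np.log(w)).sum())

def random_density_matrix(dim, rng):
    """Random full-rank density matrix built from a complex Gaussian matrix."""
    g = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
    rho = g @ g.conj().T
    return rho / np.trace(rho).real

def conditional_mutual_information(rho):
    """S(XY) + S(YZ) - S(Y) - S(XYZ) for a three-qubit state ordered as X, Y, Z."""
    t = rho.reshape(2, 2, 2, 2, 2, 2)
    rho_xy = np.einsum('abcxyc->abxy', t).reshape(4, 4)
    rho_yz = np.einsum('abcayz->bcyz', t).reshape(4, 4)
    rho_y = np.einsum('abcayc->by', t)
    return entropy(rho_xy) + entropy(rho_yz) - entropy(rho_y) - entropy(rho)

rng = np.random.default_rng(0)

# Markov chain X -> Y -> Z: Y carries a classical label k, and X, Z factorize given k.
probs = [0.3, 0.7]
markov = np.zeros((8, 8), dtype=complex)
for k, p in enumerate(probs):
    label = np.zeros((2, 2))
    label[k, k] = 1.0
    markov += p * np.kron(np.kron(random_density_matrix(2, rng), label),
                          random_density_matrix(2, rng))

# Not a Markov chain: X and Z share a Bell pair while Y is in a fixed pure state.
psi = np.zeros(8)
psi[0] = psi[5] = 1 / np.sqrt(2)  # (|0,0,0> + |1,0,1>) / sqrt(2)
bell = np.outer(psi, psi)

cmi_markov = conditional_mutual_information(markov)
cmi_bell = conditional_mutual_information(bell)
print(cmi_markov, cmi_bell)
```

In the first state, measuring the label on $Y$ leaves the state invariant and decouples $X$ from $Z$, which is the informal characterization given above; in the second, the correlations between $X$ and $Z$ are quantum and bypass $Y$ entirely.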

Figure 7: a) A surface in Poincaré AdS$_4$ built up from minimal surfaces anchored on a sequence of boundary circles. Its area computes the entanglement cost of a constrained merging protocol. b) A surface built up from a cycle of minimal surfaces. Its area computes the entanglement cost of a constrained swapping protocol.

1. [54] For any $\epsilon > 0$ and quantum state $|\psi\rangle_{ABR}$, there exists a quantum state merging protocol producing a state $\rho_{ABR}$ such that $\| |\psi\rangle\langle\psi| - \rho \|_1 \le \epsilon$ [54]. (In fact, if the cost is negative then the final joint state of $ABR$ together with the entanglement is $\epsilon$-close to $|\psi\rangle\langle\psi|$ tensored with a perfect maximally entangled state.)
2. $H^{\epsilon}_{\max}(A|B) \le H^{\epsilon^2}_{\max}(AB) - H^{\epsilon^2/4}_{\min}(B) + \mathrm{const}$.

Conversion between the two forms is straightforward. Let $\tilde H$ be the symbol for entropy smoothed with respect to $P$ rather than $T$. From (79) and the definition of smoothing, we immediately find $\tilde H^{\sqrt{\epsilon}}_{\max} \le H^{\epsilon}_{\max} \le \tilde H^{\epsilon/2}_{\max}$.